Summary of Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering, by Farouq Sammour et al.
Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
by Farouq Sammour, Jia Xu, Xi Wang, Mo Hu, Zhenyu Zhang
First submitted to arXiv on: 13 Nov 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This study evaluates the performance of two Large Language Models (LLMs), GPT-3.5 and GPT-4o, in enhancing workplace safety in the construction sector. The researchers analyzed the models’ accuracy, consistency, and reliability across three standardized exams administered by the Board of Certified Safety Professionals (BCSP). Results show that both models consistently exceed the BCSP benchmark, with GPT-4o achieving an accuracy rate of 84.6% and GPT-3.5 reaching 73.8%. The study highlights strengths in safety management systems and hazard identification and control, but also identifies weaknesses in science, mathematics, emergency response, and fire prevention. Furthermore, the study provides insights into improving LLM implementation through prompt engineering, offering evidence-based direction for future research and development. |
Low | GrooveSquid.com (original content) | This paper looks at how to use artificial intelligence (AI) to make workplaces safer. The authors tested two special kinds of AI called Large Language Models (LLMs), GPT-3.5 and GPT-4o. They gave the models a set of questions about safety and then checked their answers to see how well they did. The results show that both models are pretty good at answering questions about safety management and identifying hazards, but not so great when it comes to more complicated topics like science and math. The authors also found that the way you ask the question can affect how well the model does, which is important to know if you want to use AI to make decisions (a rough code sketch of this idea appears after the table). |
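To make the exam-style evaluation and prompt-engineering ideas above concrete, here is a minimal, hypothetical sketch of how such a comparison could be run. The model names (`gpt-3.5-turbo`, `gpt-4o`), the `exam_questions.json` question bank, and the two prompt styles are illustrative assumptions, not the authors' actual protocol or prompts.

```python
# Illustrative sketch only: comparing prompt phrasings when scoring an LLM on
# multiple-choice safety-exam questions. Model names, question file, and prompt
# wording are assumptions, not the paper's exact setup.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_STYLES = {
    # Bare question vs. a role-primed prompt, to see how phrasing shifts accuracy.
    "plain": "{question}\n{options}\nAnswer with a single letter.",
    "role": ("You are a certified construction safety professional. "
             "Reason carefully, then answer.\n{question}\n{options}\n"
             "Answer with a single letter."),
}

def ask(model: str, style: str, item: dict) -> str:
    """Send one question to the model and return the letter it chooses."""
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(item["choices"]))
    prompt = PROMPT_STYLES[style].format(question=item["question"], options=options)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep answers as deterministic as possible for consistency checks
    )
    return resp.choices[0].message.content.strip()[:1].upper()

def accuracy(model: str, style: str, items: list[dict]) -> float:
    """Fraction of questions the model answers correctly under a given prompt style."""
    return sum(ask(model, style, q) == q["answer"] for q in items) / len(items)

if __name__ == "__main__":
    items = json.load(open("exam_questions.json"))  # hypothetical question bank
    for model in ("gpt-3.5-turbo", "gpt-4o"):
        for style in PROMPT_STYLES:
            print(f"{model:14s} {style:6s} accuracy = {accuracy(model, style, items):.1%}")
```

Running the same question bank under both prompt styles gives a simple, repeatable way to see how much the phrasing of a prompt moves accuracy for each model, which is the kind of effect the study's prompt-engineering analysis examines.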
Keywords
» Artificial intelligence » GPT » Prompt