
Summary of Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering, by Farouq Sammour et al.


Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering

by Farouq Sammour, Jia Xu, Xi Wang, Mo Hu, Zhenyu Zhang

First submitted to arXiv on: 13 Nov 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, available from the arXiv listing.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This study evaluates the performance of two Large Language Models (LLMs), GPT-3.5 and GPT-4o, in enhancing workplace safety in the construction sector. The researchers analyzed the models’ accuracy, consistency, and reliability across three standardized exams administered by the Board of Certified Safety Professionals (BCSP). Results show that both models consistently exceed the BCSP benchmark, with GPT-4o achieving an accuracy rate of 84.6% and GPT-3.5 reaching 73.8%. The study highlights strengths in safety management systems and hazard identification and control, but also identifies weaknesses in science, mathematics, emergency response, and fire prevention. Furthermore, the study provides insights into improving LLM implementation through prompt engineering, offering evidence-based direction for future research and development. (An illustrative sketch of this kind of exam-scoring evaluation appears after the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how to use artificial intelligence (AI) to make workplaces safer. The authors tested two special kinds of AI called Large Language Models (LLMs), GPT-3.5 and GPT-4o. They gave the models a set of safety exam questions and compared the answers against the correct ones to see how well each model did. The results show that both models are pretty good at answering questions about safety management and identifying hazards, but not so great when it comes to more complicated topics like science and math. The authors also found that the way you ask a question can affect how well the model does, which is important to know if you want to use AI to help make decisions.

Keywords

» Artificial intelligence  » GPT  » Prompt