Summary of Assessing Large Language Models in Mechanical Engineering Education: a Study on Mechanics-focused Conceptual Understanding, by Jie Tian et al.


Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding

by Jie Tian, Jixin Hou, Zihao Wu, Peng Shu, Zhengliang Liu, Yujie Xiang, Beikang Gu, Nicholas Filla, Yiwei Li, Ning Liu, Xianyan Chen, Keke Tang, Tianming Liu, Xianqiao Wang

First submitted to arXiv on: 13 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Physics Education (physics.ed-ph)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract, available on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This pioneering study explores the capability of Large Language Models (LLMs) to answer conceptual questions in mechanical engineering, with a focus on mechanics topics. The researchers manually crafted an exam of 126 multiple-choice questions spanning several mechanics courses and evaluated three LLMs, ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), against cohorts of engineering faculty and students. GPT-4 outperformed the other two LLMs and the human cohorts across the various mechanics topics, with the exception of Continuum Mechanics, suggesting room for improvement in how GPT models handle symbolic calculations and tensor analysis. The study also highlights the crucial role of prompt engineering, showing that well-crafted prompts can significantly improve LLM performance and that performance varies with the domain or subject focus of the prompt.
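
The paper summary does not include the querying code, but a minimal sketch of how one round of such an evaluation might look is shown below, assuming the OpenAI Python SDK. The function name, prompt wording, and example question are illustrative only and are not taken from the study's 126-question exam.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_multiple_choice(question: str, choices: list[str], domain_prompt: str) -> str:
    """Send one multiple-choice mechanics question to the model and return its reply."""
    options = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", choices))
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # Domain-focused system prompt; the study reports that prompt focus matters.
            {"role": "system", "content": domain_prompt},
            {"role": "user", "content": f"{question}\n{options}\n\nAnswer with a single letter."},
        ],
        temperature=0,  # deterministic answers make scoring against the key simpler
    )
    return response.choices[0].message.content.strip()


# Hypothetical conceptual question, not from the paper's exam.
answer = ask_multiple_choice(
    question="A simply supported beam carries a central point load. Where is the bending moment largest?",
    choices=["At the supports", "At midspan", "At the quarter points", "It is uniform along the beam"],
    domain_prompt="You are an expert in solid mechanics answering conceptual exam questions.",
)
print(answer)
```

Swapping the domain-focused system prompt for a generic instruction would be one way to probe the prompt-sensitivity the summary describes.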

Low Difficulty Summary (original content by GrooveSquid.com)
Large Language Models (LLMs) are super smart computer programs that can understand questions and answer them. In this study, researchers tested three LLMs to see how well they could answer questions about mechanics, the branch of engineering that deals with things like motion and forces. They created a special test with 126 multiple-choice questions covering different mechanics topics and compared the LLMs with human experts and students. One of the LLMs, GPT-4, performed better than the other two and even did better than the humans on most topics! This is exciting because it shows that LLMs could become helpful tools for learning and research in mechanics.

Keywords

» Artificial intelligence  » Claude  » GPT  » Prompt