Summary of "An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios", by Zongjie Li et al.
An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
by Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu
First submitted to arXiv on: 27 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | The paper presents a comprehensive empirical study of the accuracy and robustness of large language models (LLMs) in the context of Chinese industrial production. Specifically, it evaluates 9 LLMs developed by Chinese vendors and 4 developed by global vendors on domain-specific problems and under a metamorphic testing framework. The results show that current LLMs exhibit low accuracy (below 0.6) in Chinese industrial contexts, with local LLMs performing worse than global ones overall. Robustness scores vary across industrial sectors and abilities, highlighting the need for further research and tooling support. A rough sketch of such an evaluation loop follows this table. |
Low | GrooveSquid.com (original content) | The paper looks at how good big language models are at understanding problems from different industries in China. It tests 13 models, 9 Chinese and 4 global, on many different types of questions to see how well they do. The results show that these models aren’t very accurate (most get less than 60% right) and have trouble with certain kinds of questions. The study helps us understand what we can expect from these models in real-world situations and where we might need to improve them. |
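
The medium-difficulty summary mentions two measurements: accuracy on domain-specific questions, and robustness under a metamorphic testing framework (the idea being that a meaning-preserving rewrite of a question should not change the model's answer). The minimal Python sketch below illustrates that general idea only; the `query_model` interface, the `perturb` rewrite, and the exact-match scoring are assumptions made for this sketch, not the paper's actual evaluation framework.

```python
# Illustrative only: a minimal accuracy + metamorphic-robustness loop.
# `query_model`, `perturb`, and exact-match scoring are assumptions for
# this sketch, not the paper's actual evaluation framework.
from typing import Callable, List, Tuple


def perturb(question: str) -> str:
    """A semantics-preserving rewrite; here just a trivial whitespace change."""
    return " " + question.strip()


def evaluate(query_model: Callable[[str], str],
             dataset: List[Tuple[str, str]]) -> dict:
    """Score a model on (question, reference_answer) pairs.

    accuracy   : fraction of questions answered correctly.
    robustness : fraction of questions whose answer is unchanged
                 under a meaning-preserving perturbation.
    """
    correct = consistent = 0
    for question, reference in dataset:
        answer = query_model(question)
        if answer == reference:
            correct += 1
        # Metamorphic check: the perturbed question should get the same answer.
        if query_model(perturb(question)) == answer:
            consistent += 1
    n = len(dataset)
    return {"accuracy": correct / n, "robustness": consistent / n}


if __name__ == "__main__":
    # Toy usage with a stand-in "model" that ignores its input.
    dummy = lambda q: "42"
    print(evaluate(dummy, [("What is 6 * 7?", "42"),
                           ("Capital of France?", "Paris")]))
```

A real harness would replace `dummy` with an API call to each evaluated LLM and would likely use a more forgiving answer-matching rule than string equality, but the accuracy/robustness split shown here mirrors the two kinds of scores the summary describes.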