Summary of InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks, by Xueyu Hu et al.
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks by Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai,…
REBUS: A Robust Evaluation Benchmark of Understanding Symbols by Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik…
The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models by Matthew…
Monte Carlo Tree Search for Recipe Generation using GPT-2 by Karan Taneja, Richard Segal, Richard Goodwin First…
I am a Strange Dataset: Metalinguistic Tests for Language Models by Tristan Thrush, Jared Moore, Miguel…
Fighting Fire with Fire: Adversarial Prompting to Generate a Misinformation Detection Dataset by Shrey Satapara, Parth…
InFoBench: Evaluating Instruction Following Ability in Large Language Models by Yiwei Qin, Kaiqiang Song, Yebowen Hu,…
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame…
Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education by Vahid…
Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based…