Summary of TraveLLM: Could You Plan My New Public Transit Route in Face of a Network Disruption? by Bowen Fang et al.
Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter? by Nemika Tyagi, Mihir Parmar, Mohith…
LLMs left, right, and center: Assessing GPT’s capabilities to label political bias from web domains by…
SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy by Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun…
End-To-End Clinical Trial Matching with Large Language Models by Dyke Ferber, Lars Hilgers, Isabella C. Wiest,…
Halu-J: Critique-Based Hallucination Judge by Binjie Wang, Steffi Chern, Ethan Chern, Pengfei Liu
Assessing the Effectiveness of GPT-4o in Climate Change Evidence Synthesis and Systematic Assessments: Preliminary Insights by…
Regurgitative Training: The Value of Real Data in Training Large Language Models by Jinghui Zhang, Dandan…
Aligning Model Evaluations with Human Preferences: Mitigating Token Count Bias in Language Model Assessments by Roland…
CiteME: Can Language Models Accurately Cite Scientific Claims? by Ori Press, Andreas Hochlehnert, Ameya Prabhu, Vishaal…