Summary of AI Sandbagging: Language Models Can Strategically Underperform on Evaluations, by Teun van der Weij et al.
AI Sandbagging: Language Models can Strategically Underperform on Evaluations by Teun van der Weij, Felix Hofstätter,…
An Evaluation Benchmark for Autoformalization in Lean4 by Aryan Gulati, Devanshu Ladsaria, Shubhra Mishra, Jasdeep Sidhu,…
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step by Owen Dugan, Donato Manuel…
Exploring Multilingual Large Language Models for Enhanced TNM classification of Radiology Report in lung cancer…
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning by Joongwon Kim, Bhargavi Paranjape, Tushar Khot,…
Data-Efficient Learning with Neural Programs by Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur,…
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models by Shreyas Basavatia, Keerthiram…
Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research by Harish Haresamudram, Hrudhai…
Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation by Nachiket Kotalwar, Alkis Gotovos, Adish Singla. First submitted…
Large Generative Graph Models by Yu Wang, Ryan A. Rossi, Namyong Park, Huiyuan Chen, Nesreen K.…