Summary of OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments, by Tianbao Xie et al.
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments by Tianbao Xie, Danyang Zhang,…
Rho-1: Not All Tokens Are What You Need by Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao…
Self-supervised Dataset Distillation: A Good Compression Is All You Need by Muxin Zhou, Zeyuan Yin, Shitong…
MetaCheckGPT – A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models by Rahul Mehta, Andrew Hoblitzell,…
XNLIeu: a dataset for cross-lingual NLI in Basque by Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier…
Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models by Linan Yue, Qi Liu,…
Improving Language Model Reasoning with Self-motivated Learning by Yunlong Feng, Yang Xu, Libo Qin, Yasheng Wang,…
Measuring proximity to standard planes during fetal brain ultrasound scanning by Chiara Di Vece, Antonio Cirigliano,…
Dynamic Generation of Personalities with Large Language Models by Jianzhi Liu, Hexiang Gu, Tianyu Zheng, Liuyu…
Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks by Kavita Kumari, Murtuza Jadliwala, Sumit Kumar…