Summary of VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?, by Junpeng Liu et al.
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? by Junpeng Liu,…
FABLES: Evaluating faithfulness and content selection in book-length summarization, by Yekyung Kim, Yapei Chang, Marzena Karpinska,…
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations, by Deqing Fu, Ruohao Guo, Ghazal Khalighinejad, Ollie…
Can Large Language Models do Analytical Reasoning? by Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang,…
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models, by Andrew Zhu, Alyssa Hwang,…
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs, by Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen…
In-Context Principle Learning from Mistakes, by Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra,…
Can LLMs perform structured graph reasoning? by Palaash Agrawal, Shavak Vasania, Cheston Tan. First submitted to arXiv…
Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding, by Jie…
Can AI Help with Your Personal Finances? by Oudom Hean, Utsha Saha, Binita Saha. First submitted to…