Summary of Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning, by Bingchen Zhao et al.