Summary of Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms, by Miaosen Zhang et al.
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms by Miaosen Zhang, Yixuan Wei,…
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite…
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations by Ruiyuan Lyu, Tai Wang,…
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding by Fei Wang, Xingyu Fu, James Y. Huang,…
Pandora: Towards General World Model with Natural Language Actions and Video States by Jiannan Xiang, Guangyi…
Advancing High Resolution Vision-Language Models in Biomedicine by Zekai Chen, Arda Pekis, Kevin Brown. First submitted to…
Updating CLIP to Prefer Descriptions Over Captions by Amir Zur, Elisa Kreiss, Karel D'Oosterlinck, Christopher Potts,…
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video by Hector A. Valdez, Kyle Min, Subarna Tripathi. First…
GPT-ology, Computational Models, Silicon Sampling: How should we think about LLMs in Cognitive Science? by Desmond…
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models by Jack Merullo, Carsten Eickhoff, Ellie Pavlick. First…