Summary of LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information, by Ke Wang et al.
LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information by Ke Wang, Hong Xuan. First submitted to…
GenPlan: Generative Sequence Models as Adaptive Planners by Akash Karthikeyan, Yash Vardhan Pant. First submitted to arxiv…
Seeing Syntax: Uncovering Syntactic Learning Limitations in Vision-Language Models by Sri Harsha Dumpala, David Arps, Sageev…
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models by Quang-Hung Le, Long Hoang Dang,…
Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? by Zihao Li, Lecheng Zheng,…
AmCLR: Unified Augmented Learning for Cross-Modal Representations by Ajay Jagannath, Aayush Upadhyay, Anant Mehta. First submitted to…
Anomaly detection using Diffusion-based methods by Aryan Bhosale, Samrat Mukherjee, Biplab Banerjee, Fabio Cuzzolin. First submitted to…
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models by Sayak Chakrabarty, Souradip Pal. First…
In-Application Defense Against Evasive Web Scans through Behavioral Analysis by Behzad Ousat, Mahshad Shariatnasab, Esteban Schafir,…
Can foundation models actively gather information in interactive environments to test hypotheses? by Nan Rosemary Ke,…