Summary of OLIVE: Object Level In-Context Visual Embeddings, by Timothy Ossowski et al.
OLIVE: Object Level In-Context Visual Embeddings by Timothy Ossowski, Junjie Hu. First submitted to arXiv on: 2…
Artemis: Towards Referential Understanding in Complex Videos by Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie,…
Don’t Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models by A. Bavaresco, A.…
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding by Haoyu Zhao, Wenhang…
Finite Groundings for ASP with Functions: A Journey through Consistency by Lukas Gerlach, David Carral, Markus…
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM by Abdur Rahman, Rajat…
Creativity and Markov Decision Processes by Joonas Lahikainen, Nadia M. Ady, Christian Guckelsberger. First submitted to arXiv…
WorldAfford: Affordance Grounding based on Natural Language Instructions by Changmao Chen, Yuren Cong, Zhen Kan. First submitted…
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery by…
Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training by Sheng Yan,…