Summary of Voice-Enabled AI Agents Can Perform Common Scams, by Richard Fang et al.
Voice-Enabled AI Agents can Perform Common Scams by Richard Fang, Dylan Bowman, Daniel Kang. First submitted to…