Summary of TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation, by Dingbang Li et al.
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation, by Dingbang Li, Wenzhou Chen,…
Image-Text Out-Of-Context Detection Using Synthetic Multimodal Misinformation, by Fatma Shalabi, Huy H. Nguyen, Hichem Felouat, Ching-Chun…
NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation, by PengFei Zheng,…
Cross-modality debiasing: using language to mitigate sub-population shifts in imaging, by Yijiang Pang, Bao Hoang, Jiayu…
Optimal Design and Implementation of an Open-source Emulation Platform for User-Centric Shared E-mobility Services, by Maqsood…
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs, by…
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production, by Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan,…
Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI, by Vladimir Zaigrajew, Hubert Baniecki, Lukasz…
Leveraging LLMs for On-the-Fly Instruction Guided Image Editing, by Rodrigo Santos, João Silva, António Branco. First submitted…
LG-Traj: LLM Guided Pedestrian Trajectory Prediction, by Pranav Singh Chib, Pravendra Singh. First submitted to arXiv on:…