Multi modal – Page 4 – GrooveSquid.com

April 15, 2025

Summary of AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning, by Yiwu Zhong et al.

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning by Yiwu Zhong, Zhuoming Liu,…

April 15, 2025

Summary of SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model, by Chunlin Yu et al.

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model by Chunlin Yu, Hanqing Wang, Ye…

April 15, 2025

Summary of OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?, by Zijian Chen et al.

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones? by Zijian Chen, Tingzhu…

April 15, 2025

Summary of Human Action CLIPS: Detecting AI-generated Human Motion, by Matyas Bohacek et al.

Human Action CLIPS: Detecting AI-generated Human Motion by Matyas Bohacek, Hany Farid. First submitted to arXiv on:…

April 15, 2025

Summary of GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis, by Bo Liu et al.

GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis by Bo Liu,…

April 15, 2025

Summary of Creating Scalable AGI: the Open General Intelligence Framework, by Daniel A. Dollinger et al.

Creating Scalable AGI: the Open General Intelligence Framework by Daniel A. Dollinger, Michael Singleton. First submitted to…

April 15, 2025

Summary of Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark, by Rong-Cheng Tu et al.

Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark by Rong-Cheng Tu, Zi-Ao…

April 15, 2025

Summary of CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis, by Yifan Xie et al.

CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis by Yifan Xie,…

April 15, 2025

Summary of GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation, by Yushi Lan et al.

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation by Yushi Lan, Shangchen Zhou, Zhaoyang Lyu,…

April 15, 2025

Summary of Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection, by YeongHyeon Park et al.

Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection by YeongHyeon Park, Myung Jin…