Summary of MarDini: Masked Autoregressive Diffusion for Video Generation at Scale, by Haozhe Liu et al.
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale by Haozhe Liu, Shikun Liu, Zijian Zhou,…
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning by…
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations by Aryo Pradipta Gema, Chen Jin, Ahmed…
On Explaining with Attention Matrices by Omar Naim, Nicholas Asher. First submitted to arXiv on: 24 Oct…
FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation by Christopher T.H Teo, Milad Abdollahzadeh, Xinda Ma,…
Integrating Canonical Neural Units and Multi-Scale Training for Handwritten Text Recognition by Zi-Rui Wang. First submitted to…
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models by Ziyu Liu, Yuhang Zang, Xiaoyi…
Emotion Recognition with Facial Attention and Objective Activation Functions by Andrzej Miskow, Abdulrahman Altahhan. First submitted to…
Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment by Seongjun Jeong, Gi-Cheon Kang, Joochan Kim,…
Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models by Zhijie Tan, Xu Chu, Weiping…