Summary of SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces, by Yuta Oshima et al.
Multi-modal Auto-regressive Modeling via Visual Words by Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping…
Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models by Patrick Knab, Sascha Marton,…
Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation by Kira Wursthorn, Markus Hillemann, Markus…
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models by Yan Liu, Renren Jin,…
Beyond Memorization: The Challenge of Random Memory Access in Language Models by Tongyao Zhu, Qian Liu,…
Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern…
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM by Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu…
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric by Haokun Lin, Haoli…
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection by Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao…