Summary of Mixture of Diverse Size Experts, by Manxi Sun et al.
Mixture of Diverse Size Experts, by Manxi Sun, Wei Liu, Jian Luan, Pengzhi Gao, Bin Wang. First…
Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model, by Derek Jollie, Jingmin…
BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation, by S. Rohollah Hosseyni, Ali Ahmad Rahmani, S. Jamal…
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios, by Luning Wang, Shiyao Li, Xuefei…
On the Diagram of Thought, by Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao. First submitted to arXiv…
Latent Diffusion Models for Controllable RNA Sequence Generation, by Kaixuan Huang, Yukang Yang, Kaidi Fu, Yanyi…
Token Turing Machines are Efficient Vision Models, by Purvish Jajal, Nick John Eliopoulos, Benjamin Shiue-Hal Chou,…
Representation Tuning, by Christopher M. Ackerman. First submitted to arXiv on: 11 Sep 2024. Categories: Main: Machine Learning (cs.LG). Secondary:…
Understanding Knowledge Drift in LLMs through Misinformation, by Alina Fastowski, Gjergji Kasneci. First submitted to arXiv on:…
Alleviating Hallucinations in Large Language Models with Scepticism Modeling, by Yetao Wu, Yihong Wang, Teng Chen,…