Summary of DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models, by Maryam Akhavan Aghdam et al.
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models by Maryam Akhavan Aghdam, Hongpeng Jin, Yanzhao Wu. First…
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation by Yecheng Wu, Zhuoyang Zhang, Junyu…
Residual Stream Analysis with Multi-Layer SAEs by Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison. First submitted…
Preserving Empirical Probabilities in BERT for Small-sample Clinical Entity Recognition by Abdul Rehman, Jian Jun Zhang,…
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling by Kaiwen Zheng,…
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs by Ruoyu Wang, Xiaoxuan Li, Lina Yao. First…
OLMoE: Open Mixture-of-Experts Language Models by Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison,…
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning by Soumajyoti Sarkar, Leonard…
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference by Barys Liskavets, Maxim…
Self-Supervised Vision Transformers for Writer Retrieval by Tim Raven, Arthur Matei, Gernot A. Fink. First submitted to…