Summary of Towards Incremental Learning in Large Language Models: A Critical Review, by Mladjan Jovanovic and Peter Voss
Towards Incremental Learning in Large Language Models: A Critical Review, by Mladjan Jovanovic, Peter Voss. First submitted…
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts, by Yifeng Ding, …
Multi-Head Mixture-of-Experts, by Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei. First submitted to arXiv on: 23…
Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning, by Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan…
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models, by Bowen Pan, Yikang Shen, Haokun…
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts, by Alexandre Muzio, Alex Sun, Churan He. First submitted…
Half-Space Feature Learning in Neural Networks, by Mahesh Lorik Yadav, Harish Guruprasad Ramaswamy, Chandrashekar Lakshminarayanan. First submitted…
Jamba: A Hybrid Transformer-Mamba Language Model, by Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan…
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection, by Yuanhang Yang, Shiyi Qi, Wenchao Gu, …
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study, by Jinze Zhao, Peihao Wang, Zhangyang Wang. First…