Summary of How Lightweight Can a Vision Transformer Be, by Jen Hong Tan
How Lightweight Can A Vision Transformer Be, by Jen Hong Tan. First submitted to arXiv on: 25…
Norface: Improving Facial Expression Analysis by Identity Normalization, by Hanwei Liu, Rudong An, Zhimeng Zhang, Bowen…
Qwen2 Technical Report, by An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou,…
Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation, by Nadezhda Chirkova, Vassilina Nikoulina,…
SimSMoE: Solving Representational Collapse via Similarity Measure, by Giang Do, Hung Le, Truyen Tran. First submitted to…
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models, by Zihao Zeng, Yibo Miao, Hongcheng…
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models, by Tianwen Wei, Bo Zhu,…
Yuan 2.0-M32: Mixture of Experts with Attention Router, by Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun…
LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design, by Rui Kong, Qiyang Li,…
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models, by Jingwei Xu, Junyu Lai, Yunpeng Huang. First submitted…