DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
by DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J.L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R.J. Chen, R.L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S.S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W.L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X.Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun
First submitted to arXiv on: 7 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | DeepSeek-V2 is a Mixture-of-Experts (MoE) language model with 236 billion total parameters, of which only 21 billion are activated per token, making it economical to train and efficient to serve. It introduces Multi-head Latent Attention (MLA), which compresses the key-value (KV) cache into a latent vector, and adopts the DeepSeekMoE architecture for sparse expert computation. Compared with its predecessor, DeepSeek 67B, it achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and substantially boosting maximum generation throughput. The model is pre-trained on a multi-source corpus of 8.1 trillion tokens, then aligned with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), yielding top-tier performance among open-source models (a generic MoE routing sketch follows this table). |
Low | GrooveSquid.com (original content) | DeepSeek-V2 is a new language model that can understand and generate human-like text. It’s special because, even though it is very large, only a small part of it does work for each word, so it is cheaper to train and faster to run than a model that uses everything at once. It is good at many tasks like chatbots and writing stories. The creators made DeepSeek-V2 by training it on lots of text from different sources, then fine-tuning it to be even better. |
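
To make the "only 21 billion of 236 billion parameters activated per token" idea concrete, here is a minimal, generic sketch of top-k Mixture-of-Experts routing in Python. The expert count, hidden size, and gating below are invented for illustration and are not DeepSeek-V2's actual configuration (which also uses shared experts and Multi-head Latent Attention); the sketch only shows why a sparsely routed model touches a small fraction of its parameters per token.

```python
# Illustrative sketch only: generic top-k Mixture-of-Experts routing.
# All sizes below are made up for demonstration, NOT DeepSeek-V2's real config.
import numpy as np

rng = np.random.default_rng(0)

d_model = 64      # hypothetical hidden size
n_experts = 16    # hypothetical number of routed experts
top_k = 2         # experts activated per token

# Each expert is a tiny two-layer feed-forward block.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.02,
     rng.standard_normal((4 * d_model, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                                # (tokens, n_experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]      # indices of chosen experts
    sel = np.take_along_axis(logits, top, axis=-1)
    gate = np.exp(sel - sel.max(-1, keepdims=True))    # softmax over chosen experts
    gate /= gate.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                        # per-token dispatch
        for slot in range(top_k):
            w1, w2 = experts[top[t, slot]]
            h = np.maximum(x[t] @ w1, 0.0)             # ReLU stand-in for the real activation
            out[t] += gate[t, slot] * (h @ w2)
    return out

tokens = rng.standard_normal((8, d_model))
print(moe_layer(tokens).shape)                         # (8, 64)
# Only top_k / n_experts of the expert parameters are used per token, which is
# the same reason a 236B-parameter model can activate only ~21B per token.
print(f"fraction of expert params touched per token: {top_k / n_experts:.2%}")
```

In the same spirit, MLA reduces inference memory by caching a compressed latent vector per token instead of full per-head keys and values, which is where the reported 93.3% key-value cache reduction comes from.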
Keywords
* Artificial intelligence * Attention * Deep learning * Fine tuning * Inference * Mixture of experts * Reinforcement learning * Supervised * Token