
Summary of MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, by Shengding Hu et al.


MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

by Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

First submitted to arXiv on: 9 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The researchers introduce MiniCPM, a new family of Small Language Models (SLMs) intended as an efficient alternative to Large Language Models. They develop two variants with 1.2B and 2.4B non-embedding parameters that perform comparably to much larger models. The team also studies the scalability of SLMs along both the model and data dimensions, introducing the Warmup-Stable-Decay learning rate scheduler (WSD LRS), which enables efficient continuous training and domain adaptation (a schedule of this shape is sketched after the summaries below). Using WSD LRS, they analyze the training dynamics and derive an optimal data-model ratio. Additionally, they introduce three MiniCPM variants with different architectures that achieve strong performance across a range of tasks.
Low Difficulty Summary (written by GrooveSquid.com, original content)
MiniCPM is a new type of language model that is smaller and more efficient than others. Scientists created two versions of MiniCPM with 1.2 billion and 2.4 billion parameters. These small models are just as good at doing certain tasks as larger models. The researchers also found ways to make the models work better by changing how they learn new information. They shared their findings and made the models available online for others to use.
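To make the Warmup-Stable-Decay idea concrete: the learning rate ramps up briefly, stays constant for most of training, and only decays in a short final stage. Below is a minimal Python sketch of such a schedule; the function name, the stage fractions, and the exponential decay shape are illustrative assumptions, not the paper's exact hyperparameters.

```python
def wsd_lr(step, total_steps, peak_lr=1e-2, min_lr=1e-3,
           warmup_frac=0.01, decay_frac=0.1):
    """Warmup-Stable-Decay (WSD) schedule sketch: linear warmup to
    peak_lr, a long constant (stable) stage, then a short decay to
    min_lr. Fractions and decay shape are illustrative assumptions."""
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup stage: linear ramp from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    if step < stable_end:
        # Stable stage: hold the peak learning rate constant,
        # which makes intermediate checkpoints easy to resume from.
        return peak_lr
    # Decay stage: exponential interpolation from peak_lr to min_lr.
    progress = (step - stable_end) / max(1, decay_steps)
    return peak_lr * (min_lr / peak_lr) ** progress


if __name__ == "__main__":
    # Print the learning rate at a few points across a 10,000-step run.
    total = 10_000
    for s in (0, 50, 5_000, 9_500, 9_999):
        print(s, round(wsd_lr(s, total), 6))
```

Because the stable stage dominates, training can be continued from any stable-stage checkpoint (for example, for domain adaptation) and a fresh short decay applied afterward, which is the practical appeal described in the summary.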

Keywords

» Artificial intelligence  » Domain adaptation  » Embedding  » Language model