
Summary of DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, by DeepSeek-AI: Xiao Bi et al.


DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

by DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y.K. Li, Wenfeng Liang, Fangyun Lin, A.X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R.X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou

First submitted to arXiv on: 5 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature reach varying conclusions, which casts a dark cloud over scaling LLMs. This paper aims to facilitate the scaling of large-scale models by presenting distinctive findings on scaling laws and by introducing DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. The authors build a pre-training dataset of 2 trillion tokens and apply supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to the base models, producing the DeepSeek Chat models (a rough sketch of the DPO objective follows the summaries below). Evaluation results show that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the code, mathematics, and reasoning domains. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat outperforms GPT-3.5.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Large language models (LLMs) have grown rapidly, but some research suggests that scaling them can be tricky. This paper tries to help by studying how big models grow and introduces a new project called DeepSeek LLM. They make a big dataset with lots of text and test different ways to make the model better. The results show that their biggest model (67B) is better than some other popular models on tasks like coding, math, and reasoning. It even beats one of the best AI models, GPT-3.5!

Keywords

  • Artificial intelligence
  • Fine-tuning
  • GPT
  • LLaMA
  • Optimization
  • Scaling laws
  • Supervised