Summary of "Yi: Open Foundation Models by 01.AI" (Alex Young et al.)
Yi: Open Foundation Models by 01.AI
by 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
First submitted to arXiv on: 7 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
High Difficulty Summary (written by the paper authors)
The paper's original abstract serves as the high-difficulty summary. Read it on arXiv (arXiv:2403.04652).
Medium Difficulty Summary (original content by GrooveSquid.com)
The Yi model family is a series of language and multimodal models with strong capabilities across multiple dimensions. Built on 6B and 34B pretrained language models, the family is extended to chat models, long-context models, depth-upscaled models, and vision-language models. The base models perform strongly on standard benchmarks such as MMLU, while the finetuned chat models achieve high human-preference rates on AlpacaEval and Chatbot Arena. The authors attribute this performance primarily to training-data quality: the 3.1 trillion tokens of English and Chinese pretraining corpora are constructed through a deduplication and filtering pipeline, and finetuning polishes a small instruction dataset over multiple iterations, with each instance verified directly by machine learning engineers.
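To make the data-pipeline idea concrete, here is a minimal Python sketch of exact-hash deduplication followed by a toy rule-based quality filter. The function names, thresholds, and sample documents are all invented for illustration; the paper's actual pipeline layers additional stages (such as learned and cluster-based filters) that are not shown here.

```python
import hashlib

def exact_dedup(docs):
    """Keep the first occurrence of each document, comparing
    whitespace-normalized MD5 digests so trivial reflows collapse."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.md5(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def passes_quality_filter(doc, min_words=20, max_symbol_ratio=0.3):
    """Toy heuristic filter: reject very short documents and documents
    dominated by non-alphanumeric characters. Thresholds are invented."""
    words = doc.split()
    if len(words) < min_words:
        return False
    symbols = sum(1 for ch in doc if not (ch.isalnum() or ch.isspace()))
    return symbols / len(doc) <= max_symbol_ratio

raw_corpus = [
    "Yi is trained on a large English and Chinese corpus. " * 10,
    "Yi is trained on a  large English and Chinese corpus. " * 10,  # whitespace variant
    "@@@ ###",  # junk: too short and symbol-heavy
]
clean_corpus = [d for d in exact_dedup(raw_corpus) if passes_quality_filter(d)]
print(f"{len(raw_corpus)} raw -> {len(clean_corpus)} clean documents")
```

At real pretraining scale, exact hashing would typically be replaced by approximate methods such as MinHash over document shingles, but the keep-the-first-unique-copy structure stays the same.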
Low Difficulty Summary (original content by GrooveSquid.com)
The Yi model family is a group of language and vision models that do many things well together. These models start with big language models, then add chat skills, long-text understanding, deeper versions, and models that can combine words and pictures. The base models are good at lots of tasks, and the chat models are liked by human raters on public comparison platforms. The Yi models work well because they were trained on high-quality data: a very large amount of English and Chinese text for the base models, plus a small set of carefully checked examples that teach them to chat.
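For readers who want to try the released chat models, below is a minimal usage sketch with the Hugging Face transformers library. The model ID 01-ai/Yi-6B-Chat and the generation settings are assumptions based on the public 01.AI releases, so verify the exact repository name on the Hub before running it.

```python
# Minimal chat example with a released Yi model.
# Assumed model ID: 01-ai/Yi-6B-Chat (check on the Hugging Face Hub).
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Format a single-turn conversation with the model's chat template.
messages = [{"role": "user", "content": "In one sentence, what is a foundation model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and print only the newly produced tokens.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```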
Keywords
- Artificial intelligence
- Language model
- Machine learning
- Pretraining