Summary of LLaMA Beyond English: An Empirical Study on Language Capability Transfer, by Jun Zhao et al.
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
by Jun Zhao, Zhihao Zhang, Luhui Gao, Qi Zhang, Tao Gui, Xuanjing Huang
First submitted to arXiv on: 2 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on the arXiv listing. |
Medium | GrooveSquid.com (original content) | This paper investigates how to effectively transfer a large language model’s generation and instruction-following capabilities from English to non-English languages, using LLaMA as the pretrained model. The researchers conducted an extensive empirical study, accumulating over 1440 GPU hours, and analyzed how factors such as vocabulary extension, further pretraining, and instruction tuning affect transfer performance. Four standardized benchmarks (C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench) were used to evaluate the model’s knowledge alignment and response quality. Results show that performance comparable to state-of-the-art transfer models can be achieved with less than 1% of the pretraining data. (A minimal sketch of this transfer recipe appears after the table.) |
Low | GrooveSquid.com (original content) | This paper helps us figure out how to teach language models like ChatGPT to understand and speak other languages, such as Spanish or Mandarin. The researchers took a big model called LLaMA and tested it on lots of tasks in different languages to see what makes it good at transferring its skills from English. They found that even with very little training data, the model can do a great job in another language! This matters because we want language models to be fair and not understand only one language. |
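
The medium-difficulty summary names three transfer levers: vocabulary extension, further pretraining, and instruction tuning. Below is a minimal sketch of what the vocabulary-extension step could look like, assuming the Hugging Face transformers API; the checkpoint name and added tokens are illustrative only and are not taken from the paper.

```python
# Minimal sketch of the transfer recipe summarized above: vocabulary extension
# followed by further pretraining and instruction tuning. Assumes the Hugging
# Face transformers API; model name and tokens are illustrative, not the
# authors' actual code or data.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "huggyllama/llama-7b"  # illustrative LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1) Vocabulary extension: add target-language subword tokens, then resize
#    the embedding matrix so the new tokens receive trainable vectors.
new_tokens = ["你好", "世界"]  # placeholder Chinese tokens for illustration
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# 2) Further pretraining on target-language text and 3) instruction tuning
#    would follow with a standard causal-LM training loop (e.g. the Trainer
#    API); those steps are omitted here.
```

The paper’s finding that strong transfer is possible with under 1% of the pretraining data suggests most of the effort in such a pipeline goes into steps 2 and 3 rather than into retraining the base model from scratch.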
Keywords
» Artificial intelligence » Alignment » Instruction tuning » Llama » Pretraining