LLaMA Beyond English: An Empirical Study on Language Capability Transfer

by Jun Zhao, Zhihao Zhang, Luhui Gao, Qi Zhang, Tao Gui, Xuanjing Huang

First submitted to arXiv on 2 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty: the medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper investigates how to effectively transfer a large language model’s generation and instruction-following capabilities from English to non-English languages, focusing on LLaMA as the pretrained model. The researchers conducted an extensive empirical study, spending more than 1,440 GPU hours analyzing how factors such as vocabulary extension (see the code sketch after these summaries), further pretraining, and instruction tuning affect transfer performance. The study employed four standardized benchmarks (C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench) to evaluate the model’s knowledge alignment and response quality. Results show that performance comparable to state-of-the-art transfer models can be achieved with less than 1% of the pretraining data.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper helps us figure out how to teach language models like ChatGPT to understand and speak languages other than English, such as Spanish or Mandarin. The researchers took a big model called LLaMA and tested it on lots of tasks in different languages to see what makes it good at transferring its skills from English. They found that even with very little training data, the model can do a great job in another language! This matters because we want language models to be fair, not limited to understanding a single language.

Keywords

» Artificial intelligence  » Alignment  » Instruction tuning  » Llama  » Pretraining