
Summary of Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities, by Kazuki Fujii et al.


Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

by Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

First submitted to arXiv on: 27 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This study explores the benefits of cross-lingual continual pre-training of large language models (LLMs) that were initially trained on English corpora. By extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a large Japanese web corpus, the researchers constructed Swallow, an LLM with enhanced Japanese capability. The experimental results demonstrate that performance on Japanese tasks improved significantly through continual pre-training, increasing monotonically up to 100B training tokens. Compared with other LLMs trained from scratch on English and Japanese, Swallow achieved superior performance, and the analysis reveals that continual pre-training is particularly effective for Japanese question answering tasks. The study also investigates how vocabulary expansion and the incorporation of parallel corpora affect cross-lingual continual pre-training from English to Japanese. Vocabulary expansion had no negative impact on performance except for summarization, and incorporating parallel corpora enhanced the model's translation ability.
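To make the idea concrete, here is a minimal Python sketch, using Hugging Face Transformers, of the two steps the summary describes: expanding the tokenizer vocabulary with Japanese tokens and then continuing causal-language-model training on a Japanese text corpus. This is not the authors' actual training pipeline; the model name, token list, file path, and hyperparameters are illustrative placeholders, and the paper builds a proper Japanese tokenizer rather than simply registering extra tokens as done here.

    # Minimal sketch (not the authors' pipeline): vocabulary expansion followed by
    # continual pre-training of Llama 2 on a Japanese corpus.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    base_model = "meta-llama/Llama-2-7b-hf"  # placeholder base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

    # Vocabulary expansion: register extra Japanese tokens so Japanese text is no
    # longer shattered into many byte-level pieces, then grow the embedding matrix.
    japanese_tokens = ["日本語", "東京", "研究"]  # placeholder; a real run adds thousands
    tokenizer.add_tokens(japanese_tokens)
    model.resize_token_embeddings(len(tokenizer))

    # Continual pre-training: ordinary causal-LM training on a Japanese web corpus.
    dataset = load_dataset("text", data_files={"train": "japanese_corpus.txt"})["train"]

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=2048)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="swallow-sketch",
            per_device_train_batch_size=1,
            num_train_epochs=1,
            learning_rate=1e-5,
        ),
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()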
Low Difficulty Summary (original content by GrooveSquid.com)
Cross-lingual continual pre-training of large language models allows us to use vast English resources while reducing costs. This study creates Swallow, an LLM with Japanese capabilities, by adding Japanese characters to Llama 2 and training on a huge Japanese web corpus. The results show that this approach improves Japanese task performance significantly. Swallow even outperforms other LLMs trained from scratch in both languages.
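The summaries also mention incorporating parallel corpora to improve translation. One plausible way to fold such data into the pre-training mixture, offered here only as an assumption rather than the paper's exact recipe, is to serialize each English-Japanese sentence pair into a single plain-text record that can join the rest of the corpus:

    # Hypothetical sketch: turn JSONL sentence pairs {"en": ..., "ja": ...} into
    # plain-text records for the pre-training data mix. File names are placeholders.
    import json

    def pairs_to_records(parallel_path: str, out_path: str) -> None:
        """Write each English-Japanese pair as one blank-line-separated record."""
        with open(parallel_path, encoding="utf-8") as src, \
             open(out_path, "w", encoding="utf-8") as dst:
            for line in src:
                pair = json.loads(line)
                # The model simply sees both languages together during next-token prediction.
                dst.write(f'{pair["en"]}\n{pair["ja"]}\n\n')

    # pairs_to_records("parallel.jsonl", "parallel_as_text.txt")  # placeholder paths

Serialized this way, the parallel data needs no special training objective: aligned English and Japanese text appears side by side under the ordinary causal-language-modeling loss.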

Keywords

» Artificial intelligence  » Llama  » Question answering  » Summarization  » Translation