
Summary of Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities, by Kazuki Fujii et al.


Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

by Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

First submitted to arXiv on: 27 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This study explores the benefits of cross-lingual continual pre-training of large language models (LLMs) that were initially trained on English corpora. By extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a large Japanese web corpus, the researchers constructed Swallow, an LLM with enhanced Japanese capability. The experimental results demonstrate that performance on Japanese tasks improved significantly through continual pre-training, increasing monotonically up to 100B training tokens. Compared with other LLMs trained from scratch on English and Japanese, Swallow achieved superior performance, and the analysis reveals that continual pre-training is particularly effective for Japanese question answering tasks. The study also investigates how vocabulary expansion and the incorporation of parallel corpora affect cross-lingual continual pre-training from English to Japanese. Vocabulary expansion had no negative impact on performance except for summarization, and incorporating parallel corpora enhanced the model's translation ability.
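To make the idea concrete, here is a minimal Python sketch, using Hugging Face Transformers, of the two steps the summary describes: expanding the tokenizer vocabulary with Japanese tokens and then continuing causal-language-model training on a Japanese text corpus. This is not the authors' actual training pipeline; the model name, token list, file path, and hyperparameters are illustrative placeholders, and the paper builds a proper Japanese tokenizer rather than simply registering extra tokens as done here.

    # Minimal sketch (not the authors' pipeline): vocabulary expansion followed by
    # continual pre-training of Llama 2 on a Japanese corpus.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    base_model = "meta-llama/Llama-2-7b-hf"  # placeholder base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

    # Vocabulary expansion: register extra Japanese tokens so Japanese text is no
    # longer shattered into many byte-level pieces, then grow the embedding matrix.
    japanese_tokens = ["日本語", "東京", "研究"]  # placeholder; a real run adds thousands
    tokenizer.add_tokens(japanese_tokens)
    model.resize_token_embeddings(len(tokenizer))

    # Continual pre-training: ordinary causal-LM training on a Japanese web corpus.
    dataset = load_dataset("text", data_files={"train": "japanese_corpus.txt"})["train"]

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=2048)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="swallow-sketch",
            per_device_train_batch_size=1,
            num_train_epochs=1,
            learning_rate=1e-5,
        ),
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()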
Low Difficulty Summary (original content by GrooveSquid.com)
Cross-lingual continual pre-training of large language models allows us to use vast English resources while reducing costs. This study creates Swallow, an LLM with Japanese capabilities, by adding Japanese characters to Llama 2 and training on a huge Japanese web corpus. The results show that this approach improves Japanese task performance significantly. Swallow even outperforms other LLMs trained from scratch in both languages.
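The summaries also mention incorporating parallel corpora to improve translation. One plausible way to fold such data into the pre-training mixture, offered here only as an assumption rather than the paper's exact recipe, is to serialize each English-Japanese sentence pair into a single plain-text record that can join the rest of the corpus:

    # Hypothetical sketch: turn JSONL sentence pairs {"en": ..., "ja": ...} into
    # plain-text records for the pre-training data mix. File names are placeholders.
    import json

    def pairs_to_records(parallel_path: str, out_path: str) -> None:
        """Write each English-Japanese pair as one blank-line-separated record."""
        with open(parallel_path, encoding="utf-8") as src, \
             open(out_path, "w", encoding="utf-8") as dst:
            for line in src:
                pair = json.loads(line)
                # The model simply sees both languages together during next-token prediction.
                dst.write(f'{pair["en"]}\n{pair["ja"]}\n\n')

    # pairs_to_records("parallel.jsonl", "parallel_as_text.txt")  # placeholder paths

Serialized this way, the parallel data needs no special training objective: aligned English and Japanese text appears side by side under the ordinary causal-language-modeling loss.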

Keywords

» Artificial intelligence  » Llama  » Question answering  » Summarization  » Translation