Summary of The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts, by Lingfeng Shen et al.
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
by Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng, Philipp Koehn, Daniel Khashabi
First submitted to arXiv on: 23 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract, available on its arXiv page. |
| Medium | GrooveSquid.com (original content) | This paper investigates the safety of large language models (LLMs) in multilingual settings. The authors observe that state-of-the-art LLMs generate unsafe responses more frequently when malicious prompts are written in lower-resource languages, and that the models also tend to produce irrelevant responses to such prompts. To understand the cause of this discrepancy, the researchers study the effect of instruction tuning with reinforcement learning from human feedback (RLHF) or supervised finetuning (SFT) on the HH-RLHF dataset. Surprisingly, training on high-resource languages improves model alignment, while training on lower-resource languages yields minimal improvement. The findings highlight the challenges of cross-lingual LLM safety and should inform future research in this direction. (A minimal SFT sketch follows the table.) |
| Low | GrooveSquid.com (original content) | Large language models are very smart computer programs that can understand and generate human-like text. But they can also be misused to cause harm. This paper looks at how these models behave when given prompts in different languages. The researchers found that the models are more likely to give unsafe responses when prompts are written in languages that are less common or spoken by fewer people. They also found that fine-tuning the models on data from high-resource languages helps improve their alignment, but doing the same with low-resource languages makes little difference. |
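As a point of reference for the medium summary above, the sketch below shows what a plain supervised finetuning (SFT) run on HH-RLHF can look like using the Hugging Face `datasets`/`transformers` stack. It is a minimal, hypothetical illustration: the model name, data slice, and hyperparameters are placeholders, and the paper's actual multilingual setup (translated prompts, larger models, RLHF variants) is not reproduced here.

```python
# Minimal sketch of supervised finetuning (SFT) on HH-RLHF.
# Assumes the Hugging Face datasets/transformers stack; model and
# hyperparameters are illustrative, not the paper's actual setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; the paper tunes much larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# HH-RLHF pairs a "chosen" (preferred) and "rejected" response per dialogue;
# plain SFT trains on the chosen side only.
dataset = load_dataset("Anthropic/hh-rlhf", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["chosen"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-hh-rlhf",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=tokenized,
    # Causal LM collator: copies input_ids to labels, no masked-LM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The paper's cross-lingual experiments would, roughly speaking, swap in data in high- versus low-resource languages and compare how well the tuned model resists malicious prompts in each case.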
Keywords
» Artificial intelligence » Alignment » Fine tuning » Instruction tuning » Reinforcement learning from human feedback » Rlhf » Supervised