Summary of CORI: CJKV Benchmark with Romanization Integration – A Step Towards Cross-lingual Transfer Beyond Textual Scripts, by Hoang H. Nguyen et al.
CORI: CJKV Benchmark with Romanization Integration – A step towards Cross-lingual Transfer Beyond Textual Scripts
by Hoang H. Nguyen, Chenwei Zhang, Ye Liu, Natalie Parde, Eugene Rohrbaugh, Philip S. Yu
First submitted to arXiv on: 19 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper investigates the impact of source-language choice on cross-lingual transfer across various languages. Assuming English as the source language may not be suitable for all target languages, particularly those more closely related to other languages within their own linguistic family. The study shows that selecting a source language with high contact with the target language can improve cross-lingual transfer. The paper also proposes a novel benchmark dataset for the CJKV languages (Chinese, Japanese, Korean, and Vietnamese) to facilitate research on language contact. To better capture this contact, the authors introduce a Contrastive Learning objective that integrates Romanized transcription beyond textual scripts, leading to enhanced cross-lingual representations and zero-shot cross-lingual transfer capabilities. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper explores how choosing the right “teacher” language affects how well a machine can learn languages. Right now, many systems are trained on English, but this might not be the best choice for all languages. The researchers show that selecting a teacher language closely related to the target language helps machines transfer what they’ve learned more effectively. They also create a new dataset for four East Asian languages (Chinese, Japanese, Korean, and Vietnamese) to study how these languages interact with each other. By combining the Roman-letter spellings of words with machine learning, the authors develop better tools for learning languages without needing direct training data. |
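To make the contrastive idea in the summaries concrete, here is a minimal sketch of an InfoNCE-style contrastive objective that pulls each sentence embedding toward the embedding of its romanized transcription, using in-batch negatives. The function name, dimensions, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce(script_emb: np.ndarray, roman_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Contrastive loss: match each original-script embedding to its
    romanized counterpart; other batch items serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    s = script_emb / np.linalg.norm(script_emb, axis=1, keepdims=True)
    r = roman_emb / np.linalg.norm(roman_emb, axis=1, keepdims=True)
    logits = s @ r.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: sentence i pairs with romanization i
    return float(-log_probs.diagonal().mean())

# Toy check: correctly paired embeddings should score a lower loss
# than mismatched (shuffled) pairs.
rng = np.random.default_rng(0)
batch, dim = 4, 8
script = rng.normal(size=(batch, dim))
roman = script + 0.01 * rng.normal(size=(batch, dim))  # aligned pairs
print(info_nce(script, roman) < info_nce(script, roman[::-1].copy()))
```

Minimizing such a loss encourages the encoder to place a CJKV sentence and its romanization near each other, which is the intuition behind using romanized transcriptions as a bridge between scripts.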
Keywords
* Artificial intelligence * Machine learning * Zero-shot