Summary of CORI: CJKV Benchmark with Romanization Integration – A Step Towards Cross-lingual Transfer Beyond Textual Scripts, by Hoang H. Nguyen et al.
CORI: CJKV Benchmark with Romanization Integration – A step towards Cross-lingual Transfer Beyond Textual Scripts
by Hoang H. Nguyen, Chenwei Zhang, Ye Liu, Natalie Parde, Eugene Rohrbaugh, Philip S. Yu
First submitted to arXiv on: 19 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper investigates the impact of source-language choice on cross-lingual transfer across various languages. Assuming English as the source language may not be suitable for all target languages, particularly those more closely related to other languages within their own linguistic family. The study shows that selecting a source language with high contact with the target language can improve cross-lingual transfer. The paper also proposes a novel benchmark dataset for the CJKV languages (Chinese, Japanese, Korean, and Vietnamese) to facilitate research on language contact. To better capture this contact, the authors introduce a Contrastive Learning objective that integrates Romanized transcription beyond textual scripts, leading to enhanced cross-lingual representations and zero-shot cross-lingual transfer capabilities. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper explores how choosing the right “teacher” language affects how well a machine can learn languages. Right now, many systems are trained on English, but this might not be the best choice for all languages. The researchers show that selecting a teacher language closely related to the target language helps machines transfer what they’ve learned more effectively. They also create a new dataset for four East Asian languages (Chinese, Japanese, Korean, and Vietnamese) to study how these languages interact with each other. By combining the Roman-letter spellings of words with machine learning, the authors develop better tools for learning languages without needing direct training data. |
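To make the contrastive idea in the summaries concrete, here is a minimal sketch of an InfoNCE-style contrastive objective that pulls each sentence embedding toward the embedding of its romanized transcription, using in-batch negatives. The function name, dimensions, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce(script_emb: np.ndarray, roman_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Contrastive loss: match each original-script embedding to its
    romanized counterpart; other batch items serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    s = script_emb / np.linalg.norm(script_emb, axis=1, keepdims=True)
    r = roman_emb / np.linalg.norm(roman_emb, axis=1, keepdims=True)
    logits = s @ r.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: sentence i pairs with romanization i
    return float(-log_probs.diagonal().mean())

# Toy check: correctly paired embeddings should score a lower loss
# than mismatched (shuffled) pairs.
rng = np.random.default_rng(0)
batch, dim = 4, 8
script = rng.normal(size=(batch, dim))
roman = script + 0.01 * rng.normal(size=(batch, dim))  # aligned pairs
print(info_nce(script, roman) < info_nce(script, roman[::-1].copy()))
```

Minimizing such a loss encourages the encoder to place a CJKV sentence and its romanization near each other, which is the intuition behind using romanized transcriptions as a bridge between scripts.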
Keywords
* Artificial intelligence * Machine learning * Zero-shot