CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation
by Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic
First submitted to arXiv on: 13 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The study tackles low-resource language translation by developing a machine translation model from Cantonese to English. It creates a new parallel corpus by combining, preprocessing, and cleaning online corpora, and builds a monolingual Cantonese dataset via web scraping for synthetic parallel corpus generation. Fine-tuning, back-translation, and model switch are employed, with automatic evaluation metrics (SacreBLEU, hLEPOR, COMET, and BERTScore) assessing translation quality. The best-performing model, NLLB-mBART with model switch, reaches a SacreBLEU score of 16.8 on the test set, comparable to state-of-the-art commercial translators (Bing and Baidu). |
| Low | GrooveSquid.com (original content) | This paper helps us better understand how machine translation works from Cantonese to English. The researchers created new data sources and used different techniques to train models. They tested these models using automatic quality scores and found that one model, called NLLB-mBART, performed almost as well as top commercial translators. The study’s results can help people communicate more effectively across languages. |
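
The medium-difficulty summary describes two concrete steps: generating synthetic parallel data from monolingual Cantonese text, and scoring translations with SacreBLEU. The sketch below illustrates the general idea only; it assumes the publicly available `facebook/nllb-200-distilled-600M` checkpoint and the `transformers` and `sacrebleu` Python packages, and the example sentences and reference translations are hypothetical stand-ins, not the paper's actual models or data.

```python
# A minimal synthetic-data and evaluation sketch. Assumptions (not from the
# paper): the facebook/nllb-200-distilled-600M checkpoint stands in for the
# paper's models, and the sentences below are illustrative, not the real corpus.
# pip install transformers sacrebleu sentencepiece torch
import sacrebleu
from transformers import pipeline

# Off-the-shelf Cantonese -> English model (NLLB uses FLORES-200 codes:
# yue_Hant for Cantonese, eng_Latn for English).
yue_to_en = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="yue_Hant",
    tgt_lang="eng_Latn",
)

# Scraped monolingual Cantonese sentences (hypothetical examples).
monolingual_yue = ["今日天氣好好。", "最近嘅火車站喺邊度？"]

# Translate each Cantonese sentence to form a synthetic parallel pair that
# can augment the training corpus during fine-tuning.
synthetic_pairs = [
    (src, yue_to_en(src)[0]["translation_text"]) for src in monolingual_yue
]
for yue, en in synthetic_pairs:
    print(f"{yue}\t{en}")

# Corpus-level SacreBLEU: hypotheses against one reference stream
# (references here are purely for illustration).
hypotheses = [en for _, en in synthetic_pairs]
references = [["The weather is very nice today.",
               "Where is the nearest train station?"]]
print("SacreBLEU:", sacrebleu.corpus_bleu(hypotheses, references).score)
```

The model-switch mechanism is not sketched here, since the summary does not detail its mechanics beyond naming the best-performing NLLB-mBART configuration.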
Keywords
- Artificial intelligence
- Fine-tuning
- Translation