Summary of SPRING Lab IITM’s Submission to Low Resource Indic Language Translation Shared Task, by Hamees Sayed et al.


SPRING Lab IITM’s submission to Low Resource Indic Language Translation Shared Task

by Hamees Sayed, Advait Joglekar, Srinivasan Umesh

First submitted to arXiv on: 1 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents a machine translation system for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese. The pipeline runs from data collection through training and evaluation, drawing on datasets from WMT, BPCC, PMIndia, and OpenLanguageData. To offset the scarcity of bilingual data, the authors apply back-translation to monolingual Mizo and Khasi text, which expands the training corpus. For Assamese, Mizo, and Manipuri, they fine-tune the pre-trained NLLB 3.3B model and report improved performance over the baseline. For Khasi, they introduce special tokens and train the model on their Khasi corpus. Training combines masked language modelling with fine-tuning for English-to-Indic and Indic-to-English translation.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This research paper aims to improve translation between English and four low-resource languages: Khasi, Mizo, Manipuri, and Assamese. The team gathers data from several sources and generates extra training examples from text that exists in only one language. They fine-tune an existing model for three of the languages and extend it with special tokens so that it can also handle Khasi. For the three fine-tuned languages, translation quality improves over the baseline.
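
The back-translation step mentioned in the medium difficulty summary can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `min_chars` filter, and the stub translator are hypothetical, and a real pipeline would call a trained target-to-source model (for example Khasi-to-English) in place of the stub. The idea is that each monolingual target-language sentence is translated back into the source language, and the resulting (synthetic source, real target) pair is added to the training data.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    monolingual_target: Iterable[str],
    target_to_source: Callable[[str], str],
    min_chars: int = 1,  # hypothetical filter, not taken from the paper
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-language text."""
    pairs: List[Tuple[str, str]] = []
    for sentence in monolingual_target:
        target_sentence = sentence.strip()
        if len(target_sentence) < min_chars:
            continue  # skip empty or too-short lines
        # The reverse model produces a machine-made source sentence.
        synthetic_source = target_to_source(target_sentence).strip()
        if not synthetic_source:
            continue  # drop sentences the reverse model failed on
        # The target side is always the genuine, human-written text.
        pairs.append((synthetic_source, target_sentence))
    return pairs


def stub_reverse_model(sentence: str) -> str:
    """Placeholder for a trained target-to-source translation model."""
    return f"EN({sentence})"


# Hypothetical data: one genuine pair plus two monolingual sentences.
real_pairs = [("source sentence 0", "target sentence 0")]
monolingual = ["  target sentence A ", "", "target sentence B"]

synthetic_pairs = back_translate(monolingual, stub_reverse_model)
training_corpus = real_pairs + synthetic_pairs
```

Keeping the human-written text on the target side means the model still learns to produce natural output, even though the synthetic source side may contain translation errors.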

Keywords

» Artificial intelligence  » Fine-tuning  » Translation