Summary of INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering Capability of LLMs for Indic Languages, by Abhishek Kumar Singh et al.
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering Capability of LLMs for Indic Languages
by Abhishek Kumar Singh, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen, Ashish Mittal, Ganesh Ramakrishnan
First submitted to arXiv on: 18 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract on the paper’s arXiv page. |
| Medium | GrooveSquid.com (original content) | The paper introduces the Indic QA Benchmark, a large dataset for context-grounded question answering in 11 major Indian languages, covering both extractive and abstractive tasks. Multilingual Large Language Models (LLMs), including instruction-finetuned variants, are evaluated on it and show weak performance in low-resource languages, attributed to an English bias in their training data. The paper also explores the Translate Test paradigm, in which inputs are translated into English for processing and the output is translated back into the source language (see the sketch after this table); this approach outperforms direct multilingual inference, especially in low-resource settings. |
| Low | GrooveSquid.com (original content) | The Indic QA Benchmark is a new dataset that can help us understand how well Large Language Models (LLMs) answer questions in languages other than English. Right now, we don’t know much about how these models perform in languages like Hindi or Bengali because there are few tests to measure them. The researchers created a big test with lots of questions and answers in 11 Indian languages. They also evaluated instruction-finetuned versions of these models. Unfortunately, the models didn’t do very well on low-resource languages like Nepali or Punjabi because they are biased towards English. The researchers found that a different approach worked better: translating the question into English, answering it there, and translating the answer back. This could help us build language models that answer questions well in many different languages. |
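To make the Translate Test idea concrete, here is a minimal sketch of the pipeline. All names below (`translate`, `answer_in_english`, `translate_test_qa`) are hypothetical stand-ins for illustration; the paper does not specify an implementation, a particular machine-translation system, or an LLM.

```python
# A minimal sketch of the Translate Test pipeline, assuming hypothetical
# `translate` and `answer_in_english` helpers. The paper does not prescribe
# these names or any particular MT system / LLM.

def translate(text: str, source: str, target: str) -> str:
    """Stand-in for any machine-translation system."""
    return f"[{source}->{target}] {text}"  # dummy behaviour for illustration

def answer_in_english(question: str, context: str) -> str:
    """Stand-in for an English-centric LLM doing context-grounded QA."""
    return "answer derived from context"  # dummy behaviour for illustration

def translate_test_qa(question: str, context: str, lang: str) -> str:
    """Answer a question asked in `lang` by routing through English."""
    q_en = translate(question, source=lang, target="en")  # inputs -> English
    c_en = translate(context, source=lang, target="en")
    a_en = answer_in_english(q_en, c_en)                  # answer in English
    return translate(a_en, source="en", target=lang)      # answer -> source language

if __name__ == "__main__":
    print(translate_test_qa("प्रश्न?", "संदर्भ पाठ", lang="hi"))
```

The design point is that the LLM only ever sees English text, so the pipeline sidesteps the English bias the summaries describe, at the cost of depending on translation quality in both directions.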
Keywords
- Artificial intelligence
- Question answering