


IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS

by Ashwin Sankar, Srija Anand, Praveen Srinivasa Varadhan, Sherry Thomas, Mehak Singal, Shridhar Kumar, Deovrat Mehendale, Aditi Krishana, Giri Raju, Mitesh Khapra

First submitted to arXiv on: 9 Sep 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper presents a novel approach to enhancing text-to-speech (TTS) synthesis for Indian languages by leveraging large-scale automatic speech recognition (ASR) datasets. The authors develop IndicVoices-R (IV-R), the largest multilingual Indian TTS dataset, containing 1,704 hours of high-quality speech from 10,496 speakers across 22 Indian languages. IV-R matches the quality of gold-standard TTS datasets like LJSpeech, LibriTTS, and IndicTTS. The authors also introduce the IV-R Benchmark, assessing zero-shot, few-shot, and many-shot speaker generalization capabilities of TTS models on Indian voices. They demonstrate that fine-tuning an English pre-trained model on a combined dataset of high-quality IndicTTS and IV-R data results in better zero-shot speaker generalization compared to fine-tuning on the IndicTTS dataset alone. The authors release all data and code, opening up new possibilities for TTS models for all 22 official Indian languages.
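As a rough illustration of how the benchmark's "zero-shot speaker generalization" might be measured, a common proxy is speaker similarity: the cosine similarity between speaker embeddings of a reference recording from a speaker the model has never seen and of the speech synthesized in that speaker's voice. The sketch below is an assumption-laden toy, not the paper's released code: `embed()`, the 192-dimensional embedding size, and the random placeholder audio all stand in for a real pretrained speaker encoder and real recordings.

```python
# Illustrative sketch only: quantify zero-shot speaker similarity as the
# cosine similarity between speaker embeddings of a reference recording
# (unseen speaker) and of the TTS output cloning that voice.
# embed(), the 192-dim size, and the placeholder audio are assumptions.
import numpy as np

rng = np.random.default_rng(seed=0)


def embed(waveform: np.ndarray) -> np.ndarray:
    """Stand-in speaker encoder; a real setup would use a pretrained model
    (e.g. an ECAPA-TDNN-style encoder) mapping audio to a fixed vector."""
    return rng.normal(size=192)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder audio for one unseen (zero-shot) speaker; a real evaluation
# would load the reference recording and the synthesized utterance instead.
reference_audio = rng.normal(size=16000)    # ~1 s at 16 kHz
synthesized_audio = rng.normal(size=16000)

score = cosine_similarity(embed(reference_audio), embed(synthesized_audio))
print(f"speaker similarity: {score:.3f}")   # closer to 1.0 = better voice match
```

In a full evaluation one would average such a score over many unseen speakers and compare a model fine-tuned on IndicTTS alone against one fine-tuned on IndicTTS plus IV-R, mirroring the comparison the authors report.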
Low Difficulty Summary (original content by GrooveSquid.com)
The paper builds a large database of Indian voices that can be used to make more realistic computer speech in Indian languages. This is important because there isn't much good data available for Indian languages right now. The authors use a combination of existing recordings and new processing techniques to build the database, which they call IndicVoices-R. They also create a test, called the IV-R Benchmark, to check how well speech models trained on the database handle voices they have never heard before. The results show that using this new database helps TTS models sound more natural when speaking Indian languages and imitate new speakers more accurately.

Keywords

» Artificial intelligence  » Few shot  » Fine tuning  » Generalization  » Zero shot