Summary of A Multilingual Training Strategy For Low Resource Text to Speech, by Asma Amalas et al.


A multilingual training strategy for low resource Text to Speech

by Asma Amalas, Mounir Ghogho, Mohamed Chetouani, Rachid Oulad Haj Thami

First submitted to arXiv on: 2 Sep 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Recent advancements in neural Text-to-Speech (TTS) have led to high-quality synthesized speech, but these models rely on extensive datasets that can be costly and difficult to scale to all existing languages, especially low-resource ones. To alleviate this burden, we investigate the feasibility of using social media data for constructing a small TTS dataset and exploring cross-lingual transfer learning (TL) for low-resource languages. We specifically assess the effectiveness of multilingual modeling as an alternative to training on monolingual corpora. Our findings show that multilingual pre-training outperforms monolingual pre-training in increasing the intelligibility and naturalness of generated speech.
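
The two-stage recipe in the summary above (pre-train on pooled multilingual data, then fine-tune on the small low-resource corpus) can be sketched in a few lines of PyTorch. The snippet below is a hypothetical illustration only, not the authors' model or data pipeline: `ToyTTS`, `make_fake_corpus`, and all hyperparameters are made-up placeholders standing in for a real neural TTS architecture and a real (text, mel-spectrogram) corpus.

```python
# Hypothetical sketch, not the authors' code: multilingual pre-training
# followed by low-resource fine-tuning for a toy TTS-style acoustic model.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

VOCAB, N_MELS = 256, 80  # byte-level "text" vocabulary, mel-spectrogram bins


class ToyTTS(nn.Module):
    """Stand-in acoustic model: maps character IDs to mel-spectrogram frames."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 128)
        self.encoder = nn.GRU(128, 128, batch_first=True)
        self.to_mel = nn.Linear(128, N_MELS)

    def forward(self, text_ids):
        hidden, _ = self.encoder(self.embed(text_ids))
        return self.to_mel(hidden)  # (batch, time, N_MELS)


def make_fake_corpus(n_utts):
    """Placeholder for a real (text, mel-spectrogram) corpus."""
    text = torch.randint(0, VOCAB, (n_utts, 50))
    mels = torch.randn(n_utts, 50, N_MELS)
    return TensorDataset(text, mels)


def train(model, dataset, epochs, lr):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        for text_ids, mels in loader:
            optim.zero_grad()
            loss_fn(model(text_ids), mels).backward()
            optim.step()
    return model


# Stage 1: pre-train a single model on data pooled from several
# higher-resource languages (the "multilingual" setting).
multilingual = ConcatDataset([make_fake_corpus(200) for _ in range(4)])
model = train(ToyTTS(), multilingual, epochs=2, lr=1e-3)

# Stage 2: fine-tune the same weights on the small low-resource corpus
# (in the paper, built from social media data), typically at a lower
# learning rate; this is the cross-lingual transfer step.
low_resource = make_fake_corpus(50)
model = train(model, low_resource, epochs=2, lr=1e-4)
```

The only point of the sketch is the ordering of the two training stages: fine-tuning starts from the multilingual weights rather than a random initialization, which is what the summary means by cross-lingual transfer learning.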

Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine a computer reading any text aloud in a natural, human-sounding voice. That is what Text-to-Speech (TTS) technology aims to do. However, building these systems requires a lot of recorded speech, which is expensive and hard to collect for languages that aren't widely spoken. In this paper, we explore ways to make TTS work better for these low-resource languages by building a small dataset from social media and by reusing knowledge learned from other languages. We found that training on multiple languages at once (multilingual modeling) makes the synthesized speech sound more natural and easier to understand than training on just one language.

Keywords

  • Artificial intelligence
  • Transfer learning