
Summary of Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR, by Abhishek Gupta et al.


Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR

by Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, Preethi Jyothi

First submitted to arXiv on 17 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The abstract presents research on improving automatic speech recognition (ASR) for low-resource languages by combining parameter-efficient fine-tuning and text-only adaptation of the SeamlessM4T multilingual multimodal model. The authors demonstrate how this approach can leverage unlabeled text to boost ASR performance, achieving a relative 17% WER reduction in zero-shot settings without labeled speech.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This research aims to help machines better understand spoken languages that have limited training data. To do this, scientists combine two techniques: fine-tuning and adaptation. Fine-tuning helps the machine learn from its mistakes, while adaptation lets it use text alone to get better at recognizing speech. The team uses a model called SeamlessM4T to test these approaches. They found that by combining them, they can make machines more accurate at recognizing spoken language, even when the machines haven't been trained on that language before.
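To make "parameter-efficient fine-tuning" of SeamlessM4T more concrete, here is a minimal sketch of one common approach (LoRA adapters) using Hugging Face's transformers and peft libraries. This is an illustrative assumption, not the paper's exact recipe: the checkpoint name, adapter rank, and target module names are placeholders, and the paper's text-only adaptation step is not shown.

```python
# Hypothetical sketch: attach LoRA adapters to a SeamlessM4T speech-to-text model
# so that only a small number of new parameters is trained while the base
# multilingual weights stay frozen. Configuration values are illustrative.
from transformers import AutoProcessor, SeamlessM4Tv2ForSpeechToText
from peft import LoraConfig, get_peft_model

checkpoint = "facebook/seamless-m4t-v2-large"  # assumed checkpoint
model = SeamlessM4Tv2ForSpeechToText.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

lora_config = LoraConfig(
    r=8,                                  # assumed low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # reports trainable vs. total parameters
```

The wrapped model can then be fine-tuned on whatever labeled or adapted data is available; because only the adapter weights are updated, the memory and compute cost is a small fraction of full fine-tuning.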

Keywords

» Artificial intelligence  » Fine tuning  » Parameter efficient  » Zero shot