
Summary of An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition, by Yi-Cheng Wang et al.


An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition

by Yi-Cheng Wang, Li-Ting Pai, Bi-Cheng Yan, Hsin-Wei Wang, Chi-Han Lin, Berlin Chen

First submitted to arXiv on: 10 Sep 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
The paper proposes an extension to contextual adapters (CAs) for end-to-end (E2E) automatic speech recognition (ASR) models, aiming to improve recognition of rare words. CAs infuse external knowledge into E2E ASR models through a context word list. However, two data imbalance problems remain: the adapter can overfit when low-frequency words are used as context, and the model still performs poorly on the low-frequency context words themselves. The authors investigate how altering the frequency distribution of the context list affects model performance and introduce a simple yet effective context-balanced learning objective. Experiments on the AISHELL-1 benchmark dataset show a reduction in character error rate (CER) of up to 1.21% and a more pronounced 9.44% reduction in the error rate of zero-shot words.
Low GrooveSquid.com (original content) Low Difficulty Summary
The paper is about how computers can better understand spoken language, especially when the speech contains rare words. Current methods are good at recognizing common words but struggle with uncommon ones. The authors aim to improve this by giving their method more information and making it learn in a way that is fair to all the different types of words.
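For readers unfamiliar with the metric quoted in the summaries, character error rate (CER) is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function names `edit_distance` and `cer` are illustrative assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four gives a CER of 0.25.
    print(cer("abcd", "abxd"))
```

Because the distance is computed over characters, the same function applies unchanged to Mandarin transcripts such as those in AISHELL-1, where each Chinese character counts as one unit.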

Keywords

» Artificial intelligence  » CER  » Overfitting  » Zero-shot