
Summary of An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement, by Tzu-Ting Yang et al.


An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

by Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, Berlin Chen

First submitted to arXiv on: 27 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty | Written by | Summary
High | Paper authors | High Difficulty Summary: Read the original abstract here
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: This paper tackles code-switching, where speakers alternate between languages within an utterance, in automatic speech recognition (ASR). End-to-end neural networks have made significant progress in ASR, but code-switched speech remains a major challenge. The authors focus on improving the acoustic encoder to handle it. Their contributions are threefold: introducing a disentanglement loss that lets the lower layers of the encoder capture inter-lingual acoustic information while reducing linguistic confusion in the higher layers, demonstrating that their method outperforms prior approaches built on dual encoders, and showing that the disentanglement loss and the mixture-of-experts (MoE) architecture complement each other.
Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper is about making computers better at understanding speech. Right now, computers have trouble when people switch between languages within a single conversation. The researchers tried to solve this problem by improving how the computer “listens” to the speech. They made three contributions: a special way of training the computer to keep the two languages apart, evidence that their method works better than earlier ways of doing this task, and a finding that their idea fits well together with another technique called mixture-of-experts.
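
The summaries above name two ingredients, a mixture-of-experts encoder and a disentanglement loss, without showing what they might look like in practice. The sketch below is a rough illustration only, not the authors' implementation. It assumes two hypothetical language experts whose per-frame output vectors are blended with softmax gate weights, and it uses an absolute cosine-similarity penalty as a stand-in for the disentanglement loss, whose actual form the summaries do not give. All function names are invented for this example.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # epsilon avoids division by zero


def mix_experts(expert_outputs, gate_scores):
    """Blend one frame's expert outputs using softmax gate weights."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


def disentanglement_penalty(frames_a, frames_b):
    """Assumed penalty: mean absolute cosine similarity between the two
    experts' outputs, frame by frame. It is near 0 when the experts
    produce orthogonal features and near 1 when they overlap fully."""
    sims = [abs(cosine_similarity(a, b)) for a, b in zip(frames_a, frames_b)]
    return sum(sims) / len(sims)


# One frame, two experts with orthogonal outputs and equal gate scores.
expert_a = [1.0, 0.0]
expert_b = [0.0, 1.0]
print(mix_experts([expert_a, expert_b], [0.0, 0.0]))   # [0.5, 0.5]
print(disentanglement_penalty([expert_a], [expert_b]))  # 0.0
```

In a training setup, a penalty like this would be added to the usual recognition loss with a weighting factor, so the experts are pushed toward distinct features while the gate learns how to combine them. The weighting and the layers it applies to are design choices this sketch does not cover.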

Keywords

» Artificial intelligence  » Encoder  » Mixture of experts