Summary of Focused Discriminative Training for Streaming CTC-Trained Automatic Speech Recognition Models, by Adnan Haider et al.
Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models
by Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang
First submitted to arXiv on: 23 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary A novel training framework called Focused Discriminative Training (FDT) is proposed to improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models. The approach identifies challenging segments in the audio and improves the model’s recognition in those regions, eliminating the complex decision-making typically required in standard discriminative training methods. Compared to additional fine-tuning with MMI or MWER loss on the encoder, FDT achieves greater reductions in Word Error Rate (WER) on streaming models trained on LibriSpeech. The method also improves a converged word-piece streaming E2E model trained on 600k hours of assistant and dictation data. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper introduces a new way to train speech recognition models so they get better at recognizing words in tricky parts of the audio. It’s called Focused Discriminative Training (FDT), and it helps the model learn from its mistakes. The method is special because it doesn’t require complex decisions about how to organize words, which makes it easier to use. The results show that this way of training recognizes words better than competing methods do. |
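The core idea described above — concentrating training effort on the hardest stretches of audio — can be illustrated with a minimal sketch. This is not the paper’s actual FDT objective; it is a hypothetical weighting scheme assuming we already have a per-frame loss (e.g. from a CTC criterion) and simply up-weight the hardest fraction of frames.

```python
import numpy as np

def focused_loss(frame_losses, focus_frac=0.3, focus_weight=2.0):
    """Hypothetical sketch of 'focusing' training on hard regions.

    frame_losses: per-frame losses from a base criterion (e.g. CTC).
    The hardest `focus_frac` of frames receive extra weight, mimicking
    the idea of concentrating discriminative training on challenging
    segments. Illustrative only -- not the paper's FDT objective.
    """
    frame_losses = np.asarray(frame_losses, dtype=float)
    k = max(1, int(round(focus_frac * len(frame_losses))))
    # Indices of the k hardest (highest-loss) frames.
    hard = np.argsort(frame_losses)[-k:]
    weights = np.ones_like(frame_losses)
    weights[hard] = focus_weight
    # Weighted mean: hard frames contribute more to the objective.
    return float((weights * frame_losses).sum() / weights.sum())
```

With uniform per-frame losses the weighting changes nothing, while a single hard frame pulls the objective upward more than a plain mean would — the gradient signal is correspondingly concentrated on that frame.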
Keywords
» Artificial intelligence » Encoder » Fine tuning