Summary of Focused Discriminative Training for Streaming CTC-Trained Automatic Speech Recognition Models, by Adnan Haider et al.
Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models
by Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang
First submitted to arXiv on: 23 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary A novel training framework called Focused Discriminative Training (FDT) is proposed to improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models. The approach identifies challenging segments in the audio and improves the model’s recognition in those regions, eliminating the complex decision-making typically required in standard discriminative training methods. Compared to additional fine-tuning with MMI or MWER loss on the encoder, FDT achieves greater reductions in Word Error Rate (WER) on streaming models trained on LibriSpeech. The method also improves a converged word-piece streaming E2E model trained on 600k hours of assistant and dictation data. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper introduces a new way to train speech recognition models so they get better at recognizing words in tricky parts of the audio. It’s called Focused Discriminative Training (FDT), and it helps the model learn from its mistakes. The method is special because it doesn’t require complex decisions about how to organize words, which makes it easier to use. The results show that this way of training recognizes words better than competing methods do. |
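The core idea described above — concentrating training effort on the hardest stretches of audio — can be illustrated with a minimal sketch. This is not the paper’s actual FDT objective; it is a hypothetical weighting scheme assuming we already have a per-frame loss (e.g. from a CTC criterion) and simply up-weight the hardest fraction of frames.

```python
import numpy as np

def focused_loss(frame_losses, focus_frac=0.3, focus_weight=2.0):
    """Hypothetical sketch of 'focusing' training on hard regions.

    frame_losses: per-frame losses from a base criterion (e.g. CTC).
    The hardest `focus_frac` of frames receive extra weight, mimicking
    the idea of concentrating discriminative training on challenging
    segments. Illustrative only -- not the paper's FDT objective.
    """
    frame_losses = np.asarray(frame_losses, dtype=float)
    k = max(1, int(round(focus_frac * len(frame_losses))))
    # Indices of the k hardest (highest-loss) frames.
    hard = np.argsort(frame_losses)[-k:]
    weights = np.ones_like(frame_losses)
    weights[hard] = focus_weight
    # Weighted mean: hard frames contribute more to the objective.
    return float((weights * frame_losses).sum() / weights.sum())
```

With uniform per-frame losses the weighting changes nothing, while a single hard frame pulls the objective upward more than a plain mean would — the gradient signal is correspondingly concentrated on that frame.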
Keywords
» Artificial intelligence » Encoder » Fine tuning