Summary of Deferred NAM: Low-Latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR, by Zelin Wu et al.


Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

by Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

First submitted to arXiv on: 15 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper presents an approach to improving speech recognition with contextual biasing, which lets the recognizer transcribe important phrases correctly even when they are rare in the training data. Attention-based biasing allows the recognizer and the biasing system to be co-trained end to end, with no separate inference-time components. The method has three parts: a context encoder, a context filter, and cross-attention that injects the selected context into recognition. The context encoder dominates the latency, so the paper defers it: the lightweight phrase-selection pass is moved before context encoding, so that only the surviving top-K phrases are encoded. This yields a speedup of up to 16.1x and lets biasing scale to 20K phrases with a maximum pre-decoding delay under 33 ms. The technique also achieves up to a 37.5% relative WER reduction over a baseline trained without the proposed losses and the lightweight phrase-selection pass.
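To make the deferral concrete, here is a minimal sketch of the reordering described above, written in PyTorch. It is not the authors' implementation: the module names (DeferredContextBiasing, phrase_scorer), the pooling choices, and the tensor shapes are all assumptions made for illustration. What it demonstrates is the key ordering: the cheap scoring pass runs over every candidate phrase, while the expensive context encoder runs only on the top-K survivors, just before cross-attention.

```python
# Minimal sketch of deferred context encoding: filter first, encode later.
# Not the paper's code; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class DeferredContextBiasing(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, top_k: int = 64):
        super().__init__()
        self.top_k = top_k
        # Lightweight scorer: cheap enough to run over ALL candidate phrases.
        self.phrase_scorer = nn.Linear(dim, 1)
        # Heavyweight context encoder: deferred until after top-K selection.
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention that injects encoded phrases into the audio stream.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio: torch.Tensor, phrase_embs: torch.Tensor) -> torch.Tensor:
        """audio: (batch, frames, dim); phrase_embs: (num_phrases, phrase_len, dim)."""
        # 1) Lightweight filter pass over all N candidates (the cheap step).
        scores = self.phrase_scorer(phrase_embs.mean(dim=1)).squeeze(-1)
        k = min(self.top_k, scores.numel())
        top_idx = scores.topk(k).indices
        # 2) Deferred encoding: the expensive encoder sees only K << N phrases.
        encoded = self.context_encoder(phrase_embs[top_idx])  # (K, len, dim)
        # 3) Pool each phrase and cross-attend from audio frames to phrases.
        keys = encoded.mean(dim=1).unsqueeze(0).expand(audio.size(0), -1, -1)
        biased, _ = self.cross_attn(audio, keys, keys)
        return audio + biased
```

Under this ordering, the expensive encoding cost scales with K rather than with the full candidate-list size N, which is what allows the candidate set to grow to 20K phrases while the pre-decoding delay stays within the reported 33 ms budget.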
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper helps machines better understand what people are saying by using context clues, such as names and other words that matter in a conversation. The approach, called attention-based biasing, teaches the machine to focus on the most important parts of speech. It works in three steps: encoding the context, narrowing it down to the most relevant phrases, and then applying it to improve recognition. By reordering these steps so that the narrowing happens first, the paper shows how the approach can be made both faster and more accurate.

Keywords

  • Artificial intelligence
  • Attention
  • Cross attention
  • Encoder
  • Inference