Summary of Deferred NAM: Low-Latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR, by Zelin Wu et al.


Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

by Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

First submitted to arXiv on: 15 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper presents an approach to improving speech recognition with contextual biasing, which lets the recognizer transcribe important phrases correctly even when they are rare in the training data. Attention-based biasing allows the recognizer and the biasing system to be co-trained end to end, with no separate inference-time components. The method has three parts: a context encoder, a context filter, and cross-attention that injects the selected context into recognition. The context encoder dominates the latency, so the paper defers it: the lightweight phrase-selection pass is moved before context encoding, so that only the surviving top-K phrases are encoded. This yields a speedup of up to 16.1x and lets biasing scale to 20K phrases with a maximum pre-decoding delay under 33 ms. The technique also achieves up to a 37.5% relative WER reduction over a baseline trained without the proposed losses and the lightweight phrase-selection pass.
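To make the deferral concrete, here is a minimal sketch of the reordering described above, written in PyTorch. It is not the authors' implementation: the module names (DeferredContextBiasing, phrase_scorer), the pooling choices, and the tensor shapes are all assumptions made for illustration. What it demonstrates is the key ordering: the cheap scoring pass runs over every candidate phrase, while the expensive context encoder runs only on the top-K survivors, just before cross-attention.

```python
# Minimal sketch of deferred context encoding: filter first, encode later.
# Not the paper's code; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class DeferredContextBiasing(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, top_k: int = 64):
        super().__init__()
        self.top_k = top_k
        # Lightweight scorer: cheap enough to run over ALL candidate phrases.
        self.phrase_scorer = nn.Linear(dim, 1)
        # Heavyweight context encoder: deferred until after top-K selection.
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention that injects encoded phrases into the audio stream.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio: torch.Tensor, phrase_embs: torch.Tensor) -> torch.Tensor:
        """audio: (batch, frames, dim); phrase_embs: (num_phrases, phrase_len, dim)."""
        # 1) Lightweight filter pass over all N candidates (the cheap step).
        scores = self.phrase_scorer(phrase_embs.mean(dim=1)).squeeze(-1)
        k = min(self.top_k, scores.numel())
        top_idx = scores.topk(k).indices
        # 2) Deferred encoding: the expensive encoder sees only K << N phrases.
        encoded = self.context_encoder(phrase_embs[top_idx])  # (K, len, dim)
        # 3) Pool each phrase and cross-attend from audio frames to phrases.
        keys = encoded.mean(dim=1).unsqueeze(0).expand(audio.size(0), -1, -1)
        biased, _ = self.cross_attn(audio, keys, keys)
        return audio + biased
```

Under this ordering, the expensive encoding cost scales with K rather than with the full candidate-list size N, which is what allows the candidate set to grow to 20K phrases while the pre-decoding delay stays within the reported 33 ms budget.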
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper helps machines better understand what people are saying by using context clues, such as names and other words that matter in a conversation. The approach, called attention-based biasing, teaches the machine to focus on the most important parts of speech. It works in three steps: encoding the context, narrowing it down to the most relevant phrases, and then applying it to improve recognition. By reordering these steps so that the narrowing happens first, the paper shows how the approach can be made both faster and more accurate.

Keywords

  • Artificial intelligence
  • Attention
  • Cross attention
  • Encoder
  • Inference