Summary of Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval, by Ravisri Valluri et al.
Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval
by Ravisri Valluri, Akash Kumar Mohankumar, Kushal Dave, Amit Singh, Jian Jiao, Manik Varma, Gaurav Sinha
First submitted to arXiv on 10 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Information Retrieval (cs.IR); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract; read it on arXiv |
| Medium | GrooveSquid.com (original content) | Generative Retrieval reframes Information Retrieval as a constrained generation task solved with Autoregressive (AR) language models, but AR decoding incurs high inference latency and cost compared to traditional dense retrieval techniques. The authors investigate fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. Standard NAR models do ease the latency and cost concerns, but retrieval quality drops sharply because they cannot capture dependencies between target tokens. To address this, the authors propose PIXAR, which expands the target vocabulary of NAR models to include multi-word entities and common phrases, reducing the token dependencies the model must capture, and pairs this with inference optimization strategies that keep latency low despite the much larger vocabulary (a minimal code sketch follows the table). Compared to standard NAR models with similar latency and cost, PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions. |
| Low | GrooveSquid.com (original content) | This paper is about Generative Retrieval, which uses language models to find information. Autoregressive (AR) models do this well but are slow and expensive. The authors try Non-autoregressive (NAR) models instead: they are much faster, but not as good at finding what we’re looking for because they can’t understand relationships between words. To solve this, the authors create PIXAR, which gives NAR models a much bigger vocabulary that includes whole phrases, so there are fewer relationships to miss. This makes retrieval better while staying fast. |
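To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch of NAR decoding over a phrase-level vocabulary: every target position is scored in a single parallel pass (no left-to-right dependency), and per-position top-k pruning stands in for the paper’s inference optimizations on the enlarged output layer. The class name, dimensions, vocabulary size, and pruning scheme are illustrative assumptions, not PIXAR’s actual implementation.

```python
# Hedged sketch of non-autoregressive (NAR) generative retrieval with an
# expanded phrase vocabulary. All names and sizes below are assumptions
# for illustration; PIXAR's real vocabulary is far larger.
import torch
import torch.nn as nn

class PhraseNARHead(nn.Module):
    def __init__(self, hidden_dim=256, phrase_vocab_size=50_000, target_len=8):
        super().__init__()
        self.target_len = target_len
        # One shared output projection over a vocabulary that contains single
        # words plus multi-word entities and common phrases.
        self.proj = nn.Linear(hidden_dim, phrase_vocab_size)

    def forward(self, query_repr, top_k=100):
        # query_repr: (batch, target_len, hidden_dim) from some encoder.
        logits = self.proj(query_repr)             # (batch, target_len, vocab)
        # All positions are predicted independently in one parallel step,
        # instead of target_len sequential AR steps.
        scores, ids = logits.topk(top_k, dim=-1)   # prune candidates per position
        return scores, ids

head = PhraseNARHead()
q = torch.randn(2, 8, 256)   # two queries, 8 target positions each
scores, ids = head(q)
print(ids.shape)             # torch.Size([2, 8, 100])
```

Because a phrase like a multi-word entity occupies a single vocabulary entry, the model has fewer cross-position dependencies to get wrong, which is the intuition behind PIXAR’s vocabulary expansion.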
Keywords
» Artificial intelligence » Autoregressive » Inference » Optimization » Token