Summary of Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval, by Ravisri Valluri et al.
Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval
by Ravisri Valluri, Akash Kumar Mohankumar, Kushal Dave, Amit Singh, Jian Jiao, Manik Varma, Gaurav Sinha
First submitted to arXiv on 10 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Information Retrieval (cs.IR); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract; read it on arXiv |
| Medium | GrooveSquid.com (original content) | Generative Retrieval reframes Information Retrieval as a constrained generation task solved with Autoregressive (AR) language models, but AR decoding incurs high inference latency and cost compared to traditional dense retrieval techniques. The authors investigate fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. Standard NAR models do ease the latency and cost concerns, but retrieval quality drops sharply because they cannot capture dependencies between target tokens. To address this, the authors propose PIXAR, which expands the target vocabulary of NAR models to include multi-word entities and common phrases, reducing the token dependencies the model must capture, and pairs this with inference optimization strategies that keep latency low despite the much larger vocabulary (a minimal code sketch follows the table). Compared to standard NAR models with similar latency and cost, PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions. |
| Low | GrooveSquid.com (original content) | This paper is about Generative Retrieval, which uses language models to find information. Autoregressive (AR) models do this well but are slow and expensive. The authors try Non-autoregressive (NAR) models instead: they are much faster, but not as good at finding what we’re looking for because they can’t understand relationships between words. To solve this, the authors create PIXAR, which gives NAR models a much bigger vocabulary that includes whole phrases, so there are fewer relationships to miss. This makes retrieval better while staying fast. |
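To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch of NAR decoding over a phrase-level vocabulary: every target position is scored in a single parallel pass (no left-to-right dependency), and per-position top-k pruning stands in for the paper’s inference optimizations on the enlarged output layer. The class name, dimensions, vocabulary size, and pruning scheme are illustrative assumptions, not PIXAR’s actual implementation.

```python
# Hedged sketch of non-autoregressive (NAR) generative retrieval with an
# expanded phrase vocabulary. All names and sizes below are assumptions
# for illustration; PIXAR's real vocabulary is far larger.
import torch
import torch.nn as nn

class PhraseNARHead(nn.Module):
    def __init__(self, hidden_dim=256, phrase_vocab_size=50_000, target_len=8):
        super().__init__()
        self.target_len = target_len
        # One shared output projection over a vocabulary that contains single
        # words plus multi-word entities and common phrases.
        self.proj = nn.Linear(hidden_dim, phrase_vocab_size)

    def forward(self, query_repr, top_k=100):
        # query_repr: (batch, target_len, hidden_dim) from some encoder.
        logits = self.proj(query_repr)             # (batch, target_len, vocab)
        # All positions are predicted independently in one parallel step,
        # instead of target_len sequential AR steps.
        scores, ids = logits.topk(top_k, dim=-1)   # prune candidates per position
        return scores, ids

head = PhraseNARHead()
q = torch.randn(2, 8, 256)   # two queries, 8 target positions each
scores, ids = head(q)
print(ids.shape)             # torch.Size([2, 8, 100])
```

Because a phrase like a multi-word entity occupies a single vocabulary entry, the model has fewer cross-position dependencies to get wrong, which is the intuition behind PIXAR’s vocabulary expansion.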
Keywords
» Artificial intelligence » Autoregressive » Inference » Optimization » Token