Summary of Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries, by Kiran Vodrahalli et al.
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries
by Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, Rohan Anil, Ethan Dyer, Siamak Shakeri, Roopali Vij, Harsh Mehta, Vinay Ramasesh, Quoc Le, Ed Chi, Yifeng Lu, Orhan Firat, Angeliki Lazaridou, Jean-Baptiste Lespiau, Nithya Attaluri, Kate Olszewska
First submitted to arXiv on: 19 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper introduces Michelangelo, an evaluation framework that measures how well large language models use information spread across long contexts, going beyond retrieving a single fact. Its underlying framework, Latent Structure Queries (LSQ), constructs tasks that require the model to “chisel away” irrelevant information and reveal a latent structure in the context; the model’s understanding is then verified by querying it for details of that structure (a toy sketch of this idea appears after the table). Using LSQ, the authors produce three diagnostic evaluations across code and natural-language domains that give a stronger signal of long-context capability and are easy to score automatically. Their results show there is significant room for improvement in synthesizing long-context information. |
Low | GrooveSquid.com (original content) | Michelangelo is an evaluation tool that helps figure out whether language models can understand and use information from very long texts. Most long-context tests only check whether a model can find a single fact buried in a long text, like a needle in a haystack; this framework instead checks whether the model can pull together the pieces of information that matter. The authors created three different ways to test models using code and natural language, and they showed that current models aren’t very good at this yet, so there’s room for improvement. |
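To make the LSQ idea concrete, here is a minimal, hypothetical sketch of how such a task could be constructed in the code domain. It is not the authors' implementation: the function name, the distractor scheme, and the parameters below are illustrative assumptions. The "latent structure" is the final contents of one list; most of the context is filler the model must chisel away, and the ground truth is tracked during generation so grading is automatic.

```python
import random

def build_latent_list_task(num_ops=500, relevant_fraction=0.05, seed=0):
    """Generate (context, query, answer) for a toy latent-structure task.

    The context is a long program that operates on several lists; only the
    operations on `result` matter. The latent structure is the final state
    of `result`, tracked as ground truth while the context is generated.
    """
    rng = random.Random(seed)
    lines = ["result = []", "scratch = []"]
    latent = []  # ground-truth final state of `result`

    for i in range(num_ops):
        if rng.random() < relevant_fraction:
            # Relevant operation: mutates the latent structure.
            if latent and rng.random() < 0.3:
                lines.append("result.pop()")
                latent.pop()
            else:
                lines.append(f"result.append({i})")
                latent.append(i)
        else:
            # Irrelevant filler the model must "chisel away".
            lines.append(f"scratch.append({i})")

    context = "\n".join(lines)
    query = "After running the program above, what is the value of `result`?"
    return context, query, latent

if __name__ == "__main__":
    context, query, answer = build_latent_list_task()
    print(context[:200], "...")   # long, mostly irrelevant context
    print(query)
    print("expected:", answer)    # score by comparing the model's answer to this
```

Because the expected answer is computed while the context is generated, grading reduces to a simple comparison, which fits the summary's point that the evaluation is easy to score automatically.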
Keywords
* Artificial intelligence
* Language model