
Summary of Distilling System 2 Into System 1, by Ping Yu et al.


Distilling System 2 into System 1

by Ping Yu, Jing Xu, Jason Weston, Ilia Kulikov

First submitted to arxiv on: 8 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This research paper investigates methods for “compiling” (distilling) the higher-quality outputs of large language models’ (LLMs’) System 2 techniques back into the models’ direct System 1 generations, so that no intermediate reasoning token sequences are needed at inference time. System 2 techniques spend extra compute during inference to produce better final responses, and have been explored in prior work such as Chain-of-Thought (Wei et al., 2022), Rephrase and Respond (Deng et al., 2023a), System 2 Attention (Weston and Sukhbaatar, 2023), and Branch-Solve-Merge (Saha et al., 2023). The authors show that several self-supervised methods can distill these higher-quality outputs back into the model, improving results over the original System 1 performance at lower inference cost than System 2 (a minimal code sketch of this distillation setup follows the summaries below). This has implications for future continually learning AI systems, which could then focus their System 2 reasoning on the tasks they cannot yet do well.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about finding ways to make big language models answer questions well without doing extra thinking steps every time. These models can be prompted to think through a problem before giving an answer, which makes the answer better but takes more work. The researchers looked at how to teach the model to give those better answers directly, so it doesn’t have to write out the extra thinking each time. They found several ways to do this that often make the answers even better, with less effort when answering. This could be important for future AI systems that keep learning and improving over time.
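
The distillation recipe described above is easier to see in code. Below is a minimal, illustrative Python sketch of one self-supervised setup in the spirit of the paper: sample several chain-of-thought (System 2) completions for each unlabeled input, keep the majority-vote final answer as a self-consistency filter, and collect (input, answer) pairs for fine-tuning the model to respond directly. The function names (llm_generate, build_distillation_pair), prompt template, and agreement threshold are assumptions made for illustration, not the paper’s actual code.

```python
from collections import Counter

# Hypothetical stand-in for an LLM call: returns a text completion for a
# prompt. Replace with your own inference client (API or local model);
# this stub exists only so the sketch is self-contained.
def llm_generate(prompt: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("plug in your LLM inference call here")

# Assumed System 2 prompt template (chain-of-thought style).
COT_TEMPLATE = (
    "{question}\n"
    "Let's think step by step, then state the final answer after 'Answer:'."
)

def extract_answer(completion: str) -> str:
    # Keep only the final answer, discarding the intermediate reasoning tokens.
    return completion.rsplit("Answer:", 1)[-1].strip()

def build_distillation_pair(question: str, num_samples: int = 8):
    """Sample several System 2 completions and keep the majority-vote answer
    (self-consistency) as an unsupervised quality filter. Returns a
    (question, answer) pair with no reasoning trace, or None if the samples
    disagree too much to trust."""
    answers = [
        extract_answer(llm_generate(COT_TEMPLATE.format(question=question)))
        for _ in range(num_samples)
    ]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes < num_samples // 2:  # too little agreement: drop the example
        return None
    return {"prompt": question, "completion": answer}

# The resulting pairs are then used to fine-tune the same model so it emits
# the distilled answer directly (System 1), with no chain of thought at
# inference time, e.g.:
# dataset = [p for q in unlabeled_questions if (p := build_distillation_pair(q))]
```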

Keywords

» Artificial intelligence  » Attention  » Inference  » Self supervised  » Token