Summary of In-Context Learning State Vector with Inner and Momentum Optimization, by Dongfang Li et al.
In-Context Learning State Vector with Inner and Momentum Optimization
by Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang
First submitted to arXiv on: 17 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | A novel analysis of in-context learning (ICL) in Large Language Models is presented, focusing on compressed vectors derived from the transformer. The study draws parallels between these vectors and parameters trained with gradient descent, introducing the concept of a state vector. Two optimization methods are proposed to progressively refine the state vector: inner optimization, inspired by model soup, and momentum optimization, inspired by momentum-based gradient descent. Additionally, a divide-and-conquer aggregation method is introduced to handle settings with many examples. Extensive experiments with Llama-2 and GPT-J show state-of-the-art performance on diverse tasks. |
| Low | GrooveSquid.com (original content) | Large Language Models can learn from just a few examples through a process called in-context learning (ICL). Researchers have found that what is learned through ICL can be represented as special compressed vectors, but it is not well understood how these vectors work or how to make them better. This paper helps fill that gap by studying these vectors and showing how they relate to other ideas in machine learning. The researchers also come up with new ways to improve these vectors, such as a "momentum" method that helps the vector learn from experience. They test their ideas on two big language models (Llama-2 and GPT-J) and show that they work well. |
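To give a feel for the two ideas named in the medium summary, here is a minimal NumPy sketch of generic momentum-based gradient descent on a vector and of divide-and-conquer aggregation by pairwise averaging. This is an illustration of the standard techniques the summary refers to, not the paper's actual implementation: the function names are hypothetical, and `grad_fn` stands in for the update direction that, in the paper, would come from the model itself.

```python
import numpy as np

def momentum_refine(state_vector, grad_fn, lr=0.1, beta=0.9, steps=10):
    """Refine a vector with momentum-based gradient descent (illustrative).

    grad_fn(v) returns an update direction for v; in the paper this
    direction would be derived from the model, which is not reproduced here.
    """
    velocity = np.zeros_like(state_vector)
    v = state_vector.copy()
    for _ in range(steps):
        g = grad_fn(v)
        # Exponential moving average of past gradients ("momentum")
        velocity = beta * velocity + (1 - beta) * g
        v = v - lr * velocity
    return v

def divide_and_conquer_aggregate(vectors):
    """Aggregate many vectors by recursively averaging adjacent pairs."""
    vs = list(vectors)
    while len(vs) > 1:
        paired = [(vs[i] + vs[i + 1]) / 2 for i in range(0, len(vs) - 1, 2)]
        if len(vs) % 2 == 1:  # carry an unpaired leftover to the next round
            paired.append(vs[-1])
        vs = paired
    return vs[0]
```

For example, refining a vector against a simple quadratic objective (gradient `2 * v`) steadily shrinks it toward zero, and aggregating four vectors averages them pairwise and then averages the two results.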
Keywords
» Artificial intelligence » GPT » Gradient descent » Llama » Machine learning » Optimization