
Summary of In-Context Learning State Vector with Inner and Momentum Optimization, by Dongfang Li et al.


In-Context Learning State Vector with Inner and Momentum Optimization

by Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

First submitted to arXiv on: 17 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
A novel analysis of Large Language Models’ In-Context Learning (ICL) is presented, focusing on the compressed vectors that transformers derive from in-context demonstrations. The study draws a parallel between these vectors and parameters trained with gradient descent, and introduces the concept of a state vector. Two optimization methods are proposed to progressively refine the state vector: inner optimization and momentum optimization, inspired by model soup and momentum-based gradient descent, respectively. Additionally, a divide-and-conquer aggregation method is introduced to handle settings with many in-context examples by aggregating state vectors computed on smaller groups of examples. The paper presents extensive experiments with Llama-2 and GPT-J, showing that the optimized state vector achieves state-of-the-art performance on diverse tasks.
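
The summary above describes momentum optimization and divide-and-conquer aggregation only at a high level. Purely as an illustration, the sketch below shows what a momentum-style refinement of per-layer state vectors and a divide-and-conquer average over groups of examples might look like; the array shapes, function names, and the random arrays standing in for model activations are assumptions made for this sketch, not the authors’ implementation.

```python
import numpy as np

# Hypothetical setup: one state vector per transformer layer, extracted
# from the activations produced while processing in-context examples.
num_layers, hidden_dim = 32, 4096
rng = np.random.default_rng(0)

def momentum_refine(state_vectors, beta=0.9):
    """Momentum-style smoothing of successive layer state vectors.

    Treats the layer-to-layer change as a gradient-like update and
    accumulates it with an exponential moving average, analogous to
    momentum in gradient descent (illustrative only).
    """
    refined = [state_vectors[0]]
    velocity = np.zeros_like(state_vectors[0])
    for prev, curr in zip(state_vectors[:-1], state_vectors[1:]):
        velocity = beta * velocity + (curr - prev)  # accumulate the "update"
        refined.append(prev + velocity)             # apply the smoothed update
    return np.stack(refined)

def divide_and_conquer_aggregate(example_groups, extract_fn):
    """Compute a state vector per group of examples, then average them.

    `extract_fn` is a stand-in for a forward pass that compresses one
    group of in-context examples into per-layer state vectors.
    """
    group_vectors = [extract_fn(group) for group in example_groups]
    return np.mean(group_vectors, axis=0)

# Toy usage with random arrays in place of real model activations.
layer_states = rng.standard_normal((num_layers, hidden_dim))
smoothed = momentum_refine(layer_states)
groups = [["example 1", "example 2"], ["example 3", "example 4"]]
aggregated = divide_and_conquer_aggregate(
    groups, extract_fn=lambda g: rng.standard_normal((num_layers, hidden_dim))
)
print(smoothed.shape, aggregated.shape)  # (32, 4096) (32, 4096)
```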

Low Difficulty Summary (original content by GrooveSquid.com)
Large Language Models can learn from just a few examples by doing something called In-Context Learning (ICL). Researchers have found that the things learned through ICL can be represented in special compressed vectors. But they don’t know how these vectors work or how to make them better. This paper helps fill this gap by studying these vectors and showing how they’re related to other ideas in machine learning. The researchers also come up with new ways to improve these vectors, like using a “momentum” method that helps the vector learn from experience. They test their ideas on two big language models (Llama-2 and GPT-J) and show that their approach works really well.

Keywords

» Artificial intelligence  » Gpt  » Gradient descent  » Llama  » Machine learning  » Optimization