
Inference and Verbalization Functions During In-Context Learning

by Junyi Tao, Xiaoyin Chen, Nelson F. Liu

First submitted to arXiv on: 12 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper but is written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com original content)
Large language models (LMs) can learn to solve new tasks from just a few demonstrations at inference time. Prior work has shown that, in some settings, LMs are minimally affected by semantically irrelevant labels in those demonstrations. This paper hypothesizes that LMs perform in-context learning with irrelevant labels through two sequential processes: an inference function that solves the task, followed by a verbalization function that maps the inferred answer to the label space. The key finding is that the inference function is invariant to remappings of the label space, which lets LMs share the same inference function across settings with different label words. Empirical validation comes from controlled layer-wise interchange intervention experiments (see the sketch after the summaries below) on multiple datasets and tasks (natural language inference, sentiment analysis, and topic classification) using several open-source models, including GEMMA-7B, MISTRAL-7B-V0.3, GEMMA-2-27B, and LLAMA-3.1-70B.
Low Difficulty Summary (GrooveSquid.com original content)
Imagine a computer program that can learn from just a few examples to do new tasks. This paper explores how this program works when it’s given wrong or irrelevant information during learning. They think the program has two parts: one that solves the task and another that connects the solution to the correct answer. The exciting part is that the first part, which does the actual solving, doesn’t care what words are used to describe the answers. This means the program can use the same solving part for different tasks with different labels. They tested this idea on various datasets and found it worked across multiple tasks.
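The layer-wise interchange interventions mentioned in the medium-difficulty summary work by swapping hidden activations between two forward passes and checking whether the prediction follows the swapped representation rather than the label wording. The sketch below is an illustration only, not the authors' exact protocol: the model name, prompts, layer index, and token position are placeholder assumptions, and it uses plain PyTorch forward hooks on a Hugging Face decoder-only model.

```python
# Illustrative activation-patching (interchange intervention) sketch.
# Assumptions: a decoder-only HF model exposing model.model.layers
# (Llama/Gemma-style), placeholder prompts, and an arbitrary layer/position.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-7b"  # placeholder; any Llama/Gemma-style checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

source_prompt = "<few-shot prompt with the original label words>"
base_prompt = "<same few-shot prompt with remapped, irrelevant label words>"

layer_idx = 20   # layer whose residual-stream output is swapped (assumption)
token_pos = -1   # intervene at the final token position (assumption)

layer = model.model.layers[layer_idx]
cached = {}

def cache_hook(module, inputs, output):
    # Decoder layers return a tuple; the first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    cached["h"] = hidden[:, token_pos, :].detach().clone()

def patch_hook(module, inputs, output):
    # Replace the hidden state at token_pos with the cached source activation.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden.clone()
    hidden[:, token_pos, :] = cached["h"]
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# 1) Cache the chosen activation from the source run.
handle = layer.register_forward_hook(cache_hook)
with torch.no_grad():
    model(**tok(source_prompt, return_tensors="pt"))
handle.remove()

# 2) Re-run the base prompt with the cached activation patched in.
handle = layer.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(**tok(base_prompt, return_tensors="pt")).logits
handle.remove()

# Comparing patched_logits over the candidate label tokens indicates whether the
# swapped representation determines the answer independently of the label wording.
```

In the paper's framing, if earlier layers implement the label-invariant inference function and later layers verbalize the answer into whatever label words the prompt uses, then patching those earlier-layer activations across differently labeled prompts should leave the inferred answer unchanged.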

Keywords

  • Artificial intelligence
  • Classification
  • Inference
  • Llama