Summary of Inference and Verbalization Functions During In-Context Learning, by Junyi Tao et al.
Inference and Verbalization Functions During In-Context Learning
by Junyi Tao, Xiaoyin Chen, Nelson F. Liu
First submitted to arXiv on: 12 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Large language models (LMs) can learn from a few demonstrations to solve new tasks at inference time. However, previous research has shown that in some settings, LMs are minimally affected by irrelevant labels. This paper hypothesizes that LMs perform in-context learning with irrelevant labels through two sequential processes: an inference function that solves the task, followed by a verbalization function that maps the inferred answer to the label space. The key finding is that the inference function is invariant to remappings of the label space, enabling LMs to share the same inference function across settings with different label words. This is validated empirically through controlled layer-wise interchange intervention experiments on multiple datasets and tasks (natural language inference, sentiment analysis, and topic classification) using several open-source models, including GEMMA-7B, MISTRAL-7B-V0.3, GEMMA-2-27B, and LLAMA-3.1-70B. |
| Low | GrooveSquid.com (original content) | Imagine a computer program that can learn from just a few examples to do new tasks. This paper explores how such a program works when it is given wrong or irrelevant information during learning. The authors think the program has two parts: one that solves the task and another that connects the solution to the correct answer. The exciting part is that the first part, which does the actual solving, doesn’t care what words are used to describe the answers. This means the program can reuse the same solving part across tasks with different labels. The authors tested this idea on various datasets and found it held across multiple tasks. |
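The two-stage hypothesis and the interchange-intervention logic can be illustrated with a toy sketch. This is not the paper's actual method (which patches hidden activations of real LMs layer by layer); the `infer`, `verbalize`, and `run` functions and the label maps below are hypothetical stand-ins that show why patching the intermediate "inferred answer" from one run into another run with remapped labels produces the remapped word for the donor's answer:

```python
# Toy illustration (assumed names, not the paper's code): the hypothesized
# composition f(x) = verbalize(infer(x)), plus an interchange intervention
# that swaps the intermediate inferred answer between two runs.

def infer(text):
    """Stage 1 (hypothetical): solves the task, agnostic to label words."""
    return "positive" if "great" in text else "negative"

def verbalize(answer, label_map):
    """Stage 2 (hypothetical): maps the abstract answer to the label space."""
    return label_map[answer]

def run(text, label_map, patched_answer=None):
    """Full forward pass; optionally patch the intermediate answer,
    mimicking a layer-wise interchange intervention."""
    answer = patched_answer if patched_answer is not None else infer(text)
    return verbalize(answer, label_map)

standard = {"positive": "positive", "negative": "negative"}
remapped = {"positive": "foo", "negative": "bar"}  # irrelevant label words

# Donor intermediate taken from a run on a different input:
donor = infer("A great film.")  # -> "positive"

print(run("A dull film.", remapped))                        # bar
print(run("A dull film.", remapped, patched_answer=donor))  # foo
```

Because `infer` never consults the label map, the same "inference function" serves both label spaces; only `verbalize` changes, which mirrors the paper's invariance claim in miniature.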
Keywords
» Artificial intelligence » Classification » Inference » Llama