Representation Tuning
by Christopher M. Ackerman
First submitted to arXiv on: 11 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Activation engineering for online control of large language models (LLMs) is gaining popularity. This work extends inference-time steering by tuning vectors that represent behavioral directions directly into the model, removing the need for online control. The authors identify activation vectors related to honesty in an open-source LLM (Llama-2-13b-chat), demonstrate their effect on model output, and show that fine-tuning these vectors into the model with a dual loss combining cosine similarity and a token-based term yields stronger results than online steering or token-based loss alone (see the sketches below the table). The approach has potential as a safety measure, and the authors provide code, data, and tuned models for reproducibility. |
| Low | GrooveSquid.com (original content) | This research makes it easier to control large language models (LLMs) by changing what they say. Normally these models are steered in real time while they run, but this new method changes their behavior ahead of time. The researchers used a model called Llama-2-13b-chat and found that they could make its output more or less honest by adding certain “directions” to the way it thinks. They also compared this method to others and found that it worked better in some cases. This new approach might help keep AI language models safe and responsible. |
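The “online steering” baseline mentioned in the medium summary can be pictured as adding a direction vector to a layer's activations during generation. Below is a minimal sketch using a PyTorch forward hook on the Hugging Face Llama-2-13b-chat model. The layer index, the scale `alpha`, and the placeholder `honesty_direction` (which in the paper is identified from model activations, not sampled at random) are all illustrative assumptions, not the paper's exact setup.

```python
# Sketch of inference-time activation steering on Llama-2-13b-chat.
# Assumes `honesty_direction` is a precomputed unit vector in the model's
# hidden space (e.g., a mean activation difference between honest and
# dishonest prompts); layer index and scale are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

layer_idx = 14   # illustrative choice of decoder layer
alpha = 4.0      # steering strength; flipping the sign steers the other way

honesty_direction = torch.randn(model.config.hidden_size)  # placeholder
honesty_direction = honesty_direction / honesty_direction.norm()

def steering_hook(module, inputs, output):
    # Llama decoder layers return a tuple; element 0 is the hidden states.
    hidden = output[0] + alpha * honesty_direction.to(
        output[0].device, output[0].dtype
    )
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)

inputs = tokenizer("Is the earth flat?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # restore the unsteered model
```

The point of the paper is to make this kind of hook unnecessary: the behavioral direction is tuned into the weights, so generation needs no runtime intervention.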
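The medium summary also mentions a dual loss combining a token-based term with cosine similarity to the target direction. The paper's exact formulation lives in its released code; as a rough sketch under those assumptions, such an objective could look like the following, where `beta` and the mean reduction over sequence positions are illustrative choices.

```python
# Illustrative dual loss: standard next-token cross-entropy plus a
# cosine-similarity term that pulls a chosen layer's activations toward
# the honesty direction. Not the paper's exact recipe.
import torch
import torch.nn.functional as F

def dual_loss(logits, labels, hidden_states, direction, beta=1.0):
    # Token-based term: shift logits/labels for next-token prediction.
    ce = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
    )
    # Cosine term: 1 - cos(h, d) per position, so minimizing it aligns
    # activations (B, T, H) with the target direction (H,).
    target = direction.view(1, 1, -1).expand_as(hidden_states)
    cos = F.cosine_similarity(hidden_states, target.to(hidden_states.dtype), dim=-1)
    return ce + beta * (1.0 - cos).mean()
```

Keeping the token-based term alongside the cosine term is what lets the tuned model retain fluent output while its internal representation is pushed along the honesty direction.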
Keywords
» Artificial intelligence » Cosine similarity » Fine tuning » Inference » Llama » Loss function » Token