Representation Tuning

by Christopher M. Ackerman

First submitted to arXiv on: 11 Sep 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high difficulty version is the paper’s original abstract, which can be read on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

Activation engineering for online control of large language models (LLMs) is gaining popularity. This work extends inference-time steering by tuning vectors that represent behavioral directions directly into the model, eliminating the need for online control. We identify activation vectors related to honesty in an open-source LLM (Llama-2-13b-chat), demonstrate their effect on model output, and show that fine-tuning these directions into the model with a dual loss function, combining cosine similarity to the behavioral vector with a standard token-based loss, yields stronger results than online steering or a token-based loss alone. The approach has potential as a safety measure, and we provide code, data, and tuned models for reproducibility.
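
To make the distinction concrete, here is a minimal sketch of the “online steering” baseline that representation tuning is meant to replace: adding a behavioral direction to one layer’s activations at inference time via a forward hook. The layer index, steering coefficient, and the honesty vector itself are hypothetical placeholders (a real vector would be derived from the model’s own activations, e.g., by contrasting honest and dishonest prompts), not values from the paper.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-13b-chat-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

    LAYER = 14   # hypothetical choice of decoder layer
    COEFF = 4.0  # hypothetical steering strength

    # Hypothetical honesty direction; in practice it would be extracted
    # from contrastive activations and unit-normalized.
    honesty_vec = torch.randn(model.config.hidden_size, dtype=torch.float16)
    honesty_vec = honesty_vec / honesty_vec.norm()

    def steering_hook(module, inputs, output):
        # Llama decoder layers return a tuple; element 0 is the hidden states.
        hidden = output[0] + COEFF * honesty_vec.to(output[0].device)
        return (hidden,) + output[1:]

    # "Online" control: the hook must stay registered throughout generation.
    handle = model.model.layers[LAYER].register_forward_hook(steering_hook)
    try:
        ids = tokenizer("Is the Earth flat?", return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=50)
        print(tokenizer.decode(out[0], skip_special_tokens=True))
    finally:
        handle.remove()  # removing the hook restores the original model

Representation tuning, as described above, instead bakes the behavioral change into the weights themselves, so no hook is needed at inference time.
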
Low Difficulty Summary (written by GrooveSquid.com, original content)

This research makes it easier to control large language models (LLMs) by changing what they say. Normally, these models are controlled in real time, but this new method lets us change their behavior beforehand. The researchers used a model called Llama-2-13b-chat and found that they could make its output more or less honest by adding certain “directions” to the way it thinks. They also compared this method to others and found that it worked better in some cases. This new approach might help keep AI language models safe and responsible.
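
For readers curious how the dual loss mentioned in the medium summary could look in practice, here is a minimal sketch under stated assumptions: the standard token cross-entropy is combined with a cosine-similarity term that pushes a chosen layer’s hidden states toward the honesty direction. The layer index, the mixing weight alpha, and honesty_vec are illustrative assumptions, not the paper’s exact settings.

    import torch
    import torch.nn.functional as F

    def dual_loss(model, input_ids, labels, honesty_vec, layer=14, alpha=1.0):
        # `layer`, `alpha`, and `honesty_vec` are hypothetical settings.
        outputs = model(input_ids=input_ids, labels=labels,
                        output_hidden_states=True)
        token_loss = outputs.loss  # standard next-token cross-entropy

        # hidden_states[0] is the embedding output, so index layer + 1
        # gives the activations after the chosen decoder layer.
        hidden = outputs.hidden_states[layer + 1]  # (batch, seq, dim)

        # Cosine similarity between each position's activation and the
        # behavioral direction; maximizing it means minimizing (1 - cos).
        cos = F.cosine_similarity(hidden, honesty_vec.view(1, 1, -1), dim=-1)
        rep_loss = (1.0 - cos).mean()

        return token_loss + alpha * rep_loss

Minimizing this combined loss during fine-tuning moves the behavioral direction into the weights themselves; flipping the sign of the cosine term would instead push activations away from that direction.
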

Keywords

» Artificial intelligence  » Cosine similarity  » Fine tuning  » Inference  » Llama  » Loss function  » Token