
Calibrating Language Models with Adaptive Temperature Scaling

by Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric Mitchell, Chelsea Finn

First submitted to arXiv on: 29 Sep 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary
Written by the paper authors. This version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary
Written by GrooveSquid.com (original content).
The paper introduces Adaptive Temperature Scaling (ATS), a post-hoc calibration method for large language models (LLMs) that have been fine-tuned with reinforcement learning from human feedback (RLHF). The authors observe that an LLM's usefulness depends not only on the accuracy of its outputs but also on its calibration, that is, how well its confidence scores reflect the probability that its outputs are correct, and they show that RLHF fine-tuning significantly degrades calibration. To address this, ATS predicts a temperature scaling parameter for each token prediction from token-level features and is fit on a standard supervised fine-tuning (SFT) dataset. This adaptive, per-token design accommodates the varying degrees of calibration shift that RLHF fine-tuning can induce. Across three downstream natural language evaluation benchmarks, ATS improves calibration by 10-50% over prior calibration methods without sacrificing the performance gains from RLHF.
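To make the mechanism concrete, here is a minimal PyTorch sketch of per-token temperature scaling in the spirit of ATS: a small head predicts a positive temperature from each token's hidden features and divides that token's logits by it before the softmax. The class name `AdaptiveTemperatureHead`, the linear-plus-softplus parameterization, and the tensor shapes are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class AdaptiveTemperatureHead(nn.Module):
    """Sketch of per-token adaptive temperature scaling.

    Maps each token's hidden features to a positive temperature,
    then divides that token's logits by it before softmax. The
    architecture here is an assumption, not the paper's exact head.
    """

    def __init__(self, hidden_dim: int):
        super().__init__()
        # A small linear head; in the paper's setup, such a head
        # would be fit on a standard SFT dataset.
        self.proj = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        # logits:        (batch, seq_len, vocab_size)
        # Softplus keeps the predicted temperature strictly positive.
        temperature = nn.functional.softplus(self.proj(hidden_states)) + 1e-6
        return logits / temperature  # broadcasts over the vocab dimension


# Usage sketch with random tensors standing in for LLM outputs.
if __name__ == "__main__":
    batch, seq_len, hidden_dim, vocab = 2, 8, 16, 100
    head = AdaptiveTemperatureHead(hidden_dim)
    hidden = torch.randn(batch, seq_len, hidden_dim)
    logits = torch.randn(batch, seq_len, vocab)
    calibrated = torch.softmax(head(hidden, logits), dim=-1)
    print(calibrated.shape)  # torch.Size([2, 8, 100])
```

Because the temperature is predicted per token rather than fixed globally, tokens whose confidence drifted more during RLHF can be scaled more aggressively, which is the key difference from classic single-temperature scaling.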
Low Difficulty Summary
Written by GrooveSquid.com (original content).
The paper is about a new way to make language models better at knowing how confident they should be in their own predictions. Language models are programs that can generate text, but they're not perfect and sometimes get things wrong. To make them more helpful, researchers fine-tune the models using feedback from humans. However, this process makes the models less reliable at knowing when they're right or wrong. The new method, called Adaptive Temperature Scaling (ATS), helps fix this problem by adjusting the model's confidence separately for each word it predicts. This makes the model better at telling when it's correct and when it's not, making it more reliable overall.

Keywords

» Artificial intelligence  » Fine-tuning  » Probability  » Reinforcement learning from human feedback  » RLHF  » Supervised  » Temperature  » Token