Summary of Self-Taught Evaluators, by Tianlu Wang et al.
Self-Taught Evaluators
by Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li
First submitted to arXiv on: 5 Aug 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract on arXiv |
| Medium | GrooveSquid.com (original content) | Model-based evaluation is central to successful model development, serving both as a reward model for training and as a replacement for human evaluation. The traditional approach collects large amounts of human preference judgments over model responses, which is costly and becomes stale as models improve. This work presents a method to train evaluators without human annotations, using only synthetic training data. Starting from unlabeled instructions, an iterative self-improvement scheme generates contrasting model outputs and trains an LLM-as-a-Judge to produce reasoning traces and final judgments, repeating the process with the improved judge at each iteration (a schematic sketch of this loop follows the table). Without any labeled preference data, the Self-Taught Evaluator improves a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench. This surpasses commonly used LLM judges such as GPT-4 and matches top-performing reward models trained with labeled examples. |
| Low | GrooveSquid.com (original content) | The paper presents a new way to train evaluators without using human annotations. It starts from a pool of unlabeled instructions and has a model generate contrasting responses to each one. A judge model then learns to compare the responses, explain its reasoning, and pick the better one. This process repeats several times, with the judge getting better each round. In the end, the trained evaluator picks the better response about 88% of the time on the RewardBench benchmark, beating commonly used LLM judges like GPT-4 and matching the best reward models trained on human-labeled data. |
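
To make the iterative scheme in the medium summary more concrete, here is a minimal Python sketch of that loop. It is not the authors' code or prompts; every function name (`generate_contrasting_pair`, `judge_with_reasoning`, `finetune_judge`) and the toy callables below are hypothetical placeholders standing in for real model calls and fine-tuning.

```python
# Minimal sketch of the iterative self-improvement loop described above.
# All helpers are hypothetical stand-ins, not the paper's actual pipeline.
import random
from dataclasses import dataclass


@dataclass
class PreferencePair:
    instruction: str
    chosen: str    # response constructed to be better ("winning" output)
    rejected: str  # deliberately degraded response ("losing" output)


def generate_contrasting_pair(model, instruction: str) -> PreferencePair:
    """Hypothetical: get a good response and a deliberately worse one,
    e.g. by answering a perturbed version of the instruction."""
    good = model(instruction)
    bad = model(instruction + " (answer a subtly different question)")
    return PreferencePair(instruction, chosen=good, rejected=bad)


def judge_with_reasoning(judge, pair: PreferencePair):
    """Hypothetical: ask the judge for a reasoning trace plus a verdict,
    'A' if the chosen response is judged better, 'B' otherwise."""
    trace = judge(f"Compare two responses to: {pair.instruction}")
    verdict = random.choice(["A", "B"])  # stand-in for parsing the judge output
    return trace, verdict


def finetune_judge(judge, training_examples):
    """Hypothetical: fine-tune the judge on traces whose verdict recovered
    the known-better response. Here it is a no-op placeholder."""
    return judge


def self_taught_evaluator(base_model, judge, instructions, iterations=3):
    """Iterate: synthesize contrasting pairs, keep only judgments that
    recover the synthetic label, and retrain the judge on its own traces."""
    for _ in range(iterations):
        kept = []
        for instr in instructions:
            pair = generate_contrasting_pair(base_model, instr)
            trace, verdict = judge_with_reasoning(judge, pair)
            # The label is known by construction: 'chosen' is the better one.
            if verdict == "A":
                kept.append((pair, trace, verdict))
        judge = finetune_judge(judge, kept)
    return judge


# Toy usage: both "models" are plain callables returning strings.
trained_judge = self_taught_evaluator(
    base_model=lambda prompt: f"response to: {prompt}",
    judge=lambda prompt: f"reasoning about: {prompt}",
    instructions=["Summarize this paper.", "Explain RewardBench."],
)
```

The key design point the sketch tries to capture is that no human labels enter the loop: the preference label is known by construction (the degraded response is the "loser"), so the judge's own correct reasoning traces become its next round of training data.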
Keywords
» Artificial intelligence » GPT » Large language model