Summary of Non-instructional Fine-tuning: Enabling Instruction-following Capabilities in Pre-trained Language Models Without Instruction-following Data, by Juncheng Xie et al.
Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data
by Juncheng Xie, Shensian Syu, Hung-yi Lee
First submitted to arXiv on: 27 Aug 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The authors propose a novel approach for fine-tuning large language models (LLMs) to follow instructions without relying on traditional supervised instruction data. They take the first half of a random text from OpenWebText as the "instruction" and use GPT-3.5-turbo or GPT-4-turbo to complete the text as the "response." Despite this "non-instructional" data, pre-trained LLMs gain instruction-following capabilities after fine-tuning on it, as demonstrated on several well-known models (e.g., LLaMA-2-7B, LLaMA-3-8B, LLaMA-3-70B, Mistral-7B-v0.1). The non-instructional data also improves some models that have already undergone supervised fine-tuning and human preference alignment; notably, LLaMA-3-70B-Instruct fine-tuned on this data is comparable to LLaMA-3.1-70B-Instruct on the Arena Hard leaderboard. |
| Low | GrooveSquid.com (original content) | This research shows that large language models can learn to follow instructions without any special instruction data. The authors take the first half of random texts from OpenWebText as "instructions" and have models like GPT-3.5-turbo or GPT-4-turbo complete them, treating each completion as a "response." Surprisingly, models fine-tuned on these pairs learn to follow instructions even though the data was never written as instructions. The authors try this approach on several different models and find that it makes them better at following instructions. |
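The data-construction recipe in the summaries above can be sketched in a few lines: split each document at its midpoint, keep the first half as the "instruction," and pair it with a model-generated continuation as the "response." This is a minimal illustration, not the authors' code; `complete_text` is a hypothetical stand-in for a call to GPT-3.5-turbo or GPT-4-turbo, and the sample documents stand in for real OpenWebText texts.

```python
def split_in_half(text: str) -> tuple[str, str]:
    """Split a document at its midpoint, counting whitespace-separated words."""
    words = text.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])

def complete_text(prefix: str) -> str:
    """Hypothetical stand-in for an LLM continuation call
    (e.g. GPT-3.5-turbo or GPT-4-turbo in the paper)."""
    return f"<model continuation of: {prefix[:30]}>"

def build_pairs(documents: list[str]) -> list[dict[str, str]]:
    """Turn raw documents into (instruction, response) fine-tuning pairs."""
    pairs = []
    for doc in documents:
        first_half, _ = split_in_half(doc)
        pairs.append({
            "instruction": first_half,          # first half of random text
            "response": complete_text(first_half),  # model completion
        })
    return pairs
```

The resulting pairs have the same shape as a standard instruction-tuning dataset, which is what lets an ordinary supervised fine-tuning pipeline consume them unchanged.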
Keywords
» Artificial intelligence » Alignment » Gpt » Llama » Supervised