Scaling Laws for Forgetting When Fine-Tuning Large Language Models

by Damjan Kalajdzievski

First submitted to arXiv on: 11 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The paper investigates the problem of “forgetting” when fine-tuning pre-trained large language models (LLMs) on a downstream task, showing that even parameter-efficient strategies such as Low-Rank Adapters (LoRA) still suffer from catastrophic forgetting. The study finds an inverse linear relationship between fine-tuning performance and the amount of forgetting, as well as precise scaling laws showing that forgetting increases with the number of parameters fine-tuned and the number of update steps. The research also examines the impact of forgetting on the knowledge, reasoning, and safety guardrails trained into Llama 2 7B chat. The findings suggest that forgetting cannot be avoided through early stopping or by varying the number of parameters fine-tuned, highlighting the need for future research into fine-tuning schemes that mitigate forgetting.
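To make the two stated relationships concrete, a minimal sketch of what such scaling laws typically look like is given below. This is illustrative only, not the paper's fitted equations: the symbols F (a forgetting metric), L_ft (fine-tuning loss), N (number of parameters fine-tuned), S (number of update steps), and the constants A, B, alpha, beta, c1, c2 are assumptions introduced purely to sketch the form of the claims in the summary.

```latex
% Illustrative only -- not the paper's fitted equations.
% F: forgetting metric, L_ft: fine-tuning loss, N: number of parameters
% fine-tuned, S: number of update steps; A, B, alpha, beta, c_1, c_2 are
% hypothetical constants used to sketch the stated relationships.
\begin{align*}
  F &\approx c_1 - c_2 \, L_{\mathrm{ft}}
      && \text{(inverse linear link between forgetting and fine-tuning loss)} \\
  F(N, S) &\approx A + B \, N^{\alpha} S^{\beta}
      && \text{(forgetting grows with $N$ and $S$)}
\end{align*}
```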
Low Difficulty Summary (written by GrooveSquid.com; original content)
When pre-trained language models are fine-tuned for a new task, they often forget what they knew before. This “forgetting” is bad news because it means the model may no longer work well on its original job. Researchers found that even special techniques that make fine-tuning more efficient still cause forgetting. They also discovered that forgetting gets worse the more times the model is updated and the more of its parameters are fine-tuned. The study shows how forgetting affects a chatbot’s ability to recall knowledge, reason, and stay safe, which matters because these models are used in many applications today. Overall, this research suggests we need fine-tuning methods that work well without causing models to forget what they learned before.
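As a hypothetical illustration of how forgetting can be measured in practice (not the paper's actual code or evaluation setup), the sketch below computes the increase in a model's loss on held-out text from its original domain after fine-tuning. It assumes the Hugging Face transformers library is installed, and that `base_model`, `tuned_model` (the model before and after LoRA fine-tuning), `tokenizer`, and `held_out_texts` (a hypothetical list of original-domain strings) already exist.

```python
# Minimal sketch, NOT the paper's code: one common way to quantify forgetting
# is the increase in cross-entropy loss on held-out text from the model's
# original domain after fine-tuning.
import torch


def mean_loss(model, tokenizer, texts):
    """Average causal-LM cross-entropy of `model` over `texts`."""
    device = next(model.parameters()).device
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            batch = tokenizer(text, return_tensors="pt").to(device)
            # For causal LMs, passing the inputs as labels yields the
            # next-token cross-entropy loss.
            out = model(**batch, labels=batch["input_ids"])
            losses.append(out.loss.item())
    return sum(losses) / len(losses)


# Hypothetical usage: a larger gap means more forgetting of the original domain.
# forgetting = mean_loss(tuned_model, tokenizer, held_out_texts) \
#            - mean_loss(base_model, tokenizer, held_out_texts)
```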

Keywords

* Artificial intelligence  * Early stopping  * Fine-tuning  * Llama  * LoRA  * Parameter-efficient  * Scaling laws