Scaling Laws for Forgetting When Fine-Tuning Large Language Models
by Damjan Kalajdzievski
First submitted to arXiv on: 11 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper investigates the problem of “forgetting” when fine-tuning pre-trained large language models (LLMs) on a downstream task, showing that even parameter-efficient strategies such as Low-Rank Adapters (LoRA) still suffer from catastrophic forgetting. The study finds an inverse linear relationship between fine-tuning performance and the amount of forgetting, as well as precise scaling laws showing that forgetting increases with the number of parameters fine-tuned and the number of update steps. The research also examines the impact of forgetting on the knowledge, reasoning, and safety guardrails trained into Llama 2 7B chat. The findings suggest that forgetting cannot be avoided through early stopping or by varying the number of parameters fine-tuned, highlighting the need for future work on fine-tuning schemes that mitigate forgetting. |
| Low | GrooveSquid.com (original content) | When pre-trained language models are fine-tuned for a new task, they often forget what they knew before. This “forgetting” is bad news because the model may no longer work well on its original job. Researchers found that even special techniques that make fine-tuning more efficient still cause forgetting. They also discovered that forgetting gets worse the more times you update the model and the more of its parameters you fine-tune. The study shows how forgetting affects a chatbot’s knowledge, reasoning, and safety, which matters because these models are used in many applications today. Overall, this research suggests we need fine-tuning methods that teach models new tasks without making the model forget what it learned before. |
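To make the “scaling law” idea concrete, here is a minimal, hypothetical sketch of how one might fit such a curve to forgetting measurements. It assumes a shifted power-law form in the number of fine-tuned parameters and update steps; the function name `forgetting_law`, the data points, and the coefficients are illustrative placeholders and are not taken from the paper.

```python
# Hypothetical sketch only: the functional form, variable names, and data points
# below are illustrative assumptions, not the paper's actual law or measurements.
import numpy as np
from scipy.optimize import curve_fit

def forgetting_law(X, A, alpha, beta, C):
    """Assumed shifted power law: forgetting ~ A * n_params^alpha * n_steps^beta + C."""
    n_params, n_steps = X
    return A * n_params ** alpha * n_steps ** beta + C

# Made-up measurements: forgetting (e.g. increase in pre-training loss) for
# runs with different numbers of fine-tuned parameters and update steps.
n_params   = np.array([1e6, 1e6, 1e7, 1e7, 1e8, 1e8])
n_steps    = np.array([100, 1000, 100, 1000, 100, 1000])
forgetting = np.array([0.8, 1.3, 1.1, 1.9, 1.6, 2.7])

# Fit the assumed law to the made-up data.
(A, alpha, beta, C), _ = curve_fit(
    forgetting_law, (n_params, n_steps), forgetting,
    p0=[0.05, 0.15, 0.2, 0.0], maxfev=10000,
)
print(f"fitted forgetting = {A:.3g} * n_params^{alpha:.2f} * n_steps^{beta:.2f} + {C:.3g}")
```

Fitting a curve like this to the drop in pre-training performance measured at several parameter counts and step budgets is one way to check whether your own fine-tuning runs show a similar trend.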
Keywords
* Artificial intelligence
* Early stopping
* Fine-tuning
* Llama
* LoRA
* Parameter-efficient
* Scaling laws