Scaling Laws for Forgetting When Fine-Tuning Large Language Models

by Damjan Kalajdzievski

First submitted to arXiv on: 11 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The paper investigates the problem of “forgetting” when fine-tuning pre-trained large language models (LLMs) on a downstream task, showing that even parameter-efficient strategies such as Low-Rank Adapters (LoRA) still suffer from catastrophic forgetting. The study finds an inverse linear relationship between fine-tuning performance and the amount of forgetting, as well as precise scaling laws showing that forgetting increases with the number of parameters fine-tuned and the number of update steps. The research also examines the impact of forgetting on the knowledge, reasoning, and safety guardrails trained into Llama 2 7B chat. The findings suggest that forgetting cannot be avoided through early stopping or by varying the number of parameters fine-tuned, highlighting the need for future research into fine-tuning schemes that mitigate forgetting.
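To make the two stated relationships concrete, a minimal sketch of what such scaling laws typically look like is given below. This is illustrative only, not the paper's fitted equations: the symbols F (a forgetting metric), L_ft (fine-tuning loss), N (number of parameters fine-tuned), S (number of update steps), and the constants A, B, alpha, beta, c1, c2 are assumptions introduced purely to sketch the form of the claims in the summary.

```latex
% Illustrative only -- not the paper's fitted equations.
% F: forgetting metric, L_ft: fine-tuning loss, N: number of parameters
% fine-tuned, S: number of update steps; A, B, alpha, beta, c_1, c_2 are
% hypothetical constants used to sketch the stated relationships.
\begin{align*}
  F &\approx c_1 - c_2 \, L_{\mathrm{ft}}
      && \text{(inverse linear link between forgetting and fine-tuning loss)} \\
  F(N, S) &\approx A + B \, N^{\alpha} S^{\beta}
      && \text{(forgetting grows with $N$ and $S$)}
\end{align*}
```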
Low Difficulty Summary (written by GrooveSquid.com; original content)
When pre-trained language models are fine-tuned for a new task, they often forget what they knew before. This “forgetting” is bad news because it means the model may no longer work well on its original job. Researchers found that even special techniques that make fine-tuning more efficient still cause forgetting. They also discovered that forgetting gets worse the more times the model is updated and the more of its parameters are fine-tuned. The study shows how forgetting affects a chatbot’s ability to recall knowledge, reason, and stay safe, which matters because these models are used in many applications today. Overall, this research suggests we need fine-tuning methods that work well without causing models to forget what they learned before.
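As a hypothetical illustration of how forgetting can be measured in practice (not the paper's actual code or evaluation setup), the sketch below computes the increase in a model's loss on held-out text from its original domain after fine-tuning. It assumes the Hugging Face transformers library is installed, and that `base_model`, `tuned_model` (the model before and after LoRA fine-tuning), `tokenizer`, and `held_out_texts` (a hypothetical list of original-domain strings) already exist.

```python
# Minimal sketch, NOT the paper's code: one common way to quantify forgetting
# is the increase in cross-entropy loss on held-out text from the model's
# original domain after fine-tuning.
import torch


def mean_loss(model, tokenizer, texts):
    """Average causal-LM cross-entropy of `model` over `texts`."""
    device = next(model.parameters()).device
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            batch = tokenizer(text, return_tensors="pt").to(device)
            # For causal LMs, passing the inputs as labels yields the
            # next-token cross-entropy loss.
            out = model(**batch, labels=batch["input_ids"])
            losses.append(out.loss.item())
    return sum(losses) / len(losses)


# Hypothetical usage: a larger gap means more forgetting of the original domain.
# forgetting = mean_loss(tuned_model, tokenizer, held_out_texts) \
#            - mean_loss(base_model, tokenizer, held_out_texts)
```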

Keywords

* Artificial intelligence  * Early stopping  * Fine-tuning  * Llama  * LoRA  * Parameter-efficient  * Scaling laws