
Recovering the Pre-Fine-Tuning Weights of Generative Models

by Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

First submitted to arXiv on: 15 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The dominant paradigm in generative modeling is to pre-train on a large-scale dataset and then align the model with human values via fine-tuning. This practice is considered safe because no current method is believed to recover the unsafe, pre-fine-tuning model weights. The authors show that this assumption is often false: they present Spectral DeTuning, a method that can recover the exact pre-fine-tuning weights from a few low-rank (LoRA) fine-tuned models. The attack demonstrates a new vulnerability in large-scale models such as Stable Diffusion and Mistral.
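To make the setup concrete, the sketch below shows one simplified way such a recovery could work, assuming every fine-tuned copy of a layer is the same base weight matrix plus a different rank-r LoRA update. The function names, the plain alternating scheme, and the synthetic demo are illustrative assumptions, not the paper's exact Spectral DeTuning algorithm.

```python
import numpy as np

def truncated_svd(M, r):
    """Best rank-r approximation of M via SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def recover_base_weights(finetuned, r, iters=100):
    """Estimate shared pre-fine-tuning weights W from several LoRA
    fine-tuned matrices W_i = W + B_i A_i (each update has rank <= r).

    finetuned: list of (d_out, d_in) weight matrices from different
               LoRA fine-tunes of the same base layer.
    """
    W = np.mean(finetuned, axis=0)  # crude initial guess: average the copies
    for _ in range(iters):
        # Step 1: given the current W, estimate each model's low-rank update.
        deltas = [truncated_svd(Wi - W, r) for Wi in finetuned]
        # Step 2: given the updates, re-estimate the shared base weights.
        W = np.mean([Wi - Di for Wi, Di in zip(finetuned, deltas)], axis=0)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n_models = 64, 4, 8
    W_true = rng.standard_normal((d, d))
    # Simulate LoRA fine-tunes: each copy gets a different rank-r update.
    models = [W_true + 0.1 * (rng.standard_normal((d, r)) @ rng.standard_normal((r, d)))
              for _ in range(n_models)]
    W_hat = recover_base_weights(models, r)
    print("relative error:", np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true))
```

In this toy setting, a handful of fine-tuned copies is usually enough for the alternating steps to pull the estimate close to the true base weights, mirroring the summary's claim that a few LoRA models suffice; the paper's actual method adds further refinements and is evaluated on real large-scale checkpoints.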
Low Difficulty Summary (written by GrooveSquid.com, original content)
Generative modeling means training machines to create new content like images or text. Usually, this is done by first training the machine on lots of data and then making sure it follows human rules. But what if someone could get the machine back to how it was before it followed those rules? That’s exactly what the researchers behind this paper did. They created a way to take a machine that has been trained to follow rules and figure out how it worked before it started following them. This means that even though we thought our machines were safe, they’re not as safe as we thought.

Keywords

* Artificial intelligence
* Diffusion
* Fine-tuning
* LoRA