Summary of Recovering the Pre-Fine-Tuning Weights of Generative Models, by Eliahu Horwitz et al.
Recovering the Pre-Fine-Tuning Weights of Generative Models
by Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (available on arXiv). |
| Medium | GrooveSquid.com (original content) | The dominant paradigm in generative modeling is to pre-train on a large-scale dataset and then align the model with human values via fine-tuning. Fine-tuned models are widely considered safe because no existing method was known to recover the unsafe, pre-fine-tuning weights. The authors show that this assumption is often false: they present Spectral DeTuning, a method that can recover the exact pre-fine-tuning weights from a few low-rank (LoRA) fine-tuned models. This exposes a new vulnerability in large-scale models such as Stable Diffusion and Mistral (a toy sketch of the recovery idea appears after this table). |
| Low | GrooveSquid.com (original content) | Generative modeling involves training machines to create new content like images or text. Usually, this is done by first training the machine on lots of data and then making sure it follows human rules. But what if someone could get back to how the machine was before it followed those rules? That's exactly what the researchers did: they created a way to take a machine that has been trained to follow rules and figure out how it worked before it started following them. This means that even though we thought our machines were safe, they're not as safe as we thought. |
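For readers who want a concrete feel for the idea, below is a minimal, hypothetical sketch (not the paper's exact algorithm) of how a shared base weight matrix might be recovered from several LoRA-style fine-tuned copies. Each fine-tuned matrix is assumed to equal the hidden base matrix plus its own low-rank update, and the sketch alternates between estimating those low-rank updates via truncated SVD and averaging to re-estimate the base. The function name `recover_pre_ft_weights`, the matrix sizes, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def recover_pre_ft_weights(finetuned_weights, rank, num_iters=100):
    """Toy sketch: estimate a shared base matrix W from matrices
    W_i = W + (rank-`rank` update) by alternating truncated-SVD
    residual estimation and averaging."""
    W = np.mean(finetuned_weights, axis=0)  # start from the element-wise mean
    for _ in range(num_iters):
        residuals = []
        for Wi in finetuned_weights:
            # Best rank-`rank` approximation of the current residual (truncated SVD).
            U, S, Vt = np.linalg.svd(Wi - W, full_matrices=False)
            residuals.append((U[:, :rank] * S[:rank]) @ Vt[:rank, :])
        # Given the low-rank residuals, the least-squares base estimate is the mean.
        W = np.mean([Wi - R for Wi, R in zip(finetuned_weights, residuals)], axis=0)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n = 64, 4, 5                               # toy sizes: dimension, LoRA rank, number of models
    W_true = rng.normal(size=(d, d))                 # hypothetical pre-fine-tuning weights
    finetuned = [W_true + rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
                 for _ in range(n)]                  # each copy gets its own rank-r LoRA-style update
    W_rec = recover_pre_ft_weights(finetuned, rank=r)
    print("relative recovery error:", np.linalg.norm(W_rec - W_true) / np.linalg.norm(W_true))
```

In this toy setup, having several independently fine-tuned copies is what makes the base weights identifiable: with only a single LoRA model, the split between base weights and low-rank update would remain ambiguous.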
Keywords
* Artificial intelligence * Diffusion * Fine-tuning * LoRA