Summary of Recovering the Pre-Fine-Tuning Weights of Generative Models, by Eliahu Horwitz et al.
Recovering the Pre-Fine-Tuning Weights of Generative Models
by Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (available on arXiv). |
| Medium | GrooveSquid.com (original content) | The dominant paradigm in generative modeling is to pre-train on a large-scale dataset and then align the model with human values via fine-tuning. Fine-tuned models are widely considered safe because no existing method was known to recover the unsafe, pre-fine-tuning weights. The authors show that this assumption is often false: they present Spectral DeTuning, a method that can recover the exact pre-fine-tuning weights from a few low-rank (LoRA) fine-tuned models. This exposes a new vulnerability in large-scale models such as Stable Diffusion and Mistral (a toy sketch of the recovery idea appears after this table). |
| Low | GrooveSquid.com (original content) | Generative modeling involves training machines to create new content like images or text. Usually, this is done by first training the machine on lots of data and then making sure it follows human rules. But what if someone could get back to how the machine was before it followed those rules? That's exactly what the researchers did: they created a way to take a machine that has been trained to follow rules and figure out how it worked before it started following them. This means that even though we thought our machines were safe, they're not as safe as we thought. |
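For readers who want a concrete feel for the idea, below is a minimal, hypothetical sketch (not the paper's exact algorithm) of how a shared base weight matrix might be recovered from several LoRA-style fine-tuned copies. Each fine-tuned matrix is assumed to equal the hidden base matrix plus its own low-rank update, and the sketch alternates between estimating those low-rank updates via truncated SVD and averaging to re-estimate the base. The function name `recover_pre_ft_weights`, the matrix sizes, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def recover_pre_ft_weights(finetuned_weights, rank, num_iters=100):
    """Toy sketch: estimate a shared base matrix W from matrices
    W_i = W + (rank-`rank` update) by alternating truncated-SVD
    residual estimation and averaging."""
    W = np.mean(finetuned_weights, axis=0)  # start from the element-wise mean
    for _ in range(num_iters):
        residuals = []
        for Wi in finetuned_weights:
            # Best rank-`rank` approximation of the current residual (truncated SVD).
            U, S, Vt = np.linalg.svd(Wi - W, full_matrices=False)
            residuals.append((U[:, :rank] * S[:rank]) @ Vt[:rank, :])
        # Given the low-rank residuals, the least-squares base estimate is the mean.
        W = np.mean([Wi - R for Wi, R in zip(finetuned_weights, residuals)], axis=0)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n = 64, 4, 5                               # toy sizes: dimension, LoRA rank, number of models
    W_true = rng.normal(size=(d, d))                 # hypothetical pre-fine-tuning weights
    finetuned = [W_true + rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
                 for _ in range(n)]                  # each copy gets its own rank-r LoRA-style update
    W_rec = recover_pre_ft_weights(finetuned, rank=r)
    print("relative recovery error:", np.linalg.norm(W_rec - W_true) / np.linalg.norm(W_true))
```

In this toy setup, having several independently fine-tuned copies is what makes the base weights identifiable: with only a single LoRA model, the split between base weights and low-rank update would remain ambiguous.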
Keywords
* Artificial intelligence * Diffusion * Fine-tuning * LoRA