Summary of Language Models Are Homer Simpson! Safety Re-alignment Of Fine-tuned Language Models Through Task Arithmetic, by Rishabh Bhardwaj et al.
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic, by Rishabh…