Summary of DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion, by Yu Li et al.