Summary of Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach, by Tony T. Wang et al.
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach by…