Summary of RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations, by Jing Huang et al.
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations by Jing Huang, Zhengxuan Wu, Christopher Potts,…