Summary of LatentQA: Teaching LLMs to Decode Activations Into Natural Language, by Alexander Pan, Lijie Chen, and Jacob Steinhardt
LatentQA: Teaching LLMs to Decode Activations Into Natural Language
by Alexander Pan, Lijie Chen, Jacob Steinhardt
First…