Summary of How Transformers Learn Structured Data: Insights From Hierarchical Filtering, by Jerome Garnier-Brun et al.
How transformers learn structured data: insights from hierarchical filtering, by Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato,…