Summary of Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond, by Yekun Ke et al.