Summary of Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection, by Addison Kristanto Julistiono et al.
Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection by Addison Kristanto Julistiono, Davoud Ataee Tarzanagh,…