Summary of Benign Overfitting in Single-Head Attention, by Roey Magen et al.
Benign Overfitting in Single-Head Attention by Roey Magen, Shuning Shang, Zhiwei Xu, Spencer Frei, Wei Hu,…
Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers by Sanchayan Dutta, Suvrit…
Utilizing Lyapunov Exponents in designing deep neural networks by Tirthankar Mittra. First submitted to arXiv on: 8…
Extended convexity and smoothness and their applications in deep learning by Binchuan Qi, Wei Gong, Li…
Score-Based Variational Inference for Inverse Problems by Zhipeng Xue, Penghao Cai, Xiaojun Yuan, Xiqi Gao. First submitted…
On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory by Guhan Chen,…
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent by Bingrui Li, Wei…
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse by Arthur Jacot, Peter Súkeník,…
SGD with memory: fundamental properties and stochastic acceleration by Dmitry Yarotsky, Maksim Velikanov. First submitted to arXiv…
Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map by Yoonsoo Nam,…