Summary of When Attention Sink Emerges in Language Models: An Empirical View, by Xiangming Gu et al.