Summary of More Expressive Attention with Negative Weights, by Ang Lv et al.
More Expressive Attention with Negative Weights by Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu…
Structure Matters: Dynamic Policy Gradient by Sara Klein, Xiangyuan Zhang, Tamer Başar, Simon Weissmann, Leif Döring. First…
Impact of white noise in artificial neural networks trained for classification: performance and noise mitigation…
LASER: Attention with Exponential Transformation by Sai Surya Duvvuri, Inderjit S. Dhillon. First submitted to arxiv on:…
PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation by Weiqin Yang, Jiawei Chen,…
Joint Training for Selective Prediction by Zhaohui Li, Rebecca J. Passonneau. First submitted to arxiv on: 31…
Rethinking Softmax: Self-Attention with Polynomial Activations by Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon…
Stick-breaking Attention by Shawn Tan, Yikang Shen, Songlin Yang, Aaron Courville, Rameswar Panda. First submitted to arxiv…
Methods of improving LLM training stability by Oleg Rybakov, Mike Chrzanowski, Peter Dykas, Jinze Xue, Ben…
Calibration of Ordinal Regression Networks by Daehwan Kim, Haejun Chung, Ikbeom Jang. First submitted to arxiv on:…