Summary of Selective Attention Improves Transformer, by Yaniv Leviathan et al.
Selective Attention Improves Transformer, by Yaniv Leviathan, Matan Kalman, Yossi Matias. First submitted to arXiv on: 3…