Summary of BERTer: The Efficient One, by Pradyumna Saligram et al.
BERTer: The Efficient One by Pradyumna Saligram, Andrew Lanpouthakoun. First submitted to arXiv on: 19 Jul 2024. Categories Main:…
Revisiting Attention for Multivariate Time Series Forecasting by Haixiang Wu. First submitted to arXiv on: 18 Jul…
LiNR: Model Based Neural Retrieval on GPUs at LinkedIn by Fedor Borisyuk, Qingquan Song, Mingzhou Zhou,…
Transformers with Stochastic Competition for Tabular Data Modelling by Andreas Voskou, Charalambos Christoforou, Sotirios Chatzis. First submitted…
Whitening Not Recommended for Classification Tasks in LLMs by Ali Forooghi, Shaghayegh Sadeghi, Jianguo Lu. First submitted…
Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers by Freya Behrens, Luca…
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations by Walter Simoncini, Spyros Gidaris, Andrei…
Improving Hyperbolic Representations via Gromov-Wasserstein Regularization by Yifei Yang, Wonjun Lee, Dongmian Zou, Gilad Lerman. First submitted…
Balancing the Scales: Reinforcement Learning for Fair Classification by Leon Eshuijs, Shihan Wang, Antske Fokkens. First submitted…
RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation by Li Li, Hubert P. H.…