Summary of Linear Transformers with Learnable Kernel Functions Are Better In-Context Models, by Yaroslav Aksenov et al.
Linear Transformers with Learnable Kernel Functions are Better In-Context Models, by Yaroslav Aksenov, Nikita Balagansky, Sofia…
LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation, by Hongyun Zhou, Xiangyu Lu, Wang Xu,…
Learn To be Efficient: Build Structured Sparsity in Large Language Models, by Haizhong Zheng, Xiaoyan Bai,…
BlackMamba: Mixture of Experts for State-Space Models, by Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge. First…
Engineering A Large Language Model From Scratch, by Abiodun Finbarrs Oketunji. First submitted to arxiv on: 30…
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models, by Erik Arakelyan, Zhaoqi Liu,…
How Can Large Language Models Understand Spatial-Temporal Data? by Lei Liu, Shuo Yu, Runze Wang, Zhenxun…
Learning Shortcuts: On the Misleading Promise of NLU in Language Models, by Geetanjali Bihani, Julia Taylor…
Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions, by Nooshin…
We Need to Talk About Classification Evaluation Metrics in NLP, by Peter Vickers, Loïc Barrault, Emilio…