Summary of Mechanics of Next Token Prediction with Self-Attention, by Yingcong Li et al.
Mechanics of Next Token Prediction with Self-Attention by Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit…
Conditional computation in neural networks: principles and research trends by Simone Scardapane, Alessandro Baiocchi, Alessio Devoto,…
Transformers Learn Low Sensitivity Functions: Investigations and Implications by Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott…
The pitfalls of next-token prediction by Gregor Bachmann, Vaishnavh Nagarajan. First submitted to arXiv on: 11 Mar…
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification by Ekaterina Fadeeva, Aleksandr Rubashevskii,…
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy by Yu Zhu, Chuxiong Sun,…
Learning to Decode Collaboratively with Multiple Language Models by Shannon Zejiang Shen, Hunter Lang, Bailin Wang,…
On the Origins of Linear Representations in Large Language Models by Yibo Jiang, Goutham Rajendran, Pradeep…
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax by Tobias…
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve by Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree…