Summary of Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training, by Cheng Luo et al.
by Cheng Luo, Jiawei Zhao, Zhuoming Chen, et al.