Summary of "How to Train Long-Context Language Models (Effectively)", by Tianyu Gao et al.
How to Train Long-Context Language Models (Effectively), by Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen. First…
Dynamic Gradient Alignment for Online Data Mixing, by Simin Fan, David Grangier, Pierre Ablin. First submitted to…
Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA, by Eduard Tulchinskii, Laida Kushnareva, …
MenakBERT – Hebrew Diacriticizer, by Ido Cohen, Jacob Gidron, Idan Pinto. First submitted to arxiv on: 3…
Mitigating Memorization In Language Models, by Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, …
LLM-Augmented Symbolic Reinforcement Learning with Landmark-Based Task Decomposition, by Alireza Kheirandish, Duo Xu, Faramarz Fekri. First submitted…
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective, by Zeyu Gan, …
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models, by Philipp Mondorf, Sondre Wold, Barbara Plank. First…
In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks, by Dingzirui Wang, Xuanliang Zhang, Qiguang Chen, …
Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training, by Qingyang Li, …