Summary of Scaling Optimal LR Across Token Horizons, by Johan Bjorck et al.
Scaling Optimal LR Across Token Horizons by Johan Bjorck, Alon Benhaim, Vishrav Chaudhary, Furu Wei, Xia…
Calibrating Language Models with Adaptive Temperature Scaling by Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric…
Exploring Token Pruning in Vision State Space Models by Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu…
Review of Digital Asset Development with Graph Neural Network Unlearning by Zara Lisbon. First submitted to arxiv…
On the Optimal Memorization Capacity of Transformers by Tokio Kajitsuka, Issei Sato. First submitted to arxiv on:…
Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models by Hui-Po Wang,…
RmGPT: Rotating Machinery Generative Pretrained Model by Yilin Wang, Yifei Yu, Kong Sun, Peixuan Lei, Yuxuan…
Non-asymptotic Convergence of Training Transformers for Next-token Prediction by Ruiquan Huang, Yingbin Liang, Jing Yang. First submitted…
Counterfactual Token Generation in Large Language Models by Ivi Chatzi, Nina Corvelo Benz, Eleni Straitouri, Stratis…
Characterizing stable regions in the residual stream of LLMs by Jett Janiak, Jacek Karwowski, Chatrik Singh…