Summary of Scaling Law For Language Models Training Considering Batch Size, by Xian Shuai et al.
Scaling Law for Language Models Training Considering Batch Size, by Xian Shuai, Yiding Wang, Yimeng Wu,…
Differential learning kinetics govern the transition from memorization to generalization during in-context learning, by Alex Nguyen,…
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models, by Yanxi Chen,…
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens, by Xu…
Towards Precise Scaling Laws for Video Diffusion Transformers, by Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke…
Scaling Laws for Black box Adversarial Attacks, by Chuan Liu, Huanran Chen, Yichi Zhang, Yinpeng Dong,…
Loss-to-Loss Prediction: Scaling Laws for All Datasets, by David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach,…
Ultra-Sparse Memory Network, by Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Zeng, Ran Guo,…
Circuit Complexity Bounds for RoPE-based Transformer Architecture, by Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long,…
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional…