Scaling laws – Page 2 – GrooveSquid.com

July 13, 2025

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies by Chaofan Tao, Qian Liu, Longxu Dou,…

July 13, 2025

Weighted Grouped Query Attention in Transformers by Sai Sena Chinnakonduru, Astarag Mohapatra. First submitted to arXiv on:…

July 13, 2025

Merlin: A Vision Language Foundation Model for 3D Computed Tomography by Louis Blankemeier, Joseph Paul Cohen,…

July 13, 2025

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction by Keyu Tian, Yi Jiang, Zehuan Yuan,…

July 13, 2025

Algorithmic progress in language models by Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman,…

July 13, 2025

Large Language Models: A Survey by Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher,…

July 13, 2025

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning by Moritz Reuss, Jyothish…

July 13, 2025

AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws by Oren…

July 13, 2025

Neural Scaling Laws Rooted in the Data Distribution by Ari Brill. First submitted to arXiv on: 10…

July 13, 2025

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families by Felipe Maia Polo,…