Summary of What Does It Mean to Be a Transformer? Insights From a Theoretical Hessian Analysis, by Weronika Ormaniec et al.
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis, by Weronika…
Liger Kernel: Efficient Triton Kernels for LLM Training, by Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan…
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths, by Yew Ken Chia, Guizhen…
COME: Test-time adaption by Conservatively Minimizing Entropy, by Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao,…
AFlow: Automating Agentic Workflow Generation, by Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen,…
Hard-Constrained Neural Networks with Universal Approximation Guarantees, by Youngjae Min, Anoopkumar Sonar, Navid Azizan. First submitted to…
Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes, by Juan Sebastian…
Principled Bayesian Optimisation in Collaboration with Human Experts, by Wenjie Xu, Masaki Adachi, Colin N. Jones,…
A Kernelizable Primal-Dual Formulation of the Multilinear Singular Value Decomposition, by Frederiek Wesel, Kim Batselier. First submitted…
Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent…