Summary of A Law of Next-Token Prediction in Large Language Models, by Hangfeng He et al.
A Law of Next-Token Prediction in Large Language Models, by Hangfeng He and Weijie J. Su. First submitted…
MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning, by Seungbeom Hu, ChanJun Park, Andrew…
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, by Yingyu Liang, Zhizhou Sha, Zhenmei…
BankTweak: Adversarial Attack against Multi-Object Trackers by Manipulating Feature Banks, by Woojin Shin, Donghwa Kang, Daejin…
Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers, by Sayed Mohammad Vakilzadeh…
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, by Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom…
AI-driven Transformer Model for Fault Prediction in Non-Linear Dynamic Automotive System, by Priyanka Kumar. First submitted to…
Deep Analysis of Time Series Data for Smart Grid Startup Strategies: A Transformer-LSTM-PSO Model Approach, by…
Transformers are Minimax Optimal Nonparametric In-Context Learners, by Juno Kim, Tai Nakamaki, Taiji Suzuki. First submitted to…
A Benchmark for AI-based Weather Data Assimilation, by Wuxin Wang, Weicheng Ni, Tao Han, Taikang Yuan,…