Summary of Local to Global: Learning Dynamics and Effect of Initialization for Transformers, by Ashok Vardhan Makkuva et al.
Local to Global: Learning Dynamics and Effect of Initialization for Transformers by Ashok Vardhan Makkuva, Marco…
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers by Brian K Chen, Tianyang…
Multi-layer Learnable Attention Mask for Multimodal Tasks by Wayner Barrios, SouYoung Jin. First submitted to arXiv on:…
Short-term Inland Vessel Trajectory Prediction with Encoder-Decoder Models by Kathrin Donandt, Karim Böttger, Dirk Söffker. First submitted…
Improved context-sensitive transformer model for inland vessel trajectory prediction by Kathrin Donandt, Karim Böttger, Dirk Söffker. First…
Block Transformer: Global-to-Local Language Modeling for Fast Inference by Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik…
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task by Siavash Golkar, Alberto Bietti,…
A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting by Remi Genet, Hugo Inzirillo. First submitted to arXiv…
Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques by Shwai He, Daize Dong,…
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks by Tianyu…