Summary of Are Self-Attentions Effective for Time Series Forecasting?, by Dongbin Kim et al.
Are Self-Attentions Effective for Time Series Forecasting?
by Dongbin Kim, Jinseong Park, Jaewook Lee, Hoki Kim