Summary of Hymba: A Hybrid-head Architecture for Small Language Models, by Xin Dong et al.
Hymba: A Hybrid-head Architecture for Small Language Models, by Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin…