Summary of Adaptive Large Language Models by Layerwise Attention Shortcuts, By Prateek Verma et al.
Adaptive Large Language Models By Layerwise Attention Shortcuts, by Prateek Verma, Mert Pilanci. First submitted to arXiv…