Summary of Mechanistic Design and Scaling of Hybrid Architectures, by Michael Poli et al.
Mechanistic Design and Scaling of Hybrid Architectures, by Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash…
MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation, by Weiguo Gao. First submitted to arXiv…
Leave No Patient Behind: Enhancing Medication Recommendation for Rare Disease Patients, by Zihao Zhao, Yi Jing,…
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching, by Youpeng Zhao, Di Wu, Jun…
Residual-based Language Models are Free Boosters for Biomedical Imaging, by Zhixin Lai, Jing Wu, Suiyao Chen,…
Transcribing Bengali Text with Regional Dialects to IPA using District Guided Tokens, by S M Jishanul…
Less Is More – On the Importance of Sparsification for Transformers and Graph Neural Networks…
LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting, by Qinyao Luo, Silu…
A Transformer approach for Electricity Price Forecasting, by Oscar Llorente, Jose Portela. First submitted to arXiv on:…
VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections, by Dongqi Fu, Zhigang Hua, Yan Xie, Jin…