Summary of Decomposable Transformer Point Processes, by Aristeidis Panos
Decomposable Transformer Point Processes, by Aristeidis Panos. First submitted to arXiv on: 26 Sep 2024. Categories: Main: Machine Learning…