Summary of Mixture of Experts Meets Prompt-Based Continual Learning, by Minh Le et al.
Mixture of Experts Meets Prompt-Based Continual Learning, by Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen,…
Attention as an RNN, by Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio,…
Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning, by Prashant Bhat, Bharath Renjith, Elahe…
DCT-Based Decorrelated Attention for Vision Transformers, by Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet…
A Transformer variant for multi-step forecasting of water level and hydrometeorological sensitivity analysis based on…
Generalized Laplace Approximation, by Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim. First submitted to…
FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting, by Ruiqi Li, Maowei Jiang, Kai…
Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and…
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum, by Hadi Pouransari, Chun-Liang Li, Jen-Hao…
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention, by William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar…