Summary of Merging Multi-Task Models via Weight-Ensembling Mixture of Experts, by Anke Tang et al.
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts by Anke Tang, Li Shen, Yong Luo, Nan…
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling by Mingze Wang, Weinan E. First…
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary by Takashi Morita. First submitted to arxiv…
Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers by Shan Zhao, Zhitong Xiong, Xiao Xiang Zhu. First submitted…
Graph Transformers without Positional Encodings by Ayush Garg. First submitted to arxiv on: 31 Jan 2024. Categories. Main: Machine…
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering by Xiaopeng Li,…
Scavenging Hyena: Distilling Transformers into Long Convolution Models by Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Mohammad Sami…
Retrieval Augmented Deep Anomaly Detection for Tabular Data by Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên…
Engineering A Large Language Model From Scratch by Abiodun Finbarrs Oketunji. First submitted to arxiv on: 30…
Validation, Robustness, and Accuracy of Perturbation-Based Sensitivity Analysis Methods for Time-Series Deep Learning Models by Zhengguang…