Summary of Improving Transformers with Dynamically Composable Multi-Head Attention, by Da Xiao et al.
Improving Transformers with Dynamically Composable Multi-Head Attention, by Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan. First…