Summary of HDT: Hierarchical Document Transformer, by Haoyu He et al.
HDT: Hierarchical Document Transformer by Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger. First submitted…
Transformer Block Coupling and its Correlation with Generalization in LLMs by Murdock Aubry, Haoming Meng, Anton…
Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers by Cody Wild, Jesper Anderson. First submitted to arXiv…
Toto: Time Series Optimized Transformer for Observability by Ben Cohen, Emaad Khwaja, Kan Wang, Charles Masson,…
Teaching Transformers Causal Reasoning through Axiomatic Training by Aniket Vashishtha, Abhinav Kumar, Abbavaram Gowtham Reddy, Vineeth…
Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic by Ruochen Jin, Bojian Hou, Jiancong…
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation by Liqun Ma, Mingjie Sun,…
A Predictive Model Based on Transformer with Statistical Feature Embedding in Manufacturing Sensor Dataset by Gyeong…
SOLO: A Single Transformer for Scalable Vision-Language Modeling by Yangyi Chen, Xingyao Wang, Hao Peng, Heng…
Self-supervised Pretraining for Partial Differential Equations by Varun Madhavan, Amal S Sebastian, Bharath Ramsundar, Venkatasubramanian Viswanathan. First…