Summary of HDT: Hierarchical Document Transformer, by Haoyu He et al.
HDT: Hierarchical Document Transformer
by Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger
First submitted to arXiv on: 11 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract |
Medium | GrooveSquid.com (original content) | The Hierarchical Document Transformer (HDT) is a novel architecture designed for structured hierarchical documents. The paper proposes a sparse Transformer that exploits document structure to improve computational efficiency and memory utilization. The approach introduces auxiliary anchor tokens and redesigns the attention mechanism so that information is exchanged between different levels of the hierarchy. To make this sample-dependent hierarchical attention pattern practical, the authors develop a novel sparse attention kernel. Experiments show that using structural information leads to faster convergence, higher sample efficiency, and better performance on downstream tasks. |
Low | GrooveSquid.com (original content) | This paper is about helping computers understand documents better. Documents matter in fields like science, law, and medicine, but computers currently struggle to use the structure of these documents when learning from them. The Hierarchical Document Transformer (HDT) is a new way for computers to process documents: it adds special tokens and rethinks how the model pays attention to different parts of a document. This makes the computation more efficient and the predictions better. The results show that this approach makes models faster and more effective at learning from documents. |
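To make the idea of hierarchical attention with anchor tokens more concrete, here is a minimal sketch of how such an attention mask could be built for a two-level hierarchy (sentences inside a document). This is an illustrative reconstruction, not the authors' implementation: the function name, the one-anchor-per-sentence layout, and the exact masking rules are assumptions made for the example.

```python
import numpy as np

def hierarchical_attention_mask(sentence_lengths):
    """Boolean attention mask for a two-level document hierarchy.

    Assumed layout (illustrative, not HDT's actual kernel): each
    sentence is prefixed with one auxiliary anchor token. Regular
    tokens attend only within their own sentence (including its
    anchor); anchor tokens additionally attend to all other anchors,
    so information flows between sentences only through the anchors.
    """
    # Total tokens = content tokens + one anchor per sentence.
    n = sum(sentence_lengths) + len(sentence_lengths)
    mask = np.zeros((n, n), dtype=bool)

    anchors = []
    pos = 0
    for length in sentence_lengths:
        anchors.append(pos)                   # anchor sits first
        span = slice(pos, pos + length + 1)   # anchor + its tokens
        mask[span, span] = True               # dense within-sentence block
        pos += length + 1

    # Anchors exchange information across sentences.
    a = np.array(anchors)
    mask[np.ix_(a, a)] = True
    return mask

# Two sentences with 3 and 2 tokens -> 7 positions incl. anchors.
mask = hierarchical_attention_mask([3, 2])
```

Because most query/key pairs are masked out, the resulting pattern is block-sparse: attention cost grows with the sizes of the within-sentence blocks plus the (small) anchor-to-anchor block, rather than with the square of the full document length, which is the efficiency the summary refers to.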
Keywords
- Artificial intelligence
- Attention
- Transformer