Summary of HDT: Hierarchical Document Transformer, by Haoyu He et al.


HDT: Hierarchical Document Transformer

by Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger

First submitted to arXiv on: 11 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, which can be read on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
The Hierarchical Document Transformer (HDT) is a sparse Transformer architecture designed for structured hierarchical documents. It exploits document structure to improve computational efficiency and memory utilization. The approach introduces auxiliary anchor tokens and redesigns the attention mechanism so that information is exchanged between the different levels of the hierarchy. Because the resulting hierarchical attention pattern is sample-dependent, the authors also develop a novel sparse attention kernel to implement it efficiently. Experimental results show that exploiting structural information leads to faster convergence, higher sample efficiency, and better performance on downstream tasks. A rough code sketch of this anchor-and-masking idea follows the summaries below.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about making computers understand documents better. Documents are important in fields like science, law, or medicine. Right now, computers have trouble using the structure of these documents to help them learn. The Hierarchical Document Transformer (HDT) is a new way for computers to process documents by using special tokens and rethinking how they pay attention to different parts of the document. This helps computers work more efficiently and make better decisions. The results show that this approach makes computers smarter and faster at learning from documents.
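
Illustrative code sketch

To make the medium difficulty summary more concrete, here is a minimal sketch in plain Python/NumPy of how auxiliary anchor tokens and a hierarchy-aware attention mask could be wired together. The token layout ([DOC], [SEC], [SENT] anchors), the helper name hierarchical_mask, and the exact sparsity rules are illustrative assumptions, not the authors' implementation; HDT itself relies on a custom sparse attention kernel rather than a dense boolean mask.

    # Illustrative only: a dense boolean mask standing in for a hierarchical
    # sparse attention pattern with auxiliary anchor tokens. Not the HDT kernel.
    import numpy as np

    def hierarchical_mask(doc):
        """doc: list of sections, each section a list of sentence lengths (word counts).
        Returns (labels, mask) where mask[i, j] = True means token i may attend to token j."""
        labels, parent = [], []                      # token labels and index of each token's anchor
        labels.append("[DOC]"); parent.append(None)  # document-level anchor token
        for sec in doc:
            sec_idx = len(labels)
            labels.append("[SEC]"); parent.append(0)             # section anchor, attached to [DOC]
            for sent_len in sec:
                sent_idx = len(labels)
                labels.append("[SENT]"); parent.append(sec_idx)  # sentence anchor, attached to [SEC]
                for _ in range(sent_len):
                    labels.append("w"); parent.append(sent_idx)  # word token, attached to [SENT]

        n = len(labels)
        mask = np.eye(n, dtype=bool)                 # every token attends to itself
        for i in range(n):
            for j in range(n):
                if parent[i] == j or parent[j] == i:             # child <-> its own anchor
                    mask[i, j] = True
                elif parent[i] is not None and parent[i] == parent[j]:
                    mask[i, j] = True                            # siblings sharing the same anchor
        return labels, mask

    # Example: a document with two sections, whose sentences have 3, 2 and 4 words
    labels, mask = hierarchical_mask([[3, 2], [4]])
    print(len(labels), mask.mean())  # sequence length and fraction of allowed attention pairs

Under this toy pattern, words attend only within their own sentence and to their sentence anchor, while anchors pass information up and down the hierarchy; the number of allowed attention pairs therefore grows roughly linearly with document length rather than quadratically, which is the kind of saving the summary above refers to.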

Keywords

* Artificial intelligence
* Attention
* Transformer