Summary of HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification, by Vidit Jain et al.
HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification
by Vidit Jain, Mukund Rungta, Yuchen Zhuang, Yue Yu, Zeyu Wang, Mu Gao, Jeffrey Skolnick, Chao Zhang
First submitted to arXiv on: 24 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract on arXiv |
| Medium | GrooveSquid.com (original content) | Hierarchical text classification (HTC) is a challenging task because different sections of a document matter to different degrees at each level of the label hierarchy, so the model must learn dynamic document representations. Traditional methods rely on static representations that cannot capture this variability. To address this limitation, the researchers propose HiGen, a sequence-generation framework that uses language models to encode dynamic text representations. The approach incorporates a level-guided loss function and a task-specific pretraining strategy to improve performance, particularly for classes with few examples (a code sketch of the general idea follows this table). The authors also introduce ENZYME, a dataset designed specifically for HTC, consisting of PubMed articles with the goal of predicting Enzyme Commission numbers. Experiments on ENZYME as well as the WOS and NYT datasets show that HiGen outperforms existing methods while handling data imbalance efficiently. |
| Low | GrooveSquid.com (original content) | HiGen is a new approach to hierarchical text classification that uses language models to create dynamic representations of documents. This helps the model learn how relevant different parts of a document are at each level of the category hierarchy. The authors also built a special dataset called ENZYME, designed specifically for this task, which contains PubMed articles about enzymes. With HiGen, researchers can more accurately classify text into hierarchical categories based on its content. |
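To make the medium summary more concrete, here is a minimal sketch of the general idea it describes: framing hierarchical classification as label-sequence generation with a pretrained encoder-decoder language model. This is not the authors' released implementation; the model name (t5-small), the " > " path separator, the example label path, and the position-based level weighting are all illustrative assumptions, and HiGen's actual level-guided loss and task-specific pretraining differ in detail.

```python
# Minimal sketch (not the authors' code): hierarchical text classification
# framed as label-sequence generation with a pretrained encoder-decoder.
# The model name, the " > " separator, the example labels, and the
# position-based level weights below are illustrative assumptions.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# A document paired with its gold label path, flattened root-to-leaf.
document = "Crystal structure and catalytic mechanism of a bacterial hydrolase."
label_path = ["Enzymes", "Hydrolases", "Glycosylases"]  # hypothetical path
target_text = " > ".join(label_path)

inputs = tokenizer(document, return_tensors="pt", truncation=True)
labels = tokenizer(target_text, return_tensors="pt").input_ids

# Forward pass: the decoder learns to emit the label path token by token,
# conditioned on the encoded document. outputs.loss is the standard
# token-level cross-entropy over the generated sequence.
outputs = model(**inputs, labels=labels)

# One way to approximate a "level-guided" objective (an assumption, not
# HiGen's exact loss): re-weight per-token losses so tokens later in the
# path (deeper, typically sparser levels) count more. Token position is
# used here as a crude stand-in for hierarchy level.
per_token = torch.nn.functional.cross_entropy(
    outputs.logits.squeeze(0), labels.squeeze(0), reduction="none"
)
level_weight = torch.linspace(1.0, 2.0, steps=per_token.size(0))
level_guided_loss = (level_weight * per_token).mean()
level_guided_loss.backward()  # would feed an optimizer step during training

# Inference: generate the label path and split it back into levels.
model.eval()
with torch.no_grad():
    pred_ids = model.generate(**inputs, max_new_tokens=32)
predicted_path = tokenizer.decode(pred_ids[0], skip_special_tokens=True).split(" > ")
print(predicted_path)
```

Because a single decoder generates every level of the path, parameters are shared across the hierarchy, which is one reason a generation-based formulation can help classes with few training examples; the paper's level-guided loss and task-specific pretraining refine this basic recipe.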
Keywords
* Artificial intelligence
* Loss function
* Pretraining
* Text classification