Summary of HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification, by Vidit Jain et al.

HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification

by Vidit Jain, Mukund Rungta, Yuchen Zhuang, Yue Yu, Zeyu Wang, Mu Gao, Jeffrey Skolnick, Chao Zhang

First submitted to arXiv on: 24 Jan 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	Read the original abstract here
Medium	GrooveSquid.com (original content)	Hierarchical text classification (HTC) is a complex task in which the relevance of different document sections can vary with the hierarchy level, so the model needs dynamic rather than static document representations. Traditional methods focus on static representations, which may not account for this variability. To address this limitation, the authors propose HiGen, a framework that uses language models to encode dynamic text representations. The approach incorporates a level-guided loss function and a task-specific pretraining strategy to enhance performance, particularly for classes with limited examples. The authors also present the ENZYME dataset, designed specifically for HTC, which consists of PubMed articles labeled for the task of predicting Enzyme Commission numbers. Experiments on this dataset, as well as on the WOS and NYT datasets, show that HiGen outperforms existing methods while efficiently handling data imbalance.
Low	GrooveSquid.com (original content)	HiGen is a new approach to hierarchical text classification that uses language models to create dynamic representations of documents. This helps the model learn which parts of a document matter at each level of the label hierarchy. The authors also developed a dataset called ENZYME, which is designed specifically for this task and contains articles from PubMed about enzymes. By using HiGen, researchers can better classify text into hierarchical categories based on its content.

Keywords

* Artificial intelligence * Loss function * Pretraining * Text classification
