Summary of HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification, by Vidit Jain et al.


HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification

by Vidit Jain, Mukund Rungta, Yuchen Zhuang, Yue Yu, Zeyu Wang, Mu Gao, Jeffrey Skolnick, Chao Zhang

First submitted to arXiv on: 24 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Hierarchical text classification (HTC) assigns documents to labels organized in a taxonomy. Doing it well requires dynamic document representations, because the relevance of different sections of a document varies with the hierarchy level. Existing methods mostly learn a single static representation, which cannot account for this variability. To address this limitation, the authors propose HiGen, a sequence-generation framework that uses language models to encode dynamic text representations. The approach incorporates a level-guided loss function and a task-specific pretraining strategy to improve performance, particularly for classes with few examples. The authors also present ENZYME, a dataset designed for HTC that consists of PubMed articles, where the task is to predict Enzyme Commission (EC) numbers. Experiments on ENZYME and on the WOS and NYT datasets show that HiGen outperforms existing methods while handling data imbalance efficiently.

Low Difficulty Summary (written by GrooveSquid.com, original content)
HiGen is a new approach to hierarchical text classification that uses language models to create dynamic representations of documents. This helps the model learn which parts of a document matter at each level of the label hierarchy. The authors also built a dataset called ENZYME, designed specifically for this task, which contains PubMed articles about enzymes. With HiGen, text can be classified more accurately into hierarchical categories based on its content.
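
To make the idea of a label hierarchy concrete, here is a minimal Python sketch based on Enzyme Commission numbers, which have four dot-separated fields, each refining the one before it. It expands one EC number into its chain of ancestor labels and joins that chain into a single string, the kind of target a text-generation model could be trained to produce. The function names `ec_label_path` and `linearize` and the " > " separator are assumptions made for illustration; the summaries above do not describe the exact target format HiGen uses.

```python
def ec_label_path(ec_number: str) -> list[str]:
    """Expand an EC number into its chain of labels, coarsest level first."""
    parts = ec_number.split(".")
    # A fully specified EC number has exactly four non-empty fields.
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    # Level k of the hierarchy is the first k fields joined back together.
    return [".".join(parts[: depth + 1]) for depth in range(len(parts))]


def linearize(path: list[str], sep: str = " > ") -> str:
    """Join a label path into one target string (separator is hypothetical)."""
    return sep.join(path)


path = ec_label_path("3.4.21.4")
print(path)             # ['3', '3.4', '3.4.21', '3.4.21.4']
print(linearize(path))  # 3 > 3.4 > 3.4.21 > 3.4.21.4
```

Generating labels in this coarse-to-fine order means each predicted level can condition on the levels already produced, which is one common motivation for treating hierarchical classification as sequence generation.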

Keywords

  • Artificial intelligence
  • Loss function
  • Pretraining
  • Text classification