
Summary of UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification, by Siddhant Kharbanda et al.


UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification

by Siddhant Kharbanda, Devaansh Gupta, Gururaj K, Pankaj Malhotra, Amit Singh, Cho-Jui Hsieh, Rohit Babbar

First submitted to arXiv on: 4 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
The proposed UniDEC framework achieves state-of-the-art results in Extreme Multi-label Classification (XMC) while significantly reducing computational cost. Conventional methods pair dual encoders with one-vs-all classifiers, but training them is expensive, requiring up to 16 GPUs on large datasets. To address this, the authors develop a loss-independent, end-to-end trainable framework that trains the dual encoder and classifier together using a multi-class loss. Their pick-some-label (PSL) reduction cuts computational cost by 4-16x by computing the loss on only a subset of positive and negative labels. The result is state-of-the-art performance on datasets with millions of labels at a much lower resource cost.
Low GrooveSquid.com (original content) Low Difficulty Summary
Extreme Multi-label Classification is the task of predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Existing methods use dual encoders to embed the queries and label texts, and one-vs-all classifiers to rerank the shortlisted labels. However, these methods are computationally expensive, requiring many GPUs to train on large datasets. UniDEC is a new framework that trains the dual encoder and classifier together in a unified manner with a multi-class loss, reducing computational cost by 4-16x.
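
To make the idea in the summaries above more concrete, here is a minimal NumPy sketch of a multi-class loss computed over a small shortlist of labels, applied to both dual-encoder and classifier scores. It is an illustration under stated assumptions, not the paper's formulation: the function names (`psl_style_loss`, `unified_loss`), the `num_picked` parameter, the use of the in-batch labels as the shortlist, and the simple sum of the two losses are all hypothetical choices made for this example.

```python
import numpy as np


def psl_style_loss(scores, positive_mask, num_picked=2, rng=None):
    """Softmax cross-entropy averaged over a few sampled positives per query.

    scores:        (batch, shortlist) logits for the shortlisted labels
    positive_mask: (batch, shortlist) boolean, True where a label is relevant
    num_picked:    hypothetical cap on how many positives are used per query
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Numerically stable log-softmax over the shortlist only, not all labels.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    losses = []
    for row_log_probs, row_mask in zip(log_probs, positive_mask):
        positives = np.flatnonzero(row_mask)
        if positives.size == 0:
            continue  # no relevant label in the shortlist for this query
        k = min(num_picked, positives.size)
        picked = rng.choice(positives, size=k, replace=False)
        losses.append(-row_log_probs[picked].mean())
    return float(np.mean(losses)) if losses else 0.0


def unified_loss(query_emb, label_emb, clf_weights, positive_mask, rng=None):
    """Assumed combination: the same loss on encoder and classifier scores."""
    rng = np.random.default_rng(0) if rng is None else rng
    encoder_scores = query_emb @ label_emb.T       # query vs. label-text embedding
    classifier_scores = query_emb @ clf_weights.T  # query vs. per-label classifier
    return (psl_style_loss(encoder_scores, positive_mask, rng=rng)
            + psl_style_loss(classifier_scores, positive_mask, rng=rng))


if __name__ == "__main__":
    queries = np.eye(2) * 5.0           # toy query embeddings
    labels = np.eye(2)                  # toy label-text embeddings
    classifiers = np.eye(2)             # toy classifier weights
    relevant = np.eye(2, dtype=bool)    # query i is relevant to label i
    print(unified_loss(queries, labels, classifiers, relevant))
```

Because the softmax runs over the shortlist rather than the full label space, the cost of each step scales with the shortlist size instead of the number of labels, which is the intuition behind computing the loss on a subset of positives and negatives.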

Keywords

» Artificial intelligence  » Classification  » Encoder