Summary of On the Necessity of World Knowledge for Mitigating Missing Labels in Extreme Classification, by Jatin Prakash et al.
On the Necessity of World Knowledge for Mitigating Missing Labels in Extreme Classification
by Jatin Prakash, Anirudh Buvanesh, Bishal Santra, Deepak Saini, Sachin Yadav, Jian Jiao, Yashoteja Prabhu, Amit Sharma, Manik Varma
First submitted to arXiv on: 18 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Extreme Classification (XC) aims to map queries to the most relevant documents from a very large document set. Typically, XC algorithms learn this mapping from datasets curated from implicit feedback such as user clicks, and these datasets inevitably suffer from missing labels. This paper formally shows that systematic missing labels lead to missing knowledge, which is critical for accurately modeling relevance between queries and documents. To address this issue, the authors propose SKIM (Scalable Knowledge Infusion for Missing Labels), an algorithm that combines small language models (SLMs) with abundant unstructured metadata to mitigate the missing-label problem. On large-scale public datasets, SKIM outperforms existing methods under exhaustive unbiased evaluation, including human annotations and simulations inspired by industrial settings. SKIM also scales to proprietary query-ad retrieval datasets containing 10 million documents, outperforming contemporary methods by 12% in offline evaluation and increasing ad click-yield by 1.23% in an online A/B test conducted on a popular search engine. The authors release their code, prompts, trained XC models, and fine-tuned SLMs at: https://github.com/bicycleman15/skim |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Extreme Classification (XC) is a way to find the most relevant documents in a huge library. Usually, algorithms learn how to do this by looking at what people click on. But click data never records every relevant document, so these datasets are missing some of the labels they need. This paper shows that when labels are missing in a systematic way, important information about how well a query and a document match is lost too. To fix this, the authors propose a new method called SKIM (Scalable Knowledge Infusion for Missing Labels) that uses small language models and extra data to help fill in the gaps. This method works well on big datasets and beats other methods when tested with real-world data. |
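
To make the missing-label problem in the summaries above concrete, here is a minimal toy sketch in Python. It is not the SKIM algorithm and is not taken from the paper: the queries, document names, and the `shown` logging policy are all hypothetical. It only illustrates how labels derived from clicks can be missing systematically, because a document that was never shown can never be clicked.

```python
# Toy illustration (hypothetical data, not from the paper):
# click-derived labels exist only for documents that were actually shown.

# Assumed ground truth: every document that is truly relevant to each query.
true_relevant = {
    "running shoes": {"doc_nike", "doc_adidas", "doc_asics"},
    "laptop bag": {"doc_bag_a", "doc_bag_b"},
}

# Assumed logging policy: the only documents ever displayed for each query.
shown = {
    "running shoes": {"doc_nike", "doc_adidas"},
    "laptop bag": {"doc_bag_a"},
}


def observed_labels(true_relevant, shown):
    """Keep only relevant documents that were shown, since only those can be clicked."""
    return {q: docs & shown.get(q, set()) for q, docs in true_relevant.items()}


def missing_fraction(true_relevant, observed):
    """Fraction of true (query, document) labels absent from the observed dataset."""
    total = sum(len(docs) for docs in true_relevant.values())
    seen = sum(len(observed[q]) for q in true_relevant)
    return (total - seen) / total


observed = observed_labels(true_relevant, shown)
missing = {q: true_relevant[q] - observed[q] for q in true_relevant}

print(missing)
print(missing_fraction(true_relevant, observed))
```

In this toy dataset, `doc_asics` and `doc_bag_b` never appear as labels for any query, so a method that relies only on the observed training labels has no direct signal about them. This is the kind of gap that the summaries describe SKIM as filling with small language models and extra metadata.
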
Keywords
» Artificial intelligence » Classification