Summary of TnT-LLM: Text Mining at Scale with Large Language Models, by Mengting Wan et al.
TnT-LLM: Text Mining at Scale with Large Language Models
by Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan
First submitted to arXiv on 18 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (read it on arXiv). |
| Medium | GrooveSquid.com (original content) | A novel two-phase framework, TnT-LLM, is proposed to automate end-to-end label generation and assignment with minimal human effort. In the first phase, the approach uses Large Language Models (LLMs) in a zero-shot, multi-stage reasoning process to iteratively produce and refine a label taxonomy. In the second phase, LLMs act as data labelers, yielding training samples for lightweight supervised classifiers. The framework is applied to analyzing user intent and conversational domain for Bing Copilot, an open-domain chat-based search engine. Experiments show that TnT-LLM generates more accurate and relevant label taxonomies than state-of-the-art baselines, while striking a favorable balance between accuracy and efficiency for classification at scale. |
| Low | GrooveSquid.com (original content) | Transforming unstructured text into organized, meaningful forms is essential in text mining. Most existing methods rely on domain expertise and manual curation, making the process expensive and time-consuming. A new approach uses Large Language Models to automate label generation and assignment with minimal human effort. The framework has two phases: first, LLMs iteratively produce and refine a label taxonomy; then LLMs are used as data labelers to train lightweight supervised classifiers. This method is applied to analyze user intent and conversational domain for Bing Copilot. The results show that this approach generates more accurate and relevant labels than existing methods. |
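The two-phase pipeline described above can be sketched in a few lines of Python. Everything here is illustrative: the taxonomy, the `mock_llm_label` stand-in for an LLM pseudo-labeler, and the bag-of-words centroid classifier (a placeholder for the lightweight supervised classifiers the paper trains) are assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of TnT-LLM's two phases: (1) an LLM-induced label
# taxonomy, (2) LLM pseudo-labels used to train a lightweight classifier.
from collections import Counter, defaultdict

# Phase 1 output (illustrative): a taxonomy the LLM would have refined.
TAXONOMY = ["information lookup", "coding help"]

def mock_llm_label(text: str) -> str:
    """Stand-in for an LLM pseudo-labeler mapping text to a taxonomy label."""
    t = text.lower()
    return "coding help" if ("python" in t or "error" in t) else "information lookup"

def train_centroid_classifier(texts):
    """Phase 2: train a cheap classifier on LLM pseudo-labels.

    Here: per-label bag-of-words centroids, a toy stand-in for, e.g.,
    logistic regression over embeddings."""
    centroids = defaultdict(Counter)
    for t in texts:
        centroids[mock_llm_label(t)].update(t.lower().split())
    return dict(centroids)

def predict(centroids, text):
    """Assign the label whose word centroid overlaps the text most."""
    words = Counter(text.lower().split())
    return max(centroids,
               key=lambda lbl: sum(words[w] * centroids[lbl][w] for w in words))

corpus = [
    "how do I fix this python import error",
    "what is the capital of France",
    "python list comprehension error help",
    "weather in Seattle today",
]
model = train_centroid_classifier(corpus)
print(predict(model, "python error in my script"))  # -> coding help
```

At scale, the point of phase 2 is that the trained lightweight classifier serves traffic cheaply, while the expensive LLM is only called offline to build the taxonomy and label the training sample.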
Keywords
» Artificial intelligence » Classification » Supervised » Zero shot