Summary of TnT-LLM: Text Mining at Scale with Large Language Models, by Mengting Wan et al.
TnT-LLM: Text Mining at Scale with Large Language Models
by Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan
First submitted to arXiv on 18 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (read it on arXiv). |
| Medium | GrooveSquid.com (original content) | A novel two-phase framework, TnT-LLM, is proposed to automate end-to-end label generation and assignment with minimal human effort. In the first phase, the approach uses Large Language Models (LLMs) in a zero-shot, multi-stage reasoning process to iteratively produce and refine a label taxonomy. In the second phase, LLMs act as data labelers, yielding training samples for lightweight supervised classifiers. The framework is applied to analyzing user intent and conversational domain for Bing Copilot, an open-domain chat-based search engine. Experiments show that TnT-LLM generates more accurate and relevant label taxonomies than state-of-the-art baselines, while striking a favorable balance between accuracy and efficiency for classification at scale. |
| Low | GrooveSquid.com (original content) | Transforming unstructured text into organized, meaningful forms is essential in text mining. Most existing methods rely on domain expertise and manual curation, making the process expensive and time-consuming. A new approach uses Large Language Models to automate label generation and assignment with minimal human effort. The framework has two phases: first, LLMs iteratively produce and refine a label taxonomy; then LLMs are used as data labelers to train lightweight supervised classifiers. This method is applied to analyze user intent and conversational domain for Bing Copilot. The results show that this approach generates more accurate and relevant labels than existing methods. |
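The two-phase pipeline described above can be sketched in a few lines of Python. Everything here is illustrative: the taxonomy, the `mock_llm_label` stand-in for an LLM pseudo-labeler, and the bag-of-words centroid classifier (a placeholder for the lightweight supervised classifiers the paper trains) are assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of TnT-LLM's two phases: (1) an LLM-induced label
# taxonomy, (2) LLM pseudo-labels used to train a lightweight classifier.
from collections import Counter, defaultdict

# Phase 1 output (illustrative): a taxonomy the LLM would have refined.
TAXONOMY = ["information lookup", "coding help"]

def mock_llm_label(text: str) -> str:
    """Stand-in for an LLM pseudo-labeler mapping text to a taxonomy label."""
    t = text.lower()
    return "coding help" if ("python" in t or "error" in t) else "information lookup"

def train_centroid_classifier(texts):
    """Phase 2: train a cheap classifier on LLM pseudo-labels.

    Here: per-label bag-of-words centroids, a toy stand-in for, e.g.,
    logistic regression over embeddings."""
    centroids = defaultdict(Counter)
    for t in texts:
        centroids[mock_llm_label(t)].update(t.lower().split())
    return dict(centroids)

def predict(centroids, text):
    """Assign the label whose word centroid overlaps the text most."""
    words = Counter(text.lower().split())
    return max(centroids,
               key=lambda lbl: sum(words[w] * centroids[lbl][w] for w in words))

corpus = [
    "how do I fix this python import error",
    "what is the capital of France",
    "python list comprehension error help",
    "weather in Seattle today",
]
model = train_centroid_classifier(corpus)
print(predict(model, "python error in my script"))  # -> coding help
```

At scale, the point of phase 2 is that the trained lightweight classifier serves traffic cheaply, while the expensive LLM is only called offline to build the taxonomy and label the training sample.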
Keywords
» Artificial intelligence » Classification » Supervised » Zero shot