


Context-Aware Clustering using Large Language Models

by Sindhu Tipirneni, Ravinarayana Adkathimar, Nurendra Choudhary, Gaurush Hiranandani, Rana Ali Amjad, Vassilis N. Ioannidis, Changhe Yuan, Chandan K. Reddy

First submitted to arXiv on: 2 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
The proposed CACTUS approach leverages open-source Large Language Models (LLMs) for efficient and effective supervised clustering of entity subsets, particularly focusing on text-based entities. The model captures context via a scalable inter-entity attention mechanism and introduces an augmented triplet loss function tailored for supervised clustering. To improve generalization, the paper introduces a self-supervised clustering task based on text augmentation techniques. Experimental results demonstrate that CACTUS significantly outperforms existing unsupervised and supervised baselines under various external clustering evaluation metrics.
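The summary mentions an augmented triplet loss tailored for supervised clustering. The paper's exact augmentation is not given here, but the standard margin-based triplet loss it builds on can be sketched as follows; the function name, margin value, and toy embeddings are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hypothetical sketch of a margin-based triplet loss: pull the anchor
    embedding toward a positive from the same cluster, and push it at least
    `margin` farther from a negative drawn from a different cluster."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to same-cluster entity
    d_neg = np.linalg.norm(anchor - negative)  # distance to other-cluster entity
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: anchor and positive are close, negative is far.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([3.0, 0.0])
print(triplet_loss(a, p, n))  # prints 0.0: the negative is already margin-farther
```

In supervised clustering, triplets come directly from the ground-truth cluster labels, and minimizing this loss shapes the embedding space so that simple distance-based grouping recovers the labeled clusters.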
Low Difficulty Summary (original content by GrooveSquid.com)
CACTUS is a new way to group similar texts together using language models. It uses open-source models, which are cheaper and faster than powerful closed-source models. The approach captures the meaning of text by paying attention to how pieces of text relate to each other, and it learns from labeled data to improve its results. Experiments comparing it to other methods show that CACTUS is better at grouping texts together.

Keywords

» Artificial intelligence  » Attention  » Clustering  » Generalization  » Self supervised  » Supervised  » Triplet loss  » Unsupervised