Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

by Branislav Pecher, Ivan Srba, Maria Bielikova

First submitted to arXiv on: 20 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
The paper’s original abstract is available on arXiv.

Medium Difficulty Summary (GrooveSquid.com original content)
This research investigates how many labelled examples specialized small language models need before they outperform general large language models on NLP tasks with limited data. By analyzing the behavior of fine-tuning, instruction-tuning, prompting, and in-context learning on 7 language models across 8 text classification tasks, the study identifies performance break-even points while also considering performance variance. The results show that specialized models often require only a small number of samples (on average between 10 and 1000) to match or surpass general ones. However, the number of required labels depends strongly on dataset and task characteristics: multi-class datasets require fewer labels (up to 100) than binary datasets (up to 5000). When performance variance is taken into account, the number of required labels increases by 100-200% on average, and in some cases by up to 1500%. This research provides valuable insights for developers and researchers working in NLP; a rough sketch of the break-even computation follows the summaries below.

Low Difficulty Summary (GrooveSquid.com original content)
This study helps us understand how to use small language models to solve big problems when we don’t have much labeled data. The researchers looked at how seven different language models performed on eight text classification tasks and found that the small models often need only a limited number of examples (around 10 to 1000) to do just as well as bigger language models. But the number of examples needed depends on the kind of task being solved, with some tasks needing far fewer examples than others. This study helps us figure out how many labeled examples we really need for our language models to work well.
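
The break-even analysis can be pictured with a short sketch. The code below is not from the authors; it is a minimal illustration, with a hypothetical break_even_point helper and made-up accuracy numbers, of how one might find the smallest number of labelled samples at which a specialised small model’s score reaches a general LLM baseline, optionally subtracting one standard deviation to reflect the performance variance the summaries mention.

```python
# A minimal sketch, NOT the authors' code: break_even_point and all
# numbers below are hypothetical and only illustrate the idea of a
# performance break-even point described in the summaries above.
import numpy as np

def break_even_point(sample_sizes, small_model_runs, llm_baseline,
                     consider_variance=False):
    """Return the smallest labelled-sample count at which the
    specialised small model reaches the general LLM baseline,
    or None if it never does.

    sample_sizes      -- increasing labelled-sample counts
    small_model_runs  -- scores of shape (len(sample_sizes), n_seeds),
                         one column per training run / random seed
    llm_baseline      -- single score of the general LLM (e.g. obtained
                         via prompting or in-context learning)
    consider_variance -- if True, require the mean minus one standard
                         deviation to clear the baseline
    """
    runs = np.asarray(small_model_runs, dtype=float)
    means = runs.mean(axis=1)
    stds = runs.std(axis=1)
    threshold = means - stds if consider_variance else means
    for n, score in zip(sample_sizes, threshold):
        if score >= llm_baseline:
            return n
    return None

# Illustrative usage with made-up accuracies over 5 random seeds.
sizes = [10, 50, 100, 500, 1000]
rng = np.random.default_rng(0)
scores = np.clip(
    np.linspace(0.55, 0.85, len(sizes))[:, None]
    + rng.normal(0.0, 0.02, size=(len(sizes), 5)),
    0.0, 1.0,
)
print(break_even_point(sizes, scores, llm_baseline=0.75))
print(break_even_point(sizes, scores, llm_baseline=0.75,
                       consider_variance=True))
```

With consider_variance=True the threshold is stricter, so the break-even point can only stay the same or move to a larger sample size, which mirrors the paper’s finding that accounting for variance raises the number of required labels.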

Keywords

* Artificial intelligence  * Fine-tuning  * Instruction-tuning  * NLP  * Prompting  * Text classification