Summary of Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?, by Nicy Scaria et al.


Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?

by Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani

First submitted to arXiv on: 1 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This study explores the capabilities of Small Language Models (SLMs) in learning, retaining, and eliminating different types of noise in data. Four pre-trained SLMs with 1 to 3 billion parameters were used: Olmo 1B, Qwen1.5 1.8B, Gemma 2B, and Phi2 2.7B. The models were instruction-tuned on noise-free data and then tested with in-context examples to evaluate whether they pick up noisy patterns. Results show that Olmo is highly sensitive to noise, quickly adapting to noisy patterns, while Phi2 resists learning character-level and transliteration noise due to its high-quality pretraining data. Gemma excels with transliteration noise, likely benefiting from its multilingual pretraining. The findings can be used to develop robust training strategies for SLMs.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This study looks at how Small Language Models (SLMs) deal with noisy data. Researchers tested four different SLMs on different types of noise and found that each model handled noise differently. One model, Olmo, was very sensitive to noise and picked up noisy patterns quickly. Another model, Phi2, resisted certain kinds of noise because its training data was high quality. The study’s results can help developers create better training strategies for SLMs.

Keywords

* Artificial intelligence
* Pretraining