SEVEN: Pruning Transformer Model by Reserving Sentinels

by Jinying Xiao, Ping Li, Jie Nie, Zhe Tang

First submitted to arxiv on: 19 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a novel approach to pruning large-scale Transformer models (TMs), which show excellent performance across various tasks but whose large parameter counts hinder deployment on mobile devices. The authors argue that existing pruning methods tend to retain weights with large gradient noise, because gradients on TMs are more dynamic and intricate than on Convolutional Neural Networks. They introduce Symbolic Descent (SD), a general approach for training and fine-tuning TMs, and use the cumulative process of SD to describe the noisy batch gradient sequences on TMs. Building on this, they develop SEVEN, a method that dynamically assesses the importance scores of weights and favors those with consistently high sensitivity, i.e., small gradient noise. By preserving these weights, SEVEN achieves improved pruning results. Extensive experiments on various TMs in natural language, question-answering, and image classification tasks validate the effectiveness of SEVEN.
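
To make the scoring idea concrete, below is a minimal PyTorch sketch of gradient-noise-aware pruning in the spirit of SEVEN: it accumulates running gradient statistics over a few batches, scores each weight by its sensitivity discounted by its gradient noise, and prunes the lowest-scoring weights. The scoring rule, function names, and the stand-in loss here are illustrative assumptions, not the paper's actual Symbolic Descent formulation.

```python
import torch

def gradient_noise_scores(param, grad_mean, grad_sqmean, eps=1e-8):
    """Hypothetical score: sensitivity |w * E[g]| discounted by
    the gradient noise Std[g], so weights with consistently high
    sensitivity (small noise) rank highest."""
    grad_var = (grad_sqmean - grad_mean ** 2).clamp(min=0.0)
    sensitivity = (param * grad_mean).abs()
    return sensitivity / (grad_var.sqrt() + eps)

def prune_by_scores(param, scores, sparsity=0.5):
    """Zero out the lowest-scoring fraction of weights (unstructured pruning)."""
    k = int(param.numel() * sparsity)
    if k == 0:
        return torch.ones_like(param, dtype=torch.bool)
    threshold = scores.flatten().kthvalue(k).values
    mask = scores > threshold
    param.data.mul_(mask)
    return mask

# Usage: accumulate running gradient statistics over a few batches, then prune.
w = torch.randn(256, 256, requires_grad=True)
g_mean = torch.zeros_like(w)   # running mean of gradients
g_sq = torch.zeros_like(w)     # running mean of squared gradients
for step in range(1, 11):
    loss = (w @ torch.randn(256, 64)).pow(2).mean()  # stand-in loss
    loss.backward()
    g_mean += (w.grad - g_mean) / step
    g_sq += (w.grad ** 2 - g_sq) / step
    w.grad = None

with torch.no_grad():
    scores = gradient_noise_scores(w, g_mean, g_sq)
    mask = prune_by_scores(w, scores, sparsity=0.5)
```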

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about making big computer models smaller so they can work better on phones and other devices. Right now, these big models are really good at doing tasks like understanding language and recognizing pictures, but they take up too much space and use too many resources. The authors of this paper want to fix that by developing a new way to make the models smaller without losing their ability to do these tasks well. They test their approach on different types of computer models and show that it works really well in lots of situations.

Keywords

* Artificial intelligence  * Fine-tuning  * Image classification  * Pruning  * Question answering  * Transformer