Summary of Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification, by Jungmin Yun et al.


Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

by Jungmin Yun, Mihyeon Kim, Youngbin Kim

First submitted to arXiv on: 3 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a method to improve both the performance and the efficiency of transformer-based models on natural language processing (NLP) tasks. The approach integrates token pruning and token combining to cut the cost of the computationally expensive self-attention mechanism in pre-trained transformers such as BERT. Token pruning removes less important tokens, with fuzzy logic used to handle uncertainty about token importance and the risk of mispruning. Token combining then condenses the remaining tokens into a shorter input sequence, compressing it further. Experiments on several document classification datasets show consistent gains over baseline models, including a +5%p improvement in accuracy and a +5.6%p improvement in F1 score with BERT, while memory cost is reduced to 0.61x and a 1.64x speedup is achieved.
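
To make the two steps concrete, below is a minimal, illustrative PyTorch sketch of attention-based token pruning followed by a simple token-combining step. The helper names (`prune_tokens`, `combine_tokens`), the "attention received" importance score, the keep ratio, and the fixed-size averaging used for combining are assumptions made for this sketch only; the paper's actual method additionally relies on fuzzy logic for token importance and its own combining scheme, neither of which is reproduced here.

```python
import torch
import torch.nn.functional as F

def prune_tokens(hidden_states, attention_probs, keep_ratio=0.7):
    """Keep the tokens that receive the most attention (illustrative heuristic).

    hidden_states:   (batch, seq_len, hidden)           token representations
    attention_probs: (batch, heads, seq_len, seq_len)   softmax-normalized attention
    """
    # Importance of each token = average attention it receives, over heads and queries.
    # (The paper softens this hard ranking with fuzzy logic; omitted in this sketch.)
    importance = attention_probs.mean(dim=1).mean(dim=1)        # (batch, seq_len)
    num_keep = max(1, int(hidden_states.size(1) * keep_ratio))
    kept = importance.topk(num_keep, dim=-1).indices.sort(dim=-1).values
    index = kept.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
    return hidden_states.gather(1, index)                       # (batch, num_keep, hidden)

def combine_tokens(hidden_states, group_size=4):
    """Condense the pruned sequence by averaging fixed-size groups of neighbours.

    This stands in for the paper's token-combining step: the point is only that
    the sequence fed to later self-attention layers becomes much shorter.
    """
    batch, seq_len, hidden = hidden_states.shape
    pad = (-seq_len) % group_size                               # zero-pad so length divides evenly
    if pad:
        hidden_states = F.pad(hidden_states, (0, 0, 0, pad))
    grouped = hidden_states.view(batch, -1, group_size, hidden)
    return grouped.mean(dim=2)                                  # (batch, ceil(seq_len/group_size), hidden)

# Toy usage: 128 tokens -> 89 after pruning -> 23 after combining, so the
# quadratic self-attention in later layers operates on a far shorter sequence.
x = torch.randn(2, 128, 768)
attn = torch.softmax(torch.randn(2, 12, 128, 128), dim=-1)
out = combine_tokens(prune_tokens(x, attn, keep_ratio=0.7), group_size=4)
print(out.shape)  # torch.Size([2, 23, 768])
```

Because self-attention cost grows quadratically with sequence length, shrinking a 128-token input to 23 tokens, as in this sketch, is the kind of reduction that drives the memory and speed savings the summary reports.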

Low Difficulty Summary (written by GrooveSquid.com, original content)
Transformer-based models are great at many natural language processing tasks, but they can be slow and use a lot of computer power. To fix this, the researchers combined two ideas: token pruning and token combining. Token pruning gets rid of some of the less important words in the input, while fuzzy logic helps make sure it doesn't accidentally throw away important ones. Token combining then merges the words that are left, so the model reads a shorter sequence and works faster. Using both techniques together, the researchers showed that the models can work better while using less computer power.

Keywords

» Artificial intelligence  » BERT  » F1 score  » Natural language processing  » NLP  » Pruning  » Self attention  » Token  » Transformer