Summary of Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification, by Jungmin Yun et al.
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
by Jungmin Yun, Mihyeon Kim, Youngbin Kim
First submitted to arXiv on: 3 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper proposes a method to improve the performance and efficiency of transformer-based models for natural language processing (NLP) tasks. It integrates token pruning and token combining to reduce the cost of the computationally expensive self-attention mechanism in pre-trained transformers such as BERT. Token pruning eliminates less important tokens, and fuzzy logic is used to handle the uncertainty and mispruning risk this introduces; token combining then condenses the input sequence to compress the model further. Experiments on several datasets show superior performance over baseline models, including a +5%p improvement in accuracy and a +5.6%p improvement in F1 score over the BERT baseline, while memory cost is reduced by 0.61x and a 1.64x speedup is achieved. (A minimal code sketch of this prune-then-combine idea follows the table.) |
| Low | GrooveSquid.com (original content) | Transformer-based models are great at many natural language processing tasks, but they can be slow and use a lot of computing power. To address this, the researchers combine two ideas: token pruning and token combining. Token pruning removes some of the less important words from the input, while fuzzy logic helps make sure important ones are not removed by mistake. Token combining merges tokens so that long inputs become shorter and the model runs faster. Used together, these techniques make the model both more accurate and less expensive to run. |
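To make the prune-then-combine idea above concrete, here is a minimal, illustrative sketch in NumPy. It is not the authors' implementation: the `prune_tokens` and `combine_tokens` functions, the `keep_ratio` and `group_size` parameters, and the use of a simple per-token attention score as the importance measure are all assumptions made for demonstration; the paper's fuzzy-logic-based pruning and its specific token-combining module are more involved.

```python
# Illustrative sketch only: not the paper's exact method. The importance
# measure (a single attention-derived score per token) and the pooling-based
# combining step are simplifying assumptions.
import numpy as np

def prune_tokens(embeddings, attn_scores, keep_ratio=0.7):
    """Drop the least-attended tokens, keeping a fixed fraction of the sequence.

    embeddings : (seq_len, hidden) token representations
    attn_scores: (seq_len,) per-token importance, e.g. attention received
                 from the [CLS] token averaged over heads (an assumption here)
    """
    seq_len = embeddings.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    # Indices of the most important tokens, restored to original order.
    keep_idx = np.sort(np.argsort(attn_scores)[-n_keep:])
    return embeddings[keep_idx]

def combine_tokens(embeddings, group_size=2):
    """Condense the pruned sequence by average-pooling adjacent tokens."""
    seq_len, hidden = embeddings.shape
    pad = (-seq_len) % group_size
    if pad:
        # Zero-pad so the sequence length divides evenly (simplification).
        embeddings = np.vstack([embeddings, np.zeros((pad, hidden))])
    return embeddings.reshape(-1, group_size, hidden).mean(axis=1)

# Toy example: a 10-token sequence with random embeddings and importance scores.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 16))          # token embeddings
scores = rng.random(10)                # per-token importance
shortened = combine_tokens(prune_tokens(x, scores, keep_ratio=0.7))
print(shortened.shape)                 # (4, 16): fewer tokens feed self-attention
```

The point of the sketch is the shape change: self-attention cost grows quadratically with sequence length, so passing the shorter pruned-and-combined sequence through subsequent transformer layers is what yields the memory and speed savings described in the summaries.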
Keywords
» Artificial intelligence » BERT » F1 score » Natural language processing » NLP » Pruning » Self-attention » Token » Transformer