Summary of Block Selective Reprogramming for On-device Training of Vision Transformers, by Sreetama Sarkar et al.
Block Selective Reprogramming for On-device Training of Vision Transformers
by Sreetama Sarkar, Souvik Kundu, Kai Zheng, Peter A. Beerel
First submitted to arXiv on: 25 Mar 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes an on-device fine-tuning method called block selective reprogramming (BSR) to overcome the memory and computation challenges of training vision transformers (ViTs) for personalized learning applications. BSR fine-tunes only a selected subset of the model's blocks and selectively drops tokens based on the self-attention scores of the frozen layers, reducing the memory required for training while maintaining comparable accuracy (a minimal code sketch of these two steps follows the table). The approach is evaluated on ViT-B and DeiT-S models across five datasets, demonstrating up to a 1.4x reduction in training memory and up to a 2x reduction in compute cost compared to existing alternatives. |
| Low | GrooveSquid.com (original content) | The paper helps solve a big problem: how to train a popular type of computer vision model (the ViT) on devices like smartphones or smart glasses, where memory and power are limited. The researchers came up with a new method called block selective reprogramming (BSR). It works by updating only certain parts of the model and skipping over others that don't need to be changed. This makes training cheaper and uses less memory than other methods. The team tested the approach on several models and datasets, showing that it can cut training memory by up to 1.4 times and compute cost by up to 2 times. |
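
To make the mechanism described in the medium summary concrete, the sketch below illustrates its two ideas in PyTorch: freezing all but a selected subset of transformer blocks, and dropping the patch tokens that receive the least self-attention in a frozen layer. This is a minimal illustration rather than the authors' implementation; the function names, the timm-style `model.blocks` attribute, the CLS-attention scoring rule, and the `keep_ratio` parameter are all assumptions made for the example.

```python
import torch
import torch.nn as nn


def freeze_all_but_selected(model: nn.Module, trainable_block_ids: set) -> None:
    """Freeze every parameter, then re-enable gradients only for the
    chosen transformer blocks (assumes a timm-style `model.blocks` list)."""
    for p in model.parameters():
        p.requires_grad = False
    for i, block in enumerate(model.blocks):
        if i in trainable_block_ids:
            for p in block.parameters():
                p.requires_grad = True


def drop_tokens_by_attention(tokens: torch.Tensor,
                             attn: torch.Tensor,
                             keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only the patch tokens that receive the most CLS attention;
    `keep_ratio` is a hypothetical hyperparameter, not from the paper.

    tokens: (batch, num_tokens, dim) embeddings, CLS token at index 0
    attn:   (batch, num_heads, num_tokens, num_tokens) attention weights
            taken from a frozen block
    """
    # CLS-to-patch attention averaged over heads: (batch, num_tokens - 1)
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)
    num_keep = max(1, int(cls_attn.size(1) * keep_ratio))
    # Indices of the most-attended patch tokens, restored to original order
    _, idx = cls_attn.topk(num_keep, dim=1)
    idx, _ = idx.sort(dim=1)
    batch_idx = torch.arange(tokens.size(0), device=tokens.device).unsqueeze(1)
    kept = tokens[batch_idx, idx + 1]  # +1 skips past the CLS token
    # Re-attach the CLS token so downstream blocks still see it first
    return torch.cat([tokens[:, :1], kept], dim=1)
```

In a training loop, the attention map from an early frozen block would drive `drop_tokens_by_attention` before the remaining blocks run, so later layers process fewer tokens and backpropagation stores smaller activations; under these assumptions, that is where the memory and compute savings the summaries describe would come from.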
Keywords
» Artificial intelligence » Fine tuning » Self attention » ViT