Summary of Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation, by Wei Dong et al.
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
by Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, Hengtao Shen
First submitted to arXiv on: 30 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract on arXiv.
Medium | GrooveSquid.com (original content) | A novel Parameter-Efficient Fine-Tuning (PEFT) approach is proposed for Vision Transformers (ViTs); it adapts pre-trained models to downstream tasks by learning a low-rank adaptation matrix. Unlike existing methods such as LoRA and Adapter, this technique represents the adaptation matrix through its Singular Value Decomposition (SVD), allowing it to flexibly capture layer-wise variations. Householder transformations are used to mimic the orthogonal (unitary) factors of the SVD, with each transformation requiring only a single learnable vector. The diagonal values are learned in a layer-wise manner, so adaptation matrices with varying ranks can emerge across different layers. This flexibility is demonstrated through promising fine-tuning performance on standard downstream vision tasks (see the code sketch after this table).
Low | GrooveSquid.com (original content) | This paper presents a new way to make pre-trained Vision Transformers work better on specific tasks. Instead of adjusting every layer in the same way, the method learns how to adapt each layer individually. It does this by breaking the adaptation into smaller pieces and learning which parts need more or less adjustment. The result is a fine-tuned model that performs well on various downstream vision tasks.
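To make the mechanism concrete, here is a minimal sketch in PyTorch of how an adaptation matrix can be assembled from Householder reflections, in the spirit of the paper. All names here (`HouseholderAdapter`, `householder`, `n_reflections`) are illustrative assumptions, not the authors' code, and the paper's exact parameterization may differ.

```python
# Illustrative sketch only: a Householder-parameterized adaptation matrix
# in the spirit of the paper. Names and hyperparameters are assumptions.
import torch
import torch.nn as nn


def householder(v: torch.Tensor) -> torch.Tensor:
    """Reflection H = I - 2 v v^T / ||v||^2: orthogonal, parameterized by one vector."""
    v = v / (v.norm() + 1e-8)
    return torch.eye(v.numel(), device=v.device) - 2.0 * torch.outer(v, v)


class HouseholderAdapter(nn.Module):
    """Builds Delta W = U diag(d) V^T, with U and V products of Householder reflections."""

    def __init__(self, dim: int, n_reflections: int = 4):
        super().__init__()
        # Each orthogonal factor is parameterized by a few vectors instead of
        # a full dim x dim matrix, which keeps the adapter parameter-efficient.
        self.u_vecs = nn.Parameter(torch.randn(n_reflections, dim))
        self.v_vecs = nn.Parameter(torch.randn(n_reflections, dim))
        # Learned per-layer "singular values"; initialized at zero so the
        # adapted model starts out identical to the pre-trained one.
        self.diag = nn.Parameter(torch.zeros(dim))

    def delta_weight(self) -> torch.Tensor:
        dim = self.diag.numel()
        U = torch.eye(dim, device=self.diag.device)
        V = torch.eye(dim, device=self.diag.device)
        for u, v in zip(self.u_vecs, self.v_vecs):
            U = U @ householder(u)
            V = V @ householder(v)
        # Entries of `diag` that stay near zero contribute nothing, so each
        # layer's update can end up with a different effective rank.
        return U @ torch.diag(self.diag) @ V.T

    def forward(self, x: torch.Tensor, frozen_w: torch.Tensor) -> torch.Tensor:
        # Frozen pre-trained weight plus the learned adaptation update.
        return x @ (frozen_w + self.delta_weight()).T
```

In such a setup, one adapter could be attached per projection layer while the backbone stays frozen; only `u_vecs`, `v_vecs`, and `diag` are trained, and the learned diagonal decides how strongly, and at what effective rank, each layer is adjusted.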
Keywords
» Artificial intelligence » Fine tuning » LoRA » Low rank adaptation » Parameter efficient