Summary of Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation, by Wei Dong et al.
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
by Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, Hengtao Shen
First submitted to arXiv on: 30 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract on arXiv.
Medium | GrooveSquid.com (original content) | A novel Parameter-Efficient Fine-Tuning (PEFT) approach is proposed for Vision Transformers (ViTs); it adapts pre-trained models to downstream tasks by learning a low-rank adaptation matrix. Unlike existing methods such as LoRA and Adapter, this technique represents the adaptation matrix through its Singular Value Decomposition (SVD), allowing it to flexibly capture layer-wise variations. Householder transformations are used to mimic the orthogonal (unitary) factors of the SVD, with each transformation requiring only a single learnable vector. The diagonal values are learned in a layer-wise manner, so adaptation matrices with varying ranks can emerge across different layers. This flexibility is demonstrated through promising fine-tuning performance on standard downstream vision tasks (see the code sketch after this table).
Low | GrooveSquid.com (original content) | This paper presents a new way to make pre-trained Vision Transformers work better on specific tasks. Instead of adjusting every layer in the same way, the method learns how to adapt each layer individually. It does this by breaking the adaptation into smaller pieces and learning which parts need more or less adjustment. The result is a fine-tuned model that performs well on various downstream vision tasks.
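To make the mechanism concrete, here is a minimal sketch in PyTorch of how an adaptation matrix can be assembled from Householder reflections, in the spirit of the paper. All names here (`HouseholderAdapter`, `householder`, `n_reflections`) are illustrative assumptions, not the authors' code, and the paper's exact parameterization may differ.

```python
# Illustrative sketch only: a Householder-parameterized adaptation matrix
# in the spirit of the paper. Names and hyperparameters are assumptions.
import torch
import torch.nn as nn


def householder(v: torch.Tensor) -> torch.Tensor:
    """Reflection H = I - 2 v v^T / ||v||^2: orthogonal, parameterized by one vector."""
    v = v / (v.norm() + 1e-8)
    return torch.eye(v.numel(), device=v.device) - 2.0 * torch.outer(v, v)


class HouseholderAdapter(nn.Module):
    """Builds Delta W = U diag(d) V^T, with U and V products of Householder reflections."""

    def __init__(self, dim: int, n_reflections: int = 4):
        super().__init__()
        # Each orthogonal factor is parameterized by a few vectors instead of
        # a full dim x dim matrix, which keeps the adapter parameter-efficient.
        self.u_vecs = nn.Parameter(torch.randn(n_reflections, dim))
        self.v_vecs = nn.Parameter(torch.randn(n_reflections, dim))
        # Learned per-layer "singular values"; initialized at zero so the
        # adapted model starts out identical to the pre-trained one.
        self.diag = nn.Parameter(torch.zeros(dim))

    def delta_weight(self) -> torch.Tensor:
        dim = self.diag.numel()
        U = torch.eye(dim, device=self.diag.device)
        V = torch.eye(dim, device=self.diag.device)
        for u, v in zip(self.u_vecs, self.v_vecs):
            U = U @ householder(u)
            V = V @ householder(v)
        # Entries of `diag` that stay near zero contribute nothing, so each
        # layer's update can end up with a different effective rank.
        return U @ torch.diag(self.diag) @ V.T

    def forward(self, x: torch.Tensor, frozen_w: torch.Tensor) -> torch.Tensor:
        # Frozen pre-trained weight plus the learned adaptation update.
        return x @ (frozen_w + self.delta_weight()).T
```

In such a setup, one adapter could be attached per projection layer while the backbone stays frozen; only `u_vecs`, `v_vecs`, and `diag` are trained, and the learned diagonal decides how strongly, and at what effective rank, each layer is adjusted.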
Keywords
» Artificial intelligence » Fine tuning » LoRA » Low rank adaptation » Parameter efficient