
Summary of Visual Fourier Prompt Tuning, by Runjia Zeng et al.


Visual Fourier Prompt Tuning

by Runjia Zeng, Cheng Han, Qifan Wang, Chunshu Wu, Tong Geng, Lifu Huang, Ying Nian Wu, Dongfang Liu

First submitted to arXiv on: 2 Nov 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract; read it on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper proposes Visual Fourier Prompt Tuning (VFPT), a novel method for adapting large-scale transformer-based models to new tasks. The approach addresses the performance degradation caused by disparities between the pretraining and finetuning datasets. VFPT incorporates the Fast Fourier Transform into the prompt embeddings, so the prompts carry both spatial-domain and frequency-domain information. The method outperforms current state-of-the-art baselines on two benchmarks while tuning only 0.57% of the model parameters and reaching 73.20% mean accuracy. The paper's contribution is a general solution that remains effective regardless of how large the disparity between the datasets is. (A minimal code sketch of the idea follows these summaries.)

Low Difficulty Summary (original content by GrooveSquid.com)
The paper introduces Visual Fourier Prompt Tuning (VFPT), a way to adapt large models to new tasks. It tackles a common problem: a model often does not work well when the data it was pretrained on is very different from the data of the new task. VFPT uses a mathematical tool called the Fast Fourier Transform so the model can learn from both spatial and frequency information. This approach works better than current methods on two benchmarks, tuning only 0.57% of the model's parameters and reaching 73.20% mean accuracy. The paper's key finding is a solution that can be used in many situations.

Keywords

» Artificial intelligence  » Pretraining  » Prompt  » Transformer