Summary of Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images, by Jen Hong Tan
First submitted to arXiv on: 6 Feb 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here.
Medium | GrooveSquid.com (original content) | The paper investigates whether a lightweight Vision Transformer (ViT) can outperform convolutional neural networks (CNNs) such as ResNet on small image datasets. It finds that a pure ViT, pre-trained with a masked autoencoder technique and minimal image scaling, can indeed achieve superior performance. Experiments on CIFAR-10 and CIFAR-100 use ViT models with fewer than 3.65 million parameters and a multiply-accumulate (MAC) count below 0.27G, and show that this lightweight transformer-based architecture reaches state-of-the-art performance without scaling up the datasets' images. A minimal code sketch of this pre-training setup follows the table.
Low | GrooveSquid.com (original content) | A team of researchers looked at how well a special kind of computer program, called a Vision Transformer, works with small images. They wanted to know whether it could match or beat the programs commonly used for image recognition. The answer is yes: by training the Vision Transformer in a special way on small image datasets, the researchers got great results without using a lot of computing power.
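To make the approach concrete, below is a minimal, self-contained PyTorch sketch of masked-autoencoder (MAE) pre-training for a tiny ViT on 32×32 inputs, i.e. CIFAR-sized images with no upscaling. All hyperparameters here (4×4 patches, a 192-dim 6-layer encoder, 75% mask ratio, a small 2-layer decoder, zero-initialized learned positional embeddings) are illustrative assumptions chosen to stay under the paper's ~3.65M-parameter budget, not the authors' exact configuration.

```python
# Minimal MAE-style pre-training sketch for a tiny ViT on 32x32 images.
# Hyperparameters are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, img=32, patch=4, dim=192, depth=6, heads=3,
                 dec_dim=96, dec_depth=2, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.n = (img // patch) ** 2                      # 64 tokens for 32x32
        self.embed = nn.Linear(3 * patch * patch, dim)    # patchify + project
        self.pos = nn.Parameter(torch.zeros(1, self.n, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                               batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, depth)
        self.enc_to_dec = nn.Linear(dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.dec_pos = nn.Parameter(torch.zeros(1, self.n, dec_dim))
        dec_layer = nn.TransformerEncoderLayer(dec_dim, heads, dec_dim * 4,
                                               batch_first=True, norm_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, dec_depth)
        self.head = nn.Linear(dec_dim, 3 * patch * patch)  # reconstruct pixels

    def patchify(self, x):
        # (B, 3, H, W) -> (B, n_patches, 3 * patch * patch)
        p = self.patch
        B, C, H, W = x.shape
        x = x.unfold(2, p, p).unfold(3, p, p)             # B,C,H/p,W/p,p,p
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, self.n, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)
        tokens = self.embed(patches) + self.pos
        B, N, _ = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        # Random per-sample shuffle; the first `keep` patches stay visible.
        ids = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        vis_ids = ids[:, :keep]
        vis = torch.gather(tokens, 1,
                           vis_ids.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        latent = self.encoder(vis)                        # encode visible only
        # Decoder input: encoded visible tokens + mask tokens, then unshuffle.
        dec = self.enc_to_dec(latent)
        full = torch.cat([dec, self.mask_token.expand(B, N - keep, -1)], dim=1)
        restore = ids.argsort(dim=1)
        full = torch.gather(full, 1,
                            restore.unsqueeze(-1).expand(-1, -1, full.size(-1)))
        recon = self.head(self.decoder(full + self.dec_pos))
        # MSE loss on masked patches only, as in MAE.
        mask = torch.ones(B, N, device=imgs.device)
        mask.scatter_(1, vis_ids, 0.0)                    # 1 = masked patch
        loss = ((recon - patches) ** 2).mean(-1)
        return (loss * mask).sum() / mask.sum()

model = TinyMAE()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M params")
opt = torch.optim.AdamW(model.parameters(), lr=1.5e-4)
imgs = torch.rand(8, 3, 32, 32)   # stand-in batch; use CIFAR-10 in practice
opt.zero_grad()
loss = model(imgs)
loss.backward()
opt.step()
print("reconstruction loss:", loss.item())
```

The key property of this setup is that the encoder only ever processes the ~25% of patches left visible, which is what keeps MAE pre-training cheap. For real use, replace the random batch with a torchvision CIFAR-10 loader, train for many epochs, then discard the decoder and fine-tune the encoder with a classification head.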
Keywords
- Artificial intelligence
- Encoder
- ResNet
- Transformer
- Vision Transformer
- ViT