Summary of Siamese Transformer Networks For Few-shot Image Classification, by Weihao Jiang et al.
Siamese Transformer Networks for Few-shot Image Classification
by Weihao Jiang, Shuoxi Zhang, Kun He
First submitted to arXiv on: 16 Jul 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes a Siamese Transformer Network (STN) with two parallel branch networks, each built on a pre-trained Vision Transformer (ViT): the first branch extracts global features, while the second focuses on local features. Euclidean distance is applied to the global features and Kullback-Leibler divergence to the local features; after L2 normalization, the two measures are combined with a weighted sum, letting the method leverage both feature types for few-shot image classification. The network is fine-tuned with a meta-learning approach during training. This simple yet effective framework outperforms state-of-the-art baselines on four popular benchmarks in both 5-shot and 1-shot scenarios. |
Low | GrooveSquid.com (original content) | A new way of recognizing images has been developed! Humans are good at classifying pictures they've never seen before, even with only a few examples to learn from, because we can focus on small details and find similarities between old and new images. Computer scientists have created a method that combines two types of features – global (big picture) and local (small details) – to help machines do the same thing. It uses a Siamese Transformer Network, which has two parts working together, letting it look at both the big picture and the small details of an image to figure out what it shows. The new method works well on tests and beats other approaches computers have tried before. |
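The medium-difficulty summary describes combining a Euclidean distance over global features with a KL divergence over local features into one weighted score. A minimal sketch of that idea is below; the feature shapes, the softmax used to turn local descriptors into distributions, the placement of the L2 normalization, and the weight `alpha` are all assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # L2-normalize a feature vector (normalization placement is an assumption)
    return x / (np.linalg.norm(x) + eps)

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) for discrete distributions, with eps for numerical safety
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def stn_distance(query_global, support_global,
                 query_local, support_local, alpha=0.5):
    """Weighted combination of a global Euclidean distance and a local
    KL divergence, sketching the STN scoring described in the summary."""
    # Global branch: Euclidean distance between L2-normalized embeddings
    d_global = float(np.linalg.norm(
        l2_normalize(query_global) - l2_normalize(support_global)))

    # Local branch: treat local descriptors as distributions via softmax
    # (an assumption; the paper's exact construction may differ)
    p = np.exp(query_local) / np.exp(query_local).sum()
    q = np.exp(support_local) / np.exp(support_local).sum()
    d_local = kl_divergence(p, q)

    # Weighted combination of the two distance measures
    return alpha * d_global + (1 - alpha) * d_local
```

In a few-shot episode, a query image would be assigned the class of the support prototype with the smallest combined distance.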
Keywords
» Artificial intelligence » 1 shot » Euclidean distance » Few shot » Image classification » Meta learning » Transformer » Vision transformer » ViT