Summary of Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review, by Sonia Bbouzidi et al.


Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review

by Sonia Bbouzidi, Ghazala Hcini, Imen Jdey, Fadoua Drira

First submitted to arXiv on: 5 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper but is written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper conducts a comparative analysis of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for image classification, focusing on clothing classification within the e-commerce sector. The study uses the Fashion MNIST dataset to explore the distinctive attributes of CNNs and ViTs. While CNNs have traditionally excelled at image classification tasks, ViTs introduce a self-attention mechanism that enables nuanced weighting of different parts of the input (a minimal illustrative sketch contrasting the two architectures appears after these summaries). The paper reviews existing literature to highlight the distinctions between ViTs and CNNs in image classification, examining state-of-the-art methodologies that employ either architecture. The review considers factors that influence performance, including dataset characteristics, image dimensions, number of target classes, hardware infrastructure, and architecture type, and reports the top results achieved with each approach.

Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper compares two types of artificial intelligence models, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), to see which one works best for classifying images of clothing. They use a dataset called Fashion MNIST that has lots of pictures of clothes. CNNs are good at recognizing small details, while ViTs are good at understanding the bigger picture. The study looks at what makes each type of model work well or not so well. By combining these two types of models, they might be able to create an even better one.
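To make the CNN/ViT contrast from the summaries concrete, below is a minimal, hypothetical PyTorch sketch (not taken from the paper or the studies it reviews): a small CNN built from local convolutional filters, and a tiny ViT-style classifier that splits each 28x28 Fashion MNIST image into patches and applies self-attention over them. All layer sizes, the 4x4 patch size, and other hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch (not from the paper): a minimal CNN and a minimal
# ViT-style classifier for 28x28 grayscale Fashion MNIST images, in PyTorch.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny CNN: local convolutional filters followed by a linear classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class TinyViT(nn.Module):
    """Tiny ViT-style model: 4x4 patches, learned position embeddings,
    a few Transformer encoder layers with self-attention, mean pooling."""
    def __init__(self, num_classes: int = 10, dim: int = 64, depth: int = 2, patch: int = 4):
        super().__init__()
        num_patches = (28 // patch) ** 2  # 49 patches per image
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, 49, dim)
        tokens = self.encoder(tokens + self.pos_embed)           # self-attention over patches
        return self.head(tokens.mean(dim=1))                     # mean-pool tokens, then classify

# Quick shape check on a dummy batch of Fashion MNIST-sized images.
x = torch.randn(8, 1, 28, 28)
print(SmallCNN()(x).shape, TinyViT()(x).shape)  # both: torch.Size([8, 10])
```

The shape check at the end only verifies that both models map a batch of 28x28 grayscale images to 10 class logits; training, data loading, and evaluation are omitted, and neither model is meant to reproduce the results discussed in the review.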

Keywords

» Artificial intelligence  » Classification  » Image classification  » Self attention