


On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery

by BW Sheffield, Jeffrey Ellen, Ben Whitmore

First submitted to arxiv on: 18 Sep 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper investigates the use of Vision Transformers (ViTs) for classifying man-made objects on the seafloor in side-scan sonar (SSS) imagery. Current approaches rely on Convolutional Neural Networks (CNNs) with hand-crafted features, which can struggle with diverse seafloor textures and produce high false positive rates. ViTs, which use self-attention mechanisms to capture global information, offer greater flexibility in processing spatial hierarchies. The paper compares ViT models against CNN architectures such as ResNet and ConvNeXt on binary classification tasks in SSS imagery, using a dataset that spans diverse geographical seafloor types and is balanced between object presence and absence. Results show superior classification performance by ViTs across F1-score, precision, recall, and accuracy metrics, although at the cost of increased computational resources.
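The global context the summary attributes to ViTs comes from self-attention: every image patch attends to every other patch, rather than only to a local neighborhood as a convolution kernel does. Below is a minimal NumPy sketch of single-head scaled dot-product self-attention over patch embeddings; the shapes, names, and random inputs are illustrative assumptions, not details from the paper.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (n_patches, d_model) patch embeddings. Each row of the attention
    matrix mixes information from *all* patches, which is the "global
    information" advantage over a CNN's local receptive field.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (n, n) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over all patches
    return weights @ v, weights

# Illustrative example: 16 sonar-image patches with 8-dim embeddings.
rng = np.random.default_rng(0)
n, d = 16, 8
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over all patches, so a patch containing an object can weight distant seafloor context directly; this global mixing at every layer is what the paper contrasts with CNN spatial hierarchies.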
Low Difficulty Summary (original content by GrooveSquid.com)
This paper looks at using special computer models called Vision Transformers to help identify objects on the ocean floor from sonar images. Right now, people use a different type of model called Convolutional Neural Networks, which can be tricky when dealing with different types of seafloor textures. The new models might do better because they can look at the whole image and not just small parts like the old models do. The researchers tested these new models on a big dataset that includes lots of different kinds of ocean floor and found that they did really well, but it took more computer power to make them work.

Keywords

» Artificial intelligence  » Classification  » CNN  » F1 score  » Precision  » Recall  » ResNet  » Self-attention  » ViT