
Summary of Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs, by Xiaohan Ding et al.


Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs

by Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun

First submitted to arXiv on: 13 Mar 2022

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract on arXiv.
Medium Difficulty Summary (GrooveSquid.com, original content)
The paper presents a new approach to designing convolutional neural networks (CNNs): using a few large kernels instead of a stack of small ones. Inspired by vision transformers, the authors propose five guidelines for designing efficient, high-performance large-kernel CNNs. Following these guidelines, they propose RepLKNet, a pure CNN architecture whose kernels are as large as 31×31. RepLKNet closes much of the performance gap between CNNs and vision transformers, achieving comparable or better results on ImageNet and downstream tasks at lower latency. The study also finds that large-kernel CNNs have much larger effective receptive fields and a higher shape bias than small-kernel CNNs.
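
To make the design concrete, here is a minimal PyTorch sketch of the core idea: a depthwise 31×31 convolution with a parallel small-kernel branch and an identity shortcut. This is an illustration under our own assumptions, not the authors' RepLKNet code; the module and parameter names (LargeKernelBlock, channels, small_kernel) are hypothetical. The paper's actual architecture additionally merges the small-kernel branch into the large one after training via structural re-parameterization.

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    """Illustrative large-kernel block (not the official RepLKNet code).

    Depthwise convolution (groups=channels) is what keeps a 31x31 kernel
    affordable in parameters and FLOPs; the parallel small-kernel branch
    aids optimization during training, per the paper's guidelines.
    """
    def __init__(self, channels: int, large_kernel: int = 31, small_kernel: int = 5):
        super().__init__()
        self.large = nn.Conv2d(channels, channels, large_kernel,
                               padding=large_kernel // 2, groups=channels, bias=False)
        self.small = nn.Conv2d(channels, channels, small_kernel,
                               padding=small_kernel // 2, groups=channels, bias=False)
        self.bn_large = nn.BatchNorm2d(channels)
        self.bn_small = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the two branches plus a residual shortcut, then activate.
        return self.act(x + self.bn_large(self.large(x)) + self.bn_small(self.small(x)))

if __name__ == "__main__":
    block = LargeKernelBlock(channels=64)
    out = block(torch.randn(1, 64, 56, 56))
    print(out.shape)  # torch.Size([1, 64, 56, 56]) -- spatial size is preserved
```

Because both branches are depthwise and padded to preserve spatial size, their kernels can later be fused into a single 31×31 convolution for inference, which is the re-parameterization trick the paper relies on.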
Low Difficulty Summary (GrooveSquid.com, original content)
This paper presents a new way of building computer vision models. Instead of scanning images with many small filters, the authors show it is better to use a few big ones (as large as 31×31) to understand images. This helps the model learn more about shapes and less about textures. The authors tested their idea on big datasets like ImageNet and ADE20K, and it worked really well. They even released the code and models online so others can try them out.

Keywords

* Artificial intelligence
* CNN