Summary of Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs, by Xiaohan Ding et al.
Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs
by Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun
First submitted to arXiv on: 13 Mar 2022
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper revisits large kernel design in convolutional neural networks (CNNs): instead of the traditional stacks of small kernels, it uses a few very large ones. Inspired by vision transformers, the authors propose five guidelines for designing efficient, high-performance large-kernel CNNs, and demonstrate them with RepLKNet, a pure CNN architecture using kernels as large as 31×31. RepLKNet closes the performance gap between CNNs and vision transformers, achieving comparable or superior results on ImageNet and downstream tasks while maintaining lower latency. The study also shows that large-kernel CNNs have larger effective receptive fields and higher shape bias than small-kernel CNNs. A minimal code sketch of the large-kernel idea follows this table. |
| Low | GrooveSquid.com (original content) | This paper presents a new way of building computer vision models. Instead of using many small pieces, the authors show it is better to use a few big pieces (like 31×31 kernels) to understand images. This helps the model learn more about shapes and less about textures. They tested the idea on big datasets such as ImageNet and ADE20K, and it worked well. They also released the code and models online so others can try them out. |
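
To make the large-kernel idea concrete, here is a minimal PyTorch sketch of a RepLKNet-style block: a 31×31 depthwise convolution with a parallel small-kernel branch, which the paper merges ("re-parameterizes") into the large kernel after training. The class name, layer sizes, and block layout are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class LargeKernelBlock(nn.Module):
    """Illustrative sketch of a RepLKNet-style large-kernel block.

    The 31x31 convolution is depthwise (groups == channels), which keeps
    the parameter and FLOP cost of such a big kernel manageable. A parallel
    small-kernel branch eases optimization; the paper re-parameterizes it
    into the large kernel after training. Layer choices here are
    assumptions, not the authors' released code.
    """

    def __init__(self, channels: int, large_k: int = 31, small_k: int = 5):
        super().__init__()
        self.large = nn.Sequential(
            nn.Conv2d(channels, channels, large_k,
                      padding=large_k // 2, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.small = nn.Sequential(
            nn.Conv2d(channels, channels, small_k,
                      padding=small_k // 2, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the two depthwise branches, then apply activation and a shortcut.
        return x + self.act(self.large(x) + self.small(x))


if __name__ == "__main__":
    block = LargeKernelBlock(64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Depthwise grouping is what makes a 31×31 kernel affordable here: each channel gets a single 31×31 filter rather than a full cross-channel filter bank, so cost grows with kernel area but not with channel count squared.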
Keywords
* Artificial intelligence
* CNN