Summary of Compute Better Spent: Replacing Dense Layers with Structured Matrices, by Shikai Qiu et al.
Compute Better Spent: Replacing Dense Layers with Structured Matrices
by Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
First submitted to arXiv on: 10 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract; read it on the arXiv listing. |
Medium | GrooveSquid.com (original content) | This work proposes a novel approach to building more compute-efficient foundation models by identifying structured matrices as replacements for traditional dense layers. The researchers show that different structures require their own initialization scales and learning rates, which are critical for performance. Building on insights from the Maximal Update Parameterization, they determine the optimal initialization and learning-rate scaling for these unconventional layers. The study also measures the scaling laws of various structures, comparing how quickly their performance improves with increasing compute. A new matrix family, the Block Tensor-Train (BTT), which contains Monarch matrices as a special case, is introduced and shown to outperform dense matrices at the same compute on multiple tasks. Specifically, BTT achieves exponentially lower training loss than dense layers when training MLPs and ViTs on CIFAR-10/100 with augmentation, and matches dense ViT-S/32 performance on ImageNet-1k with 3.8 times less compute (see the illustrative sketch after the table). |
Low | GrooveSquid.com (original content) | This research explores ways to make big AI models more efficient. It shows that using special matrix structures instead of traditional dense ones can be a game-changer. The study finds that these new matrices need to be initialized and trained in specific ways, but when done correctly, they can outperform ordinary dense layers. The researchers introduce a new family of structured matrices called the Block Tensor-Train (BTT), which includes Monarch matrices, and show that it beats dense layers on several image tasks while using less compute. |
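
The core idea of swapping a dense layer for a structured one can be sketched in a few lines of PyTorch. The snippet below is a minimal, hypothetical Monarch-style layer (two block-diagonal factors interleaved with a transpose), meant only to illustrate how such a substitution cuts parameters and FLOPs; it is not the authors' BTT implementation, and the naive 1/sqrt(b) initialization shown here stands in for the structure-specific Maximal Update Parameterization scaling the paper derives. The class name `MonarchLikeLinear` is invented for this sketch.

```python
import math
import torch
import torch.nn as nn

class MonarchLikeLinear(nn.Module):
    """Hypothetical Monarch-style structured layer (illustration only).

    A dense d x d matrix is replaced by two block-diagonal factors with a
    transpose in between, reducing cost from O(d^2) to O(d * sqrt(d)).
    """

    def __init__(self, d: int):
        super().__init__()
        self.b = math.isqrt(d)
        assert self.b * self.b == d, "this sketch assumes d is a perfect square"
        # b blocks of size (b x b) per factor; naive 1/sqrt(b) init as a
        # placeholder for the structure-aware scaling derived in the paper.
        self.w1 = nn.Parameter(torch.randn(self.b, self.b, self.b) / math.sqrt(self.b))
        self.w2 = nn.Parameter(torch.randn(self.b, self.b, self.b) / math.sqrt(self.b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        *batch, d = x.shape
        x = x.reshape(*batch, self.b, self.b)              # (..., b, b)
        x = torch.einsum("...ij,ijk->...ik", x, self.w1)   # first block-diagonal mix
        x = x.transpose(-1, -2)                            # permute axes between factors
        x = torch.einsum("...ij,ijk->...ik", x, self.w2)   # second block-diagonal mix
        return x.reshape(*batch, d)

# Usage: drop-in replacement for an nn.Linear(1024, 1024) inside an MLP or ViT block.
layer = MonarchLikeLinear(1024)    # ~65K parameters vs ~1.05M for a dense layer
y = layer(torch.randn(8, 1024))    # y.shape == (8, 1024)
```

For d = 1024, this sketch uses roughly 65K parameters per layer instead of the ~1M of a dense matrix, which is the kind of compute trade-off the paper's scaling-law comparison measures.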
Keywords
» Artificial intelligence » Scaling laws » ViT