Summary of Compute Better Spent: Replacing Dense Layers with Structured Matrices, by Shikai Qiu et al.
Compute Better Spent: Replacing Dense Layers with Structured Matrices
by Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
First submitted to arXiv on: 10 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract; read it on the arXiv listing. |
Medium | GrooveSquid.com (original content) | This work proposes a novel approach to building more compute-efficient foundation models by identifying structured matrices as replacements for traditional dense layers. The researchers show that different structures require their own initialization scales and learning rates, which are critical for performance. Building on insights from the Maximal Update Parameterization, they determine the optimal initialization and learning-rate scaling for these unconventional layers. The study also measures the scaling laws of various structures, comparing how quickly their performance improves with increasing compute. A new matrix family, the Block Tensor-Train (BTT), which contains Monarch matrices as a special case, is introduced and shown to outperform dense matrices at the same compute on multiple tasks. Specifically, BTT achieves exponentially lower training loss than dense layers when training MLPs and ViTs on CIFAR-10/100 with augmentation, and matches dense ViT-S/32 performance on ImageNet-1k with 3.8 times less compute (see the illustrative sketch after the table). |
Low | GrooveSquid.com (original content) | This research explores ways to make big AI models more efficient. It shows that using special matrix structures instead of traditional dense ones can be a game-changer. The study finds that these new matrices need to be initialized and trained in specific ways, but when done correctly, they can outperform ordinary dense layers. The researchers introduce a new family of structured matrices called the Block Tensor-Train (BTT), which includes Monarch matrices, and show that it beats dense layers on several image tasks while using less compute. |
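
The core idea of swapping a dense layer for a structured one can be sketched in a few lines of PyTorch. The snippet below is a minimal, hypothetical Monarch-style layer (two block-diagonal factors interleaved with a transpose), meant only to illustrate how such a substitution cuts parameters and FLOPs; it is not the authors' BTT implementation, and the naive 1/sqrt(b) initialization shown here stands in for the structure-specific Maximal Update Parameterization scaling the paper derives. The class name `MonarchLikeLinear` is invented for this sketch.

```python
import math
import torch
import torch.nn as nn

class MonarchLikeLinear(nn.Module):
    """Hypothetical Monarch-style structured layer (illustration only).

    A dense d x d matrix is replaced by two block-diagonal factors with a
    transpose in between, reducing cost from O(d^2) to O(d * sqrt(d)).
    """

    def __init__(self, d: int):
        super().__init__()
        self.b = math.isqrt(d)
        assert self.b * self.b == d, "this sketch assumes d is a perfect square"
        # b blocks of size (b x b) per factor; naive 1/sqrt(b) init as a
        # placeholder for the structure-aware scaling derived in the paper.
        self.w1 = nn.Parameter(torch.randn(self.b, self.b, self.b) / math.sqrt(self.b))
        self.w2 = nn.Parameter(torch.randn(self.b, self.b, self.b) / math.sqrt(self.b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        *batch, d = x.shape
        x = x.reshape(*batch, self.b, self.b)              # (..., b, b)
        x = torch.einsum("...ij,ijk->...ik", x, self.w1)   # first block-diagonal mix
        x = x.transpose(-1, -2)                            # permute axes between factors
        x = torch.einsum("...ij,ijk->...ik", x, self.w2)   # second block-diagonal mix
        return x.reshape(*batch, d)

# Usage: drop-in replacement for an nn.Linear(1024, 1024) inside an MLP or ViT block.
layer = MonarchLikeLinear(1024)    # ~65K parameters vs ~1.05M for a dense layer
y = layer(torch.randn(8, 1024))    # y.shape == (8, 1024)
```

For d = 1024, this sketch uses roughly 65K parameters per layer instead of the ~1M of a dense matrix, which is the kind of compute trade-off the paper's scaling-law comparison measures.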
Keywords
» Artificial intelligence » Scaling laws » ViT