Summary of "White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?", by Yaodong Yu et al.
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
by Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma
First submitted to arXiv on: 22 Nov 2023
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract (see the arXiv listing) |
Medium | GrooveSquid.com (original content) | The paper proposes a new perspective on representation learning: the goal is to compress and transform the data distribution toward a mixture of low-dimensional Gaussians. The authors introduce an objective called sparse rate reduction, which simultaneously maximizes the information gain and the sparsity of the learned representations (a rough sketch of this objective appears just after this table). This leads to a family of transformer-like architectures, named CRATE, that are mathematically fully interpretable. Experiments show that these networks learn to compress and sparsify representations of large-scale image and text datasets, achieving performance close to highly engineered models such as ViT, MAE, DINO, BERT, and GPT-2. |
Low | GrooveSquid.com (original content) | The paper describes a new way to think about learning representations from data. Instead of relying only on complicated, hand-designed computer programs, it suggests trying to make the data simpler and easier to describe. The authors build a special kind of network called CRATE that does this, and they show that it works well on big datasets of images and text. This is important because it could help us understand how our brains work when we learn new things. |
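
For readers who want slightly more detail than the summaries above give, here is a minimal sketch of what the sparse rate reduction objective looks like, written out from the paper's framework rather than from this summary page. The symbols below, the representation map $f$, the subspace bases $U_{[K]}$, the coding-rate terms $R$ and $R^c$, and the sparsity weight $\lambda$, are assumptions borrowed from that framework and are not defined anywhere on this page.

$$
\max_{f}\;\mathbb{E}_{Z=f(X)}\Big[\Delta R\big(Z;\,U_{[K]}\big)\;-\;\lambda\,\lVert Z\rVert_{0}\Big],
\qquad
\Delta R\big(Z;\,U_{[K]}\big)\;=\;R(Z)\;-\;R^{c}\big(Z;\,U_{[K]}\big).
$$

Informally, the rate-reduction term $\Delta R$ rewards representations $Z$ whose overall coding rate $R(Z)$ is large while the coding rate $R^{c}$ measured against the $K$ target subspaces is small (informative yet compressed toward a low-dimensional Gaussian mixture), and the $\ell^{0}$ term rewards sparsity. Roughly speaking, alternately optimizing these two terms, layer by layer, is what gives rise to the attention-like compression blocks and ISTA-like sparsification blocks of the CRATE architecture mentioned in the medium-difficulty summary.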
Keywords
* Artificial intelligence * BERT * MAE * Representation learning * Transformer * ViT