Summary of OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization, by Xiang Meng et al.

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

by Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder

First submitted to arXiv on: 2 Mar 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary Structured pruning is a technique to reduce inference costs for large vision and language models by removing carefully chosen structures. This paper focuses on one-shot (post-training) structured pruning, which doesn’t require model retraining after pruning. The authors propose a novel combinatorial optimization framework based on layer-wise reconstruction objectives and scalable optimization. They also design a local combinatorial optimization algorithm that uses low-rank updates for efficient local search. The framework is time- and memory-efficient, improving upon state-of-the-art one-shot methods on vision models like ResNet50 and MobileNet, as well as language models from OPT-1.3B to OPT-30B. For example, the framework achieves 125× lower test perplexity on WikiText with a 2× inference time speedup compared to ZipLM. The authors’ work considers models with tens of billions of parameters, up to 100× larger than those considered in previous structured pruning literature.
Low	GrooveSquid.com (original content)	Low Difficulty Summary This paper is about making big computer models smaller and faster without losing their ability to understand language or see images. The idea is to carefully remove parts of the model that aren’t as important, so it can process information more quickly. This is called “structured pruning”, and the authors developed a new way to do it that doesn’t require the model to be retrained after making changes. The approach works on models with tens of billions of parameters, up to 100 times larger than those studied in earlier structured pruning work, and it keeps more of their accuracy than previous one-shot methods.
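
To make the "layer-wise reconstruction objective" and "local search" ideas in the medium summary more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm: the function names, the least-squares refit, and the swap rule are all assumptions chosen for illustration, and the low-rank updates that the paper uses to make local search efficient are deliberately left out. The sketch picks k input channels of a single linear layer so that the pruned layer reproduces the dense layer's outputs on some calibration inputs as closely as possible.

```python
import numpy as np


def reconstruction_error(X, W, keep):
    """Squared error between the dense output X @ W and the best output
    achievable using only the input channels listed in `keep`.

    Hypothetical helper: the kept weights are refit by ordinary least squares.
    """
    dense_out = X @ W
    X_kept = X[:, keep]
    W_kept, *_ = np.linalg.lstsq(X_kept, dense_out, rcond=None)
    return float(np.sum((dense_out - X_kept @ W_kept) ** 2))


def local_search(X, W, k, max_passes=10):
    """Choose k input channels by swap-based local search (illustrative only).

    Starts from the k rows of W with the largest norm, then repeatedly swaps
    one kept channel for one pruned channel whenever that lowers the error.
    Every candidate is refit from scratch, which is simple but slow.
    """
    n_in = W.shape[0]
    order = np.argsort(-np.linalg.norm(W, axis=1))
    keep = [int(i) for i in order[:k]]
    best = reconstruction_error(X, W, keep)
    for _ in range(max_passes):
        improved = False
        for pos in range(k):
            for candidate in range(n_in):
                if candidate in keep:
                    continue
                trial = keep.copy()
                trial[pos] = candidate
                err = reconstruction_error(X, W, trial)
                if err < best - 1e-12:
                    keep, best, improved = trial, err, True
        if not improved:
            break  # no single swap helps: a local optimum
    return sorted(keep), best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))   # calibration inputs (made-up data)
    W = rng.normal(size=(8, 5))     # dense layer weights (made-up data)
    kept, err = local_search(X, W, k=4)
    print("kept channels:", kept, "reconstruction error:", err)
```

Because the search only accepts swaps that lower the error, the result is never worse than the magnitude-based starting point. Refitting each candidate from scratch costs a full least-squares solve per swap, which is the kind of expense that low-rank updates are meant to avoid.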

Keywords

* Artificial intelligence * Inference * One shot * Optimization * Perplexity * Pruning
