Summary of RandAR: Decoder-only Autoregressive Visual Generation in Random Orders, by Ziqi Pang et al.
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
by Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang
First submitted to arXiv on: 2 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract (available on arXiv) |
Medium | GrooveSquid.com (original content) | RandAR is a novel decoder-only visual autoregressive (AR) model that can generate an image’s tokens in arbitrary orders. Previous models rely on a predefined order, such as raster (row-by-row) order. RandAR removes this bias while achieving performance comparable to its conventional raster-order counterpart. Its key design inserts a position instruction token before each image token to be predicted; this token tells the model which spatial location it should fill next, which makes random-order prediction possible. Because it is trained on randomly permuted token sequences, RandAR gains new capabilities, including faster inference through parallel decoding with a KV-Cache. It also supports inpainting, outpainting, and resolution extrapolation in a zero-shot manner, that is, without task-specific training. The paper presents RandAR as a promising direction for decoder-only visual generation models, one that broadens their applications across diverse scenarios. |
Low | GrooveSquid.com (original content) | RandAR is a new kind of computer model that builds an image piece by piece, and it can add those pieces in any order. Other models must follow a fixed order, such as going row by row from the top-left corner. RandAR drops this rule and still performs just as well. It works by placing a special instruction before each image piece (called a token) that tells the model where the next piece goes, so it can fill in the image in any order. Because it is trained on shuffled orders, it can also fill in missing parts of an image, extend an image beyond its edges, and create images at a higher resolution than it was trained on, all without extra training. |
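
The random-order training described in the medium-difficulty summary can be sketched as follows. This is a minimal, hypothetical illustration, not the authors’ code. It assumes that image tokens are integer codebook ids and that each position instruction is encoded as the extra id `num_codes + position`; the summary does not say how the paper represents these tokens, so this id scheme is purely illustrative. The sketch also omits any class-conditioning token and uses the common `-100` "ignore" label. The helpers `random_order`, `build_random_order_sequence`, and `inpainting_order` are made-up names. `inpainting_order` shows one plausible way random-order training could support zero-shot inpainting: known tokens are placed first as a prefix, and only the rest would be generated. This is an assumption, not the paper’s documented procedure.

```python
import random
from typing import List, Sequence, Tuple

# Label meaning "no loss at this slot"; -100 is a common convention, assumed here.
IGNORE = -100


def random_order(n: int, rng: random.Random) -> List[int]:
    """Return a uniformly random permutation of the token positions 0..n-1."""
    order = list(range(n))
    rng.shuffle(order)
    return order


def build_random_order_sequence(
    image_tokens: Sequence[int], num_codes: int, order: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Interleave hypothetical position-instruction ids with image tokens.

    image_tokens: codebook ids of one image, in raster order.
    num_codes: codebook size; position p is encoded as id num_codes + p
        (a made-up vocabulary layout, for illustration only).
    order: a permutation of range(len(image_tokens)) giving the generation order.

    Returns (inputs, targets) for teacher forcing: targets[i] is what the model
    should output after reading inputs[: i + 1].
    """
    n = len(image_tokens)
    if sorted(order) != list(range(n)):
        raise ValueError("order must be a permutation of range(len(image_tokens))")
    inputs: List[int] = []
    targets: List[int] = []
    for p in order:
        inputs.append(num_codes + p)     # instruction: "the next token goes at position p"
        targets.append(image_tokens[p])  # after reading it, predict the token for p
        inputs.append(image_tokens[p])   # feed the true token back in
        targets.append(IGNORE)           # no prediction is scored from an image-token slot
    return inputs, targets


def inpainting_order(known: Sequence[bool], rng: random.Random) -> List[int]:
    """Hypothetical helper: known positions first (as a prefix), then unknown ones.

    Only the unknown positions would actually be sampled by the model.
    """
    known_pos = [i for i, k in enumerate(known) if k]
    unknown_pos = [i for i, k in enumerate(known) if not k]
    rng.shuffle(known_pos)
    rng.shuffle(unknown_pos)
    return known_pos + unknown_pos


if __name__ == "__main__":
    tokens = [7, 3, 5, 1]  # toy 2x2 "image" of codebook ids
    inputs, targets = build_random_order_sequence(tokens, num_codes=16, order=[2, 0, 3, 1])
    print(inputs)   # [18, 5, 16, 7, 19, 1, 17, 3]
    print(targets)  # [5, -100, 7, -100, 1, -100, 3, -100]
    # During training, a fresh permutation would be drawn for every image:
    print(random_order(len(tokens), random.Random(0)))
```

Only the slots that follow a position instruction carry a loss. This reflects the summary’s description: each instruction tells the model where the next token goes, and the model predicts what goes there, so any order drawn at training or inference time has the same form.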
Keywords
» Artificial intelligence » Autoregressive » Decoder » Image generation » Inference » Token » Zero shot