Summary of Accelerating Diffusion Transformers with Token-wise Feature Caching, by Chang Zou et al.


Accelerating Diffusion Transformers with Token-wise Feature Caching

by Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

First submitted to arXiv on: 5 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary — written by the paper authors
Read the original abstract here.

Medium Difficulty Summary — written by GrooveSquid.com (original content)
The proposed token-wise feature caching method addresses the computational cost of diffusion transformers by adaptively selecting the tokens most suitable for caching, while allowing the caching ratio to vary across neural layers. The approach is effective in both image and video synthesis, achieving acceleration ratios of up to 2.36× and 1.93× on PixArt-α and OpenSora respectively, with minimal impact on generation quality.

Low Difficulty Summary — written by GrooveSquid.com (original content)
A new way to make computer-generated images and videos uses a special type of artificial intelligence called a diffusion transformer. These models are good at creating realistic pictures and moving images, but they use a lot of computer power. To fix this problem, earlier methods store features from previous steps in the process and reuse them later. However, these methods don't take into account that different parts of the image or video benefit differently from these stored features. This paper introduces a new method, token-wise feature caching, that figures out which parts can safely reuse stored features and which parts need fresh computation. The results show that this approach is much faster than before, with almost no loss in quality.

Keywords

» Artificial intelligence  » Diffusion  » Token