Summary of Data Extrapolation for Text-to-image Generation on Small Datasets, by Senmao Ye et al.
Data Extrapolation for Text-to-image Generation on Small Datasets
by Senmao Ye, Fei Liu
First submitted to arXiv on: 2 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract in the paper |
Medium | GrooveSquid.com (original content) | The paper proposes a new approach to augmenting training data for text-to-image generation using linear extrapolation. Extrapolation is applied only to the text features, and new image data is retrieved from the internet with search engines. To keep the new text-image pairs reliable, two outlier detectors are designed to purify the retrieved images. The resulting augmented dataset is dozens of times larger than the original, yielding a significant improvement in text-to-image performance (a minimal sketch of the extrapolation step appears below this table). The paper also proposes NULL-guidance to refine score estimation and a recurrent affine transformation to fuse text information. The model achieves FID scores of 7.91, 9.52, and 5.00 on the CUB, Oxford, and COCO datasets, respectively. |
Low | GrooveSquid.com (original content) | The paper improves text-to-image generation by creating more training data with a method called linear extrapolation. The method takes the original text descriptions and retrieves new matching images from the internet. The new images are checked to make sure they are reliable before being used for training, which leads to much better text-to-image results. The researchers also introduce two techniques to improve their model: NULL-guidance and a recurrent affine transformation. |
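The augmentation step described above centers on linear extrapolation of text features. The snippet below is a minimal sketch, assuming a hypothetical extrapolation of the form t_new = t_i + λ(t_i − t_j) between two encoded text features; the actual text encoder, the choice of λ, and the search-engine retrieval and outlier-filtering steps follow the paper and are not reproduced here.

```python
import numpy as np

def extrapolate_text_feature(t_i: np.ndarray, t_j: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Linearly extrapolate a text feature beyond t_i, away from t_j.

    Hypothetical form: t_new = t_i + lam * (t_i - t_j).
    The paper applies extrapolation only to text features; matching images
    for the new features are then retrieved from the internet and purified
    with two outlier detectors (not shown here).
    """
    return t_i + lam * (t_i - t_j)

# Toy usage with random 256-d vectors standing in for real text-encoder outputs.
rng = np.random.default_rng(0)
t_i = rng.normal(size=256)
t_j = rng.normal(size=256)
t_new = extrapolate_text_feature(t_i, t_j, lam=0.5)
print(t_new.shape)  # (256,)
```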
Keywords
» Artificial intelligence » Image generation