Summary of Uniaa: a Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark, by Zhaokun Zhou et al.
UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark
by Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang
First submitted to arxiv on: 15 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary | 
|---|---|---|
| High | Paper authors | High Difficulty Summary Read the original abstract here | 
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The proposed Unified Multi-modal Image Aesthetic Assessment (UNIAA) framework aims to bridge the gap between human aesthetics and traditional IAA methods, which are often limited to a single data source or task. The framework includes a Multi-modal Large Language Model (MLLM) named UNIAA-LLaVA, trained on transformed existing datasets into unified and high-quality visual instruction tuning data. The UNIAA-Bench is a comprehensive benchmark consisting of three aesthetic levels: Perception, Description, and Assessment. Experiments validate the effectiveness and rationality of UNIAA, with UNIAA-LLaVA achieving competitive performance compared to existing MLLMs, including GPT-4V in aesthetic perception. | 
| Low | GrooveSquid.com (original content) | Low Difficulty Summary Image Aesthetic Assessment (IAA) is a crucial task in computer vision that helps determine the beauty or attractiveness of images. The traditional IAA methods are limited and don’t work well with different data sources or tasks. To solve this problem, researchers propose a new framework called Unified Multi-modal Image Aesthetic Assessment (UNIAA). This framework uses a special type of AI model called a Multi-modal Large Language Model (MLLM) to assess image aesthetics. The MLLM is trained on transformed existing datasets and performs well in assessing image beauty. The UNIAA framework also includes a benchmark that evaluates the performance of different AI models in assessing image aesthetics. | 
Keywords
* Artificial intelligence * Gpt * Instruction tuning * Large language model * Multi modal




