Summary of LL-ICM: Image Compression for Low-Level Machine Vision via Large Vision-Language Model, by Yuan Xue et al.
LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model
by Yuan Xue, Qi Zhang, Chuanmin Jia, Shiqi Wang
First submitted to arXiv on: 5 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Image Compression for Machines (ICM) addresses the need to compress images for machine vision tasks rather than for human viewing. Current work focuses primarily on high-level tasks such as object detection and semantic segmentation, neglecting the fact that real-world images are often degraded. Low-level machine vision models, such as image restoration models, can improve this quality, which makes compressing their inputs well equally important. The proposed LL-ICM framework jointly optimizes compression and low-level machine vision tasks: by mutually adapting the image codec and the low-level task models, it improves encoding ability while also improving downstream task performance. Integrating large-scale vision-language models further yields universal feature embeddings for low-level vision tasks, so a single LL-ICM codec can generalize across multiple tasks. The paper establishes a comprehensive benchmark using full- and no-reference image quality assessments and shows that LL-ICM achieves 22.65% BD-rate reductions over state-of-the-art methods. |
Low | GrooveSquid.com (original content) | Imagine taking pictures for machines to understand, not for people to look at. Most research focuses on big tasks like recognizing objects or understanding what’s happening in a scene. But real-world images can be poor quality, and compression can make things worse. To fix this, the authors developed a new way to compress images that also helps machines improve their results. The approach combines two tasks: compressing images and cleaning them up for machine vision. By doing both together, the method gets better at encoding images while boosting the performance of downstream tasks. A special technique makes one compressor work well across many different tasks. The results show that this approach reduces the amount of data needed by 22.65% compared to other methods. |
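The 22.65% figure above is a BD-rate (Bjøntegaard delta rate), the standard metric for comparing two codecs' rate-quality curves: it estimates the average percentage bitrate difference at equal quality. As a rough illustration of how such a number is computed, here is a minimal sketch of the usual cubic-polynomial BD-rate calculation; the rate/quality points are purely hypothetical and not taken from the paper.

```python
import numpy as np

def bd_rate(ref_rates, ref_quality, test_rates, test_quality):
    """Bjøntegaard delta rate: average % bitrate difference between two
    rate-quality curves. Negative means the test codec saves bits."""
    # Fit cubic polynomials of log-rate as a function of quality.
    ref_fit = np.polyfit(ref_quality, np.log(ref_rates), 3)
    test_fit = np.polyfit(test_quality, np.log(test_rates), 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(min(ref_quality), min(test_quality))
    hi = min(max(ref_quality), max(test_quality))
    ref_int = np.polyval(np.polyint(ref_fit), hi) - np.polyval(np.polyint(ref_fit), lo)
    test_int = np.polyval(np.polyint(test_fit), hi) - np.polyval(np.polyint(test_fit), lo)
    avg_log_diff = (test_int - ref_int) / (hi - lo)  # mean log-rate gap
    return (np.exp(avg_log_diff) - 1) * 100          # percent

# Hypothetical rate (bits per pixel) / quality (PSNR, dB) points:
# the test codec uses ~20% fewer bits at every quality level.
anchor_r = [0.10, 0.20, 0.40, 0.80]
anchor_q = [30.0, 33.0, 36.0, 39.0]
test_r   = [0.08, 0.16, 0.32, 0.64]
test_q   = [30.0, 33.0, 36.0, 39.0]
print(round(bd_rate(anchor_r, anchor_q, test_r, test_q), 2))  # → -20.0
```

A BD-rate of -22.65% therefore means that, averaged over the quality range, LL-ICM needs about 22.65% fewer bits than the compared state-of-the-art codecs to reach the same quality scores.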
Keywords
» Artificial intelligence » Object detection » Optimization » Semantic segmentation