Summary of MIA-DPO: Multi-Image Augmented Direct Preference Optimization for Large Vision-Language Models, by Ziyu Liu et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
by Ziyu Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Haodong Duan, Conghui He, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
First submitted to arXiv on: 23 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper presents Multi-Image Augmented Direct Preference Optimization (MIA-DPO), a visual preference-alignment approach that effectively handles multi-image inputs. By extending single-image data with unrelated images arranged in grid collages or picture-in-picture layouts, MIA-DPO mitigates the scarcity of diverse multi-image training data and reduces annotation costs. The method uses the model's attention values to identify and filter out rejected responses, achieving average performance boosts of 3.0% on LLaVA-v1.5 and 4.3% on InternLM-XC2.5. The approach is compatible with various architectures and outperforms existing methods.
Low | GrooveSquid.com (original content) | This paper tackles a machine-learning problem called visual preference alignment. It's about teaching a model which answers about pictures people prefer. Current methods work for one picture at a time but struggle when several pictures appear together. The new method, MIA-DPO, makes training easier and cheaper by padding single-picture examples with extra, unrelated pictures. It also uses the model's own attention to spot which answers about each picture should be rejected.
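The medium summary describes turning single-image samples into multi-image ones by tiling unrelated images into a grid collage. A minimal sketch of that augmentation step, assuming images are equally sized 2D arrays of pixel values (illustrative only; the function and parameter names are made up, and the paper works on real image tensors rather than nested lists):

```python
import math

def grid_collage(images, pad=0):
    """Tile equally sized images (2D lists of pixel values) into a
    near-square grid, in the spirit of MIA-DPO's grid-collage
    augmentation. Empty grid cells are filled with `pad`."""
    n = len(images)
    cols = math.ceil(math.sqrt(n))     # near-square layout
    rows = math.ceil(n / cols)
    h, w = len(images[0]), len(images[0][0])
    canvas = [[pad] * (cols * w) for _ in range(rows * h)]
    for idx, img in enumerate(images):
        r, c = divmod(idx, cols)       # grid cell for this image
        for y in range(h):
            for x in range(w):
                canvas[r * h + y][c * w + x] = img[y][x]
    return canvas

# Three 2x2 "images" become one 4x4 collage (one empty padded cell).
collage = grid_collage([[[1, 1], [1, 1]],
                        [[2, 2], [2, 2]],
                        [[3, 3], [3, 3]]])
```

Per the summary, questions and chosen/rejected answer pairs for the original single image can then be reused against the collage, sidestepping manual multi-image annotation.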
Keywords
» Artificial intelligence » Alignment » Attention » Machine learning » Optimization