
Summary of EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation, by Yongxin Wang et al.


EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

by Yongxin Wang, Meng Cao, Haokun Lin, Mingfei Han, Liang Ma, Jin Jiang, Yuhao Cheng, Xiaodan Liang

First submitted to arXiv on: 6 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed Enhancing Alignment in Multimodal Large Language Models via Critical Observation (EACO) approach aims to improve the performance and reduce hallucinations of multimodal large language models (MLLMs) by leveraging self-generated preference data. The method begins with collecting and refining a dataset for training a critical evaluation model, dubbed the Critic, which observes model responses across multiple dimensions and selects preferred and non-preferred outputs for refined Direct Preference Optimization (DPO) tuning. An additional supervised fine-tuning stage is employed to further enhance model performance. EACO achieves an 8.5% improvement over LLaVA-v1.6-Mistral-7B across multiple benchmarks, while reducing hallucinations by 65.6% on HallusionBench and improving reasoning ability by 21.8% on MME-Cognition.
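The pipeline described above can be sketched in a few lines: a critic scores each candidate response, the best- and worst-scoring responses become the chosen/rejected pair, and that pair feeds the standard DPO loss. This is a minimal illustrative sketch, not the paper's implementation; the function names, the toy critic interface, and the `beta=0.1` default are assumptions.

```python
import math

def select_preference_pair(responses, critic_score):
    """Score each candidate response with a critic function and return
    the best-scoring one as 'chosen' and the worst as 'rejected'.
    (Illustrative stand-in for EACO's multi-dimension Critic.)"""
    scored = sorted(responses, key=critic_score, reverse=True)
    return scored[0], scored[-1]

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))),
    where log-probs come from the policy and a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy usage: a dict-lookup "critic" picks the pair for DPO tuning.
scores = {"a": 2.0, "b": 5.0, "c": 1.0}
chosen, rejected = select_preference_pair(["a", "b", "c"], lambda r: scores[r])
```

With a zero margin the loss is log 2 ≈ 0.693, and it shrinks as the policy assigns relatively more probability to the chosen response than the reference model does.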
Low Difficulty Summary (written by GrooveSquid.com, original content)
Multimodal large language models have made significant progress in visual question answering and reasoning tasks through instruction fine-tuning on task-specific datasets. They can also learn from human-annotated preference data to improve their reasoning abilities and reduce hallucinations, but such annotation is costly to collect at scale. Researchers propose a new approach called Enhancing Alignment in Multimodal Large Language Models via Critical Observation (EACO) that aligns MLLMs using self-generated preference data, without requiring high-quality critical labels. The method improves model performance and reduces hallucinations.

Keywords

» Artificial intelligence  » Alignment  » Fine tuning  » Optimization  » Question answering  » Supervised