Summary of Mitigating Hallucination in Multimodal Large Language Model Via Hallucination-targeted Direct Preference Optimization, by Yuhan Fu et al.
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
by Yuhan Fu, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Xirong Li
First submitted to arXiv on: 15 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The proposed Hallucination-targeted Direct Preference Optimization (HDPO) aims to reduce hallucinations in Multimodal Large Language Models (MLLMs). Unlike previous approaches, HDPO addresses hallucinations according to their diverse forms and underlying causes. Specifically, it constructs three types of preference-pair data targeting insufficient visual capabilities, long-context generation, and multimodal conflicts. Experimental results demonstrate superior performance across multiple hallucination evaluation datasets, surpassing most state-of-the-art methods. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary To reduce hallucinations in MLLMs, a new method called Hallucination-targeted Direct Preference Optimization (HDPO) is introduced. This approach tackles hallucinations from their diverse forms and causes. The method develops three types of preference pair data targeting insufficient visual capabilities, long context generation, and multimodal conflicts. HDPO shows superior performance across multiple datasets. |
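HDPO builds on Direct Preference Optimization, which trains a model to prefer a "chosen" response (here, a non-hallucinated one) over a "rejected" one. As a rough illustration only, not the authors' code, a minimal sketch of the standard DPO loss on a single preference pair (given summed token log-probabilities from the policy and a frozen reference model) might look like:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen (preferred,
    non-hallucinated) and rejected (hallucinated) responses under the
    policy and a frozen reference model. `beta` scales the implicit reward.
    """
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # Loss = -log(sigmoid(margin)): small when the policy (relative to the
    # reference) assigns higher probability to the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In HDPO's setting, the rejected responses would come from the three targeted hallucination sources (weak visual grounding, long-context generation, multimodal conflicts); the illustrative function names and values above are hypothetical.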
Keywords
» Artificial intelligence » Hallucination » Optimization