Summary of NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models, by Kai Wu et al.
NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models
by Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang
First submitted to arxiv on: 30 May 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract (available on arXiv) |
Medium | GrooveSquid.com (original content) | This paper proposes NoiseBoost, a simple and broadly applicable method for alleviating hallucinations in multimodal large language models (MLLMs). Hallucinations tend to appear when MLLMs generate long, detailed image descriptions, because the model increasingly neglects the visual information. The authors' analysis attributes this to the inherent summarization mechanism of large language models. NoiseBoost injects noise perturbations into features to regularize the attention weights between visual and linguistic tokens. The approach improves MLLM performance under common training strategies, including supervised fine-tuning and reinforcement learning. NoiseBoost also enables semi-supervised learning for MLLMs, making it possible to benefit from unlabeled data. Comprehensive experiments show that it improves dense caption accuracy by 8.1% under human evaluation and achieves comparable results with 50% less labeled data. |
Low | GrooveSquid.com (original content) | This paper helps computers better understand pictures. Computers can write long descriptions about what's in a picture, but sometimes they get it wrong and mention things that aren't really there. The researchers found out why this happens and came up with a simple fix called NoiseBoost. It makes the computer pay more attention to the actual picture instead of relying only on words it learned from other texts. This helps the computer write better descriptions and even learn from pictures without needing as many labeled examples. |
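The Medium summary says NoiseBoost injects noise perturbations into features to rebalance attention between visual and linguistic tokens. The exact noise type, magnitude, and injection point are not given in these summaries, so the following is only a minimal Python sketch under stated assumptions: zero-mean Gaussian noise is added to visual token features during training, while text tokens and inference are left unchanged. The function name `perturb_visual_tokens` and the `noise_std` value are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def perturb_visual_tokens(
    tokens: Sequence[Vector],
    is_visual: Sequence[bool],
    noise_std: float = 0.1,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """Return a copy of `tokens` with Gaussian noise added to visual tokens.

    Hypothetical helper: additive zero-mean Gaussian noise, the default
    `noise_std`, and perturbing only visual tokens are illustrative
    assumptions, not the paper's confirmed recipe.
    """
    if len(tokens) != len(is_visual):
        raise ValueError("tokens and is_visual must have the same length")
    if noise_std < 0:
        raise ValueError("noise_std must be non-negative")
    if not training or noise_std == 0.0:
        # Inference (or zero noise): features pass through unchanged.
        return [list(feat) for feat in tokens]
    rng = rng if rng is not None else random.Random()
    out: List[Vector] = []
    for feat, visual in zip(tokens, is_visual):
        if visual:
            # Perturb each feature dimension of a visual token independently.
            out.append([x + rng.gauss(0.0, noise_std) for x in feat])
        else:
            # Linguistic tokens are copied without noise.
            out.append(list(feat))
    return out


# Usage: two visual tokens followed by one text token, with 2-D features.
features = [[0.5, -0.2], [1.0, 0.0], [0.3, 0.3]]
visual_mask = [True, True, False]
noisy = perturb_visual_tokens(features, visual_mask, noise_std=0.05, rng=random.Random(0))
clean = perturb_visual_tokens(features, visual_mask, training=False)
```

With `training=False` the helper is an identity map, so the perturbation acts purely as a training-time regularizer. This mirrors how noise-based regularizers such as dropout are usually switched off at inference; whether NoiseBoost does the same is an assumption of this sketch.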
Keywords
» Artificial intelligence » Attention » Fine-tuning » Reinforcement learning » Semi-supervised » Summarization » Supervised