
Summary of IIU: Independent Inference Units for Knowledge-based Visual Question Answering, by Yili Li et al.


IIU: Independent Inference Units for Knowledge-based Visual Question Answering

by Yili Li, Jing Yu, Keke Gai, Gang Xiong

First submitted to arXiv on: 15 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
This paper proposes Independent Inference Units (IIU) to improve knowledge-based visual question answering. Existing methods focus on modeling correlations between multimodal clues, but these methods lack interpretability and generalization ability. IIU decomposes intra-modal information into functionally independent units, each of which processes one semantic-specific clue. The model also maintains a memory update module to reduce the impact of redundant information. Compared with existing non-pretrained multi-modal reasoning models on standard datasets, the IIU model achieves a new state of the art, improving performance by 3%, and it also surpasses basic pretrained multi-modal models. By disentangling intra-modal clues and reasoning units, the approach provides explainable reasoning evidence.
Low GrooveSquid.com (original content) Low Difficulty Summary
This paper helps computers answer questions about pictures. Current methods are good at finding connections between different parts of the picture, but their reasoning is hard to follow and they don’t always work well with new data. The new method, called Independent Inference Units (IIU), breaks down the information in each part of the picture into separate units that each process their own piece. This helps computers better understand what’s happening in the picture and provide more accurate answers. The results show that this approach works well and can even beat some models that were pretrained on lots of data.
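
The decomposition described in the medium difficulty summary can be illustrated with a minimal sketch. The code below is not the authors' implementation: the class `InferenceUnit`, the functions `update_memory` and `run`, the tanh transform, the similarity-weighted memory update, and the `gate` value are all assumptions chosen for illustration. It only shows the general idea of giving each clue its own unit with private parameters and folding the unit outputs into a shared memory.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class InferenceUnit:
    """Hypothetical unit with its own weight matrix (no sharing across units)."""

    def __init__(self, dim, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def process(self, clue, memory):
        # Assumed transform: tanh(W @ (clue + memory)).
        inputs = [c + m for c, m in zip(clue, memory)]
        return [math.tanh(dot(row, inputs)) for row in self.w]


def update_memory(memory, unit_outputs, gate=0.5):
    """Assumed update: outputs more similar to the memory get larger weights."""
    weights = softmax([dot(memory, out) for out in unit_outputs])
    summary = [
        sum(w * out[i] for w, out in zip(weights, unit_outputs))
        for i in range(len(memory))
    ]
    # Blend the old memory with the weighted summary of unit outputs.
    return [(1 - gate) * m + gate * s for m, s in zip(memory, summary)]


def run(clues, memory, steps=2, seed=0):
    """Process each clue with its own unit, then update the shared memory."""
    rng = random.Random(seed)
    units = [InferenceUnit(len(memory), rng) for _ in clues]
    outputs = []
    for _ in range(steps):
        outputs = [unit.process(clue, memory) for unit, clue in zip(units, clues)]
        memory = update_memory(memory, outputs)
    return memory, outputs


if __name__ == "__main__":
    # Two toy "clues" standing in for semantic-specific features.
    toy_clues = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    final_memory, unit_outputs = run(toy_clues, [0.0, 0.0, 0.0])
    print(final_memory)
```

Because each unit keeps its own parameters, its output can be inspected separately from the others, which is the intuition behind the interpretability claim in the summary. How the paper actually defines its units and memory update should be taken from the original abstract and paper.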

Keywords

» Artificial intelligence  » Generalization  » Inference  » Multi modal  » Question answering