Summary of TLDR: Token-Level Detective Reward Model for Large Vision Language Models, by Deqing Fu et al.
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
by Deqing Fu, Tong Xiao, Rui Wang, Wang Zhu, Pengchuan Zhang, Guan Pang, Robin Jia, Lawrence Chen
First submitted to arXiv on: 7 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | This paper proposes TLDR (Token-Level Detective Reward Model), a reward model that provides fine-grained annotations for each text token generated by multimodal large language models. Existing reward models are binary: they assign a single score to an entire response, which fails to capture the complexity of human feedback and can encourage implicit biases toward text and weaker grounding in images. To address this, TLDR is trained with a perturbation-based method that generates synthetic hard negatives with token-level labels, yielding more diverse and challenging training data (see the sketches after this table). The authors demonstrate that TLDR helps off-the-shelf models self-correct their generations and serves as a hallucination evaluation tool. They also show that TLDR can significantly improve the base model’s performance and speed up human annotation of high-quality vision-language data by a factor of three. |
| Low | GrooveSquid.com (original content) | This paper tries to make AI better at understanding text and images together. Right now, the way we teach AI what’s good or bad is too simple: a whole answer gets just one yes-or-no score. That isn’t enough, and AI can pick up biases from such coarse feedback. To fix this, the authors propose a new way to give AI feedback, called TLDR. It scores every word in a text, helping AI understand what makes a good or bad description of an image. The authors show that TLDR makes AI better at generating descriptions and at evaluating how well AI does on this task. They also find that TLDR helps humans label data faster and more accurately. |
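
To make the medium summary's "perturbation-based method" more concrete, here is a minimal sketch of what generating a synthetic hard negative with token-level labels might look like. The swap dictionaries, function name, and perturbation rules are illustrative assumptions for a toy example, not the authors' actual implementation:

```python
import random

# Hypothetical object/attribute vocabularies used to swap in plausible but
# wrong tokens; the paper's real perturbation rules are more sophisticated.
OBJECT_SWAPS = {"dog": "cat", "car": "truck", "apple": "orange"}
COLOR_SWAPS = {"red": "blue", "green": "yellow", "black": "white"}

def make_hard_negative(caption: str, swap_prob: float = 0.3):
    """Perturb a ground-truth caption into a synthetic hard negative.

    Returns the perturbed token list and a parallel list of token-level
    labels: 1 for tokens left intact, 0 for tokens that were corrupted.
    """
    tokens = caption.split()
    labels = [1] * len(tokens)
    for i, tok in enumerate(tokens):
        low = tok.lower()
        swaps = OBJECT_SWAPS if low in OBJECT_SWAPS else COLOR_SWAPS
        if low in swaps and random.random() < swap_prob:
            tokens[i] = swaps[low]   # plausible but image-inconsistent token
            labels[i] = 0            # mark this token as corrupted
    return tokens, labels

tokens, labels = make_hard_negative("a red apple next to a dog")
print(list(zip(tokens, labels)))
# e.g. [('a', 1), ('blue', 0), ('apple', 1), ..., ('cat', 0)]
```

The point of such perturbations is that the negative stays fluent and plausible as text, so a reward model can only detect the corrupted tokens by actually grounding them in the image.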
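The summary also mentions using TLDR to help models self-correct and to evaluate hallucination. Below is a hedged sketch of how per-token reward scores might be used downstream; the threshold, scores, and helper function are hypothetical, not the paper's interface:

```python
def flag_hallucinated_tokens(tokens, token_rewards, threshold=0.5):
    """Return tokens a token-level reward model scores below threshold,
    i.e. candidates for the generator to revise in a self-correction pass."""
    return [(i, tok) for i, (tok, r) in enumerate(zip(tokens, token_rewards))
            if r < threshold]

tokens = ["a", "blue", "apple", "on", "the", "table"]
rewards = [0.97, 0.12, 0.91, 0.95, 0.96, 0.93]  # hypothetical per-token scores
print(flag_hallucinated_tokens(tokens, rewards))
# [(1, 'blue')] -> the model could be prompted to regenerate this span
```

Because the reward is per token rather than per response, the same scores can double as a fine-grained hallucination metric and as guidance for human annotators, which is what enables the reported speedup in labeling.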
Keywords
» Artificial intelligence » Grounding » Hallucination » Token