


Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

by Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong Liu

First submitted to arXiv on: 25 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high difficulty version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes MMAIP-V, a high-quality video-text preference dataset that addresses the difficulty of obtaining video question-answering (VQA) preference data. The dataset is constructed by sampling from a set of response distributions and evaluating the responses with an external scoring function. To exploit the preference knowledge in MMAIP-V, the authors propose Iter-W2S-RLAIF, a framework that iteratively updates the reference model and performs parameter extrapolation to strengthen MLLMs’ alignment capabilities. The paper also proposes an unbiased, information-complete evaluation scheme for VQA.
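The two mechanisms named above, scoring sampled responses into preference pairs and extrapolating model parameters past a fine-tuned checkpoint, can be sketched in a few lines. This is an illustration only: the function names, the best-vs-worst pairing rule, and the extrapolation coefficient are assumptions, not details taken from the paper.

```python
def build_preference_pair(responses, score):
    """Rank sampled responses with an external scoring function and
    return (chosen, rejected): the highest- and lowest-scored ones.
    Hypothetical sketch of a scoring-based preference-pair construction."""
    ranked = sorted(responses, key=score, reverse=True)
    return ranked[0], ranked[-1]


def extrapolate(ref_params, tuned_params, alpha=1.5):
    """Move each weight past the tuned value along (tuned - ref).

    alpha = 1.0 recovers the tuned model; alpha > 1.0 extrapolates
    beyond it. The value 1.5 is an arbitrary illustration.
    """
    return {name: ref + alpha * (tuned_params[name] - ref)
            for name, ref in ref_params.items()}
```

For example, `build_preference_pair(["a", "bb", "ccc"], len)` returns `("ccc", "a")`, and extrapolating from `{"w": 0.0}` toward `{"w": 1.0}` with `alpha=1.5` gives `{"w": 1.5}`.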
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper creates a new video-text preference dataset, which is important for aligning Multimodal Large Language Models (MLLMs) properly. The authors then develop a way to use this data to make MLLMs better at following human preferences. The paper also lays out a plan for checking whether the dataset and method work well.

Keywords

* Artificial intelligence
* Alignment