Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
by Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong Liu
First submitted to arXiv on: 25 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes MMAIP-V, a high-quality video-text preference dataset built to address the difficulty of obtaining video question answering (VQA) preference data. The dataset is constructed by sampling from a set of response distributions and evaluating the responses with an external scoring function. To exploit the preference knowledge in MMAIP-V, the authors propose Iter-W2S-RLAIF, a framework that iteratively updates the reference model and performs parameter extrapolation to strengthen the alignment capabilities of multimodal large language models (MLLMs). The paper also proposes an unbiased and information-complete evaluation scheme for VQA. |
| Low | GrooveSquid.com (original content) | The paper creates a new video-text preference dataset, which Multimodal Large Language Models (MLLMs) need in order to align properly. The authors then develop a way to use this data to make MLLMs better at following preferences. The paper also includes a plan for checking whether the dataset and method work well. |
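The "parameter extrapolation" step mentioned in the medium summary can be pictured as pushing the model's weights further along the direction of the preference-tuning update. The sketch below is an illustration only, assuming a common linear weight-extrapolation formula; the function name, the formula, and the `alpha` value are assumptions, not taken from the paper.

```python
def extrapolate(ref_weights, tuned_weights, alpha=1.5):
    """Move past the tuned weights along the update direction:
    ref + alpha * (tuned - ref). With alpha > 1 this extrapolates
    beyond the tuned model; the result can then serve as the
    reference model for the next iteration."""
    return {name: ref_weights[name] + alpha * (tuned_weights[name] - ref_weights[name])
            for name in ref_weights}

# Toy example with scalar "weights" standing in for parameter tensors.
ref = {"w1": 0.0, "w2": 1.0}
tuned = {"w1": 1.0, "w2": 1.5}
new_ref = extrapolate(ref, tuned, alpha=1.5)
# w1: 0.0 + 1.5 * (1.0 - 0.0) = 1.5
# w2: 1.0 + 1.5 * (1.5 - 1.0) = 1.75
```

In practice such a step would operate on each tensor in a model's weight dictionary rather than on scalars, but the arithmetic is the same per parameter.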
Keywords
* Artificial intelligence
* Alignment