Summary of Token-level Proximal Policy Optimization for Query Generation, by Yichen Ouyang et al.
Token-level Proximal Policy Optimization for Query Generation
by Yichen Ouyang, Lu Wang, Fangkai Yang, Pu Zhao, Chenghua Huang, Jianfeng Liu, Bochen Pang, Yaming Yang, Yuefeng Zhan, Hao Sun, Qingwei Lin, Saravan Rajmohan, Weiwei Deng, Dongmei Zhang, Feng Sun, Qi Zhang
First submitted to arXiv on: 1 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to query generation that leverages Large Language Models (LLMs) to infer user intent from web search interaction history. The proposed Token-level Proximal Policy Optimization (TPPO) method fine-tunes LLMs to generate higher-quality queries. TPPO combines a token-level reward model with a proximal policy optimization module to address the sparse-reward challenge in Reinforcement Learning from AI Feedback (RLAIF) frameworks (a rough sketch of this per-token idea appears after the table). Experiments on both open-source and industrial datasets show that TPPO significantly improves query generation performance and outperforms existing methods. |
Low | GrooveSquid.com (original content) | Query generation is important for search engines and recommendation systems. Researchers have used Large Language Models to improve this task, but these models still struggle to generate good queries. The solution proposed in this paper, called Token-level Proximal Policy Optimization (TPPO), helps the models learn by giving them rewards when they do a good job. This new approach works better than previous methods. |
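To make the "token-level reward" idea from the medium summary more concrete, here is a minimal sketch. This is not the paper's code: it assumes per-token rewards are already available from some token-level reward model, and all function names, shapes, and hyperparameters are illustrative only. It contrasts with standard RLAIF setups where a single sparse reward arrives only at the end of the generated query.

```python
# Minimal, illustrative sketch of a token-level PPO-style objective.
# NOT the paper's implementation; shapes and hyperparameters are assumptions.
import numpy as np

def token_level_ppo_loss(new_logprobs, old_logprobs, token_rewards,
                         gamma=0.99, clip_eps=0.2):
    """Clipped PPO surrogate computed per generated token.

    new_logprobs, old_logprobs: log-probs of the sampled tokens under the
        current and behavior policies, shape (T,).
    token_rewards: per-token rewards from a (hypothetical) token-level
        reward model, shape (T,).
    """
    T = len(token_rewards)
    # Discounted return-to-go per token; a learned value baseline could be
    # subtracted here, omitted to keep the sketch short.
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = token_rewards[t] + gamma * running
        advantages[t] = running
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    ratio = np.exp(new_logprobs - old_logprobs)           # importance ratio per token
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)  # PPO clipping
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Toy usage: a 5-token query where dense, per-token feedback is available.
rng = np.random.default_rng(0)
old_lp = rng.normal(-2.0, 0.3, size=5)
new_lp = old_lp + rng.normal(0.0, 0.05, size=5)
rewards = np.array([0.1, 0.3, -0.2, 0.4, 0.5])
print(token_level_ppo_loss(new_lp, old_lp, rewards))
```

Because every token carries its own reward, the advantage signal is non-zero throughout the sequence rather than only at the final token, which is the intuition behind how a token-level reward model can mitigate the sparse-reward problem described in the summary.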
Keywords
* Artificial intelligence
* Optimization
* Reinforcement learning
* Token