Summary of Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning, by Dharmendra Prajapat et al.
Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning
by Dharmendra Prajapat, Durga Toshniwal
First submitted to arXiv on: 8 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed task-oriented dialogue (TOD) system leverages a pre-trained large language model, GPT-2, for end-to-end task completion. The model is fine-tuned with both supervised learning and reinforcement learning (RL) to address issues such as exposure bias and token loss. A non-differentiable reward function is designed to balance the success rate and the BLEU evaluation metric, guiding the model toward completing user tasks while generating coherent responses. The model is trained on dialogue-session data comprising the user utterance, belief state, system act, and system response. On MultiWOZ2.1, the approach improves the inform rate by 1.60% and the success rate by 3.17% over the baseline (a rough sketch of the RL fine-tuning idea follows this table). |
Low | GrooveSquid.com (original content) | A TOD system helps users complete tasks through conversation. Current approaches use large language models such as GPT-2, but this leads to problems like exposure bias and token loss. To address these issues, the new method combines supervised learning with reinforcement learning. A special reward function balances task success with how closely the responses match what was expected, helping the model hold clear, coherent conversations that complete the user's tasks. Tested on the MultiWOZ2.1 dataset, the model showed significant improvements. |
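
The medium summary describes fine-tuning GPT-2 with a non-differentiable reward that mixes task success and BLEU. Below is a minimal REINFORCE-style sketch of that general idea, not the authors' implementation: the `task_success` and `bleu` scorers, the 0.5 weighting, and the sampling settings are all assumptions for illustration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def combined_reward(response, reference, goal, alpha=0.5):
    # Non-differentiable scalar reward; the weighting and the two scorers
    # (task_success, bleu) are assumed placeholders, not the paper's exact definitions.
    return alpha * task_success(response, goal) + (1 - alpha) * bleu(response, reference)

def reinforce_step(context, reference, goal):
    # Sample a system response from the current GPT-2 policy.
    input_ids = tokenizer(context, return_tensors="pt").input_ids
    sampled = model.generate(input_ids, do_sample=True, max_new_tokens=60,
                             pad_token_id=tokenizer.eos_token_id)
    response_ids = sampled[:, input_ids.shape[1]:]
    response = tokenizer.decode(response_ids[0], skip_special_tokens=True)

    # Score the full response with the non-differentiable reward.
    reward = combined_reward(response, reference, goal)

    # Log-probabilities the policy assigned to the sampled response tokens.
    logits = model(sampled).logits[:, input_ids.shape[1] - 1:-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)

    # REINFORCE: increase the likelihood of sampled responses in proportion to their reward.
    loss = -(reward * token_log_probs.sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return response, reward
```

Because the reward is computed on the decoded text, it can combine any evaluation signals (task success, BLEU, etc.) without needing to be differentiable; the gradient flows only through the log-probabilities of the sampled tokens.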
Keywords
» Artificial intelligence » BLEU » Fine-tuning » GPT » Reinforcement learning » Supervised » Token