Summary of Improving Multi-domain Task-oriented Dialogue System with Offline Reinforcement Learning, by Dharmendra Prajapat et al.


Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning

by Dharmendra Prajapat, Durga Toshniwal

First submitted to arXiv on: 8 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty: the medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read whichever version suits you best!

High Difficulty Summary (the paper’s original abstract, written by the authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The proposed task-oriented dialogue (TOD) system leverages a pre-trained large language model, GPT-2, for end-to-end task completion. The model is fine-tuned with both supervised learning and reinforcement learning (RL) to address issues such as exposure bias and token loss. A non-differentiable reward function is designed to balance the success rate with the BLEU evaluation metric, guiding the model to complete user tasks while generating coherent responses. Training uses dialogue-session data comprising the user utterance, belief state, system act, and system response. On MultiWOZ2.1, the approach improves the inform rate by 1.60% and the success rate by 3.17% over the baseline. (A rough code sketch of this reward-weighted fine-tuning idea follows the summaries below.)

Low Difficulty Summary (original content by GrooveSquid.com)
A TOD system helps users complete tasks through conversation. The approach here uses a large language model, GPT-2, but training it with supervised learning alone leads to problems such as exposure bias and token loss. To address these issues, the new method combines supervised learning with reinforcement learning. A special reward function balances task success against how closely the responses match the expected ones, helping the model hold clear, easy-to-understand conversations that actually complete the user's tasks. Tested on the MultiWOZ2.1 dataset, the model showed improvements over the baseline.
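
To make the training recipe above concrete, here is a minimal sketch (not the authors' code) of reward-weighted RL fine-tuning of GPT-2: the policy samples a system response, a non-differentiable reward mixes a task-success signal with sentence-level BLEU against the reference response, and a REINFORCE-style loss scales the sampled tokens' log-likelihood by that reward. The weight alpha, the task_success signal, and the function names are illustrative assumptions, not details taken from the paper.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer
    from sacrebleu.metrics import BLEU

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    bleu = BLEU(effective_order=True)
    alpha = 0.5  # assumed weight between the success and BLEU terms

    def reward_fn(generated, reference, task_success):
        # Non-differentiable reward: weighted mix of task success and sentence-level BLEU.
        bleu_score = bleu.sentence_score(generated, [reference]).score / 100.0
        return alpha * task_success + (1.0 - alpha) * bleu_score

    def reinforce_step(context, reference, task_success):
        # One REINFORCE-style update: sample a response, score it, and weight
        # the sampled tokens' log-probabilities by the scalar reward.
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        sampled = model.generate(ctx_ids, do_sample=True, top_p=0.9,
                                 max_new_tokens=60,
                                 pad_token_id=tokenizer.eos_token_id)
        resp_ids = sampled[:, ctx_ids.shape[1]:]
        generated = tokenizer.decode(resp_ids[0], skip_special_tokens=True)
        reward = reward_fn(generated, reference, task_success)

        # Recompute log-probs of the sampled response tokens with gradients enabled.
        full = torch.cat([ctx_ids, resp_ids], dim=1)
        logits = model(full).logits[:, ctx_ids.shape[1] - 1:-1, :]
        log_probs = torch.log_softmax(logits, dim=-1)
        token_logp = log_probs.gather(-1, resp_ids.unsqueeze(-1)).squeeze(-1)

        loss = -reward * token_logp.mean()  # policy-gradient surrogate loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return reward

In practice one would likely subtract a baseline from the reward to reduce variance and alternate this RL objective with the supervised cross-entropy loss, in line with the combined supervised-plus-RL fine-tuning the paper describes.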

Keywords

» Artificial intelligence  » Bleu  » Fine tuning  » Gpt  » Reinforcement learning  » Supervised  » Token