Pise, Nitin
Unknown Affiliation

Published: 1 Document

Optimizing dialog policy with large action spaces using deep reinforcement learning Thakkar, Manisha; Pise, Nitin
Indonesian Journal of Electrical Engineering and Computer Science Vol 36, No 1: October 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v36.i1.pp428-440

Abstract

The dialogue policy is responsible for selecting the next appropriate action from the current dialogue state so as to accomplish the user's goal efficiently. Present commercial task-oriented dialogue systems are mostly rule-based and thus do not scale easily to multiple domains. To design an adaptive dialogue policy, user feedback is an essential parameter. Recently, deep reinforcement learning algorithms have been widely applied to such problems. However, managing a large state-action space is time-consuming and computationally expensive. Additionally, training the dialogue policy requires a good-quality, reliable user simulator, which takes additional design effort. In this paper, we propose a novel approach that improves dialogue-policy performance by accelerating training, using imitation learning together with deep reinforcement learning. We use the proximal policy optimization (PPO) algorithm to model the dialogue policy on MultiWOZ 2.1, a large-scale multi-domain tourist dataset. We observed a remarkable performance of the dialogue policy: a 91.8% task success rate and an approximately 50% decrease in the average number of turns required to complete tasks, without using a user simulator in the early training cycles. This approach is expected to help researchers design computationally efficient and scalable dialogue agents by avoiding training from scratch.
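The two-stage idea described in the abstract can be illustrated with a minimal sketch: warm-start a policy by imitation learning (behavior cloning on expert state-action pairs), then fine-tune it with a PPO-style clipped surrogate objective. Everything below is hypothetical: the linear softmax policy, the synthetic "expert" dialogues, and the stand-in advantage estimates are illustrative assumptions, not the authors' actual model or data.

```python
import numpy as np

# Hypothetical sketch of the training recipe: (1) behavior cloning on
# expert (state, action) pairs, (2) a PPO-style clipped objective.
# Dimensions, data, and the expert policy are all synthetic.

rng = np.random.default_rng(0)
S, A = 8, 4  # state features, dialogue actions (illustrative sizes)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# --- Stage 1: imitation learning from "expert" dialogues -------------
W_expert = rng.normal(size=(S, A))               # hidden expert policy
states = rng.normal(size=(500, S))
expert_actions = softmax(states @ W_expert).argmax(axis=1)

W = np.zeros((S, A))                             # policy being trained
for _ in range(300):                             # cross-entropy gradient descent
    probs = softmax(states @ W)
    onehot = np.eye(A)[expert_actions]
    W -= 0.1 * states.T @ (probs - onehot) / len(states)

bc_acc = (softmax(states @ W).argmax(axis=1) == expert_actions).mean()

# --- Stage 2: PPO-style clipped surrogate objective ------------------
eps = 0.2                                        # PPO clip range
old_probs = softmax(states @ W)                  # snapshot of the warm-started policy
acts = np.array([rng.choice(A, p=p) for p in old_probs])
adv = rng.normal(size=len(states))               # stand-in advantage estimates

new_probs = softmax(states @ W)                  # no update yet, so ratio == 1 here
idx = np.arange(len(acts))
ratio = new_probs[idx, acts] / old_probs[idx, acts]
clipped = np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)
ppo_objective = clipped.mean()                   # quantity PPO would maximize

print(f"behavior-cloning accuracy: {bc_acc:.2f}")
```

The point of the warm start is that the PPO stage begins from a policy that already imitates plausible dialogue actions, so early training does not depend on rollouts against a user simulator.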