Estimation Method of User Satisfaction Using N-gram-based Dialog History Model for Spoken Dialog System. Sunao Hara, Norihide Kitaoka, Kazuya Takeda {naoh, kitaoka, kazuya.takeda}@nagoya-u.jp. Graduate School of Information Science, Nagoya University, Japan. LREC2010: O3 - Dialogue and Evaluation
Transcript
Estimation Method of User Satisfaction Using N-gram-based Dialog History Model
for Spoken Dialog System
Sunao Hara, Norihide Kitaoka, Kazuya Takeda
{naoh, kitaoka, kazuya.takeda}@nagoya-u.jp
Graduate School of Information Science, Nagoya University, Japan
LREC2010: O3 - Dialogue and Evaluation
Introduction
• The aim of this study
  – Construct an estimation model of user satisfaction for spoken dialog systems (SDSs) based on data from real PC environments
• Experiment
  – Field experiment using an SDS for a music retrieval application
  – Construct and evaluate an estimation model for user satisfaction using an N-gram history model
May 19, 2010. LREC2010: Sunao HARA et al., Nagoya Univ., Japan.
Background (1/2)
• Use of speech input applications (e.g. Skype) by PC users is spreading
  – More users may use spoken dialog systems (SDSs) via the Internet
• The acoustic properties of PC environments differ among users
  – e.g. microphones, noise conditions, etc.
• From a practical application standpoint
  – Evaluation and prediction of system performance (user satisfaction) are also important issues
Collect speech under realistic PC environments
Build an estimation model for user satisfaction
Background (2/2)
• Evaluation using automatically measured metrics
  – Tune the system parameters at the design stage
  – Select the best dialog strategy for SDS applications
  – PARADISE framework [Walker, et al. 1997]
• Detection of problematic dialogs for call center Interactive Voice Response (IVR) systems
  – Detect that “the conversation will break down” as soon as possible
  – Problematic dialog predictor using the SLU-success feature [Walker, et al. 2002]
  – N-gram-based call quality monitoring system [Kim 2007]
Can we estimate the user satisfaction of an SDS by modeling the dialog context?
SLU: Spoken Language Understanding
MusicNavi2 database
• Field experiment using a music retrieval system with a spoken dialog interface
  1. Download the system through the Internet
  2. Use it for a certain period
  3. Fill in questionnaires on the web page
• Music retrieval system: MusicNavi2
  – “Music retrieval application” + “Spoken dialog interface”
  – A spoken dialog interface for retrieving and playing songs stored on the user’s PC
  – Can collect speech data in cooperation with a server program via the Internet
Modeling method for the dialog context
• The dialog management of an SDS is designed by a dialog developer
  – The management is not always satisfactory for users
• Assume that satisfaction appears in the dialog context
• Statistically learn the naturalness of the dialog
  – Use an N-gram to model the dialog context
  – Construct a model for each class of users
  – Estimate an unknown user’s satisfaction based on the likelihood of the N-gram models
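The modeling idea on this slide can be illustrated with a minimal sketch: train one N-gram model per user class on sequences of dialog acts, then assign an unseen dialog to the class whose model gives it the highest likelihood. Everything specific here is an assumption made for illustration and is not taken from the slides: the dialog-act labels (`S_ASK`, `U_REQUEST`, ...), the two class names, the toy training dialogs, and the use of add-one smoothing.

```python
import math
from collections import Counter

BOS = "<s>"  # padding symbol marking the start of a dialog (assumption)

def ngrams(seq, n):
    """Yield n-grams (as tuples) from a dialog-act sequence, left-padded with BOS."""
    padded = [BOS] * (n - 1) + list(seq)
    for i in range(len(seq)):
        yield tuple(padded[i:i + n])

class NgramModel:
    """N-gram model over dialog-act labels with add-one smoothing (assumed, not from the slides)."""

    def __init__(self, n, vocab):
        self.n = n
        self.vocab_size = len(vocab)
        self.ngram_counts = Counter()
        self.context_counts = Counter()

    def fit(self, dialogs):
        for dialog in dialogs:
            for gram in ngrams(dialog, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        return self

    def log_likelihood(self, dialog):
        """Sum of smoothed log-probabilities of each act given its (n-1)-act history."""
        total = 0.0
        for gram in ngrams(dialog, self.n):
            num = self.ngram_counts[gram] + 1
            den = self.context_counts[gram[:-1]] + self.vocab_size
            total += math.log(num / den)
        return total

def classify(dialog, models):
    """Return the class label whose model assigns the dialog the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(dialog))

# Hypothetical toy data: "S_" marks system acts, "U_" marks user acts.
satisfied = [
    ["S_ASK", "U_REQUEST", "S_PLAY", "U_OK"],
    ["S_ASK", "U_REQUEST", "S_PLAY", "U_REQUEST", "S_PLAY"],
]
unsatisfied = [
    ["S_ASK", "U_REQUEST", "S_REPROMPT", "U_REQUEST", "S_REPROMPT"],
    ["S_ASK", "U_REQUEST", "S_REPROMPT", "U_QUIT"],
]
vocab = {act for d in satisfied + unsatisfied for act in d}
models = {
    "satisfied": NgramModel(2, vocab).fit(satisfied),
    "unsatisfied": NgramModel(2, vocab).fit(unsatisfied),
}

unknown = ["S_ASK", "U_REQUEST", "S_REPROMPT", "U_REQUEST", "S_REPROMPT"]
print(classify(unknown, models))
```

With this toy data the dialog full of re-prompts is assigned to the "unsatisfied" class, because its bigrams were seen only in that class's training dialogs. In practice the difference between the two log-likelihoods could also be used as a continuous score and thresholded.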
N       AUC
1-gram  0.619
2-gram  0.724
3-gram  0.704
4-gram  0.739
5-gram  0.739
6-gram  0.761
7-gram  0.765
8-gram  0.756

The larger the N of the N-gram, the lower the false detection rate becomes.
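For readers unfamiliar with the AUC figures above, the following sketch shows how an ROC curve (false detection rate against true detection rate) and its area can be computed from per-user detection scores. The scores and labels are made-up toy values, and treating the score as something like a log-likelihood difference between class models is an assumption; the slides do not state how the scores were obtained.

```python
def roc_points(scores, labels):
    """Return (false detection rate, true detection rate) pairs, one per distinct threshold.

    labels: 1 for a user who should be detected, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sweep the threshold from the highest score downward; tied scores move together.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical detection scores (higher = more likely to be a target user).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
print(toy_curve)
print(auc(toy_curve))
```

An AUC of 1.0 means every target user scores above every non-target user, and 0.5 corresponds to chance. A single operating point on the curve is what a statement such as "X% true detection rate at Y% false detection rate" describes.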
Conclusion
• Estimation method of user satisfaction using an N-gram-based dialog history model for SDSs
  – Constructed a database recorded in real PC environments
  – Achieved high performance in the detection of “task incomplete” users
    • 100% true detection rate at a 6% false detection rate
  – Performance in the detection of “unsatisfied” users was not sufficient
  – Higher-order N-gram models were effective compared with the 1-gram model
  – Using both system and user dialog acts was effective
• Future work
  – N-gram model-based estimation of dialog failure (online detection)
  – Analysis of how the dialog context affects user satisfaction
  – Integrated method using acoustic features, prosodic features, dialog features, etc.