Realizing the Interactive Speech Interface in a Multi-user Virtual Environment Advisor Tsai-Yen Li Author Chun-Feng Liao NCCU Department of Computer Science Intelligent Media Lab July 2004
Jan 03, 2016

Page 1: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Advisor: Tsai-Yen Li

Author: Chun-Feng Liao

NCCU Department of Computer Science, Intelligent Media Lab

July 2004

Page 2: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Agenda

Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion

More Abstract / High Level

More Concrete / Low Level

Page 3: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Agenda

Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion

Page 4: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Introduction

Applications of 3D virtual environments and voice user interfaces (VUI) have received significant attention recently.

Incorporating a VUI into virtual environments can enhance user interaction and immersion.

Most related research does not provide an effective mechanism for multi-user dialog management.

Page 5: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Contributions of this Research

1. Suggest a MUVE dialog model based on the VoiceXML dialog model.

2. Propose a way to integrate a speech interface into a MUVE.

3. XAML-V: Extend XAML to provide a speech-enabled interactive animation scripting language.

4. Address implementation problems of XAML-V using software patterns as recipes.

MUVE = Multi-user Virtual Environment

Page 6: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Agenda

Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion

Page 7: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

VUI / VE Integration Problems

[McGlashan 95] identified three types of problems in integrating a VUI with a virtual environment:
• Speech Recognition
• Language Understanding
• Interaction Metaphor

Scott McGlashan is the editor-in-chief of W3C VoiceXML 2.0.

Page 8: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Integration Considerations

Client Interface
• Ad hoc [Cernak02]
• VRML - EAI - JSAPI [Wauchope03] [O.Apaydin02] [Descamps01]

Dialog Management
• Database: [Wauchope03]
• IDE: [Cernak02]
• Scripting Language:
  - Based on VoiceXML: DialogXML [Nyberg02] and Galatea [Sagayama03]
  - Customized: MPML-VR [Descamps01]

Page 9: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

VUI Integration

Page 10: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

IMNet – A Client-Server MUVE System

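[Figure: IMClientA, IMClientB, and IMClientC connect to the IMNet Server. One client sends a message to the server, and the server broadcasts it to the other clients.]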
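
The send-then-broadcast pattern in this figure can be sketched in a few lines of Python. This is only an in-memory illustration: the class and method names (`Client`, `Server`, `register`, `send`) are hypothetical and are not taken from IMNet, and the choice not to echo a message back to its sender is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical stand-in for an IMNet client; it only records what it receives."""
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, sender, message):
        self.inbox.append((sender, message))


class Server:
    """Hypothetical relay: forwards each message to every client except its sender."""

    def __init__(self):
        self.clients = {}

    def register(self, client):
        self.clients[client.name] = client

    def send(self, sender, message):
        # Assumption: the sender does not receive its own message back.
        for name, client in self.clients.items():
            if name != sender:
                client.receive(sender, message)


server = Server()
a, b, c = Client("IMClientA"), Client("IMClientB"), Client("IMClientC")
for client in (a, b, c):
    server.register(client)
server.send("IMClientA", "wave")
```

After the last call, `b.inbox` and `c.inbox` each hold the message from IMClientA, while `a.inbox` stays empty.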

Page 11: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Animation Script Language

Using high-level scripts to control animated characters is not a new idea.

AML focuses on synchronization of facial expression and voice.
• It lacks the ability to extract or modify an existing animation.

STEP can compose new animations from existing animation components.
• It falls short on specifying detailed animation attributes.

STEP = Scripting Technology for Embodied Persona

AML = Avatar Markup Language

Page 12: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML (eXtensible Animation Markup Language)

Describes character animations at various command levels.

Developers can compose a new animation from existing animation clips.

The syntax is extensible through plug-in modules.

Page 13: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

VoiceXML

VoiceXML 1.0 was released by the VoiceXML Forum and submitted to the W3C in 2000.

Used in interactive telephony applications.

Based on HTTP, using a form-based dialog model.

[Figure: Client and Server]

Page 14: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

VoiceXML : An Example

<vxml version="2.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
    </field>
    <block>
      <submit next="http://www.drink.example.com/drink2.asp"/>
    </block>
  </form>
</vxml>

Page 15: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

VoiceXML Dialog Model Architectural View

Page 16: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Agenda

Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion

Page 17: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Definitions & Notations

Dialog: Exactly two avatars concentrate on interacting with each other.
Subjects: Avatars in a dialog.
Observers: Avatars not in a dialog.

U: Avatars controlled by a human.
S: Avatars controlled by the system.
Suffix s: Subject avatars.
Suffix i (i = 1, 2, 3, …): Observer avatars.

[Figure: Ss and Us are the subjects; Ui is an observer.]
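
The notation above can be read as a small naming rule: a controller letter followed by a role suffix. The Python sketch below encodes that rule; the `Avatar` class and its field names are hypothetical and serve only to illustrate the notation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Avatar:
    """Hypothetical record for the slide's notation."""
    controller: str                       # "U" (human) or "S" (system)
    observer_index: Optional[int] = None  # None for a subject; 1, 2, 3, ... for an observer

    @property
    def label(self):
        # Subjects take the suffix "s"; observers take their index.
        suffix = "s" if self.observer_index is None else str(self.observer_index)
        return f"{self.controller}{suffix}"


dialog = (Avatar("S"), Avatar("U"))        # exactly two subjects form a dialog
observer = Avatar("U", observer_index=1)   # any other avatar is an observer
labels = [avatar.label for avatar in dialog] + [observer.label]
```

Here `labels` evaluates to `["Ss", "Us", "U1"]`.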

Page 18: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

VoiceXML Dialog Model

VoiceXML was originally designed for dialogs in telephony systems.

Most telephony applications involve only two interacting parties.

Page 19: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Problems with VoiceXML Dialog Model in MUVE (1)

[Figure: Ss, Us, the Document Server, and the IMNet Server. Links are labeled "conceptually" and "actually". Callout: "What is the dialog status with Us?"]

Page 20: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Problems with VoiceXML Dialog Model in MUVE (2)

[Figure: Ss, Us1, Us2, the Document Server, and a VRML Browser. Links are labeled "conceptually" and "actually". Callouts: "Who is talking with me?", "I'm talking with Ss.", "What should I draw?"]

Page 21: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Problems with VoiceXML Dialog Model in MUVE (3)

Conceptually, Us is having a dialog with Ss.

Actually, Us is interacting with the Document Server, which holds Ss's dialog script.

VoiceXML lacks a dialog locking mechanism.

The unmodified VoiceXML dialog model is therefore a poor fit for a MUVE.

Page 22: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Proposed Dialog Model in MUVE

We enhance the original VoiceXML dialog model to fix these problems:
• Proxy Request
• Dialog Lock
• Dialog State
• Dialog Negotiation

Page 23: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Proxy Request

[Figure: Ss, Us, and the Document Server. Links are labeled "conceptually" and "actually". Caption: Ss proxies the HTTP request for Us.]

Page 24: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Benefits of Proxy Request Model

Applying this model in a MUVE has the following benefits:
• Us is not aware of the Document Server, which gives the flexibility to switch between different roles.
• Ss is aware of the dialog status with Us.

Page 25: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Dialogs without Dialog Lock

[Figure: avatars A, B, and C. Legend: speech output from A, speech input to A.]

It is impossible for A to accept speech input from multiple avatars at the same time.

A will be confused if B and C talk to it at the same time.

Page 26: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Dialog Lock

We propose that only two avatars can be in a dialog at the same time.

The Dialog Lock mechanism enforces this constraint.
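
One way to picture the constraint is a per-avatar lock that can be held by at most one partner. The sketch below is a minimal illustration under that assumption; `DialogLock` and its methods are hypothetical names, and the slides do not describe how the lock is actually implemented.

```python
class DialogLock:
    """Hypothetical per-avatar lock: held by at most one dialog partner at a time."""

    def __init__(self):
        self.partner = None

    def acquire(self, partner):
        """Return True if `partner` now holds (or already held) the lock."""
        if self.partner is None:
            self.partner = partner
            return True
        return self.partner == partner

    def release(self, partner):
        # Only the current partner can release the lock.
        if self.partner == partner:
            self.partner = None

    def accepts_speech_from(self, speaker):
        # Speech input is accepted only from the avatar holding the lock.
        return self.partner == speaker


lock_of_a = DialogLock()
c_joined = lock_of_a.acquire("C")   # C starts a dialog with A
b_joined = lock_of_a.acquire("B")   # B is refused while C holds the lock
```

With the lock in place, A only has to process speech input from C; B must wait until the lock is released.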

Page 27: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Dialog with Dialog Lock

A is currently in dialog with C.

[Figure: avatars A, B, and C with a Dialog Lock. Labels: Dialog Scripts, Broadcasting Scripts (twice). Legend: speech output from A, speech input to A.]

Page 28: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Dialog States

Page 29: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Initialize a Dialog

[Sequence diagram; steps in the order they appear on the slide]
Enter negotiation-state
Send dialog request message
Enter negotiation-state
Send dialog accept message
Enter in-dialog-state
Send dialog ack message
Enter in-dialog-state
Fetch first XAML-V script
Send first XAML-V script

Page 30: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Stop a Dialog

Enter not-in-dialog-state

Enter not-in-dialog-state

Send end dialog message
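
The request, accept, ack, and end messages on the last two slides suggest a small state machine with three states. The Python sketch below is one possible reading, not the thesis implementation: the `Endpoint` class is hypothetical, which avatar performs which step is an assumption, messages are delivered as direct method calls rather than through a server, and fetching the first XAML-V script is left out.

```python
from enum import Enum


class State(Enum):
    NOT_IN_DIALOG = "not-in-dialog-state"
    NEGOTIATION = "negotiation-state"
    IN_DIALOG = "in-dialog-state"


class Endpoint:
    """One avatar's side of the negotiation (hypothetical class, not from the slides)."""

    def __init__(self, name):
        self.name = name
        self.state = State.NOT_IN_DIALOG
        self.peer = None

    def start_dialog(self, peer):
        """Send a dialog request; return True if both sides end up in-dialog."""
        if self.state is not State.NOT_IN_DIALOG:
            return False
        self.state, self.peer = State.NEGOTIATION, peer
        peer.on_message(self, "request")
        return self.state is State.IN_DIALOG

    def end_dialog(self):
        """Leave the dialog and tell the peer to do the same."""
        if self.state is State.IN_DIALOG:
            peer, self.peer = self.peer, None
            self.state = State.NOT_IN_DIALOG
            peer.on_message(self, "end")

    def on_message(self, sender, kind):
        if kind == "request" and self.state is State.NOT_IN_DIALOG:
            self.state, self.peer = State.NEGOTIATION, sender
            sender.on_message(self, "accept")
        elif kind == "accept" and self.state is State.NEGOTIATION and sender is self.peer:
            self.state = State.IN_DIALOG
            sender.on_message(self, "ack")
        elif kind == "ack" and self.state is State.NEGOTIATION and sender is self.peer:
            self.state = State.IN_DIALOG
        elif kind == "end" and self.state is State.IN_DIALOG and sender is self.peer:
            self.state, self.peer = State.NOT_IN_DIALOG, None
        # Any other message is ignored, for example a request while already busy.


us, ss, other = Endpoint("Us"), Endpoint("Ss"), Endpoint("U1")
started = us.start_dialog(ss)        # request, accept, ack
refused = other.start_dialog(ss)     # Ss is busy and ignores the request
```

In this sketch a refused requester simply stays in negotiation-state, since the slides show no reject message; the backup slides list a time-out as the way out of that situation.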

Page 31: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Summary : Proposed Dialog Model in MUVE

[Figure: summary of the proposed dialog model between Ss and Us]

Page 32: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Agenda

Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion

Page 33: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

The XAML Scripting Language

<AnimItem DEF="WaveWalk" cycle="2000">
  <AnimImport src="Walk"/>
  <AnimItem DEF="SimpleWave" cycle="1000">
    <Node target="r_shoulder">
      <OrientationInterpolator key="…" keyValue="…"/>
    </Node>
  </AnimItem>
</AnimItem>

Page 34: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML & XAML-V

XAML

XAML-V

Page 35: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML-V Features

Extension of the XAML scripting language.

Subset of VoiceXML.

Supports form-level and field-level animations.

Realizes the concepts discussed in the previous section:
• Dialog negotiation
• Proxy request
• Broadcasting

Page 36: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Nested Plug-in Syntax

Page 37: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Dialog Negotiation

Page 38: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

How XAML-V Realizes the Proxy Request Model

[Figure: Ss, Us, and the Document Server, with the numbered steps below]

1. Send proxy request message
2. Issue HTTP Command: http://xxx/helloFormResponse.jsp?helpType=no%20thanks
3. HTTP Response
4. Return requested dialog script
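
The four numbered steps can be sketched as follows. This is an illustration under stated assumptions: Us is taken to send the proxy request to Ss, Ss to contact the Document Server, and the `SystemAvatar` class, the `fetch` callable, and `fake_document_server` are all hypothetical. No real HTTP request is made; the Document Server is replaced by a local function so the sketch runs offline.

```python
from urllib.parse import quote, urlencode


def build_url(base, fields):
    """Encode form fields into a query string, using %20 for spaces as on the slide."""
    return f"{base}?{urlencode(fields, quote_via=quote)}"


class SystemAvatar:
    """Hypothetical Ss: fetches dialog scripts on behalf of Us."""

    def __init__(self, fetch):
        self.fetch = fetch        # callable(url) -> script text; injected for testing
        self.requests_seen = []   # lets Ss track the dialog status with Us

    def handle_proxy_request(self, base, fields):
        url = build_url(base, fields)    # step 2: issue the HTTP command
        self.requests_seen.append(url)
        script = self.fetch(url)         # step 3: HTTP response
        return script                    # step 4: return the dialog script to Us


def fake_document_server(url):
    # Stand-in for the Document Server; returns placeholder text, not real XAML-V.
    return f"script for {url}"


ss = SystemAvatar(fake_document_server)
# Step 1: Us sends a proxy request carrying the recognized field value.
script = ss.handle_proxy_request("http://xxx/helloFormResponse.jsp", {"helpType": "no thanks"})
```

Because every request passes through Ss, `ss.requests_seen` records how far the dialog with Us has progressed, which is the benefit the Proxy Request model claims.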

Page 39: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

End Dialog

Page 40: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Summary : Benefits of XAML-V

Extended from the XAML animation script, XAML-V inherits its strong animation functions.

Can be dynamically generated by various server-side scripting technologies (e.g., JSP or ASP).

Its dialog model works in a MUVE.

Page 41: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Agenda

Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion

Page 42: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

System Design and Implementation

System Architecture
XAML-V Component
Example Scenario
Video DEMO

Page 43: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML-V Architecture in MUVE

Page 44: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML-V Implementation

XAML Platform delegates XAML-V scripting elements to VoicePluginObject.

Embedded animations are sent back to the Animation Manager.
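
The delegation described on this slide can be sketched with a tag-based dispatcher. Everything below is hypothetical: the transcript does not show the XAML-V syntax, so the sample script layout, the `<script>` root, the use of `<form>`, `<field>`, and `<prompt>` as the voice elements, and the class names `Platform`, `VoicePlugin`, and `AnimationManager` are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Animation tags taken from the XAML example earlier in this deck.
ANIMATION_TAGS = {"AnimItem", "AnimImport"}


class AnimationManager:
    def __init__(self):
        self.played = []

    def play(self, element):
        self.played.append(element.get("src") or element.get("DEF"))


class VoicePlugin:
    """Hypothetical plug-in: handles voice elements, hands animations back."""

    def __init__(self, animation_manager):
        self.animation_manager = animation_manager
        self.spoken = []

    def handle(self, element):
        for child in element:
            if child.tag in ANIMATION_TAGS:
                self.animation_manager.play(child)   # embedded animation goes back
            elif child.tag == "prompt":
                self.spoken.append((child.text or "").strip())
            else:
                self.handle(child)                   # e.g. descend into <field>


class Platform:
    """Hypothetical platform: routes each element to a plug-in by tag name."""

    def __init__(self):
        self.animation_manager = AnimationManager()
        self.voice = VoicePlugin(self.animation_manager)
        self.plugins = {"form": self.voice}

    def run(self, script):
        self._dispatch(ET.fromstring(script))

    def _dispatch(self, element):
        plugin = self.plugins.get(element.tag)
        if plugin is not None:
            plugin.handle(element)
        elif element.tag in ANIMATION_TAGS:
            self.animation_manager.play(element)
        else:
            for child in element:
                self._dispatch(child)


# Assumed script layout; real XAML-V may nest these elements differently.
SAMPLE = """
<script>
  <AnimImport src="Walk"/>
  <form>
    <AnimImport src="Wave"/>
    <field name="helpType"><prompt>May I help you?</prompt></field>
  </form>
</script>
"""

platform = Platform()
platform.run(SAMPLE)
```

The point of the design is that the platform never needs to understand voice elements itself; it only needs a tag-to-plug-in table, which matches the extensibility claim made for XAML.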

Page 45: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Implementing XAML-V Components with Software Patterns

Page 46: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML-V Components Deployment Diagram

Page 47: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Message Monitor

Intercept all the messages passed by server.

Page 48: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Client

Page 49: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Us talks to Ss

Page 50: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Us talks to Us

Page 51: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Example XAML-V Script

index.jsp

Page 52: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

helpFormResponse.jsp

Page 53: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Result: Subject’s View

Page 54: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Result: Observer’s View

Page 55: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Video DEMO

Page 56: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Conclusion

We believe that integrating a speech interface lets users communicate in a more natural way in a MUVE.

In this thesis, we
• Enhance the MUVE by integrating a speech interface.
• Suggest a new dialog model, based on the VoiceXML dialog model, that works properly in a MUVE.
• Design the XAML-V dialog script to realize the suggested dialog model.
• Implement the XAML-V platform using software patterns.

Page 57: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Future Work

Face animation.
Consider the range between avatars.
Consider 3D sound (sound direction and volume).
Add camera control to XAML-V.
More sophisticated synchronization between animation and the speech interface.
Performance optimization.

Page 58: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Q & A

Page 59: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Backup

Page 60: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Research Objective

Provide a solution for VUI integration.
Dialog management mechanism in a multi-user virtual environment (MUVE).
Realizing such a mechanism.

Page 61: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Solving Synchronization Problems when Establishing a Dialog

Dialog States
Synchronization Mechanism
Time-out

Page 62: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Each client may negotiate with only one other client at a time.

Page 63: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Using Time-out to Prevent Infinite Pending
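
A time-out on a pending negotiation, combined with the rule that a client negotiates with only one other client at a time, can be sketched as below. The `Negotiator` class, its method names, and the five-second value are hypothetical; the clock is injected so the behavior can be checked without waiting.

```python
class Negotiator:
    """Hypothetical helper: one pending negotiation at a time, dropped after a time-out."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self.clock = clock          # callable returning the current time in seconds
        self.pending_peer = None
        self.deadline = None

    def expire(self):
        """Drop the pending negotiation if its deadline has passed."""
        if self.pending_peer is not None and self.clock() >= self.deadline:
            self.pending_peer = None
            self.deadline = None
            return True
        return False

    def request(self, peer):
        """Start negotiating with `peer`; refuse if another negotiation is pending."""
        self.expire()
        if self.pending_peer is not None:
            return False
        self.pending_peer = peer
        self.deadline = self.clock() + self.timeout_s
        return True


now = [0.0]                                   # fake clock for the example
negotiator = Negotiator(timeout_s=5.0, clock=lambda: now[0])
first = negotiator.request("B")               # accepted, waits for B's answer
second = negotiator.request("C")              # refused: still waiting for B
now[0] = 5.0                                  # B never answered
third = negotiator.request("C")               # accepted: the old request timed out
```

In a running client the clock argument would typically be `time.monotonic`, so that wall-clock adjustments cannot extend or shorten the wait.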

Page 64: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Protocol Framework

Page 65: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Dialog Lock

Page 66: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

XAML-V Interpreter

Page 67: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Input Device

Page 68: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment
Page 69: Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

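Please come and show your support. Time: July 14, 2004 (Tue), 10:00 AM. Venue: Computer Center, 2nd-floor conference room. Candidates: 黃培智 and 廖峻鋒 (Chun-Feng Liao). Benefits of sitting in on an oral defense:

• Observe how others set up and run their defense, as a reference for your own in the future.

• Learn a great deal from the exchange between the candidates and the professors, and feel the tension in the room first-hand.

• There are fine snacks and drinks to enjoy.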