The Voice-Enabled Web: VoiceXML and Related Standards for Telephone Access to Web Applications 14 Feb. 2002 Christophe Strobbe K.U.Leuven - ESAT-SCD-DocArch.



Overview
• Voice browsers
• History of voice markup languages
• W3C Speech Interface Framework
• Communication Architecture
• VoiceXML 2.0
• Grammars
• SALT

• Not covered: WAP/WML, Voice over IP

Voice Browser

Device (hardware and software) that interprets voice markup languages to generate voice output and interpret voice input.

Companies

History

1990s: companies developed their own markup languages:

• PhoneML (AT&T)

• PhoneML (Lucent)

• VoxML (Motorola)

• TalkML (HP Labs)

• SpeechML (IBM)

=> VoiceXML Forum: VoiceXML 1.0

• 1998: W3C Voice Browser Workshop

VoiceXML Specification History

• April 1999 – Initial spec – Request For Comment

• August 1999 – 0.9 Spec released

• March 2000 – 1.0 Spec released

• October 2001 – 2.0 Working Draft (W3C)

• March 2002 – next Working Draft

• 4th quarter 2002 – W3C Recommendation for 2.0?

Why Voice Markup Languages?

• “Voicifying” web pages by adding a few VoiceXML tags is not feasible:
– the basic design principles that make a good web page are very different from those that make an efficient voice interface
– see Raggett & Ben-Natan: “Voice Browsers” (W3C, 1998)

• … unless you want to create a multimodal interface (cf. SALT)?

Speech Interface Framework

[Figure: W3C Speech Interface Framework. Components: user, telephone system, ASR, DTMF tone recognizer, language understanding, context interpretation, dialog manager, World Wide Web, media planning, language generation, TTS, prerecorded audio player. Markup languages: VoiceXML 2.0, Speech Recognition Grammar ML, N-gram Grammar ML, Natural Language Semantics ML, Speech Synthesis ML, Lexicon, Reusable Components.]

Communication Architecture

What is VoiceXML?

For creating audio dialogs that include:
• Synthesized speech
• Digitized audio
• Recognition of spoken and DTMF key input
• Recording of spoken input
• Telephony
• Mixed-initiative conversations

Major goal: bring the advantages of web-based development and content delivery to interactive voice response applications.
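The following sketch shows several of these capabilities in one form: synthesized speech, a prerecorded audio file with a spoken fallback, spoken or DTMF digit input through the built-in "digits" field type, and recording of spoken input. It assumes VoiceXML 2.0 syntax; the file name after-the-tone.wav and the form and field names are hypothetical. Telephony features such as <transfer> are not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="voicemail">
    <!-- built-in "digits" type: accepts spoken digits or DTMF key presses -->
    <field name="mailbox" type="digits">
      <prompt>Please say or key in the mailbox number.</prompt>
    </field>
    <!-- record up to 30 seconds of speech after a beep; a DTMF key ends the recording -->
    <record name="message" beep="true" maxtime="30s" dtmfterm="true">
      <prompt>
        <!-- play the (hypothetical) audio file; the text inside is spoken if it cannot be played -->
        <audio src="after-the-tone.wav">Please leave your message after the tone.</audio>
      </prompt>
    </record>
    <block>
      <!-- synthesized speech, with the collected value inserted -->
      <prompt>Thank you. Your message for mailbox <value expr="mailbox"/> has been recorded.</prompt>
    </block>
  </form>
</vxml>
```

In a real application the recording would normally be sent to a server with <submit>, using method="post" and enctype="multipart/form-data".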

Advantages of VoiceXML

As perceived by Motorola et al.:
• People want a better mobile user interface while on the go
• Device independent
• Open standards create and drive market demand
• Easy to program, since it is similar to other XML-based languages
• Uses existing web infrastructure

Developing applications
• To develop VoiceXML applications you have to learn several languages:
– VoiceXML
– ECMAScript (JavaScript/JScript)
– a grammar format (GSL, JSGF, Speech Recognition Grammar Specification)
– a back-end scripting language (Perl, Java, …)
• Web developers are used to this kind of environment
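A minimal sketch of how these pieces fit together in one VoiceXML 2.0 document: VoiceXML for the dialog, an ECMAScript helper in <script>, an inline grammar in the W3C XML grammar format, and a <submit> that hands the collected values to a back-end script. The server URL, the script name order.pl and the helper function describe() are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- ECMAScript: hypothetical helper, callable from expressions in this document -->
  <script><![CDATA[
    function describe(qty, item) {
      return qty + " " + item + (qty == 1 ? "" : "s");
    }
  ]]></script>
  <form id="order">
    <field name="item">
      <prompt>Would you like pizza or salad?</prompt>
      <!-- grammar: inline, in the W3C XML grammar format -->
      <grammar type="application/srgs+xml" root="dish" version="1.0">
        <rule id="dish" scope="public">
          <one-of>
            <item>pizza</item>
            <item>salad</item>
          </one-of>
        </rule>
      </grammar>
    </field>
    <!-- built-in "number" type: the caller says a quantity -->
    <field name="qty" type="number">
      <prompt>How many?</prompt>
    </field>
    <block>
      <prompt>You ordered <value expr="describe(qty, item)"/>.</prompt>
      <!-- back end: send the collected values to a (hypothetical) server-side script -->
      <submit next="http://www.example.com/cgi-bin/order.pl" namelist="item qty"/>
    </block>
  </form>
</vxml>
```

The server-side script (for example, a Perl CGI script) would then return the next VoiceXML document, just as a web server returns the next HTML page.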

VoiceXML Basics
• XML-based
• More structured than HTML (describes the structure and semantics of data, not presentation)
– All tags must be closed (e.g. <prompt> … </prompt>)
• The structure of the language is described in a Document Type Definition (DTD)
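As an illustration of these rules, a minimal VoiceXML 2.0 skeleton: the XML declaration comes first, an optional DOCTYPE points to the DTD, and every element is closed, either with an end tag or as an empty element such as <exit/>. The DOCTYPE identifiers shown are the ones published with VoiceXML 2.0; a platform implementing an earlier version may expect different ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
  "http://www.w3.org/TR/voicexml20/vxml.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <!-- an element with content is closed by its end tag -->
      <prompt>Welcome, and goodbye.</prompt>
      <!-- an element without content is closed with "/>" -->
      <exit/>
    </block>
  </form>
</vxml>
```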

VoiceXML Applications

• An application consists of a single application root document as well as zero or more other documents

• The application root document is loaded whenever any other document of the application is loaded

• The application root document grammars and variables are visible in other application documents

[Figure: one application root document shared by several leaf documents.]
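A sketch of an application with two of its documents, shown one after the other as two separate files. The names app-root.vxml, weather.vxml and main.vxml are hypothetical, and VoiceXML 2.0 syntax is assumed. The leaf document names its root through the "application" attribute; the variable declared in the root is then readable as application.caller_name, and the root's "main menu" grammar stays active whenever a leaf document is collecting input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- file 1: app-root.vxml, the application root document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- application-scoped variable, readable from every leaf document -->
  <var name="caller_name" expr="'guest'"/>
  <!-- application-scoped link: saying "main menu" during input collection jumps to main.vxml -->
  <link next="main.vxml">
    <grammar type="application/srgs+xml" root="cmd" version="1.0">
      <rule id="cmd" scope="public">main menu</rule>
    </grammar>
  </link>
</vxml>

<?xml version="1.0" encoding="UTF-8"?>
<!-- file 2: weather.vxml, a leaf document that names its root document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" application="app-root.vxml">
  <form>
    <block>
      <prompt>Hello <value expr="application.caller_name"/>, here is the weather.</prompt>
    </block>
  </form>
</vxml>
```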

VoiceXML Documents
• Documents can contain two types of dialogs:
– forms (<form>)
– menus (<menu>)
• Other elements:
– <meta>: metadata, defined as name/value pairs
– <var>: for declaring variables
– <script>: for client-side ECMAScript
– <catch>: for catching events
– <link>: transitions to other dialogs, triggered when the user's input matches the link's grammar
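A sketch, assuming VoiceXML 2.0, of a single document that uses each of the elements listed above; the maintainer address, the greeting() function and the dialog ids are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- <meta>: metadata as a name/value pair -->
  <meta name="maintainer" content="webmaster@example.com"/>
  <!-- <var>: document-scoped variable, incremented by the nomatch handler below -->
  <var name="retries" expr="0"/>
  <!-- <script>: client-side ECMAScript -->
  <script>
    function greeting() { return "Good day"; }
  </script>
  <!-- <link>: saying "help" during input collection jumps to the help dialog -->
  <link next="#help">
    <grammar type="application/srgs+xml" root="h" version="1.0">
      <rule id="h" scope="public">help</rule>
    </grammar>
  </link>
  <!-- <catch>: document-level handler for input that was not recognized -->
  <catch event="nomatch">
    <assign name="retries" expr="retries + 1"/>
    <prompt>Sorry, I did not understand.</prompt>
    <reprompt/>
  </catch>
  <form id="main">
    <field name="answer" type="boolean">
      <prompt><value expr="greeting()"/>. Do you want to continue?</prompt>
    </field>
  </form>
  <form id="help">
    <block>
      <prompt>Please answer yes or no.</prompt>
      <goto next="#main"/>
    </block>
  </form>
</vxml>
```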

Forms and menus
• Forms may contain zero or more <field> elements
– the user must provide a value for the field before proceeding to the next element in the form
– each field may specify a grammar that defines the allowable inputs
• Menus may contain one or more <choice> elements
– a menu presents the user with a choice of options and then transitions to another dialog
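A minimal menu sketch in VoiceXML 2.0: each <choice> both defines what the caller can say and names the dialog to go to; <enumerate/> reads the options aloud, and dtmf="true" additionally lets the caller press 1, 2 or 3 for the choices in document order. The target documents news.vxml and weather.vxml are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- dtmf="true": keys 1, 2, 3 select the first, second, third choice -->
  <menu id="main" dtmf="true">
    <!-- <enumerate/> speaks the text of every choice -->
    <prompt>Say one of: <enumerate/></prompt>
    <choice next="news.vxml">news</choice>
    <choice next="weather.vxml">weather</choice>
    <choice next="#goodbye">goodbye</choice>
  </menu>
  <form id="goodbye">
    <block>
      <prompt>Goodbye.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```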

VoiceXML Example

<?xml version="1.0"?>
<!-- helloworld.vxml; the XML declaration above must be the first thing in the file -->
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        Hello World!
      </prompt>
    </block>
  </form>
</vxml>

Example with Grammar

<?xml version="1.0"?>
<vxml version="1.0">
  <meta name="maintainer" content="christophe@docarch.be"/>
  <form id="hello">
    <field name="item">
      <prompt>Would you like coffee, tea, or juice?</prompt>
      <!-- Nuance GSL grammar: square brackets mean "one of these words" -->
      <grammar type="application/x-gsl">
        [coffee tea juice]
      </grammar>
      <filled>
        <prompt>Your <value expr="item"/>
          will be ready momentarily</prompt>
      </filled>
    </field>
  </form>
</vxml>
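For comparison, a sketch of the same dialog with the grammar written in the XML form of the W3C Speech Recognition Grammar Specification instead of GSL, assuming a VoiceXML 2.0 interpreter; the rule name "drink" is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="hello">
    <field name="item">
      <prompt>Would you like coffee, tea, or juice?</prompt>
      <!-- same three alternatives as the GSL grammar [coffee tea juice] -->
      <grammar type="application/srgs+xml" root="drink" version="1.0">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>juice</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>Your <value expr="item"/> will be ready momentarily.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The XML form is more verbose than GSL, but it is a vendor-neutral W3C format rather than a single vendor's syntax.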

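Dynamic VoiceXML

#!perl -w
# Minimal CGI script that returns a VoiceXML document.
use strict;

# HTTP header, followed by the blank line that ends the headers
print "Content-type: text/x-vxml\n\n";

my $HOMEBUFFER = '<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt> Hello World </prompt>
    </block>
  </form>
</vxml>';

print $HOMEBUFFER;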

Other Markup Languages
• JSML: JSpeech Markup Language (Sun)

• Dialog ML (Dennis Heuer)

• SABLE (SABLE Consortium)

• DMML (Dialogue Moves Markup Language)

• SALT: Speech Application Language Tags (SALT Forum)

• (CallXML, Telephony Markup Language, …)

Progress since March 2000 (VoiceXML 1.0)?

SALT
• Speech Application Language Tags (SALT Forum)
• SALT Forum founded by Microsoft, Intel, …; 15 October 2001
• A very simple set of tags for extending existing markup languages (XHTML, XML)
• Specification expected in Q1 2002
• Specification to be submitted to a standards body (W3C?) by mid 2002
