The Voice-Enabled Web: VoiceXML and Related Standards for Telephone Access to Web Applications 14 Feb. 2002 Christophe Strobbe K.U.Leuven - ESAT-SCD- DocArch
Page 1:

The Voice-Enabled Web: VoiceXML and Related Standards for Telephone Access to Web Applications

14 Feb. 2002

Christophe Strobbe, K.U.Leuven - ESAT-SCD-DocArch

Page 2:

Overview

• Voice browsers

• History of voice markup languages

• W3C Speech Interface Framework

• Communication Architecture

• VoiceXML 2.0

• Grammars

• SALT

• Not covered: WAP/WML, Voice over IP

Page 3:

Voice Browser

Device (hardware and software) that interprets voice markup languages to generate voice output and interpret voice input.

Page 4:

Companies

Page 5:

History

1990s: companies developed their own markup languages:

• PhoneML (AT&T)

• PhoneML (Lucent)

• VoxML (Motorola)

• TalkML (HP Labs)

• SpeechML (IBM)

=> VoiceXML Forum: VoiceXML 1.0

• 1998: W3C Voice Browser Workshop

Page 6:

VoiceXML Specification History

• April 1999 – Initial spec – Request For Comment

• August 1999 – 0.9 Spec released

• March 2000 – 1.0 Spec released

• October 2001 – 2.0 Working Draft (W3C)

• March 2002 – next Working Draft

• 4th quarter 2002 – 2.0 Recommendation W3C?

Page 7:

Why Voice Markup Languages?

• “Voicifying” web pages by adding a few VoiceXML tags is not feasible:

– the basic design principles that make a good web page are very different from those that make an efficient voice interface

– e.g. Raggett & Ben-Natan: “Voice Browsers” (W3C, 1998)

• … unless you want to create a multimodal interface (cf. SALT)?

Page 8:

Speech Interface Framework

[Figure: W3C Speech Interface Framework. Components shown: User, Telephone System, ASR, DTMF tone recognizer, Language Understanding, Context Interpretation, Dialog Manager, World Wide Web, Media Planning, Language Generation, TTS, Prerecorded audio player. Markup languages shown: VoiceXML 2.0, Speech Recognition Grammar ML, N-gram Grammar ML, Natural Language Semantics ML, Speech Synthesis ML, Lexicon, Reusable Components.]

Page 9:

Communication Architecture

Page 10:

What is VoiceXML?

For creating audio dialogs that include

• Synthesized speech

• Digitized audio

• Recognition of spoken and DTMF key input

• Recording of spoken input

• Telephony

• Mixed-initiative conversations

Major goal: bring the advantages of web-based development and content delivery to interactive voice response applications.

Page 11:

Advantages of VoiceXML

As perceived by Motorola et al.:

• People want a better mobile user interface while on the go

• Device Independent

• Open standards create and drive market demand

• Easy to program since similar to other XML-based languages

• Utilizes existing web infrastructure

Page 12:

Developing applications

• To develop VoiceXML applications you have to learn several languages:

– VoiceXML

– ECMAScript (JavaScript/JScript)

– a grammar format (GSL, JSGF, Speech Recognition Grammar Specification)

– a back-end scripting language (Perl, Java, …)

• Web developers are used to this kind of environment

Page 13:

VoiceXML Basics

• XML-based

• More structured than HTML (describes structure and semantics of data, not presentation)

– Must close all tags (e.g. <prompt> </prompt>)

• Structure of the language described in a Document Type Definition (DTD)
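
The "must close all tags" rule can be checked mechanically with any XML parser. The following is a minimal sketch in Python (chosen here for illustration; it is not part of the original slides) using the standard library's xml.etree.ElementTree. The helper name is_well_formed and both sample strings are assumptions made for this example. It tests well-formedness only, not validity against the VoiceXML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Every element is closed, so this parses.
closed = "<vxml version='1.0'><form><block><prompt>Hi</prompt></block></form></vxml>"

# <prompt> is never closed. HTML browsers often tolerate this; XML parsers do not.
unclosed = "<vxml version='1.0'><form><block><prompt>Hi</block></form></vxml>"

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```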

Page 14:

VoiceXML Applications

• An application consists of a single application root document as well as zero or more other documents

• The application root document is loaded whenever any other document is accessed

• The application root document grammars and variables are visible in other application documents

[Figure: an application root document with three other documents below it]
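
How root-document variables become visible in other documents can be sketched as follows. This is an illustrative Python model, not a VoiceXML interpreter: the two documents, their file names, and the helper functions are hypothetical, and only document-level <var> elements are considered. It relies on the application attribute of <vxml>, which is how a document names its application root.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-document application, held in memory instead of on a web server.
DOCUMENTS = {
    "root.vxml": """
        <vxml version="1.0">
          <var name="company" expr="'Example Airlines'"/>
        </vxml>""",
    "booking.vxml": """
        <vxml version="1.0" application="root.vxml">
          <var name="flight"/>
          <form id="book"><block><prompt>Welcome</prompt></block></form>
        </vxml>""",
}


def declared_vars(doc_name):
    """Names of the <var> elements declared at document level."""
    root = ET.fromstring(DOCUMENTS[doc_name])
    return [var.get("name") for var in root.findall("var")]


def visible_vars(doc_name):
    """Document-level variables plus those inherited from the application root."""
    root = ET.fromstring(DOCUMENTS[doc_name])
    app_root = root.get("application")  # absent on the root document itself
    inherited = declared_vars(app_root) if app_root else []
    return inherited + declared_vars(doc_name)


print(visible_vars("booking.vxml"))
print(visible_vars("root.vxml"))
```

In this model booking.vxml sees both its own variable and the one declared in root.vxml, while the root document sees only its own.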

Page 15:

VoiceXML Documents

• Documents can contain two types of dialogs:

– forms (<form>)

– menus (<menu>)

• Other elements:

– <meta>: metadata, defined as name/value pair

– <var>: for declaring variables

– <script>: for client-side ECMAScript

– <catch>: for catching events

– <link>: transitions to other dialogs
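
To make the element list concrete, the sketch below embeds a small document that uses each element type once and sorts its top-level children into dialogs and other elements. The document content, its ids, and the summarize helper are invented for this illustration. Python is used only to parse and inspect the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that uses each element type named above.
DOC = """
<vxml version="1.0">
  <meta name="author" content="example"/>
  <var name="attempts" expr="0"/>
  <script>var greeting = 'hello';</script>
  <catch event="help"><prompt>Say news or weather.</prompt></catch>
  <link next="#main"><grammar type="application/x-gsl">[menu]</grammar></link>
  <menu id="main">
    <prompt>News or weather?</prompt>
    <choice next="#news">news</choice>
    <choice next="#weather">weather</choice>
  </menu>
  <form id="news"><block><prompt>No news today.</prompt></block></form>
  <form id="weather"><block><prompt>It is raining.</prompt></block></form>
</vxml>
"""

DIALOG_TAGS = {"form", "menu"}


def summarize(markup):
    """Split the top-level children into dialogs and other elements."""
    root = ET.fromstring(markup)
    dialogs = [(child.tag, child.get("id")) for child in root if child.tag in DIALOG_TAGS]
    others = [child.tag for child in root if child.tag not in DIALOG_TAGS]
    return dialogs, others


dialogs, others = summarize(DOC)
print(dialogs)
print(others)
```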

Page 16:

Forms and menus

• Forms may contain zero or more <field> elements

– the user must provide a value for the field before proceeding to the next element in the form

– each field may specify a grammar that defines the allowable inputs

• Menus may contain one or more <choice> elements

– a menu presents the user with a choice of options and then transitions to another dialog
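
The behaviour described above can be approximated in a few lines. This is a deliberately simplified Python sketch, not the form interpretation algorithm from the VoiceXML specification: speech recognition is replaced by a list of pre-supplied utterances, each grammar is reduced to a set of allowed words, and the names run_form and run_menu are hypothetical.

```python
def run_form(fields, answers):
    """Visit fields in document order, re-prompting until each one is filled.

    fields  : list of (name, prompt, allowed_words) tuples
    answers : iterator of caller utterances, standing in for speech recognition
    """
    filled = {}
    transcript = []
    for name, prompt, allowed in fields:
        while name not in filled:
            transcript.append(prompt)
            utterance = next(answers)
            if utterance in allowed:
                filled[name] = utterance  # field is filled, move to the next one
            else:
                transcript.append("Sorry, I did not understand.")  # no grammar match
    return filled, transcript


def run_menu(choices, utterance):
    """Return the target dialog for the spoken choice, or None if nothing matches."""
    return choices.get(utterance)


fields = [
    ("drink", "Would you like coffee, tea, or juice?", {"coffee", "tea", "juice"}),
    ("size", "Small or large?", {"small", "large"}),
]
filled, transcript = run_form(fields, iter(["milk", "tea", "large"]))
print(filled)
print(transcript)
print(run_menu({"news": "#news", "weather": "#weather"}, "weather"))
```

The first utterance ("milk") is not in the grammar, so the first field is prompted a second time before the form moves on.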

Page 17:

VoiceXML Example

<?xml version="1.0"?>
<!-- helloworld.vxml -->
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        Hello World!
      </prompt>
    </block>
  </form>
</vxml>

Page 18:

Example with Grammar

<vxml version="1.0">
  <meta name="maintainer" content="[email protected]"/>
  <form id="hello">
    <field name="item">
      <prompt>Would you like coffee, tea, or juice?</prompt>
      <grammar type="application/x-gsl">
        [coffee tea juice]
      </grammar>
      <filled>
        <prompt>Your <value expr="item"/> will be ready momentarily</prompt>
      </filled>
    </field>
  </form>
</vxml>
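
In GSL, square brackets denote a choice between alternatives, so the grammar above accepts exactly one of the three words. The Python sketch below mimics what the field does with that grammar. It is an illustration under narrow assumptions: parse_flat_gsl handles only a single flat [ ... ] list of single words, and both function names are hypothetical.

```python
def parse_flat_gsl(grammar_text):
    """Parse a flat GSL disjunction such as "[coffee tea juice]" into a set of words."""
    text = grammar_text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("only a single [ ... ] disjunction is supported in this sketch")
    return set(text[1:-1].split())


def fill_prompt(item):
    """Mimic <value expr="item"/> inside the <filled> prompt."""
    return f"Your {item} will be ready momentarily"


allowed = parse_flat_gsl(" [coffee tea juice] ")
for utterance in ["tea", "water"]:
    if utterance in allowed:
        print(fill_prompt(utterance))
    else:
        print("no match, the field would prompt again")
```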

Page 19:

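Dynamic VoiceXML

#!perl -w
# Minimal CGI script that emits a VoiceXML document.
use strict;

# HTTP header, followed by the blank line that ends the header block.
print "Content-type: text/x-vxml\n\n";

my $HOMEBUFFER = '<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt> Hello World </prompt>
    </block>
  </form>
</vxml>';

print $HOMEBUFFER;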
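
Once the prompt text comes from a database or from user data rather than a fixed string, it has to be escaped before it is placed in the document. The sketch below shows the same CGI-style response built in Python, as an illustration only and not as the author's method. The function name vxml_response is hypothetical, and the text/x-vxml content type is simply the one used in the script above.

```python
from xml.sax.saxutils import escape


def vxml_response(message):
    """Build a CGI-style response: header, blank line, then a VoiceXML document."""
    body = (
        '<?xml version="1.0"?>\n'
        '<vxml version="1.0">\n'
        "  <form>\n"
        "    <block>\n"
        f"      <prompt>{escape(message)}</prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )
    return "Content-type: text/x-vxml\n\n" + body


print(vxml_response("Hello World"))
print(vxml_response("Fish & chips"))
```

Without the call to escape, the ampersand in the second message would make the generated document ill-formed.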

Page 20:

Other Markup Languages

• JSML: JSpeech Markup Language (Sun)

• Dialog ML (Dennis Heuer)

• SABLE (SABLE Consortium)

• DMML (Dialogue Moves Markup Language)

• SALT: Speech Application Language Tags (SALT Forum)

• (CallXML, Telephony Markup Language, …)

Progress since March 2000 (VoiceXML 1.0)?

Page 21:

SALT

• Speech Application Language Tags (SALT Forum)

• SALT Forum founded on 15 October 2001 by Microsoft, Intel, …

• very simple set of tags for extending existing markup languages (XHTML, XML)

• specification available Q1 2002

• specification submitted to standards body (W3C??) mid 2002