UWSP Web Speech Research Group
Joe Frost, Mark Stenerson, Professor Dave Gibbs
Presentation to AITP, Monday, October 17, 2005
Before beginning…
• Thanks for the opportunity to share
• Feel free to interrupt with questions/comments
• This will hopefully be of interest to you!
• AITP needed a presentation! We’re happy to be here…
Overview
• UWSP Web Speech Research Group: Purpose, Research Interest, Origins
• Previous Work: Shaker – PowerPoint conversion; SIDE – Speaking Integrated Development Environment
• Current Work: Interactions with Web Pages; Voice-controlled database interaction; Telephony Applications
• Questions
UWSP Web Speech Research Group Purpose
Research the functionality and usefulness of voice input and output on the web through the creation of meaningful projects and prototypes
• Voice input: speech recognition (SR, or ASR)
• Voice output: speech synthesis, or text-to-speech (TTS)
UWSP Web Speech Research Group
Completed Research
• Developed IDE to assist in the preparation of speaking online course materials
• Development tool to create speaking web pages
• Useful for instructors, instructional designers, and training materials
Current Research
• Interactive browsing capability (forms, etc.)
• Investigating speaking web pages with broader applicability, including speech recognition
• Interactive database prototype: “hands-free” database updating
• Telephony-based systems
Origins of Web Speech Research Group
Online Course: WDMD 170, Spring 2004
• Audio over PowerPoint, or saved as HTML
• Large files – inaccessible to dial-up users
• Clumsy to edit, maintain
Investigated Speech Recognition
• XP wished to train my profile
Opera introduced its “speaking browser”
• March 2004 Press Release
• Currently Opera 8.5 (load same page in Opera and speak)
Investigated Text-To-Speech (TTS)
• Microsoft Speech Application Language Tags (SALT)
• VoiceXML
Initial Inquiry
Investigated Speech Recognition
• XP prompted me to train my profile
• Demonstrate training speech profile (Quick Launch Speech button)
Initial Inquiry
Investigated Text-To-Speech (TTS)
• Built into XP
• Demonstrate TTS Speech
• XP-supplied voices: LH Michael, Michelle; MS Mary, Mike, Sam
• Purchased voices: NeoSpeech Kate, Paul
Speaking Integrated Development Environment: SIDE
Uses TTS (Text-To-Speech) technology
TTS in a web page: essentially a markup language
• SALT (Speech Application Language Tags): developed by Microsoft and the SALT Forum
• VoiceXML: roots in a research project called PhoneWeb at AT&T Bell Laboratories; eventually picked up by the VoiceXML Forum
Web Page TTS: SALT
SALT:
“The Speech Application Language Tags (SALT) specification enables multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). The Speech Application Language Tags extend existing mark-up languages such as HTML, XHTML, and XML.” -SALTForum.com
Web Page TTS: VoiceXML
VoiceXML:
“A language for creating voice-user interfaces, particularly for the telephone. It uses speech recognition and touchtone (DTMF keypad) for input, and pre-recorded audio and text-to-speech synthesis (TTS) for output. It is based on the Worldwide Web Consortium's (W3C's) Extensible Markup Language (XML), and leverages the web paradigm for application development and deployment. By having a common language, application developers, platform vendors, and tool providers all can benefit from code portability and reuse.” -VoiceXML Forum
Comparison
SALT                                   VXML
Newer technology                       Older technology
Support from Microsoft                 Support from VXML community
Designed for internet age              Designed for telephony
Controllable                           Many functions, not interactive
Voice purchase availability            Single voice available currently
Large download to enable speech        Very small add-in download
SALT and HTML
<html>
<head>
  SALT tags
</head>
<body>
</body>
</html>
SALT tags are entered into the head of the document.
Upon rendering the document IE recognizes the embedded SALT using a special plug-in
SALT example: Hello World
How does this work?
Use of tags within the HTML document to invoke the Windows voice
Example of simple tags:
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<head>
  <salt:prompt id="hello">Hello World</salt:prompt>
</head>
<body onload="hello.Start()">
</body>
</html>
Hello World Example
(Note: this example only “speaks” if you have the I.E. Web Speech Add-In installed. You can download the add-in from our web page)
VoiceXML and HTML
<html>
<head>
  <form> <block> VXML speech text </block> </form>
</head>
<body>
</body>
</html>
• VXML speech text within <form> and <block>
• VXML tags before </head>
• Insert to <body>: ev:event = "load" ev:handler = "#objID"
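The two placement rules on this slide can be sketched as a small string transformation. This is only an illustration under stated assumptions: addVxmlSpeech is a hypothetical helper, not SIDE's actual code, and it leaves out the xmlns and xmlns:ev declarations that the <html> tag also needs.

```javascript
// Hypothetical helper illustrating the slide's rules; not SIDE's real implementation.
function addVxmlSpeech(html, formId, text) {
  // Escape characters that would otherwise break the XML markup.
  const safeText = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  const vxmlForm =
    `<form xmlns="http://www.w3.org/2001/vxml" id="${formId}">\n` +
    `  <block>${safeText}</block>\n` +
    `</form>\n`;

  return html
    // Rule 1: VXML tags go immediately before </head>.
    .replace(/<\/head>/i, (m) => vxmlForm + m)
    // Rule 2: the <body> tag gets the load event and a handler pointing at the form id.
    .replace(/<body\b/i, (m) => `${m} ev:event="load" ev:handler="#${formId}"`);
}

const vxmlPage = addVxmlSpeech(
  "<html><head></head><body></body></html>",
  "lecture",
  "Hello World"
);
console.log(vxmlPage);
```

Function-style replacements are used so that a "$" in the spoken text is never treated as a replacement pattern.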
VoiceXML example: Hello World
How does this work?
Use of tags within the HTML document to invoke the voice within the Opera 8 web browser
Example of simple tags:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ev="http://www.w3.org/2001/xml-events">
<head>
  <form xmlns="http://www.w3.org/2001/vxml" id="lecture">
    <block> Hello World </block>
  </form>
</head>
<body ev:event="load" ev:handler="#lecture">
</body>
</html>
NOTE: Hello World Example must be manually loaded into Opera. Filename is Opera-HelloWorld.xml
Web Speech Research Group: First Iteration
Fall 2004: Independent Study – 2 students
• Conversion “wizard” – PPT saved as HTML was input
• Used notes section of PPT file as the text to be spoken by the page
• Added SALT tags (worked only with SALT)
• Added “controls” via JavaScript
• Dubbed “Shaker”, as in SALT Shaker
• Presented December 2004 (requires I.E. Speech Add-In)
Web Speech Research Group: Second Iteration
Spring 2005: Senior Projects Team (CIS 480) – 4 students
• Create an application: Speech Integrated Development Environment (SIDE)
• To enable a web author to add speech to pages
• Useful in online courses
• Create SALT or VXML pages
CIS 499 Independent Study – 2 students
• Interactive database prototype
Spring 2005: Speech IDE
SIDE – Speech Integrated Development Environment
• Take SALT Shaker Wizard to an integrated development environment, or Speech IDE
• Allow the modification of any existing web page into one with speech
Web Page Conversion: SIDE Project
SIDE function: permits a web author to easily add speech to his/her web pages.
Conversion:
• HTML with SALT tags and Control Panel (IE)
• HTML with Voice XML tags (Opera)
[Diagram: HTML plus Text (keyboard, text file, or voice) → SIDE → HTML with SALT, HTML with VXML]
Where do the TTS tags go (SALT)?
<html>
<head>
  JavaScript
  SALT tags
</head>
<body>
  Control Panel
</body>
</html>
• JavaScript and SALT tags before </head>
• Control Panel before </body>
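As a rough sketch of these two insertion points, the hypothetical helper below places a SALT prompt before </head> and a minimal control panel before </body>. It is not SIDE's actual code: the controlPanel markup and the function name addSaltSpeech are assumptions, and the supporting JavaScript and the xmlns:salt declaration on <html> are left out for brevity.

```javascript
// Hypothetical helper illustrating the insertion points; not SIDE's real implementation.
function addSaltSpeech(html, promptId, text) {
  // Escape characters that would otherwise break the markup inside the prompt.
  const safeText = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  const saltTag = `<salt:prompt id="${promptId}">${safeText}</salt:prompt>\n`;

  // Assumed control panel: two buttons calling the prompt's Start() and Stop() methods.
  const controlPanel =
    `<div id="controlPanel">\n` +
    `  <button onclick="${promptId}.Start()">Start</button>\n` +
    `  <button onclick="${promptId}.Stop()">Stop</button>\n` +
    `</div>\n`;

  return html
    // SALT tags go immediately before </head>.
    .replace(/<\/head>/i, (m) => saltTag + m)
    // The control panel goes immediately before </body>.
    .replace(/<\/body>/i, (m) => controlPanel + m);
}

const saltPage = addSaltSpeech(
  "<html><head><title>T</title></head><body><p>Hi</p></body></html>",
  "intro",
  "Hello & welcome"
);
console.log(saltPage);
```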
SIDE Conversion Example
Can add speaking text to any HTML page
• Convert the AITP Portal Page to a speaking page using SIDE (close PPT to avoid invoking “Train Profile”)
• Will create SALT tags for I.E.
Speaking Pages: Uses?
• Online course page “lectures”
• Low overhead / low fidelity applications
• Training situations
Looking for prototypical application
SALT Examples
• Recognition only
• Recognition and response combined: demonstrate internal simple processing
• Interactive form: potentially linked to db and submission
• Web page navigation
JF
Main SALT Tags
<salt:prompt>
• The speaking (output) tag; TTS
• Methods: Start() – begin speaking; Stop(); Pause(); Resume()
<salt:listen>
• The listening (input) tag; recognition
• Contains one or more grammars
• Grammars define what words are listened for
<salt:bind>
• Binds the recognized value to a form element
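To make "grammars define what words are listened for" concrete, here is a minimal sketch that builds a word-list grammar as XML. It assumes the recognizer accepts W3C SRGS-style XML grammars (the slides do not show the grammar format actually used); buildWordListGrammar, the rule id, and the en-US language tag are illustrative choices only.

```javascript
// Hypothetical helper: builds an SRGS-style XML grammar that accepts exactly one of the given words.
function buildWordListGrammar(ruleId, words) {
  const escapeXml = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

  // One <item> per word or phrase the recognizer should listen for.
  const items = words
    .map((w) => `      <item>${escapeXml(w)}</item>`)
    .join("\n");

  return [
    `<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="${ruleId}">`,
    `  <rule id="${ruleId}">`,
    `    <one-of>`,
    items,
    `    </one-of>`,
    `  </rule>`,
    `</grammar>`,
  ].join("\n");
}

console.log(buildWordListGrammar("country", ["Austria", "Belgium", "Czech Republic"]));
```

Anything not listed in an <item> is outside the grammar, so the recognizer would report it as not recognized rather than guess.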
Example: recognition
Recognize a country of the European Union (requires I.E. Speech Add-In and microphone)
Courtesy of Mark Huckvale, University College London, www.phon.ucl.ac.uk/home/mark/salt/
Key code snippets
<salt:listen id="RecogEU">
<input name="txtCountry" type="text" onclick="RecogEU.Start()" />
Example: recognition and response
Recognize a country and supply its capital city (Huckvale)
(requires I.E. Speech Add-In and microphone)
Key code snippets
function LookupCapital(country) {
if (country=="Austria") return("Vienna");
else if (country=="Belgium") return("Brussels");
else if (country=="Cyprus") return("Nicosia");
else if (country=="Czech Republic") return("Prague");
etc.
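The same lookup can also be written as a table instead of a chain of if/else tests. The sketch below is an alternative for illustration, not Huckvale's code: it contains only the four countries shown in the snippet, and the trimming and lowercasing of the recognized text is an added assumption.

```javascript
// Only the entries visible in the snippet above; the original list continues beyond these.
const CAPITALS = new Map([
  ["austria", "Vienna"],
  ["belgium", "Brussels"],
  ["cyprus", "Nicosia"],
  ["czech republic", "Prague"],
]);

// Hypothetical table-driven variant of LookupCapital.
function lookupCapitalFromTable(country) {
  // Normalize the recognized text so case and stray spaces do not matter (an assumption).
  const key = String(country).trim().toLowerCase();
  // Return null for a country that is not in the table.
  return CAPITALS.has(key) ? CAPITALS.get(key) : null;
}

console.log(lookupCapitalFromTable("Belgium"));
console.log(lookupCapitalFromTable("Atlantis"));
```

Adding a country then means adding one row to the table rather than another branch.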
Example: interactive form
Interactively order a pizza, using a pizza order form (Hill, WSRG)
(requires I.E. Speech Add-In and microphone)
Example: web page navigation
Navigate pages with links already established
Page Control using speech recognition (Gibbs)
(requires I.E. Speech Add-In and microphone)
Allows speaking interruptions
<salt:prompt id='Prompt' oncomplete = 'pageFinished();'
bargein = 'true' bargeintype ='speech'>
Allows continuous listening
<salt:listen id="testreco"
onsilence = "testreco.start();"
onnoreco = "testreco.start();"
onreco = "CheckCommand();">
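The body of CheckCommand() is not shown on the slide. As a minimal sketch of what such a handler might decide, the code below maps a recognized phrase to a page. The command words, the file names, and the helper resolveCommand are all hypothetical, and how the page obtains the recognized text from the listen element is deliberately left out.

```javascript
// Hypothetical command table: spoken phrase -> page to load.
const COMMANDS = new Map([
  ["next", "page2.html"],
  ["previous", "page1.html"],
  ["home", "index.html"],
]);

// Returns the target page for a recognized phrase, or null if it is not a known command.
function resolveCommand(recognizedText, commands) {
  const phrase = String(recognizedText).trim().toLowerCase();
  return commands.has(phrase) ? commands.get(phrase) : null;
}

// In the browser, a CheckCommand() handler could navigate when a target is found
// and otherwise simply let the listen element keep listening.
const target = resolveCommand(" Next ", COMMANDS);
console.log(target);
```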
What It Can Provide
• Accessibility: no keyboard needed
• Hands-free use: automobiles!
• Telephony: menu-driven pages
• Transparent web applications: same pages serving both www and listen-only devices?