VoiceXML and Internet Telephony
Kundan Singh and Henning Schulzrinne
Columbia University
{kns10,hgs}@cs.columbia.edu
Joint work (in progress) with Daniel, Naho, Visda and Sean.
Dec 20, 2015
18 April, 2001 VoiceXML/Kundan Singh/Columbia University
Overview
A language for specifying voice dialogs in interactive voice response systems
• Information retrieval
– News, sports, traffic, stock quotes
• e-business
– Customer service, banking, stock trading
• Notification service
PSTN-based IVR Platform
PSTN
End user
IVR [1] platform
• Voice and telephony functions (ASR [2], TTS [3], DTMF [4])
• Service logic (application specific)
• Receives incoming PSTN [5] call
• Responds with prompts
• Accepts user input (DTMF or speech)
• Takes action based on user input
(Usually the service logic is programmed for the specific application, say a weather report.)
[1] Interactive voice response
[2] Automated speech recognition
[3] Text to speech
[4] Dual tone multi-frequency (touch tone)
[5] Public switched telephone network
"Welcome to voice mail. Press 3 to listen to new messages..."
1-212-8545224
Decomposition
PSTN
End user
IVR platform
• Voice and telephony functions (ASR, TTS, DTMF)
• Service logic (application specific)
End user
Voice gateway
• Voice and telephony functions
Internet
Web server
• Service logic
VoiceXML
PSTN
End user
Internet
Voice gateway
• Voice and telephony functions
• VoiceXML browser
Web server
• Service logic (CGI, servlet, JSP)
End user
VXML
HTML
DB
Multimedia
Audio/grammar
Scripts
Web server
Why VoiceXML
• Alternative: write C/C++ applications on telephony platforms?
• Separates application-specific service logic (HTML, VoiceXML) from user interaction (browser, I/O device)
• Can use existing web development tools
• Can have single application for both web and voice
• Can use existing infrastructure: HTTP, web servers, etc.
• Programming voice services for telephony platforms
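The "single application for both web and voice" point can be sketched as one piece of service logic with two renderers. This is only an illustration: `lookup_weather`, the sample data, and the use of an Accept-style string containing "vxml" to pick the markup are all assumptions, not part of the slides.

```python
from xml.sax.saxutils import escape


def lookup_weather(city):
    """Hypothetical service logic shared by the web and voice front ends."""
    reports = {"New York": "sunny", "Chicago": "windy"}  # made-up sample data
    return reports.get(city, "unknown")


def render_html(city):
    """Render the result for a web browser."""
    report = escape(lookup_weather(city))
    return f"<html><body><p>Weather in {escape(city)}: {report}</p></body></html>"


def render_vxml(city):
    """Render the same result for a voice browser."""
    report = escape(lookup_weather(city))
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="1.0">\n'
        "  <form><block>\n"
        f"    <prompt>Weather in {escape(city)}: {report}</prompt>\n"
        "  </block></form>\n"
        "</vxml>"
    )


def handle_request(city, accept):
    """Choose the markup from an Accept-style string (assumed convention)."""
    if "vxml" in accept:
        return render_vxml(city)
    return render_html(city)
```

Only the renderers differ between the two front ends; the lookup is written once.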
VoiceXML vs HTML
• I/O device: phone vs PC
• Transport: HTTP
• Voice browser vs web browser
• VoiceXML vs HTML form
HTML:
<form action="url">
  Enter your Id: <input name="id">
  <input type="submit">
</form>
VoiceXML:
<form>
  <field name="id">
    <prompt> Your ID, please. </prompt>
  </field>
  <block>
    <submit next="url"/>
  </block>
</form>
VoiceXML examples [1]
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        <emp>Hello</emp>, World!
      </prompt>
    </block>
  </form>
</vxml>
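To see what a voice browser has to pull out of a document like this before handing it to text-to-speech, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The function name `prompt_texts` is made up for illustration, and a real browser would honor `<emp>` rather than flatten it as done here.

```python
import xml.etree.ElementTree as ET

HELLO = """<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        <emp>Hello</emp>, World!
      </prompt>
    </block>
  </form>
</vxml>"""


def prompt_texts(document):
    """Return the plain text of every <prompt>, in document order."""
    root = ET.fromstring(document)
    texts = []
    for prompt in root.iter("prompt"):
        # itertext() also yields text inside nested markup such as <emp>,
        # so the emphasis tag is flattened into plain text here.
        raw = "".join(prompt.itertext())
        texts.append(" ".join(raw.split()))  # collapse layout whitespace
    return texts
```

Calling `prompt_texts(HELLO)` yields the single string that would be passed to the TTS engine.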
VoiceXML examples [2]
<form id="weather_info">
  <block>Welcome to the weather information service.</block>
  <field name="state">
    <prompt>What state?</prompt>
    <grammar src="state.gram" type="application/x-jsgf"/>
    <catch event="help">
      Please speak the state for which you want the weather.
    </catch>
  </field>
VoiceXML examples [2]
  <field name="city">
    <prompt>What city?</prompt>
    <grammar src="city.gram" type="application/x-jsgf"/>
    <help>
      Please speak the city for which you want the weather.
    </help>
  </field>
  <block>
    <submit next="/servlet/weather" namelist="city state"/>
  </block>
</form>
Grammar (state.gram):
California | Illinois | New Jersey | New York
VoiceXML examples [3]
<field name="card_type">
  …
  <grammar>
    visa {visa}
    | master [card] {mastercard}
    | amex {amex}
    | american [express] {amex}
  </grammar>
  <help>Please say Visa, Mastercard, or American Express.</help>
  …
</field>
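The inline grammar above uses `|` for alternatives, `[word]` for optional words and `{tag}` for the value returned to the field. A minimal sketch of how such a grammar could be matched against already-recognized text follows. It handles only this small subset of the notation, and `compile_grammar` and `recognize` are illustrative names, not part of any VoiceXML platform.

```python
import re

GRAMMAR = (
    "visa {visa} | master [card] {mastercard} | "
    "amex {amex} | american [express] {amex}"
)


def compile_grammar(text):
    """Turn each alternative into a (regex, tag) pair."""
    rules = []
    for alternative in text.split("|"):
        tag_match = re.search(r"\{(\w+)\}", alternative)
        words = re.sub(r"\{\w+\}", "", alternative).split()
        # Without a {tag}, fall back to the required words themselves.
        tag = tag_match.group(1) if tag_match else " ".join(
            w for w in words if not w.startswith("[")
        )
        pattern = ""
        for i, word in enumerate(words):
            sep = r"\s+" if i > 0 else ""
            if word.startswith("[") and word.endswith("]"):
                # Optional word: the separator is optional along with it.
                pattern += f"(?:{sep}{re.escape(word[1:-1])})?"
            else:
                pattern += sep + re.escape(word)
        rules.append((re.compile(pattern, re.IGNORECASE), tag))
    return rules


def recognize(utterance, rules):
    """Return the tag of the first matching alternative, else None."""
    spoken = " ".join(utterance.split())
    for regex, tag in rules:
        if regex.fullmatch(spoken):
            return tag
    return None  # a real browser would throw a nomatch event here
```

With this sketch, both "amex" and "American Express" map to the same field value, which is the point of the `{amex}` tags.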
VoiceXML examples [4]
<form>
  <field name="drink">
    <prompt>Would you like coffee, tea, milk or nothing?</prompt>
    <option value="coffee">coffee</option>
    <option value="tea">tea</option>
    <option value="milk">milk</option>
    <option value="nothing">nothing</option>
  </field>
  <block>
    <submit next="http://…/bartender.cgi" namelist="drink"/>
  </block>
</form>
VoiceXML examples [5]
<menu>
  <prompt>Would you like coffee, tea, milk or nothing?</prompt>
  <choice next="http://…coffee.vxml">coffee</choice>
  <choice next="http://…tea.vxml">tea</choice>
  <choice next="http://…milk.vxml">milk</choice>
  <choice next="http://…blank.vxml">nothing</choice>
  <nomatch count="1">I did not understand what you said.</nomatch>
  <nomatch count="2">Please say one of coffee, tea, milk or nothing</nomatch>
  <noinput>You must say something.</noinput>
</menu>
Alternatively: “Would you like <enumerate/>”
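The `count` attributes in the menu above let the response escalate when the same event repeats. As far as I understand the VoiceXML rules, the browser picks the handler with the highest `count` that does not exceed the number of times the event has occurred, and a handler without `count` is treated as `count="1"`. A small sketch of that selection, with `select_handler` as a made-up name:

```python
def select_handler(handlers, event, occurrence):
    """Pick the message for the given event and occurrence number.

    handlers: list of (event_name, count, message) in document order.
    """
    candidates = [h for h in handlers if h[0] == event and h[1] <= occurrence]
    if not candidates:
        return None
    best = max(h[1] for h in candidates)
    # Among handlers with the winning count, document order decides.
    return next(h[2] for h in candidates if h[1] == best)


# Handlers transcribed from the menu example; noinput defaults to count 1.
MENU_HANDLERS = [
    ("nomatch", 1, "I did not understand what you said."),
    ("nomatch", 2, "Please say one of coffee, tea, milk or nothing"),
    ("noinput", 1, "You must say something."),
]
```

The first unrecognized answer gets the short apology; the second and later ones get the more explicit instruction.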
Form Interpretation Algorithm
• Initialize variables, counters.
• Main loop
– Select phase: select next form item
– Collect phase: prompt and collect input
– Process phase: process the event
• Document: collection of forms
• An application can use multiple documents
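The select/collect/process loop can be sketched in a few lines. This is a heavily simplified model, not the algorithm from the specification: form items are plain dictionaries, input arrives through a caller-supplied `get_input` function, the only event modeled is nomatch, and `max_turns` is an added safety limit. All of these names are assumptions for illustration.

```python
def run_form(items, get_input, max_turns=10):
    """Simplified form interpretation loop.

    items: list of dicts with 'name', 'prompt' and 'accepts' (set of inputs).
    get_input: callable taking the item name and returning the user's input.
    """
    values = {item["name"]: None for item in items}  # initialize variables
    counters = {item["name"]: 0 for item in items}   # initialize prompt counters
    transcript = []
    for _ in range(max_turns):
        # Select phase: first form item whose variable is still unfilled.
        item = next((i for i in items if values[i["name"]] is None), None)
        if item is None:
            break  # every item is filled, the form is done
        # Collect phase: play the prompt and collect input.
        counters[item["name"]] += 1
        transcript.append(item["prompt"])
        heard = get_input(item["name"])
        # Process phase: fill the variable or record the event.
        if heard in item["accepts"]:
            values[item["name"]] = heard
        else:
            transcript.append("nomatch")
    return values, counters, transcript
```

Because the select phase always returns to the first unfilled item, a failed answer simply causes the same prompt to be played again on the next pass.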
VoiceXML scope
• Human-Machine Interaction
– Audio output (TTS, pre-recorded file)
– Audio input (Speech recognition, audio recording)
– Character input (DTMF)
– Presentation logic (scripting)
• Basic Connection Control
– disconnect
– transfer
Application scope
• General service logic
• State management
• Dialog generation
• Dialog sequencing
• Database operation
VoiceXML features
• Menus, Forms, Sub-Dialogs
• Inputs (grammar, record, dtmf)
• Outputs (audio, text-to-speech)
• Events (error handling: nomatch, noinput, catch-throw)
• Variables and scripting (var, assign, if)
• Transition or links (goto, submit)
• Transfer to 3rd party (also add third party)
• Disconnect the call
• Platform-specific objects and properties
• Pre-fetching
VoiceXML 1.0 <tags>
assign, audio, block, break, catch, choice, clear, disconnect, div, dtmf, else, elseif, emp, enumerate, error, exit, field, filled, form, goto, grammar, help, if, initial, link, menu, meta, noinput, nomatch, object, option, param, property, pros, record, reprompt, return, sayas, script, subdialog, submit, throw, transfer, value, var, vxml
Telephony, Speech synthesis or audio output, User input and grammar, Program flow, Variables and properties, Error handling, Misc.
Internet Telephony
PSTN
Internet
End user
End user
Voice gateway
• Voice and telephony function
• VoiceXML browser
Web server
• Service logic (CGI, servlet, JSP)
Internet Telephony
PSTN
End user
SIP user agent
Voice gateway
Web server
• CGI, servlet, JSP
PSTN/SIP
VoiceXML browser with SIP
SIP phone
New module
Internet Telephony
Web server (CGI, servlet, JSP)
Examples: email by phone, voicemail by phone, directory services for a department, web browsing by phone (not WAP), …
VoiceXML browser with SIP
• Accept SIP connection
• Fetch XML page over HTTP
• Parse XML
• Interpret VoiceXML tags
• Do text-to-speech
• Receive and detect user input (DTMF, or in future speech)
• Parse according to the grammar
• Fetch audio file from web and play to the user
. . .
SIP phone
gateway
SIP for signaling, RTP for audio, DTMF (either in-band audio tones or RFC 2833)
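For the RFC 2833 option, DTMF digits arrive as small RTP payloads instead of audio tones. The sketch below decodes the 4-byte telephone-event payload as I read the RFC: an 8-bit event code, an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration. The function name and the returned dictionary layout are illustrative choices, and RTP header parsing is left out.

```python
import struct

# Event codes 0-15 map to the sixteen DTMF keys.
EVENT_TO_KEY = "0123456789*#ABCD"


def parse_telephone_event(payload):
    """Decode a 4-byte RFC 2833 telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload needs at least 4 bytes")
    # Network byte order: event (8 bits), E/R/volume (8 bits), duration (16 bits).
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "key": EVENT_TO_KEY[event] if event < 16 else None,  # None: not a DTMF key
        "end": bool(flags & 0x80),  # E bit: set on the final packet(s) of the event
        "volume": flags & 0x3F,     # power level, expressed in -dBm0
        "duration": duration,       # in RTP timestamp units
    }
```

A browser would typically report a key press to the dialog once per event, for example when it first sees a packet with the end bit set.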