  • US010607504B1

    (12) United States Patent
    Ramanarayanan et al.

    (10) Patent No.: US 10,607,504 B1
    (45) Date of Patent: Mar. 31, 2020

    (54) COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR A CROWD SOURCE-BOOTSTRAPPED SPOKEN DIALOG SYSTEM

    (71) Applicant: Educational Testing Service, Princeton, NJ (US)

    (72) Inventors: Vikram Ramanarayanan, San Francisco, CA (US); David Suendermann-Oeft, San Francisco, CA (US); Patrick Lange, San Francisco, CA (US); Alexei V. Ivanov, Redwood City, CA (US); Keelan Evanini, Pennington, NJ (US); Yao Qian, San Francisco, CA (US); Zhou Yu, Pittsburgh, PA (US)

    (73) Assignee: Educational Testing Service, Princeton, NJ (US)

    (*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 377 days.

    (21) Appl. No.: 15/272,903

    (22) Filed: Sep. 22, 2016

    Related U.S. Application Data
    (60) Provisional application No. 62/232,537, filed on Sep. 25, 2015.

    (51) Int. Cl.
         G09B 19/04 (2006.01)
         G10L 15/22 (2006.01)
         (Continued)

    (52) U.S. Cl.
         CPC G09B 19/04 (2013.01); G10L 15/063 (2013.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 2015/0635 (2013.01)

    (58) Field of Classification Search
         CPC G09B 19/04
         (Continued)

    (56) References Cited

    U.S. PATENT DOCUMENTS
    2006/0080101 A1 * 4/2006 Chotimongkol, G06F 17/278, 704/257
    2006/0149555 A1 * 7/2006 Fabbrizio, G10L 15/22, 704/275
    (Continued)

    OTHER PUBLICATIONS
    Bohus, Dan, Raux, Antoine, Harris, Thomas, Eskenazi, Maxine, Rudnicky, Alexander; Olympus: An Open-Source Framework for Conversational Spoken Language Interface Research; Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies; pp. 32-39; 2007.
    (Continued)

    Primary Examiner: Thomas J Hong
    (74) Attorney, Agent, or Firm: Jones Day

    (57) ABSTRACT
    Systems and methods are provided for implementing an educational dialog system. An initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training. The task model is updated by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student for development of speaking capabilities.

    17 Claims, 12 Drawing Sheets

    [Representative drawing (Fig. 10). 1002: Access initial task model. 1004: Provide task represented by task model to persons for crowdsourced training. 1006: Update task model by revising language model and spoken language understanding model. 1008: Provide updated task to student for development of speaking capabilities.]

  • US 10,607,504 B1 Page 2

    (51) Int. Cl.
         G10L 15/18 (2013.01)
         G10L 15/06 (2013.01)

    (58) Field of Classification Search
         USPC 434/185
         See application file for complete search history.


  • [Fig. 1 (Sheet 1 of 12): Educational dialog system 102 exchanging prompts and responses with interactors in a training mode 104 and an evaluation mode 106, and outputting a score 108.]

  • [Fig. 2 (Sheet 2 of 12): Example dialog states associated with a task (reference numerals 202 to 212). Prompts shown: Begin; Welcome; "Could you tell me more about your education?"; "Do you plan to return to school for higher studies?"; "Have you ever quit a job before?"; "Let's talk about your experience"; "Have you ever spoken before a group of people?"; "Well, I have been asking you a lot of questions. Do you have any questions for me?"; "Thanks, was a pleasure." Decision nodes ask "Is the answer affirmative?" and branch on Yes, No, or Default; other transitions are labeled Continue or Return. Acknowledgements include "Interesting", "I see", "That's unfortunate", "Okay, that's great", "Great!", "Okay, move on", and "Get back to you later".]
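The dialog flow in Fig. 2 can be read as a small state graph: each state carries a prompt, and a branch chooses the next state from a Yes, No, or Default outcome. The Python sketch below is illustrative only. The `DialogState` class, the state names, and the exact wiring between prompts are assumptions made for this example; only the prompt wording is taken from the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogState:
    """One node of a (hypothetical) dialog graph."""
    prompt: str
    # Maps a response meaning ("yes", "no", "default") to the next state name.
    transitions: Dict[str, str] = field(default_factory=dict)


# Assumed wiring, loosely modeled on the first branch of Fig. 2.
STATES: Dict[str, DialogState] = {
    "welcome": DialogState("Welcome.", {"default": "education"}),
    "education": DialogState(
        "Could you tell me more about your education?",
        {"default": "higher_studies"},
    ),
    "higher_studies": DialogState(
        "Do you plan to return to school for higher studies?",
        {"yes": "interesting", "no": "i_see", "default": "experience"},
    ),
    "interesting": DialogState("Interesting.", {"default": "experience"}),
    "i_see": DialogState("I see.", {"default": "experience"}),
    "experience": DialogState("Let's talk about your experience.", {"default": "end"}),
    "end": DialogState("Thanks, was a pleasure."),
}


def next_state(current: str, meaning: str) -> Optional[str]:
    """Pick the next state for a response meaning, falling back to 'default'."""
    transitions = STATES[current].transitions
    if not transitions:
        return None  # terminal state
    return transitions.get(meaning, transitions.get("default"))


def walk(meanings: List[str]) -> List[str]:
    """Return the visited state names for a sequence of response meanings.

    Once the supplied meanings run out, 'default' is used for every turn.
    """
    path = ["welcome"]
    remaining = iter(meanings)
    while True:
        following = next_state(path[-1], next(remaining, "default"))
        if following is None:
            return path
        path.append(following)
```

Keeping the branch logic in a lookup table means new prompts or branches can be added without touching the traversal code.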

  • [Fig. 3 (Sheet 3 of 12): Educational dialog system 302 with task administrator (dialog manager) 304, training mode 306, evaluation mode 308, responses 310, language model 312, response meaning 314, (spoken) language understanding model 316, next dialog state 318, score 320, and task models 322.]
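The components in Fig. 3 form a two-stage pipeline: a language model turns a response into a response meaning, and a language understanding model turns that meaning into the next dialog state, with the task administrator passing data between them. The sketch below illustrates only that data flow. The keyword sets, function names, and state names are hypothetical stand-ins; the patent does not specify how either model is implemented.

```python
# Hypothetical keyword sets standing in for a trained language model.
AFFIRMATIVE = {"yes", "yeah", "sure", "definitely"}
NEGATIVE = {"no", "nope", "never"}


def language_model(response_text: str) -> str:
    """Stage 1: map a (recognized) response to a response meaning."""
    cleaned = response_text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    if words & AFFIRMATIVE:
        return "affirmative"
    if words & NEGATIVE:
        return "negative"
    return "unknown"


def language_understanding_model(state: str, meaning: str) -> str:
    """Stage 2: map (current state, response meaning) to the next dialog state."""
    # Assumed transition table; unknown meanings fall through to "move_on".
    table = {
        ("quit_job", "affirmative"): "thats_unfortunate",
        ("quit_job", "negative"): "okay_great",
    }
    return table.get((state, meaning), "move_on")


def task_administrator_turn(state: str, response_text: str):
    """One turn of the dialog manager: response -> meaning -> next state."""
    meaning = language_model(response_text)
    return meaning, language_understanding_model(state, meaning)
```

For example, `task_administrator_turn("quit_job", "Yes, once.")` yields the meaning `"affirmative"` and the next state `"thats_unfortunate"`, while an unrecognized answer falls through to `"move_on"`.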

  • [Fig. 4 (Sheet 4 of 12): Educational dialog system 402 in training mode, with task administrator (dialog manager) 404, training 406, task models 408, prompt 410, response 412, language model 414, response meaning 416, language understanding model 418, and next dialog state 420; evaluation and score elements are also drawn.]

  • [Fig. 5 (Sheet 5 of 12): Educational dialog system in an educational or evaluation mode (reference numerals 502 to 516). Labeled elements: prompt, response, language model, response meaning, task administrator (dialog manager), task models, next dialog state, language understanding model, training, evaluation, and score.]

  • [Fig. 6 (Sheet 6 of 12): Multi-modal educational dialog system. Responses carry speech content, gestures, and facial expressions, captured by a microphone and ASR, a motion detector, and a video camera. Labeled elements: prompt, response, language model, response meaning, task administrator, task models, next dialog state, language understanding model, training, evaluation, and score.]

  • [Fig. 7 (Sheet 7 of 12): Example system components. Labeled elements: PSTN; telephony servers; voice browser; speech server (ASR); web server; logging database (MySQL); Speech Transcription, Annotation & Rating (STAR) portal. Labeled connections: SIP, WebRTC, RTP (audio), MRCP (v2), TCP, and HTTP. Numbered stages: (2) data logging and organization; (3) iterative refinement of models and callflows.]

  • [Fig. 8 (Sheet 8 of 12): Example system components with named software. Labeled elements: PSTN; telephony servers with video support (Asterisk and FreeSWITCH); voice browser (JVoiceXML, Zanzibar); speech server with Cairo (MRCP server), Sphinx (ASR), Kaldi (ASR) server, Festival (TTS), and Mary (TTS); web server (Apache) holding VXML, JSGF, ARPA, SRGS, and WAV resources; logging database (MySQL); Speech Transcription, Annotation & Rating (STAR) portal. Labeled connections: SIP, WebRTC, RTP (audio), MRCP (v2), and HTTP. Numbered stages: (1) crowdsourced data collection using Amazon Mechanical Turk; (2) data logging and organization; (3) iterative refinement of models and callflows.]

  • [Fig. 9 (Sheet 9 of 12): Table of four example tasks, followed by a plot of the proportion of completed calls (y-axis, 0.0 to 0.8) against the day of data collection (x-axis, 5 to 20). The "No. dialog states" column of the table is not legible in this transcript.]

    Item                         No. calls   Completion rate (%)
    Pragmatics (food offer)      131         61.83
    Pragmatics (scheduling)      166         66.87
    Job interview                192         35.42
    Pizza customer service       187         47.06
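The completion rates on this sheet are consistent with a simple ratio, completed calls divided by total calls, expressed as a percentage. The sketch below checks that arithmetic using the call counts and rates shown in Fig. 9, paired in the order the items are listed. The completed-call counts it derives (81, 111, 68, 88) are inferred here and do not appear in the source, and the function names are illustrative.

```python
# (number of calls, completion rate in %) as shown in Fig. 9,
# paired with the items in listing order (an assumption).
FIG9 = {
    "Pragmatics (food offer)": (131, 61.83),
    "Pragmatics (scheduling)": (166, 66.87),
    "Job interview": (192, 35.42),
    "Pizza customer service": (187, 47.06),
}


def implied_completed(calls: int, rate_pct: float) -> int:
    """Infer the whole number of completed calls behind a reported rate."""
    return round(calls * rate_pct / 100)


def completion_rate(completed: int, calls: int) -> float:
    """Completion rate in percent, rounded to two decimals as in the table."""
    return round(100.0 * completed / calls, 2)


def check_table() -> dict:
    """Return the inferred completed-call count for each item."""
    inferred = {}
    for item, (calls, rate) in FIG9.items():
        completed = implied_completed(calls, rate)
        # The inferred count must reproduce the reported rate.
        assert completion_rate(completed, calls) == rate, item
        inferred[item] = completed
    return inferred
```

Each reported rate is reproduced by a whole number of completed calls, which supports reading the four integers on the sheet as call counts.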

  • [Fig. 10 (Sheet 10 of 12): Flow diagram. 1002: Access initial task model. 1004: Provide task represented by task model to persons for crowdsourced training. 1006: Update task model by revising language model and spoken language understanding model. 1008: Provide updated task to student for development of speaking capabilities.]
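The four steps of Fig. 10 can be sketched as a short bootstrap loop: load a base model, gather crowdsourced responses, revise the model from them, and hand the revised task to a student. The example below is a toy illustration under stated assumptions. The `TaskModel` class, the phrase-lookup "model", and the hard-coded annotated responses are all hypothetical; the patent does not prescribe how the models are revised, and real data would come from crowd workers and annotation rather than a fixed list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskModel:
    """Toy task model: dialog states plus a phrase-to-meaning lookup."""
    dialog_states: List[str]
    phrase_to_meaning: Dict[str, str] = field(default_factory=dict)

    def interpret(self, response: str) -> str:
        return self.phrase_to_meaning.get(response.strip().lower(), "unknown")


def access_initial_task_model() -> TaskModel:
    """Step 1002: start from a base/default model."""
    return TaskModel(
        ["welcome", "question", "closing"],
        {"yes": "affirmative", "no": "negative"},
    )


def collect_crowdsourced_responses() -> List[Tuple[str, str]]:
    """Step 1004: stand-in for responses gathered from crowd workers.

    Each item is (response text, annotated meaning); the annotations are
    assumed to come from a separate transcription/annotation pass.
    """
    return [("Yeah", "affirmative"), ("not really", "negative"), ("yes", "affirmative")]


def update_task_model(model: TaskModel, annotated: List[Tuple[str, str]]) -> TaskModel:
    """Step 1006: revise the model using the collected responses."""
    updated = dict(model.phrase_to_meaning)
    for text, meaning in annotated:
        updated[text.strip().lower()] = meaning
    return TaskModel(list(model.dialog_states), updated)


def bootstrap() -> TaskModel:
    """Steps 1002 to 1006; the returned model is what step 1008 would deploy."""
    model = access_initial_task_model()
    return update_task_model(model, collect_crowdsourced_responses())
```

Before the update, the base model interprets "Yeah" as `"unknown"`; after one pass over the collected responses, the revised model maps it to `"affirmative"`.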

  • [Figs. 11A and 11B (Sheet 11 of 12): Example systems. Fig. 11A (reference numerals 1100 to 1112): processing system, computer-implemented educational dialog system, computer-readable memory, data store(s), task model, and scores. Fig. 11B (reference numerals 1120 to 1138): user PCs, network(s), server(s), processing system, computer-implemented educational dialog system, computer-readable memory, data store(s), task model, and scores.]

  • [Fig. 11C (Sheet 12 of 12): Example hardware (reference numerals 1150 to 1190): CPU, ROM, RAM, disk controller, CD-ROM, hard drive, floppy drive, interface, display interface, display, keyboard, microphone, and communication ports.]

  • COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR A CROWD SOURCE-BOOTSTRAPPED SPOKEN DIALOG SYSTEM

    CROSS-REFERENCE TO RELATED APPLICATIONS

    This application claims priority to U.S. Provisional Application No. 62/232,537, entitled "Bootstrapping Development of a Cloud-Based Multimodal Dialog System in the Educational Domain," filed Sep. 25, 2015, the entirety of which is incorporated herein by reference.

    FIELD

    The technology described in this patent document relates generally to interaction evaluation and more particularly to development of a spoken dialog system for teaching and evaluation of interactions.

    BACKGROUND

    Spoken dialog systems (SDSs) consist of multiple subsystems, such as automatic speech recognizers (ASRs), spoken language understanding (SLU) modules, dialog managers (DMs), and spoken language generators, among others, interacting synergistically and often in real time. Each of these subsystems is complex and brings with it design challenges and open research questions in its own right. Rapidly bootstrapping a complete, working dialog system from scratch is therefore a challenge of considerable magnitude. Apart from the issues involved in training reasonably accurate models for ASR and SLU that work well in the domain of operation in real time, one should verify that the individual systems also work well in sequence, such that the overall SDS performance does not suffer and the system provides an effective interaction with interlocutors who call into it.

    The ability to rapidly prototype and develop such SDSs is important for applications in the educational domain. For example, in automated conversational assessment, test developers might design several conversational items, each in a slightly different domain or subject area. One should, in such situations, be able to rapidly develop models and capabilities to ensure that the SDS can handle each of these diverse conversational applications gracefully. This is also true in the case of learning applications and so-called formative assessments: one should be able to quickly and accurately bootstrap SDSs that can respond to a wide variety of learner inputs across domains and contexts. Language learning and assessments add yet another complication in that systems need to deal gracefully with non-native speech. Despite these challenges, the increasing demand for non-native conversational learning and assessment applications makes this avenue of research an important one to pursue; however, this requires us to find a way to rapidly obtain data for model building and refinement in an iterative cycle.

    SUMMARY

    Systems and methods are provided for implementing an educational dialog system. An initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training, where providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning. The task model is updated by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student for development of speaking capabilities.

    As another example, a system for implementing an educational dialog system includes a processing system that includes one or more data processors and a computer-readable medium encoded with instructions for commanding the processing system to execute steps of a method. In the method, an initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training, where providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning. The task model is updated by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student for development of speaking capabilities.

    As a further example, a computer-readable medium is encoded with instructions for commanding a processing system to implement a method associated with an educational dialog system. In the method, an initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training, where providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning. The task model is updated by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student for development of speaking capabilities.

    BRIEF DESCRIPTION OF THE DRAWINGS

    FIG. 1 is a diagram depicting a processor-implemented educational dialog system.

    FIG. 2 is a diagram depicting example dialog states associated with a task.

    FIG. 3 is a diagram depicting example components of an educational dialog system.

    FIG. 4 is a diagram depicting active entities of an educational dialog system in a training mode.

    FIG. 5 is a diagram depicting active entities of an educational dialog system in an educational or evaluation mode.

  • 15

    30

    US 10,607,504 B1 3 4

    FIG . 6 is a diagram depicting a multi - modal educational model for understanding the response meaning associated dialog system . with a received response , and a language ( e.g. , spoken FIGS . 7-8 are diagrams depicting example components of language ) understanding model configured to select a next

    an educational dialog system . dialog state based on the identified response meaning . An FIG . 9 is a diagram depicting four example tasks and 5 initial task model can take the form of a set of dialog states

    improved performance of task models as those task models ( e.g. , as depicted in FIG . 2 ) and a base / default language are trained ( e.g. , using the crowd sourcing techniques model and language understanding model . The educational described herein . dialog system 102 functions in two modes , a training mode FIG . 10 is a flow diagram depicting a processor - imple 104 , and an active learning / evaluation mode 106 . mented method for implementing an educational dialog 10 In the training mode , a variety of persons interact with the system . task to further develop the default language and language FIGS . 11A , 11B , and 11C depict example systems for understanding models . This additional training enables the implementing the approaches described herein for imple dialog system 102 to better understand context and nuances menting a computer - implemented educational dialog sys associated with the task being developed and implemented . tem . For example , in different tasks , the phrase “ I don't know ”

    DETAILED DESCRIPTION can have different meanings . In a task where an interactor is under police interrogation , that phrase likely means that the

    FIG . 1 is a diagram depicting a processor - implemented subject has no knowledge of the topic . But , where the task educational dialog system . The educational dialog system 20 potentially includes flirtatious behavior by an interactor , the 102 is configured to interactively provide prompts associ phrase “ I don't know ” combined with a smile and a shrug ated with dialog states to an interactor , prompting the could imply coy behavior , where the subject really does have interactor to provide a response . In this way , the interactor knowledge on the topic . The base language model and can participate in a simulated conversation with the dialog language understanding model may not be able to under system 102. The dialog system 102 may provide its prompts 25 stand such nuances , but trained versions of those models , in a voice - only fashion ( e.g. , via a speaker ) , or in a multi which adjust their behavior based on successful / unsuccess modal fashion using an avatar ( e.g. , via a graphical user ful completions of tasks and indicated survey approvals or interface , a puppet , an artificial life form ) that communicates disapprovals by training - interactors will gain an understand both voice and information via other modalities , such as ing of these factors over time . facial expressions and body movements . In one embodiment , the training mode 104 for a task

    The educational dialog system 102 of FIG. 1 is configured initially with a base task model that identifies the dialog states associated with a conversation task. The dialog states indicate an anticipated path of a task that will be facilitated by the educational dialog system 102. FIG. 2 is a diagram depicting example dialog states associated with a task. The task begins at 202 where the dialog system provides a welcome at 204 and asks initial questions at 206 and 208. The question at 208 is the first time in the dialog states that the conversation branches based on response given by the interactor, as indicated by the evaluation at 210 and the corresponding branch at 212 to one of three different possible paths. In one embodiment, the evaluation of the interactor response to the prompt question at 208 is performed in two steps. First, the response to the prompt is processed (e.g., voice responses are decoded via automatic speech recognition, facial expressions are determined via video processing, body language is detected via infrared motion capture) to determine a response meaning associated with the response (e.g., based on the totality of data received in the response, such as audio, video, and motion capture) using a language model. Once the meaning is determined at 208-210, that meaning is utilized at 210-212 to select an appropriate branch using a meaning understanding (spoken language) model. While the example of FIG. 2 generally depicts a single path task conversation, more complicated sets of dialog states (e.g., tree shaped) can be implemented. For example, voice, gesture, and facial expression data can be used to measure an engagement level of an interactor (e.g., based on head pose, gaze, and facial expressions to identify smiles or indications of boredom, such as yawns, as well as content of detected speech), where positive feedback or other encouragement is given to an interactor whose engagement is determined to have waned.

    With reference back to FIG. 1, the educational dialog system 102 utilizes a task model, which includes the plurality of dialog states associated with a task, the language

    model for a task is crowd sourced (e.g., using the Amazon Mechanical Turk platform). A plurality of training-interactors interact with a task in training mode 104 via prompts and responses, where those responses (e.g., speech, facial expressions, gestures) are captured and evaluated to determine whether the language model and/or language understanding model should be adjusted. Once the task model has been refined via a number of interactions with the public, crowd-sourced participants, the improved task model can be provided for educational purposes, such as for developing speaking and interaction skills of non-native language speakers in a drilling or even an evaluation context, where a score 108 is provided.

    FIG. 3 is a diagram depicting example components of an educational dialog system. The dialog system 302 includes a task administrator (dialog manager) 304. The task administrator 304 monitors traversal of the dialog states of a task conversation. It provides corresponding prompts, whether in training 306 or evaluation 308 mode and receives corresponding responses. Responses 310 (e.g., text from automatic speech recognition performed on a response) are provided to the language model 312 which determines a response meaning 314 that is returned to the task administrator 304 (or directly to the language understanding model 316). The response meaning 314 is received by the language understanding model 316 and determines the next dialog state 318 that should be taken in the task, where that next state 318 is returned to the task administrator 304.

    In a training mode 306, the task administrator 304 is configured to adjust the language model 312 and language understanding model 316 to improve their performance in subsequent interactor iterations. The task model, which includes the dialog states, the current language model, and the current language understanding model, is accessed from a task model data store 322 before each training iteration and is returned, when any of those entities are altered, for storage. In an evaluation mode 308, the task administrator


    304 may be configured to output a score 320 indicating a quality of responses received from the evaluation-interactor.

    FIG. 4 is a diagram depicting active entities of an educational dialog system in a training mode. In training mode 406, the task administrator 404 accesses the task model for the task to be administered from the task model database 408. The task administrator 404 traverses the dialog states associated with the task model, providing prompts 410 and receiving response 412 from the training-interactor. The responses 412 are provided to the language model 414 which determines response meanings 416, where that response meaning 416 is used by the language understanding model 418 to determine the next dialog state 420 for the task.

    Following conclusion of the task (via a completion of the entirety of the task or a failure to complete the task), the task administrator 404 adjusts the language model 414 and the language understanding model 418 based on the training interactions. For example, if a task is not completed or if an interactor states via a survey that they were dissatisfied with the task (e.g., the task did not provide a next prompt that was appropriate for their current response), then the task administrator 404 determines that one of the models 414, 418 should be adjusted to better function. For example, the language model 414 may be refined to apply a different response meaning 416 to a particular response 412 from the training-interactor that resulted in an erroneous dialog state path. If the training-interactor completes the task or indicates a positive experience, then that data is utilized to strengthen the models 414, 418 (e.g., weights associated with potential paths or factors in a neural network model) based on the confirmation that those models 414, 418 behaved appropriately to the responses 412 received from the training-interactor. The adjusted task model is then returned by the task administrator 404 to the task model data store 408.

    FIG. 5 is a diagram depicting active entities of an educational dialog system in an educational or evaluation mode. In the evaluation mode 506, the task administrator 504 accesses a task model from the task model data store 508 and uses that data to set up the language model 510 and language understanding model 512. The task administrator 504 traverses the dialog states of the task as informed by responses 516 to prompts 514 with the aid of the language model 510 and the language understanding model 512. The task administrator 504 tracks the appropriateness of responses 516 received from the evaluation-interactor as well as other metrics associated with those responses (e.g., pronunciation, grammar) to determine a score 518 indicative of the quality of the interactor's communication with the educational dialog system 502.

    FIG. 6 is a diagram depicting a multi-modal educational dialog system. FIG. 6 indicates that speech response data is captured via automatic speech recognition, as well as gesture data via motion detection (e.g., using an XBOX Kinect motion capture device) and facial expression data via video capture and analysis. The language model uses all or a portion of that data to extract meaning associated with responses, where extracted text associated with speech may be assigned different meanings depending on the context of gesture and facial expression data that is identified.

    FIGS. 7-8 are diagrams depicting example components of an educational dialog system. Certain embodiments use a HALEF dialog system to develop conversational applications within the crowd sourcing framework. These systems can include one or more of the following components:

    Telephony servers (e.g., Asterisk and FreeSWITCH), which are compatible with Session Initiation Protocol (SIP), Public Switched Telephone Network (PSTN), and web Real-Time Communications (WebRTC) standards and include support for voice and video;

    A voice browser (e.g., JVoiceXML), which is compatible with VoiceXML 2.1 and can process SIP traffic and which incorporates support for multiple grammar standards, such as Java Speech Grammar Format (JSGF), Advanced Research Projects Agency (ARPA), and Weighted Finite State Transducer (WFST);

    A Media Resource Control Protocol (MRCP) speech server, which allows the voice browser to initiate SIP or Real-Time Transport Protocol (RTP) connections from/to the telephony server and incorporates two speech recognizers and synthesizers;

    An Apache Tomcat-based web server which can host dynamic VoiceXML pages, web services, and media libraries containing grammars and audio files;

    OpenVXML, a VoiceXML-based voice application authoring suite: generates dynamic web applications that can be housed on the web server;

    A MySQL database server for storing call logs;

    A speech transcription, annotation, and rating portal that allows one to listen to and transcribe full-call recordings, rate them on a variety of dimensions such as caller experience and latency, and perform various semantic annotation tasks to train ASR and SLU modules.

    FIG. 9 is a diagram depicting four example tasks and improved performance of task models as those task models are trained (e.g., using the crowd sourcing techniques described herein). In the example of FIG. 9, four tasks are described, having between 1 and 8 dialog states. The final two examples are longer, having 8 and 7 dialog states, respectively. Task models were trained over the number of iterations shown in the middle column, where completion rate is illustrated in a final column as an indicator of quality of the task models. The graph at the bottom of FIG. 9 shows an improvement of completion rate associated with the job interview and pizza customer service tasks over time, as the associated language and language understanding modules were trained using crowd source participation.

    FIG. 10 is a flow diagram depicting a processor-implemented method for implementing an educational dialog system. At 1002, an initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training at 1004, where providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning. The task model is updated at 1006 by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student at 1008 for development of speaking capabilities.

    FIGS. 11A, 11B, and 11C depict example systems for implementing the approaches described herein for implementing a computer-implemented educational dialog system. For example, FIG. 11A depicts an exemplary system 1100 that includes a standalone computer architecture where a processing system 1102 (e.g., one or more computer processors located in a given computer or in multiple computers that may be separate and distinct from one


    another) includes a computer-implemented educational dialog system 1104 being executed on the processing system 1102. The processing system 1102 has access to a computer-readable memory 1107 in addition to one or more data stores 1108. The one or more data stores 1108 may include task models 1110 as well as scores 1112. The processing system 1102 may be a distributed parallel computing environment, which may be used to handle very large-scale data sets.

    FIG. 11B depicts a system 1120 that includes a client-server architecture. One or more user PCs 1122 access one or more servers 1124 running a computer-implemented educational dialog system 1137 on a processing system 1127 via one or more networks 1128. The one or more servers 1124 may access a computer-readable memory 1130 as well as one or more data stores 1132. The one or more data stores 1132 may include task models 1134 as well as scores 1138.

    FIG. 11C shows a block diagram of exemplary hardware for a standalone computer architecture 1150, such as the architecture depicted in FIG. 11A that may be used to include and/or implement the program instructions of system embodiments of the present disclosure. A bus 1152 may serve as the information highway interconnecting the other illustrated components of the hardware. A processing system 1154 labeled CPU (central processing unit) (e.g., one or more computer processors at a given computer or at multiple computers), may perform calculations and logic operations required to execute a program. A non-transitory processor-readable storage medium, such as read only memory (ROM) 1158 and random access memory (RAM) 1159, may be in communication with the processing system 1154 and may include one or more programming instructions for performing the method of implementing a computer-implemented educational dialog system. Optionally, program instructions may be stored on a non-transitory computer-readable storage medium such as a magnetic disk, optical disk, recordable memory device, flash memory, or other physical storage medium.

    In FIGS. 11A, 11B, and 11C, computer readable memories 1108, 1130, 1158, 1159 or data stores 1108, 1132, 1183, 1184, 1188 may include one or more data structures for storing and associating various data used in the example systems for implementing a computer-implemented educational dialog system. For example, a data structure stored in any of the aforementioned locations may be used to store data from XML files, initial parameters, and/or data for other variables described herein. A disk controller 1190 interfaces one or more optional disk drives to the system bus 1152. These disk drives may be external or internal floppy disk drives such as 1183, external or internal CD-ROM, CD-R, CD-RW or DVD drives such as 1184, or external or internal hard drives 1185. As indicated previously, these various disk drives and disk controllers are optional devices.

    Each of the element managers, real-time data buffer, conveyors, file input processor, database index shared access memory loader, reference data buffer and data managers may include a software application stored in one or more of the disk drives connected to the disk controller 1190, the ROM 1158 and/or the RAM 1159. The processor 1154 may access one or more components as required.

    A display interface 1187 may permit information from the bus 1152 to be displayed on a display 1180 in audio, graphic, or alphanumeric format. Communication with external devices may optionally occur using various communication ports 1182.

    In addition to these computer-type components, the hardware may also include data input devices, such as a keyboard 1179, or other input device 1181, such as a microphone, remote control, pointer, mouse and/or joystick.

    Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein and may be provided in any suitable language such as C, C++, JAVA, for example, or any other suitable programming language. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.

    The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.

    The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.

    While the disclosure has been described in detail and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the embodiments. Thus, it is intended that the present disclosure cover the modifications and variations of this disclosure provided they come within the scope of the appended claims and their equivalents. For example, in one embodiment, in addition to or in the alternative to adjusting language models and language understanding models, systems and methods can be configured to adjust acoustic models (models that relate how probable a given sequence of words correspond to the actual speech signal received), dialog management models (models that inform what the optimal next action of the dialog system should be based on the given state), and engagement prediction models (models that inform the dialog manager how to react given a current engagement state of the user).

    The invention claimed is:

    1. A processor-implemented method for implementing an educational dialog system, comprising:
        accessing an initial task model that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning;


        wherein the language model identifies the response meaning based on content of speech of the response received from a person, wherein the content of the speech is determined using automatic speech recognition;
        wherein the language model further identifies the response meaning based on gestures associated with the response received from the person, wherein the gestures are captured via a video capture device or an infrared capture device;
        providing the task to a plurality of persons for training, wherein providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning;
        providing a survey to each of the plurality of persons after interaction with the provided task, wherein the survey requests evaluation data regarding the quality of the interaction;
        updating the task model by revising the language model and the language understanding model based on responses received to prompts of the provided task and the evaluation data from the surveys;
        providing an updated task to a student for development of speaking capabilities; and
        scoring the student's speaking capabilities based on the student's interaction with the updated task.

    2. The method of claim 1, wherein the updated task is provided to the student by an educational organization;
        wherein said providing the task to a plurality of persons for training includes providing the task to a pool of public persons unaffiliated with the educational organization.

    3. The method of claim 1, further comprising:
        providing revised tasks to further persons for additional training prior to providing the updated task to the student.

    4. The method of claim 3, further comprising:
        tracking whether a first person participating in a round of training completes all of the dialog states associated with the task as a first metric;
        tracking whether a second person participating in a round of training completes all of the dialog states associated with the updated task as a second metric;
        comparing the first metric and the second metric to determine whether the task model is improving based on additional training.

    5. The method of claim 4, wherein updates to the task model are retained when the task model is determined to have improved, wherein updates are reverted when the task model is determined not to have improved.

    6. The method of claim 1, wherein the updated task is provided to a student for development of non-native speaking capabilities.

    7. A processor-implemented system for implementing an educational dialog system, comprising:
        a processing system comprising one or more data processors;
        a non-transitory computer-readable medium encoded with instructions for commanding the processing system to execute steps of a method that include:
        accessing an initial task model that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning;
        wherein the language model identifies the response meaning based on content of speech of the response received from a person, wherein the content of the speech is determined using automatic speech recognition;
        wherein the language model further identifies the response meaning based on gestures associated with the response received from the person, wherein the gestures are captured via a video capture device or an infrared capture device;
        providing the task to a plurality of persons for training, wherein providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning;
        providing a survey to each of the plurality of persons after interaction with the provided task, wherein the survey requests evaluation data regarding the quality of the interaction;
        updating the task model by revising the language model and the language understanding model based on responses received to prompts of the provided task and the evaluation data from the surveys;
        providing an updated task to a student for development of speaking capability; and
        scoring the student's speaking capability based on the student's interaction with the updated task.

    8. The system of claim 7, wherein the updated task is provided to the student by an educational organization;
        wherein said providing the task to a plurality of persons for training includes providing the task to a pool of public persons unaffiliated with the educational organization.

    9. The system of claim 7, wherein the steps further include:
        providing revised tasks to further persons for additional training prior to providing the updated task to the student.

    10. The system of claim 9, wherein the steps further include:
        tracking whether a first person participating in a round of training completes all of the dialog states associated with the task as a first metric;
        tracking whether a second person participating in a round of training completes all of the dialog states associated with the updated task as a second metric;
        comparing the first metric and the second metric to determine whether the task model is improving based on additional training.

    11. The system of claim 10, wherein updates to the task model are retained when the task model is determined to have improved, wherein updates are reverted when the task model is determined not to have improved.

    12. The system of claim 10, wherein the updated task is provided to a student for development of non-native speaking capabilities.

    13. A non-transitory computer-readable medium encoded with instructions for commanding one or more data processors to execute steps of a method for implementing an educational dialog system, the method comprising:
        accessing an initial task model that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated


        with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning;
        wherein the language model identifies the response meaning based on content of speech of the response received from a person, wherein the content of the speech is determined using automatic speech recognition;
        wherein the language model further identifies the response meaning based on gestures associated with the response received from the person, wherein the gestures are captured via a video capture device or an infrared capture device;
        providing the task to a plurality of persons for training, wherein providing the task includes providing a prompt for a particular one of the dialog states, receiving a response to the prompt, using the language model to determine the response meaning based on the received response, and selecting a particular next dialog state based on the determined response meaning;
        providing a survey to each of the plurality of persons after interaction with the provided task, wherein the survey requests evaluation data regarding the quality of the interaction;
        updating the task model by revising the language model and the language understanding model based on responses received to prompts of the provided task and the evaluation data from the surveys;
        providing an updated task to a student for development of speaking capabilities; and
        scoring the student's speaking capabilities based on the student's interaction with the updated task.

    14. The method of claim 1, wherein the gestures captured include facial expressions.

    15. The method of claim 1, wherein the language model further identifies the response meaning based on a detected engagement level of the person.

    16. The system of claim 7, wherein the gestures captured include facial expressions.

    17. The system of claim 7, wherein the language model further identifies the response meaning based on a detected engagement level of the person.
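
    By way of illustration only, the dialog-state traversal and the retain-or-revert comparison described in the disclosure (a response meaning selects the next dialog state; completion of all dialog states is tracked before and after an update) might be sketched as follows in Python. The sketch rests on several assumptions that are not part of the disclosure: a keyword lookup stands in for the language model, a lookup table stands in for the language understanding model, typed text stands in for recognized speech, and the names TaskModel, completion_rate, retain_or_revert, and END are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

END = "end"  # hypothetical name for the terminal dialog state


@dataclass
class TaskModel:
    # Stand-in for the language model: keyword -> response meaning.
    meanings: dict[str, str]
    # Stand-in for the language understanding model:
    # (current dialog state, response meaning) -> next dialog state.
    transitions: dict[tuple[str, str], str]
    start: str = "welcome"

    def response_meaning(self, response: str) -> str | None:
        """Return the meaning of the first known keyword, if any."""
        for word in response.lower().split():
            if word in self.meanings:
                return self.meanings[word]
        return None

    def run(self, responses: list[str]) -> bool:
        """Traverse the dialog states; True if the terminal state is reached."""
        state = self.start
        for response in responses:
            meaning = self.response_meaning(response)
            next_state = self.transitions.get((state, meaning))
            if next_state is None:
                return False  # no appropriate next state: task not completed
            state = next_state
            if state == END:
                return True
        return False


def completion_rate(model: TaskModel, sessions: list[list[str]]) -> float:
    """Fraction of sessions completing all dialog states (sessions non-empty)."""
    return sum(model.run(s) for s in sessions) / len(sessions)


def retain_or_revert(old, new, first_round, second_round) -> TaskModel:
    """Retain the update only if the second metric exceeds the first."""
    first_metric = completion_rate(old, first_round)
    second_metric = completion_rate(new, second_round)
    return new if second_metric > first_metric else old


# Hypothetical two-state pizza-ordering task.
base = TaskModel(
    meanings={"yes": "affirm", "large": "size_large"},
    transitions={("welcome", "affirm"): "size", ("size", "size_large"): END},
)
# Updated model: the keyword table has learned two more phrasings.
updated = TaskModel(
    meanings={**base.meanings, "sure": "affirm", "big": "size_large"},
    transitions=base.transitions,
)
round_one = [["yes", "large"], ["sure", "big"]]               # run on base
round_two = [["sure", "a big one"], ["yes please", "large"]]  # run on updated
chosen = retain_or_revert(base, updated, round_one, round_two)
```

    In this sketch the update is retained because the second round completes more often than the first; a trained statistical model would replace both lookup tables in any practical embodiment.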