Proposal of a Hierarchical Architecture for Multimodal Interactive Systems

Masahiro Araki*1, Tsuneo Nitta*2, Kouichi Katsurada*2, Takuya Nishimoto*3, Tetsuo Amakasu*4, Shinnichi Kawamoto*5
*1 Kyoto Institute of Technology, *2 Toyohashi University of Technology, *3 The University of Tokyo, *4 NTT Cyber Space Labs., *5 ATR

2007/11/16 W3C MMI ws
Example of use case

use case                               input modality          output modality
d) interaction with robot              speech, image, sensor   speech, display
e) negotiation with interactive agent  speech                  speech, face image
f) kiosk terminal                      touch, speech           speech, display

Example: interaction with robot
User:  What is Kasuri?
Robot: Nishijin Kasuri is a traditional textile in Kyoto.
Requirements
1. general
2. input modality
3. output modality
4. architecture, integration and synchronization point
5. runtimes and deployments
6. dialogue management
7. handling of forms and fields
8. connection with outside application
9. user model and environment information
10. from the viewpoint of developer
[Figure: the viewpoints above are marked either as "in common with W3C" or as "extension".]
[Figure: six-layer architecture.
layer 6: application (data model, application logic)
layer 5: task control
layer 4: interaction control
layer 3: modality integration
layer 2: modality component (control / interpret, control / understanding)
layer 1: I/O device (ASR, pen / touch, TTS / audio output, graphical output)
Events and (interpreted / integrated) results flow upward from layer 1 to layer 6; commands flow downward from layer 6 to layer 1. A shared user model / device model is accessed via set/get.]
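The upward event flow and downward command flow between adjacent layers can be sketched as follows. This is a minimal illustration under assumed names (`Layer`, `on_event`, `on_command`, `build_stack` are not part of the proposal); real layers would interpret, integrate, or translate messages rather than merely forward them.

```python
# Sketch of the layered message flow: events/results travel up the stack,
# commands travel down. Each layer records itself in a "via" trail so the
# routing is visible. All names here are illustrative assumptions.

class Layer:
    def __init__(self, name):
        self.name = name
        self.upper = None   # layer n+1
        self.lower = None   # layer n-1

    def send_event_up(self, event):
        """Forward an (interpreted) event to the layer above, if any."""
        if self.upper is not None:
            return self.upper.on_event(event)
        return event

    def send_command_down(self, command):
        """Forward a command to the layer below, if any."""
        if self.lower is not None:
            return self.lower.on_command(command)
        return command

    def on_event(self, event):
        # A real layer would interpret/integrate here (e.g. layer 3
        # performs modality integration); we only annotate and forward.
        return self.send_event_up(
            {**event, "via": event.get("via", []) + [self.name]})

    def on_command(self, command):
        return self.send_command_down(
            {**command, "via": command.get("via", []) + [self.name]})


def build_stack(names):
    """Wire layers 1..n so events flow up and commands flow down."""
    layers = [Layer(n) for n in names]
    for lower, upper in zip(layers, layers[1:]):
        lower.upper = upper
        upper.lower = lower
    return layers


layers = build_stack(["I/O device", "modality component", "modality integration",
                      "interaction control", "task control", "application"])
result = layers[0].on_event({"type": "speech", "via": []})
command = layers[-1].on_command({"act": "speak", "via": []})
```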
Investigation procedure: Phase 2
1. Detailed use case analysis
2. Requirements for each layer
3. Publish trial standard
4. Release reference implementation
Requirements of each layer
• Clarify input/output with adjacent layers
• Define events
• Clarify inner-layer processing
• Investigate markup language
1st : Input/Output module
• Input module
  – Input : (from outside) signal
  – Output : (to 2nd) recognition result
  – Example : ASR, touch input, face detection, ...
• Output module
  – Input : (from 2nd) output contents
  – Output : (to outside) signal
  – Example : TTS, face image synthesizer, Web browser, ...
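The 1st-layer module contracts above can be sketched as two small interfaces. The class and method names (`InputModule`, `OutputModule`, `recognize`, `render`) and the touch-signal encoding are illustrative assumptions, not part of the proposal.

```python
# Sketch of the 1st-layer contracts: an input module turns an outside
# signal into a recognition result for the 2nd layer; an output module
# turns 2nd-layer contents into an outside signal.

class InputModule:
    """Input: (from outside) signal. Output: (to 2nd) recognition result."""
    def recognize(self, signal: bytes) -> dict:
        raise NotImplementedError

class OutputModule:
    """Input: (from 2nd) output contents. Output: (to outside) signal."""
    def render(self, contents: dict) -> bytes:
        raise NotImplementedError

class DummyTouchInput(InputModule):
    """Toy touch-input module; assumes the device signal is b"x,y"."""
    def recognize(self, signal: bytes) -> dict:
        x, y = signal.decode().split(",")
        return {"modality": "touch", "x": int(x), "y": int(y)}

touch_result = DummyTouchInput().recognize(b"10,20")
```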
2nd : Modality component
• Function
  – wrapper that absorbs the differences of the 1st layer
    ex) Speech Recognition component (grammar : SRGS, semantic analysis : SISR, result : EMMA)
  – provide multimodal synchronization
    ex) TTS with lip synchronization
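The wrapper idea can be sketched as a 2nd-layer speech component that hides the concrete 1st-layer recognizer behind a uniform interface and reports its result in an EMMA-like structure. The class names, the `recognize()` signature, and the dict-based EMMA stand-in are illustrative assumptions; a real component would emit EMMA XML per the W3C specification.

```python
# Sketch of a 2nd-layer Speech Recognition component wrapping any
# 1st-layer ASR engine and annotating the result EMMA-style.

class SpeechRecognitionComponent:
    def __init__(self, engine):
        self.engine = engine  # any 1st-layer ASR with a recognize() method

    def interpret(self, audio):
        text, confidence = self.engine.recognize(audio)
        # Simplified EMMA-like annotated result (a dict, not real EMMA XML)
        return {
            "emma:interpretation": {
                "emma:medium": "acoustic",
                "emma:mode": "voice",
                "emma:confidence": confidence,
                "emma:tokens": text,
            }
        }

class FakeASR:
    """Stand-in 1st-layer engine with a canned result."""
    def recognize(self, audio):
        return ("what is kasuri", 0.92)

component = SpeechRecognitionComponent(FakeASR())
asr_result = component.interpret(b"...")
```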
[Figure: example of multimodal synchronization. A 2nd-layer LS-TTS modality component wraps the 1st-layer TTS and FSM input/output modules.]
3rd : Modality Fusion
• Integration of input information
  – Interpretation of sequential / simultaneous input
  – Output the integrated result in EMMA format
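One simple way to realize the sequential/simultaneous distinction is a time-window grouping: inputs arriving within a short window are fused into one EMMA-like group, otherwise they are interpreted in sequence. The window length, the `fuse` function, and the dict-based EMMA stand-in are illustrative assumptions, not part of the proposal.

```python
# Sketch of 3rd-layer fusion: group inputs whose timestamps fall within
# a window as "simultaneous" (emitted as an EMMA-like group), and treat
# the rest as sequential (emitted as bare interpretations).

SIMULTANEOUS_WINDOW = 1.0  # seconds; illustrative threshold

def fuse(inputs):
    """inputs: list of (timestamp, interpretation), sorted by timestamp."""
    groups, current = [], []
    for ts, interp in inputs:
        if current and ts - current[-1][0] <= SIMULTANEOUS_WINDOW:
            current.append((ts, interp))        # simultaneous with previous
        else:
            if current:
                groups.append(current)
            current = [(ts, interp)]            # start a new group
    if current:
        groups.append(current)
    results = []
    for g in groups:
        if len(g) == 1:
            results.append(g[0][1])             # sequential input
        else:
            results.append({"emma:group": [i for _, i in g]})
    return results

speech = {"emma:mode": "voice", "emma:tokens": "put it there"}
touch = {"emma:mode": "touch", "x": 120, "y": 80}
fused = fuse([(0.0, speech), (0.4, touch),
              (5.0, {"emma:mode": "voice", "emma:tokens": "thanks"})])
```

Here the speech and touch at 0.0 s and 0.4 s are fused into one group (e.g. "put it there" plus a pointing gesture), while the utterance at 5.0 s stands alone.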