Transcript
Page 1

stanford hci group / cs376

http://cs376.stanford.edu
Scott Klemmer · 16 November 2006

Speech & Multimodal

Page 2

Some HCI definitions
Multimodal generally refers to an interface that can accept input from two or more combined modes.
Multimedia generally refers to an interface that produces output in two or more modes.
The vast majority of multimodal systems have used speech + pointing (pen or mouse) input, with graphical (and sometimes voice) output.
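To make "input from two or more combined modes" concrete, here is a minimal sketch in the spirit of "put that there" (the class names, the two-second window, and the list of deictic words are assumptions for illustration, not from the slides): a spoken command is bound to the pen taps that arrived around the same time.

```python
from dataclasses import dataclass

@dataclass
class SpeechInput:
    text: str          # recognized utterance, e.g. "move this there"
    timestamp: float   # seconds

@dataclass
class PenInput:
    x: float
    y: float
    timestamp: float

def fuse(speech: SpeechInput, taps: list[PenInput], window: float = 2.0):
    """Bind deictic words ("this", "that", "here", "there") to pen taps
    that occurred within `window` seconds of the utterance, in order."""
    deictics = [w for w in speech.text.split()
                if w in ("this", "that", "here", "there")]
    nearby = sorted((t for t in taps
                     if abs(t.timestamp - speech.timestamp) <= window),
                    key=lambda t: t.timestamp)
    if len(deictics) > len(nearby):
        return None  # not enough pointing events to resolve the references
    return {word: (tap.x, tap.y) for word, tap in zip(deictics, nearby)}

# Speech alone is ambiguous; the pen taps supply the referents.
print(fuse(SpeechInput("move this there", timestamp=10.0),
           [PenInput(120, 80, 9.8), PenInput(400, 300, 10.6)]))
# -> {'this': (120, 80), 'there': (400, 300)}
```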

Page 3

Canonical App: Maps
Why are maps so well-suited? A visual artifact for computation (Hutchins).

Page 4

What is an interface?
Is it an interface if there's no method for a user to tell if they've done something? What might an example be?
Is it an interface if there's no method for explicit user input? Example: health monitoring apps.

Page 5

Sensor Fusion
Multimodal = multiple human channels; sensor fusion = multiple sensor channels.
Example app: tracking people (one human channel) might use RFID + vision + keyboard activity + …
I disagree with the Oviatt paper: speech + lips is sensor fusion, not multimodality.
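A minimal sketch of the sensor-fusion side (the sensor names, confidence values, and per-channel weights are invented for illustration): several sensor channels each report where a single tracked person might be, and the fused estimate is simply the location with the highest weighted score.

```python
from collections import defaultdict

def fuse_location(reports: dict[str, list[tuple[str, float]]],
                  weights: dict[str, float]) -> str:
    """Weight each sensor's (location, confidence) reports by how much we
    trust that channel, then pick the location with the highest total."""
    scores: dict[str, float] = defaultdict(float)
    for sensor, detections in reports.items():
        w = weights.get(sensor, 1.0)
        for location, confidence in detections:
            scores[location] += w * confidence
    return max(scores, key=scores.get)

reports = {
    "rfid":     [("office-210", 0.9)],
    "vision":   [("office-210", 0.6), ("hallway", 0.3)],
    "keyboard": [("office-214", 0.8)],   # someone typing at that desk's machine
}
weights = {"rfid": 1.0, "vision": 0.7, "keyboard": 0.5}
print(fuse_location(reports, weights))   # -> office-210
```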

Page 6

What constitutes a modality?
To some extent, it's a matter of semantics.
Is a pen a different modality than a mouse?
Are two mice different modalities if one is controlling a GUI and the other controls a tablet-like UI?
Is a captured modality the same as an input modality? How does the audio notebook fit into this?

Page 7

Input modalities
mouse
pen: recognized or unrecognized
speech
non-speech audio
tangible object manipulation
gaze, posture, body-tracking
Each of these experiences has different implementing technologies; e.g., gaze tracking could be laser-based or vision-based.

Page 8

Output modalities
Visual displays: raster graphics, oscilloscope, paper printer, …
Haptics: force feedback
Audio
Smell
Taste

Page 9

Dual Purpose Speech

Page 10

Why multimodal?
Hands busy / eyes busy
Mutual disambiguation
Faster input
“More natural”
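Mutual disambiguation is worth a small worked example. In this hypothetical sketch (the commands, scores, and compatibility rule are all assumed), each recognizer returns an n-best list, and scoring only the jointly compatible pairs lets the pen gesture overrule the speech recognizer's top choice.

```python
# Hypothetical n-best lists from a speech recognizer and a pen-gesture recognizer.
speech_nbest  = [("create line", 0.45), ("create lake", 0.40), ("great lake", 0.15)]
gesture_nbest = [("area-sketch", 0.55), ("stroke", 0.45)]

# Assumed domain rule: which spoken commands make sense with which gestures.
allowed = {("create line", "stroke"), ("create lake", "area-sketch")}

joint = [(s, g, round(ps * pg, 4))
         for s, ps in speech_nbest
         for g, pg in gesture_nbest
         if (s, g) in allowed]
best = max(joint, key=lambda t: t[2])

print(best)
# -> ('create lake', 'area-sketch', 0.22)
# Speech alone would have picked "create line"; the area gesture corrects it.
```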

Page 11

On Anthropomorphism

The multimodal community grew out of the AI and speech communities

Should human communication with computers be as similar as possible to human-human communication?

Page 12

Multimodal Software Architectures

OAA, AAA, OOPS
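These systems differ in their details, and the sketch below is not any of their actual APIs. It is a rough illustration, under assumed names, of the pattern they share: modality recognizers publish interpretation events to a facilitator- or blackboard-style hub, and a fusion agent subscribes and combines them into application commands.

```python
from typing import Callable

class Hub:
    """Tiny publish/subscribe hub standing in for a facilitator/blackboard."""
    def __init__(self):
        self.subscribers: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.subscribers.get(topic, []):
            handler(event)

class FusionAgent:
    """Waits for one speech event and one pen event, then emits a command."""
    def __init__(self, hub: Hub):
        self.pending: dict[str, dict] = {}
        hub.subscribe("speech", lambda e: self.collect("speech", e))
        hub.subscribe("pen", lambda e: self.collect("pen", e))

    def collect(self, modality: str, event: dict) -> None:
        self.pending[modality] = event
        if {"speech", "pen"} <= self.pending.keys():
            print("command:", self.pending["speech"]["text"],
                  "at", self.pending["pen"]["xy"])
            self.pending.clear()

hub = Hub()
FusionAgent(hub)
hub.publish("speech", {"text": "zoom in"})   # from a speech recognizer agent
hub.publish("pen", {"xy": (240, 130)})       # from a pen/gesture agent
```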

Page 13

Next Time… Vision-Based Interaction

Computer Vision for Interactive Computer Graphics, William T. Freeman, Yasunari Miyake, Ken-ichi Tanaka, David B. Anderson, Paul A. Beardsley, Chris N. Dodge, Michal Roth, Craig D. Weissman, William S. Yerazunis, Hiroshi Kage, Kazuo Kyuma

A Design Tool for Camera-based Interaction, Jerry Alan Fails and Dan R. Olsen

Page 14

CS547 Tomorrow

Ben Shneiderman, University of Maryland – Science 2.0: The Design Science of Collaboration
