Comparisons of Computer Input Modalities and Methods, Yoshiharu Sato, http://yo-sato.com/

Comparisons of input modalities and methods

May 17, 2015


Technology

Yoshiharu Sato

A survey of input modalities and methods.
Transcript
Page 1: Comparisons of input modalities and methods

Comparisons of Computer Input

Modalities and Methods

Yoshiharu Sato, http://yo-sato.com/

Page 2: Comparisons of input modalities and methods

Input methods

• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Page 3: Comparisons of input modalities and methods

Input methods

• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Page 4: Comparisons of input modalities and methods

Mechanical Movement

• Advantages
  • Easy to control.
• Disadvantages
  • Speed is limited by mechanical movement.
• Hand/Finger methods
• Body Gestures
• Muscle sensing

Page 5: Comparisons of input modalities and methods

Hand/Finger methods

• Advantages
• Disadvantages
  • The input device must be within reach of the user, so it does not suit remote-control or mobile scenarios.
  • The hands are busy typing characters or holding a device, and the eyes are busy looking at the keyboard or touch panel, so it does not suit mobile scenarios.
• Keyboard
• Handwriting

Page 6: Comparisons of input modalities and methods

Keyboard

• Advantages
  • Keys map directly to characters, so recognition accuracy is less of a problem than with handwriting or voice recognition (note: East Asian ideogram input does involve recognition). This is one reason why it is hard to beat the keyboard as the mainstream input means.
  • Keys can represent any function, and the input language is rich.
  • The device is cheap.
  • It requires less computation than the other methods.
• Disadvantages
  • Keyboard input is not a natural operation.
  • Inputting text relies on knowledge of key positions (“memory in the world”, Norman 1988).
• Hardware keyboard
• Software keyboard

Page 7: Comparisons of input modalities and methods

Hardware keyboard

• Advantages
  • Keys are fixed, so users can rely on knowledge in the world [Norman, 1988], and the keyboard is easy to operate.
• Disadvantages
  • Keys are fixed and limited in number, so the functions bound to a key are sometimes overloaded, and the modes introduced to handle this confuse users.

Page 8: Comparisons of input modalities and methods

Software keyboard

• Advantages
  • Keys are configurable by software, so there is no need to overload keys.
  • The touch language is richer than key presses.
• Disadvantages
  • Touch is less accurate than a mouse and requires more correction effort than a hardware keyboard.
  • The keyboard occludes screen real estate and distracts the user’s thinking.
  • Keys are part of the touch monitor, so the keyboard is hard to use under extreme lighting conditions.
  • Because keys are configurable by software, users need to look for key positions, and a new key layout requires re-learning.

Page 9: Comparisons of input modalities and methods

Handwriting

• State of the art
  • Online handwriting technology was established in the 1990s.
  • A typical recognition engine normalizes the input data (e.g., base-line, slant/slope), then performs segmentation, feature extraction, and classification (dynamic programming, neural networks, or similar, plus a language model).
  • In the 1990s, commercial engines had about a 10% character error rate for isolated characters in boxes and about a 20% character error rate for run-on mode. They were close to practical accuracy.
  • Handwriting is already integrated into most retail devices such as PCs and smartphones.
  • Offline handwriting technology has not yet reached practical use.
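
The normalization and dynamic-programming classification steps named above can be illustrated with a small sketch. This is not the engine described in the slides: it assumes single-stroke input, uses dynamic time warping (DTW) as the dynamic-programming matcher, and skips segmentation, feature extraction, and the language model. The names `normalize`, `dtw_distance`, `classify`, and `TEMPLATES` are hypothetical.

```python
from math import hypot, inf

def normalize(stroke):
    """Center a non-empty stroke on its centroid and scale it to a unit box.

    A simplified stand-in for base-line/slant normalization.
    """
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in stroke]

def dtw_distance(a, b):
    """Dynamic time warping distance between two point sequences."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(stroke, templates):
    """Return the label of the template closest to the stroke under DTW."""
    query = normalize(stroke)
    return min(templates, key=lambda label: dtw_distance(query, normalize(templates[label])))

# Hypothetical one-stroke templates (pen positions in drawing order).
TEMPLATES = {
    "I": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "-": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "L": [(0, 3), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0)],
}

print(classify([(5, 10), (5, 12), (5, 14), (5, 16), (5, 18)], TEMPLATES))
```

Because the matcher warps the time axis, the query and the template may contain different numbers of sampled points, which is one reason dynamic programming suits online (pen-trajectory) input.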

Page 10: Comparisons of input modalities and methods

Handwriting (Cont’d)

• Advantages
  • Ink is the character, so it is direct and intuitive. Humans have been familiar with it for a long time.
  • The pen can play the role of a mouse.
  • It is silent, so it can protect privacy.
  • It is not subject to environmental noise.
• Disadvantages
  • Ink needs conversion to character codes, recognition cannot be 100% accurate, and recognition results require corrections.
  • Hands-busy.
  • The finger movement needed to write a character is complex and time-consuming.
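
The character error rate quoted for handwriting engines, and the correction burden listed among the disadvantages, are usually quantified by comparing the recognized text with a reference. The sketch below assumes the common definition (edit distance divided by reference length); the slides do not state which definition the quoted figures use, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def character_error_rate(ref, hyp):
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)

# One wrong character in a ten-character reference gives a 10% error rate.
print(character_error_rate("0123456789", "0123456780"))
```

Under this definition, a 10% error rate means roughly one correction for every ten characters written.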

Page 11: Comparisons of input modalities and methods

Body Gestures

• State of the art
  • Body gestures are recognized by computer vision or motion sensors.
    • Microsoft Kinect
    • Leap Motion
    • NTT DoCoMo “UbiButton”
    • “Ring”
    • Shiseido
  • There is research on using tongue gestures sensed by magnetic sensors.

Page 12: Comparisons of input modalities and methods

Body Gestures (Cont’d)

• Advantages
  • The gesture language can be richer than mouse/keys and touch by virtue of 3D.
  • It does not occlude screen real estate.
• Disadvantages
  • Computer vision is subject to occlusion and lighting conditions.
  • 3D freehand pointing precision may be lower than pointing on a 2D surface.
  • Freehand gestures involve more muscles than keyboard/mouse interaction, and large or frequent arm movements cause fatigue over time.
  • It is socially awkward: making gestures at a machine in a crowded environment looks strange.

Page 13: Comparisons of input modalities and methods

Muscle sensing

• State of the art
  • EMG (electromyography) with a forearm muscle-sensing band can classify finger movements.
  • There is no commercial system yet for commanding computers.
  • There are several EMG vendors, and a low-end device costs less than $1,000.

Page 14: Comparisons of input modalities and methods

Muscle sensing (Cont’d)

• Advantages
  • Muscles can be sensed unobtrusively, without any artifact within reach of the user.
  • It allows hands-free operation.
  • It does not require observable interaction that can be socially awkward, and it protects privacy.
  • No interference with the environment, unlike voice recognition or computer vision.
  • Fatigue-free.
• Disadvantages
  • It is limited by the speed of mechanical movement.
  • An input language must be designed.

Page 15: Comparisons of input modalities and methods

Input methods

• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Page 16: Comparisons of input modalities and methods

Audio

• Advantages
  • Speaking is direct, intuitive, and natural. Humans have been familiar with it for a long time and do not have to learn to speak, so consumers do not perceive a speech interface as an input task.
  • It is hands-free and eyes-free, and suits mobile scenarios.
  • Speaking is 5 times faster than writing or typing.
• Disadvantages
  • Voice needs conversion to character codes, which requires recognition and corrections.
  • There is the problem of segmenting conversation, commands, and text to be recognized.
• Voice recognition
• Silent speech recognition
• Lip reading

Page 17: Comparisons of input modalities and methods

Voice recognition

• State of the art
  • Voice recognition technology has been investigated since the 1960s and was established in the 1990s.
  • Voice recognition is already in practical use in call centers, medical work, and other time-critical jobs that require documentation. Hands-free remote control by speech in cars is also in practical use, and remote control of home equipment is starting up.
  • There has been research on using speech as the primary method and another method for confirmation, selection, or correction. One study showed double the productivity of T9. Another combined speech with gaze and Dasher and achieved twice the productivity of Dasher alone.

Page 18: Comparisons of input modalities and methods

Voice recognition (Cont’d)

• Advantages
  • Voice can communicate emotions.
• Disadvantages
  • It is subject to environmental noise. Recognition accuracy drops drastically, by 20-50%, in noisy environments. Natural spontaneous interaction and diverse speakers also degrade accuracy.
  • It is socially awkward in two ways:
    • Speaking is loud and disturbs others.
    • It does not preserve privacy, so it does not suit crowded environments.
  • See http://yoshiharusato.wordpress.com/2014/05/29/why-speech-recognition-do-not-work/.

Page 19: Comparisons of input modalities and methods

Silent speech recognition

• State of the art
  • Research on non-voiced speech recognition has emerged recently. Alternatives to the air microphone are the throat microphone, surface EMG (electromyography), ultrasound imaging of the tongue and lips, and a type of stethoscope microphone.
  • There is no commercial system yet.

Page 20: Comparisons of input modalities and methods

Silent speech recognition (Cont’d)

• Advantages
  • Silent speech solves the most critical defects of voiced speech recognition.
  • It is robust against environmental noise.
  • It protects privacy.
• Disadvantages
  • The practicality of the technology is yet to be proved.
  • The quality of body-conducted speech is degraded compared with normal speech.
  • NAM (non-audible murmur) is not able to recognize pitch (e.g., the tones of Chinese).

Page 21: Comparisons of input modalities and methods

Lip reading

• State of the art
  • Lip reading is approached through pattern recognition by computer vision, or through muscle-movement recognition by EMG (electromyography). The computer vision approach is still at the level of limited vocabulary (Takeshi Saitoh, 2009), with a word recognition rate of about 80-90%. The EMG approach can distinguish only vowels.
  • According to (Rosenblum, 2010), human lip-reading experts can read tongue positions, air flows, and tones by observing subtle movements of the chin, cheeks, and face. Theoretically, the technology should be able to overcome the current limitations.
  • A number of studies use lip reading to supplement speech recognition or combine it with keys.
  • There is no commercial system yet.

Page 22: Comparisons of input modalities and methods

Lip reading (Cont’d)

• Advantages
  • Lip reading solves the most critical defects of voiced speech recognition.
  • It is robust against environmental noise.
  • It protects privacy.
• Disadvantages
  • Lip reading is not yet mature as a standalone technology.
  • The computer vision approach is subject to occlusion and lighting conditions.

Page 23: Comparisons of input modalities and methods

Input methods

• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Page 24: Comparisons of input modalities and methods

Gaze

• State of the art
  • Gaze is tracked by computer vision, and there are already some commercial systems from a number of vendors. Most commercial systems measure the point of regard by the “corneal-reflection and pupil-center” method with an infrared camera. Gaze tracking is applied in a digital camera called “Iris” to sense focus.
  • There are remote-sensor and head-mounted types. A head-mounted eye tracker offers higher accuracy and simplified geometry, and is robust against head movement.
  • Current eye-tracking systems achieve an accuracy of 0.5 degrees (equivalent to a region of approximately 15 pixels on a 17” display with a resolution of 1024x768 pixels viewed from a distance of 70 cm).
  • There have been a number of studies of eye typing for disabled people, using a software keyboard or Dasher with gaze. One study applied gaze tracking to replace candidate selection in a document-authoring scenario, having observed that with a traditional IME more than half the time was spent looking for and selecting the right choice from the candidate list.
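
The conversion from the 0.5-degree accuracy figure to on-screen pixels can be checked with a little trigonometry. The sketch below is an approximation under stated assumptions: the full 17-inch diagonal is viewable, pixels are square, and the angular error is treated as an offset measured from the line of sight. The function name `gaze_error_pixels` is hypothetical.

```python
import math

def gaze_error_pixels(accuracy_deg, distance_cm, diagonal_in, res_x, res_y):
    """Convert an angular gaze error into an on-screen offset in pixels."""
    diagonal_px = math.hypot(res_x, res_y)
    cm_per_px = diagonal_in * 2.54 / diagonal_px      # physical pixel pitch
    error_cm = distance_cm * math.tan(math.radians(accuracy_deg))
    return error_cm / cm_per_px

px = gaze_error_pixels(accuracy_deg=0.5, distance_cm=70,
                       diagonal_in=17, res_x=1024, res_y=768)
print(round(px, 1))
```

Under these assumptions the result is about 18 pixels, the same order as the figure of approximately 15 pixels quoted above; the exact value depends on assumptions such as the viewable screen area. Either way, the error is far larger than a single pixel, which is why gaze is better suited to selecting sizeable targets than to fine positioning.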

Page 25: Comparisons of input modalities and methods

Gaze (Cont’d)

• Advantages
  • Eye gaze moves more quickly than the hand, fingers, or body. Simple target selection and cursor positioning operations were performed approximately twice as fast with an eye tracker as with any of the conventional cursor positioning devices. When all is performing well, eye gaze interaction can give a subjective feeling of a highly responsive system, almost as though the system is executing the user’s intentions before he or she expresses them (Karn, 2003).
  • The eyes can move without fatigue.
  • The time required to move the eye is not related to the distance to be moved, unlike most other input.
  • Operating the eye requires no training or particular coordination for normal users.
• Disadvantages
  • It is difficult to interpret the point of regard without some other means of control. Moving one’s eyes is often an almost subconscious act, and eye movement is always “on”; this is called the “Midas Touch” problem (Karn, 2003).
    • Dwell time (which hampers speed and is fatiguing), “gaze-and-touch”, or eye gestures have been used to solve this.
  • Eyes basically provide only positional information.
  • Computer vision is subject to occlusion and lighting conditions.
  • It requires calibration before use.
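
The dwell-time remedy for the “Midas Touch” problem can be sketched as follows. This is a minimal illustration, not a method taken from the slides: the `GazeSample` type, the `dwell_selections` function, the 40-pixel radius, and the 500 ms threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class GazeSample:
    t_ms: int   # timestamp in milliseconds
    x: float    # point of regard, in pixels
    y: float

def dwell_selections(samples, radius_px=40.0, dwell_ms=500):
    """Emit one selection each time gaze rests near one spot for dwell_ms.

    A selection is reported as (time, x, y) of the spot where the dwell began.
    Brief glances never trigger a selection, which avoids the Midas Touch.
    """
    selections = []
    anchor = None    # first sample of the current candidate fixation
    fired = False    # whether this fixation has already produced a selection
    for s in samples:
        if anchor is None or hypot(s.x - anchor.x, s.y - anchor.y) > radius_px:
            anchor, fired = s, False          # gaze moved: start a new candidate
        elif not fired and s.t_ms - anchor.t_ms >= dwell_ms:
            selections.append((s.t_ms, anchor.x, anchor.y))
            fired = True                      # do not re-fire while the gaze stays put
    return selections

# A short glance at (100, 100) followed by a deliberate fixation near (400, 300).
trace = [GazeSample(0, 100, 100), GazeSample(100, 102, 101),
         GazeSample(200, 400, 300), GazeSample(300, 402, 299),
         GazeSample(400, 401, 301), GazeSample(500, 399, 300),
         GazeSample(600, 400, 302), GazeSample(700, 401, 300),
         GazeSample(800, 400, 300)]
print(dwell_selections(trace))
```

The threshold makes the trade-off explicit: a longer dwell reduces accidental selections but adds that delay to every intended one, which is the speed penalty noted in the list above.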

Page 26: Comparisons of input modalities and methods

Input methods

• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Page 27: Comparisons of input modalities and methods

Brain

• State of the art
  • The brain-machine interface may someday replace any human-computer interaction, but it is not certain when brain-machine interfaces will be able to deal with text or symbol sequences.
  • It uses
    • expensive high-end sensors such as
      • fMRI (functional magnetic resonance imaging), which senses brain blood patterns,
      • or MEG (magnetoencephalography),
    • or low-end sensors such as
      • NIRS (near-infrared spectroscopy)
      • or EEG (electroencephalogram).
  • MSR showed that an off-the-shelf EEG device ($1500) can classify several brain states [Tan, 2005]. Hitachi offers “Kokorogatari” (2005), which tells Yes/No by NIRS. Honda Research succeeded in 2006 in distinguishing the 3 symbols “paper, stone and scissors” by fMRI. Honda Research also showed in 2009 that the robot ASIMO moves its arm and foot as commanded by an EEG & NIRS system.
  • Ventures offering solutions include NeuroSky, Inc., BrainGate, and Emotiv Systems.

Page 28: Comparisons of input modalities and methods

Brain (Cont’d)

• Advantages
  • Eyes-free and hands-free.
• Disadvantages
  • The technology is not yet mature. EEG requires intense focus at present.

Page 29: Comparisons of input modalities and methods

Input methods

• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Page 30: Comparisons of input modalities and methods

Multimodal Fusion

• Advantages
  • Users have freedom to choose the modality, which contributes to reliability (error correction).
  • It can support more users.
  • Modality fusion usually outperforms uni-modal recognition.
• Disadvantages
  • Processing (either early fusion or late fusion) can become more complex than with mono-modal methods.
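
Late fusion, one of the two processing styles mentioned above, combines the outputs of separate per-modality recognizers rather than their raw signals. The sketch below assumes each recognizer returns a score per candidate label and that a weighted sum is an acceptable combination rule; the function names, weights, and scores are hypothetical and not measured results.

```python
def late_fusion(scores_by_modality, weights):
    """Combine per-modality label scores by a weighted sum, then renormalize."""
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights[modality]
        for label, p in scores.items():
            fused[label] = fused.get(label, 0.0) + w * p
    total = sum(fused.values())
    if total == 0:
        raise ValueError("all fused scores are zero")
    return {label: s / total for label, s in fused.items()}

def best_label(fused):
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)

# Hypothetical scores: speech alone slightly prefers "ten", lip reading prefers "pen".
scores = {
    "speech": {"pen": 0.45, "ten": 0.55},
    "lips":   {"pen": 0.80, "ten": 0.20},
}
fused = late_fusion(scores, weights={"speech": 0.5, "lips": 0.5})
print(best_label(fused), fused)
```

In this made-up case the second modality corrects an error the first would have made alone, which illustrates the reliability advantage listed above. Early fusion would instead merge the feature streams before a single classifier, which typically requires synchronized inputs.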

Page 31: Comparisons of input modalities and methods

Input methods

• Mechanical Movement – slow but reliable
• Audio – fast for text input
• Gaze – fast for pointing
• Brain
• Multimodal Fusion

Page 32: Comparisons of input modalities and methods

Summary

• Silent speech (including lip reading) is a preferred technology for text input.
• Gaze is the fast pointing method and provides information about the user’s intention.
• Finger dexterity is reliable for controlling and commanding machines.