People Centric Processing Malcolm Slaney Interval Research Describing work by Malcolm Slaney, Michele Covell, Gerald McRoberts, Chris Bregler, Trevor Darrell, Gaile Gordon, Mike Harville, Dan Ellis, Scott Meredith, and Electric Planet


Dec 20, 2015

Transcript
Page 1

People Centric Processing

Malcolm Slaney

Interval Research

Describing work by Malcolm Slaney, Michele Covell, Gerald McRoberts, Chris Bregler, Trevor Darrell, Gaile Gordon, Mike Harville, Dan Ellis, Scott Meredith, and Electric Planet

Page 2

Hard Problems

Honest Politician?

Traveling Salesman

Human Intelligence

Page 3

Talk Goals

Better CHI with Signal Computation

Realistic human behavior: recognition, synthesis

Page 4

Outline

Recognition (BabyEars)

Improving Perception (Mach1, ASA)

Entertainment (Audio Morph, Mirror)

Synthesis (Video Rewrite)

What Works and What Doesn’t

Page 5

Human Recognition

Speech Recognition

Gesture Recognition

Emotion: at a distance, independent channel

Page 6

BabyEars

Motivation

Speech recognizers: learn words, ignore prosody

Infants learn prosody!

Problem

What can we tell about the affective message?

What do we do with this?

Page 7

BabyEars Procedure

Data Collection: spontaneous

Labeling: Approval, Attention, Prohibition, Neutral (adult-directed)

Recognition

Page 8

BabyEars Results

[Figure: four scatter plots. Panels: Strong ID Female Utterances and Strong AD/ID Female Utterances, each shown once on the axes Global Pitch Slope and Global Pitch Range, and once on the axes Global Delta MFCC and Global Energy Variance.]

Key: Approval - White, Attention - Blue, Prohibition - Red, Neutral (Adult) - Green
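
The global features named in the plots above (pitch slope, pitch range, energy variance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the BabyEars implementation: the function name `global_prosodic_features`, the convention that an unvoiced frame has pitch 0, the 10 ms frame spacing, and the choice to measure pitch in octaves. Delta MFCC is left out to keep the sketch short.

```python
import math
from statistics import mean, pvariance


def global_prosodic_features(f0_hz, energy, frame_s=0.01):
    """Summarize one utterance with three global prosodic features.

    f0_hz   : per-frame pitch in Hz; 0 marks an unvoiced frame (assumed convention)
    energy  : per-frame energy values
    frame_s : frame spacing in seconds (assumed to be 10 ms)
    """
    # Keep only voiced frames and work on a log2 (octave) pitch scale.
    voiced = [(i * frame_s, math.log2(f)) for i, f in enumerate(f0_hz) if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    times = [t for t, _ in voiced]
    pitch = [p for _, p in voiced]
    t_mean, p_mean = mean(times), mean(pitch)

    # Least-squares slope of log-pitch against time, in octaves per second.
    num = sum((t - t_mean) * (p - p_mean) for t, p in voiced)
    den = sum((t - t_mean) ** 2 for t in times)

    return {
        "pitch_slope": num / den,
        "pitch_range": max(pitch) - min(pitch),  # octaves
        "energy_variance": pvariance(energy),
    }
```

A feature vector like this, one per utterance, is the kind of input a classifier over the four labels would consume; the classifier itself is not sketched here.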

Page 9

BabyEars Results

Gender-dependent results

Strength

Message

Prohibitions

[Figure: Classification versus Gender, with series All, Male, and Female]

Page 10

BabyEars Successes

Recognition at human rates

Are these the right labels?

Why it works: broad classes, simple features, spontaneous data

What do we do with this?

[Figure: Speaker Dependent Results. Axes: Number of Features, Fraction Correct. One curve per speaker: 1 (f), 2 (f), 3 (m), 4 (f), 5 (f), 6 (m), 7 (f), 8 (m), 9 (f), 10 (m), 11 (m), 12 (m).]

Page 11

Human Perception

Algorithms to improve human performance

Speech Perception

Auditory Scene Analysis

[Figure: ear diagram labeled Outer Ear, Middle Ear, Cochlea, Auditory Nerve]

Page 12

Mach1

Speed up speech

Maintain comprehension

Model human speaker: compress fast speech less, compress unstressed speech more

Measure emphasis

Measure speaking rate

Modify rate
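
The three steps above (measure emphasis, measure speaking rate, modify rate) can be sketched as a per-frame rate assignment. This is only an illustration of the stated idea, not the Mach1 algorithm: the function name `local_rates`, the weights `w_emph` and `w_rate`, the linear weighting formula, and the 0.1 floor are all hypothetical. The inputs are assumed to be a per-frame emphasis score in [0, 1] and a per-frame speaking rate relative to the speaker's average (1.0 means average).

```python
def local_rates(emphasis, relative_rate, target_rate, w_emph=1.0, w_rate=0.5):
    """Return one speed-up factor per frame whose overall effect is target_rate.

    Frames that are emphasized or already fast keep more of their duration,
    so they receive a smaller speed-up factor than unstressed or slow frames.
    """
    if not emphasis or len(emphasis) != len(relative_rate):
        raise ValueError("inputs must be non-empty and of equal length")
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")

    # Hypothetical "share of duration to keep" for each frame.
    keep = [
        max(0.1, 1.0 + w_emph * e + w_rate * (r - 1.0))
        for e, r in zip(emphasis, relative_rate)
    ]

    # Scale the shares so that total output duration = input duration / target_rate.
    scale = len(keep) / (target_rate * sum(keep))
    return [1.0 / (k * scale) for k in keep]
```

Because the shares are renormalized, there is no hard decision per frame: every frame is compressed by some amount, and the amounts trade off against each other to meet the global target.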

Page 13

Mach1 Example

Compare linear to Mach1

Which is easier to understand?

[Audio examples: SOLA 1 to 7 and Mach1 1 to 7]
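
For reference, the SOLA (synchronized overlap-add) baseline used for the linear comparison can be sketched in plain Python. This is a simplified version under stated assumptions, not the code behind the audio examples: the function name `sola_compress` and the default `frame`, `synth_hop`, and `search` sizes are hypothetical, the input is a list of samples, and only speed-up (rate of 1 or more) is handled.

```python
import math


def sola_compress(x, rate, frame=400, synth_hop=200, search=50):
    """Time-compress the sample list x by roughly `rate` with simplified SOLA."""
    if rate < 1.0:
        raise ValueError("this sketch only handles compression (rate >= 1)")
    if 2 * search >= min(synth_hop, frame - synth_hop):
        raise ValueError("search range too large for these frame and hop sizes")

    ana_hop = int(round(synth_hop * rate))  # input advances faster than output
    y = list(x[:frame])                     # output starts with the first frame
    m = 1
    while m * ana_hop + frame <= len(x):
        seg = x[m * ana_hop:m * ana_hop + frame]
        nominal = m * synth_hop

        # Find the shift that best aligns the new frame with the existing output,
        # scored by normalized cross-correlation over the overlapping samples.
        best_k, best_score = 0, float("-inf")
        for k in range(-search, search + 1):
            pos = nominal + k
            n = len(y) - pos
            dot = sum(y[pos + i] * seg[i] for i in range(n))
            norm = math.sqrt(
                sum(y[pos + i] ** 2 for i in range(n))
                * sum(seg[i] ** 2 for i in range(n))
            )
            score = dot / norm if norm > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score

        # Linear cross-fade over the overlap, then append the rest of the frame.
        pos = nominal + best_k
        n = len(y) - pos
        for i in range(n):
            w = i / n
            y[pos + i] = (1.0 - w) * y[pos + i] + w * seg[i]
        y.extend(seg[n:])
        m += 1
    return y
```

Every frame is advanced by the same analysis hop, which is what makes this a linear (uniform) speed-up; a nonuniform method would vary the hop from frame to frame.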

Page 14

Mach1 Results

TOEFL

Comprehension

Vary rate

Linear vs. Mach1

Why it works: mimics speakers, no hard decisions

Page 15

Auditory Scene Analysis

Foreground/background

Similar to vision problem

Page 16

ASA Results

Example by Dan Ellis

4.1 The original sound (10 seconds)

4.2 Background noise cloud ("Noise1")

4.3 Crash ("Noise2, Click1")

4.4 Horn1, Horn2, Horn3, Horn4, Horn5 ("Wefts1-4", "Weft5", "Wefts6,7", "Weft8", "Wefts9-12")

4.5 Complete reconstruction with all elements

Page 17

Human Entertainment

Electric Planet video

Magic Morphin’ Mirror

Audio Morphing

Page 18

Audio Morphing

Interpolate between objects

Visual example

Page 19

Morphing Sounds

Same pitch (/a/ to /i/)

Different pitch

Proper morph

Page 20

Morphing Representations

Smooth spectrogram: encodes formants (what was said)

Pitch spectrogram: encodes pitch (how it was said), encodes breath
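
Keeping the two representations separate means each can be interpolated on its own terms. The sketch below shows one frame of such a morph under assumptions that are not from the slides: the function name `morph_frame` is hypothetical, the two sounds are assumed to be already time-aligned frame by frame, the smooth spectrum is blended linearly, and pitch is blended geometrically (a straight line on a log-frequency scale).

```python
def morph_frame(env_a, env_b, pitch_a_hz, pitch_b_hz, alpha):
    """Blend one aligned frame of sound A and sound B.

    env_a, env_b : smooth spectral envelopes (lists of equal length)
    pitch_*_hz   : pitch of each frame in Hz (both assumed voiced)
    alpha        : 0.0 returns A, 1.0 returns B
    """
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of bins")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # The envelope carries the formants, so blend it bin by bin.
    env = [(1.0 - alpha) * a + alpha * b for a, b in zip(env_a, env_b)]

    # Blend pitch as a single value rather than mixing two harmonic series,
    # so the result has one pitch instead of two competing ones.
    pitch = (pitch_a_hz ** (1.0 - alpha)) * (pitch_b_hz ** alpha)
    return env, pitch
```

For example, halfway between a 100 Hz frame and a 400 Hz frame this gives a single 200 Hz pitch, whereas a plain cross-fade of the two waveforms would leave both pitches audible.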

Page 21

Audio Morph Results

Corner to Morning: pitch changes, durations change, voicing changes, formants change

Why it works: perceptual representation

Page 22

Magic Morphin’ Mirror

Page 23

Mirror Magic

Real-time stereo depth

Skin color detection

Face recognition and tracking

Page 24

Mirror Results

Video

Why it works: uses multiple modalities, fuzzy boundaries, distortions are forgiving

Page 25

Human Synthesis

Speech synthesis

Audio-visual speech

Page 26

Speech Synthesis

Words are easy

Prosody is hard: pitch, duration, voice quality

Data driven prosody: Original, Repeat after me

Courtesy of Scott Meredith (Microsoft)

Page 27

Facial Animation

Graphical Model | Data Driven

Polygons | Image-based rendering

FM synthesis | Wavetable synthesis

Formant synthesis | Diphone concatenation

Diphone recognition | HMM recognizers

Page 28

Video Rewrite Approach

Data driven

[Figure: block diagram with an Analysis stage and a Synthesis stage. Box labels: Video Model, Background Video, Select Lip Video, Stitch Together.]

Page 29

Video Rewrite–Model

Build video model of speaker

Use speech recognition to label audio

/EH-B/

/IY-B/

/OW-B/

/AA-B/

[Figure labels: Video Model, Phoneme Recognition, Eigen-points]

Page 30

Video Rewrite–EigenPoints

Train: gray-scale image, control points

Model: linear hyperplane

To use: apply image, read out control points
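
The train / model / use pattern above can be illustrated with a plain least-squares fit. This is a stand-in, not the EigenPoints estimator itself: it fits an ordinary, lightly regularized affine map from image vectors to control-point vectors. The names `fit_linear_map`, `solve`, and `read_out_points` and the `ridge` parameter are hypothetical, and images are assumed to be flattened into short lists of gray levels.

```python
def solve(A, B):
    """Solve A W = B by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[p:] for row in M]


def fit_linear_map(images, points, ridge=1e-6):
    """Train: fit control points as an affine function of the image vector."""
    X = [list(img) + [1.0] for img in images]  # constant 1.0 gives the offset term
    n, p, k = len(X), len(X[0]), len(points[0])
    # Normal equations (X^T X + ridge * I) W = X^T Y.
    A = [[sum(X[s][i] * X[s][j] for s in range(n)) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    B = [[sum(X[s][i] * points[s][c] for s in range(n)) for c in range(k)]
         for i in range(p)]
    return solve(A, B)


def read_out_points(W, image):
    """Use: apply a new image and read out its estimated control points."""
    x = list(image) + [1.0]
    return [sum(x[i] * W[i][c] for i in range(len(x))) for c in range(len(W[0]))]
```

The fitted map plays the role of the linear hyperplane on the slide: once trained, locating control points in a new image is a single matrix multiply.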

Page 31

Video Rewrite–Tracking

Find face: affine model, single reference image

Need for: database, placing face

Subpixel accuracy necessary!

Page 32

Video Rewrite–Results

Video database: 8 minutes of Ellen, 2 minutes of JFK

Only half usable

Head rotation

Why it works: large amount of training data, prosody comes with audio

Page 33

Intelligent CHI

Lots of possibilities

Don’t expect perfection

Use lots of data

Use what we know of perception

Page 34

Thanks

For more information

[email protected]

http://web.interval.com/~malcolm