MikeTalk: An Adaptive Man-Machine Interface
Tony Ezzat, Volker Blanz, Tomaso Poggio

Jan 23, 2016

TTVS Overview

• Input: Text

• Output: Photo-realistic talking face uttering text

Desktop Agents

You have received 1 email from Tommy Poggio.

Customer Support

You have bought 20 shares of SONY at $40 each.

Advertisements

Hi Tony, would you be interested in a ticket from Boston to New York for $50.00?

Modules

Phoneme Corpus

Step 1:

– collect a visual corpus from a subject

– corpus contains 44 words

– one word for each American English phoneme

6 Consonantal Visemes

Step 2:

– extract one image per phoneme: viseme

– group visemes together by visual similarity
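The slides do not say how the grouping was performed; it may well have been done by eye. Purely as an illustration, a greedy automatic grouping could compare each viseme image against one representative per existing group using mean squared pixel difference. The function name, the greedy strategy, and the threshold are all assumptions, not the authors' procedure.

```python
import numpy as np

def group_by_similarity(images, labels, threshold):
    """Hypothetical greedy grouping of viseme images.

    Each image joins the first existing group whose representative (the
    group's first member) lies within `threshold` mean squared difference;
    otherwise it starts a new group. Returns a list of label lists."""
    groups = []  # list of (representative image, member labels)
    for img, label in zip(images, labels):
        img = np.asarray(img, dtype=float)
        for rep, members in groups:
            if np.mean((rep - img) ** 2) <= threshold:
                members.append(label)
                break
        else:
            # no group was close enough: this image becomes a new representative
            groups.append((img, [label]))
    return [members for _, members in groups]
```

The result depends on input order and on the threshold, which is one reason a hand-checked grouping is a reasonable alternative for a corpus this small.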

9 Vocalic Visemes (+ 1 Silence Viseme)

Problem 1: Need to Interpolate!

Solution: Morphing!

Simultaneous interpolation of shape & texture (Beier & Neely 1992).

Problem 2: Too tedious to specify correspondence by hand across many images!

Solution: Optical Flow

• To interpolate between two visemes, optical flow is first computed

• A 2D motion vector field is produced: dx(x, y), dy(x, y)

(Horn & Schunck 1986) (Lucas & Kanade 1988)
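The slide cites both Horn & Schunck and Lucas & Kanade without saying which estimator MikeTalk used. As an illustration only, here is a minimal single-scale, dense Lucas-Kanade sketch in NumPy: around each pixel it solves a 2×2 least-squares system over a small window and returns the two fields dx(x, y), dy(x, y). The function names, the window radius r, and the eps threshold are assumptions; real mouth motions between visemes would likely need a coarse-to-fine (pyramid) extension.

```python
import numpy as np

def _box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window around each pixel (edge-padded)."""
    p = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w))
    for oy in range(2 * r + 1):
        for ox in range(2 * r + 1):
            out += p[oy:oy + h, ox:ox + w]
    return out

def lucas_kanade_flow(img_a, img_b, r=3, eps=1e-9):
    """Dense single-scale Lucas-Kanade flow from img_a to img_b.

    Returns (dx, dy) with img_b[y + dy, x + dx] ~= img_a[y, x]."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    # spatial gradients averaged over both frames, plus the temporal difference
    Ix = 0.5 * (np.gradient(a, axis=1) + np.gradient(b, axis=1))
    Iy = 0.5 * (np.gradient(a, axis=0) + np.gradient(b, axis=0))
    It = b - a
    # windowed entries of the 2x2 normal equations
    sxx, syy, sxy = _box_sum(Ix * Ix, r), _box_sum(Iy * Iy, r), _box_sum(Ix * Iy, r)
    sxt, syt = _box_sum(Ix * It, r), _box_sum(Iy * It, r)
    det = sxx * syy - sxy ** 2
    ok = np.abs(det) > eps            # windows without enough texture get zero flow
    safe = np.where(ok, det, 1.0)
    # Cramer's rule for [sxx sxy; sxy syy] [dx, dy]^T = -[sxt, syt]^T
    dx = np.where(ok, (-sxt * syy + syt * sxy) / safe, 0.0)
    dy = np.where(ok, (-syt * sxx + sxt * sxy) / safe, 0.0)
    return dx, dy
```

Shifting a textured image one pixel to the right yields dx close to 1 and dy close to 0 away from the borders; the zero-flow fallback is where the aperture problem leaves the 2×2 system singular.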

Morphing

• Forward warping A to B

• Forward warping B to A

• Blending

• Hole filling
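A minimal NumPy sketch of these four steps follows. It assumes grayscale images and dense flows computed in both directions, and it uses nearest-pixel forward warping. The function names and the neighbour-averaging hole filler are illustrative choices, not necessarily what MikeTalk did.

```python
import numpy as np

def forward_warp(img, dx, dy, t):
    """Push every pixel of `img` by t * (dx, dy), rounded to the nearest pixel.

    Returns the warped image and a mask of target pixels that got a value.
    If several source pixels land on one target, NumPy keeps one of them
    (which one is unspecified); a fuller version would resolve this."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + t * dx).astype(int)
    yt = np.rint(ys + t * dy).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    out = np.zeros((h, w))
    filled = np.zeros((h, w), dtype=bool)
    out[yt[inside], xt[inside]] = img[inside]
    filled[yt[inside], xt[inside]] = True
    return out, filled

def fill_holes(img, filled, max_iter=100):
    """Fill unset pixels with the mean of their set 4-neighbours, growing inward."""
    img, filled = img.copy(), filled.copy()
    for _ in range(max_iter):
        if filled.all():
            break
        v = np.pad(np.where(filled, img, 0.0), 1)
        m = np.pad(filled.astype(float), 1)
        s = v[:-2, 1:-1] + v[2:, 1:-1] + v[1:-1, :-2] + v[1:-1, 2:]
        n = m[:-2, 1:-1] + m[2:, 1:-1] + m[1:-1, :-2] + m[1:-1, 2:]
        grow = ~filled & (n > 0)
        img[grow] = s[grow] / n[grow]
        filled |= grow
    return img

def morph(img_a, img_b, flow_ab, flow_ba, t):
    """Intermediate frame at t in [0, 1]; flows are (dx, dy) pairs."""
    wa, fa = forward_warp(np.asarray(img_a, float), *flow_ab, t)      # A toward B
    wb, fb = forward_warp(np.asarray(img_b, float), *flow_ba, 1 - t)  # B toward A
    out = np.where(fa, wa, wb)        # take whichever warp covered the pixel
    both = fa & fb
    out[both] = (1 - t) * wa[both] + t * wb[both]  # blend where both did
    return fill_holes(out, fa | fb)
```

At t = 0 the result is A and at t = 1 it is B; in between, pixels move part of the way along the flow while the two warped images are cross-dissolved.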

Synthesis Database

• 16 Visemes total

• 256 optical flow fields total (16 × 16), one for every ordered pair of visemes

Concatenation and Lip Sync

• Load the correct viseme transitions

• Concatenate viseme transitions

• Sample the viseme transitions using audio durations
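One way to picture the sampling step: each consecutive viseme pair is a transition, and its duration (assumed here to come from the speech synthesizer's phoneme timing) is divided into video frames, each with a morph parameter t. The function name, the fixed 30 fps frame rate, and the uniform spacing of t are assumptions for illustration.

```python
from typing import List, Tuple

def schedule_transition_frames(visemes: List[str], durations: List[float],
                               fps: float = 30.0) -> List[Tuple[str, str, float]]:
    """Return one (source viseme, target viseme, t) triple per video frame.

    durations[i] is the length in seconds of the transition
    visemes[i] -> visemes[i + 1] (hypothetically taken from audio timing)."""
    if len(visemes) < 2 or len(durations) != len(visemes) - 1:
        raise ValueError("need at least two visemes and one duration per transition")
    frames: List[Tuple[str, str, float]] = []
    for src, dst, dur in zip(visemes, visemes[1:], durations):
        n = max(1, round(dur * fps))  # number of frames spent on this transition
        # t runs uniformly over [0, 1) so the next transition starts at t = 0
        frames.extend((src, dst, k / n) for k in range(n))
    # end exactly on the last viseme
    frames.append((visemes[-2], visemes[-1], 1.0))
    return frames
```

Each triple would then select the precomputed flow for (source, target) from the synthesis database and render one morphed frame at that t.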

Examples

“1, 2, 3, 4, 5”

“cat, dog, pig, cow, moose, horse, sheep”

“you have received 10 email messages.”

Current Work

• Coarticulation

• Eye + head movements

• Emotion

• 3D instead of 2D

• Psychophysics

3D (with Volker Blanz)

The End

Co-articulation

• Problem: Current method does not handle coarticulation, so speech looks overly articulated

• Can record all possible triphones/quadriphones, but this approach requires a lot of data!

• Best method is to learn a model for coarticulation, but what is the representation for the lips?
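For scale, a rough upper bound, assuming that any sequence of the 44 phonemes could occur (real English uses far fewer combinations):

```latex
44^3 = 85{,}184 \ \text{triphones}, \qquad 44^4 = 3{,}748{,}096 \ \text{quadriphones}
```

compared with the 44 words needed for the single-phoneme corpus.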

Principal Components Analysis

• Each image is a vector in a high-dimensional space

• Using PCA, find the orthonormal basis vectors that capture the most variance in the corpus (optimal in the least-squares sense)

• Project the entire corpus onto those basis vectors
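A minimal sketch of these three steps, assuming equally sized grayscale lip images and using NumPy's SVD of the mean-centered data matrix. The function name and the number of kept components k are illustrative.

```python
import numpy as np

def pca(images, k):
    """Return (mean, basis, coeffs) for a list of equally sized images.

    basis has shape (k, D) with orthonormal rows (principal directions,
    ordered by decreasing variance); coeffs has shape (N, k)."""
    # each image becomes one row vector in a D-dimensional space
    X = np.stack([np.asarray(im, dtype=float).ravel() for im in images])
    mean = X.mean(axis=0)
    Xc = X - mean
    # rows of Vt are the principal directions, sorted by singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:k]
    # project the whole corpus onto the basis
    coeffs = Xc @ basis.T
    return mean, basis, coeffs
```

An image is approximated as mean + coeffs[i] @ basis; plotting the first two coefficient columns over the frames of a word gives the kind of trajectory shown on the next slides.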

Top 2 PCA Bases for /buut/

Top 2 PCA Bases for /get/

Problem: Too nonlinear!

Flow Component Analysis

• Compute optical flow from a reference lip image to every other image in the corpus

• Compute PCA on all the flows
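The same PCA machinery can be applied to flow fields instead of pixel intensities. This sketch assumes the flows from the reference lip image to each corpus image have already been computed (by any optical-flow method) and are given as (dx, dy) array pairs; each field is flattened into one vector [dx..., dy...] before PCA. The names flow_pca and flow_from_coeffs are hypothetical.

```python
import numpy as np

def flow_pca(flows, k):
    """PCA over a list of (dx, dy) flow fields of identical shape (h, w)."""
    h, w = flows[0][0].shape
    # one row per flow field: all dx values followed by all dy values
    X = np.stack([np.concatenate([np.ravel(dx), np.ravel(dy)]) for dx, dy in flows])
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                     # k principal flow fields, flattened
    coeffs = (X - mean) @ basis.T      # each corpus image as k numbers
    return mean, basis, coeffs, (h, w)

def flow_from_coeffs(mean, basis, c, shape):
    """Map k coefficients back to a (dx, dy) flow field."""
    v = mean + c @ basis
    h, w = shape
    return v[:h * w].reshape(h, w), v[h * w:].reshape(h, w)
```

Coefficients describe mouth shape as displacement from the reference image rather than as pixel values; warping the reference image by a reconstructed flow would then produce a new mouth image.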

Top 2 FPCA Bases for /buut/

Top 2 FPCA Bases for /get/

Much more linear behavior!

Current Work

• Now that we have parameterized the mouth, what is the model for mouth synthesis?

• How is that model fit to the PCA data?