Create Photo-Realistic Talking Face
Changbo Hu, 2001.11.26
* This work was done while visiting Microsoft Research China, with Baining Guo and Bo Zhang.


Page 2

Outline

Introduction to talking faces

Motivations

System overview

Techniques

Conclusions

Page 3

Introduction

What is a talking face?
  Face (lip) animation driven by voice
  Applications

The process of creating a talking face
  Face model
  Motion capture
  Mapping between audio and video
  Rendering

Photo-realistic?

Page 4

Literature

Waters, 93: DECface, 2D wire-frame model
Terzopoulos, 95: skin and muscle model
Bregler, 97: Video Rewrite, sample-image based
T. S. Huang, 98: mesh model from range data
Poggio, 98: MikeTalk, viseme morphing
Guenter, 99: Making Faces, 3D from multiple cameras
Zhengyou Zhang, 00: 3D face modeling from video through the epipolar constraint
Cosatto, 00: planar quads model

Page 5

Some Face Models

Page 6

Motivations

Aim: a graphical interface for a conversational agent
  Photo-realistic
  Driven by Chinese speech
  Smooth transitions between sentences

Extended from “Video Rewrite”

Page 7

System overview: pipeline of the system (1)

Video with sound → images + sound
Images → pose tracking → lip motion tracking
Sound → phoneme segmentation → annotation
Lip motion tracking + annotation → training database

Page 8

System overview: pipeline of the system (2)

New text → TTS system → wav sound
Wav sound → segmentation → triphone sequence
Triphone sequence + training database → synthesized triphone sequence → lip motion sequence
Lip motion sequence + background sequence → rewrite to faces

Page 9

Techniques

Analysis:
  Audio processing
  Image processing

Synthesis:
  Lip images
  Background images
  Stitching them together

Page 10

Audio part: Sound Segmentation

Given the wav file and its script

Use an HMM to train the segmentation system

Segment the wav file into a phoneme sequence

Example of the segmentation result (phoneme, start, end):
  SILOPEN    0   23
  SILOPEN   24   42
  s         43   61
  if4       62   74
  j         75   80
  ia1       81   97
  sh        98  109
  ang1     110  121
  y        122  130
  e4       131  133
  y        134  145
  in2      146  154
  h        155  164
  ang2     165  194
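The segmentation result is a flat run of label/start/end triples. The following is a minimal sketch of reading it back into segments. It assumes whitespace-separated tokens, inclusive integer boundaries, and contiguous segments, all of which hold for the example above. The slide does not state the time unit. The function name parse_segmentation is hypothetical.

```python
from typing import List, Tuple

# (phoneme label, start, end); boundaries are assumed to be inclusive integers
Segment = Tuple[str, int, int]


def parse_segmentation(text: str) -> List[Segment]:
    """Parse 'label start end label start end ...' into a list of segments."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected label/start/end triples")
    segments: List[Segment] = []
    for i in range(0, len(tokens), 3):
        label, start, end = tokens[i], int(tokens[i + 1]), int(tokens[i + 2])
        if end < start:
            raise ValueError(f"segment {label!r} ends before it starts")
        # Assumption: each segment starts right after the previous one ends.
        if segments and start != segments[-1][2] + 1:
            raise ValueError(f"gap or overlap before segment {label!r}")
        segments.append((label, start, end))
    return segments


example = ("SILOPEN 0 23 SILOPEN 24 42 s 43 61 if4 62 74 j 75 80 ia1 81 97 "
           "sh 98 109 ang1 110 121 y 122 130 e4 131 133 y 134 145 in2 146 154 "
           "h 155 164 ang2 165 194")
segments = parse_segmentation(example)
print(len(segments), segments[2])  # 14 ('s', 43, 61)
```

The contiguity check is strict on purpose: a gap usually means a lost token in the file.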

Page 11

Annotation with Phonemes

Use phonemes to annotate the video frames

Each phoneme in a sentence corresponds to a short span of the video sequence

Figure: the audio frames and video frames of a training sentence, aligned with its phoneme sequence
  Frames for Phoneme1 → Phoneme1
  Frames for Phoneme2 → Phoneme2
  …
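Annotating video frames amounts to mapping each phoneme's time span onto the video frames it covers. The sketch below assumes, hypothetically, that segmentation boundaries count 10 ms audio frames and that the video runs at 30 frames per second. Neither value is given on the slides. Integer arithmetic avoids rounding drift. A video frame that straddles a phoneme boundary is assigned to both neighbouring phonemes.

```python
from typing import List, Tuple


def annotate_video_frames(
    segments: List[Tuple[str, int, int]],
    audio_frame_ms: int = 10,   # hypothetical: duration of one segmentation unit
    video_fps: int = 30,        # hypothetical: video frame rate
) -> List[Tuple[str, int, int]]:
    """Map (phoneme, start, end) audio segments to inclusive video frame ranges."""
    annotated = []
    for phoneme, start, end in segments:
        start_ms = start * audio_frame_ms
        stop_ms = (end + 1) * audio_frame_ms            # exclusive end of the segment
        first = start_ms * video_fps // 1000             # frame containing the start
        last = -(-stop_ms * video_fps // 1000) - 1       # last frame beginning before stop_ms
        annotated.append((phoneme, first, max(first, last)))
    return annotated


print(annotate_video_frames([("SILOPEN", 0, 23), ("s", 43, 61)]))
# [('SILOPEN', 0, 7), ('s', 12, 18)]
```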

Page 12

Phoneme Distance Analysis

Phoneme and triphone basics

Chinese phonemes vs. English phonemes

Distance metric definitions

Results

Page 13

Phoneme Basics

Phonemes are the basic elements of speech. Any utterance can be represented as a combination of phonemes.

CH, JH, S, EH, EY, OY, AE, SIL…

A triphone is a sequence of three consecutive phonemes. It captures not only pronunciation characteristics but also context information.

T-IY-P, IY-P-AA, P-AA-T…
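The triphone examples on this slide come from sliding a three-phoneme window over the sequence T IY P AA T. The sketch below shows that windowing. It does not pad the sentence ends with silence, although a real system might add SIL there.

```python
from typing import List, Sequence


def triphones(phonemes: Sequence[str]) -> List[str]:
    """Return every run of three consecutive phonemes, joined with '-'."""
    return ["-".join(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


print(triphones(["T", "IY", "P", "AA", "T"]))
# ['T-IY-P', 'IY-P-AA', 'P-AA-T']
```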

Page 14

Chinese Phonemes vs. English Phonemes

Chinese phonemes fall into two basic groups: initials and finals.

Initials: B, P, M, F, …
Finals: a3, o1, e2, eng3, iang4, ue5, …

Each Chinese final has 5 tones: 1, 2, 3, 4, 5. For example, the tones of a are a1, a2, a3, a4, a5.

A Chinese final is actually not a basic element of speech, because many finals combine several sounds.

For example: iang1, iao1, uang1, iong1…

As a result, the Chinese phoneme set is much larger than the English one.

Page 15

Phoneme Distance Analysis

Define the distance between any two phonemes.

Since we synthesize only video, not sound, tone is ignored.

Lip shape motion is the core element of the distance metric.
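Ignoring tone means that a1 through a5 collapse into one visual unit before distances are computed. The following minimal sketch assumes labels are written as in the segmentation example: a lowercase final followed by a single tone digit 1 to 5. The function name is hypothetical.

```python
import re

# A toned final such as 'ia1' or 'iang4': lowercase letters, then one tone digit 1-5.
_TONED_FINAL = re.compile(r"^([a-z]+)([1-5])$")


def strip_tone(label: str) -> str:
    """Return the label without its tone digit; other labels are returned unchanged."""
    match = _TONED_FINAL.match(label)
    return match.group(1) if match else label


print([strip_tone(p) for p in ["sh", "ang1", "y", "e4", "a1", "a3", "SILOPEN"]])
# ['sh', 'ang', 'y', 'e', 'a', 'a', 'SILOPEN']
```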

Page 16

Phoneme Distance Analysis

Figure: for each phoneme (Phoneme 1, Phoneme 2, …), all of its training video clips (Video 1, Video 2, …) are collected.

1. Time-align the clips to a uniform length.
2. Average the aligned clips to get an average video for the phoneme.

By comparing the two aligned average videos, we obtain the distance between two phonemes. Repeating this for every pair generates the distance matrix of the whole phoneme set.
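The procedure on this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's code:

- Each training clip is a (frames × features) array of lip-shape measurements, for instance tracked lip-point coordinates.
- Time alignment is plain linear interpolation.
- The distance between two average videos is the mean per-frame Euclidean distance.

The function names and the default length of 10 frames are hypothetical.

```python
import numpy as np


def resample(clip, length):
    """Linearly resample a (frames, features) lip-shape clip to `length` frames."""
    clip = np.asarray(clip, dtype=float)
    src = np.linspace(0.0, 1.0, num=clip.shape[0])
    dst = np.linspace(0.0, 1.0, num=length)
    return np.stack([np.interp(dst, src, clip[:, d]) for d in range(clip.shape[1])], axis=1)


def average_clip(clips, length):
    """Time-align every clip of one phoneme to `length` frames, then average them."""
    return np.mean([resample(c, length) for c in clips], axis=0)


def distance_matrix(clips_by_phoneme, length=10):
    """Pairwise phoneme distances: mean per-frame Euclidean distance of average clips."""
    names = sorted(clips_by_phoneme)
    averages = [average_clip(clips_by_phoneme[n], length) for n in names]
    d = np.zeros((len(names), len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d[i, j] = d[j, i] = np.linalg.norm(averages[i] - averages[j], axis=1).mean()
    return names, d
```

Averaging before comparing keeps the cost at one comparison per phoneme pair, whatever the number of clips per phoneme.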

Page 17

Image part: Pose Tracking

Assume a planar model for the face

Use a standard minimization method to find the transform matrix (affine transform) [Black, 95]

A mask restricts the computation to the parts of the face that are of interest

Figure: template picture and mask image
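The slide names the ingredients (planar face, affine transform, mask, minimization after Black 1995) but not the exact objective. The following minimal sketch shows the kind of masked cost such a tracker minimizes over the affine matrix A. It assumes grayscale images, nearest-neighbour sampling, and a plain mean of squared differences. Robust trackers in that line of work use a robust error norm instead. The minimizer itself (for example Gauss-Newton or a coarse-to-fine search) is not shown, and all names are hypothetical.

```python
import numpy as np


def warp_affine_nearest(image, A):
    """Sample `image` at A @ [x, y, 1] for every pixel (nearest neighbour).

    Returns the warped image and a mask of samples that fell inside the image.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    u, v = (A @ pts)[:2]
    ui, vi = np.rint(u).astype(int), np.rint(v).astype(int)
    valid = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    out = np.zeros(h * w)
    out[valid] = image[vi[valid], ui[valid]]
    return out.reshape(h, w), valid.reshape(h, w)


def masked_alignment_cost(frame, template, mask, A):
    """Mean squared difference between template and warped frame inside the mask."""
    warped, valid = warp_affine_nearest(frame, A)
    m = mask.astype(bool) & valid
    if not m.any():
        return np.inf
    return float(np.mean((warped[m] - template[m]) ** 2))
```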

Page 18

Pose Tracking

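Motion prediction using parameters with physical meaning: the affine matrix is factored into translation (t_x, t_y), scale and shear (s_x, s_y, k), and rotation θ:

$$
\begin{bmatrix} a_0 & a_1 & a_2 \\ a_3 & a_4 & a_5 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} s_x & k & 0 \\ k & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$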
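Predicting motion in terms of physical parameters means extrapolating t_x, t_y, s_x, s_y, k and θ individually, then rebuilding the affine matrix. The matrix is composed as translation, then scale/shear, then rotation, in that order. The slide does not say which predictor is used. The constant-velocity rule below is an assumption, and both function names are hypothetical.

```python
import math

import numpy as np


def affine_from_params(tx, ty, sx, sy, k, theta):
    """Compose A = T(tx, ty) @ S(sx, sy, k) @ R(theta) as 3x3 homogeneous matrices."""
    T = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
    S = np.array([[sx, k, 0.0], [k, sy, 0.0], [0.0, 0.0, 1.0]])
    c, s = math.cos(theta), math.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return T @ S @ R


def predict_params(prev, curr):
    """Constant-velocity guess for the next frame: curr + (curr - prev), per parameter."""
    return tuple(2.0 * c - p for p, c in zip(prev, curr))


# Usage: predict (tx, ty, sx, sy, k, theta) for the next frame, then build its matrix.
guess = predict_params((0, 0, 1, 1, 0, 0.0), (1, 2, 1, 1, 0, 0.1))
A_next = affine_from_params(*guess)
```

The physical parameters change smoothly and independently, so a simple extrapolation of each is easier to reason about than one of the raw entries a_0 to a_5.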

Page 19

Pose Tracking

Some tracking results:

Page 20

Lip Motion Tracking

Using Eigen-points (Covell, 91)

Feature points include the jaw, lips and teeth

The training database is labeled manually

Automatic tracking through all pose-tracked images
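Eigen-points learns, from a hand-labeled training set, how image appearance relates to feature-point positions. The sketch below is a simplified stand-in, not Covell's formulation. It projects flattened, pose-normalized lip images onto their principal components. It then fits a least-squares linear map from those coefficients to the jaw/lip/teeth point coordinates. Array layouts and function names are hypothetical.

```python
import numpy as np


def train_point_model(images, points, n_components):
    """Fit a simplified eigen-model mapping lip images to feature-point coordinates.

    images: (N, D) array, one flattened, pose-normalized lip image per row
    points: (N, 2P) array of hand-labeled (x, y) coordinates of P feature points
    """
    images = np.asarray(images, dtype=float)
    points = np.asarray(points, dtype=float)
    mean_img = images.mean(axis=0)
    mean_pts = points.mean(axis=0)
    centered = images - mean_img
    # Principal axes ("eigen-images") of the training images.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                      # (K, D)
    coeffs = centered @ basis.T                    # (N, K)
    # Least-squares linear map from eigen-coefficients to point offsets.
    weights, *_ = np.linalg.lstsq(coeffs, points - mean_pts, rcond=None)
    return mean_img, basis, weights, mean_pts


def locate_points(image, model):
    """Predict feature-point coordinates for one new flattened lip image."""
    mean_img, basis, weights, mean_pts = model
    coeffs = (np.asarray(image, dtype=float) - mean_img) @ basis.T
    return mean_pts + coeffs @ weights
```

Tracking then applies locate_points to every pose-tracked frame.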

Page 21

Lip Motion Tracking

Page 22

Lip Motion Tracking

Figure: training database (hand-labeled); auto-tracking results

Page 23

Synthesizing New Sentences

New text is converted to a wav file by the TTS system

The wav file is segmented into a phoneme sequence

Dynamic programming (DP) finds an optimal video sequence in the training database

The triphone videos are time-aligned and stitched together

The lip sequence is transformed and pasted onto the background faces
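Time-aligning a triphone video means stretching or compressing its frames to the duration that the target phonemes require. The following minimal sketch uses nearest-frame resampling. This is an assumption, since the slides do not give the method. Smoother results would interpolate or morph between frames. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def time_align(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Resample a clip to `target_len` frames, keeping its first and last frames."""
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    if target_len == 1:
        return [frames[0]]
    step = (n - 1) / (target_len - 1)
    # Round each fractional source position to the nearest frame index.
    return [frames[int(i * step + 0.5)] for i in range(target_len)]


print(time_align(list(range(9)), 3))   # [0, 4, 8]
```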

Page 24

Lip Sequence Synthesis

Figure: optimal phoneme sequences (Triphone 1, Triphone 2, …, Triphone C) found for the new phoneme sequences

Page 25

Dynamic Programming

Figure: DP search from Begin to End through candidate triphones (Triphone1 to Triphone5)

Page 26

Edge Cost Definition

The cost has two parts:
1. Phoneme distance: the distances of the 3 phonemes added together
2. Lip shape distance over the overlapping portion of the triphone videos

The two parts are added together with weights.
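The DP search and this two-part cost can be sketched as a Viterbi-style pass over candidate clips, one candidate list per target triphone. The sketch makes these assumptions:

- The phoneme part is charged when a clip is entered.
- The lip part compares hypothetical `tail`/`head` feature vectors that summarise each clip's overlapping portion.
- The weights w_phone and w_lip default to 1.

Together these give the weighted sum of the two parts described on this slide. Candidate, its fields, and best_clip_sequence are illustrative names, not the author's.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Candidate:
    clip_id: str                      # hypothetical id of a triphone clip in the training database
    phonemes: Tuple[str, str, str]    # phoneme labels of the clip
    head: Tuple[float, ...]           # hypothetical lip-shape features of the clip's opening overlap
    tail: Tuple[float, ...]           # hypothetical lip-shape features of the clip's closing overlap


def best_clip_sequence(
    targets: Sequence[Tuple[str, str, str]],
    candidates: Sequence[List[Candidate]],
    phoneme_dist: Callable[[str, str], float],
    w_phone: float = 1.0,
    w_lip: float = 1.0,
) -> Tuple[List[str], float]:
    """Choose one candidate clip per target triphone, minimising the total weighted cost."""

    def phoneme_cost(t: int, c: Candidate) -> float:
        # Part 1: the three phoneme distances added together.
        return sum(phoneme_dist(a, b) for a, b in zip(targets[t], c.phonemes))

    def lip_cost(prev: Candidate, cur: Candidate) -> float:
        # Part 2: lip-shape distance over the overlap between consecutive clips.
        return math.dist(prev.tail, cur.head)

    cost = [w_phone * phoneme_cost(0, c) for c in candidates[0]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, len(targets)):
        new_cost, new_back = [], []
        for c in candidates[t]:
            steps = [cost[j] + w_lip * lip_cost(p, c) for j, p in enumerate(candidates[t - 1])]
            j_best = min(range(len(steps)), key=steps.__getitem__)
            new_cost.append(steps[j_best] + w_phone * phoneme_cost(t, c))
            new_back.append(j_best)
        cost = new_cost
        back.append(new_back)

    # Trace the cheapest path back from the last target triphone.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = []
    for t in range(len(targets) - 1, -1, -1):
        path.append(candidates[t][j].clip_id)
        j = back[t][j]
    return path[::-1], total
```

The phoneme_dist argument is where a phoneme distance matrix, such as the one from the Phoneme Distance Analysis slides, would plug in.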

Page 27

Background Video Generation

The background is a video sequence in which the virtual character is speaking about something else

Similarity measurement between background frames:

$$ D(F_1, F_2) = w_1 t_x + w_2 t_y + w_3 \theta + w_4 k + w_5 s_x + w_6 s_y $$

Select a “standard frame”: the frame that has the largest number of frames similar to it

Filter out the frames that cause jerkiness
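The two selection steps on this slide can be sketched as follows. The slide weights the six pose parameters but does not say how two frames are compared. Here the weights are applied to the absolute differences between the frames' parameters, which is an assumption. The similarity threshold tau, the jerkiness threshold max_step, and all function names are likewise hypothetical.

```python
from typing import List, Sequence

Pose = Sequence[float]  # (t_x, t_y, theta, k, s_x, s_y) from pose tracking


def frame_distance(p1: Pose, p2: Pose, weights: Sequence[float]) -> float:
    """Weighted sum of absolute pose-parameter differences (assumed reading of D)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, p1, p2))


def standard_frame(poses: List[Pose], weights: Sequence[float], tau: float) -> int:
    """Index of the frame with the most other frames closer than tau (first one on ties)."""
    counts = [
        sum(1 for j, q in enumerate(poses) if j != i and frame_distance(p, q, weights) < tau)
        for i, p in enumerate(poses)
    ]
    return max(range(len(poses)), key=counts.__getitem__)


def drop_jerky_frames(poses: List[Pose], weights: Sequence[float], max_step: float) -> List[int]:
    """Keep frames whose distance to the previously kept frame is at most max_step."""
    kept = [0]
    for i in range(1, len(poses)):
        if frame_distance(poses[kept[-1]], poses[i], weights) <= max_step:
            kept.append(i)
    return kept
```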

Page 28

Page 29

Stitch the time-aligned result onto the background faces

Write back with a mask

Transform the synthesized lips to the background face
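Writing back with a mask can be read as a per-pixel blend of the synthesized lip image into the background frame. The following minimal sketch assumes that the lip image has already been transformed into the background frame's coordinates. It also assumes the mask holds weights in [0, 1], where 1 means "take the lip pixel". A soft mask edge hides the seam. The function name is hypothetical.

```python
import numpy as np


def write_back(background, lip, mask):
    """Blend a lip image into a background frame: mask * lip + (1 - mask) * background."""
    background = np.asarray(background, dtype=float)
    lip = np.asarray(lip, dtype=float)
    alpha = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)
    if background.ndim == 3 and alpha.ndim == 2:
        alpha = alpha[..., None]   # broadcast a 2-D mask over colour channels
    return alpha * lip + (1.0 - alpha) * background
```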

Page 30

Figure: mask image for the write-back operation; original background frame; write-back result of the same frame

Page 31

More video results

Page 32

More video results

Page 33

Conclusion and Future Work

Pose tracking and lip motion tracking

Size of the training database

Talking face with expressions

Real-time generation?

Fast modeling for different people

Page 34

Animation

Page 35

Thank you