Research Article

A New Method of 3D Facial Expression Animation

Shuo Sun1 and Chunbao Ge2
1 Department of Mathematics, Tianjin Polytechnic University, Tianjin 300387, China
2 Shengshi Interactive Game, Beijing 10010, China

Correspondence should be addressed to Shuo Sun; sunshuo@tjpu.edu.cn

Received 4 March 2014; Accepted 6 April 2014; Published 20 May 2014
Academic Editor Li Wei
Copyright © 2014 S. Sun and C. Ge. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper we introduce a novel expression-ratio-image (ERI) driven framework, based on SVR and MPEG-4, for automatic 3D facial expression animation. Using support vector regression (SVR), the framework learns and forecasts the regression relationship between the facial animation parameters (FAPs) and the parameters of the expression ratio image. First, we build a 3D face animation system driven by FAPs. Second, using principal component analysis (PCA), we generate the parameter sets of the eigen-ERI space, from which a reasonable expression ratio image can be rebuilt. We then learn a support vector regression mapping, so that facial animation parameters can be synthesized quickly from the eigen-ERI parameters. Finally, we implement our 3D face animation system driven by the resulting FAPs, and it works effectively.
1 Introduction
Facial animation is one alternative for enabling natural human-computer interaction. Computer facial animation has applications in many fields. For example, in the entertainment industry, realistic virtual humans with facial expressions are increasingly used. In communication applications, interactive talking faces not only make the interaction between users and machines more fun but also provide a friendly interface and help to attract users [1, 2]. Among the issues concerning the realism of synthesized facial animation, humanlike expression is critical. But how to analyze and comprehend humanlike expression is still a very challenging topic for the computer graphics community. Facial expression analysis and synthesis are active and challenging research topics in computer vision, with important applications in areas such as human-computer interaction and data-driven animation. We introduce a novel MPEG-4 based 3D facial animation framework and an animation system driven by FAPs produced from camera videos. The MPEG-4 based framework has the advantages of generality and low data volume [3-5]. First, the system takes 2D video input and recognizes the face area. Then the face image is transformed into an ERI [6]. Next, a simple ERI parameterization method is adopted to generate an FAP driving model by support vector regression; it is a statistical model based on MPEG-4. The resulting FAPs can be used to drive the 3D face model defined by the MPEG-4 standard.
The remainder of the paper is organized as follows. We present the related work in Section 2. We then describe how to preprocess video data in Section 3. Section 4 presents how to construct the eigen-ERI. Section 5 describes the extraction of the FAPs. In Section 6 we propose a novel SVR-based FAP driving model. Finally, we show the experiments and conclude the paper in Sections 7 and 8.
2 Related Works
Recently, realistic facial animation has become one of the most important research topics in computer graphics. Many researchers focus on the nature of the facial expression. In [7, 8], a sample-based method is used to produce photorealistic expression in detail. In particular, Chang et al. [9] implement a novel framework for automatic 3D facial expression analysis in video. Liu et al. proposed an ERI-based method to extract the texture depth information of a photo in [6], and Figure 2 shows the ERI of a frown expression. To reflect the changes of a face surface, Tu et al. [10] use the gradients of the ratio value at each pixel in ratio images and apply them to a 3D model. In addition, a parameterized ERI method was proposed by Tu et al. [10] and Jiang et al. [11].

Hindawi Publishing Corporation, Journal of Applied Mathematics, Volume 2014, Article ID 706159, 6 pages, http://dx.doi.org/10.1155/2014/706159

(a) AAM (b) Mesh triangle

Figure 1: Mesh model of neutral expression.

Figure 2: ERI of frown expression.
Zhu et al. [12] used an SVR-based facial texture driving method for realistic expression synthesis. In Zhu's work, a regression model was learned between the ERI parameters and the FAPs. Given input FAPs, the model forecasts the parameters of the ERIs, and a reasonable facial expression image is then generated from the ERIs. In contrast, our method forecasts FAPs from the parameters of the eigen-ERIs; furthermore, we realize a 3D face animation system driven by FAPs.
In this paper we realize a 3D face animation system that can generate realistic facial animation with realistic expression details and can be applied to different human-like 3D models. The main problems our facial animation system addresses are the extraction of the ERI from the camera video, the learning of the SVR-based model, and the building of an FAP-driven 3D facial expression animation system. The next sections introduce these.
3 Video Data Preprocessing
3.1. Video Data Capture. For the robustness of the algorithm, we captured video data in a normal lighting environment; the video equipment is a commonly used PC digital camera that supports continuous capture. We set the sampling rate to 24 frames per second and the sampling resolution to 320 × 240; a total of 1000 sample frames were captured, of which 200 were defined as expression key frames.
Figure 3: Six basic facial expressions (anger, joy, disgust, sadness, fear, surprise).
In this paper we adopted the method of marked points on the face to extract facial motion data; the main difference from other methods is that ours is based on the MPEG-4 standard (Figure 1(a)). The advantage of the standard is that the data can be shared and used with any standards-based mesh. With marked points we can obtain more accurate facial motion data, since the small blue circular markers can be pasted anywhere, including the eyebrows and eyelids, without affecting their movement; this makes it possible to capture the whole facial motion data and satisfy the requirements of the subsequent processing.
To obtain the experimental data, we first make a coarse positioning through the blue calibration points using face detection tools and apply calibration and reduction transformation operations to the sample data. Then we design a face mesh model (Figure 1) to describe the geometry of the facial animation. Mesh vertices were mapped automatically using the active appearance model (AAM), with manual fine-tuning, to meet the subsequent demands of data extraction.
After obtaining the texture and feature-point data, all textures are aligned to the average model.
3.2. Feature Points. According to MPEG-4, we defined six basic expression types (see Figure 3) and captured them from the video camera. Twenty-four key frames for training and one nonkey frame for testing per expression type were extracted at a resolution of 320 × 240 pixels. In Figure 1(a), the 68 feature points are marked automatically with AAM [13]. In our experiment we adopted the 32 feature points belonging to the FPs defined in MPEG-4.
4 Computation of Eigen-ERI
4.1. The Computation of the ERI. To build a general ERI model, we used a large set of frontal, neutral-expression faces as the sample library, and we extract the outline of the face features by the active appearance model (AAM). Then the following formula is defined to compute the average face, which serves as the standard shape model:
F_m = (1/N) ∑_{i=1}^{N} F_i.    (1)

Here F_m denotes the average face, F_i denotes any one face (i = 1, 2, …, N), and N is the number of frontal, neutral-expression faces.
To capture facial expression details, we compute the ERI from all key-frame sequences continuously. The first frame is denoted by F and regarded as a neutral face image; F_i (1 ≤ i < n) denotes the remaining expressive face image samples, where n is the total number of key frames. Each expressive face sample F_i is aligned to F. Then the ERI of the sample sequence can be computed as follows:
R_i(u, v) = F_i(u, v) / F(u, v).    (2)
Here (u, v) denotes the coordinates of a pixel in the image, and F_i and F denote the color values of the pixels, respectively. Because the ERI is computed point by point, according to [14] the results will be unpredictable if the face features cannot be aligned exactly. So we revise the definition of the ERI computation:
R(u, v) = F′(u, v) / Ave(F(u, v)).    (3)
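As a concrete sketch, the ratio computation of Eqs. (2)-(3) can be written in a few lines of NumPy. The use of a box filter for the local average Ave(·), the window size k, and the eps guard are our own assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def expression_ratio_image(expressive, neutral, k=3, eps=1e-6):
    """Sketch of the revised ERI of Eq. (3): ratio of the expressive image
    to a local average of the neutral image, which damps the misalignment
    noise of the pure pixelwise ratio of Eq. (2).
    `expressive` and `neutral` are equal-shape float grayscale arrays."""
    pad = k // 2
    padded = np.pad(neutral, pad, mode="edge")
    ave = np.zeros_like(neutral, dtype=float)
    # local mean of the neutral face over a k x k window (box filter)
    for dy in range(k):
        for dx in range(k):
            ave += padded[dy:dy + neutral.shape[0], dx:dx + neutral.shape[1]]
    ave /= k * k
    return expressive / (ave + eps)  # eps guards against divide-by-zero
```

Smoothing the denominator is what keeps the ratio stable when the expressive and neutral images are slightly misaligned.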
4.2. Eigen-ERI. The ERI matrix is a 2D gray image I(x, y). We represent it as a W × H dimensional vector Γ. The training set is {Γ_i | i = 1, …, M}, where M is the number of images. The average vector of all the images is
ψ = (1/M) ∑_{i=1}^{M} Γ_i.    (4)
The difference between each ERI Γ_i and the average ψ is Φ_i = Γ_i − ψ. The eigen-ERIs (feature faces) are composed of the orthogonal eigenvectors of the covariance matrix of these difference vectors.
The ERI representation keeps only the J eigenvectors corresponding to the largest eigenvalues, where J is decided by the threshold θ_λ:

J = min_r { r | (∑_{i=1}^{r} λ_i) / (∑_{j=1}^{M} λ_j) > θ_λ }.    (7)
In this paper we select the 21 largest components of the eigen-ERI space, which represent 96% of the variation information in the sample sets.
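The eigen-ERI construction above is standard PCA; a minimal sketch of our own (using the SVD of the centered data matrix in place of an explicit eigendecomposition of the covariance matrix) might look like:

```python
import numpy as np

def eigen_eri(gammas, theta=0.96):
    """gammas: (M, W*H) array of vectorized ERIs.
    Returns the average ERI (Eq. (4)), the retained eigen-ERI basis
    (smallest number of components whose eigenvalue mass exceeds the
    threshold of Eq. (7)), and the sample coordinates in that basis."""
    psi = gammas.mean(axis=0)                   # average ERI, Eq. (4)
    phi = gammas - psi                          # difference vectors
    # singular values of phi give the covariance eigenvalues as s**2
    _, s, vt = np.linalg.svd(phi, full_matrices=False)
    lam = s ** 2
    ratio = np.cumsum(lam) / lam.sum()
    j = int(np.searchsorted(ratio, theta) + 1)  # smallest r covering theta
    basis = vt[:j]                              # eigen-ERIs (rows)
    coeffs = phi @ basis.T                      # eigen-ERI parameters
    return psi, basis, coeffs
```

The returned coefficient vectors are exactly the "parameters of the eigen-ERI" that later feed the SVR model.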
5 Extraction of FAP
5.1. Definition of the FAP (Facial Animation Parameter). The FAPs are a set of facial animation parameters defined in the MPEG-4 standard. FAPs are based on small facial actions and are very close to facial muscle movements. In fact, the FAPs represent a set of basic facial movements, including head movement and the control of the tongue, eyes, and lips, so facial expressions and lip motion can reproduce the most natural movements. In addition, exaggerated cartoon facial expressions, unlike those of real humans, can also be traced with FAPs.
There are six basic expression types defined in MPEG-4 (see Figure 3). MPEG-4 defines a total of 68 FAPs. The value of each FAP is expressed in FAPU (facial animation parameter units), which gives the FAPs their versatility across face models. The calculation formula is

FAP_i = (FP′ − FP) / FAPU.    (8)
Here i = 3, …, 68, and FP and FP′ are the neutral and displaced positions of the facial feature points of the corresponding parts.
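Eq. (8) amounts to a one-line normalization; in the toy example below, the feature-point positions and the FAPU value are invented for illustration:

```python
def fap_value(fp_displaced, fp_neutral, fapu):
    """Eq. (8): a FAP is the displacement of a feature point from its
    neutral position, expressed in FAPU units so the same FAP stream can
    drive differently proportioned face models."""
    return (fp_displaced - fp_neutral) / fapu

# e.g. a mouth corner lifted 6 px on a model whose relevant FAPU is 2 px
lift = fap_value(106.0, 100.0, 2.0)  # -> 3.0 FAPU
```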
5.2. The Implementation of Face Animation Based on FAPs. The facial animation definition table defines three parts. First, the range of each FAP is divided into several segments. Second, it records which mesh vertices are controlled by each FAP. Third, it gives the motion factor of the controlled vertices in each segment. For each FAP, we look up these three parts in the facial animation definition table; then, according to the MPEG-4 algorithm, the displacements of all vertices controlled by the FAP are calculated. For a set of FAPs, the effective vertex displacements computed for each FAP are summed, yielding a vivid facial expression (Figure 3). The concrete implementation may refer to [15].
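The table lookup described above can be sketched as follows; the table layout (per-FAP dictionaries of per-vertex motion factors) is a simplification we assume for illustration, not the exact MPEG-4 FaceDefTable format:

```python
def apply_faps(vertices, fap_table, faps):
    """Displace mesh vertices by a set of FAPs.
    vertices:  list of [x, y, z] positions (neutral mesh)
    fap_table: {fap_id: {vertex_id: (fx, fy, fz) motion factors}}
    faps:      {fap_id: amplitude in FAPU}
    Each controlled vertex moves by amplitude * factor, and contributions
    of all active FAPs are summed, as in the text above."""
    out = [list(v) for v in vertices]
    for fap_id, amplitude in faps.items():
        for vid, (fx, fy, fz) in fap_table.get(fap_id, {}).items():
            out[vid][0] += amplitude * fx
            out[vid][1] += amplitude * fy
            out[vid][2] += amplitude * fz
    return out
```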
6 SVR-Based FAP Driving Model
6.1. Support Vector Regression (SVR). In a typical regression problem we are given a training set of independent and identically distributed (i.i.d.) examples in the form of n ordered pairs {(x_i, y_i)}_{i=1}^{n} ⊂ R^d × R, where x_i and y_i denote the input and output, respectively, of the i-th training example. Linear regression is the simplest method to solve the regression problem, where the regression function is a linear function of the input. As a nonlinear extension, support vector regression is a kernel method that extends linear regression to nonlinear regression by exploiting the kernel trick [16, 17]. Essentially, each input x_i ∈ R^d is mapped implicitly via a nonlinear feature map φ(·) to some kernel-induced feature space F where linear regression is performed. Specifically, SVR learns the following regression function by estimating w ∈ F and w_0:

f(x) = ⟨w, φ(x)⟩ + w_0,    (9)

where ⟨·, ·⟩ denotes the inner product in F. The problem is solved by minimizing some empirical risk measure that is regularized appropriately to control the model capacity.
One commonly used SVR model is the ε-SVR model, in which the ε-insensitive loss function

|y − f(x)|_ε = max{0, |y − f(x)| − ε}    (10)
is used to define an empirical risk functional which exhibits the same sparseness property as that of support vector classifiers (SVC) using the hinge loss, via the so-called support vectors. If a data point x lies inside the insensitive zone, called the ε-tube, that is, |y − f(x)| ≤ ε, then it will not incur any loss. However, the error parameter ε > 0 has to be specified a priori by the user. The primal optimization problem for ε-SVR can be stated as follows:
min_{w, w_0, ξ, ξ*}  (λ/2) ‖w‖² + ∑_{i=1}^{n} (ξ_i + ξ_i*)

subject to  y_i − (⟨w, φ(x_i)⟩ + w_0) ≤ ε + ξ_i,
            (⟨w, φ(x_i)⟩ + w_0) − y_i ≤ ε + ξ_i*,
            ξ_i ≥ 0,  ξ_i* ≥ 0.    (11)
Correspondingly, the regression function takes the dual form

f(x) = ∑_{i=1}^{l} (α_i* − α_i) K(x_i, x) + b.    (12)
Here α_i and α_i* denote the Lagrange multipliers, K(x_i, x) is the kernel function, and f(x) is the decision function. Prediction requires only kernel (dot-product) evaluations and therefore costs very little, which suits real-time use. Figure 4 shows the steps of the SVR model mapping ERI parameters to FAP parameters.
6.2. SVR-Based Algorithm. According to the above theory, an FAP driving model is built for each parameter. Given a set of training data {(x_1, y_1), (x_2, y_2), …, (x_l, y_l)} ⊂ X × Y, where X denotes the space of the ERI and x_1 ∈ R is an ERI parameter, and Y denotes the space of the FAP features and y_1 ∈ R is an FAP, we regard this as a regression problem and solve it by the support vector regression method, including ε-SVR [16] and ν-SVR [17]. The offline learning process is shown in Figure 5.

(a) 3D mesh (b) Synthesis face

Figure 6: 3D facial expression animation system.

(a) Anger (b) Joy (c) Disgust (d) Sadness (e) Surprise

Figure 7: Five basic 3D expressions' animation.
In the above section we obtained the learning parameters of the FAPs and the ERIs; through the SVR model we then obtain a regression model from the ERI parameters to the FAP vectors. In a statistical sense, the resulting FAPs can reflect the expression associated with the ERI.
The following steps explain how the facial expression animation is driven by the video camera.

Step 1. Establish the MPEG-4 based 3D facial expression animation system driven by FAPs (see Figure 6).

Step 2. Capture and detect the face image from the video camera and compute its ERI.

Step 3. Forecast the FAPs of the current frame from its ERI.

Step 4. Compute the motion of the feature points in the face mesh based on the new FAPs and animate the 3D human face mesh.
7 Experiment Results
We have implemented all the techniques described above and built an automatic 3D facial expression animation system in a Windows environment. The results are shown in Figure 7. We present the basic expressions driven by FAPs that are forecast from the ERI.
8 Conclusion
In this paper we realize a 3D face animation system that can generate realistic facial animation with realistic expression details and can be applied to different human-like 3D models. It is capable of generating statistically realistic facial expression animation while requiring only a simple camera device as the input, and it works well on any desired 3D face model based on the MPEG-4 standard.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This research was supported partly by NSFC Grant no. 11071279.
References
[1] E. Cosatto, "Sample-based talking-head synthesis," Ph.D. thesis, Swiss Federal Institute of Technology, 2002.
[2] I. S. Pandzic, "Facial animation framework for the web and mobile platforms," in Proceedings of the 7th International Conference on 3D Web Technology (Web3D '02), pp. 27-34, February 2002.
[3] S. Kshirsagar, S. Garchery, and N. Magnenat-Thalmann, "Feature point based mesh deformation applied to MPEG-4 facial animation," in Proceedings of the IFIP TC5/WG5.10 DEFORM'2000 Workshop and AVATARS'2000 Workshop on Deformable Avatars (DEFORM '00/AVATARS '00), pp. 24-34, Kluwer Academic Press, 2001.
[4] F. Parke and K. Waters, Computer Facial Animation, A K Peters, Wellesley, Mass, USA, 1996.
[6] Z. Liu, Y. Shan, and Z. Zhang, "Expressive expression mapping with ratio images," in Proceedings of the Computer Graphics Annual Conference (SIGGRAPH '01), pp. 271-276, August 2001.
[7] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin, "Synthesizing realistic facial expressions from photographs," in Proceedings of the Annual Conference on Computer Graphics (SIGGRAPH '98), pp. 75-84, July 1998.
[8] T. Ezzat and T. Poggio, "Facial analysis and synthesis using image-based models," in Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, pp. 116-120, October 1996.
[9] Y. Chang, M. Vieira, M. Turk, and L. Velho, "Automatic 3D facial expression analysis in videos," in Proceedings of the 2nd International Workshop on Analysis and Modelling of Faces and Gestures (AMFG '05), 2005.
[10] P.-H. Tu, I.-C. Lin, J.-S. Yeh, R.-H. Liang, and M. Ouhyoung, "Expression detail for realistic facial animation," in Proceedings of Computer-Aided Design and Graphics (CAD '03), pp. 20-25, Macau, China, October 2003.
[11] D.-L. Jiang, W. Gao, Z.-Q. Wang, and Y.-Q. Chen, "Realistic 3D facial animations with partial expression ratio image," Chinese Journal of Computers, vol. 27, no. 6, pp. 750-757, 2004.
[12] W. Zhu, Y. Chen, Y. Sun, B. Yin, and D. Jiang, "SVR-based facial texture driving for realistic expression synthesis," in Proceedings of the 3rd International Conference on Image and Graphics (ICIG '04), pp. 456-459, December 2004.
[13] Y. Du and X. Lin, "Emotional facial expression model building," Pattern Recognition Letters, vol. 24, no. 16, pp. 2923-2934, 2003.
[14] Wu Yuan, "An algorithm for parameterized expression mapping," Application of Computer Research Journal, in press.
[15] D. Jiang, Z. Li, Z. Wang, and W. Gao, "Animating 3D facial models with MPEG-4 FaceDefTables," in Proceedings of the 35th Annual Simulation Symposium, pp. 395-400, 2002.
[16] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications and Control, John Wiley & Sons, New York, NY, USA, 1998.
[17] B. Schölkopf and A. Smola, Learning with Kernels, MIT Press, Cambridge, Mass, USA, 2002.
[13] Y Du and X Lin ldquoEmotional facial expressionmodel buildingrdquoPattern Recognition Letters vol 24 no 16 pp 2923ndash2934 2003
[14] W U Yuan ldquoAn algorithm for parameterized expressionmap-pingrdquo Application of Computer Research Journal In press
[15] D Jiang Z Li Z Wang and W Gao ldquoAnimating 3D facialmodels with MPEG-4 facedeftablesrdquo in Proceedings of the 35thAnnual Simulation Symposium pp 395ndash400 2002
[16] VNVapnik Statistical LearningTheory Adaptive and LearningSystems for Signal Processing Communications and ControlJohn Wiley amp Sons New York NY USA 1998
[17] B Scholkopf and A Smola Learning with Kernels MIT PressCambridge Mass USA 2002
68 feature points are marked automatically with AAM [13]. In our experiment, we adopted the 32 feature points belonging to the FPs in MPEG-4.
4. Computation of Eigen-ERI
4.1. The Computation of the ERI. To build a general ERI model, we use a large set of frontal, neutral-expression faces as the sample library, and we extract the outlines of the face features by the method of the active appearance model (AAM). Then the following formula defines the average face's ERI as the standard shape model:
F_m = (1/N) Σ_{i=1}^{N} F_i. (1)

Here F_m is the average face, F_i (i = 1, 2, ..., N) is any one face, and N is the number of frontal, neutral-expression faces.
To capture facial expression details, we compute the ERI continuously over all key-frame sequences. The first frame is denoted by F and regarded as the neutral face image; F_i (1 ≤ i < n) denotes the remaining expressive face images, where n is the total number of key frames. Each expressive face sample F_i is aligned to F. Then the ERI of the sample sequence can be computed as follows:
R_i(u, v) = F_i(u, v) / F(u, v). (2)
Here (u, v) denotes the coordinates of a pixel in the image, and F_i and F denote the color values of the corresponding pixels. Because the ERI is computed point by point, according to [14] the results will be unpredictable if the face features cannot be aligned exactly. So we revise the definition of the ERI computation:
R(u, v) = F′(u, v) / Ave(F(u, v)). (3)
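As an illustration only, the revised ratio computation (3) can be sketched in Python with NumPy; the local-average window size and the small guard term `eps` are our assumptions, since the paper does not specify them:

```python
import numpy as np

def expression_ratio_image(neutral, expressive, eps=1e-6, win=3):
    """Revised ERI: R = expressive / local_average(neutral).

    `neutral` and `expressive` are aligned float grayscale images
    (2D NumPy arrays). Dividing by a local average of the neutral
    face, rather than a single pixel, makes the ratio more robust
    to small alignment errors, in the spirit of definition (3).
    """
    pad = win // 2
    padded = np.pad(neutral, pad, mode="edge")
    # box-filter local average of the neutral face
    avg = np.zeros_like(neutral, dtype=float)
    for dy in range(win):
        for dx in range(win):
            avg += padded[dy:dy + neutral.shape[0], dx:dx + neutral.shape[1]]
    avg /= win * win
    return expressive / (avg + eps)
```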
4.2. Eigen-ERI. The matrix of an ERI is a 2D gray image I(x, y). We represent it by a (W × H)-dimensional vector Γ. The training set is {Γ_i | i = 1, ..., M}, where M is the number of images. The average vector of all the images is

ψ = (1/M) Σ_{i=1}^{M} Γ_i. (4)
The difference of each ERI Γ_i from the average is Φ_i = Γ_i − ψ. The feature faces are composed of the orthogonal eigenvectors of the covariance matrix of these difference vectors.
The eigen-ERI computation keeps only the eigenvectors corresponding to the largest eigenvalues; their number is decided by a threshold θ_λ:

J = min_r { r | (Σ_{i=1}^{r} λ_i) / (Σ_{j=1}^{M} λ_j) > θ_λ }. (7)
In this paper, we select the largest 21 components of the eigen-ERI space, which represent 96% of the variation information in the sample sets.
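The selection rule (7) can be sketched as follows; this is a minimal NumPy illustration, not the paper's implementation, and the function and variable names are ours:

```python
import numpy as np

def eigen_eri(samples, theta=0.96):
    """PCA over vectorized ERIs: return the average vector psi and the
    smallest r whose leading eigenvalues explain more than `theta`
    of the total variance, as in eq. (7)."""
    gamma = np.asarray(samples, dtype=float)   # one row per ERI vector
    psi = gamma.mean(axis=0)                   # average ERI, eq. (4)
    phi = gamma - psi                          # difference vectors
    # eigenvalues of the covariance matrix via SVD (descending order)
    s = np.linalg.svd(phi, compute_uv=False)
    lam = s ** 2
    ratio = np.cumsum(lam) / lam.sum()
    r = int(np.searchsorted(ratio, theta) + 1)
    return psi, r
```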
5. Extraction of FAP
5.1. Definition of the FAP (Facial Animation Parameter). The FAPs are a set of facial animation parameters defined in the MPEG-4 standard. They are based on small facial actions and correspond closely to facial muscle movements. In fact, the FAP set represents the basic facial movements, including head movement and the control of the tongue, eyes, and lips, so that facial expressions and lip motion can reproduce the most natural movements. In addition, FAPs can also track exaggerated, cartoon-like expressions that humans do not make.
There are six basic expression types defined in MPEG-4 (see Figure 3), and MPEG-4 defines a total of 68 FAPs. The value of a FAP is expressed in FAPU (facial animation parameter units), which makes FAPs model-independent. The calculation formula is

FAP_i = (FP′ − FP) / FAPU. (8)
Here i = 3, ..., 68, FP is a facial feature point on the neutral face, and FP′ is the corresponding displaced feature point.
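Formula (8) amounts to normalizing a feature-point displacement by its FAPU, as in this hypothetical one-line helper:

```python
def fap_value(fp_neutral, fp_displaced, fapu):
    # FAP_i = (FP' - FP) / FAPU: the feature-point displacement
    # expressed in facial animation parameter units (eq. (8))
    return (fp_displaced - fp_neutral) / fapu
```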
5.2. The Implementation of Face Animation Based on FAP. The facial animation definition table defines three parts. First, the FAP range is divided into several segments. Second, we need to know which mesh vertices are controlled by each FAP. Third, we need the motion factor of the control points in each segment. For each FAP, we look up these three parts in the facial animation definition table; then, following the MPEG-4 algorithm, we calculate the displacement of all the mesh vertices controlled by the FAP. For a set of FAPs, summing the displacements that each FAP produces at the affected mesh vertices yields a vivid facial expression (Figure 3). The concrete implementation may refer to [15].
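The table-driven displacement described above might be sketched like this; the `def_table` layout (one motion factor per controlled vertex, a single segment) is our simplification of the real FaceDefTable, which is piecewise per segment:

```python
import numpy as np

def apply_faps(vertices, fap_values, fapu, def_table):
    """Displace mesh vertices from a set of FAPs.

    `def_table` maps a FAP index to a list of
    (vertex_index, motion_factor_xyz) pairs -- an assumed,
    single-segment stand-in for the MPEG-4 FaceDefTable.
    Displacements from all FAPs are summed at each vertex.
    """
    out = np.array(vertices, dtype=float)
    for fap_idx, value in fap_values.items():
        for v_idx, factor in def_table.get(fap_idx, []):
            # displacement = FAP value (in FAPU) * FAPU * motion factor
            out[v_idx] += value * fapu * np.asarray(factor, dtype=float)
    return out
```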
6. SVR-Based FAP Driving Model
6.1. Support Vector Regression (SVR). In a typical regression problem, we are given a training set of independent and identically distributed (i.i.d.) examples in the form of n ordered pairs {(x_i, y_i)}_{i=1}^{n} ⊂ R^d × R, where x_i and y_i denote the input and output, respectively, of the ith training example. Linear regression is the simplest method to solve the regression problem, where the regression function is a linear function of the input. As a nonlinear extension, support vector regression is a kernel method that extends linear regression to nonlinear regression by exploiting the kernel trick [16, 17]. Essentially, each input x_i ∈ R^d is mapped implicitly, via a nonlinear feature map φ(·), to some kernel-induced feature space F, where linear regression is performed. Specifically, SVR learns the regression function

f(x) = ⟨w, φ(x)⟩ + w_0, (9)

by estimating w ∈ F and w_0, where ⟨·, ·⟩ denotes the inner product in F. The problem is solved by minimizing some empirical risk measure that is regularized appropriately to control the model capacity.
One commonly used SVR model is the ε-SVR model, where the ε-insensitive loss function

|y − f(x)|_ε = max{0, |y − f(x)| − ε} (10)
is used to define an empirical risk functional, which exhibits the same sparseness property as that of support vector classifiers (SVC) using the hinge loss function, via the so-called support vectors. If a data point x lies inside the insensitive zone, called the ε-tube, that is, |y − f(x)| ≤ ε, then it incurs no loss. However, the error parameter ε > 0 has to be specified a priori by the user. The primal optimization problem for ε-SVR can be stated as follows:
min_{W, ξ, ξ*} (λ/2)‖W‖² + Σ_{i=1}^{n} (ξ_i + ξ_i*)

subject to
y_i − (⟨W, φ(x_i)⟩ + w_0) ≤ ε + ξ_i,
(⟨W, φ(x_i)⟩ + w_0) − y_i ≤ ε + ξ_i*,
ξ_i, ξ_i* ≥ 0, i = 1, ..., n. (11)
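The ε-insensitive loss (10), which drives the slack variables ξ_i and ξ_i* in (11), is simple to state in code (an illustrative helper; the name is ours):

```python
def eps_insensitive_loss(y, fx, eps):
    # |y - f(x)|_eps = max(0, |y - f(x)| - eps): points inside the
    # eps-tube contribute zero loss, as in eq. (10)
    return max(0.0, abs(y - fx) - eps)
```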
Correspondingly, the regression function transforms into

f(x) = Σ_{i=1}^{l} (α_i* − α_i) K(x_i, x) + b. (12)
Here α_i denotes a Lagrange multiplier, K(x_i, x) is the kernel function, and f(x) is the decision function. Prediction is only a dot-product operation, so its real-time cost is very low. Figure 4 shows the steps of the SVR model mapping ERI parameters to FAP parameters.
6.2. SVR-Based Algorithm. According to the above theory, an FAP driving model is built for each FAP. Given
Figure 6: 3D facial expression animation system. (a) 3D mesh; (b) synthesized face.
Figure 7: Five basic 3D expressions' animation: (a) anger, (b) joy, (c) disgust, (d) sadness, and (e) surprise.
a set of training data {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ⊂ X × Y, where X denotes the space of ERI parameters and each x_i ∈ R is an ERI parameter, and Y denotes the FAP feature space and each y_i ∈ R is a FAP, we regard the task as a regression problem and solve it with the support vector regression method, including ε-SVR [16] and ν-SVR [17]. The offline learning process is shown in Figure 5.
In the sections above, we obtained the learned parameters of the FAPs and the ERIs. Through the SVR model, we then obtain a regression model from the ERI parameters to the FAP vectors. In a statistical sense, the resulting FAPs reflect the expression associated with the ERI.
The following steps explain how the facial expression animation is driven by the video camera.
Step 1. Establish the MPEG-4-based 3D facial expression animation system driven by FAPs (see Figure 6).

Step 2. Capture and detect the face image from the video camera and compute its ERI.

Step 3. Forecast the FAPs of the current frame from its ERI.

Step 4. Compute the motion of the feature points in the face mesh based on the new FAPs, and animate the 3D human face mesh.
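Steps 2–4 can be wired together as a simple per-frame pipeline; each stage below is an injected placeholder for the corresponding component described above (all names hypothetical):

```python
def drive_animation(frame, detect_face, compute_eri, project_eigen,
                    predict_faps, deform_mesh):
    """One frame of the camera-driven loop, with each stage injected
    as a callable standing in for the paper's components."""
    face = detect_face(frame)      # Step 2: face detection
    eri = compute_eri(face)        # Step 2: expression ratio image
    coeffs = project_eigen(eri)    # eigen-ERI parameters
    faps = predict_faps(coeffs)    # Step 3: SVR forecast of FAPs
    return deform_mesh(faps)       # Step 4: animate the face mesh
```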
7. Experiment Results
We have implemented all the techniques described above and built an automatic 3D facial expression animation system in a Windows environment. The results are shown in Figure 7. We present the basic expressions driven by FAPs forecast from the ERIs.
8. Conclusion
In this paper, we realize a 3D face animation system that can generate realistic facial animation with realistic expression details and that can be applied to different humanlike 3D models. It generates statistically realistic facial expression animation while requiring only a simple camera as the input device, and it works well on any desired MPEG-4-based 3D face model.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This research was supported partly by NSFC Grant no. 11071279.
References
[1] E. Cosatto, Sample-based talking-head synthesis [Ph.D. thesis], Swiss Federal Institute of Technology, 2002.
[2] I. S. Pandzic, "Facial animation framework for the web and mobile platforms," in Proceedings of the 7th International Conference on 3D Web Technology (Web3D '02), pp. 27-34, February 2002.
[3] S. Kshirsagar, S. Garchery, and N. Magnenat-Thalmann, "Feature point based mesh deformation applied to MPEG-4 facial animation," in Proceedings of the IFIP TC5/WG5.10 DEFORM'2000 Workshop and AVATARS'2000 Workshop on Deformable Avatars (DEFORM '00/AVATARS '00), pp. 24-34, Kluwer Academic Press, 2001.
[4] F. Parke and K. Waters, Computer Facial Animation, A K Peters, Wellesley, Mass, USA, 1996.
[6] Z. Liu, Y. Shan, and Z. Zhang, "Expressive expression mapping with ratio images," in Proceedings of the Computer Graphics Annual Conference (SIGGRAPH '01), pp. 271-276, August 2001.
[7] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin, "Synthesizing realistic facial expressions from photographs," in Proceedings of the Annual Conference on Computer Graphics (SIGGRAPH '98), pp. 75-84, July 1998.
[8] T. Ezzat and T. Poggio, "Facial analysis and synthesis using image-based models," in Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, pp. 116-120, October 1996.
[9] Y. Chang, M. Vieira, M. Turk, and L. Velho, "Automatic 3D facial expression analysis in videos," in Proceedings of the 2nd International Conference on Analysis and Modelling of Faces and Gestures (AMFG '05), 2005.
[10] P.-H. Tu, I.-C. Lin, J.-S. Yeh, R.-H. Liang, and M. Ouhyung, "Expression detail for realistic facial animation," in Proceedings of Computer-Aided Design and Graphics (CAD '03), pp. 20-25, Macau, China, October 2003.
[11] D.-L. Jiang, W. Gao, Z.-Q. Wang, and Y.-Q. Chen, "Realistic 3D facial animations with partial expression ratio image," Chinese Journal of Computers, vol. 27, no. 6, pp. 750-757, 2004.
[12] W. Zhu, Y. Chen, Y. Sun, B. Yin, and D. Jiang, "SVR-based facial texture driving for realistic expression synthesis," in Proceedings of the 3rd International Conference on Image and Graphics (ICIG '04), pp. 456-459, December 2004.
[13] Y. Du and X. Lin, "Emotional facial expression model building," Pattern Recognition Letters, vol. 24, no. 16, pp. 2923-2934, 2003.
[14] W. U. Yuan, "An algorithm for parameterized expression mapping," Application of Computer Research Journal, in press.
[15] D. Jiang, Z. Li, Z. Wang, and W. Gao, "Animating 3D facial models with MPEG-4 FaceDefTables," in Proceedings of the 35th Annual Simulation Symposium, pp. 395-400, 2002.
[16] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications and Control, John Wiley & Sons, New York, NY, USA, 1998.
[17] B. Scholkopf and A. Smola, Learning with Kernels, MIT Press, Cambridge, Mass, USA, 2002.