Quo vadis Face Recognition?

Ralph Gross, Jianbo Shi
Robotics Institute
Carnegie Mellon University, Pittsburgh, PA 15213
{rgross, jshi}@cs.cmu.edu

Jeffrey F. Cohn
Department of Psychology
University of Pittsburgh, Pittsburgh, PA 15260
[email protected]
Abstract
Within the past decade, major advances have occurred in face recognition. With few exceptions, however, most research has been limited to training and testing on frontal views. Little is known about the extent to which face pose, illumination, expression, occlusion, and individual differences, such as those associated with gender, influence recognition accuracy. We systematically varied these factors to test the performance of two leading algorithms, one template based and the other feature based. Image data consisted of over 21,000 images from 3 publicly available databases: the CMU PIE, Cohn-Kanade, and AR databases. In general, both algorithms were robust to variation in illumination and expression. Recognition accuracy was highly sensitive to variation in pose. For frontal training images, performance was attenuated beginning at about 15 degrees. Beyond about 30 degrees, performance became unacceptable. For non-frontal training images, fall-off was more severe. Small but consistent differences were found for individual differences in subjects. These findings suggest directions for future research, including design of experiments and data collection.
1. Introduction
Is face recognition a solved problem? Over the last 30 years face recognition has become one of the best studied pattern recognition problems, with a nearly intractable number of publications. Many of the algorithms have demonstrated excellent recognition results, often with error rates of less than 10 percent. These successes have led to the development of a number of commercial face recognition systems. Most of the current face recognition algorithms can be categorized into two classes, image template based or geometry feature based. The template based methods [1] compute the correlation between a face and one or more model templates to estimate the face identity. Statistical tools such as Support Vector Machines (SVM) [30, 21], Linear Discriminant Analysis (LDA) [2], Principal Component Analysis (PCA) [27, 29, 11], kernel methods [25, 17], and neural networks [24, 7, 12, 16] have been used to construct a suitable set of face templates. While these templates can be viewed as features, they mostly capture global features of the face images. Facial occlusion is often difficult to handle in these approaches.
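To make the template based pipeline concrete, the sketch below shows a minimal eigenface-style matcher in Python (a rough illustration on our part, using numpy; the paper prescribes no implementation, and all function and variable names here are our own): gallery faces are projected onto a PCA subspace, and a probe is identified by its nearest neighbor in that subspace.

```python
import numpy as np

def fit_pca(gallery, n_components):
    """Learn an eigenface subspace from flattened gallery images.

    gallery: (n_images, n_pixels) array, one face per row.
    Returns the mean face and the top principal components.
    """
    mean = gallery.mean(axis=0)
    # Rows of vt are the eigenfaces (principal axes of the data).
    _, _, vt = np.linalg.svd(gallery - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(faces, mean, components):
    """Map faces into the low-dimensional template space."""
    return (faces - mean) @ components.T

def identify(probe, gallery, mean, components):
    """Return the gallery index whose template is closest to the probe."""
    g = project(gallery, mean, components)
    p = project(probe[None, :], mean, components)
    return int(np.argmin(np.linalg.norm(g - p, axis=1)))
```

Because every pixel contributes to every component, the learned templates are global in exactly the sense noted above, which is one reason occlusion is hard for this family of methods.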
The geometry feature-based methods analyze explicit local facial features and their geometric relationships. Cootes et al. have presented an active shape model in [15], extending the approach by Yuille [34]. Wiskott et al. developed an elastic bunch graph matching algorithm for face recognition in [33]. Penev et al. [22] developed PCA into Local Feature Analysis (LFA). This technique is the basis for one of the most successful commercial face recognition systems, FaceIt.
Most face recognition algorithms focus on frontal facial views. However, pose changes can often lead to large non-linear variation in facial appearance due to self-occlusion and self-shading. To address this issue, Moghaddam and Pentland [20] presented a Bayesian approach using PCA as a probability density estimation tool. Li et al. [17] have developed a view-based, piece-wise SVM model for face recognition. In the feature based approach, Cootes et al. [5] proposed a 3D active appearance model to explicitly compute the face pose variation. Vetter et al. [32, 31] learn a 3D geometry-appearance model for face registration and matching. However, today the exact trade-offs and limitations of these algorithms are relatively unknown.
To evaluate the performance of these algorithms, Phillips et al. have conducted the FERET face algorithm tests [23], based on the FERET database, which now contains 14,126 images from 1,199 individuals. More recently, the Facial Recognition Vendor Test [3] evaluated commercial systems using the FERET and HumanID databases. The test results have revealed that important progress has been made in face recognition, and many aspects of the face recognition problem are now well understood. However, there still remains a gap between these testing results and practical user experiences of commercial systems. While this gap can, and will, be narrowed through improvements of practical details such as sensor resolution and view selection, we would like to understand clearly the fundamental capabilities and limitations of current face recognition systems.
In this paper, we conduct a series of tests using two state-of-the-art face recognition systems on three newly constructed face databases to evaluate the effect of face pose, illumination, facial expression, occlusion, and subject gender on face recognition performance.
The paper is organized as follows. We describe the three databases used in our evaluation in Section 2. In Section 3 we introduce the two algorithms we used for our evaluations. The experimental procedures and results are presented in Section 4, and we conclude in Section 5.
2. Description of Databases
2.1. Overview
Table 1 gives an overview of the databases used in our evaluation.
              CMU PIE   Cohn-Kanade   AR DB
Subjects          68        105         116
Poses             13          1           1
Illuminations     43          3           3
Expressions        3          6           3
Occlusions         0          0           2
Sessions           1          1           2

Table 1. Overview of the databases.
2.2. CMU Pose, Illumination, and Expression (PIE) Database
The CMU PIE database contains a total of 41,368 images taken from 68 individuals [26]. The subjects were imaged in the CMU 3D Room [14] using a set of 13 synchronized high-quality color cameras and 21 flashes. The resulting images are 640x480 in size, with 24-bit color resolution. The cameras and flashes are distributed in a hemisphere in front of the subject, as shown in Figure 1.
A series of images of a subject across the different poses is shown in Figure 2. Each subject was recorded under 4 conditions:

1. expression: the subjects were asked to display a neutral face, to smile, and to close their eyes in order to simulate a blink. The images of all 13 cameras are available in the database.

2. illumination 1: 21 flashes are individually turned on in a rapid sequence. In the first setting the images were captured with the room lights on. Each camera recorded 24 images: 2 with no flashes, 21 with one flash firing, and then a final image with no flashes. Only the output of three cameras (frontal, three-quarter, and profile view) was kept.

3. illumination 2: the procedure for illumination 1 was repeated with the room lights off. The output of all 13 cameras was retained in the database. Combining the two illumination settings, a total of 43 different illumination conditions were recorded.

4. talking: subjects counted starting at 1. 2 seconds (60 frames) of them talking were recorded using 3 cameras as above (again frontal, three-quarter, and profile view).
Figure 3 shows examples for illumination conditions 1 and 2.
2.3. Cohn-Kanade AU-Coded Facial Expression Database
This is a publicly available database from Carnegie Mellon University [13]. It contains image sequences of facial expressions from men and women of varying ethnic backgrounds. The camera orientation is frontal. Small head motion is present. Image size is 640 by 480 pixels with 8-bit gray scale resolution. There are three variations in lighting: ambient lighting, single high-intensity lamp, and dual high-intensity lamps with reflective umbrellas. Facial expressions are coded using the Facial Action Coding System [8] and also assigned emotion-specified labels. For the current study, we selected 714 image sequences from 105 subjects. Emotion expressions included happy, surprise, anger, disgust, fear, and sadness. Examples for the different expressions are shown in Figure 4.
2.4. AR Face Database
The publicly available AR database was collected at the Computer Vision Center in Barcelona [19]. It contains images of 116 individuals (63 males and 53 females). The images are 768x576 pixels in size with 24-bit color resolution. The subjects were recorded twice at a 2-week interval.
Figure 1. PIE database camera positions. (a) 13 synchronized video cameras capture face images from multiple angles; 21 controlled flash units are evenly distributed around the cameras. (b) A plot of the azimuth (α) and altitude (β) angles of the cameras, along with the camera ID numbers. 9 of the 13 cameras sample a half circle at roughly head height, ranging from a full left to a full right profile view (+/- 60 degrees); 2 cameras were placed above and below the central camera; and 2 cameras were positioned in the corners of the room.
During each session, 13 conditions with varying facial expressions, illumination, and occlusion were captured. Figure 5 shows an example for each condition.
3. Face Recognition Algorithms
3.1. MIT, Bayesian Eigenface
Moghaddam et al. generalize the Principal Component Analysis (PCA) approach of Sirovich and Kirby [28] and Turk and Pentland [29] by examining the probability distribution of intra-personal variations in appearance of the same individual and extra-personal variations in appearance due to differences in identity. This algorithm performed consistently near the top in the 1996 FERET test [23]. Given two face images $I_1$ and $I_2$, let $\Delta = I_1 - I_2$ be the image intensity difference between them.
Figure 2. Pose variation in the PIE database. 8 of the 13 camera views are shown here. The remaining 5 camera poses are symmetrical to the right side of camera c27.
Figure 3. Illumination variation in the PIE database. The images in the first row show faces recorded with room lights on; the images in the second row show faces captured with only flash illumination.
We would like to estimate the posterior probability $P(\Omega_I \mid \Delta)$, where $\Omega_I$ is the intra-personal variation of subject $i$. According to Bayes' rule, we can rewrite it as:

$$P(\Omega_I \mid \Delta) = \frac{P(\Delta \mid \Omega_I)\,P(\Omega_I)}{P(\Delta \mid \Omega_I)\,P(\Omega_I) + P(\Delta \mid \Omega_E)\,P(\Omega_E)} \quad (1)$$

where $\Omega_E$ is the extra-personal variation of all the subjects. To estimate the probability density distributions $P(\Delta \mid \Omega_I)$ and $P(\Delta \mid \Omega_E)$, PCA is used to derive a low ($M$) dimensional approximation of the measured feature space $\Delta \in \mathbb{R}^N$ ($M \ll N$).
Figure 4. Cohn-Kanade AU-Coded Facial Expression database. Examples of emotion-specified expressions from image sequences.
This is a mostly template based classification algorithm, although some local features are implicitly encoded through the "Eigen" intra-personal and extra-personal images.
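The following sketch restates the method of this section in code, assuming Gaussian densities in the PCA subspaces; this is our simplification of the approach of [20], not the MIT implementation, and every name in it is illustrative.

```python
import numpy as np

class DeltaDensity:
    """Gaussian density for image differences, estimated in a
    low (M) dimensional PCA subspace as described above."""
    def __init__(self, deltas, m):
        # deltas: (n, n_pixels) intra- or extra-personal differences.
        self.mean = deltas.mean(axis=0)
        _, s, vt = np.linalg.svd(deltas - self.mean, full_matrices=False)
        self.basis = vt[:m]                    # top M eigenvectors
        self.var = (s[:m] ** 2) / len(deltas)  # variance along each axis

    def log_pdf(self, delta):
        y = self.basis @ (delta - self.mean)   # project into the subspace
        return -0.5 * np.sum(y**2 / self.var + np.log(2 * np.pi * self.var))

def posterior_intra(delta, density_i, density_e, prior_i=0.5):
    """Equation (1): P(Omega_I | Delta) by Bayes' rule, in log space."""
    a = density_i.log_pdf(delta) + np.log(prior_i)
    b = density_e.log_pdf(delta) + np.log(1 - prior_i)
    m = max(a, b)
    return np.exp(a - m) / (np.exp(a - m) + np.exp(b - m))
```

Two images are then judged to show the same person when the posterior exceeds 0.5, which reduces matching to evaluating the two learned densities on their difference image.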
3.2. Visionics, FaceIt
FaceIt's recognition module is based on Local Feature Analysis (LFA) [22]. This technique addresses two major problems of Principal Component Analysis. The application of PCA to a set of images yields a global representation of the image features that is not robust to variability due to localized changes in the input [10]. Furthermore, the PCA representation is non-topographic, so nearby values in the feature representation do not necessarily correspond to nearby values in the input. LFA overcomes these problems by using localized image features in the form of multi-scale filters. The feature images are then encoded using PCA to obtain a compact description. According to Visionics, FaceIt is robust against variations in lighting, skin tone, eye glasses, facial expression, and hair style. They furthermore claim to be able to handle pose variations of up to 35 degrees in all directions. We systematically evaluated these claims.
4. Evaluation
Following Phillips et al. [23], we distinguish between gallery and probe images. The gallery contains the images used during training of the algorithm. The algorithms are tested with the images in the probe sets. All results reported here are based on non-overlapping gallery and probe sets (with the exception of the PIE pose test). We use the closed universe model for evaluating the performance, meaning that every individual in the probe set is also present in the gallery. The algorithms were not given any further information, so we only evaluate face recognition, not face verification, performance.
Figure 5. AR database. The conditions are: (1) neutral, (2) smile, (3) anger, (4) scream, (5) left light on, (6) right light on, (7) both lights on, (8) sun glasses, (9) sun glasses/left light, (10) sun glasses/right light, (11) scarf, (12) scarf/left light, (13) scarf/right light.
4.1. Face localization and registration
Face recognition is a two step process consisting of face detection and recognition. First, the face has to be located in the image and registered against an internal model. The result of this stage is a normalized representation of the face, to which the recognition algorithm can be applied. In order to ensure the validity of our findings in terms of face recognition accuracy, we provided both algorithms with correct locations of the left and right eyes. This was done by applying FaceIt's face finding module with a subsequent manual verification of the results. If the initial face position was incorrect, the location of the left and right eye was marked manually and the face finding module was rerun on the image. The face detection module became more likely to fail as departure from the frontal view increased.
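Registration from the two eye locations amounts to a similarity transform that maps the detected eyes onto canonical positions. The sketch below is our own illustration (FaceIt's internal normalization is not published); the target eye coordinates and output size are arbitrary assumptions.

```python
import numpy as np
import cv2

def register_face(image, left_eye, right_eye, size=(64, 64)):
    """Warp a face so the eyes land on fixed canonical positions.

    left_eye, right_eye: (x, y) pixel coordinates in the input image.
    The canonical positions below are illustrative choices.
    """
    w, h = size
    dst_l = np.array([0.3 * w, 0.35 * h])
    dst_r = np.array([0.7 * w, 0.35 * h])
    src_l = np.asarray(left_eye, dtype=float)
    src_r = np.asarray(right_eye, dtype=float)
    # Similarity transform: rotate and scale the eye-to-eye vector
    # onto the canonical one, then translate the left eye into place.
    v_src, v_dst = src_r - src_l, dst_r - dst_l
    scale = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    ang = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    c, s = scale * np.cos(ang), scale * np.sin(ang)
    rot = np.array([[c, -s], [s, c]])
    t = dst_l - rot @ src_l
    warp = np.hstack([rot, t[:, None]]).astype(np.float32)
    return cv2.warpAffine(image, warp, size)
```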
4.2. Pose
Using the CMU PIE database, we are in the unique position to evaluate the performance of face recognition algorithms with respect to pose variations in great detail. We exhaustively sampled the pose space by using each view in turn as gallery, with the remaining views as probes; a sketch of this protocol is given below. As there is only a single image per subject and camera view in the database, the gallery images are included in the probe set. Table 2 shows the complete pose confusion matrix for FaceIt. Of particular interest is the question of how far the algorithm can generalize from given gallery views.
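A minimal sketch of this protocol follows (our own notation, reusing the recognition_rate helper sketched at the start of Section 4; images[view][subject] is an assumed lookup table of the PIE images):

```python
def pose_confusion(views, images, match):
    """Build the pose confusion matrix of Table 2: each view serves
    in turn as the gallery, all views (including itself) as probes."""
    table = {}
    for g_view in views:
        gallery = images[g_view]  # subject id -> image under view g_view
        for p_view in views:
            probes = list(images[p_view].items())
            table[(g_view, p_view)] = recognition_rate(gallery, probes, match)
    return table
```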
Two things are worth noting. First, FaceIt has a reasonable generalizability for frontal gallery images: the recognition rate drops to the 70%-80% range for 45 degrees of head rotation (corresponding to camera positions 11 and 37 in Figure 1). Figure 6 shows the recognition accuracies of the different camera views for a mug shot gallery view.
Figure 6. Recognition accuracies of all cameras for the mug shot gallery image. The recognition rates are plotted on the pose positions shown in Figure 1(b). The darker color in the lower portion of the graph indicates a higher recognition rate. The square box marks the gallery view.
Second, for most non-frontal views (outside of the 40 degree range), face generalizability goes down drastically, even for very close-by views. This can be seen in Figure 7. Here the recognition rates are shown for the two profile views as gallery images. The full set of performance graphs for all 13 gallery views is shown in Appendix A.
We then asked the question of whether we can gain more by including multiple face poses in the gallery set. Intuitively, given multiple face poses, with correspondence between the facial features, one can have a better chance of predicting novel face poses.
Figure 7. Recognition accuracies of all cameras for the two profile poses as gallery images (cameras 34 and 22 in Figure 1(b)).
                                      Probe Pose
         c34   c31   c14   c11   c29   c09   c27   c07   c05   c37   c25   c02   c22
α        -66   -47   -46   -32   -17     0     0     0    16    31    44    44    62
β          3    13     2     2     2    15     2   1.9     2     2     2    13     3
Gallery Pose
c34     1.00  0.03  0.01  0.00  0.00  0.03  0.04  0.00  0.01  0.03  0.01  0.00  0.01
c31     0.01  1.00  0.12  0.16  0.15  0.09  0.04  0.06  0.04  0.03  0.06  0.00  0.01
c14     0.04  0.16  1.00  0.28  0.26  0.16  0.19  0.10  0.16  0.04  0.03  0.03  0.01
c11     0.00  0.15  0.29  1.00  0.78  0.63  0.73  0.50  0.57  0.40  0.09  0.01  0.03
c29     0.00  0.13  0.22  0.87  1.00  0.75  0.91  0.73  0.68  0.44  0.03  0.01  0.03
c09     0.03  0.01  0.09  0.68  0.79  1.00  0.95  0.62  0.87  0.57  0.09  0.01  0.01
c27     0.03  0.07  0.13  0.75  0.93  0.94  1.00  0.93  0.93  0.62  0.06  0.03  0.03
c07     0.01  0.07  0.12  0.38  0.70  0.57  0.87  1.00  0.73  0.35  0.03  0.03  0.00
c05     0.01  0.03  0.13  0.54  0.65  0.75  0.91  0.75  1.00  0.66  0.09  0.01  0.03
c37     0.00  0.03  0.04  0.37  0.35  0.43  0.53  0.23  0.60  1.00  0.10  0.04  0.00
c25     0.00  0.01  0.01  0.06  0.04  0.07  0.04  0.03  0.06  0.07  0.98  0.04  0.04
c02     0.00  0.01  0.03  0.03  0.01  0.01  0.01  0.04  0.01  0.01  0.04  1.00  0.03
c22     0.00  0.01  0.01  0.01  0.01  0.03  0.03  0.03  0.03  0.04  0.03  0.00  1.00

Table 2. Confusion table for pose variation. Each row of the confusion table shows the recognition rate on each of the probe poses given a particular gallery pose. The camera pose, indicated by its azimuth (α) and altitude (β) angles, is shown in Figure 1.
This test is carried out by taking two sets of face poses, {11, 27, 37} and {05, 27, 29}, as gallery and testing on all other poses. The results are presented in Table 3.
                                      Probe Pose
            02    05    07    09    11    14    22    25    27    29    31    34    37
Gallery Pose
11-27-37  0.01  0.99  0.91  0.93  1.0   0.35  0.01  0.1   1.0   0.91  0.19  0.0   1.0
05-27-29  0.01  1.0   0.90  0.91  0.88  0.24  0.01  0.1   1.0   1.0   0.12  0.01  0.66

Table 3. Pose variation. Recognition rates for FaceIt with multiple poses in the gallery set.
An analysis of the results, shown in Figure 8, indicates that with this algorithm, no additional gain is achieved through multiple face gallery poses. This suggests that 3D face recognition approaches could have an advantage over naive integration of multiple face poses, such as in the proposed 2D statistical SVM or related non-linear kernel methods.
We conducted the same set of experiments using the MIT algorithm. The results are much worse than FaceIt's, even with manual identification of the eyes. We suspect that this might be due to the extreme sensitivity of the MIT algorithm to face registration errors.
4.3. Illumination
For this test, the PIE and AR databases are used. We found that both algorithms performed significantly better on the illumination images than under the various pose conditions. Table 4 shows the recognition accuracies of FaceIt and the MIT algorithm on both databases. As described in Section 2.2, the PIE database contains two illumination sets. The images in set illumination 1 were taken with the room lights on. The mug shot gallery images did not have flash illumination. For the illumination 2 set of images the room light was switched off. The illumination for the gallery images was provided by a flash directly opposite the subject's face. In each case the probe set was made up of the remaining flash images (21 and 20 images, respectively). As can be expected, the algorithms perform better in the first test.
          PIE 1   PIE 2   AR 05   AR 06   AR 07
FaceIt     0.97    0.91    0.95    0.93    0.86
MIT        0.94    0.72    0.77    0.74    0.72

Table 4. Illumination results. PIE 1 and 2 refer to the two illumination conditions described in Section 2.2. {AR 05, AR 06, AR 07} are the {left, right, both} light on conditions in the AR database as shown in Figure 5.
The results on the PIE database are consistent with the outcome of the experiments on the AR database. Here the images 5 through 7 deal with different illumination conditions, varying the lighting from the left and right sides to both lights on.
While these results lead one to conclude that face recognition under illumination is a solved problem, we would like to caution that illumination changes could still cause a major problem when coupled with other changes (expression, pose, etc.).
Figure 8. Multiple pose gallery vs. multiple single pose galleries. Subplots a1 and a2 show the recognition rate with gallery pose 11. With a single gallery image, the algorithm is able to generalize to nearby poses. Subplots b{1,2} and c{1,2} show the recognition rates for poses 27 and 37. Subplots d{1,2} show the recognition rates with gallery poses {11, 27, 37}. One can see that d{1,2} is the same as taking the maximum values in a-c{1,2}. In this case no additional gain is achieved by using the joint set {11, 27, 37} as the gallery poses.
4.4. Expression
Faces undergo large deformations under facial expressions. Humans can easily handle this variation, but we expected the algorithms to have problems with the expression databases. To our surprise, FaceIt and MIT performed very well on the Cohn-Kanade and the AR databases. In each test we used the neutral expression as gallery image and probed the algorithm with the peak expression.
          Cohn-Kanade   AR 02   AR 03   AR 04
FaceIt        0.97       0.96    0.93    0.78
MIT           0.94       0.72    0.67    0.41

Table 5. Expression results. AR 02, AR 03, and AR 04 refer to the expression changes in the AR database as shown in Figure 5. Both algorithms perform reasonably well under facial expression; however, the "scream" expression, AR 04, produces large recognition errors.
Table 5 shows the results of both algorithms on the two databases. The notable exception is the scream (AR 04) set of the AR database.
For most facial expressions, the facial deformation is centered around the lower part of the face. This might leave sufficient invariant information in the upper face for recognition, which results in a high recognition rate. The expression "scream" has effects on both the upper and the lower face appearance, which leads to a significant fall-off in the recognition rate. This indicates that 1) face recognition under extreme facial expression still remains an unsolved problem, and 2) temporal information could provide significant additional information for face recognition under expression.
4.5. Occlusion
For the occlusion tests we look at images where parts of the face are invisible to the camera. The AR database provides two scenarios: subjects wearing sun glasses and subjects wearing a scarf around the lower portion of the face. The recognition rates for the sun glass images are according to expectations. As Table 6 shows, FaceIt is unable to handle this variation (AR 08). The result further deteriorates when the left or right light is switched on (AR 09 and AR 10). This result is readily replicated on the images of the second session.
This test reveals that FaceIt is more vulnerable to upper face occlusion than the MIT algorithm. Facial occlusion, particularly upper face occlusion, remains a difficult problem yet to be solved.
          AR 08   AR 09   AR 10   AR 11   AR 12   AR 13
FaceIt     0.10    0.08    0.06    0.81    0.73    0.71
MIT        0.34    0.35    0.28    0.46    0.43    0.40

Table 6. Occlusion results. AR 08, AR 09, and AR 10 refer to the upper facial occlusions, and AR 11, AR 12, and AR 13 refer to the lower facial occlusions as shown in Figure 5. Upper facial occlusion causes a major drop in recognition rates.
Interesting open questions are 1) what are the fundamental limits of any recognition system under various occlusions, and 2) to what extent can other additional facial information, such as motion, provide the necessary help for face recognition under occlusion.
4.6. Gender
Male and female faces differ in both local features and in shape [4]. Men's faces on average have thicker eyebrows and greater texture in the beard region. In women's faces, the distance between the eyes and brows is greater, the protuberance of the nose smaller, and the chin narrower than in men [4]. People readily distinguish male from female faces using these and other differences (e.g., hair style), and connectionist modeling has yielded similar results [6, 18]. Little is known, however, about the sensitivity of face identification algorithms to differences between men's and women's faces. The relative proportions of men and women in training samples are seldom reported, and identification results typically fail to report whether algorithms are more or less accurate for one sex or the other. Other factors that may influence identification, such as differences in face shape between individuals of European, Asian, and African ancestry [4, 9], have similarly been ignored in past research.
We evaluated the influence of gender on face recognition algorithms on the AR database due to its balanced ratio of female to male subjects. Figure 9 shows the recognition rate achieved by FaceIt across the 13 variations, including illumination, occlusion, and expression.
The results reveal a surprising trend: better recognition rates are consistently achieved for female subjects. Averaged across the conditions (excluding tests AR 08-10, where FaceIt breaks down), the recognition rate for male subjects is 83.4%, while the recognition rate for female subjects is 91.66%. It is not clear what has caused this effect. To further validate this result, a much larger database is needed.
If this result is further substantiated, it opens up many interesting questions on face recognition. In particular, it raises the questions: 1) what makes one face easier to recognize than another, and 2) are there face classes with similar recognizability?
Figure 9. AR database results, male vs. female. The dashed line indicates the recognition rate for male subjects in the AR database shown in Figure 5.
5. Discussion
In natural environments, pose, illumination, expression, occlusion, and individual differences among people represent critical challenges to face recognition algorithms. The FERET tests [23] and the Facial Recognition Vendor Test 2000 [3] provided initial results on limited variations of these factors.
FaceIt and the MIT algorithm were overall the best performers in these tests. We evaluated both algorithms on multiple independent databases that systematically vary pose, illumination, expression, occlusion, and gender. We found:
1. Pose: Pose variation still presents a challenge for face recognition. Frontal training images have better generalizability to novel views than do non-frontal training images. For a frontal training view, we can achieve reasonable recognition rates of 70-80% for up to 45 degrees of head rotation. In addition, using multiple training views does not necessarily improve the recognition rate.
2. Illumination: Pure illumination changes on the face are handled well by current face recognition algorithms.
3. Expression: With the exception of extreme expressions such as scream, the algorithms are relatively robust to facial expression. Deformation of the mouth and occlusion of the eyes by eye narrowing and closing present a problem to the algorithms.
4. Occlusion: The performance of the face recognition algorithms under occlusion varies. FaceIt is more sensitive to upper face occlusion than MIT; FaceIt is more robust to lower face occlusion.
5. Gender: We found surprisingly consistent differences in face recognition rates across gender. This result is based on testing on the AR database, which has 70 male and 60 female subjects. On average, the recognition rate for females is consistently about 5% higher than for males, across a range of perturbations. While the database used in these tests is too small to draw general conclusions, it points to an interesting direction for future research and database collections.
The current study has several limitations. One, we did not examine the effect of face image size on algorithm performance in the various conditions. Minimum size thresholds may well differ for various permutations, which would be important to determine. Two, the influence of racial or ethnic differences on algorithm performance could not be examined due to the homogeneity of racial and ethnic backgrounds in the databases. While large databases with ethnic variation are available, they lack the parametric variation in lighting, shape, pose, and other factors that were the focus of this investigation. Three, faces change dramatically with development, but the influence of change with development on algorithm performance could not be examined. Four, while we were able to examine the combined effects of some factors, databases are needed that support examination of all ecologically valid combinations, which may be non-additive. The results of the current study suggest that greater attention be paid to the multiple sources of variation that are likely to affect face recognition in natural environments.
6. Acknowledgement
This research is supported by ONR N00014-00-1-0915 (DARPA HumanID) and NSF IRI-9817496.
A. Appendix
The following figures show the recognition rates for all camera views in turn as gallery images. They are ordered according to the camera numbers, roughly going from left to right in Figure 1.
Figure 10. Recognition accuracies of all cameras for camera view 34 as gallery image.
Figure 11. Recognition accuracies of all cameras for camera view 31 as gallery image.
Figure 12. Recognition accuracies of all cameras for camera view 14 as gallery image.
Figure 13. Recognition accuracies of all cameras for camera view 11 as gallery image.
Figure 14. Recognition accuracies of all cameras for camera view 29 as gallery image.
Figure 15. Recognition accuracies of all cameras for camera view 07 as gallery image.
Figure 16. Recognition accuracies of all cameras for camera view 27 as gallery image.
Figure 17. Recognition accuracies of all cameras for camera view 09 as gallery image.
Figure 18. Recognition accuracies of all cameras for camera view 05 as gallery image.
Figure 19. Recognition accuracies of all cameras for camera view 37 as gallery image.
Figure 20. Recognition accuracies of all cameras for camera view 02 as gallery image.
Figure 21. Recognition accuracies of all cameras for camera view 25 as gallery image.
Figure 22. Recognition accuracies of all cameras for camera view 22 as gallery image.
References
[1] Robert J. Baron. Mechanisms of human facial recognition. International Journal of Man-Machine Studies, 15(2):137-178, 1981.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, July 1997.
[3] D. M. Blackburn, M. Bone, and P. J. Phillips. Facial recognition vendor test 2000: Evaluation report, 2000.
[4] V. Bruce and A. Young. In the Eye of the Beholder: The Science of Face Perception. Oxford University Press, 1998.
[5] T. Cootes, K. Walker, and C. Taylor. View-based active appearance models. In IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.
[6] G. W. Cottrell and J. Metcalfe. Empath: Face, emotion, and gender recognition using holons. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Neural Information Processing Systems, volume 3, pages 53-60, San Mateo, CA, 1991. Morgan Kaufmann.
[7] M. N. Dailey and Garrison W. Cottrell. Organization of face and object recognition in modular neural network models. Neural Networks, 12(7/8), 1999.
[8] P. Ekman and W. V. Friesen. Facial Action Coding System. Consulting Psychologist Press, 1978.
[9] L. G. Farkas and I. R. Munro. Anthropometric Facial Proportions in Medicine. C. C. Thomas, Springfield, IL, 1987.
[10] R. Gross, J. Yang, and A. Waibel. Face detection in a meeting room. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000.
[11] Peter J. B. Hancock, A. Mike Burton, and Vicki Bruce. Face processing: Human perception and principal components analysis. Memory and Cognition, 24(1):26-40, 1996.
[12] A. Jonathan Howell and Hilary Buxton. Invariance in radial basis function neural networks in human face classification. Neural Processing Letters, 2(3):26-30, 1995.
[13] T. Kanade, J. F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 46-53, Grenoble, France, 2000.
[14] T. Kanade, H. Saito, and S. Vedula. The 3D Room: Digitizing time-varying 3D events by synchronized multiple video streams. Technical Report CMU-RI-TR-98-34, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, December 1998.
[15] Andreas Lanitis, Christopher J. Taylor, and Timothy Francis Cootes. Automatic interpretation and coding of face images using flexible models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):743-756, 1997.
[16] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back. Face recognition: A convolutional neural network approach. IEEE Transactions on Neural Networks, 8(1):98-113, 1998.
[17] Y. Li, S. Gong, and H. Liddell. Support vector regression and classification based multi-view face detection and recognition. In IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.
[18] A. J. Luckman, N. M. Allison, A. Ellis, and B. M. Flude. Familiar face recognition: A comparative study of a connectionist model and human performance. Neurocomputing, 7:3-27, 1995.
[19] A. R. Martinez and R. Benavente. The AR face database. Technical Report 24, Computer Vision Center (CVC), Barcelona, Spain, June 1998.
[20] Baback Moghaddam and Alex Paul Pentland. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.
[21] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection, 1997.
[22] P. Penev and J. Atick. Local feature analysis: A general statistical theory for object representation, 1996.
[23] P. Jonathon Phillips, Harry Wechsler, Jeffrey S. Huang, and Patrick J. Rauss. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16(5):295-306, 1998.
[24] T. Poggio and K.-K. Sung. Example-based learning for view-based human face detection. In 1994 ARPA Image Understanding Workshop, volume II, November 1994.
[25] B. Scholkopf, A. Smola, and K.-R. Muller. Kernel principal component analysis. In Artificial Neural Networks (ICANN 97), 1997.
[26] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression (PIE) database of human faces. Technical Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, January 2001.
[27] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America, 4(3):519-524, March 1987.
[28] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces, 1987.
[29] Matthew Turk and Alex Paul Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
[30] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, Heidelberg, DE, 1995.
[31] Thomas Vetter. Synthesis of novel views from a single face image. International Journal of Computer Vision, 28(2):103-116, 1998.
[32] Thomas Vetter, Anya Hurlbert, and Tomaso Poggio. View-based models of 3D object recognition: Invariance to imaging transformations. Cerebral Cortex, 5(3):261-269, 1995.
[33] Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775-779, July 1997.
[34] Alan L. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3(1):59-70, 1991.