
1724 J. Opt. Soc. Am. A/Vol. 14, No. 8 /August 1997 K. Etemad and R. Chellappa

Discriminant analysis for recognition of human face images

Kamran Etemad and Rama Chellappa

Department of Electrical Engineering and Center for Automation Research, University of Maryland, College Park, Maryland 20742

Received January 29, 1996; revised manuscript received October 25, 1996; accepted February 14, 1997

The discrimination power of various human facial features is studied and a new scheme for automatic face recognition (AFR) is proposed. The first part of the paper focuses on the linear discriminant analysis (LDA) of different aspects of human faces in the spatial as well as in the wavelet domain. This analysis allows objective evaluation of the significance of visual information in different parts (features) of the face for identifying the human subject. The LDA of faces also provides us with a small set of features that carry the most relevant information for classification purposes. The features are obtained through eigenvector analysis of scatter matrices with the objective of maximizing between-class variations and minimizing within-class variations. The result is an efficient projection-based feature-extraction and classification scheme for AFR. Each projection creates a decision axis with a certain level of discrimination power or reliability. Soft decisions made based on each of the projections are combined, and probabilistic or evidential approaches to multisource data analysis are used to provide more reliable recognition results. For a medium-sized database of human faces, excellent classification accuracy is achieved with the use of very-low-dimensional feature vectors. Moreover, the method used is general and is applicable to many other image-recognition tasks. © 1997 Optical Society of America [S0740-3232(97)01008-9]

Key words: Face recognition, wavelet packets, discriminant eigenpictures, evidential reasoning, multisource soft decision integration.

1. INTRODUCTION

Inspired by the human's ability to recognize faces as special objects and motivated by the increased interest in the commercial applications of automatic face recognition (AFR), as well as by the emergence of real-time processors, research on automatic recognition of faces has become very active. Studies on the analysis of human facial images have been conducted in various disciplines. These studies range from psychophysical analysis of human recognition of faces and related psychovisual tests1,2 to research on practical and engineering aspects of computer recognition and verification of human faces and facial expressions3 and race and gender classification.4,5

The problem of AFR alone is a composite task that involves detection and location of faces in a cluttered background, facial feature extraction, and subject identification and verification.6,7 Depending on the nature of the application, e.g., image acquisition conditions, size of database, clutter and variability of the background and foreground, noise, occlusion, and finally cost and speed requirements, some of the subtasks are more challenging than others.

The detection of a face or a group of faces in a single image or a sequence of images, which has applications in face recognition as well as video conferencing systems, is a challenging task and has been studied by many researchers.7–12 Once the face image is extracted from the scene, its gray level and size are usually normalized before storing or testing. In some applications, such as identification of passport pictures or drivers' licenses, conditions of image acquisition are usually so controlled that some of the preprocessing stages may not be necessary.


One of the most important components of an AFR system is the extraction of facial features, in which attempts are made to find the most appropriate representation of face images for identification purposes. The main challenge in feature extraction is to represent the input data in a low-dimensional feature space, in which points corresponding to different poses of the same subject are close to each other and far from points corresponding to instances of other subjects' faces. However, there is a lot of within-class variation that is due to differing facial expressions, head orientations, lighting conditions, etc., which makes the task more complex.

Closely tied to the task of feature extraction is the intelligent and sensible definition of similarity between test and known patterns. The task of finding a relevant distance measure in the selected feature space, and thereby effectively utilizing the embedded information to identify human subjects accurately, is one of the main challenges in face identification. In this paper we focus on feature-extraction and face-identification processes.

Typically, each face is represented by use of a set of gray-scale images or templates, a small-dimensional feature vector, or a graph. There are also various proposals for recognition schemes based on face profiles13 and isodensity or depth maps.14,15 There are two major approaches to facial feature extraction for recognition in computer vision research: holistic template matching-based systems and geometrical local feature-based schemes and their variations.7

In holistic template-matching systems each template is a prototype face, a facelike gray-scale image, or an abstract reduced-dimensional feature vector that has been



obtained through processing the face image as a whole. Low-dimensional representations are highly desirable for large databases, fast adaptation, and good generalization. On the basis of these needs, studies have been performed on the minimum acceptable image size and the smallest number of gray levels required for good recognition results.6 Reduction in dimensionality can also be achieved by using various data-compression schemes. For example, representations based on principal-component analysis16–20 (PCA) and singular-value decomposition have been studied and used extensively for various applications. It has also been shown that the nonlinear mapping capability of multilayer neural networks can be utilized and that the internal and hidden representations of face patterns, which typically are of much lower dimensionality than the original image, can be used for race and gender classification.4,5 Some of the most successful AFR schemes are based on the Karhunen–Loeve transform17,19 (KLT), which yields the so-called eigenfaces. In these methods the set of all face images is considered a vector space, and the eigenfaces are simply the dominant principal components of this face space; they are computed as eigenvectors of the covariance matrix of the data.

In geometrical feature-based systems, one attempts to locate major face components or feature points in the image.21–24 The relative sizes of and distances between the major face components are then computed. The set of all normalized size and distance measurements constitutes the final feature vector for classification. One can also use the information contained in the feature points to form a geometrical graph representation of the face that directly shows sizes and relative locations of major face attributes.22 Most geometrical feature-based systems involve several steps of window-based local processing, followed by some iterative search algorithms, to locate the feature points. These methods are more adaptable to large variations in scale, size, and location of the face in an image but are more susceptible to errors when face details are occluded by objects, e.g., glasses, or by facial hair, facial expressions, or variations in head orientation. Compared with template and PCA-based systems, these methods are computationally more expensive. Comparative studies of template versus local feature-based systems can be found in Refs. 4, 7, and 25. There are also various hybrid schemes that apply the KLT or the template-matching idea to face components and use correlation-based searching to locate and identify facial feature points.4,19 The advantage of performing component-by-component matching is improved robustness against head-orientation changes, but its disadvantage is the complexity of searching for and locating face components.

The human audiovisual system, as a powerful recognition model, takes great advantage of context and auxiliary information. Inspired by this observation, one can devise schemes that consistently incorporate context and collateral information, when and if they become available, to enhance their final decisions. Incorporating information such as race, age, and gender, obtained through independent analysis, improves recognition results.19 Also, since face recognition involves a classification problem with large within-class variations, caused by dramatic image variation in different poses of the subject, one has to devise methods of reducing or compensating for such variability. For example:

1. For each subject, store several templates, one for each major distinct facial expression and head orientation. Such systems are typically referred to as view-based systems.

2. Use deformable templates along with a three-dimensional model of a human face to synthesize virtual poses, and apply the template matching algorithm to the synthesized representations.26

3. Incorporate such variations into the process of feature extraction.

In this paper we take the third approach and keep the first method as an optional stage that can be employed depending on the complexity of the specific task. Our approach is a holistic linear discriminant analysis (LDA)-based feature extraction for human faces, followed by an evidential soft-decision integration for multisource data analysis. This method is a projection-based scheme of low complexity that avoids any iterative search or computation. In this method both off-line feature extraction and on-line feature computation can be done at high speeds, and recognition can be done in almost real time. Our experimental results show that high levels of recognition performance can be achieved with low complexity and a small number of features.

The organization of this paper is as follows. In Section 2 we provide an objective study of multiscale features of face images in terms of their discrimination power. In Section 3 we propose a holistic method of projection-based discriminant facial feature extraction through LDA of face images. We also make a comparative study of the features obtained with the proposed scheme and the features used in compression-based methods such as PCA. In Section 4 we address the task of classification and matching through multisource data analysis and combining soft decisions from multiple imprecise information sources. Finally, we propose a task-dependent measure of similarity in the feature space that is based on the reliability of the basic decisions, to be used at the identification stage.

2. LINEAR DISCRIMINANT ANALYSIS OF FACIAL IMAGES

As highly structured two-dimensional patterns, human face images can be analyzed in the spatial and the frequency domains. These patterns are composed of components that are easily recognized at high levels but are loosely defined at low levels of our visual system.2,27

Each of the facial components (features) has a different discrimination power for identifying a person or the person's gender, race, and age. There have been many studies of the significance of such features that used subjective psychovisual experiments.1,2

Using objective measures, in this section we propose a computational scheme for evaluating the significance of different facial attributes in terms of their discrimination potential. The results of this analysis can be supported by subjective psychovisual findings. To analyze any representation V, where V can be the original image, its spatial segments, or transformed images, we provide the following framework.

First, we need a training set composed of a relatively large group of subjects with diverse facial characteristics. The appropriate selection of the training set directly determines the validity of the final results. The database should contain several examples of face images for each subject in the training set and at least one example in the test set. These examples should represent different frontal views of subjects with minor variations in view angle. They should also include different facial expressions, different lighting and background conditions, and examples with and without glasses. It is assumed that all images are already normalized to $m \times n$ arrays and that they contain only the face regions and not much of the subjects' bodies.

Second, for each image and subimage, starting with the two-dimensional $m \times n$ array of intensity values $I(x, y)$, we construct the lexicographic vector expansion $f \in R^{m \times n}$. This vector corresponds to the initial representation of the face. Thus the set of all faces in the feature space is treated as a high-dimensional vector space.

Third, by defining all instances of the same person's face as being in one class and the faces of different subjects as being in different classes for all subjects in the training set, we establish a framework for performing a cluster-separation analysis in the feature space. Also, having labeled all instances in the training set and having defined all the classes, we compute the within- and between-class scatter matrices as follows:

$$S_w(V) = \sum_{i=1}^{L} \Pr(C_i)\,\Sigma_i, \tag{1}$$

$$S_b(V) = \sum_{i=1}^{L} \Pr(C_i)\,(\mu - \mu_i)(\mu - \mu_i)^T. \tag{2}$$

Here $S_w$ is the within-class scatter matrix, showing the average scatter $\Sigma_i$ of the sample vectors ($V$) of the different classes $C_i$ around their respective mean vectors $\mu_i$:

$$\Sigma_i = E[(V - \mu_i)(V - \mu_i)^T \mid C = C_i]. \tag{3}$$

Similarly, $S_b$ is the between-class scatter matrix, representing the scatter of the conditional mean vectors ($\mu_i$) around the overall mean vector $\mu$. $\Pr(C_i)$ is the probability of the $i$th class. The discriminatory power of a representation can be quantified by using various measures. In this paper we use the separation matrix, which shows the combination of within- and between-class scatters of the feature points in the representation space. The class separation matrix and a measure of separability can be computed as

$$S(V) = S_w^{-1} S_b, \tag{4}$$

$$J_V = \mathrm{sep}(V) = \mathrm{trace}(S(V)). \tag{5}$$

$J_V$ is our measure of the discrimination power (DP) of a given representation V. As mentioned above, the representation may correspond to the data in its original form (e.g., a gray-scale image), or it can be based on a set of abstract features computed for a specific task.

For example, through this analysis we are able to compare the DP's of different spatial segments (components) of a face. We can apply the analysis to segments of the face images such as the areas around the eyes, mouth, hair, and chin, or combinations of them. Figure 1 shows a separation analysis for horizontal segments of the face images in the database. The results show that the DP's of all segments are comparable and that the area between the nose and the mouth has more identification information than other parts. Figure 2 shows that the DP of the whole image is significantly larger than the DP's of its parts.

Using wavelet transforms28–30 as multiscale orthogonal representations of face images, we can also perform a comparative analysis of the DP's of subimages in the wavelet domain. Different components of a wavelet decomposition capture different visual aspects of a gray-scale image. As Fig. 3 shows, at each level of decomposition there are four orthogonal subimages corresponding to

• LL: the smoothed, low-frequency variations.
• LH: sharp changes in the horizontal direction, i.e., vertical edges.
• HL: sharp changes in the vertical direction, i.e., horizontal edges.
• HH: sharp changes in nonhorizontal, nonvertical directions, i.e., other edges.
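A one-level decomposition producing these four subimages can be sketched with an unnormalized Haar transform (a deliberate simplification; the paper's refs. 28–30 cover general wavelet and wavelet-packet transforms, and the function name is ours):

```python
import numpy as np

def haar_level1(img):
    """One level of a 2-D Haar wavelet decomposition (minimal sketch).

    Returns the four orthogonal subimages described above:
    LL (smoothed), LH (vertical edges), HL (horizontal edges),
    HH (diagonal edges). Assumes even height and width.
    """
    img = np.asarray(img, dtype=float)
    # Pairwise averages (low-pass) and differences (high-pass) along rows.
    lo_r = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi_r = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Then the same along columns of each intermediate result.
    LL = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0  # smooth approximation
    LH = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0  # horizontal changes -> vertical edges
    HL = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0  # vertical changes -> horizontal edges
    HH = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0  # diagonal changes
    return LL, LH, HL, HH
```

On an image with alternating columns, only LH (and LL) are nonzero, matching the subband interpretations above.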

We applied the LDA to each subimage of the wavelet transform (WT) of the face and estimated the DP of each

Fig. 1. Variation of the discrimination power of horizontal segments of the face defined by a window of fixed height sliding from top to bottom of the image.


Fig. 2. Variation of the discrimination power of a horizontal segment of the face that grows in height from top to bottom of the image.

subband. Figure 3 compares the separations obtained by using each of the subbands. Despite their equal sizes, different subimages carry different amounts of information for classification; the low-resolution components are the most informative. The horizontal edge patterns are almost as important as the vertical edge patterns, and their relative importance depends on the scale. Finally, the least important component in terms of face discrimination is the fourth subband, i.e., the slanted edge patterns. These results are consistent with our intuition and also with subjective psychovisual experiments.

One can also apply this idea to the study of the importance of facial components for gender or race classification from images.
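The separability analysis of Eqs. (1)–(5) can be sketched as follows (a minimal NumPy sketch; the function name, and the use of class frequencies as the priors $\Pr(C_i)$, are our assumptions, since the paper leaves the priors unspecified):

```python
import numpy as np

def separability(X, labels):
    """Discrimination power J_V = trace(S_w^{-1} S_b) of a representation.

    X: (N, d) array, one vectorized face (or subimage) per row.
    labels: length-N array of class (subject) ids.
    """
    classes = np.unique(labels)
    mu = X.mean(axis=0)                       # overall mean vector
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[labels == c]
        pr = len(Xc) / len(X)                 # Pr(C_i) from class frequency
        mu_c = Xc.mean(axis=0)                # class-conditional mean
        diff = Xc - mu_c
        Sw += pr * diff.T @ diff / len(Xc)    # Eq. (1): within-class scatter
        dm = (mu - mu_c)[:, None]
        Sb += pr * dm @ dm.T                  # Eq. (2): between-class scatter
    # Eq. (5): J = trace(S_w^{-1} S_b); pinv guards against a singular S_w.
    return np.trace(np.linalg.pinv(Sw) @ Sb)
```

Applied to different spatial segments or wavelet subbands of the same faces, a larger value indicates a representation with more identification information.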

3. DISCRIMINANT EIGENFEATURES FOR FACE RECOGNITION

In this section we propose a new algorithm for face recognition that makes use of a small yet efficient set of discriminant eigentemplates. The analysis is similar to the method suggested by Pentland and colleagues,18,19 which is based on PCA. The fundamental difference is that in

Fig. 3. Different components of a wavelet transform that capture sharp variations of the image intensity in different directions and have different discrimination potentials. The numbers represent the relative discrimination power.

our system eigenvalue analysis is performed on the separation matrix rather than on the covariance matrix.

Human face images as two-dimensional patterns have a lot in common and are spectrally quite similar. Therefore, considering the face image as a whole, one expects to see important discriminant features that have low energies. These low-energy discriminant features may not be captured in a compression-based feature-extraction scheme such as PCA, or even in multilayer neural networks, which rely on minimization of the average Euclidean error. In fact, there is no guarantee that the error incurred by applying the compression scheme, despite its low energy, does not carry significant discrimination information. Also, there is no reason to believe that for a given compression-based feature space, feature points corresponding to different poses of the same subject will be closer (in Euclidean distance) to one another than to those of other subjects. In fact, it has been argued and experimentally shown that ignoring the first few eigenvectors, corresponding to the top principal components, can lead to a substantial increase in recognition accuracy.19,31 Therefore the secondary selection from PCA vectors is based on their discrimination power. But one could ask: why do we not start with criteria based on discrimination rather than on representation from the beginning, to make the whole process more consistent?

The PCA approach provides us with features that capture the main directions along which face images differ the most, but it does not attempt to reduce the within-class scatter of the feature points. In other words, since no class-membership information is used, examples of the same class and of different classes are treated the same way. LDA, however, uses the class-membership information and allows us to find eigenfeatures, and therefore representations, in which the variations among different faces are emphasized, while the variations of the same face that are due to illumination conditions, facial expression, orientation, etc., are de-emphasized.

According to this observation, and on the basis of the results that follow, we believe that for classification purposes LDA-based feature extraction seems to be an appropriate and logical alternative to PCA or any other compression-based system that tries to find the most compact representation of face images. Concurrently with, but independently of, our studies, LDA has been used by Swets and Weng32,33 to discriminate human faces from other objects.

To capture the inherent symmetry of basic facial features and the fact that a face can be identified from its mirror image, we can use the mirror image of each example as a source of information.17 Also, by adding noisy but identifiable versions of given examples, we can expand our training data and improve the robustness of the feature extraction against a small amount of noise in the input. Therefore for each image in the database we include its mirror image and one of its noisy versions, as shown in Fig. 4. Let F denote the face database, i.e.,

$$F = \{F_s : s = 1, 2, \ldots, N_S\}, \tag{6}$$

$$F_s = \{f_i^s,\ \bar{f}_i^s,\ (f_i^s + n) : i = 1, 2, \ldots, N_E,\ n = [N(0, \sigma^2)]^{m \times n}\}, \tag{7}$$

where $\bar{f}_i^s$ and $f_i^s + n$ are mirror images and noisy versions, respectively, of $f_i^s$, the $i$th example of subject $s$ in the database $F$. Also, $N_S$ is the number of subjects and $N_E$ is the number of examples per subject in the initial database. Following our earlier observations, and having determined the separation matrix, we perform an eigenvalue analysis of the separation matrix $S(F)$ on the augmented database:

$$\mathrm{eig}\{S(F)\} = \{(\lambda_i, u_i),\ i = 1, \ldots, N_S - 1,\ \lambda_i > \lambda_{i+1}\}. \tag{8}$$
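The augmentation of Eq. (7) — keeping each example together with its mirror image and one noisy copy — can be sketched as follows (the function name and the noise level `sigma` are our assumptions; the paper does not specify the noise variance):

```python
import numpy as np

def augment(examples, sigma=0.05, seed=0):
    """Augment one subject's examples per Eq. (7).

    For each m x n face array f, the output contains f itself,
    its mirror image, and a noisy version f + n with
    n ~ N(0, sigma^2) drawn independently per pixel.
    """
    rng = np.random.default_rng(seed)
    out = []
    for f in examples:
        out.append(f)
        out.append(np.fliplr(f))                         # mirror image
        out.append(f + rng.normal(0.0, sigma, f.shape))  # noisy version
    return out
```

The augmented set is what the eigenvalue analysis of Eq. (8) is performed on.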

To reduce the computational cost for large data-set sizes, one can use the following equality32,34:

$$S_b u_i = \lambda_i S_w u_i. \tag{9}$$

This shows that the $u_i$'s and $\lambda_i$'s are generalized eigenvectors and eigenvalues of $\{S_b, S_w\}$. From this equation the $\lambda_i$'s can be computed as the roots of the characteristic polynomial

$$|S_b - \lambda_i S_w| = 0, \tag{10}$$

and then the $u_i$'s can be obtained by solving

$$(S_b - \lambda_i S_w) u_i = 0 \tag{11}$$

Fig. 4. For each example in the database we add its mirror image and a noisy version.

only for the selected largest eigenvectors.32 Note that the dimensionality $m$ of the resulting set of feature vectors is $m < \mathrm{rank}(S) = \min(n, N_S - 1)$. Now define

$$\Lambda(m) = \{\lambda_i,\ i = 1, \ldots, m < N_S - 1\}, \tag{12}$$

$$U(m) = \{u_i,\ i = 1, \ldots, m < N_S - 1\}, \tag{13}$$

so that $\Lambda(m)$ and $U(m)$ represent the set of $m$ largest eigenvalues of $S(F)$ and their corresponding eigenvectors. Considering $U(m)$ as one of the possible linear transformations $V$ from $R^n$ to $R^m$, with $m < n$, one can show that

$$V = \{U : X \subset R^n \rightarrow U^T X = Y \subset R^m,\ m < n\}, \tag{14}$$

$$U(m) = \arg\min_{U \in V} \{J_X - J_{U^T X}\}, \tag{15}$$

where $J_X = \mathrm{tr}(S(X))$ and $J_Y = \mathrm{tr}(S(Y))$ are separabilities computed over the $X$ and $Y = U^T X$ spaces, respectively. This means that $U(m)$ minimizes the drop $|\mathrm{sep}(X) - \mathrm{sep}(U^T X)|$ in classification information incurred by the reduction in the feature-space dimensionality, and no other $R^n$ to $R^m$ linear mapping can provide more separation than $U(m)$ does.

Therefore the optimal linear transformation from the initial representation space in $R^n$ to a low-dimensional feature space in $R^m$, which is based on our selected separation measure, results from projecting the input vectors $f$ onto the $m$ eigenvectors corresponding to the $m$ largest eigenvalues of the separation matrix $S(F)$. These optimal vectors can be obtained from a sufficiently rich training set and can be updated if needed.

The columns of $U(m)$ are the eigenvectors corresponding to the $m$ largest eigenvalues; they represent the directions along which the projections of the face images within the database show the maximum class separation. Each face image in the database is represented, stored, and tested in terms of its projections onto the selected set of discriminant vectors, i.e., the directions corresponding to the largest eigenvalues of $S(F)$:

$$\forall f_i^s \in F_s,\ \forall u \in U(m):\ c_i^s(u) = \langle f_i^s, u \rangle, \tag{16}$$

$$C^s = \{c_i^s(u) : \forall u \in U(m),\ i = 1, \ldots, N_S\}. \tag{17}$$
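In practice, the generalized eigenproblem of Eqs. (9)–(11) is solved numerically through the equivalent ordinary eigenproblem of $S = S_w^{-1} S_b$ (Eq. 4), rather than via the characteristic polynomial, which is impractical at image dimensions. A sketch (function names are ours):

```python
import numpy as np

def discriminant_eigenvectors(Sw, Sb, m):
    """Return the m leading generalized eigenpairs of Eq. (9),
    S_b u_i = lambda_i S_w u_i, via eigenanalysis of S_w^{-1} S_b.
    pinv guards against a singular S_w; the product is generally
    non-symmetric, so tiny imaginary parts are discarded."""
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:m]        # descending, top m
    return vals.real[order], vecs.real[:, order]

def project(f, U):
    """Eq. (16): projection coefficients c(u) = <f, u> of a vectorized
    face f onto each retained discriminant vector (column of U)."""
    return U.T @ f
```

With $S_w$ the identity, this reduces to an ordinary eigenanalysis of $S_b$, which makes the selection of the $m$ largest eigenvalues easy to check.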

Although all images of each subject are considered in the process of training, only one of them needs to be saved as a template for testing. If a view-based approach is taken, one example for each distinct view has to be stored. Since only the projection coefficients need to be saved, for each subject we retain the example that is closest to the mean of the corresponding cluster in the feature space. Storing the projection coefficients instead of the actual images is highly desirable when large databases are used. Also, applying this holistic LDA to multiscale representations of face images, one can obtain multiscale discriminant eigentemplates. For example, one can apply LDA to each component of the WT of face images and select the most discriminant eigentemplates obtained from various scales. This approach is more complex because it requires the WT computation of each test example, but in some applications it may be useful, for example, when the DP of the original representation is


K. Etemad and R. Chellappa Vol. 14, No. 8 /August 1997 /J. Opt. Soc. Am. A 1729

not captured in the first few eigenvectors or when the condition $m \leq N_{\text{classes}} - 1$ becomes restrictive, e.g., in gender classification.

4. MULTISOURCE SOFT-DECISION INTEGRATION

A number of different approaches have been proposed for analyzing information obtained from several sources.34–36

The simplest method is to form an extended data (feature) vector containing information from all the sources and to treat this vector as the output of a single source. Usually, in such systems all similarities and distances are measured in the Euclidean sense. This approach can be computationally expensive; it is successful only when all the sources have similar statistical characteristics and comparable reliabilities. In our application this assumption is not valid, and therefore a more intelligent alternative approach has to be taken.

Each projection of the input pattern onto a discriminant vector $u_i$ creates a decision axis with a certain level of reliability and discrimination power. The level of significance or reliability $\alpha_i$ of the decisions based on $u_i$ is directly related to the class separation along that axis, which is equal to the corresponding (normalized) eigenvalue in the LDA:

$$\forall (\lambda_i, u_i) \in (\Lambda^{(m)} \times U^{(m)}):\quad \alpha_i = \frac{\lambda_i}{\sum_{i=1}^{m} \lambda_i}. \tag{18}$$
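In code, Eq. (18) is simply a normalization of the LDA eigenvalues into weights that sum to one; a minimal sketch (the function name is ours):

```python
import numpy as np

def reliabilities(eigvals):
    """alpha_i = lambda_i / sum_j lambda_j, per Eq. (18)."""
    lam = np.asarray(eigvals, dtype=float)
    return lam / lam.sum()
```

For example, eigenvalues `[3.0, 1.0]` yield reliabilities `0.75` and `0.25`.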

Fig. 5. Distribution of projection coefficients along three discriminant vectors with different levels of discrimination power for several poses from four different subjects.

Fig. 6. Raw distances between each test example and the known clusters along each discriminant axis result in the soft decision along that axis.

Figure 5 shows the distribution of projection coefficients onto three discriminant vectors. For any vectorized test face image $f$, we project the image onto each of the top discriminant vectors $u$. On the basis of the distances between the resulting coefficients $f(u)$ and those of the existing templates $c_u^s$ stored in the database, we estimate the level of similarity of the input image to each known subject (see Fig. 6):

$$\forall u \in U^{(m)}:\quad f(u) = \langle f, u \rangle, \tag{19}$$

$$\forall s \in S:\quad d_u(f, s) = |f(u) - c_u^s|, \tag{20}$$

$$p_u(f, s) = 1 - \frac{d_u(f, s)}{\sum_{s \in S} d_u(f, s)}, \tag{21}$$
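Equations (19)–(21) amount to a per-axis soft decision; they might be sketched as follows. This is a hedged illustration: the dictionary layout of the stored templates and the zero-distance guard are our own additions.

```python
import numpy as np

def soft_decisions(f, u, templates):
    """Similarity p_u(f, s) of test image f to each subject s along axis u.

    templates maps each subject s to its stored coefficient c_u^s along u.
    """
    fu = float(np.dot(f, u))                             # Eq. (19)
    d = {s: abs(fu - c) for s, c in templates.items()}   # Eq. (20)
    total = sum(d.values()) or 1.0                       # guard: all distances zero
    return {s: 1.0 - ds / total for s, ds in d.items()}  # Eq. (21)
```

Note that these scores are relative: over $|S|$ subjects they sum to $|S| - 1$, so they are proportions of evidence rather than probabilities.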

where $p_u(f, s)$ reflects the relative level of similarity between input $f$ and subject $s$ according to source $u$, which has reliability $\alpha_u$.

Having determined our decision axes and the reliabilities, we can apply a probabilistic or an evidential scheme of multisource data analysis to combine the soft decisions made on the basis of the individual imprecise sources and obtain a more precise and reliable final result. The normalized similarity measures ($p$'s) indicate the proportions of evidence suggested by different sources. They can be interpreted as the so-called basic masses of evidence, or they can be used as rough estimates of posterior probabilities given each measurement. From this stage on, a probabilistic or an evidential reasoning approach can be taken to combine the basic soft decisions. A comparative study of various probabilistic and evidential reasoning schemes is given in Ref. 35.

Similarly, working with distances as dissimilarity measures, one can combine the basic soft decisions and incorporate the reliability of each source to define a reasonable measure of distance in the feature space. Although the most common measure used in the literature is the Euclidean distance, we suggest as a more reasonable measure a weighted mean absolute distance, with the weights based on the discrimination powers. In other words,

$$\bar{d}_u(f, s) = \frac{d_u(f, s)}{\sum_{s \in S} d_u(f, s)}, \tag{22}$$

$$D(f, s) = \sum_{u \in U^{(m)}} [\,\bar{d}_u(f, s) \times \alpha_u\,]. \tag{23}$$

Therefore, for a given input $f$, the best match $s_0$ and its confidence measure are

$$s_0 = \arg\min_{s \in S} \{ D(f, s) \}, \tag{24}$$

$$\mathrm{conf}(f, s_0) = 1 - \frac{D(f, s_0)}{D(f, s')}, \tag{25}$$

where $s'$ is the second-best candidate and conf stands for confidence measure. In this framework, incorporating collateral information or prior knowledge and expectations from context becomes very easy and logical. All we need to do is to consider each of them as an additional source of information corresponding to a decision axis



with a certain reliability and include it in the decision process. See Table 1 for a summary of recognition rates.
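Putting Eqs. (19)–(25) together, the matching stage can be sketched as follows. The data layout (one stored coefficient vector per subject) and all names are our assumptions, not the authors' implementation:

```python
import numpy as np

def match(f, U, alphas, templates):
    """Best match s0 and its confidence for test image f (Eqs. 22-25).

    U: (n, m) discriminant vectors; alphas: length-m reliabilities;
    templates: dict mapping subject -> stored length-m coefficient vector.
    """
    coeffs = U.T @ f                          # projections f(u), Eq. (19)
    subjects = list(templates)
    d = np.array([np.abs(coeffs - templates[s]) for s in subjects])  # (|S|, m)
    dbar = d / d.sum(axis=0, keepdims=True)   # normalize over subjects, Eq. (22)
    D = dbar @ np.asarray(alphas)             # reliability-weighted sum, Eq. (23)
    order = np.argsort(D)
    s0 = subjects[order[0]]                   # Eq. (24)
    conf = 1.0 - D[order[0]] / D[order[1]]    # second-best in denominator, Eq. (25)
    return s0, conf
```

The reliability weights downplay axes with little class separation, which is exactly what a plain Euclidean distance in the feature space would fail to do.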

5. EXPERIMENTS AND RESULTS

In our experiments, to satisfy the requirements mentioned in Section 2, we used a mixture of two databases. We started with the database provided by Olivetti Research Ltd.37 This database contains 10 different images of each of 40 different subjects. All the images were taken against a homogeneous background, and some were taken at different times. The database includes frontal views of upright faces with slight changes in illumination, facial expression (open or closed eyes, smiling or nonsmiling), facial details (glasses or no glasses), and some side movements. Originally we chose this database because it contained many instances of frontal views for each subject. Then, to increase the size of the database, we added some hand-segmented face images from the FERET database.38 We also included mirror-image and noisy versions of each face example to expand the data set and improve the robustness of recognition performance to image distortions. The total numbers of images used in training and in testing were approximately 1500 and 500, respectively. Each face was represented by a 50 × 60 pixel 8-bit gray-level image, which for our experiments was reduced to 25 × 30. The database was divided into two disjoint training and test sets. Using this composite database, we performed several tests on gender classification and face recognition.
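The data expansion just described (downsampling to 25 × 30, plus mirror and noisy copies of each example) might look roughly like this. The exact downsampling filter and noise level are not stated in the paper, so the ones below are placeholders:

```python
import numpy as np

def expand_example(img):
    """Turn one 50x60 gray-level face into three 750-d training vectors.

    Variants: the 25x30 downsampled image, its left-right mirror, and a
    noisy copy (the noise scale is an arbitrary placeholder).
    """
    small = img[::2, ::2].astype(float)   # crude 2x decimation to 25x30
    mirror = small[:, ::-1]               # mirror image
    noisy = small + np.random.default_rng(0).normal(0.0, 5.0, small.shape)
    return [v.ravel() for v in (small, mirror, noisy)]
```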

Table 1. Summary of Recognition Rates

  Task                    No. of Examples   No. of Features   Recognition Rate (%)   Recognition Rate (%)
                                                              (Training Set)         (Test Set)
  Face recognition        2000              4                 100                    99.2
  Gender classification   400               1                 100                    95

The first test was on gender classification, with use of a subset of the database containing multiple frontal views of 20 males and 20 females of different races. LDA was applied to the data, and the most discriminant template was extracted. Figure 7 shows this eigentemplate and the distribution of projection coefficients for all images in the set. As Fig. 7 shows, very good separation can be achieved with only one feature. Classification tests on a disjoint test set also gave 95% accuracy. Also, applying this discriminant template to a set of new faces from individuals outside the training set reduced the accuracy to ~92%.

Fig. 7. Distribution of feature points for male and female examples in the database.

As mentioned above, one can also apply LDA to wavelet transforms of face images, extract the most discriminant vectors of each transform component, and combine the multiscale classification results by using the proposed method of soft-decision integration.

We then applied LDA to a database of 1500 faces, with 60 classes corresponding to 60 individuals. Figure 8 shows the discriminatory power of the top 40 eigenvectors chosen according to PCA and LDA. As Fig. 8 depicts, the classification information of the principal components does not decrease monotonically with their energy; in other words, there are many cases in which a low-energy component has higher discriminatory power than a high-energy component. The figure also shows that the top few discriminant vectors from LDA contain almost all the classification information embedded in the original image space.

Fig. 8. Comparison of DP's of the top 40 selected eigenvectors based on PCA and LDA.

Figure 9 shows the separation of clusters for ten poses of four different individuals, obtained with use of the two most discriminatory eigenvectors, or eigenpictures. As Fig. 9 indicates, the differences among classes (individuals) are emphasized, while the variations of the same face in different poses are deemphasized. The separation is achieved despite all the image variations resulting from the various poses of each subject. Figure 10 shows the distribution of clusters for 200 images of 10 subjects in the best two-dimensional discriminant feature space and in the best two-dimensional PCA-based space.

Fig. 9. Separation of clusters in the selected two-dimensional feature space. Four clusters correspond to variations of the faces of four different subjects in the database.

For each test face example, we first projected it onto the selected eigenvectors and found the distance from the corresponding point in the four-dimensional feature space to all of the previously saved instances. All distances were measured according to Eq. (23), and the best match was selected. For the given database, excellent (i.e., 99.2%) accuracy was achieved on the test set.

To evaluate the generalization of the feature extraction

beyond the original training and test sets, we tested the classification results on pictures of new individuals, none of whom was present in the training set. Because of our limitations in terms of data availability, we could use only ten new subjects with three pictures per subject: one saved in the database as a template and two for testing. As expected, the application of the projection templates to these completely new faces resulted in a reduction in classification accuracy to ~90%.

This reduction was expected, considering the fact that we did not have a very large training set. Extracting discriminant facial features from a large training set with diverse examples should improve the generalization and performance of the system on recognition of subjects outside the training set.

The simplicity of our system, the size of the database, and the robustness of the results to small variations in pose and to noise show that our suggested scheme is a good alternative approach to face recognition. It provides highly competitive results at much lower complexity, with the use of low-dimensional features.



Fig. 10. Cluster separation in the best two-dimensional feature space. Top, based on LDA; bottom, based on PCA.

6. CONCLUSIONS

The application of LDA to study the discriminatory power of various facial features in the spatial and wavelet domains has been presented. Also, an LDA-based feature extraction for face recognition has been proposed and tested. A holistic projection-based approach to face feature extraction is taken, in which the eigentemplates are the most discriminant vectors derived from LDA of face images in a sufficiently rich database. The effectiveness of the proposed LDA-based features is compared with that of PCA-based eigenfaces. For classification, a variation of evidential reasoning is used, in which each projection becomes a source of discriminating information, with reliability proportional to its discrimination power. The weighted combination of the similarity or dissimilarity scores suggested by all projection coefficients is the basis for the membership values.

Several results on face recognition and gender classification are presented, in which highly competitive recognition accuracies are achieved with a small number of features. The feature extraction can also be applied to WT representations of images to provide a multiscale discriminant framework. In such cases the system becomes more complex, in exchange for improved separability and performance. The proposed feature extraction combined with soft classification seems to be a promising alternative to other face-recognition systems.

The support of this research by the Advanced Research Projects Agency (ARPA Order No. C635) and the U.S. Office of Naval Research under contract N00014-95-1-0521 is gratefully acknowledged.

REFERENCES

1. R. Baron, "Mechanisms of human facial recognition," Int. J. Man–Machine Studies 15, 137–178 (1981).

2. G. Davies, H. Ellis, and E. J. Shepherd, Perceiving and Remembering Faces (Academic, New York, 1981).

3. Y. Yacoob and L. S. Davis, "Computing spatio-temporal representations of human faces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1994), pp. 70–75.

4. R. Brunelli and T. Poggio, "HyperBF networks for gender classification," in Proceedings of the DARPA Image Understanding Workshop (Defense Advanced Research Projects Agency, Arlington, Va., 1992), pp. 311–314.

5. B. A. Golomb and T. J. Sejnowski, "SEXNET: A neural network identifies sex from human faces," in Advances in Neural Information Processing Systems 3, D. S. Touretzky and R. Lippmann, eds. (Morgan Kaufmann, San Mateo, Calif., 1991), pp. 572–577.

6. A. Samal and P. Iyengar, "Automatic recognition and analysis of human faces and facial expressions: a survey," Pattern Recog. 25, 65–77 (1992).

7. R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proc. IEEE 83, 705–740 (1995).

8. V. Govindaraju, S. N. Srihari, and D. B. Sher, "A computational model for face location," in Proceedings of the Third International Conference on Computer Vision (IEEE Computer Society Press, Los Alamitos, Calif., 1990), pp. 718–721.

9. A. Shio and J. Sklansky, "Segmentation of people in motion," in Proceedings of the IEEE Workshop on Visual Motion (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1991), pp. 325–332.

10. G. Yang and T. S. Huang, "Human face detection in a scene," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1993), pp. 453–458.

11. A. Pentland and B. Moghaddam, "Probabilistic visual learning for object detection," in Proceedings of the International Conference on Computer Vision (IEEE Computer Society Press, Los Alamitos, Calif., 1995), pp. 786–793.

12. K. Sung and T. Poggio, "Example-based learning for view-based human face detection," in Proceedings of the IEEE Image Understanding Workshop (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1994), pp. 843–850.

13. C. Wu and J. Huang, "Human face profile recognition by computer," Pattern Recog. 23, 255–259 (1990).

14. G. Gordon, "Face recognition based on depth maps and surface curvature," in Geometric Methods in Computer Vision, B. C. Vemuri, ed., Proc. SPIE 1570, 234–247 (1991).

15. O. Nakamura, S. Mathur, and T. Minami, "Identification of human faces based on isodensity maps," Pattern Recog. 24, 263–272 (1991).

16. Y. Cheng, K. Liu, J. Yang, and H. Wang, "A robust algebraic method for human face recognition," in Proceedings of the 11th International Conference on Pattern Recognition (IEEE Computer Society Press, Los Alamitos, Calif., 1992), pp. 221–224.

17. M. Kirby and L. Sirovich, "Application of the Karhunen–Loeve procedure for the characterization of human faces," IEEE Trans. Pattern Anal. Mach. Intell. 12, 103–108 (1990).

18. M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1991), pp. 586–591.

19. A. Pentland, B. Moghaddam, T. Starner, and M. Turk, "View-based and modular eigenspaces for face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1994), pp. 84–91.

20. L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of the human face," J. Opt. Soc. Am. A 4, 519–524 (1987).

21. I. Craw, D. Tock, and A. Bennett, "Finding face features," in Proceedings of the Second European Conference on Computer Vision (Springer-Verlag, Berlin, 1992), pp. 92–96.

22. B. S. Manjunath, R. Chellappa, and C. v. d. Malsburg, "A feature based approach to face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1992), pp. 373–378.

23. M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, C. v. d. Malsburg, and R. Wurtz, "Distortion invariant object recognition in the dynamic link architecture," IEEE Trans. Comput. 42, 300–311 (1993).

24. M. Seibert and A. Waxman, "Recognizing faces from their parts," in Sensor Fusion IV: Control Paradigms and Data Structures, P. S. Schenker, ed., Proc. SPIE 1616, 129–140 (1991).

25. A. Rahardja, A. Sowmya, and W. Wilson, "A neural network approach to component versus holistic recognition of facial expressions in images," in Intelligent Robots and Computer Vision X: Algorithms and Techniques, D. P. Casasent, ed., Proc. SPIE 1607, 62–70 (1991).

26. A. Yuille, D. Cohen, and P. Hallinan, "Feature extraction from faces using deformable templates," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1989), pp. 104–109.

27. D. Marr, Vision (Freeman, San Francisco, Calif., 1982).

28. R. R. Coifman and M. V. Wickerhauser, "Entropy based algorithms for best basis selection," IEEE Trans. Inf. Theory 38, 713–718 (1992).

29. S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989).

30. I. Daubechies, "Orthonormal bases of compactly supported wavelets," Commun. Pure Appl. Math. 41, 909–996 (1988).

31. A. O'Toole, H. Abdi, K. Deffenbacher, and D. Valentin, "Low-dimensional representation of faces in higher dimensions of the face space," J. Opt. Soc. Am. A 10, 405–410 (1993).

32. D. L. Swets and J. J. Weng, "SHOSLIF-O: SHOSLIF for object recognition (phase I)," Tech. Rep. CPS 94-64 (Michigan State University, East Lansing, Mich., 1994).

33. D. L. Swets, B. Punch, and J. J. Weng, "Genetic algorithms for object recognition in a complex scene," in Proceedings of the IEEE International Conference on Image Processing (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1995), pp. 595–598.

34. K. Fukunaga, Statistical Pattern Recognition (Academic, New York, 1989).

35. T. Lee, J. A. Richards, and P. H. Swain, "Probabilistic and evidential approaches for multisource data analysis," IEEE Trans. Geosci. Remote Sens. GE-25, 283–293 (1987).

36. P. L. Bogler, "Shafer–Dempster reasoning with applications to multisensor target identification systems," IEEE Trans. Syst. Man Cybern. 17, 968–977 (1987).

37. F. Samaria and A. Harter, "Parameterization of a stochastic model for human face identification," in Second IEEE Workshop on Applications of Computer Vision (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1994).

38. P. Rauss, P. J. Phillips, M. Hamilton, and A. T. DePersia, "FERET (Face Recognition Technology) Program," in 25th AIPR Workshop: Emerging Applications of Computer Vision, D. Schaefer and W. Williams, eds., Proc. SPIE 2962, 253–263 (1996).