
Journal of Experimental Psychology: Human Learning and Memory
1977, Vol. 3, No. 4, 406-417

When Face Recognition Fails

K. E. Patterson and A. D. Baddeley
Medical Research Council Applied Psychology Unit
Cambridge, England

Two studies investigated recognition of pictures of faces, focusing on the effects of changes in appearance of the face from presentation to test and type of processing or encoding. Experiment 1 demonstrated that (a) previously seen faces changed in pose and facial expression were discriminated from "new" faces essentially as well as pictures identical at presentation and test; (b) major changes in the appearance of a face ("disguises") reduced recognition almost to the level of chance; and (c) subjects encoding faces in terms of personality characteristics showed better recognition performance than subjects whose processing was based on physical, facial features. Experiment 2 expanded on result (b), utilizing photographs with systematic variations in pose and in the presence/absence of glasses, wig, and beard. The design required subjects to learn names for target faces and then to identify those targets in a series of test photographs. The manipulations of pose and disguising features produced effects on probability of identification that were orderly and dramatic in magnitude. Simple changes in appearance can effectively interfere with recognition of faces.

When you encounter a face you have seen before, rarely if ever will the appearance of that face as a visual pattern be identical on the two occasions. Despite this fact, and although memory for faces has recently enjoyed some increased popularity as a topic for investigation, most studies have dealt only with identical pictures of a face at presentation and at test. Galper and Hochberg (1971) did determine that memory for a picture of a face carries some information about expression; and a number of studies have looked at the effects of transforming faces via inversion or photographic negative (for a review, see Ellis, 1975). In general, however, there is little experimental evidence about the effects of realistic changes in appearance on recognition of faces. Experience tells us that some variations in appearance may go essentially unnoticed (e.g., a different facial expression or a small change in hair style), whereas other variations may produce at least a temporary failure of recognition, even in a person you have seen many times.

We wish to acknowledge the substantial assistance provided by D. C. V. Simmonds, whose excellent photography made these studies possible. Our thanks also to R. Milroy, who assisted in the analysis of Experiment 1, and to M. Woodhead, who participated in the planning of this research.

Requests for reprints should be sent to K. E. Patterson, MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, England.

Another issue in the study of memory for faces concerns the way in which the information in a face is encoded. Craik and Lockhart's (1972) influential paper on "levels of processing" has focused attention on this issue generally; Bower and Karlin (1974) have applied this notion specifically to memory for faces, showing that a judgment about the honesty or likeableness of the person represented in a picture leads to better recognition of that picture than a judgment about the sex of the person. A decision about honesty or likeableness presumably requires more extensive encoding of a face than does a decision about the sex of a person, and it is this deeper processing that is thought to underlie the superior memory performance. Warrington and Ackroyd (1975), in a somewhat different approach, have also shown effects on face recognition as a function of encoding task. Questions about type of processing take on added interest in the context of changes in the appearance of a face. Since a face will almost inevitably be somewhat different when you see a person for the second time, at least in terms of its expression or the angle from which you view it, some important similarity between the two experiences must exist for recognition to occur. If certain kinds of processing emphasize those aspects that are relatively invariant across alterations in appearance, then such processing might reasonably be expected to enhance facial recognition.

The two issues delineated above were translated into the following experimental questions for Experiment 1:

1. How is performance in a recognition task affected when a picture of a face is changed from presentation to test? Two types of change were employed: (a) The person's actual appearance remained the same, but the test picture presented him in a different pose and with a different facial expression from the original picture; (b) the person's appearance was altered by such features as a different hair style, an added (or removed) beard, added (or removed) glasses, and so forth. The prediction was that identical pictures at presentation and test would produce better recognition than pictures that differed in any way, and further that more dramatic alterations in appearance would produce larger decrements in performance. This prediction would seem to be logical whether one believes that recognition is based on reinstatement of the stimulus situation (e.g., Melton, 1963) or reinstatement of cognitive operations (e.g., Kolers, 1973).

2. How does recognition performance vary as a function of encoding strategy? Two types of processing were studied, emphasizing either facial features or personality characteristics of the person whose face was represented in the picture. Although Bower and Karlin (1974) have already demonstrated a levels-of-processing effect for faces, this question seems to warrant further attention for several reasons. First, it is frequently assumed (for example, in identification techniques used for criminal investigations, such as Identi-Kit or Photo-Fit) that specific physical features are critically involved in the perception and recognition of faces. Bower and Karlin's experiment, which compared judgments of sex to judgments about personality, does not address this assumption. The possibility remains that subjects instructed to focus on physical features (such as shape of the face, distance between the eyes) might recognize faces as well as, or better than, subjects whose encoding was based on personality judgments. Second, even if the deeper processing involved in personality judgments does yield superior recognition of identical pictures, it is possible that an encoding based on personality characteristics will be more vulnerable to changes in appearance than an encoding based on facial features.

Two additional variables were included in this experiment: (a) list length and (b) similarity between targets and distractors. Both are known to influence recognition performance with stimulus materials other than faces, and the latter has been shown to affect recognition of a single target face (Laughery, Fessler, Lenorovitz, & Yoblick, 1974). These factors consequently provide a basis for fitting our results into the context of previous findings on recognition memory.

Experiment 1

Method

Subjects. Subjects for this experiment were 36 female members of the Applied Psychology Unit subject panel, who were paid for their participation.

Materials. All pictures consisted of faces (or rather head and shoulders) of men, in black-and-white slide form. There were two broad categories of people: young enlisted men from the Royal Navy (who were photographed at the Applied Psychology Unit) and professional but little-known actors (whose pictures were taken from Spotlight, a book of actors' photographs published semiannually in the United Kingdom). For the pictures of sailors, though no attempt was made to select similar-looking individuals, the interitem similarity can be considered fairly high due to the following factors: (a) The age of the men in question varied only from about 17-28 years; (b) they were photographed all wearing an identical black sweater; (c) none of the men was photographed wearing glasses; (d) though hair style is not strictly defined by military regulations, this population certainly shows a narrower range of styles than the general population; and (e) none of the men had a beard or moustache. There were no such common factors among the photographs of actors, which were consequently considered of low interitem similarity.

Within each of these two categories, the stimuli can be further characterized by the relationship between the appearance of a face at presentation and its appearance at test. For the sailors, all presentation photographs had a full-face pose and an unsmiling expression. At test, the picture of a to-be-recognized sailor was either the same picture as at presentation or a picture with three-quarter-face pose and a smiling expression. These two types will be labeled identical and changed. It should be clear that the changed category involved no alteration in the features of a person's actual appearance, only a change in his pose and expression.

For the actors, the two categories of relationship between presentation and test pictures were identical and disguised. The former requires no explanation; the latter involved various kinds of alterations in appearance: changed hair style, addition (or removal) of beard and/or moustache, addition (or removal) of glasses. These so-called disguised pictures were available because some of the actors in the book used as our source provide two photographs, from different roles. No attempt was made to define the difference in appearance, either qualitatively or quantitatively; and the only selection criteria applied were the avoidance of (a) extreme, theatrical appearances that would stand out in a series of ordinary-looking people and (b) drastic alterations in appearance that made it difficult to determine that the two versions were in fact the same person, even when held side by side. This latter criterion should not, however, give the impression that we are dealing with small changes of detail: A man's appearance in disguised form was genuinely different from his appearance in the original photograph.

Design and procedure. The three variables of interest were processing instructions (two levels, between-subjects), list length (two levels, within-subjects), and stimulus type (four levels, within-subjects).

Processing instructions focused the subject's attention on either facial features or personality characteristics, with 18 subjects per condition. Subjects were required to rate each face on four 5-point scales relevant to either features or personality, the scales being identified by their end points. For the features condition, the rating scales included (a) small nose - large nose, (b) thin lips - full lips, (c) eyes close together - eyes far apart, and (d) round face - long face. For the personality condition, the rating scales were (a) nice - nasty, (b) reliable - unreliable, (c) intelligent - dull, and (d) lively - stolid. Subjects in both conditions were told that judgments of the kind they were to make might help them to recognize the faces on the test, and it was emphasized that these were subjective judgments with no "right" answers. The presentation series appeared on a screen, one face at a time, each for 28 sec; during that time, the subject rated the face on all four of the appropriate scales, by circling numbers from 1 to 5 on response sheets.

Each subject was presented with two lists or sets of faces, one consisting of 6 faces and the other of 24. There were two separate lists of 24 different faces, List A and List B, with Set Size 6 comprising a subset of Set Size 24. As is typical in this kind of design, some confounding was inevitable because if a subject received List A with Set Size 24, then she had to receive List B for Set Size 6; and further, if she had Size 24 first in the test session, then she had to have Size 6 second. There were thus four combinations (24-A followed by 6-B, 24-B followed by 6-A, 6-A followed by 24-B, and 6-B followed by 24-A), which when added to the variable of processing instructions (features or personality) yields eight testing groups. Subjects within one group were tested together.
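The counterbalancing scheme just described can be enumerated mechanically. The following is a minimal sketch, not part of the original report; the function and constant names are hypothetical and chosen only for illustration. It assumes, as the text states, that fixing the list and set size of the first block fully determines the second block.

```python
from itertools import product

LISTS = ("A", "B")
SIZES = (6, 24)
INSTRUCTIONS = ("features", "personality")


def session_orders():
    """Enumerate the possible (first block, second block) pairs.

    Each subject sees both lists and both set sizes, so choosing the
    list and size of the first block fixes the second block.
    """
    orders = []
    for first_list, first_size in product(LISTS, SIZES):
        second_list = "B" if first_list == "A" else "A"
        second_size = 6 if first_size == 24 else 24
        orders.append(((first_size, first_list), (second_size, second_list)))
    return orders


def testing_groups():
    """Cross the four session orders with the two processing instructions."""
    return list(product(INSTRUCTIONS, session_orders()))


if __name__ == "__main__":
    for instruction, (first, second) in testing_groups():
        print(f"{instruction:11s}: {first[0]}-{first[1]} followed by "
              f"{second[0]}-{second[1]}")
```

Crossing the four orders with the two instruction conditions gives the eight testing groups mentioned above.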

As described in the section on materials above, there were four types of stimuli, as defined by (a) the relationship between the appearance of a person's face at presentation and the appearance of his face at test and (b) the general similarity of the person's appearance to that of other faces in the list. The four types (in descending order of expected recognizability) were (a) identical (dissimilar), (b) identical (similar), (c) changed (similar), and (d) disguised (dissimilar). There are obviously some possible combinations missing from this set, but this was not intended to be a parametric study. Since changed faces came only from the similar set and disguised faces only from the dissimilar set, these types will be referred to simply as changed and disguised. It should perhaps be reiterated that reference to a picture at presentation as identical, changed, or disguised takes account of the subsequent appearance of that person at test. Subjects, of course, had no idea which stimulus pictures belonged to which of these categories. They were, however, instructed about the nature of the categories and were shown a sample study and test photograph for each category.

Stimulus type was a within-list variable. Each list or target set was composed in the following proportions: one sixth identical (dissimilar), one sixth identical (similar), one third changed, and one third disguised. Thus for Set Size 6, the number of stimuli representing these four types, respectively, was one, one, two, two. For Set Size 24, the corresponding numbers were four, four, eight, eight. The order of the stimuli within a list was randomized.
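The arithmetic behind these counts can be made explicit. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the proportions are those stated in the paragraph above.

```python
from fractions import Fraction

# Proportions of each stimulus type within a target set, as stated in the text.
PROPORTIONS = {
    "identical (dissimilar)": Fraction(1, 6),
    "identical (similar)": Fraction(1, 6),
    "changed": Fraction(1, 3),
    "disguised": Fraction(1, 3),
}


def targets_per_type(set_size):
    """Return the number of targets of each stimulus type for a set size."""
    counts = {}
    for name, share in PROPORTIONS.items():
        n = share * set_size
        if n.denominator != 1:
            # The proportions only yield whole faces for multiples of 6.
            raise ValueError(f"set size {set_size} does not divide evenly")
        counts[name] = int(n)
    return counts


if __name__ == "__main__":
    for size in (6, 24):
        print(size, targets_per_type(size))
```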

Test lists for the yes-no recognition test consisted of all of the target faces plus twice as many distractors. Thus, test lists for Set Size 6 contained 18 faces and for Set Size 24 contained 72 faces, with target probability constant at .33. Half of the distractors were faces from the similar set and the other half were of the dissimilar type. Further, half of the similar distractors were in full-face pose with unsmiling expression and the other half in three-quarter-face pose with smiling expression. This latter manipulation was necessary because targets of the identical (similar) type were full face and unsmiling at the test, whereas targets of the changed type were three-quarter face and smiling at test. The pictures for the test lists were arranged such that each half of each test list contained half of each type of target and half of each type of distractor; within this constraint, the order of pictures was randomized.
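The composition of a test list follows directly from the target set size. The following sketch is an illustration under the rules stated above (twice as many distractors as targets, distractors split evenly by similarity, similar distractors split evenly by pose); the function name and dictionary keys are hypothetical.

```python
def compose_test_list(set_size):
    """Count the items of each kind in a test list for a given target set size."""
    distractors = 2 * set_size              # twice as many distractors as targets
    similar = distractors // 2              # half from the similar (sailor) set
    dissimilar = distractors - similar      # half from the dissimilar (actor) set
    full_face = similar // 2                # full face, unsmiling
    three_quarter = similar - full_face     # three-quarter face, smiling
    total = set_size + distractors
    return {
        "targets": set_size,
        "distractors": distractors,
        "similar_full_face": full_face,
        "similar_three_quarter": three_quarter,
        "dissimilar": dissimilar,
        "total": total,
        "target_probability": set_size / total,
    }


if __name__ == "__main__":
    for size in (6, 24):
        print(compose_test_list(size))
```

For both set sizes the target probability comes out at one third, which the text reports as .33.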

Subjects were tested in eight small groups, half with four members and half with five members. Testing was carried out in a large, dimly lit room, with subjects seated in a row about 3-4 m from the screen on which the slides were projected. Before presentation of the first target list, instructions were given regarding the rating task (features or personality) and the number and nature of the faces to be studied for subsequent recognition. The slides were then shown (28 sec per face) and subjects carried out the rating task. The interval between presentation and test was approximately 4 min, during which time the rating sheets were collected, the recognition sheets provided, and the test instructions given. Subjects were told the length of the test list and were reminded that since a target would often reoccur with changed appearance, the decision must be "not have I seen this exact picture before, but rather have I seen a picture of this person before?" The recognition test consisted of a binary recognition decision plus a confidence rating. For each test picture, the subject circled either yes or no and circled one of the three words certain, probable, possible. The presentation rate of the test series was 10 sec per slide.

After the test on the first list, subjects were given a few moments to relax, and the experiment then proceeded to the second list. Subjects were informed that the second list would contain all new faces, that it would be different in length from the first list (6 if the first had been 24, or vice versa), and that in all other respects the procedure would be identical.

Results and Discussion

Recognition performance was analyzed in terms of hit rates, false-positive rates, and d' values. Because many of the relevant variables affected false-positive rates in addition to or instead of hit rates, a measure like d' that incorporates the two seems most appropriate. It should be mentioned that when the data were analyzed separately for the different types of stimulus faces (particularly with List Length 6), a number of subjects showed hit rates of either 1.0 or 0 for some of the types and occasionally even false-positive values of 1.0 or 0. This problem was handled by converting rates of 1.0 to .99 and rates of 0 to .01 and assigning d' accordingly.
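For readers unfamiliar with the measure, the sketch below shows one conventional way to compute d' from a hit rate and a false-positive rate, with the adjustment for extreme rates described above. The report does not spell out the formula used, so the equal-variance definition d' = z(hit rate) - z(false-positive rate) is an assumption here, and the function names are hypothetical.

```python
from statistics import NormalDist


def adjust_rate(rate):
    """Replace rates of 1.0 with .99 and rates of 0 with .01, as in the text."""
    if rate >= 1.0:
        return 0.99
    if rate <= 0.0:
        return 0.01
    return rate


def d_prime(hit_rate, false_positive_rate):
    """Equal-variance d' (assumed formula): z(hit) minus z(false positive)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(adjust_rate(hit_rate)) - z(adjust_rate(false_positive_rate))


if __name__ == "__main__":
    # A subject with a perfect hit rate and no false positives.
    print(round(d_prime(1.0, 0.0), 2))
    # A subject responding "yes" equally often to targets and distractors.
    print(round(d_prime(0.5, 0.5), 2))
```

Without the adjustment, a rate of exactly 1.0 or 0 has no finite z score, which is why some such convention is needed when individual cells contain only one or two targets.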

A general pattern of results will be sketched in first, followed by a closer look at each of the experimental questions. The data were subjected to a four-way analysis of variance, where the four factors were type of processing, list length, stimulus type, and a factor corresponding to the combination of list (A or B) and order (first or second in the test session). This latter factor was not significant in any of the analyses. Considering only d' for the moment, each of the other three factors yielded a significant main effect. Encoding faces in terms of personality characteristics produced better recognition performance (mean d' = 1.94) than encoding based on facial features (d' = 1.50), F(1, 32) = 7.28, p < .02. List Length 6 produced better discriminability (d' = 2.00) than List Length 24 (mean d' = 1.44), F(1, 32) = 23.9, p < .005. And stimulus type was a highly significant source of variance, F(3, 96) = 48.06, p < .005, with the following mean d' values: identical (dissimilar) (that is, where the target face was an actor and the photographs were identical at presentation and test) d' = 3.00; identical (similar) (photographs of sailors, unchanged from presentation to test) d' = 1.67; changed (photographs of sailors, changed in pose and facial expression between presentation and test) d' = 1.66; disguised (actors with real changes in appearance) d' = .58. Apart from the three significant main effects in the analysis of variance, there was one significant two-way interaction, between list length and stimulus type, F(3, 96) = 4.88, p < .01. An increase in list length produced lower performance for all stimulus types, but this decrement was larger for changed targets than for any of the other types. None of the remaining interactions was significant.

The hit rates, false-positive rates, and d' values for each combination of variables appear in Table 1. The experimental questions outlined in the introduction will now be taken up in turn.

Table 1
Recognition Performance in Experiment 1 as a Function of Stimulus Type, Encoding Strategy, and List Length (6 or 24 Items)

                       Identical        Identical
                       (dissimilar)     (similar)        Changed          Disguised
Encoding strategy      6       24       6       24       6       24       6       24       M

d'
  Features             3.07    2.46     1.71    1.36     2.12     .69      .11     .45     1.50
  Personality          3.41    2.92     1.80    1.80     2.55    1.30     1.22     .55     1.94
  M                        3.00             1.67             1.66              .58
Hit rate
  Features             1.00     .93      .94     .86      .81     .75      .36     .46      .76
  Personality          1.00    1.00      .89     .96      .92     .79      .53     .46      .82
  M                         .98              .91              .82              .45
False-positive rate
  Features              .31     .31      .59     .54      .29     .54      .31     .31      .40
  Personality           .22     .31      .48     .57      .35     .42      .22     .31      .36
  M                         .29              .55              .40              .29

How is performance in a recognition task affected when the picture to be recognized changes from presentation to test? The effects of a change from full-face unsmiling photograph at presentation to three-quarter-face smiling at test can be assessed by comparing identical (similar) and changed. Targets with changed pose and expression showed a lower overall hit rate (.82) than those tested with identical photographs (.91), t(96) = 2.55, p < .01. However, distractor items with pose and expression comparable to the changed targets also showed a much lower false-positive rate (.40) than distractors comparable to unchanged targets (.55), t(96) = 4.33, p < .001. The net result is that discriminability between targets and distractors was unaffected by this manipulation (mean d' = 1.67 vs. 1.66). This result implies that subjects did encode information about pose and expression (as concluded by Galper & Hochberg, 1971); but in this situation they did so only as a characteristic of targets in general rather than as part of the code for specific faces. Thus, if a test photograph had the same pose and expression as the original pictures, it was more likely to be called "old" whether it was a target or a distractor. This might seem a logical strategy in a design where pose and expression of all original pictures is similar, though subjects were instructed that a number of the targets would be changed on these dimensions between presentation and test.
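The way a parallel drop in hit rate and false-positive rate can leave discriminability roughly unchanged may be illustrated with the pooled rates quoted above. The sketch below is not the authors' computation: it assumes the equal-variance formula d' = z(hit rate) - z(false-positive rate) and applies it to group-level rates, whereas the reported means (1.67 and 1.66) are averages of d' values computed for individual subjects, so the absolute values will differ. The function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_positive_rate):
    """Equal-variance d' (assumed formula) from a hit and a false-positive rate."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_positive_rate)


# Pooled rates reported in the text for the two sailor conditions.
identical_similar = d_prime(0.91, 0.55)  # same photograph at test
changed = d_prime(0.82, 0.40)            # changed pose and expression at test

if __name__ == "__main__":
    print(f"identical (similar): {identical_similar:.2f}")
    print(f"changed:             {changed:.2f}")
    print(f"difference:          {identical_similar - changed:.2f}")
```

On these pooled figures the two conditions differ only slightly in d', even though both the hit rate and the false-positive rate fall substantially, which is the pattern the paragraph describes.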

By contrast, and perhaps not surprisingly, recognition performance was dramatically affected when test photographs of the targets included a changed hair style, added (or removed) beard, and so forth. The relevant data here come from identical (dissimilar) and disguised. The false-positive rate for these two conditions is the same, since in this particular experimental design it was not possible to have separate sets of distractors for these two classes of targets. The overall hit rate dropped from .98 to .45 in response to the disguise manipulation; d' dropped from a mean of 3.00, which represents very good discriminability, to .58, which is approaching chance performance. Disguises thus interfered quite effectively with recognizing faces, even in a situation in which subjects had a reasonably long time for original encoding of the faces (28 sec each).

How does recognition performance vary as a function of encoding strategy? The mean hit rate for subjects who judged facial features (combined over all stimulus types and both list lengths) was .76, while the hit rate for subjects making personality judgments was .82. The superiority of the personality group falls just short of statistical significance, F(1, 32) = 3.32, .05 < p < .10. The personality group also showed a somewhat lower false-positive rate than the feature group (.36 vs. .40), which (while a nonsignificant difference on its own) when combined with hit rates to yield d' values produced the significant processing effect referred to above (d' = 1.94 vs. 1.50). Making judgments about the personality of a person in a photograph thus resulted in somewhat better ability to discriminate between targets and distractors than did analysis of the person's physical facial features.

This levels-of-processing effect is not as striking in magnitude as that obtained by Bower and Karlin (1974), which is understandable since specific feature judgments must involve more extensive processing of a face than a decision about the sex of a person. The important point is that the basic pattern of results is in agreement with Bower and Karlin's finding; our results clearly do not implicate analysis of facial features as a critical or optimal basis for face recognition. Finally, it is not really possible to evaluate our tentative notions about type of encoding in the context of changes in appearance; one type of change produced negligible effects on recognition performance, whereas the other virtually eliminated recognition. Given the lack of a statistical interaction between type of processing and stimulus type, we can probably abandon the notion that feature processing might subserve better recognition of faces with changed appearance.

We have interpreted the difference between personality and feature encoding in terms of deeper processing engendered by the former. It is possible, however, that this difference would be better conceptualized by an alternative (though not incompatible) notion: Perhaps personality ratings directed attention to the face as a whole, whereas feature ratings emphasized isolated segments. The whole-versus-parts explanation seems unlikely to suffice, given that faces are probably seen as integrated units whatever the instructions, and given also that face shape (one of the feature ratings) seems something more than an isolated segment. But we acknowledge that the critical difference between our two encoding groups remains unspecified.

How is facial recognition affected by variables known to influence performance with other sorts of materials? Increasing list length, for example, produces lower levels of recognition performance with words (Murdock & Anderson, 1975). In the present study, lists of 24 faces showed a slightly lower hit rate (.78) than lists of 6 (.81), but this difference failed to reach significance, F(1, 32) = 1.72, p > .10. On the other hand, list length was a significant source of variance both for false-positive rate (.41 for 24 vs. .35 for 6), F(1, 32) = 7.44, p < .02, and, as already mentioned, for d'. Increasing the size of the target set thus produced a reliable decrease in subjects' ability to discriminate between previously seen and new faces.

The extent to which target and distractor items are similar or belong to common categories is known to influence both recognition of words (Mandler, 1972; Rabinowitz, Mandler, & Patterson, in press) and, more germane to the present context, recognition of complex pictures (Mandler & Johnson, 1976). The effect of interitem similarity here can be assessed by comparing the identical (dissimilar) and identical (similar) conditions. There was a marginal difference between hit rates (probably attenuated by a ceiling effect), .98 for actors versus .91 for sailors, t(96) = 1.84, p < .05. A very substantial difference in false-positive rates (.29 for dissimilar vs. .55 for similar), however, when combined with the difference in hit rates, indicates that discrimination between old and new faces was substantially less accurate for similar photographs (mean d' = 1.67) than for dissimilar photographs of faces (d' = 3.00). As a cautionary note to this conclusion, it should be emphasized that no ratings of similarity were taken here. Our classification rests merely on the assumption that relative homogeneity of age, hair style, dress, presence of accessories (glasses, beard, etc.), and general style of photographs (lighting, pose, etc.) yields higher interitem similarity than relative heterogeneity of these factors. The possibility remains that some other difference between the photographs of sailors and those of actors might alternatively or additionally account for the obtained effect. Similarity seems the most plausible explanation, however, especially since Laughery et al. (1974) have found strong similarity effects in a series of studies requiring recognition of a single target face.
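For readers unfamiliar with d', the following minimal Python sketch shows how a hit rate and a false-positive rate combine into a single discrimination index. It assumes the standard equal-variance Gaussian signal-detection model; the article does not spell out the formula it used, and the function name `d_prime` is illustrative only.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_positive_rate):
    """Equal-variance Gaussian d': z(hit rate) minus z(false-positive rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_positive_rate)


# Group mean rates reported in the text for the two kinds of photographs.
dissimilar = d_prime(0.98, 0.29)  # actors
similar = d_prime(0.91, 0.55)     # sailors
print(f"dissimilar: {dissimilar:.2f}, similar: {similar:.2f}")
```

Applied to the group mean rates, this gives roughly 2.6 and 1.2. These values need not match the reported mean d' values (3.00 and 1.67), which were presumably averaged over the d' scores of individual subjects rather than computed from pooled rates; the ordering of the two conditions is the same either way.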

The possibility has often been raised that faces may be handled by the cognitive system in a way that is physiologically or strategically distinct from the processing of other sorts of material (see Ellis, 1975, for a review). This experiment does not, nor could it, offer any conclusive evidence on this issue. What it does show is that in terms of the vulnerability of memory to the influence of variables like list length and similarity, faces do not appear to belong in a class by themselves.

A few additional aspects of the data should be mentioned. First, there was a noticeably high overall rate of false positives. Although the ratio of distractors to targets in the test lists was 2:1, and was specified as such in the test instructions, the subjects seemed to be operating on closer to a 1:1 response criterion. We had selected a target probability of .33 as a compromise between the standard figure of .5, which seems so far removed from the frequency of recognizing people outside the experimental situation, and the procedure used by Laughery and his co-workers (e.g., Laughery, Alexander, & Lane, 1971) of embedding one target face in a long test list, which is more realistic but also more expensive as a data-gathering procedure.

Confidence ratings in recognition were collected and analyzed but do not contribute much information on the questions of interest. Considering possible as 1, probable as 2, and certain as 3, we found the typical overall difference between mean confidence of hit responses (2.49) and false-positive responses (2.04), indicating that subjects have more information in a recognition task than is revealed by a simple yes-no decision. The only experimental variable that produced notable effects on confidence ratings was disguise: Mean confidence of hits on identical actors was 2.90, whereas confidence for hits on disguised actors was 2.02. The latter figure is indeed scarcely higher than the mean confidence of false-positive responses to actors, 1.90. In other words, subjects not only failed to recognize many targets of the disguised set but also expressed low confidence in their responses when they did correctly recognize targets of this type.

Finally, with regard to generality of the obtained results, while the subjects described here were all female (though with a wide range of ages and backgrounds), the experiment has also been carried out with 16 young enlisted men as subjects. Their data have not been fully analyzed and reported because unfortunately some of the test subjects knew some of the photographic subjects. Enough data were obtained in this replication, however, to establish that the results with male subjects did not differ in any important way from the pattern delineated here.

Experiment 2

Experiment 1, while showing that disguises are effective, does not permit conclusions about the sort of change that produces an effective disguise. The two different photographs of an actor often involved concurrent changes in many factors, such as lighting, makeup, even age (that is, in some cases the two photographs had been taken several years apart), as well as hair styles, beards, and so forth. Experiment 2 was designed as a first step in evaluating the basis of an effective disguise and involved production of our own photographs, so that specific aspects of appearance could be systematically altered.

Experiment 2 also provided a vehicle for incorporating another, and largely ignored, aspect of face recognition. Virtually all studies in the literature have asked the subject to answer the question "Have you seen this face before?" When recognizing faces outside the laboratory, however, you do not determine merely that you have seen a face before; you decide whose face it is. In this experiment, we switched from a recognition paradigm to an identification procedure, where subjects learned names for target faces; at test they responded "no" to an unfamiliar face and responded with a name for a familiar face.

Method

Subjects. Sixty-two female members of the Applied Psychology Unit's subject panel participated in this experiment; none had been tested in Experiment 1.

Materials. A series of photographs was taken of each of 10 men, 5 enlisted men from the Royal Navy (all approximately in their early 20s) and 5 members of a local amateur dramatics group (ranging in age from early 30s to mid-60s). The series consisted of three different views or poses (full face, three-quarter face, profile; all head and shoulders only) in each of eight versions of appearance: (a) natural, (b) with glasses (if a man normally wore glasses, they were removed for the "natural" version), (c) with a wig, (d) with glasses and wig, (e) with a beard, (f) with glasses and beard, (g) with wig and beard, (h) with glasses, wig, and beard. Pose and version combined thus yielded 24 photographs of each man, which were made into black and white slides. Five wigs were hired from a theatrical wardrobe supplier, and the most appropriate-looking one (in terms of color, hair length) was selected for each man. Beards (which always included both beard and moustache) were created individually for each man out of commercially obtained crepe hair, attached by a combination of spirit gum and double-sided tape; again, in terms of color and style, the attempt was to produce as normal an appearance as possible. The general style of all photographs was the same with regard to lighting, background (plain white), and facial expression (unsmiling).
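The 3 × 8 structure of each man's photograph series can be enumerated mechanically. The Python sketch below is only an illustration of the design as described; the names `FEATURES`, `POSES`, `appearance_versions`, and `photo_series` are hypothetical and not part of the original study.

```python
FEATURES = ("glasses", "wig", "beard")
POSES = ("full face", "three-quarter", "profile")


def appearance_versions():
    """Return the 8 versions as tuples of the features present; () is 'natural'.

    Counting in binary with glasses as the lowest bit reproduces the order
    (a) through (h) given in the text.
    """
    versions = []
    for code in range(2 ** len(FEATURES)):
        present = tuple(f for bit, f in enumerate(FEATURES) if (code >> bit) & 1)
        versions.append(present)
    return versions


def photo_series():
    """All (pose, version) combinations photographed for one man."""
    return [(pose, version) for pose in POSES for version in appearance_versions()]


versions = appearance_versions()
series = photo_series()
print(len(versions), len(series))  # 8 24
```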

Design. Two target sets (A and B) of five men each were created from the total set of ten; A included three sailors and two actors, B was comprised of two sailors and three actors. For each target, one of the eight versions of appearance was selected at random to be the study version; within each target set, however, this selection was without replacement. These study photographs were all in full-face view. Subjects studied either Set A or B, and were "introduced" to the five men, each with a name. The same five names, selected from a telephone directory, were used for both target sets: Gordon Davis, Andrew Williamson, Duncan Harvey, Howard Foster, Peter Jessop. Subjects saw the five study photographs several times and learned the name for each.

The recognition, or rather identification, test presented a series of 80 slides, consisting of all 8 versions of all 10 men. For subjects who had studied Target Set A, then, the 40 pictures of Target Set B served as distractors, and vice versa. In the case of both targets and distractors, half of the test photographs were in three-quarter view and half in profile view. Thus, although subjects saw 8 test photographs of each target face they had studied, 1 of which was the same version of appearance as the study photograph, none of the actual study photographs (all full face) appeared in the test series. Pose was combined with version for each man by assigning either three-quarter or profile to the "natural" version and then alternating between the 2 views in regular progression through the 8 versions. For the test, two different random orders of the 80 slides were prepared, with the following restrictions: (a) Two photographs of the same man did not appear in immediate succession; (b) each half of the test list contained roughly half of the pictures of each man, the split being either 3-5, 4-4, or 5-3. Each of the two test orders was given to half of the subjects who had studied Target Set A and half of those studying Set B.
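The construction of a restricted random test order can be illustrated with a short Python sketch. This is not the procedure the authors used (the article does not say how the orders were prepared); it is one hypothetical way to satisfy the two stated restrictions. The random choice of starting pose for each man, the two-stage sampling, and all function names are assumptions made for illustration.

```python
import random

MEN = range(10)                      # 10 photographed men
POSES = ("three-quarter", "profile")


def build_slides(rng):
    """80 slides: all 8 versions of each man, poses alternating through the versions."""
    slides = []
    for man in MEN:
        start = rng.randrange(2)     # assumption: starting pose picked at random
        for version in range(8):     # version 0 stands for "natural"
            slides.append((man, version, POSES[(start + version) % 2]))
    return slides


def no_immediate_repeat(order):
    """Restriction (a): the same man never appears on two successive slides."""
    return all(a[0] != b[0] for a, b in zip(order, order[1:]))


def halves_balanced(order):
    """Restriction (b): each half holds 3, 4, or 5 of every man's 8 slides."""
    first_half = order[: len(order) // 2]
    return all(3 <= sum(1 for s in first_half if s[0] == man) <= 5 for man in MEN)


def make_test_order(rng):
    slides = build_slides(rng)
    # Stage 1: decide how many of each man's slides go in the first half.
    while True:
        counts = [rng.choice((3, 4, 5)) for _ in MEN]
        if sum(counts) == 40:
            break
    first, second = [], []
    for man, k in zip(MEN, counts):
        own = [s for s in slides if s[0] == man]
        rng.shuffle(own)
        first.extend(own[:k])
        second.extend(own[k:])
    # Stage 2: reshuffle within halves until restriction (a) holds throughout.
    while True:
        rng.shuffle(first)
        rng.shuffle(second)
        order = first + second
        if no_immediate_repeat(order):
            return order


order = make_test_order(random.Random(1977))
print(len(order), no_immediate_repeat(order), halves_balanced(order))  # 80 True True
```

Fixing the split between halves first and then reshuffling within halves keeps the number of rejected shuffles small; shuffling all 80 slides and rejecting any order that violates either restriction would also work but would need many more attempts.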

Procedure. Subjects were tested in four groups (of approximately 15 each) corresponding to the four possible combinations of target set and test order. Subjects were first instructed that they were to learn names for five men; the names were on a blackboard at the front of the room, so that performance would not suffer from an inability to retrieve a name. The series of five target faces was then shown a total of six times, in different random orders, with each face on view for a total of about 45 sec over the six exposures. The first two times through the set, subjects sat silently while the experimenter called out the name for each target; for the third and fourth exposures of the set, subjects were asked to call out the names; on the fifth run, subjects were instructed to name each face silently; and the final exposure was like the first two.

Before the main identification test, subjects were given a preliminary test to assess their ability to name the actual study photographs. A sequence of 10 slides was shown (each target appearing twice, pictures identical to those studied) in a random order so that, at least until the 10th, identification could not be done purely by elimination. Subjects wrote a name for each photograph on a sheet of paper numbered 1-10.

For the main test, subjects were given a small 8-page booklet, each page having 10 lines; on each line was printed the five full names plus the word none. Instructions described the nature of the photographs, including the facts that they would be three-quarter and profile poses instead of full face, that the appearance of the target faces had been changed in various ways, and that there would be multiple photographs of both targets and distractors. For each slide, subjects were to underline one of the five names or the word none on the appropriate line.

The timing of the test was designed to provide a short exposure time for the faces without putting unnecessary time pressure on responding. The 80 test slides were arranged in two Carousel magazines with a blank after each slide. A timer attached to the projector exposed each test slide for a duration of 2 sec from the onset of the slide to onset of the following blank (about 1.3 sec actual viewing time); after presentation of each slide, the timer was switched off and subjects were given a comfortable 6 sec (approximately) in which to make their responses. At the end of the response period, the experimenter called out "ready" to ensure that attention would be on the screen when the next slide appeared.

Figure 1. Proportion of correct identifications as a function of changes in pose, wig, and beard. (Horizontal axis: pose, three-quarter versus profile.)

Data analysis. Four of the 62 subjects had to be discarded for failure to carry out the procedure properly; 1 never used the response none, and the remaining 3 left too many items blank to provide reasonable data. Of the remaining 58 subjects, 45 performed perfectly (10/10) on the preliminary identification of study photographs. Although it makes for a rather high rate of attrition, we decided to include only these 45 subjects in the analysis of the main identification test. If a subject could not identify a target from an identical and thoroughly studied photograph, interpretation of the effects of changes in appearance and pose on identification would be considerably more ambiguous.

Due to the rather small number of targets, our strategy was to analyze the effects of change in each element (glasses, beard, wig) and the various combinations of elements without regard to the direction of change (adding vs. removing). With this approach, all target faces contribute to evaluating the effect of each type of change.

Results and Discussion

The data on probability of correct identification of target faces were subjected to a five-way analysis of variance, with the following five factors: pose (three-quarter or profile), glasses, wig, beard (unchanged or changed, for each of these three), and group (the four combinations of two different target sets and two different test-list orders). The effect of this last factor did not approach significance (F < 1), and consequently all data presented will be combined for the two target sets and the two test orders. Three of the four main variables produced large and straightforward effects; the fourth, glasses, had a rather inconsistent effect that tends to complicate a picture of surprising simplicity in other respects.

Concentrating on that simplicity for the moment, Figure 1 presents the probability of correct identification as a function of pose, wig, and beard. The data here have been combined over the glasses variable, such that the top line reflects performance on photographs with no change in appearance plus those with a change in glasses, the next line includes both change in wig only and change in wig and glasses, and so on. Figure 1 shows that the changes in appearance and pose investigated here produced dramatic variations in level of performance, from 90% correct identification under the best conditions to 30% under the worst. The major results, illustrated by Figure 1 and substantiated by the analysis of variance, are: (a) The probability of correctly identifying a target in three-quarter pose (overall mean = .65) was consistently and significantly greater than for profile pose (mean = .50), F(1, 41) = 46.79, p < .001. (b) Probability of identifying a target was substantially higher with unchanged hair style (overall mean = .67) than when a wig had been added or removed (mean = .49), F(1, 41) = 68.52, p < .001. (c) For the largest main effect, photographs unchanged in presence or absence of a beard were much easier to identify (.69) than those changed on the beard variable (.47), F(1, 41) = 94.42, p < .001. (d) None of the interactions among these three variables was statistically significant.

Of the 24 possible interactions in a five-way analysis, 3 did reach significance. A two-way interaction of pose and group, F(3, 41) = 8.26, p < .01, indicates that the decrease in correct identification from three-quarter view to profile was stronger for Target Set A than for Set B. The other significant interactions were both three-way: a substantial Pose × Glasses × Beard effect, F(1, 41) = 11.74, p < .001, and a marginal Glasses × Wig × Beard effect, F(1, 41) = 4.47, p < .05. Both of these interactions involve glasses, which we shall now consider.

A more complete version of the results is provided in Table 2, in which the effect of the glasses variable can be inspected by comparing vertical pairs of values in the first two columns of the table. First, it should be stated that there was a significant main effect of glasses: Target photographs unchanged on this variable were identified more readily (overall mean = .61) than those with a change in glasses (.54), F(1, 41) = 13.71, p < .001. As Table 2 and the statistical interactions reveal, however, while the effect of glasses may have been significant, it was not consistent. For example, a change in glasses alone produced no effect at all for three-quarter view but a substantial reduction in identification for profile. And while adding a change in glasses to a change in either beard alone or wig and beard caused a further decrement in performance, a parallel decrement did not occur when a change in glasses accompanied a change in wig.

Table 2 also records the number of observations on which the data on correct identification are based. The combination of two design features, (a) systematic alternation between three-quarter and profile as each target changed from normal appearance through glasses-wig-beard and (b) random selection of the study version of appearance for each target, yielded an unequal number of observations for the various conditions. The total number for each version of appearance, that is with three-quarter and profile combined, was always 225, corresponding to five targets for 45 subjects.

Table 2
Proportion of Correct Identifications and Number of Observations on Which They Are Based in Experiment 2

                          Mean proportion            No. observations
Features changed          Three-quarter   Profile    Three-quarter   Profile
None                      .889            .796       112             113
Glasses only              .889            .563       136              89
Wig only                  .644            .511        90             135
Glasses + wig             .671            .516        69             156
Beard only                .673            .493       156              69
Glasses + beard           .563            .467       135              90
Wig + beard               .519            .339        89             136
Glasses + wig + beard     .385            .293       113             112
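As a consistency check on Table 2, the marginal means quoted in the text can be approximated by averaging the cell proportions. The Python sketch below assumes simple unweighted averaging over cells, which is an assumption on our part (the reported means come from the analysis of variance, not from this calculation); the dictionary and function names are illustrative.

```python
# Cell values transcribed from Table 2.
# Key: features changed; value: (three-quarter, profile).
PROPORTION = {
    (): (.889, .796),
    ("glasses",): (.889, .563),
    ("wig",): (.644, .511),
    ("glasses", "wig"): (.671, .516),
    ("beard",): (.673, .493),
    ("glasses", "beard"): (.563, .467),
    ("wig", "beard"): (.519, .339),
    ("glasses", "wig", "beard"): (.385, .293),
}
N_OBS = {
    (): (112, 113),
    ("glasses",): (136, 89),
    ("wig",): (90, 135),
    ("glasses", "wig"): (69, 156),
    ("beard",): (156, 69),
    ("glasses", "beard"): (135, 90),
    ("wig", "beard"): (89, 136),
    ("glasses", "wig", "beard"): (113, 112),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pose_means():
    """Unweighted mean over the eight versions, separately for each pose."""
    cells = PROPORTION.values()
    return mean(c[0] for c in cells), mean(c[1] for c in cells)


def feature_means(feature):
    """Unweighted means over cells with `feature` unchanged versus changed."""
    unchanged = [p for key, cell in PROPORTION.items() if feature not in key for p in cell]
    changed = [p for key, cell in PROPORTION.items() if feature in key for p in cell]
    return mean(unchanged), mean(changed)


print("pose:", [round(m, 3) for m in pose_means()])
for feature in ("glasses", "wig", "beard"):
    print(feature + ":", [round(m, 3) for m in feature_means(feature)])
print("every row totals 225:", all(a + b == 225 for a, b in N_OBS.values()))
```

Under this assumption the averages fall within about .01 of the means reported in the text for pose, glasses, wig, and beard, and each row of observation counts sums to 225.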

We have presented data only for correct identifications, and some other general characteristics of performance should be mentioned at least briefly. Failure to identify a target photograph correctly could constitute either a misidentification (a wrong-name response) or a miss (a "none" response). Both types of errors occurred, though misses were twice as frequent (.28 overall) as misidentifications (.14). For distractor photographs, the overall false-positive rate (assigning a target name to a distractor) was .30, and the remaining 70% of distractors were correctly called "none." Anecdotally, it might be noted that false-positive and misidentification responses to certain distractors and targets showed a striking degree of intersubject concordance, suggesting that the sort of misidentification (even by multiple witnesses) that sometimes occurs in criminal cases may be reproducible in the laboratory.

Regarding the effects of pose, the results of Experiments 1 and 2 together suggest that when a face has been studied in full-face view, a change to three-quarter view has little if any effect on recognition, whereas a change to profile view makes recognition more difficult. Changing pictures from full face to three-quarter did affect performance in Experiment 1, but it was an effect on criterion rather than discriminability: Subjects were generally more likely to respond "old" to full-face pictures in the recognition test. No full-face views were included in the test in Experiment 2, so this kind of response bias did not occur. In fact, there was no criterion difference as a function of pose in Experiment 2: The mean false-positive rates for distractor items in three-quarter and profile views were .30 and .29, respectively. In the absence of any changes of appearance in Experiment 2, three-quarter view targets were correctly identified with a probability of .9, little lower than the perfect identification of full-face targets on the preliminary naming test. Profile view, on the other hand, had a consistently detrimental effect on identification in Experiment 2. Previous work on this question by Laughery et al. (1971) had shown only a nonsignificant trend toward lower hit rates for profile as compared with three-quarter view.

Regarding the success of disguises, Experiment 2 corroborates the result of Experiment 1, and extends it by demonstrating that the effectiveness of a disguise does not depend upon complex changes in appearance or style of photograph. Simple addition or removal of various elements of appearance can have substantial effects on the probability of identifying faces that are readily identifiable without such changes.

Concluding Comments

For a summary of the information obtained about face recognition from these experiments, we return for a third and final time to the main issues raised in the introduction to this article.

How is recognition affected by changes in appearance? Experiment 1 gave the not surprising pair of answers that performance was very little affected by small changes in pose and expression but was seriously impaired by major changes in appearance. Experiment 2 explored disguises in a more analytical and controlled fashion; and while its results, too, are congruent with common-sense expectations, the orderliness and magnitude of the effects seem rather impressive. Recall that subjects (a) studied each of a small number of targets extensively, (b) could at least initially identify those targets perfectly from the study photographs, (c) knew that each target would appear a number of times in the identification test, and (d) knew that the target photographs in the test would be changed in pose and appearance. Under such circumstances, and where there is no penalty for false positives, one might expect subjects to adopt a strategy of "call it Howard Foster if it looks even faintly like Howard Foster." True, no such criterion-biasing instructions were given. Yet it seems quite dramatic that if a particular full-face picture of Howard Foster can be reliably identified, a change to three-quarter pose and a single change in either hair style or beard should reduce the probability of identification to around .65; a change in glasses, wig, and beard (still three-quarter face) should bring it down to .385; and the latter combination of changes presented in profile pose should pull it yet further down to .29, a figure essentially equal to the false-positive rate.

How is recognition affected by encoding strategy? In Experiment 1, encoding a face in terms of inferred personality characteristics produced better performance than encoding in terms of facial features. As already noted, this levels-of-processing effect was somewhat smaller than that shown by Bower and Karlin (1974); yet there is a sense in which the current result seems more striking. Bower and Karlin discussed the idea that their relatively shallow processing task (judging a pictured face to be that of a male or a female) might be based on a single salient characteristic (e.g., presence of a necktie or of cosmetic makeup). The deeper processing involved in a judgment of honesty, on the other hand, would probably "require comparison to an idiosyncratic set of vague prototypes or criteria, regarding the patterning of features such as distance between the eyes, size of eyes, size of pupils, curvature of mouth, thickness of lips, and so on" (Bower & Karlin, 1974, p. 756). The intriguing thing about the present result is that when subjects were asked to regard just such features as these, but without reference to personality characteristics, the resulting memory performance was reliably (though not greatly) inferior to recognition following personality judgments. Further, the features we selected are among the putative set of features critically involved, according to an approach like Identi-Kit or Photo-Fit, in face recognition. Our results therefore suggest that, should we ever find an optimum strategy for encoding faces, analysis of individual features is unlikely to be its focus.

References

Bower, G. H., & Karlin, M. B. Depth of processing pictures of faces and recognition memory. Journal of Experimental Psychology, 1974, 103, 751-757.

Craik, F. I. M., & Lockhart, R. S. Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 671-684.

Ellis, H. D. Recognizing faces. British Journal of Psychology, 1975, 66, 409-426.

Galper, R. E., & Hochberg, H. Recognition memory for photographs of faces. American Journal of Psychology, 1971, 84, 351-354.

Kolers, P. A. Remembering operations. Memory & Cognition, 1973, 1, 347-355.

Laughery, K. R., Alexander, J. F., & Lane, A. B. Recognition of human faces: Effects of target exposure time, target position, pose position, and type of photograph. Journal of Applied Psychology, 1971, 55, 477-483.

Laughery, K. R., Fessler, P. K., Lenorovitz, D. R., & Yoblick, D. A. Time delay and similarity effects in facial recognition. Journal of Applied Psychology, 1974, 59, 490-496.

Mandler, G. Organization and recognition. In E. Tulving & W. Donaldson (Eds.), Organization of memory. New York: Academic Press, 1972.

Mandler, J. M., & Johnson, N. S. Some of the thousand words a picture is worth. Journal of Experimental Psychology: Human Learning and Memory, 1976, 2, 529-540.

Melton, A. W. Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 1-21.

Murdock, B. B., Jr., & Anderson, R. E. Encoding, storage, and retrieval of item information. In R. L. Solso (Ed.), Information processing and cognition: The Loyola symposium. Potomac, Md.: Erlbaum, 1975.

Rabinowitz, J. C., Mandler, G., & Patterson, K. E. Determinants of recognition and recall: Accessibility and generation. Journal of Experimental Psychology: General, in press.

Warrington, E. K., & Ackroyd, C. The effects of orienting tasks on recognition memory. Memory & Cognition, 1975, 3, 140-142.

Received January 19, 1977