Constructing faces from memory: the impact of image likeness and prototypical representations


Charlie D. Frowd (1*)

David White (2)

Richard I. Kemp (2)

Rob Jenkins (3)

Kamran Nawaz (4)

Kate Herold (4)

(1) Department of Psychology, University of Winchester, Winchester SO22 4NR, UK

(2) School of Psychology, University of New South Wales 2052 Australia

(3) Department of Psychology, University of Glasgow G12 8QQ UK

(4) School of Psychology, University of Central Lancashire PR1 2HE UK

* Corresponding author: Charlie Frowd, Department of Psychology, University of

Winchester, Winchester SO22 4NR, UK. Email: [email protected].

Phone: (01962) 624943.

Running head: Pictorial influences on face construction

Research suggests that memory for unfamiliar faces is pictorial in nature, with

recognition negatively affected by changes to image-specific information such as head

pose, lighting and facial expression. Further, within-person variation causes some

images to resemble a subject more than others. Here, we explored the impact of

target-image choice on face construction using a modern evolving type of composite

system, EvoFIT. Participants saw an unfamiliar target identity and then created a

single composite of it the following day with EvoFIT by repeatedly selecting from

arrays of faces with ‘breeding’, to ‘evolve’ a face. Targets were images that had been

previously categorised as low, medium or high likeness, or a face prototype

comprising averaged photographs of the same individual. Identification of

composites of low likeness targets was inferior but increased as a significant linear

trend from low to medium to high likeness. Also, identification scores decreased

when targets changed by pose and expression, but not by lighting. Similarly,

composite identification from prototypes was more accurate than that from low

likeness targets, providing some support that image averages generally produce more

robust memory traces. The results emphasise the potential importance of matching a

target’s pose and expression at face construction, and of obtaining image-specific

information for construction of facial-composite images, a result that would appear to

be useful to developers and researchers of composite software.

(224 words.)

Originality: The current project is the first of its kind to formally explore the

potential impact of pictorial properties of a target face on identifiability of faces

created from memory. The design followed forensic practices as far as is practicable,

to allow good generalisation of results.

Witnesses and victims of crime often work with a forensic practitioner to

produce a likeness of an offender's face from memory. The resulting image is used by

law enforcement to generate new lines of enquiry in the hope of identifying the

offender. There are two contrasting types of software system employed to recover

these so-called ‘composite’ images from memory: observers construct a face by

selecting individual facial features—eyes, nose, hair, mouth, etc.—or they select

whole faces from arrays of alternatives, with ‘breeding’, to ‘evolve’ a face (for a

review of computerised and non-computerised methods, see Frowd, Carson, Ness,

Richardson et al., 2005). Considerable research effort has been expended over the

past four decades to understand these methods, identify their strengths and

weaknesses, and make improvements (for a review, see Frowd, 2012).

To test the effectiveness of composite-construction systems, a design is

normally used that simulates the applied context. Participants inspect a photograph of

a target person who is unfamiliar to them, and construct a composite of that person’s

face. Subsequently, another group of participants who are familiar with the target

attempt to recognise the constructed composite (e.g. Brace, Pike & Kemp, 2000;

Frowd, Carson, Ness, McQuiston et al., 2005; Valentine et al., 2010).

Image choice is an important consideration for memory research using

unfamiliar-face stimuli because superficial differences between target and

test images can have a significant effect on memory accuracy (e.g. Bruce, 1982).

When creating composites of target faces, the salience of these superficial pictorial

properties, resulting from transient environmental variables, is likely to reduce their

effectiveness. For example, Figure 1B shows a selection of photographs sourced from

Google Image. These images vary with respect to a range of environmental factors

including head pose, expression, head angle, lighting, camera-to-subject distance and

lens characteristics. For images of unfamiliar faces, even modest changes along such

dimensions can cause substantial error in both memory for unfamiliar faces (e.g.

Bruce, 1982; Davies & Milne, 1982; Longmore, Liu & Young, 2008; Valentine &

Bruce, 1988) and matching tasks where images are simultaneously presented (e.g.

Bruce et al., 1999; Jenkins, White, Van Montfort & Burton, 2011). Further, since

most approaches for creating composites produce images that are standardised with

regard to pose, expression and lighting, the face-construction task is likely to be at

odds with the image-specific nature of unfamiliar face memory.

Whilst memory for unfamiliar faces is sensitive to image-specific variation,

recognition of familiar faces appears to be largely unaffected by these variables. For

example, familiar faces can be recognised very accurately even when image quality is

very poor (Burton, Wilson, Cowan & Bruce, 1999). Similarly, when facial

composites are constructed by participants who have prior familiarity with the target

identities, subsequent recognition of these composites is improved dramatically

relative to composites constructed of unfamiliar faces (Davies et al., 2000; Frowd,

Skelton et al., 2011). Thus, the low levels of composite identification accuracy

reported in the literature may be due in part to the inadequacy of face constructors’

memory representations rather than limitations in the process of memory construction.

In the current study, we examined the contribution of face representation to

the quality of constructed likenesses from memory: we exposed face constructors to

either photographs of faces or to prototypes derived from multiple images of an

individual’s face. The face prototypes were generated according to a procedure

described by Burton, Jenkins, Hancock and White (2005), who modelled the process

of face familiarisation as a cumulative refinement of memory representations driven

by a simple image averaging process. Indeed, by calculating the average values of

correspondent pixels across a range of photographs of the same person, the

researchers produced average images that were recognised more easily by both

humans and computer algorithms than the individual exemplars used to create these

averages (Burton et al., 2005; Jenkins & Burton, 2008). This result indicates that

average images provide a more stable representation of identity than individual

photographs, by strengthening features that are consistent across images whilst

softening the contribution of uncorrelated features. The resultant representation has

reduced low-level image variation and tends to be neutral for the various image-

specific factors listed above that are known to impede unfamiliar face processing.

Our hypothesis for this more theoretical (less applied) issue was that facial

composites based on a memory of a face prototype—a representation based on the

average of several photographs—would be more accurately recognised than a composite

based on any one of the photographs making up the average. This is because the

nature of a particular image of the target (an individual instance) will be influenced

both by the characteristic aspects of the target’s appearance and by image-specific

factors such as lighting, distance, perspective and properties of the camera. When

people are unfamiliar with a target face, they are unable to reliably separate this

variance from identity-specific information in the image (e.g. Hill & Bruce, 1996;

Jenkins et al., 2011; Liu & Ward, 2006). The outcome is that people are likely to

create a composite based on image-specific details, information which does not

provide useful cues to identity, and so compromise composite identification.

We also explored the effect of using different instances of the same person on

face construction. In forensic construction, witnesses and victims create a composite

from a specific memory—an instance of the face—and so an understanding of the

importance of pictorial information is worthwhile forensically. More specifically,

photographs of people’s faces were used to represent specific instances. Some

photographs of faces are recognised more successfully than others (e.g. Carbon,

2008), and because they vary in the degree to which they are perceived to resemble

the person (Jenkins et al., 2011), it is likely that different instances will produce

greater- or lesser-quality composites. Therefore, evaluating the impact of target-

image choice on facial-composite construction provides important information for

researchers who attempt to understand and improve the effectiveness of composite

images and systems.

EXPERIMENT

The experiment was carried out in four stages: selection of target images

(Stage 1), construction of composites (Stage 2), naming of composites (Stage 3) and

ratings of composites’ pictorial match with the target (Stage 4), as described below.

Stage 1: Selection of target images

Our aim was to use face-construction procedures in the laboratory that were

similar to the applied context (e.g. Frowd, Carson, Ness, McQuiston et al., 2005). We

therefore chose target identities that were unfamiliar to participants who would

construct the composites, but thereafter familiar to the judges who would attempt to

name the composites. This aim was facilitated using celebrity targets that were

familiar to participants living in Australia but were largely unknown in the UK.

We were interested in collecting a good range of variation in individual

instances for a number of identities, and so collected 12 photographs of each of 40

Australian celebrities. Australian participants who were familiar with them were

asked to provide likeness ratings indicating the extent to which each image looked like the

relevant person. From each set of 12 photographs belonging to the same identity, we

produced an average (prototype), and then selected three photographs of each

celebrity representing poor, medium, and good likeness (see Materials for more

details).

It was anticipated that composites produced from good-likeness images would

be recognised better than composites produced from poor likenesses, with medium-

likeness targets producing intermediate-quality composites. In addition, because the

averaging process has the effect of removing image-specific variance, it was

hypothesised that composites based on prototype targets would be more recognisable

than composites based on images from the three likeness categories.

Method

Participants

Twelve (nine female) undergraduate and post-graduate students from the

University of New South Wales (Australia) volunteered to participate in Stage 1.

Participants’ age ranged from 19 to 27 years (M = 23.7 years, SD = 2.1 years), and

they participated in exchange for course credit or a small cash incentive.

Materials

For each of 40 Australian celebrities (20 males and 20 females), 12 images

were downloaded from the Internet (480 images in total). The images were collected

via the Google Image search engine using celebrities’ names as search terms and so

varied in terms of image quality, lighting, background, head pose and facial

expression. We accepted the first 12 colour images of each face that: i) exceeded 150

pixels in height, ii) had a somewhat frontal aspect and iii) were free from occlusions.

An image average (prototype) was created for each celebrity using the procedure

described by Burton et al. (2005). This involved morphing each celebrity photograph

to a standard shape template using in-house software to align facial features across the

set. Mean values were then calculated for corresponding pixels before the resultant

‘shape-free’ image was morphed to the average shape for that celebrity. Because this

averaging process generates images that are cropped around the internal features of

the face (excluding ears, hair and face-shape), we also removed the external features

of all 480 photographs in the same way. Example stimuli are shown in Figure 1.
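
The pixel-averaging step described above can be illustrated with a minimal sketch. It is not the in-house software used in the study: it assumes that the photographs have already been morphed to a common shape template and are held as equally sized greyscale arrays, and it omits the final step of morphing the result back to the celebrity's average shape. The function name and the toy pixel values are hypothetical.

```python
from typing import List

Image = List[List[float]]  # greyscale image stored as rows of pixel intensities


def average_images(images: List[Image]) -> Image:
    """Return the pixel-wise mean of equally sized, pre-aligned images."""
    if not images:
        raise ValueError("at least one image is required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share the same dimensions")
    n = len(images)
    # Mean value of corresponding pixels across the whole set of images.
    return [
        [sum(img[y][x] for img in images) / n for x in range(width)]
        for y in range(height)
    ]


# Three toy 2 x 2 'photographs' of the same (hypothetical) face.
photos = [
    [[10.0, 20.0], [30.0, 40.0]],
    [[20.0, 30.0], [40.0, 50.0]],
    [[30.0, 40.0], [50.0, 60.0]],
]
prototype = average_images(photos)
```

Pixel values that are consistent across photographs survive the averaging, whereas values that vary from image to image are pulled towards the mean, which is the sense in which an average is neutral with respect to image-specific factors.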

Figure 1. Image A is an example of a face prototype (image average) of the current

Australian Prime Minister, and B is a selection of the photographs that were used to

create this image (12 were used in our experiment). Details of the procedure used to

create the prototype can be found in Burton et al. (2005); in the experiment colour

images were used. For reasons of copyright, we are not able to show the target

photographs used in our experiment.

Design and Procedure

Participants were tested individually. The 480 photographs and the 40 image

prototypes were presented blocked by identity, in an attempt to avoid making the task

too disjointed (which may have occurred if identity was randomised across the set),

and participants saw the 12 photographs and the prototype of each celebrity

sequentially. Each participant was given a different random block order, and image order

within the block was also random for each person. The celebrity’s name was

displayed below each image to avoid ambiguity regarding identity. For each image,

participants were asked to provide a likeness rating using an on-screen scrollbar

labelled at end-points, “nothing like them” (rating value of 1) and “perfect likeness”

(100). Images were presented centrally to dimensions of 6.5 cm wide by 9.5 cm high.

If a participant was not familiar with a particular celebrity this was indicated by

clicking on a button labelled ‘unfamiliar’, and image presentation resumed from the

start of the next block. The task was self-paced and each image remained visible until

a response was made. Testing sessions lasted for approximately 30 minutes per

participant.

Results

Mean likeness ratings were calculated for individual photographs and for

prototypes. As most composites created in police investigations are of male faces, the focus

was on this target gender here (with female targets set aside for other projects). Also,

as having generally identifiable targets was important for composite naming in Stage

3, six identities were excluded that were not well known—in this case, those who

were identified as familiar by fewer than 65% of participants. Two further photographs

had distinctive facial hair and so images from both of these celebrities were omitted,

to avoid producing composites that would have been too unusual in this respect.

Based on a G*Power analysis (Faul et al., 2007), we estimated that eight

targets would provide a practically-useful, large effect size (f = .4) with very-good

power (1-β = .9) for the planned by-items analysis in the composite evaluation

(naming) stage (parameter settings: α = .05, Repetitions = 4, Groups = 1, r = .7, ε =

1.0). Thus, eight identities were selected at random from the remaining 12 male faces

(see Appendix). The mean rated likeness of photographs (instances) of the selected

faces ranged from 33 to 94 (M = 64.7, SD = 14.3). Skew (-0.05) and kurtosis (-0.70)

were within the expected range for a Normal distribution.

For each celebrity, we selected instances corresponding to the lowest and the

highest mean-rated likeness, as well as the photograph that was closest to the mean

likeness for the relevant celebrity. We refer to these photographs as having low, high

and medium likeness, respectively. Mean ratings of selected photographs were 43.5

(SD = 6.2) for low, 67.2 (SD = 4.8) for medium and 85.2 (SD = 6.6) for high likeness;

for prototype images, it was 83.9 (SD = 6.8). In a repeated-measures Analysis of

Variance (ANOVA) on these by-items likeness ratings, Mauchly's Test of Sphericity

was significant [Mauchly's W(5) = 0.06, ε = .49, p = .007], indicating

unequal differences between category variances—here, ratings are relatively less

variable in the medium category (which is not a problem in itself). Degrees of

freedom were adjusted using the Greenhouse-Geisser correction, and the ANOVA

was significant for target type [F(1.5,10.3) = 104.0, p < .001, ω² = .89].
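
The rule used above to choose the three photographs for each celebrity (lowest mean rating, highest mean rating, and the photograph closest to the celebrity's overall mean) can be sketched as follows. This is an illustrative reconstruction rather than the script used in the study; the function name, photograph labels and ratings are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def select_targets(ratings: Dict[str, List[float]]) -> Dict[str, str]:
    """Pick low, medium and high likeness photographs for one celebrity.

    `ratings` maps a photograph label to the likeness ratings (1-100)
    given to that photograph by individual raters.
    """
    photo_means = {photo: mean(values) for photo, values in ratings.items()}
    overall = mean(list(photo_means.values()))
    low = min(photo_means, key=photo_means.get)
    high = max(photo_means, key=photo_means.get)
    # Medium likeness: the photograph whose mean rating lies closest to the
    # mean likeness across all photographs of this celebrity.
    medium = min(photo_means, key=lambda p: abs(photo_means[p] - overall))
    return {"low": low, "medium": medium, "high": high}


# Hypothetical ratings from two raters for four photographs of one celebrity.
example_ratings = {
    "photo_a": [40, 50],
    "photo_b": [60, 70],
    "photo_c": [85, 95],
    "photo_d": [70, 80],
}
chosen = select_targets(example_ratings)
```

With only a few photographs, the medium choice could coincide with the low or high choice; with 12 photographs per identity, as here, this is unlikely to arise in practice.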

Repeated contrasts of the ANOVA confirmed that likeness ratings of photographs

were higher both for medium than for low [t(7) = 25.1, p < .001, dc = 3.7; see

Footnote 1], and for high than for medium [t(7) = 8.7, p < .001, dc = 3.0]. Ratings did

not differ reliably between prototype and high [t(7) = 0.3, p = .75], which was expected,

since research has found a null effect between equivalent categories for images

averaged in the same way (Bruce, Ness, Hancock, Newman & Rarity, 2002). Two

further contrasts were conducted, with Bonferroni correction applied (α = .05/2 =

.025), which indicated that prototypes were rated higher than both medium [t(7) = 5.8,

p < .001, dc = 2.8] and low categories [t(7) = 11.6, p < .001, dc = 6.2].
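
For readers wishing to reproduce effect sizes of the kind labelled dc above, the following is a minimal sketch of the conversion commonly attributed to Equation 3 of Dunlap et al. (1996), d = t × sqrt(2(1 − r) / n), where t is the correlated (paired) t statistic, r is the correlation between the paired measures and n is the number of pairs. The correlations between likeness categories are not reported here, so the value of r in the example is an assumption for illustration only.

```python
import math


def d_correlated(t: float, r: float, n: int) -> float:
    """Effect size for a correlated contrast: d = t * sqrt(2 * (1 - r) / n)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if n <= 0:
        raise ValueError("n must be a positive number of pairs")
    return t * math.sqrt(2.0 * (1.0 - r) / n)


# t and n are taken from the medium-versus-low contrast (t(7) = 25.1, 8 items);
# r = .9 is an assumed value, since the correlation is not reported.
example_d = d_correlated(t=25.1, r=0.9, n=8)
```

When r = .5 the expression reduces to t / sqrt(n); higher correlations between the paired measures give smaller values, which is how the adjustment avoids over-estimating d.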

Stage 2: Composite construction

Composites were constructed using EvoFIT computer software. The

underpinnings of this system were first described in prototype form in Hancock

(2000), and EvoFIT has now been the focus of extensive research and development.

For a detailed description of the main technical aspects, refer to Frowd, Carson and

Hancock (2004), while a summary of milestones in development, which particularly

relate to the psychology of face construction, can be found in Frowd (2012) or Frowd,

Skelton, Atherton and Hancock (2012).

EvoFIT is one of two main commercial implementations that create a

composite based on the natural processes of selection and breeding: the other system

is EFIT-V (e.g. Valentine et al., 2010). With EvoFIT, face constructors repeatedly

select faces from arrays of complete faces, with ‘breeding’, to ‘evolve’ a composite.

1 To avoid over-estimating the standard effect size (Cohen’s d) for correlated contrasts, dc is calculated using Equation 3 of Dunlap, Cortina, Vaslow and Burke (1996).

They first select items which resemble the target in terms of facial ‘shape’,

specifically shape and placement of individual features, and then in terms of facial

‘texture’, greyscale colouring of eyes, brows and overall appearance of the skin.

Selected choices are combined (using genetic cross-over and mutation operations) and

the process is repeated. Once a face has been evolved, software tools are used to

improve the perceived match to the target for age, weight, masculinity and other

overall properties; constructors may also manipulate the shape and placement of

individual features.
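
The select-and-breed cycle described above can be illustrated with a generic sketch of an interactive evolutionary step. It is not EvoFIT's implementation: the representation of a face as a short vector of model coefficients, the use of uniform cross-over and Gaussian mutation, the mutation settings and the array size of 18 are all assumptions made for illustration.

```python
import random
from typing import List

Face = List[float]  # hypothetical vector of face-model coefficients


def crossover(parent_a: Face, parent_b: Face, rng: random.Random) -> Face:
    """Uniform cross-over: each coefficient is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(face: Face, rng: random.Random,
           rate: float = 0.1, scale: float = 0.2) -> Face:
    """Add small Gaussian noise to a random subset of coefficients."""
    return [c + rng.gauss(0.0, scale) if rng.random() < rate else c for c in face]


def breed(selected: List[Face], size: int, rng: random.Random) -> List[Face]:
    """Create a new array of faces from the faces a constructor selected."""
    if len(selected) < 2:
        raise ValueError("at least two selected faces are needed for breeding")
    next_generation = []
    for _ in range(size):
        parent_a, parent_b = rng.sample(selected, 2)
        next_generation.append(mutate(crossover(parent_a, parent_b, rng), rng))
    return next_generation


rng = random.Random(0)
# Initial array of 18 random faces, each with five coefficients.
population = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(18)]
# Stand-in for the constructor's choices: the first six faces are 'selected'.
selected = population[:6]
new_array = breed(selected, size=18, rng=rng)
```

In a real session the selection would come from the constructor's judgement of resemblance to the target, and the cycle would be repeated over several generations.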

EvoFIT was also chosen as it can readily produce faces containing just internal

features. Such images are not only a better match to our target pictures (which contain

principally internal features), but represent the region that is most important when

another person recognises a photograph of a familiar face (e.g. Ellis, Shepherd &

Davies, 1979) or names a facial composite (e.g. Frowd, Bruce, McIntyre, et al., 2007;

Frowd, Skelton et al., 2011)—see Figure 2 for examples of this facial region. Indeed,

constructing internal features first (without seeing external features) and then adding

external features afterwards produces much more identifiable faces (M = 46% correct

naming) with EvoFIT than when both internal and external features are constructed

simultaneously (M = 23%) (Frowd, Skelton, Atherton, Pitchford et al., 2012). In a

recent police audit, this approach was shown to be effective, with 60% of EvoFIT

composites directly leading to identification of a suspect, and one in six cases overall

leading to conviction (Frowd, Pitchford et al., 2012).

Method

Participants

Thirty-two (17 female) volunteer students from the University of Central

Lancashire participated in composite construction. Their ages ranged from 18 to

55 years (M = 26.1 years, SD = 10.1 years), and they were recruited on the basis of being

unfamiliar with Australian celebrities. Eight participants were assigned to each of the

four levels of the between-subjects factor, target type (low, medium and high likeness;

prototype).

Materials

Target stimuli for face construction were 24 photographs (eight celebrities

each at low, medium and high likeness) and eight prototypes of Australian celebrities

as selected in Stage 1. Each image was printed to dimensions of approximately 8 cm

(high) by 5 cm (wide), in colour, on single sheets of A4 paper. EvoFIT version 1.5

was used to produce the composites.

Design and Procedure


Participant-constructors were tested individually throughout. They were

randomly assigned to one of four different target types. Participants in each of the

four conditions were shown the same eight celebrities, but in a different target image

condition (low, medium and high likeness; prototype). Participants were recruited on

the basis of being unfamiliar with Australian celebrities. They were presented with a

randomly-selected target image and asked whether it was familiar (no one reported

the face was known; had they done so, another image would have been shown).

Participants were instructed to study the face for 60 seconds. Afterwards, they were

told that a composite of this face would be constructed the following day.

Between 20 and 28 hours after the target had been presented, the experimenter

administered a cognitive interview, a standard technique used by forensic

practitioners for obtaining a detailed description of the face (e.g., see Frowd, Nelson

et al., 2012). This involved participant-constructors being asked to visualise the target

face and then freely describe it in their own time and in as much detail as possible,

without guessing; the experimenter also mentioned that he would not interrupt while

this was being carried out but would note down what was said. While this face-recall

task has the potential to interfere with participants’ recognition ability, known as the

verbal-overshadowing effect (Schooler & Engstler-Schooler, 1990), its use reflects

practice for police practitioners; moreover, indications are that the size of the effect is

small for composites (e.g. Frowd & Fields, 2010). Next, constructors were given a

brief overview of EvoFIT and the procedure used to construct a composite with this

system. The experimenter operated the software and presented the necessary screens.

The face-construction procedure is described briefly above (introduction to Stage 2)

and in detail in Frowd, Skelton, Atherton, Pitchford et al. (2012, Experiment 3). Each

participant produced a single composite, resulting in a total of 32 composites (8

identities x 4 target types). Face-construction sessions took about an hour to complete

per person.

Stage 3: Composite evaluation (naming)

In this stage, composites were evaluated by asking participants familiar with

the targets to name the composites.

Method

Participants

Seventy-two students (41 female) from the University of New South Wales

volunteered for the composite evaluation task. Their ages ranged from 18 to 57 years

(M = 23.1 years, SD = 9.1 years), and they were recruited on the basis of being generally

familiar with Australian celebrities. Participants were assigned equally to the four

levels of the between-subjects factor, target type.

Materials

Stimuli were 32 individual composites (constructed from 24 photographs and

8 prototype stimuli) and the 32 target pictures used to create these in the previous

stage. Example composites constructed are presented in Figure 2. Four testing sets

were prepared for naming, with each containing composites produced from one

target-image category (low, medium and high likeness; and prototype). Each set

contained eight A4 pages with a single composite printed on each from that condition.

Composites were printed in greyscale, the modality of EvoFIT images, and measured

approximately 8 cm (high) x 6 cm (wide). Target pictures were reproduced in colour,

one image per page, at 8 cm x 7 cm.

Figure 2. Example likenesses produced in the study of the Australian television

personality, Bert Newton. They were created by different people (participant-

constructors) who saw a target image categorised as (a) low likeness, (b) medium

likeness, (c) high likeness and (d) prototype. Image (e) is the prototype target image

presented (in colour) to the participant who constructed composite image (d).

Design and Procedure


In a criminal investigation, the most valuable outcome for a composite is for

someone who is familiar with the face to correctly name it to the police. To simulate

this process, we recruited participants on the basis of being familiar with the relevant

targets and asked them to name the composites. Four testing sets were prepared, each

one containing the eight composites produced from one target type (low, medium and

high likeness; and prototype). As it is possible for carry-over effects to artificially

inflate naming levels when seeing more than one composite of the same identity,

participants inspected composites from one type of target only. They were therefore

randomly assigned to one of these four testing sets in a between-subjects design.

Participants first attempted to name each composite, a task we refer to as

‘spontaneous’ naming. Next, to gauge the extent to which participants were familiar

with the relevant identities, and to check for systematic bias by target type, they then

named the eight photographs that were used to construct their assigned set of

composites (target naming) and indicated which identities they were familiar with

from a list of written celebrity names (name recognition).

Previous studies have indicated that participants find it difficult to name

composites spontaneously in the absence of external features, and produce few correct

names (Frowd, Herold, Duckworth & Hassan, 2012). Naming of composites’ internal

features is more accurate, however, when participants select

identities from a list of written names—a task which we refer to as ‘constrained’

naming. The task has some ecological validity, at least in the sense that a member of

the public (or a police officer) may try to identify a composite through a process of

elimination. Research also indicates that constrained naming is a good proxy for

spontaneous naming of complete composites (Frowd, Bruce, Gannon et al., 2007;

Frowd, Nelson et al., 2012). The task was carried out after name recognition.

The number of participants (evaluators) required for the planned by-

participants naming analysis was estimated using a G*Power analysis (Faul et al.,

2007). This indicated that 72 people in total were needed to detect a large effect

size (f = .4) with good power (α = .05, Groups = 4, 1-β = .8).

Participant-evaluators were tested individually, and the task was self-paced.

They were told that facial composites of Australian celebrities would be shown for

them to name, or guess if unsure; participants were also told that ‘don’t know’

responses were acceptable. Participants were randomly assigned, with equal

sampling, to one of four testing sets of composites (for low, medium and high

likeness; and prototype). The eight composites from the assigned set were presented

sequentially, and evaluators offered a name for each or a ‘don’t know’ response. The

eight target photographs or prototypes used to construct the composites (from the

assigned set) were then presented sequentially and participants were asked to name

those. Next, participants indicated which identities they were familiar with, using a

written list of the eight celebrities’ names. Finally, the composites were presented

again, in the same order as before, and participants were asked to select the correct

identities from the written list—an eight-alternative forced-choice task. The order of

presentation of composites and target pictures was randomised for each person. No

feedback was given as to the accuracy of responses. Testing sessions lasted for

approximately ten minutes, after which participants were debriefed with the aims of

the experiment.

Results

Participants’ responses were checked for missing data (no such cases were

found) and then scored for accuracy for each naming task separately: spontaneous

naming of composites, spontaneous naming of targets, recognition of written target-

names and constrained naming of composites. These data are summarised in Table 1.

Table 1. Performance of each sub-task completed during naming of the composites’

internal-features region. Data are grouped by target type and values are expressed in

percentage correct.

                                                 Target type
Type of task                    Low           Medium        High          Prototype      Mean
Spontaneous composite naming    1.8 (0.7)     1.3 (0.7)     2.6 (0.7)     1.3 (0.5)      1.7 (0.4)
Spontaneous target naming       52.8 (9.9)    58.3 (7.9)    60.4 (9.5)    61.1 (10.5)    58.2 (2.6)
Name recognition                88.9 (4.9)    84.0 (6.4)    87.5 (6.6)    84.7 (6.1)     86.3 (1.1)
Constrained composite naming    15.3 (3.9)    37.5 (7.8)    43.8 (6.8)    44.4 (6.8)     35.2 (6.0)

Note. Figures in parentheses are standard errors of the by-item means.

We initially analysed the effect of target type (low, medium and high likeness;

and prototype) on both spontaneous target naming and name recognition. This

analysis is necessary to ensure that the random allocation of participants to target type

had eliminated group differences in target familiarity—if not, this could influence the

composite-naming analysis.

As can be seen in Table 1, mean values increased somewhat across the

categories of low, medium and high for target naming (second row of data), but they

varied little by condition for name recognition (third row). ANOVA was not

significant for either target naming [by-participants, F1(3,68) = 0.5, p = .71; by-items,

F2(3,21) = 1.5, p = .25] or name recognition [F1(3,68) = 0.4, p = .77; F2(3,21) = 1.2, p

= .32], thus providing no evidence of systematic differences in familiarity between

subject groups. We note that these non-significant results also suggest that it is

unlikely that the different categories of image differed in their respective ‘iconicness’,

which has been shown to promote superior recognition performance (Carbon, 2008).

The second analysis considered naming of composites. Spontaneous naming

scores were calculated by dividing the number of correct responses for each

composite by the number of correct responses for the relevant target picture (Table 1,

first row). As expected, percentage-correct means were low and therefore, due to the

floor effect observed in this task, we did not conduct an inferential analysis.
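The conditional scoring rule described above can be illustrated with a minimal sketch. This is not the authors' analysis script: the function name and the counts in the usage line are hypothetical and serve only to show the arithmetic.

```python
def conditional_naming_rate(correct_composite: int, correct_target: int) -> float:
    """Percentage of correct composite namings, conditional on target naming.

    Both arguments are counts across evaluators for a single composite:
    correct_composite is how many evaluators named the composite correctly,
    correct_target is how many named the corresponding target correctly.
    """
    if correct_target == 0:
        raise ValueError("no evaluator named the target, so the rate is undefined")
    return 100.0 * correct_composite / correct_target


# Hypothetical counts: 2 correct composite namings out of 10 correct target namings.
print(conditional_naming_rate(2, 10))  # 20.0
```

Dividing by the number of correct target namings means that a composite is not penalised when evaluators simply do not know the identity it depicts.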

Constrained-naming scores for composites were calculated in the same way as

for spontaneous naming of composites (correct responses of composites divided by

correct responses of targets). These data indicated a somewhat equal increase from

low to medium to high categories; in addition, naming scores in the prototype

condition were somewhat higher than in the medium and about the same as in the

high category. ANOVA of these scores was significant for target type [Between-

subjects: F1(3,68) = 10.1, p < .001, ω2 = .28; Within-subjects: F2(3,21) = 6.3, p =

.003, W(5) = 0.39, ω2 = .26], a result that generalizes both by participants and by

items at the same time [minF'(3,49) = 3.9, p = .014] (e.g. Clark, 1973; Raaijmakers,

Schrijnemakers & Gremmen, 1999).
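For readers unfamiliar with the statistic, the minF' value can be recovered from the two F ratios using the formula given by Clark (1973). The sketch below is illustrative only; the function and argument names are our own, and the inputs are the rounded values reported above.

```python
def min_f_prime(f1: float, df1_error: float, f2: float, df2_error: float):
    """Return (minF', denominator df) following Clark (1973).

    f1, df1_error: by-participants F ratio and its error degrees of freedom.
    f2, df2_error: by-items F ratio and its error degrees of freedom.
    The numerator df is the same in both analyses and is left unchanged.
    """
    min_f = (f1 * f2) / (f1 + f2)
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df2_error + f2 ** 2 / df1_error)
    return min_f, denom_df


# Constrained composite naming: F1(3,68) = 10.1 and F2(3,21) = 6.3.
min_f, denom_df = min_f_prime(10.1, 68, 6.3, 21)
print(round(min_f, 1), round(denom_df))  # 3.9 49
```

With the rounded inputs, the result agrees with the reported minF'(3,49) = 3.9.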

Repeated contrasts of the ANOVA indicated that composites were named

significantly more accurately for medium than for low likeness targets [by-participants, t1(34) =

4.4, p < .001, d = 1.4; by-items, t2(7) = 2.5, p = .039, dc = 1.3], but there was no

significant difference between composite naming of medium and high likeness targets

[t1(34) = 1.0, p = .33; t2(7) = 0.8, p = .43]. Composites were also identified

significantly better in the high than the low likeness category [t1(34) = 4.8, p < .001, d

= 1.8; t2(7) = 3.3, p = .014, dc = 1.8]. In addition, we carried out a more sensitive

polynomial trend analysis (in the category order of low, medium and high) to further

explore the relationship between these three variables. This was reliable as a linear

[by-participants, p1 < .001; by-items, p2 = .014] but not a quadratic trend [p1 = .11; p2

= .29], indicating that composites were more recognisable as the similarity match of

the target increased from low to medium to high likeness.
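As an illustration of what the trend analysis tests, the sketch below applies the standard orthogonal polynomial weights for three equally spaced levels to the constrained-naming means in Table 1. It computes only the contrast values; the significance tests reported above also require the error term from the ANOVA, which is not reproduced here. The variable and function names are our own.

```python
# Constrained composite-naming means (% correct) from Table 1,
# in the order low, medium, high likeness.
means = [15.3, 37.5, 43.8]

# Standard orthogonal polynomial weights for three equally spaced levels.
linear_weights = [-1, 0, 1]
quadratic_weights = [1, -2, 1]


def contrast(weights, values):
    """Weighted sum of condition means for one polynomial contrast."""
    return sum(w * v for w, v in zip(weights, values))


linear = contrast(linear_weights, means)
quadratic = contrast(quadratic_weights, means)
print(round(linear, 1), round(quadratic, 1))  # 28.5 -15.9
```

The linear contrast captures the overall rise from low to high likeness, while the quadratic contrast captures the extent to which the medium category departs from the midpoint of the other two.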

A repeated contrast further indicated equivalence between composites based

on prototype and high likeness targets [t1(34) = 0.1, p = .72; t2(7) = 0.2, p = .85]. Two

further contrasts were conducted, with Bonferroni correction applied (α = .05/2 =

.025), which indicated that composites of prototypes were also equivalent to

composites of medium likeness targets [t1(34) = 0.1, p = .92; t2(7) = 1.0, p = .37], but

were superior to composites of low likeness targets [t1(34) = 3.8, p < .001, d = 1.6;

t2(7) = 3.2, p = .014, dc = 1.9].
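The Bonferroni step used for the two additional contrasts amounts to the following arithmetic. This is a minimal sketch using the by-items p values reported above; the variable names are our own.

```python
family_alpha = 0.05
n_contrasts = 2
corrected_alpha = family_alpha / n_contrasts  # 0.025

# By-items p values reported above for the two additional contrasts.
p_items = {"prototype vs medium": 0.37, "prototype vs low": 0.014}

# A contrast is treated as reliable only if p falls below the corrected alpha.
reliable = {name: p < corrected_alpha for name, p in p_items.items()}
print(corrected_alpha, reliable)
```

Only the contrast between prototype and low likeness targets falls below the corrected threshold.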

Stage 4: Composite evaluation (Similarity ratings)

Modern composite systems (EvoFIT included) generally produce frontal faces

that are evenly lit with a neutral expression. Our target photographs naturally varied

from this standard and included image-specific characteristics which were likely to

serve as a source of distraction to participant-constructors—stimuli which contain

pictorial properties that we have argued are a source of variability and interfere with

face construction. In contrast, the prototypes contain fewer of these image-specific

characteristics and, as a result, we expected them to be more like the faces produced

from the composite system. To test for this possibility, participants in this stage were

asked to rate the similarity between each target and its corresponding composite for

pictorial properties. Three properties were chosen that are known to influence

unfamiliar-face recognition (see Introduction), and they also appeared to be a major

source of variation in our target set: lighting, head pose and facial expression.

Method

Participants

Eighteen staff and student volunteers (11 females) from the University of

Central Lancashire were recruited for the similarity-rating task. As with participants

that constructed the composites, they were selected on the basis of being unfamiliar

with Australian celebrities. Their ages ranged from 18 to 54 years (M = 27.0 years,

SD = 11.2 years). None had participated in any other phase of the study.

Materials

Thirty-two composites were printed, one per page, alongside the relevant

target photograph or prototype. Target photographs and prototypes were printed in

greyscale, and to the same dimensions as in Stage 3.

Each person was presented with either a target picture or a prototype alongside

the corresponding composite and rated similarity on a 10 point scale. Tasks were

blocked by rating scale (lighting, pose and expression) and participants were

presented with all 32 composite-target pairs in each block (block order was

counterbalanced across subjects). The design was thus repeated-measures for rating

scale and target type (low, medium and high likeness; and prototype).

Participant-evaluators were tested individually. They were requested to rate

the accuracy of a set of composites using a 10-point scale (1 = very-poor match / 10 =

very-good match) according to how well a target face and a composite constructed of

it matched in terms of lighting, head pose and facial expression. Composite-target

pairs were presented sequentially, in a different random order for each person, and

evaluators provided a rating score for each pair in their own time. Testing sessions

lasted for about 20 minutes. Participants were debriefed on the experimental aims.

Results

Rating data are summarised in Table 2.

Table 2. Mean matching scores of targets (classified into low, medium and high

likeness; and prototype) and their composites in terms of lighting, head pose and

facial expression. Values range from 1 (very-poor match) to 10 (very-good match).

                Target type

Rating type     Low           Medium        High          Prototype     Mean

Lighting        3.8 (0.2)     4.3 (0.4)     4.0 (0.2)     5.2 (0.3)     4.3 (0.2)
Pose            4.6 (0.6)     5.5 (0.3)     5.2 (0.6)     6.9 (0.3)     5.5 (0.4)
Expression      3.3 (0.6)     3.9 (0.3)     3.9 (0.6)     4.4 (0.3)     3.9 (0.4)
Mean            3.9 (0.4)     4.5 (0.3)     4.4 (0.4)     5.5 (0.2)     4.6 (0.2)

Note. Figures in parentheses are standard errors of the by-item means.

Mean rating scores were analysed using 3 [Rating (within-subjects):

expression, lighting, pose] x 4 [Target (within-subjects): low, medium, high,

prototype] ANOVA. This analysis was significant for both rating [F1(1.6,27.9) =

22.1, W(2) = 0.7, ε (Huynh-Feldt) = 0.8, p < .001, ηp2 = .57; F2(2,14) = 11.1, W(2) =

0.5, p = .001, ηp2 = .61; minF'(2,28) = 7.4, p = .011] and target type [F1(3,51) = 78.2,

W(5) = 0.5, p < .001, ηp2 = .82; F2(3,21) = 4.1, W(5) = 0.2, p = .020, ηp2 = .37;

minF'(3,23) = 3.9, p = .021]. The interaction was reliable by participants but not by items [F1(6,102) = 3.7, p = .002, ηp2

= .18; F2(6,42) = 0.7, p = .62].
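The fractional degrees of freedom in the by-participants test for rating arise from the Huynh-Feldt correction, which multiplies both the effect and error df by ε. The sketch below backs out the ε implied by the reported error df. It assumes the usual repeated-measures error df of (levels − 1) × (participants − 1); the reported ε of 0.8 is a rounded value.

```python
n_participants = 18
n_levels = 3  # lighting, pose, expression

df_effect = n_levels - 1                           # 2
df_error = (n_levels - 1) * (n_participants - 1)   # 34

reported_error_df = 27.9
implied_epsilon = reported_error_df / df_error

# Corrected effect df is epsilon times the uncorrected effect df.
print(round(implied_epsilon, 2), round(implied_epsilon * df_effect, 1))  # 0.82 1.6
```

The implied ε of about 0.82 reproduces the corrected effect df of 1.6 reported above.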

For target type, repeated contrasts revealed no difference that was reliable both by participants and by items

between either low and medium (p1 < .001; p2 = .33) or between medium and high likeness

targets (p1 = .041; p2 = .67), but (unlike naming) ratings of match were reliably higher

for prototype than for high-likeness targets (p1 < .001, dc = 1.1; p2 = .029, dc = 1.2).

Repeated contrasts indicated that rating of match between composites and

targets was closer for pose than for both lighting [p1 = .001, dc = 1.1; p2 < .001, dc =

1.0] and expression [p1 < .001, dc = 1.5; p2 = .008, dc = 1.8]; ratings between lighting

and expression did not differ significantly [t1(17) = 2.6, p = .017; t2(7) = 1.2, p = .27].

We next assessed the impact on composite naming of target type (photograph,

prototype) and pictorial variation (lighting, pose and expression). This was achieved

using a stepwise linear regression with pictorial variation as predictor variables and

constrained naming as the DV. The backward method was chosen, a technique which

begins with all variables and iteratively removes those without useful contribution

(criteria for removal, p > .1); the method has the benefit of revealing suppressor

variables: variables that are influenced by the presence of other variables. We note

that multicollinearity was not an issue here as pictorial variables were not too highly

correlated with each other (all r < .6). We also included a dichotomous variable that

coded whether the composite’s target was an individual photograph or prototype. The

model achieved a good fit [F(2,31) = 6.9, p = .003, R2 = .32, Durbin-Watson d = 2.1]

with two equally-weighted positive correlations for pose [B = 0.04, SE(B) = 0.02,

r(part) = .30, VIF = 1.3] and for expression [B = 0.05, SE(B) = 0.03, r(part) = .30,

VIF = 1.3]. This suggests that composites were much more identifiable when they

were a better match to the target by pose and expression, but not by lighting.
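The reported VIF values can be related to the correlation between the two retained predictors. In a model with exactly two predictors, the VIF is the same for both and equals 1 / (1 − r²), where r is their correlation. The sketch below uses that identity to back out the implied correlation; the function names are our own, and the result is an inference from the rounded VIF rather than a value reported in the study.

```python
import math


def vif_from_r(r: float) -> float:
    """VIF for either predictor in a two-predictor model, given their correlation r."""
    return 1.0 / (1.0 - r ** 2)


def r_from_vif(vif: float) -> float:
    """Absolute predictor correlation implied by a VIF in a two-predictor model."""
    return math.sqrt(1.0 - 1.0 / vif)


# VIF = 1.3 was reported for both pose and expression.
print(round(r_from_vif(1.3), 2))  # 0.48
```

An implied correlation of about .48 is consistent with the statement that all correlations between the pictorial variables were below .6.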

Discussion

We have assessed the potential impact of target representation on face

construction using the EvoFIT face evolving system. Our data provide partial support

for our main experimental hypotheses. It was anticipated that composites based on

the memory of a prototype image would be more recognisable than composites from

any of the three target likeness categories (low, medium or high), but this was only

found to be true for the low likeness category: composite identification was

comparable between prototype and the other two categories. So, the data do not

provide strong support for an advantage of a prototype, which would have required it

to be superior to the three likeness categories. Instead, composites were only superior

for prototypes compared with the lowest likeness category. It was also expected that

composites would be more identifiable from low to medium, and from medium to

high likeness categories. Again, the main analysis implicated benefit relative to the

lowest category: there was a reliable increase in composite identification when targets

were of medium relative to low likeness, but there was equivalence between the

medium and high. In a more sensitive test, a polynomial-trend analysis did provide

overall support for successively higher identification from low to medium to high

target likeness categories.

So, the findings provide some evidence to support the notion that memory for

unfamiliar faces includes characteristics of a particular image, although the evidence

is not as strong as was initially anticipated. While there was a reliable difference in

composite identification between low likeness and prototype targets, the strongest

support is from the trend analysis which indicated that higher categorical likeness of

targets led to more identifiable composites. This result does not appear to be a

consequence of increasing likeness for the targets since there was no evidence of

reliable category differences by either spontaneous target naming or name recognition

in Stage 3.

The linear-regression analysis (Stage 4) indicated a strong relationship for

match of composites and targets by both head pose and facial expression. These two

variables produced semipartial correlations that were positive and medium sized,

indicating importance for face production. For EvoFIT, while its faces have a fairly-

neutral expression during the evolving stage, constructors had some control over

expression in later holistic-tool use. It is possible therefore for them to add a smile or

a frown, to match their memory of the target’s expression. This enhancement may

have helped to create a more identifiable likeness. In contrast, it is not currently

possible to vary head pose in EvoFIT—all images are rendered in a front-face view.

So, targets with more frontal angle-of-view were constructed more identifiably: those

that required constructors to carry out mental rotation did worse, presumably due to

errors introduced when we process novel views of unfamiliar faces (e.g. Bruce, 1982;

Longmore et al., 2008; Valentine & Bruce, 1988).

This issue is relevant to the real-world application where eyewitnesses may

not have seen an offender’s face front-on, but are required to produce a front-view

composite. Seeing an offender, for instance, through the side window of a car or from

a raised elevation is unlikely to provide a frontal view of the face. It would appear

worthwhile, therefore, for composite systems to be able to render images at a specific

view to match the eyewitness’s memory, an idea for which there is already some

experimental support (Ness, 2003). Note also the sensitivity of head pose: changes in

angle of view for our targets varied within about 30 degrees (in any direction), and yet

this was sufficient to produce interference with face construction—for unfamiliar-face

recognition using actual photographs of faces, measurable decrements have been

found with similar changes in head pose (e.g. Longmore et al., 2008). A system

enabling faces to be constructed at a precise angle-of-view does appear to exist, at

least in development (Blanz, Albrecht, Haber & Seidel, 2006). In addition to having

control of angle of view, this particular system is able to cope with a range of

expressions, and so could potentially be used to minimise the effect of exposure-

specific appearance reported here.

The other main implication of our results is not one of direct practical

importance, but instead relates to the methodology used in research and assessments

involving facial composites and their production systems. Given that the likelihood

of a composite being recognised by those familiar with the target depends on the

likeness of the image to which the witness was exposed, it would seem sensible for

forensic researchers to take care when choosing stimuli: they should establish the

likeness of stimuli with people who are familiar with the target identities. This is

particularly important for targets that are perceived as being of low likeness, since

such images produce markedly worse quality composites than composites from

medium and high likeness targets. Indeed, of the 240 male images we collected on

the Internet, around 30% of them fell into this category. Therefore, given the large

extant literature using image-memory paradigms to evaluate facial-composite systems

(e.g. Frowd, Bruce, McIntyre & Hancock, 2007), we wondered whether the current

experiment might bring the reliability of this literature into question.

In order to address this concern, we carried out a review of studies that have

used a common methodology with EvoFIT (including a nominal 24 hr delay between

target exposure and face construction) that have used different targets; these were

either videos (Frowd, Nelson et al., 2012; Frowd, Skelton, Hepton et al., in press) or

photographs (Frowd, Pitchford et al., 2010; Frowd, Skelton, Atherton, Pitchford et al.,

2012). In the equivalent condition in these four studies, mean naming for complete

composites was approximately 25% correct and varied by less than 2%, thus

indicating good consistency by target image mode and good reliability.

While published research on EvoFIT has been considerable to date (see

Frowd, 2012), there has been at least some formal research carried out on the other main

commercial evolving system, EFIT-V (e.g., Gibson, Solomon, Maylin & Clark,

2009). The evidence is that this approach behaves similarly to a more traditional

‘feature’ system when tested after a short target delay (Frowd, Carson, Ness,

Richardson et al., 2005; Valentine et al., 2010): laboratory performance following a


more forensically-relevant delay is unknown. However, given that EFIT-V operates

in a broadly similar way to EvoFIT, one would anticipate that the results found here

would generalise to this other system. Our findings, in particular the

methodological issue related to selection of target images, would also appear to apply

to the assessment of feature systems.

To summarize, the results reported here have theoretical value for the study of

face memory in general. Some evidence was found to suggest that average

representations (prototypes) are more suitable for the purpose of identification

(Burton et al., 2005; Jenkins et al., 2011), at least in terms of the prototypes yielding

EvoFIT composites that were more identifiable than composites of low rated likeness

targets, and the graded effect (in the trend analysis) such that better likeness images

produced better memory constructions. These results suggest that previous failed

attempts to detect cognitive effects of image likeness may be due to the particular

methodology used (e.g. Johnston & Barry, 2001). Care should be taken to standardize

procedures for image selection when carrying out research using facial composites

and facial-composite systems. The results also indicate that achieving a good match

by target pose and expression (but not lighting) is likely to achieve a more-identifiable

image for practitioners than if these properties are not correctly rendered in the face.

In addition, system designers could usefully enhance their software to allow inclusion

of pictorial information.


References

Blanz, V., Albrecht, I., Haber, J., & Seidel, H.P. (2006). Creating Face Models from

Vague Mental Images. Computer Graphics Forum, 25, 645-654.

Brace, N., Pike, G., & Kemp, R. (2000). Investigating E-FIT using famous faces. In

A. Czerederecka, T. Jaskiewicz-Obydzinska & J. Wojcikiewicz (Eds.). Forensic

Psychology and Law (pp. 272-276). Krakow: Institute of Forensic Research

Publishers.

Bruce, V. (1982). Changing faces: Visual and non-visual coding processes in face

recognition. British Journal of Psychology, 73, 105-116.

Bruce, V., Henderson, Z., Greenwood, K., Hancock, P.J.B., Burton, A.M., & Miller,

P. (1999). Verification of face identities from images captured on video. Journal

of Experimental Psychology: Applied, 5, 339-360.

Bruce, V., Ness, H., Hancock, P.J.B., Newman, C., & Rarity, J. (2002). Four heads

are better than one. Combining face composites yields improvements in face

likeness. Journal of Applied Psychology, 87, 894-902.

Burton, A.M., Jenkins, R., Hancock, P.J.B., & White, D. (2005). Robust

representations for face recognition: The power of averages. Cognitive

Psychology, 51, 256-284.

Burton, A., Wilson, S., Cowan, M., & Bruce, V. (1999). Face recognition in poor-

quality video: evidence from security surveillance. Psychological Science, 10,

243–248.

Carbon, C.C. (2008). Famous faces as icons: the illusion of being an expert in the

recognition of famous faces. Perception, 37, 801-806.








Clark, H.H. (1973). The language-as-fixed-effect fallacy: A critique of language

statistics in psychological research. Journal of Verbal Learning and Verbal

Behavior, 12, 335-359.

Davies, G.M., & Milne, A. (1982). Recognizing faces in and out of context. Current

Psychological Research, 2, 235-246.

Davies, G.M., van der Willik, P., & Morrison, L.J. (2000). Facial Composite

Production: A Comparison of Mechanical and Computer-Driven Systems.

Journal of Applied Psychology, 85, 119-124.

Dunlap, W. P., Cortina, J. M., Vaslow, J. B., & Burke, M. J. (1996). Meta-analysis of

experiments with matched groups or repeated measures designs. Psychological

Methods, 1, 170-177.

Faul, F., Erdfelder, E., Lang, A.G., & Buchner, A. (2007). G*Power 3: A flexible

statistical power analysis program for the social, behavioural, and biomedical

sciences. Behavior Research Methods, 39, 175-191.

Frowd, C.D. (2012). Facial Recall and Computer Composites. In C. Wilkinson and C.

Rynn (Eds). Facial Identification (pp. 42 – 56). Cambridge University Press:

New York.

Frowd, C.D., Bruce, V., Gannon, C., Robinson, M., Tredoux, C., Park, J., McIntyre,

A., & Hancock, P.J.B. (2007). Evolving the face of a criminal: how to search a

face space more effectively. In A. Stoica, T. Arslan, D.Howard, T. Kim and A.

El-Rayis (Eds.) 2007 ECSIS Symposium on Bio-inspired, Learning, and

Intelligent Systems for Security, (pp. 3-10). NJ: CPS. (Edinburgh).

Frowd, C.D., Bruce, V., McIntyre, A., & Hancock, P.J.B. (2007). The relative

importance of external and internal features of facial composites. British Journal

of Psychology, 98, 61-77.

Frowd, C.D., Carson, D., Ness, H., McQuiston, D., Richardson, J., Baldwin, H., &

Hancock, P.J.B. (2005). Contemporary Composite Techniques: the impact of a

forensically-relevant target delay. Legal & Criminological Psychology, 10, 63-81.

Frowd, C.D., Carson, D., Ness, H., Richardson, J., Morrison, L., McLanaghan, S., &

Hancock, P.J.B. (2005). A forensically valid comparison of facial composite

systems. Psychology, Crime & Law, 11, 33-52.

Frowd, C.D., & Fields, S. (2010). Verbal overshadowing interference with facial

composite production. Psychology, Crime and Law, 17, 731-744.

Frowd, C.D., Hancock, P.J.B., & Carson, D. (2004). EvoFIT: A holistic, evolutionary

facial imaging technique for creating composites. ACM Transactions on Applied

Psychology (TAP), 1, 1-21.

Frowd, C.D., Herold, K., Duckworth, L., & Hassan, A. (2012). The impact of hair for

the construction and recognition of facial-composite images. Manuscript under

revision.

Frowd, C.D., Nelson, L., Skelton, F.C., Noyce, R., Atkins, R., Heard, P., Morgan, D.,

Fields, S., Henry, J., McIntyre, A., & Hancock, P.J.B. (2012). Interviewing

techniques for Darwinian facial composite systems. Applied Cognitive

Psychology, DOI: 10.1002/acp.2829.

Frowd, C.D., Pitchford, M., Bruce, V., Jackson, S., Hepton, G., Greenall, M.,

McIntyre, A., & Hancock, P.J.B. (2010). The psychology of face construction:

giving evolution a helping hand. Applied Cognitive Psychology. DOI:

10.1002/acp.1662.

Frowd, C.D., Pitchford, M., Skelton, F., Petkovic, A., Prosser, C., & Coates, B.

(2012). Catching Even More Offenders with EvoFIT Facial Composites. In A.

Stoica, D. Zarzhitsky, G. Howells, C. Frowd, K. McDonald-Maier, A. Erdogan,

and T. Arslan (Eds.) IEEE Proceedings of 2012 Third International Conference

on Emerging Security Technologies, DOI 10.1109/EST.2012.26 (pp. 20 - 26).

Frowd, C.D., Skelton, F., Atherton, C., Pitchford, M., Hepton, G., Holden, L.,

McIntyre, A., & Hancock, P.J.B. (2012). Recovering faces from memory: the

distracting influence of external facial features. Journal of Experimental

Psychology: Applied, 18, 224-238.

Frowd, C.D., Skelton, F., Atherton, C., & Hancock, P.J.B. (2012). Evolving an

identifiable face of a criminal. The Psychologist, 25, 116 – 119.

Frowd, C.D., Skelton, F., Butt, N., Hassan, A., & Fields, S. (2011). Familiarity effects

in the construction of facial-composite images using modern software systems.

Ergonomics, DOI: 10.1037/a0027393.

Frowd, C.D., Skelton, F., Hepton, G., Holden, L., Minahil, S., Pitchford, M.,

McIntyre, A., Brown, C., & Hancock, P.J.B. (in press). Whole-face procedures

for recovering facial images from memory. Science and Justice.

Gibson, S.J., Solomon, C.J., Maylin, M.I.S., & Clark, C. (2009). New methodology in

facial composite construction: from theory to practice. International Journal of

Electronic Security and Digital Forensics, 2, 156-168.

Hill, H., & Bruce, V. (1996). Effects of lighting on the perception of facial surfaces.

Journal of Experimental Psychology: Human Perception and Performance, 22,

986-1004.

Jenkins, R., & Burton, A. M. (2008). 100% accuracy in automatic face recognition.

Science, 319, 435.

Jenkins, R., White, D., Van Montfort, X., & Burton, A. M. (2011). Variability in photos

of the same face. Cognition, 121, 313-323.

Johnston, R. A. & Barry, C. (2001). Best face forward: Similarity effects in repetition




priming of face recognition. The Quarterly Journal of Experimental Psychology

Section A, 54, 383-396.

Liu, C. H., & Ward, J. (2006). Face recognition in pictures is affected by perspective

transformation but not by the centre of projection. Perception, 35, 1637-1650.

Longmore, C.A., Liu, C.H., & Young, A.W. (2008). Learning Faces From

Photographs. Journal of Experimental Psychology: Human Perception and

Performance, 34, 77–100.

Ness, H. (2003). Improving facial composites produced by eyewitnesses. Unpublished

Ph.D. thesis, University of Stirling.

Raaijmakers, J.G., Schrijnemakers, J.M., & Gremmen, F. (1999). How to deal with

“the language-as-fixed-effect fallacy”: common misconceptions and alternative

solutions. Journal of Memory and Language, 41, 416–426.

Schooler, J.W., & Engstler-Schooler, T.Y. (1990). Verbal overshadowing of visual

memories: some things are better left unsaid. Cognitive Psychology, 22, 36-71.

Valentine, T., & Bruce, V. (1988). Mental rotation of faces. Memory & Cognition, 16,

556-566.

Valentine, T., Davis, J. P., Thorner, K., Solomon, C., & Gibson, S. (2010). Evolving

and combining facial composites: Between-witness and within-witness morphs

compared. Journal of Experimental Psychology: Applied, 16, 72-86.

Appendix. Australian celebrities used in the study.

Kim Beazley

Hamish Blake

Jamie Durie

Grant Hackett

Eddie McGuire

Brendan Nelson

Bert Newton

Guy Sebastian
