Journal of Occupational and Organizational Psychology (2009), 82, 639-659
© 2009 The British Psychological Society
www.bpsjournals.co.uk

The use of personality test norms in work settings: Effects of sample size and relevance

Robert P. Tett*, Jenna R. Fitzke, Patrick L. Wadlington, Scott A. Davies, Michael G. Anderson and Jeff Foster

Department of Psychology, University of Tulsa, Tulsa, Oklahoma, USA
University of Wisconsin-Madison, Madison, Wisconsin, USA
Pearson Testing, Bloomington, Minnesota, USA
CPP Inc., Mountain View, California, USA
Hogan Assessment Systems, Tulsa, Oklahoma, USA

* Correspondence should be addressed to Dr Robert P. Tett, Department of Psychology, University of Tulsa, Tulsa, OK 74104-3189, USA (e-mail: [email protected]).
DOI: 10.1348/096317908X336159

The value of personality test norms for use in work settings depends on norm sample size (N) and relevance, yet research on these criteria is scant and corresponding standards are vague. Using basic statistical principles and Hogan Personality Inventory (HPI) data from 5 sales and 4 trucking samples (N range = 394-6,200), we show that (a) N > 100 has little practical impact on the reliability of norm-based standard scores (max = ±10 percentile points in 99% of samples) and (b) personality profiles vary more from using different norm samples, between as well as within job families. Averaging across scales, T-scores based on sales versus trucking norms differed by 7.3 points, whereas maximum differences averaged 7.4 and 7.5 points within the sets of sales and trucking norms, respectively, corresponding in each case to approximately ±14 percentile points. Slightly weaker results obtained using nine additional samples from clerical, managerial, and financial job families, and regression analysis applied to the 18 samples revealed demographic effects on four scale means independently of job family. Personality test developers are urged to build norms for more diverse populations, and test users, to develop local norms to promote more meaningful interpretations of personality test scores.

Personality test scores are often interpreted in employment settings with reference to scale norms (i.e. means and standard deviations; Bartram, 1992; Cook et al., 1998; Müller & Young, 1988; Van Dam, 2003). Accordingly, the accuracy of norm-transformed scores in capturing an individual's relative standing on a set of personality scales rests on the quality of the underlying norms. Two critical and generally recognized concerns regarding norm use are (a) the size of the normative sample (N) and (b) the relevance of the normative sample to the population to which the given



test taker belongs. Despite being recognized as important, sample size and population relevance (i.e. representativeness) have received little research attention and standards regarding these qualities are ambiguous. In this article, we show what happens when personality profiles are generated under varying conditions regarding the size and source of the normative sample, with the overall aim of refining best practices in the use of personality test norms. We begin by considering how such norms are used in work settings.

Uses of personality test norms in the workplace
A scale score, by itself, reveals little as to the location of an individual on the measured dimension. Standard scores, such as z or T, use a norm sample mean and standard deviation to clarify where an individual respondent falls on the measured construct relative to other people. Personality test norms have several work-related applications. First, they can facilitate individualized developmental feedback. For example, workers may be better prepared to interact with others in a team or with customers if they have a clearer understanding of their relative standing on traits relevant to such interactions (e.g. emotional control, sociability, tolerance). Second, personality test norms can facilitate selection decisions. Top-down hiring does not require test norms, but exclusionary strategies based on test score cut-offs (e.g. hiring from among applicants scoring above a given cut-off) call for normative comparisons. Norms are especially important in hiring when applicants are few in number, as this mitigates reliance on top-down methods. Third, norms can help an organization judge the overall standing of a targeted work-group (e.g. a sales team) relative to a larger, more general, job-relevant population (e.g. American sales people), as a basis, perhaps, for determining future hiring standards. Success in all such norm applications rests on norm quality. Best practices in this area are reviewed next.

Best practices regarding test norms
Virtually every book on psychological testing offers recommendations on the use of test norms (e.g. Anastasi & Urbina, 1997; Crocker & Algina, 1986; Kline, 1993). The most consistent message is that the norm sample should be relevant to the individual whose scores are being interpreted. Some (e.g. Crocker & Algina, 1986; Kline, 1993) articulate further that norm samples are more credible if stratified in terms of variables most highly correlated with the test. Accordingly, to permit reasoned judgments of norm relevance, test developers are urged to report key demographic characteristics (e.g. mean age, gender composition, job category). Also important to report are the sampling strategies used, the time frame of norm data collection, and the response rate, as all such information speaks to the representativeness of the normative sample with respect to the targeted population.

The 1999 Standards for Educational and Psychological Testing specify that:

Norms, if used, should refer to clearly described populations. These populations should include individuals or groups to whom test users will ordinarily wish to compare their own examinees. (Standard 4.5, p. 55)

Reports of norming studies should include precise specification of the population that was sampled, sampling procedures and participation rates, any weighting of the sample, the dates of testing, and descriptive statistics. The information provided should be sufficient to enable users to judge the appropriateness of the norms for interpreting the scores of local examinees. Technical documentation should indicate the precision of the norms themselves. (Standard 4.6, p. 55)

Local norms should be developed when necessary to support test users' intended interpretations. (Standard 13.4, p. 146)

Focusing on test use for the purpose of hiring, the Principles for the Validation and Use of Personnel Selection Procedures (SIOP, 2003) state that:

Normative information relevant to the applicant pool and the incumbent population should be presented when appropriate. The normative group should be described in terms of its relevant demographic and occupational characteristics and presented for subgroups with adequate sample sizes. The time frame in which the normative results were established should be stated (p. 48).

Two points warrant discussion here. First, the issue of sample size is raised in the Principles, but what counts as 'adequate' N is unclear. Statistical theory (readily confirmed in practice) tells us that the reliability of norms is closely tied to N. Lacking specifics, practitioners are left to define 'adequate' on their own, which undermines standardization of sound testing practice and norm use. Second, the Standards encourage development of local norms 'when necessary to support test users' intended interpretations'. Ambiguity, again, precludes standardized practice. Our primary aim in this article is to clarify what counts as 'adequate' N and 'sufficient' representativeness in a normative sample, as a basis for refining use of personality norms in work settings.

Current practices regarding personality test norms
In light of the recognized standards regarding norm use, we examined technical manuals for eight popular personality instruments: the Adjective Checklist (ACL), California Psychological Inventory (CPI), HPI, Jackson Personality Inventory-Revised (JPI-R), NEO Personality Inventory-R (NEO-PI-R Form S), Occupational Personality Questionnaire (OPQ), Personality Research Form (PRF), and 16PF Select (16PF).² The goal of our review was to assess the degree to which the noted standards regarding norms are being met in practice. The manuals were reviewed primarily for norm sample size and the reporting of demographics and sampling procedures. We also took note of the number of norm samples reported and whether or not dates of testing and response rates were provided. Results of our review are provided in Tables 1 and 2.

Several observations bear comment, the first two regarding results in Table 1 and the remainder with respect to Table 2. First, a variety of norm groups is available for five of the tests, including, most frequently, samples of high school students, college students, and assorted occupational categories. Second, normative sample sizes are large, on the whole, averages per test ranging from 695 to 22,023. Third, with respect to demographics, gender composition is most often reported, followed by education level. Least often reported are ethnicity and age. Some manuals (e.g. OPQ, CPI, and PRF) offer additional descriptive data, such as work functions, industries, and job titles, which is more informative than simply 'adults' or 'managers'. Fourth, few manuals report specific sampling procedures (e.g. random, cluster, stratified). Along the same lines, many of the norm groups are convenience samples. For example, the ACL was normed, in part, on local school systems. Fifth, dates (e.g. years) of testing and participation rates are rarely reported. The OPQ and 16PF manuals are the only sources consistently providing dates of testing.

¹ The meaning of 'local norms' varies by application. In cross-cultural research, for example, they are norms specific to a country or language. In this article, we use the term to denote norms specific to a particular job or job type within a specific organization.

² We were unable to obtain technical manuals for two other popular tests: the Guilford-Zimmerman Temperament Survey and Multidimensional Personality Questionnaire; hence, we offer no summary for those tests.

[Tables 1 and 2 not recoverable from source.]

All told, our review of personality test manuals yields mixed results regarding compliance with recognized standards of norm use. On the plus side, the majority of norm samples appear to be ample in size, norms for most tests are available for a variety of populations, and basic descriptive information is offered in most cases. More challenging are the lack of descriptions of sampling procedures, reliance on convenience samples, and failure to report dates of testing and participation rates. A more fundamental question, however, is whether or not these things matter when a test user turns to available norms as a frame of reference for interpreting the scores of a given individual or group. It is this latter question that the current article seeks to address. In particular, we focus on both norm sample size and population relevance with respect to how each can affect an individual's personality profile.

Research questions and overview
We assessed the effects of sampling error and population relevance in terms of the standardized T-score distribution. T is defined as

$$T = \left[\frac{10(X - M)}{s}\right] + 50,$$

where X is an individual's raw test score, and M and s are the normative mean and standard deviation, respectively. A person's true T-score is derived when M and s are accurate estimates of μ and σ.³ Thus, to the degree a given sample mean (M) over- or underestimates μ, a person's observed T-score will be inaccurate. If M underestimates μ, T will be overestimated, and if M overestimates μ, T will be underestimated.
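As a minimal sketch of the definition above, the following Python snippet computes a T-score and shows the direction of the bias when the norm mean M departs from μ. The function name `t_score` and the numeric inputs (a raw score of 25 on a scale with μ = 25 and σ = 4) are illustrative assumptions, not values taken from the study.

```python
def t_score(x, norm_mean, norm_sd):
    """Convert a raw score to a T-score: T = 10 * (X - M) / s + 50."""
    return 10.0 * (x - norm_mean) / norm_sd + 50.0


# Hypothetical scale (assumed values): population mean 25, standard deviation 4.
TRUE_MEAN, TRUE_SD = 25.0, 4.0
raw = 25.0  # a respondent who sits exactly at the population mean

true_t = t_score(raw, TRUE_MEAN, TRUE_SD)        # norm mean is accurate
t_low_m = t_score(raw, TRUE_MEAN - 1, TRUE_SD)   # M underestimates mu
t_high_m = t_score(raw, TRUE_MEAN + 1, TRUE_SD)  # M overestimates mu

print(true_t, t_low_m, t_high_m)
```

With these assumed inputs, a norm mean that is one raw-score point too low pushes the observed T-score above 50, and one that is too high pulls it below 50, matching the direction of bias described in the text.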

It is well understood that random error in estimating μ decreases monotonically as sample size (N) increases. This is directly evident in the equation for the standard error of the mean:

$$SE_M = \frac{\sigma}{\sqrt{N}},$$

which is the expected standard deviation of means from multiple samples of size N drawn randomly from a population. We used the standard error to determine upper and lower estimates of μ (as M) under different Ns and different levels of certainty. Wider intervals, at any specified level of certainty, pose greater concerns in interpreting a given norm-transformed score. What is considered an acceptable margin of error in estimating μ, however, is unclear. We suggest that an interval of 5 T-score points (i.e. ±2.5), representing the middle 20% of the normal distribution (i.e. ±10% around μ), is worthy of concern, as error of that magnitude could be expected to alter test score interpretations in practically meaningful terms. Thus, for example, underestimating a person's T-score by 2.5 points (due to overestimating μ by 2.5 points) would place that person up to 10 percentile units lower on the scale. In a selection situation, this means that 1 in 10 applicants could be falsely rejected based on the test score cut-off. If the T-score is overestimated by 2.5 points (due to underestimating μ by 2.5 points), the individual stands a 1-in-10 better chance of being selected based on the cut-off. In brief, error in estimating μ undermines the fidelity of the test score cut-off, increasing the likelihood of either hiring an unqualified applicant or not hiring a qualified applicant.⁴ Similar concerns arise in developmental applications.

³ Error in s as an estimate of σ is less than error in M as an estimate of μ and is ignored in the current undertaking. Inaccuracies arising from measurement error are ignored here, as well.

The degree of over- or underestimation in percentile units varies along the T-score continuum, owing to the curvilinear relationship between T (a transform of z) and percentiles. This is revealed in Table 3 for the case where μ is under- or overestimated (as M) by 2.5 T-score points. The noted 10% difference occurs only at the middle of the distribution. Specifically, if an individual's true T-score is 50 and M overestimates μ by 2.5 points, then the individual's observed T-score will be 47.5, dropping that individual's standing by 9.87 percentiles. The same degree of overestimation of μ has less impact in percentile units for people whose true T-scores depart from 50. For example, as indicated in Table 3, someone with a true T-score of 45 (and where μ is overestimated by 2.5 points) will fall 8.19 percentiles below his or her true percentile of 30.85. The drop in percentiles reduces to around one when true T = 30. We advocate the ±2.5 T-score difference between M and μ as a benchmark, notwithstanding the smaller difference in percentiles that occurs at extreme values of T, because it is unclear at what point along the scale personality score cut-offs are most often invoked, and the mid-point, where the maximum 10 percentile point difference occurs, seems a reasonable expectation in many hiring situations.
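The percentile figures quoted in this paragraph can be checked with a short Python sketch. It assumes T-scores are normally distributed with mean 50 and standard deviation 10; the function names `percentile` and `percentile_drop` are illustrative and not part of the original analysis.

```python
import math


def percentile(t):
    """Percentile rank of a T-score under a normal curve (mean 50, SD 10)."""
    z = (t - 50.0) / 10.0
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_drop(true_t, overestimate=2.5):
    """Percentile points lost when mu is overestimated by `overestimate` T-points."""
    return percentile(true_t) - percentile(true_t - overestimate)


# True T-scores discussed in the text: the mid-point, 45, and 30.
for t in (50, 45, 30):
    print(t, round(percentile(t), 2), round(percentile_drop(t), 2))
```

Under the normality assumption, the drops come out at about 9.87 percentile points for a true T of 50 and 8.19 for a true T of 45, in line with the values cited from Table 3, and shrink to roughly one point at T = 30.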

The question of population relevance concerns the representativeness of a normative sample regarding the population to which the individual test taker belongs. Comparing an individual's test score to a sample mean representing an irrelevant population defeats the purpose of norm-based comparisons, fostering inaccurate interpretations. Unlike the effects of sampling error, which are random, the effects of population relevance are tied to systematic differences between populations, including demographic (e.g. age, sex) and situational variables (e.g. job type). We assessed the question of population relevance directly by deriving personality profiles using norms from multiple sales and truck driver populations, allowing comparisons both between and within job types. Differences in profiles between job types would confirm the widely held belief that norms are specific to job types. Notable differences within job types would raise concerns about the value of job-type-specific norms derived from convenience samples, calling for local norming.

⁴ What one considers an acceptable margin of error in practical terms is subjective. Readers targeting error rates below 10% will seek a T-score margin less than ±2.5 units in width, calling for normative samples larger and more representative of the test-taker's population than the standards advanced in the current article.

[Table 3 not recoverable from source.]

Method

Data sources
The data were derived from a large archival database of HPI responses at Hogan Assessment Systems (HAS). The HPI was the first measure of normal personality developed explicitly to assess the Five Factor Model in occupational settings. The measurement goal of the HPI is to predict real-world outcomes, and it is an original and well-known measure of the Five Factor Model. Eleven normative samples were used in the main analyses. Five are from sales people, four are from truck drivers, and the remaining two are, respectively, the combination of the five sales samples and the four trucking samples, using the N-weighted mean and the standard deviation for combined groups (Ferguson, 1959). Sales and trucking jobs were targeted, in part, due to the availability of multiple subsamples within each job family and because the job families were expected to yield distinct norms.⁵ The subsamples in both jobs were selected only because they were the largest available, ranging in N from 953 to 6,200 in the case of sales (mean = 2,977), and from 394 to 2,520 in the case of truckers (mean = 971). Total Ns for the two broader samples are 14,885 (sales) and 3,885 (truckers). Means and standard deviations for the seven HPI scales in the main samples are reported in Table 4. Norms for nine additional samples, three from each of the clerical, managerial, and financial job families, were drawn from the HAS database using the same criteria to assess the generalizability of the main results. These additional norms, based on Ns ranging from 609 to 13,450, are provided in Table 5. All samples consist entirely of job applicants. Alpha reliabilities range from .30 to .87 across scales and samples, with median = .77.⁶

Data analysis
Effects of sampling error in estimating the normative population mean were assessed using T-scores with μ = 50 and σ = 10, at 5 levels of certainty: 50% (corresponding to z = ±0.67 standard errors), 68% (z = ±1.0), 80% (z = ±1.28), 95% (z = ±1.96), and 99% (z = ±2.58). Standard errors were generated using the equation provided earlier, for selected Ns ranging from 5 to 10,000, and with σ = 10. Lower bound estimates of μ were calculated by subtracting from 50 the product of the standard error and the z associated with the given level of certainty (e.g. 1.96 for 95% certainty), and upper bound estimates of μ were calculated by adding that product to 50.
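The bound calculation described above can be sketched in a few lines of Python. The z multipliers, μ = 50, and σ = 10 come from the text; the function name `mean_bounds` and the particular sample sizes printed are illustrative assumptions.

```python
import math

MU, SIGMA = 50.0, 10.0
# z multipliers for each certainty level, as listed in the text.
Z_BY_LEVEL = {50: 0.67, 68: 1.00, 80: 1.28, 95: 1.96, 99: 2.58}


def mean_bounds(n, z, mu=MU, sigma=SIGMA):
    """Lower and upper bounds on the sample mean: mu -/+ z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width


for n in (30, 100, 300, 1000):  # illustrative subset of sample sizes
    cells = []
    for level, z in Z_BY_LEVEL.items():
        lo, hi = mean_bounds(n, z)
        cells.append(f"{level}%: {lo:.1f}-{hi:.1f}")
    print(f"N={n:>5}  " + "  ".join(cells))
```

Each printed row gives the interval expected to contain the stated share of sample means at that N, so the rows narrow as N grows.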

Effects of population relevance were assessed by deriving HPI profiles for each of the 11 primary normative samples, assuming the individual scored at the mean on each scale. Profile comparisons between the overall sales and trucking norms would speak to gross misapplications (e.g. applying trucking norms to raw scores for salespeople), whereas comparisons within sets would permit evaluation of less obvious misuses (e.g. applying sales norms from one organization to the raw scores of a salesperson from another organization). In the between-set comparison, the sales profile was generated using the combined trucking sample as the norm sample, and the trucking profile was generated using the combined sales sample as the norm sample. In the within-sales and within-trucking comparisons, profiles for each sample were generated by comparing scores falling at the mean for that sample against the combined norms from the remaining samples in the given job category. Lack of bias would be evident in a profile forming a flat line falling at the 50 T-score mark. We adopted the ±2.5 T-score benchmark, corresponding to the middle 20% of cases in a normal distribution, in offering practical guidance on norm use. We adopted a similar strategy in replication using the nine samples from clerical, managerial, and financial job families. Rather than draw comparisons between jobs, however, we focused on within-job comparisons, creating profiles for individuals falling at the HPI means from one sample using norms combined across the remaining two samples per job family.

⁵ As the main point here is to examine the variability in norms across samples within job families, between-job differences serve more as a benchmark for comparison than as a key focus of study per se.

⁶ The lowest alphas are for Likeability (LIK: range = .30-.57, median = .41; all other scales: range = .59-.87, median = .78). In addition to being more heterogeneous in content, LIK is also the shortest scale, with 22 items relative to 37 in Adjustment, the longest scale. (Correcting to 37-item length, using the Spearman-Brown formula, yields range = .42-.69, median = .53.) Of particular relevance to the current effort, the modest alphas for LIK suggest that normative differences between samples on that scale underestimate those expected for more reliable scales targeting similar constructs.

[Table 4 not recoverable from source.]

Table 5. Normative means and standard deviations for three clerical, three managerial, and three financial samples

Job family/HPI scale     Mean     SD       Mean     SD       Mean     SD

Clerical                 (N = 13,450)      (N = 11,299)      (N = 6,406)
  Adjustment             31.86    4.16     31.98    4.08     32.11    4.19
  Ambition               24.95    3.63     26.95    2.52     25.93    3.15
  Sociability            13.46    4.30     16.38    4.08     14.36    4.40
  Likeability            20.96    1.19     20.93    1.14     20.94    1.20
  Prudence               24.43    3.44     23.40    3.76     24.04    3.67
  Intellectance          15.67    4.54     18.35    4.08     17.62    4.31
  Learning approach      11.19    2.55     11.16    2.50     11.31    2.48

Managerial               (N = 8,089)       (N = 2,032)       (N = 777)
  Adjustment             31.42    4.44     32.73    4.00     31.34    4.49
  Ambition               27.15    2.49     27.68    2.16     27.11    2.57
  Sociability            14.63    4.51     16.55    4.98     14.92    4.49
  Likeability            20.29    1.40     20.40    1.44     20.48    1.33
  Prudence               24.17    3.63     23.45    3.70     23.09    3.79
  Intellectance          16.85    4.59     18.65    4.83     17.47    4.26
  Learning approach      11.08    2.74     11.38    2.71      9.73    3.18

Financial                (N = 4,484)       (N = [?]800)      (N = 609)
  Adjustment             31.93    4.21     32.15    4.32     31.19    4.65
  Ambition               26.69    2.68     27.75    1.82     26.72    2.12
  Sociability            14.97    4.25     17.02    4.04     16.27    4.17
  Likeability            20.82    1.15     20.63    1.31     20.61    1.45
  Prudence               24.46    3.57     23.22    3.80     22.64    3.90
  Intellectance          16.67    4.40     16.46    4.51     16.69    4.28
  Learning approach      10.78    2.66     10.54    2.71     10.67    2.60
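The within-family procedure can be illustrated with a hedged Python sketch using the clerical Ambition values from Table 5. Two things here are assumptions rather than the authors' stated method: the exact form of the combined-groups standard deviation (each sample's variance plus its squared distance from the combined mean, weighted by N), and the mapping of the table's three column pairs to samples A, B, and C in order. The function names are illustrative.

```python
import math


def pooled_norms(samples):
    """Combine (n, mean, sd) triples into one N-weighted mean and SD.

    Assumed form: the combined variance adds each sample's own variance
    to its squared distance from the combined mean, weighted by n.
    """
    total_n = sum(n for n, _, _ in samples)
    mean = sum(n * m for n, m, _ in samples) / total_n
    var = sum(n * (sd ** 2 + (m - mean) ** 2) for n, m, sd in samples) / total_n
    return mean, math.sqrt(var)


def t_score(x, norm_mean, norm_sd):
    """T = 10 * (X - M) / s + 50."""
    return 10.0 * (x - norm_mean) / norm_sd + 50.0


# Clerical Ambition from Table 5 as (N, mean, SD); the A/B/C labels are assumed.
ambition = {"A": (13450, 24.95, 3.63),
            "B": (11299, 26.95, 2.52),
            "C": (6406, 25.93, 3.15)}

profile = {}
for label, (_, own_mean, _) in ambition.items():
    # Norms come from the other two samples in the job family.
    others = [v for k, v in ambition.items() if k != label]
    norm_mean, norm_sd = pooled_norms(others)
    profile[label] = t_score(own_mean, norm_mean, norm_sd)

print({k: round(v, 1) for k, v in profile.items()})
```

Under these assumptions, the first two samples land a little over 10 T-score points apart on Ambition, which is consistent with the size of the clerical differences reported in the Results.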

Results

Upper and lower bounds of intervals around a T-score μ = 50 under various Ns and levels of certainty are reported in Table 6. The table shows, for example, that when N = 100, 80% of sample means are expected to fall between 48.7 and 51.3. Increasing N to 300 yields 49.3 and 50.7 as the lower and upper 10% boundaries. What is perhaps most notable in this table is the stability of M as an estimate of μ with even modest sample sizes. With N = 100, for example, 99% of sample means are expected to fall within a relatively narrow interval of ±2.6 T-score points (i.e. 47.4-52.6). Thus, with respect to sampling error alone, an individual's T-score falling at the true population mean (i.e. 50) will be overestimated as no higher than 52.6 and underestimated as no lower than 47.4, using norms from 99% of samples with N = 100. The range of distortion in observed T reaches a noisy 10-point span (i.e. 45-55) within 99% of samples only when N drops below 30.
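The same standard-error relation can be inverted to ask how large N must be for a chosen margin. The sketch below is an illustration only: the function name `min_n_for_margin` is an assumption, and the calculation simply solves z·σ/√N ≤ margin for N with σ = 10.

```python
import math


def min_n_for_margin(margin, z, sigma=10.0):
    """Smallest N for which z * sigma / sqrt(N) does not exceed `margin`."""
    return math.ceil((z * sigma / margin) ** 2)


# 99% certainty (z = 2.58) for the two T-score margins discussed in the text.
for margin in (5.0, 2.5):
    print(margin, min_n_for_margin(margin, 2.58))
```

At 99% certainty this gives 27 for a ±5 margin, in keeping with the 10-point span appearing only below about N = 30, and 107 for a ±2.5 margin, close to the ±2.6 interval reported for N = 100.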

[Table 6 not recoverable from source.]

The effects of population relevance on HPI profiles are depicted in Figures 1-3. In Figure 1, HPI T-score profiles are plotted for both the combined sales and the combined trucking samples, based on hypothetical raw scores falling at the mean on each scale and using the other combined group as the reference sample in each case. Differences between samples vary across HPI scales. The largest difference arises on sociability (15.4 T-score points) and the smallest difference on prudence (.4 points). The average difference is 7.3 T-score points. Figure 2 depicts profiles for the five sales samples and Figure 3, for the four trucking samples. Notable differences are evident within each figure. T-scores on ambition and sociability in the sales group, in particular, vary considerably across samples (range = 40.0-55.4 for ambition; 42.7-57.6 for sociability). The largest differences within the trucking norm set arise for learning approach (range = 44.1-56.1). The average maximum differences in the sales and trucker groups (i.e. averaging across the seven scales in each case) are 7.4 and 7.5, respectively.

Within-job-family differences in HPI profiles for each of the clerical, managerial, and financial job families are depicted in Figure 4. The largest differences are evident in the clerical samples, where, for example, T-scores on ambition, sociability, and intellectance vary by more than 10 points between samples A and B. The largest differences in the managerial samples are for sociability (7.2 points), intellectance (7.0), and learning approach (6.6); and the largest differences in the financial samples are for sociability (8.6), prudence (8.4), and ambition (7.1). Averaging differences across all seven HPI scales within job families yields 5.5 for clerical jobs, 5.2 for managerial jobs, and 4.6 for financial jobs. These are smaller than the averages from sales and trucking (i.e. 7.4 and 7.5), but 10 of the 21 HPI scales-in-jobs exceed the ±2.5-point standard adopted here, corresponding to a 10% decision error rate with a T = 50 cut-score.

Discussion

Our goal was to clarify best practices regarding use of personality tests in work settings by assessing the impact of normative sample N and population relevance on the reliability of judged personality test scores. Where personality scores are standardized using norms (e.g. expressed here in terms of T-scores), we found that over- and underestimation of μ due to random sampling error is relatively minor as N exceeds 100. Clearly, such errors will decrease as N increases, but the gains diminish quite rapidly in practical terms at N = 300 and above. We suggest that, with respect to sampling error alone, norms based on Ns as low as 100 need not raise serious concerns regarding norm-based test score interpretations. This may be surprising to some test users, as test developers typically tout normative sample sizes well in excess of 500 in their test manuals in a spirit of 'more is better'. Although precisely true, the practical merit of samples exceeding N = 100 is generally weaker than the effect of choosing one norm sample over another, even from within the same job category.

[Figure 1. HPI profiles for sales and trucking. Line plot of T-scores (vertical axis, 40.0-60.0) by HPI scale for the combined sales and combined trucking samples.]

[Figure 2. HPI profiles from five sales samples. Line plot of T-scores (vertical axis, 40.0-60.0) by HPI scale for sales samples A-E; raw score = mean.]
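The sampling-error argument made in the first paragraph of the Discussion can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes normally distributed scores, treats the norm SD as known, and considers only error in the norm mean. The function name `norm_mean_error_bounds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def norm_mean_error_bounds(n, coverage=0.99):
    """Return (t_points, percentile_points): the +/- error in a norm-based
    score at the middle of the distribution that `coverage` of random norm
    samples of size n are expected to stay within.

    Simplifying assumptions: only the norm mean is estimated with error,
    and scores are normally distributed.
    """
    z_crit = STD_NORMAL.inv_cdf(0.5 + coverage / 2)
    se_in_sd_units = 1 / sqrt(n)              # standard error of the mean
    t_error = 10 * z_crit * se_in_sd_units    # T-scores have SD = 10
    pct_error = 100 * (STD_NORMAL.cdf(t_error / 10) - 0.5)
    return t_error, pct_error


for n in (50, 100, 300, 1000):
    t_err, pct_err = norm_mean_error_bounds(n)
    print(f"N = {n:>4}: +/-{t_err:.2f} T points, +/-{pct_err:.1f} percentile points")
```

Under these assumptions, N = 100 gives a bound close to the ± 10 percentile points reported in the abstract, and the bound shrinks only slowly as N grows beyond a few hundred.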

T-score profiles generated using sales and trucking norm sets in the current undertaking revealed a mix of similarities and differences. Profiles based on the combined sales and combined trucking norms (see Figure 1) are notably discrepant, exceeding the ± 2.5 T-score point benchmark (corresponding to ± 10 percentile points at the middle of the distribution) for five of the seven HPI scales (all but prudence and adjustment). The average difference of 7.3 T-score points corresponds to ± 14 percentile points. HPI sociability yielded a difference of 15.4 T-score points, corresponding to ± 28 percentile points. Such errors support the widely held belief that an individual's test scores bear comparison to norms representing the same job category to which that person belongs.

[Figure 3. HPI profiles from four trucker samples. Line plot of T-scores (vertical axis, 40.0-60.0) by HPI scale for trucking samples A-D; raw score = mean.]

[Figure 4. HPI profiles from three clerical, three managerial, and three financial samples. Three line-plot panels (clerical, managerial, financial) of T-scores (vertical axis, 40.0-60.0) by HPI scale for samples A-C in each job family; raw score = mean.]

Notably, however, similar differences are evident within both the sales group and the trucking group (averages = 7.4 and 7.5 T-score points). Large differences were observed for both ambition and sociability among the sales groups (15.3 and 14.9, respectively), corresponding in each case to approximately ± 27 percentile points. Thus, someone falling at the mean of their local cohort on either of these two scales could be judged as falling as low as the 23rd percentile or as high as the 77th percentile when compared to others in the same job category at other organizations combined. (The situation worsens if norms are based on any single organization rather than combining across organizations, as the latter averages out extreme values.) Such discrepancies are especially problematic given that ambition and sociability are arguably among the most relevant traits in selecting and developing sales people and are, therefore, likely to be prime targets of concern.
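The T-score differences quoted above can be converted to ± percentile points with a short calculation. The conversion rule used here is our inference rather than something the article states: halving each maximum difference (treating it as a range centred on T = 50) and reading the result off the normal curve reproduces the reported values to within about one percentile point. The function name `t_difference_to_percentile_band` is hypothetical.

```python
from statistics import NormalDist


def t_difference_to_percentile_band(max_t_difference):
    """Convert a maximum T-score difference between norm samples into a
    +/- percentile band around the 50th percentile.

    Assumption (inferred, not stated in the article): the difference is
    split evenly around T = 50 and scores are normally distributed.
    """
    half_range_in_sd = (max_t_difference / 2) / 10   # T-scores have SD = 10
    return 100 * (NormalDist().cdf(half_range_in_sd) - 0.5)


# T-score differences and the +/- percentile points reported in the text.
reported = {7.3: 14, 15.4: 28, 15.3: 27, 14.9: 27}

for t_diff, reported_band in reported.items():
    band = t_difference_to_percentile_band(t_diff)
    print(f"{t_diff:>5.1f} T points -> +/-{band:.1f} percentile points "
          f"(reported: +/-{reported_band}); "
          f"percentile range {50 - band:.0f}-{50 + band:.0f}")
```

For the sales ambition and sociability scales, the resulting range runs from roughly the low 20s to the high 70s in percentile terms, in line with the 23rd-77th percentile range given in the text.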

Similar, albeit weaker, discrepancies emerged within the clerical, managerial, and financial norm sets. Prudence, the most closely related of the seven HPI scales to Conscientiousness, yielded maximum T-score differences of 4.7 in both the clerical and managerial samples, and 8.4 in the financial samples, corresponding to ± 9 percentile points and ± 16 percentile points, respectively. To the degree that prudence is relevant to performance in these jobs,† use of non-local job family norms in each case, especially in the financial samples, could lead to non-trivial errors in judging the relative merits of a job applicant or current employee with respect to true local standards.

Our results suggest that differences in norm-based standard scores within the same job category can be similar to those derived between job categories, challenging reliance solely on job type as a basis for judging the suitability of a normative sample. Underlying the noted differences are any of a host of demographic and situational variables with possible links to personality scale scores. Identifying all the variables that might explain the differences depicted in Figures 1-4 is beyond the scope of the current discussion. Some possibilities, based on available demographics, are reported in Table 7. To assess the linear effects of these variables on the HPI means, we regressed the means, per scale, on to proportion white, proportion black, proportion male, and mean age (N = 18 samples‡). Differences among the five job families were assessed by entering four corresponding dummy-coded variables in the first step. Results are reported in Table 8. Step 1 results show that the sample means vary among the five job families for all HPI scales except adjustment and prudence. Additional effects are evident in results from Step 2. Specifically, after controlling for job family effects, ambition means are lower in samples with higher %blacks, likeability means are higher in samples with higher %whites, prudence means are lower in older samples, and learning approach means are lower in samples with higher %males. Whether or not these findings replicate in larger sample sets (i.e. with N > 18 samples) is a matter for further research.

† Conscientiousness is relevant to performance in most jobs; e.g. Barrick and Mount (1991).

‡ Missing mean ages for the three clerical samples were substituted by the mean from the remaining 15 samples. Results for mean age based on the 15 samples reporting useable data were very similar to those obtained using mean substitution and are available on request. Also, the remaining ethnic groups were not assessed owing to their relatively small proportions within the normative samples.


Table 7. Norm sample demographics

                            Ethnicity (%)                     Gender (%)
Sample           White  Black  Hisp.  Asian  Native Amer.   Male  Female  Mean age

Sales
  A               46.1   40.0   10.9    2.4      0.6        44.5    55.5    33.1
  B               85.1    6.9    4.3    3.0      0.6        57.3    42.7    33.3
  C               73.9   17.3    5.0    3.4      0.3        48.6    51.4    30.4
  D               67.4   16.1   15.2    0.6      0.6        47.4    52.6    28.3
  E               66.7   16.7    8.3    8.3      0.0        36.8    63.2    34.3
  Weighted mean   65.0   23.2    8.4    2.9      0.5        48.9    51.1    32.5

Trucking
  A               81.0    9.5    4.8    4.8      0.0        59.1    40.9    37.5
  B               79.6   11.?    6.1    0.4      2.1        98.8     1.2    36.9
  C               50.5   33.8   15.7    0.0      0.0        99.2     0.8    39.2
  D               25.9   35.5   35.5    2.6      0.4        96.8     3.2    36.8
  Weighted mean   71.7   15.3    9.3    3.4      0.3        72.9    27.1    37.5

Clerical
  A               78.3    4.1   10.4    6.2      1.0        11.5    88.5    NA
  B               60.1    7.4   22.1    7.7      2.7        44.0    56.0    NA
  C               56.4   10.4   23.0    6.9      3.3        28.1    71.9    NA
  Weighted mean   67.2    6.6   17.2    6.9      2.1        26.7    73.3    NA

Managerial
  A               50.6   27.4   19.1    2.3      0.7        56.3    43.7    32.7
  B               50.9   43.4    3.8    1.9      0.0        49.1    50.9    36.6
  C               69.0   15.7   10.4    3.4      1.5        67.1    32.9    36.0
  Weighted mean   52.0   29.5   15.6    2.3      0.6        55.7    44.3    33.7

Financial
  A               66.0   21.0    7.7    5.2      0.1        38.3    61.7    27.7
  B               77.1   10.8    6.8    5.2      0.2        67.6    32.4    37.0
  C               86.0    2.4    6.4    5.2      0.0         9.5    90.5    34.3
  Weighted mean   69.5   17.7    7.4    5.2      0.1        39.3    60.7    29.6

Note. NA = not available. A question mark denotes a digit that is illegible in the source.

Our point here is that comparing an individual to norms from the same job family can, nonetheless, pose uncertainties owing to other characteristics of the norm sample that may also be related to personality scale scores.
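The two-step regression logic described above (job family dummy codes first, then a demographic predictor) can be sketched as follows. This is an illustration only and not the authors' analysis: the %male values are taken from Table 7, but the scale means, the family baselines, the slope, and the fixed 'noise' values are invented, and only one demographic variable is added in Step 2 rather than selected by stepwise entry. The helper names `solve` and `adjusted_r2` are ours.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjusted_r2(predictors, y):
    """Adjusted R^2 from an ordinary least-squares fit of y on predictors plus an intercept."""
    n, k = len(y), len(predictors[0])
    rows = [[1.0] + list(p) for p in predictors]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)


# 18 norm samples: 5 sales, 4 trucking, 3 clerical, 3 managerial, 3 financial.
families = (["sales"] * 5 + ["trucking"] * 4 + ["clerical"] * 3
            + ["managerial"] * 3 + ["financial"] * 3)
# %male per sample, as listed in Table 7.
pct_male = [44.5, 57.3, 48.6, 47.4, 36.8, 59.1, 98.8, 99.2, 96.8,
            11.5, 44.0, 28.1, 56.3, 49.1, 67.1, 38.3, 67.6, 9.5]

# INVENTED outcome: family baseline + negative %male slope + fixed noise.
baseline = {"sales": 52.0, "trucking": 50.0, "clerical": 49.0,
            "managerial": 53.0, "financial": 51.0}
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0,
         0.2, -0.2, 0.1, -0.1, 0.3, -0.3, 0.1, 0.0, -0.2]
scale_means = [baseline[f] - 0.05 * m + e for f, m, e in zip(families, pct_male, noise)]

# Step 1: four dummy codes for the five job families (sales is the reference).
dummy_levels = ["trucking", "clerical", "managerial", "financial"]
step1 = [[1.0 if f == level else 0.0 for level in dummy_levels] for f in families]
# Step 2: add the demographic predictor to the Step 1 model.
step2 = [row + [m] for row, m in zip(step1, pct_male)]

adj_r2_step1 = adjusted_r2(step1, scale_means)
adj_r2_step2 = adjusted_r2(step2, scale_means)
print(f"Step 1 adjusted R^2: {adj_r2_step1:.2f}")
print(f"Step 2 adjusted R^2: {adj_r2_step2:.2f} "
      f"(change = {adj_r2_step2 - adj_r2_step1:.2f})")
```

The change in adjusted R² between the two steps is the quantity of interest: it indicates how much the demographic variable adds once job family has been taken into account. Because the outcome values here are invented, the printed numbers say nothing about the HPI data.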

Independent research supports current findings suggesting that personality scores are related to job category (e.g. RIASEC; Barrick, Mount, & Gupta, 2003) and demographic characteristics most often described in test manuals (Roberts, Walton, & Viechtbauer, 2006). Other work-related correlates of personality have recently been identified. Judge and Cable (1997) report that personality is related to organizational culture preferences such that, for example, conscientious people prefer detail-oriented and results-oriented cultures. Thus, means for conscientiousness (and more specific traits falling within that category) can be expected to be elevated in organizations with those types of cultures. Similar results linking personality with organizational culture preferences have been reported by Warr and Pearce (2004) and Ang, van Dyne, and Koh (2006). Along similar lines, Furnham, Petrides, and Tsaousis (2005) found that the Big Five, especially Openness to Experience, are related to work values pertaining to cultural diversity. To the degree that organizational culture and work values each affect personality scores independently of job type, personality test developers are urged to report details of norm sample culture preferences and values as a basis for judging norm relevance in work settings.

Table 8. Regression results for effects of job family and demographic variables on HPI scale means (N = 18 samples)

                     Step 1:            Step 2:
                     Job family (a)     %white, %black, %male, mean age (c) (b)
HPI scale            Adjusted R²        Adjusted R²   Change in adjusted R²   Sig. predictor   β

Adjustment            .02
Ambition              .61**              .70           .09                     %black          -.37*
Sociability           .60**
Likeability           .76**              .81           .05                     %white           .25*
Prudence             -.19                .15           .34                     mean age        -.78*
Intellectance         .45*
Learning approach     .63**              .73           .10                     %male           -.53*

*p < .05; **p < .01; two-tailed.
(a) Forced entry. (b) Stepwise entry. (c) Mean substitution for three clerical samples.

A potentially more important variable affecting normative means on personality scales may be reliance on job applicants versus incumbents. The question of faking in personality assessment has been a dominant focus of investigation for many years. There is now general consensus that people can fake when instructed to do so (e.g. Viswesvaran & Ones, 1999). More recently, the focus has shifted to whether or not people actually do fake in selection settings. Some (e.g. Arthur, Woehr, & Graziano, 2000; Hogan, Barrett, & Hogan, 2007; Hough & Schneider, 1996; Ones & Viswesvaran, 1998; Viswesvaran & Ones, 1999) downplay the effects of voluntary faking, whereas others (Griffith, Chmielowski, & Yoshita, 2007; Rosse, Stecher, Miller, & Levin, 1998; Stark, Chernyshenko, Chan, Lee, & Drasgow, 2001; Tett & Christiansen, 2007) argue that it is indeed problematic. Summarizing the 'do-fake' literature in selection contexts, Tett et al. (2006) report a meta-analytic mean d effect size of .35, averaging across the Big Five (excluding Openness, whose effect is close to 0, yields a mean of 0.52). This result supports the applicant/incumbent distinction raised in the SIOP Principles regarding norm use, and clarifies that test developers and publishers need to differentiate norms based on this distinction. Specifically, if a personality test is being used in hiring, the relevant norm group is one drawn from an applicant sample, as norms based on incumbents can be expected to yield (mostly) lower means and, hence, elevated T-scores (or their equivalent) in applicants.§ If targeted for use in developing personnel, on the other hand, personality test scores bear comparison to norms derived from incumbents, as reliance on applicant norms will likely underestimate individuals' true standing.

§ All norm samples presented here, drawn from the HAS database, include only job applicants; the applicant/incumbent distinction may be relevant to other tests used in work settings, particularly those lacking applicant norms.
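The effect of scoring applicants against incumbent norms, and the reverse, can be made concrete with the standard T-score formula. The sketch below is illustrative only: the raw-score mean and SD are hypothetical, the two groups are assumed to share a common SD, and the mean effect size of .35 is read as the standardized applicant-minus-incumbent difference.

```python
def t_score(raw, norm_mean, norm_sd):
    """Standard T-score: mean 50 and SD 10 in the chosen norm group."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical raw-score scale (values invented for illustration).
APPLICANT_MEAN = 25.0
COMMON_SD = 5.0
# Mean applicant-incumbent effect size (d) reported in the text.
D_APPLICANT_VS_INCUMBENT = 0.35

incumbent_mean = APPLICANT_MEAN - D_APPLICANT_VS_INCUMBENT * COMMON_SD

# An applicant scoring exactly at the applicant mean:
applicant_on_applicant_norms = t_score(APPLICANT_MEAN, APPLICANT_MEAN, COMMON_SD)
applicant_on_incumbent_norms = t_score(APPLICANT_MEAN, incumbent_mean, COMMON_SD)

# An incumbent scoring exactly at the incumbent mean:
incumbent_on_applicant_norms = t_score(incumbent_mean, APPLICANT_MEAN, COMMON_SD)

print(f"Applicant at applicant mean, applicant norms: T = {applicant_on_applicant_norms:.1f}")
print(f"Applicant at applicant mean, incumbent norms: T = {applicant_on_incumbent_norms:.1f}")
print(f"Incumbent at incumbent mean, applicant norms: T = {incumbent_on_applicant_norms:.1f}")
```

Under these assumptions, an average applicant appears about 3.5 T-score points above the mean on incumbent norms, and an average incumbent appears about 3.5 points below the mean on applicant norms, which is larger than the ± 2.5-point benchmark used in this article.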


That personality scores may be related to a diverse array of demographic and situational factors and, plausibly, to interactions among those variables, raises concerns regarding the generalizability of normative samples as, with increasing numbers of distinct correlates, comes a decreasing likelihood that a normative sample reported in a test manual is relevant to any individual not included in that sample. This goes beyond the issue of whether or not the norm sample is described in detail; the point is that, regardless of such descriptive detail, norm samples are inherently specific to populations identified mostly by convenience, which are very likely to be different from the population of interest in specific norm applications, namely, in the case of selection, the population of local applicants, or, in the case of personnel development, local incumbents. The concept of representativeness in judging norm suitability is, in this light, a fleeting ideal, and assuming representativeness given only a limited set of norm sample descriptors (e.g. job type, age, race, and gender composition) is likely to engender false interpretations regarding an individual's or group's standing on targeted personality traits relative to the true local population.

Our findings are generally consistent with the spirit of the Standards and SIOP Principles regarding norms, noted in the introduction. They are particularly supportive of more restrictive recommendations offered in conjunction with the International Personality Item Pool (http://ipip.ori.org/newNorms.htm), which explicitly offers no norms:

One should be very wary of using canned 'norms' because it isn't obvious that one could ever find a population of which one's present sample is a representative subset. Most 'norms' are misleading, and therefore they should not be used.

Far more defensible are local norms, which one develops oneself. For example, if one wants to give feedback to members of a class of students, one should relate the score of each individual to the means and standard deviations derived from the class itself.
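The local-norms approach described in the quotation above amounts to standardizing each person's score against the mean and standard deviation of the group at hand. A minimal sketch follows; the raw scores are hypothetical, the function names are ours, and the use of the sample (n - 1) standard deviation is a choice made here rather than something the source specifies.

```python
from statistics import mean, stdev


def local_norms(raw_scores):
    """Return (mean, SD) computed from the local group's own raw scores."""
    return mean(raw_scores), stdev(raw_scores)


def to_t_score(raw, norm_mean, norm_sd):
    """Express a raw score as a T-score (mean 50, SD 10) against the given norms."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical raw scale scores for one class of students.
class_scores = [18, 22, 25, 27, 30, 31, 33, 36]

local_mean, local_sd = local_norms(class_scores)
t_scores = [to_t_score(x, local_mean, local_sd) for x in class_scores]

for raw, t in zip(class_scores, t_scores):
    print(f"raw = {raw:>2}  ->  T = {t:.1f}")
```

By construction, the resulting T-scores have a mean of 50 and an SD of 10 within the local group, so each score is interpreted relative to the group it actually belongs to.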

Conclusions

Our review of current standards and practice regarding use of personality test norms and our findings driven by basic statistical principles and real data suggest the following conclusions.

(1) Sample size has little practical impact on the reliability of normative means and on standard scores and corresponding percentiles thereby derived, once an N of around 300 is reached. Test users need not be overly wary of norms based on N of even 100. Test developers are urged to seek norms for more diverse groups based on modest Ns rather than seeking larger samples per se.

(2) Beyond N = 100, norm sample composition becomes the more important consideration. Notable discrepancies in personality profiles are likely not only between job families (e.g. sales vs. trucking in the present case) but also within job families (based on samples from different organizations). Such differences within categories raise concerns about the usefulness of norms provided in test manuals, which typically offer little more than job category and basic demographic descriptors as bases for judging norm suitability.

(3) Personality scores vary for reasons other than those targeted in standards and principles regarding norm use. Organizational culture, work values, and incumbent versus applicant settings, all of which vary independently of job category and basic demographics, are also worthy of consideration in judgments of norm relevance.


(4) The diversity and complexity of factors affecting personality scale scores encourage use of local norms over those provided in test manuals. That N need not be impractically large (e.g. 100) favours such efforts in furthering organizationally meaningful personality test score interpretations, especially for use in personnel development and selection.

(5) Use of general norms may have merit at the group level (e.g. assessing where the sales group at Company A stands in relation to national sales people). Special efforts are required, however, to ensure that the general population, defined explicitly in terms of diverse personality correlates (e.g. job category, demographics, applicant vs. incumbent, organizational culture, work values), is suitably represented by the normative sample. Strategies for developing such norms are worthy of future research.

Acknowledgements

An earlier version of this article was presented at the 21st Annual Conference of the Society for Industrial and Organizational Psychology, May, 2006, Dallas, TX.

References

American Psychological Association (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Anastasi, A., & Urbina, S. (1997). Psychological testing. Upper Saddle River, NJ: Prentice Hall.

Ang, S., van Dyne, L., & Koh, C. (2006). Personality correlates of the four-factor model of cultural intelligence. Group and Organization Management, 31, 100-123.

Arthur, W., Woehr, D. J., & Graziano, W. G. (2000). Personality testing in employment settings: Problems and issues in the application of typical selection practices. Personnel Review, 30, 657-676.

Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.

Barrick, M. R., Mount, M. K., & Gupta, R. (2003). Meta-analysis of the relationship between the five-factor model of personality and Holland's occupational types. Personnel Psychology, 56, 45-74.

Bartram, D. (1992). The personality of UK managers: 16PF norms for short-listed applicants. Journal of Occupational and Organizational Psychology, 65, 159-172.

Cook, M., Young, A., Taylor, D., O'Shea, A., Chitashvili, M., Lepeska, V., et al. (1998). Personality profiles of managers in former Soviet countries: Problem and remedy. Journal of Managerial Psychology, 13, 567-579.

Costa, P. T., & McCrae, R. R. (1992). NEO PI-R professional manual. Lutz, FL: Psychological Assessment Resources, Inc.

Crocker, L., & Algina, J. (1986). Norms and standard scores. In Introduction to classical and modern test theory (Chapter 19). New York: Harcourt, Brace, and Jovanovich.

Ferguson, G. A. (1959). Statistical analysis in psychology and education. New York: McGraw-Hill.

Furnham, A., Petrides, K. V., & Tsaousis, I. (2005). A cross-cultural investigation into the relationships between personality traits and work values. Journal of Psychology: Interdisciplinary and Applied, 139, 5-32.

Gough, H. G., & Bradley, P. (1996). California Psychological Inventory manual. Palo Alto, CA: Consulting Psychologists Press.

Gough, H. G., & Heilbrun, A. B., Jr. (1983). The Adjective Checklist manual. Mountain View, CA: Consulting Psychologists Press.


Griffith, R. L., Chmielowski, T., & Yoshita, Y. (2007). Do applicants fake? An examination of the frequency of applicant faking behavior. Personnel Review, 36, 341-355.

Hogan, J., Barrett, P., & Hogan, R. (2007). Personality measurement, faking, and employment selection. Journal of Applied Psychology, 92, 1270-1285.

Hogan, R., & Hogan, J. (1995). Hogan Personality Inventory manual (2nd ed.). Tulsa, OK: Hogan Assessment Systems.

Hough, L. M., & Schneider, R. J. (1996). Personality traits, taxonomies, and applications in organizations. In K. R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 31-88). San Francisco, CA: Jossey-Bass.

Jackson, D. N. (1994). Jackson Personality Inventory - Revised manual. Port Huron, MI: Sigma Assessment Systems.

Jackson, D. N. (1999). Personality Research Form manual. Port Huron, MI: Sigma Assessment Systems.

Judge, T. A., & Cable, D. M. (1997). Applicant personality, organizational culture, and organizational attraction. Personnel Psychology, 50, 359-394.

Kelly, M. L. (Ed.). (1999). 16PF Select manual. Champaign, IL: Institute for Personality and Ability Testing.

Kline, P. (1993). The handbook of psychological testing. New York: Routledge.

Müller, J., & Young, R. (1988). An evaluation of psychological tests in the selection process for EEG technician trainees. American Journal of EEG Technology, 28, 1?7-158.

Ones, D. S., & Viswesvaran, C. (1998). The effects of social desirability and faking on personality and integrity assessment for personnel selection. Human Performance, 11, 245-269.

Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life course: A meta-analysis of longitudinal studies. Psychological Bulletin, 132, 1-25.

Rosse, J. G., Stecher, M. D., Miller, J. L., & Levin, R. A. (1998). The impact of response distortion on preemployment personality testing and hiring decisions. Journal of Applied Psychology, 83, 634-644.

SHL Group (2006). Occupational Personality Questionnaire 32 technical manual. Thames Ditton: SHL Group.

Society for Industrial and Organizational Psychology (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: SIOP.

Stark, S., Chernyshenko, O. S., Chan, K., Lee, W. C., & Drasgow, F. (2001). Effects of the testing situation on item responding: Cause for concern. Journal of Applied Psychology, 86, 943-953.

Tett, R. P., Anderson, M. G., Ho, C. L., Yang, T. S., Huang, L., & Hanvongse, A. (2006). Seven nested questions about faking on personality tests: An overview and interactionist model of item-level response distortion. In R. Griffith (Ed.), A closer examination of applicant faking behavior. Greenwich, CT: Information Age Publishing.

Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967-993.

Van Dam, K. (2003). Trait perception in the employment interview: A five-factor model perspective. International Journal of Selection and Assessment, 11, 43-55.

Viswesvaran, C., & Ones, D. S. (1999). Meta-analyses of fakability estimates: Implications for personality measurement. Educational and Psychological Measurement, 59, 197-210.

Warr, P., & Pearce, A. (2004). Preferences for careers and organizational cultures as a function of logically related personality traits. Applied Psychology: An International Review, 53, 423-435.

Received 31 May 2007; revised version received 4 June 2008
