CHAPTER 24

Assessing Personality and Psychopathology With Self-Report Inventories

YOSSEF S. BEN-PORATH


EARLY HISTORY
    Allport’s (1937) Critique
    Ellis’s (1946) Review of Personality Questionnaires
    Summary of Early History
CURRENT ISSUES IN SELF-REPORT INVENTORY INTERPRETATION
    Approaches to SRI Scale Construction
    Approaches to SRI Scale Score Interpretation
    Generating Scores for SRI Interpretation: Standard Score Derivation
    Threats to SRI Protocol Validity
FUTURE DIRECTIONS FOR SELF-REPORT INVENTORY RESEARCH
    Approaches to SRI Scale Score Interpretation
    Standard Score Derivation for SRIs
    Assessing Threats to Protocol Validity
CONCLUSION
REFERENCES

Self-report inventories (SRIs) have been a mainstay of assessment psychology for over seven decades. Researchers and clinicians use them frequently in a broad range of settings and applications. These assessment devices require that the test taker respond to a series of stimuli, the test items, by indicating whether, or to what extent, they describe some aspect of his or her functioning. The response format varies from a dichotomous “true” or “false” to a Likert scale indication of degree of agreement with the statement as a self-description. Some SRIs focus primarily on abnormal functioning or psychopathology, whereas others concentrate more on the normal personality range. Still others cover both normal and abnormal aspects of personality and psychopathology.

In addition to their relative emphasis on normal versus abnormal personality, self-report inventories differ in several notable ways. One important variable is their conceptual basis. Some SRIs are developed with guidance from a particular personality or psychopathology theory or model, whereas others are based more on the results of empirical analyses. In this context, Ben-Porath (1994) noted that personality test developers have pursued their efforts from two broad, non–mutually exclusive perspectives. One approach, the clinical perspective, is designed to produce clinically useful instruments that help detect psychopathology. Self-report inventory developers who follow this approach typically are clinically trained psychologists who focus on conducting applied research. On the other hand, test developers working from the normal personality perspective typically have backgrounds in personality or developmental psychology and often seek to construct measures of normal-range personality constructs that can serve as tools in basic personality research.

The clinical perspective on self-report instrument development has its origins in the psychiatric medical model, in which psychopathology is viewed generally as typological in nature and measures are designed to identify membership in distinct diagnostic classes. In contrast, the normal personality perspective has its origins in the field of differential psychology. Its focus is on personality traits, and dimensional constructs are used to describe meaningful differences among individuals. As just noted, the two perspectives are not mutually exclusive. Tests developed from the clinical perspective have been found to be quite useful in personality research, and normal-range personality inventories are used in a variety of clinical applications. Self-report inventories can also be distinguished in terms of the approaches used to construct and interpret scores on their scales, the methods used to derive standard scores for these scales, and the availability and types of scales and techniques designed to monitor individuals’ test-taking attitude and its impact on scale scores.

This chapter first describes the history and development of SRIs and summarizes early criticisms of this technique. Next, current issues in SRI interpretation are described and discussed. Finally, directions for future SRI research are outlined. This chapter’s focus is on general issues related to SRI-based assessment of personality and psychopathology. Literature reviews relating to use of specific instruments have been provided by Craig (1999), Dorfman and Hersen (2001), Groth-Marnat (1997), Maruish (1999), and Strack and Lorr (1994).

EARLY HISTORY

Ben-Porath and Butcher (1991) identified three primary personality assessment techniques and differentiated between them based on the means and sources used for data collection. Behavioral observations include methods in which personality is assessed by systematically recorded observations of an individual’s behavior. Examples include Cattell’s (1965, 1979) T (systematic experimentation) and L (behavioral observation) data. Somatic examinations consist of techniques that rely on some form of physical measurement as the basis for assessing psychological functioning. Examples include various psychophysiological measures (e.g., Keller, Hicks, & Miller, 2000). Verbal examinations rely on verbalizations (oral, written, or a combination of the two) produced by the individual being assessed or another person who presumably knows the assessment target. Self-report inventories, as defined earlier, are a subclass of the verbal examination techniques. Projective assessment techniques (e.g., the Rorschach and Thematic Apperception Test, or TAT) also fall under this definition and are reviewed by Viglione in his chapter in this volume.

Ben-Porath and Butcher (1991) traced the early origins of verbal examinations to an elaborate system of competitive examinations (described in detail by DuBois, 1970) used for over 3,000 years to select personnel for the Chinese civil service. Candidates for government positions were tested (and retested every three years) to determine their suitability for these prestigious appointments. Examinees were required to write essays for hours at a time, over a period of several successive days. The essays were used (among other purposes) to gauge the candidates’ character and fitness for office (DuBois, 1970).

In the modern era, Sir Francis Galton was the first to suggest and try out systematic procedures for measuring psychological variables based on verbalizations (as well as some novel approaches to behavioral observations). Influenced heavily by the writings of his cousin, Charles Darwin, Galton was interested in devising precise methods for measuring individual differences in mental traits he believed were the product of evolution. Laying the foundations for quantitative approaches to personality assessment, Galton wrote:

We want lists of facts, every one of which may be separately verified, valued, and revalued, and the whole accurately summed. It is the statistics of each man’s conduct in small everyday affairs, that will probably be found to give the simplest and most precise measure of his character. (Galton, 1884, p. 185)

Most of Galton’s efforts to elicit such information through verbalizations focused on devising various associative tasks. The Dutch scholars Heymans and Wiersma (1906) were the first to devise a questionnaire for the task of personality assessment. They constructed a 90-item rating scale and asked some 3,000 physicians to use the scale to describe people with whom they were well acquainted. Based upon correlations they found among traits that were rated, Heymans and Wiersma, in essence, developed a crude, hierarchical, factor-analytically generated personality model. They proposed that individuals may be described in terms of their standing on eight lower-order traits: Amorphous, Apathetic, Nervous, Sentimental, Sanguine, Phlegmatic, Choleric, and Impassioned. These traits consisted, in turn, of various combinations of three higher-order traits labeled Activity, Emotionality, and Primary versus Secondary Function. This structure bears substantial similarity to Eysenck’s three-factor (Extraversion, Neuroticism, Psychoticism) personality model (Eysenck & Eysenck, 1975).

Hoch and Amsden (1913) and Wells (1914) provided further elaboration on the Heymans and Wiersma (1906) model’s utility for personality description and assessment by adding to it various psychopathology symptoms. Their work, in turn, laid the foundations for the first systematic effort to develop a self-report personality questionnaire, Woodworth’s (1920) Personal Data Sheet. Woodworth developed the Personal Data Sheet to assist in identifying psychoneurotic individuals who were unfit for duty in the U.S. military during World War I. This need arose because of the large number of combat personnel who had developed shell shock during the conflict. The questionnaire was to be used as a screening instrument so that recruits who exceeded a certain threshold would be referred for follow-up examinations.

DuBois (1970) reported that Woodworth initially compiled hundreds of “neurotic” items from various sources as candidates for inclusion on his questionnaire. Candidate items were selected if their content was judged to be potentially relevant to identifying neurosis. Items were phrased in question form, and test takers were instructed to answer “yes” or “no” to indicate whether each item described them accurately. Woodworth conducted a series of empirical investigations and eliminated items answered “yes” by large numbers of normal individuals. The final questionnaire consisted of 116 items. All were keyed such that a “yes” response was an indication of psychoneurosis. Although the Personal Data Sheet was never used for the purposes for which it was constructed (the war had ended by the time it was completed), both its items and Woodworth’s reliance (in part) on empirical analyses for its construction served as the cornerstones for most subsequent self-report personality inventories.
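Woodworth’s screening procedure amounts to a simple empirical filter: drop any candidate item that large numbers of normal respondents endorse, and key the surviving items so that “yes” counts toward psychoneurosis. The sketch below illustrates that logic only; the respondents, item names, and 25 percent cutoff are hypothetical, since the chapter does not report his actual threshold or data.

```python
# Sketch of Woodworth-style empirical item screening. The toy sample,
# item names, and 25% cutoff are hypothetical; only the logic (eliminate
# items frequently endorsed by normals, key "yes" = 1 point) follows the text.

def screen_items(items, normal_responses, max_normal_yes_rate=0.25):
    """Keep only items whose 'yes' rate among normals is at or below the cutoff."""
    kept = []
    for item in items:
        yes_rate = sum(r[item] for r in normal_responses) / len(normal_responses)
        if yes_rate <= max_normal_yes_rate:
            kept.append(item)
    return kept

def score(kept_items, respondent):
    """Total score: number of kept items answered 'yes' (True)."""
    return sum(respondent[item] for item in kept_items)

# Toy normal sample: four respondents, three candidate items.
normals = [
    {"q1": True,  "q2": False, "q3": True},
    {"q1": True,  "q2": False, "q3": False},
    {"q1": True,  "q2": True,  "q3": False},
    {"q1": False, "q2": False, "q3": False},
]
kept = screen_items(["q1", "q2", "q3"], normals)
print(kept)  # ['q2', 'q3'] -- q1 is endorsed by 75% of normals and dropped
```

With this toy sample, “q1” is endorsed by three of four normal respondents and is therefore eliminated, while the two rarely endorsed items survive and are keyed toward the total score.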

With the conclusion of World War I, Woodworth abandoned his test development efforts and refocused his attention on experimental psychology. However, a number of researchers in the then-novel subdiscipline called personality psychology followed in his footsteps. Downey’s (1923) Will-Temperament tests, Travis’s (1925) Diagnostic Character Test, Heidbreder’s (1926) Extraversion-Introversion test, Thurstone’s (1930) Personality Schedule, and Allport’s (1928) Ascendance-Submission measure were among the more prominent early successors to Woodworth’s efforts. Over the next three decades, a substantial literature evaluating the SRI technique’s merits accumulated. Two comprehensive reviews of this methodology reflected the normal personality and clinical perspectives on assessing personality and psychopathology by self-report. Both Allport (1937), adopting a normal personality perspective, and Ellis (1946), from the clinical perspective, noted SRIs’ rapid proliferation, while expressing concern (for somewhat different reasons) about their scientific foundations.

Allport’s (1937) Critique

Allport (1937), among the originators of the field of personality psychology, anticipated (correctly) that SRIs would enjoy widespread use in personality research and compared them (somewhat skeptically) with the then more established use of behavioral ratings as a source for quantitative personality data:

Though less objective than behavioral scales, standardized questionnaires have the merit of sampling a much wider range of behavior, through the medium of the subject’s report on his customary conduct or attitudes in a wide variety of situations. These paper and pencil tests are popular for a number of reasons. For one thing, they are fun to construct and fun to take. Students find them diverting, and teachers accordingly use them as agreeable classroom demonstrations. Furthermore, the scores on the tests can be manipulated in diverse ways, and when the quantitative yield of the coefficients and group differences is complete, everyone has a comforting assurance concerning the “scientific” status of personality. (p. 448)

In considering self-report personality questionnaires’ merits, Allport (1937) identified several limitations that remain salient in current applications of this methodology. One was that “It is a fallacy to assume that all people have the same psychological reasons for their similar responses [to self-report items]” (p. 449). Allport answered this concern by quoting Binet: “Let the items be crude if only there be enough of them. . . . One hopes through sheer length of a series that the erroneous diagnoses will to a certain extent cancel one another, and that a trustworthy residual score will remain” (p. 449).

In describing a second major limitation of personality tests, Allport stated:

Another severe criticism lies in the ability of the subject to fake the test if he chooses to do so. . . . Anyone by trying can (on paper) simulate introversion, conservatism, or even happiness. And if he thinks he has something to gain, he is quite likely to do so. . . . Even well intentioned subjects may fail insight or slip into systematic error or bias that vitiates the value of their answers. (p. 450)

Thus, Allport listed their transparent nature and susceptibility to intentional and unintentional manipulation among SRIs’ major limitations.

In reviewing the major SRIs of his time, Allport (1937) singled out the Bernreuter Personality Inventory (BPI; Bernreuter, 1933). The BPI consisted of 125 items (originating from several previous SRIs, including the Personal Data Sheet) phrased as questions with a “yes,” “no,” or “?” (i.e., cannot say) response format. The items yielded scores on four common personality traits, labeled Dominance, Self-Sufficiency, Introversion, and Neuroticism. Each of the 125 items was scored on all four scales (although some were scored zero), according to empirically derived criteria. For example, if answered “?,” the item “Do you often feel just miserable” was scored –3 on introversion, –1 on dominance, 0 on neuroticism, and 0 on self-sufficiency. Allport (1937) questioned the logic of this approach and recommended instead that items be scored on single scales only.
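The BPI’s scoring scheme can be made concrete with a small data structure in which each item maps each possible response to a weight on every scale. In the sketch below, only the “?” weights for the sample item come from the text above; the “yes” and “no” weights are hypothetical placeholders included solely to show the mechanics Allport objected to.

```python
# Illustration of the BPI's item-to-multiple-scales scoring. Only the "?"
# weights for this item appear in the chapter; the "yes"/"no" weights are
# hypothetical placeholders.

SCALES = ("dominance", "self_sufficiency", "introversion", "neuroticism")

WEIGHTS = {  # WEIGHTS[item][response][scale] -> points
    "Do you often feel just miserable": {
        "?":   {"dominance": -1, "self_sufficiency": 0, "introversion": -3, "neuroticism": 0},
        "yes": {"dominance": -2, "self_sufficiency": 0, "introversion": 2, "neuroticism": 3},   # hypothetical
        "no":  {"dominance": 1, "self_sufficiency": 0, "introversion": -1, "neuroticism": -2},  # hypothetical
    },
}

def score_protocol(answers):
    """Sum each answered item's weights into all four scale totals at once."""
    totals = {scale: 0 for scale in SCALES}
    for item, response in answers.items():
        for scale, points in WEIGHTS[item][response].items():
            totals[scale] += points
    return totals

totals = score_protocol({"Do you often feel just miserable": "?"})
print(totals)  # {'dominance': -1, 'self_sufficiency': 0, 'introversion': -3, 'neuroticism': 0}
```

Because a single answer moves all four scale totals at once, the scales are built from fully overlapping item sets, which is precisely the practice Allport criticized.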

Finally, Allport (1937) grappled with the question of whether multiscaled SRIs should be designed to measure independent traits or constructs. Commenting on the then-budding practice of factor analyzing scores on multiscale SRIs to derive “independent factors,” Allport noted:

Unnecessary trouble springs from assuming, as some testers do, that independent factors are to be preferred to inter-dependent traits. What if certain scales do correlate with each other. . . . Each scale may still represent a well-conceived, measurable common trait. . . . No harm is done by overlap; indeed, overlap is a reasonable expectation in view of that roughness of approximation which is the very nature of the entire procedure (also in view of the tendency of certain traits to cluster). Well-considered scales with some overlap are preferable to ill-conceived scales without overlap. To seek intelligible units is a better psychological goal than to seek independent units. (p. 329)

In summary, viewing SRIs from the normal personality perspective, Allport (1937) raised several important concerns regarding the early successors to Woodworth’s Personal Data Sheet. Recognizing their simplicity of use and consequent appeal, Allport cautioned that SRIs, by necessity, distill human personality to common traits at the expense of a more complete, individually crafted personality description. He emphasized SRIs’ tremendous vulnerability to intentional and unintentional distortion, viewing it as an inherent feature of this methodology. He criticized the BPI’s method of scoring the same item on multiple scales, as well as early factor analysts’ efforts to reduce multiscale instruments such as the BPI to a small number of independent factors. Allport (1937) offered this rather ambivalent concluding appraisal of the nascent area of personality assessment by self-report: “Historically considered the extension of mental measurements into the field of personality is without doubt one of the outstanding events in American psychology during the twentieth century. The movement is still in its accelerating phase, and the swift output of ingenious tests has quite outstripped progress in criticism and theory” (p. 455).

Ellis’s (1946) Review of Personality Questionnaires

Ellis (1946), writing from the clinical perspective, offered a comprehensive appraisal of personality questionnaires near the midpoint of the twentieth century. He opened his critique with the following generalization:

While the reliabilities of personality questionnaires have been notoriously high, their validities have remained more questionable. Indeed some of the most widely known and used paper and pencil personality tests have been cavalierly marketed without any serious attempts on the part of their authors to validate them objectively . . . no real endeavors have been made to show that, when used according to their standard directions, these instruments will actually do the clinical jobs they are supposed to do: meaning, that they will adequately differentiate neurotics from non-neurotics, introverts from extroverts, dominant from submissive persons, and so on. (p. 385)

Ellis’s (1946) opening comments reflected aptly the clinical perspective’s focus on classifying individuals into dichotomous, typological categories. Ellis noted that several authors had preceded him in criticizing SRIs and outlined the following emerging points of concern:

• Most empirical SRI studies have focused on their reliability (which has been established), while ignoring matters of validity.
• SRIs do not provide a whole, organismic picture of human behavior. Although they may accurately portray a group of individuals, they are not useful in individual diagnosis.
• Some questionnaires (like the BPI) that purport to measure several distinct traits are, at best, measuring the same one under two or more names.
• Different individuals interpret the same SRI questions in different ways.
• Most subjects can easily falsify their answers to SRIs and frequently choose to do so.
• SRIs’ “yes/?/no” response format may compromise the scales’ validity.
• Lack of internal consistency may invalidate a questionnaire, but presence of internal consistency does not necessarily validate it.
• SRIs’ vocabulary range may cause misunderstandings by respondents and thus adversely affect validity.
• Testing is an artificial procedure, which has little to do with real-life situations.
• Some personality questionnaires are validated against other questionnaires from which their items were largely taken, thus rendering their validation spurious.
• Even when a respondent does his best to answer questions truthfully, he may lack insight into his true behavior or may unconsciously be quite a different person from the picture of himself he draws on the test.
• Armchair (rather than empirical) construction and evaluation of test items is frequently used in personality questionnaires.
• Uncritical use of statistical procedures with many personality tests adds a spurious reality to data that were none too accurate in the first place.
• Many personality tests that claim to measure the same traits (e.g., introversion-extroversion) have very low intercorrelations with each other.
• There are no statistical shortcuts to the understanding of human nature, such as the ones many test users try to arrive at through involved factorial analyses.

Although generated from a notably different perspective, Ellis’s (1946) concerns overlap substantially with Allport’s (1937) reservations. The two authors also shared consternation that, in spite of these glaring deficiencies, SRIs had become quite popular: “In spite of the many assaults that have been made against it, the paper and pencil personality test has got along splendidly as far as usage is concerned. For there can be little doubt that Americans have, to date, taken more of the Woodworth-Thurstone-Bernreuter type of questionnaires than all other kinds of personality tests combined” (Ellis, 1946, p. 388).

To explain their seemingly unfounded popularity, Ellis (1946) identified several advantages that their proponents claimed for SRIs:

• They are relatively easy to administer and score.
• Even if the respondent’s self-description is not taken at face value, it may itself provide some clinically meaningful information.
• Although scale scores may be meaningless, examination of individual responses by experienced clinicians may provide valid clinical material.
• Statistical analyses had shown that the traits posited by questionnaires were not simply the product of chance factors.
• Normal and abnormal test takers tended to give different answers to SRI items.
• It does not matter if respondents answer untruthfully on personality questionnaires, since allowances are made for this in standardization or scoring of the tests.
• Traditional methods of validating questionnaires by outside criteria are themselves faulty and invalid; hence, validation by internal consistency alone is perfectly sound.

Having outlined the prevailing pros and cons for personality questionnaires (from a decidedly con-slanted perspective), Ellis (1946) proceeded to conduct a comprehensive, albeit crude, meta-analysis of the literature on personality tests’ validity, differentiating between two methods for validating personality questionnaires. He dubbed one method subjective and described it rather derogatorily as consisting of “checking the test against itself: that is[,] seeing whether respondents answer its questions in a manner showing it to be internally consistent” (p. 390).

Ellis (1946) described the second method, labeled objective personality test validation, as

checking a schedule, preferably item by item, against an outside clinical criterion. Thus, a questionnaire may be given to a group of normal individuals and to another group of subjects who have been diagnosed by competent outside observers as neurotic, or maladjusted, or psychotic, delinquent, or introverted. Often, the clinically diagnosed group makes significantly higher neurotic scores than does the normal group, [so] the test under consideration is said to have been validated. (p. 390)

Ellis (1946) questioned whether the subjective method had any bearing on tests’ validity, stating “Internal consistency of a questionnaire demonstrates, at best, that it is a reliable test of something; but that something may still have little or no relation to the clinical diagnosis for which the test presumably has been designed” (p. 391). He also found very limited utility in the objective methods of test validation, citing their sole reliance on questionable validity criteria. Nonetheless, he proceeded to review over 250 published objective validation studies classified into six types based on the method used to generate criterion validity data. Ellis sought to quantify his findings by keeping count of the number of positive, negative, and questionable findings (based on whether these were statistically significant) in each category of studies. Overall, he found positive results in 31 percent of the studies, questionable ones in 17 percent, and negative findings in 52 percent of the publications included in his survey. Ellis (1946) concluded, “Obviously, this is not a very good record for the validity of paper and pencil personality questionnaires” (p. 422).
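Ellis’s box-score method reduces to tallying outcome labels across studies and reporting the proportion in each category. A minimal sketch of that arithmetic (the study list below simply applies his reported percentages to a hypothetical set of 100 studies; it is not his actual data):

```python
from collections import Counter

# Sketch of Ellis's box-score tally: classify each study's result and report
# the percentage in each outcome category. The findings list is illustrative,
# built to match his reported 31/17/52 percent split over 100 studies.

def tally(findings):
    """Return the rounded percentage of studies in each outcome category."""
    counts = Counter(findings)
    total = len(findings)
    return {outcome: round(100 * counts[outcome] / total)
            for outcome in ("positive", "questionable", "negative")}

findings = ["positive"] * 31 + ["questionable"] * 17 + ["negative"] * 52
print(tally(findings))  # {'positive': 31, 'questionable': 17, 'negative': 52}
```

Note that the three reported percentages (31 + 17 + 52) account for the full set of studies in his survey.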

In selecting studies for inclusion in his analysis, Ellis (1946) singled out one instrument for separate treatment and analysis: the then relatively unknown Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1943). Ellis explained that, unlike the more established instruments included in his review, which were administered anonymously by paper and pencil to groups of subjects, the MMPI was administered individually, simulating more accurately a clinical interview. Of the fifteen MMPI studies he reviewed, Ellis reported positive results in ten studies, questionable ones in three, and negative findings in two investigations.

Ellis’s overall conclusions regarding SRIs’ validity were quite negative:

We may conclude, therefore, that judging from the validity studies on group-administered personality questionnaires thus far reported in the literature, there is at best one chance in two that these tests will validly discriminate between groups of adjusted and maladjusted individuals, and there is very little indication that they can be safely used to diagnose individual cases or to give valid estimations of the personality traits of specific respondents. The older, more conventional, and more widely used forms of these tests seem to be, for practical diagnostic purposes, hardly worth the paper on which they are printed. Among the newer questionnaires, the Minnesota Multiphasic schedule appears to be the most promising one—perhaps because it gets away from group administration which has hitherto been synonymous with personality test-giving. More research in this direction is well warranted at the present time. (1946, p. 425)

Judged with the hindsight of 55 years, Ellis’s critique appears rather naïve and inherently flawed. As has been shown, Ellis himself questioned the utility of the validation methods used in the studies he included in his analyses, noting (correctly) that many, if not most, relied on questionably valid criteria. Given this limitation, these studies could not adequately demonstrate SRIs’ validity or invalidity. Although the tests he reviewed were indeed psychometrically inadequate, Ellis’s effort to appraise them empirically was hampered significantly by limitations in the literature he reviewed. Moreover, his summary dismissal of internal consistency as having little or no bearing on validity was overstated.

Nonetheless, Ellis’s review, published in the prestigious Psychological Bulletin, had a devastating effect on SRIs’ position within the budding field of clinical psychology. Dahlstrom (1992) described the widespread skepticism with which all SRIs were perceived for the 10 years following this and several similar analyses. Indeed, use of tests (such as the BPI) singled out for their lack of validity waned dramatically in the years that followed. Ellis (1946) did, however, anticipate correctly that the MMPI might emerge as a viable alternative to the SRIs of the first half of the twentieth century.

Ellis (1953) revisited this issue seven years later, in an updated review of personality tests’ validity. He concluded that there had been limited progress in developing valid personality SRIs and focused his criticism on the instruments’ susceptibility to intentional and unintentional distortion. He was particularly concerned with the effects of unconscious defenses. Ellis (1953) again singled out the MMPI as an instrument whose authors had at least made an attempt to correct for these effects on its scale scores, but he expressed skepticism about such corrections’ success. He also observed that the efforts involved in correcting and properly interpreting MMPI scores might better be otherwise invested, stating, “The clinical psychologist who cannot, in the time it now takes a trained worker to administer, score, and interpret a test like the MMPI according to the best recommendations of its authors, get much more pertinent, incisive, and depth-centered ‘personality’ material from a straightforward interview technique would hardly appear to be worth his salt” (p. 48).

Curiously, Ellis (1953) saw no need to subject the preferred "straightforward interview technique" to the type of scrutiny he applied so handily to SRIs. That task would be left to Meehl (1954) in his seminal monograph comparing the validity of clinical and actuarial assessment techniques.

Summary of Early History

Self-report inventories emerged as an attractive but scientifically limited approach to personality assessment during the first half of the twentieth century. Representing the normal personality perspective, Allport (1937) criticized these instruments for being inherently narrow in scope and unnecessarily divorced from any personality theory. Ellis (1946), writing from the clinical perspective, concluded that there was little or no empirical evidence of their validity as diagnostic instruments. Both authors identified their susceptibility to intentional and unintentional distortion, and the implicit assumption that test items have the same meaning to different individuals, as major and inherent weaknesses of SRIs as personality and psychopathology measures.

CURRENT ISSUES IN SELF-REPORT INVENTORY INTERPRETATION

In spite of their shaky beginnings, SRIs emerged during the second half of the twentieth century as the most widely used and studied method for assessing personality and psychopathology. Modern SRI developers sought to address the limitations of their predecessors in a variety of ways. Various approaches to SRI scale construction are described next, followed by a review of current issues in SRI scale score interpretation. These include the roles of empirical data and item content in interpreting SRI scale scores, methods used to derive standard scores for SRI interpretation, and threats to the validity of individual SRI protocols.

Throughout this section, examples from the SRI literature are cited, and most of these involve either the MMPI or MMPI-2. Emphasis on the MMPI/MMPI-2 reflects this instrument's central role in the modern literature as the most widely studied (Butcher & Rouse, 1996) and used (Camara, Nathan, & Puente, 2000) SRI.

Approaches to SRI Scale Construction

Burisch (1984) described three primary, non–mutually exclusive approaches that have been used in SRI scale construction. The external approach involves using collateral (i.e., extratest) data to identify items for an SRI scale. Here, individuals are classified into known groups based on criteria that are independent of scale scores (e.g., psychiatric diagnoses), and items are chosen based on their empirical ability to differentiate among members of different groups. The method is sometimes also called empirical keying. Self-report inventory developers who view personality or psychopathology categorically and seek to develop empirical methods for classifying individuals into predetermined categories typically use the external scale construction method. Often, these categories correspond to diagnostic classes such as schizophrenia or major depression. As would be expected, scale developers who rely on this approach typically assume a clinical perspective on personality assessment.
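The item-selection logic of empirical keying can be sketched in a few lines of code. The function below is purely illustrative (it appears nowhere in the chapter, and the names and threshold are invented): it retains any item whose true-endorsement rate differs between a criterion-defined target group and a comparison group by at least some minimum amount, irrespective of the item's content.

```python
# Illustrative sketch of empirical keying: items are kept solely because
# they differentiate known groups, not because of what they say.
from typing import List

def keyed_items(target: List[List[int]], comparison: List[List[int]],
                min_diff: float = 0.25) -> List[int]:
    """Return indices of items whose true-endorsement rate differs between
    a target group (e.g., diagnosed patients) and a comparison group by at
    least `min_diff`. Responses are coded 1 = true, 0 = false."""
    n_items = len(target[0])
    selected = []
    for i in range(n_items):
        p_target = sum(person[i] for person in target) / len(target)
        p_comp = sum(person[i] for person in comparison) / len(comparison)
        if abs(p_target - p_comp) >= min_diff:
            selected.append(i)
    return selected

# Toy data: 3 items answered by 4 target and 4 comparison respondents.
target_group = [[1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 0, 1]]
comparison_group = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(keyed_items(target_group, comparison_group))  # [0]
```

In the toy data only item 0 separates the groups (endorsed by all target respondents but few comparison respondents), so only it would be keyed, regardless of its wording.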

Ellis (1946) highlighted a major limitation of the external approach in his critique of SRIs as measures of personality and psychopathology: their validity is constrained by the criteria used in their development. Absent consensually agreed-upon criteria for classification (a situation not uncommon in psychological assessment, and one that typically motivates efforts to develop a scale to begin with), test developers must rely upon imperfect or controversial external criteria for subject classification, item selection, and subsequent cross-validation. Consequently, scales developed with this method have generally not fared well as predictors of the class membership they were designed to predict. However, in some instances (e.g., the MMPI clinical scales), empirical research subsequent to their development has guided fruitful application of externally developed scales in ways other than those their developers originally intended, by identifying clinically meaningful correlates of these scales and of the patterns of scores among them.

Scale developers who follow the inductive approach, according to Burisch (1984), assume that there exists a basic, probably universal personality structure, which they attempt both to discover and to measure. The approach is considered inductive because its adherents do not set out to measure a preconceived set of traits, but instead leave it to empirical analyses to reveal important personality dimensions and the relations among them. In the process, an SRI is developed to measure the discovered personality structure. Scale developers who apply the inductive approach often adhere to a normal personality perspective on assessment. They typically rely on various forms of factor analysis, and the constructs they identify characteristically are dimensional. A leading example of an inductively derived SRI is Cattell's 16 Personality Factor Questionnaire (16PF; Cattell, Cattell, & Cattell, 1993). Inductive scale development often follows an iterative process of item writing, data collection, factor analysis, and item revision, followed by subsequent rounds of data collection, analysis, and item modification (e.g., Tellegen, 1982).

Finally, Burisch (1984) describes the deductive approach to personality scale construction as one in which developers start with a conceptually grounded personality model and rationally write or select items that are consonant with their conceptualization. Most early personality and psychopathology SRI developers followed this approach in developing the MMPI precursors so devastatingly criticized by Allport (1937) and Ellis (1946). Consequently, deductive scale construction was viewed for many years as an inferior, less sophisticated form of SRI development. Burisch argued and demonstrated that these seemingly less sophisticated scale development techniques often yield measures that compare quite favorably with the products of external and inductive scale construction.

The three approaches to scale construction are not mutually exclusive. Any combination of the three may be used in constructing an SRI scale, or different sets of scales within the same instrument. For example, the MMPI-2 (Butcher et al., 2001) contains three sets of scales, each initially based on a different one of the three approaches to scale construction: the clinical scales, originally (Hathaway, 1956; Hathaway & McKinley, 1940, 1942; McKinley & Hathaway, 1940, 1942, 1944) based on the external method; the Content Scales (Butcher, Graham, Williams, & Ben-Porath, 1990), constructed with a modified deductive approach; and the Personality Psychopathology Five (PSY-5; Harkness, McNulty, & Ben-Porath, 1995), the end product of an inductive research project (Harkness & McNulty, 1994).

Approaches to SRI Scale Score Interpretation

Two general approaches to SRI scale score interpretation can be identified based on their sources for interpretive conclusions. Empirically grounded interpretations rely on empirical data to form the basis for ascribing meaning to SRI scale scores. Content-based interpretations are guided by SRI scales' item content. Empirically grounded approaches have played a more central role in personality and psychopathology SRIs; however, more recently, content-based interpretation has gained increasing recognition and use. As will be discussed after the two approaches are described, they are not mutually exclusive.

Empirically Grounded Interpretation

Meehl (1945) outlined the basic logic of empirically grounded SRI scale interpretation in his classic article "The Dynamics of 'Structured' Personality Inventories." Responding to early SRI critics' contention that the instruments are inherently flawed because their interpretation is predicated on the assumption that test takers are motivated, and able, to respond accurately to their items, he stated:

A "self-rating" constitutes an intrinsically interesting and significant bit of verbal behavior, the non-test correlates of which must be discovered by empirical means. Not only is this approach free from the restriction that the subject must be able to describe his own behavior accurately, but a careful study of structured personality tests built on this basis shows that such a restriction would falsify the actual relationships that hold between what a man says and what he is. (p. 297)

Thus, according to Meehl, empirical interpretation is neither predicated nor dependent on what the test taker says (or thinks he or she is saying) in responding to SRI items, but rather on the empirical correlates of these statements (as summarized in SRI scale scores).

Two subclasses can be distinguished among the empirical approaches to SRI scale interpretation. Scales constructed with the external approach are expected, based on the method used in their construction, to differentiate empirically between members of the groups used in their development. This form of empirical interpretation, which may be termed empirically keyed interpretation, is predicated on the assumption that if members of different groups (e.g., a target group of depressed patients and a comparison sample of nonpatients) answer a set of items differently, individuals who answer these items similarly to target group members likely belong to that group (i.e., they are depressed). This turns out to be a problematic assumption that requires (and often fails to achieve) empirical verification. Consequently, empirically keyed interpretations, as defined here, are used infrequently in current SRI applications.

The second approach to empirical interpretation is predicated on post hoc statistical identification of variables that are correlated with SRI scale scores (i.e., their empirical correlates). The empirical correlate interpretation approach is independent of the method used to develop a scale and may be applied to measures constructed by any one (or combination) of the three methods just outlined. Unlike the empirically keyed approach, it requires no a priori assumptions regarding the implications of one scale construction technique or another. All that is required are relevant extratest data regarding individuals whose SRI scale scores are available. Statistical analyses are conducted to identify variables that are correlated empirically with SRI scale scores; these are their empirical correlates. For example, if a scale score is empirically correlated with extratest indicators of depressive symptoms, individuals who score higher than others on that scale can be described as more likely than others to display depressive symptomatology.

Empirical correlates can guide SRI scale interpretation at two inference levels. The example just given represents a simple, direct inference level. The empirical fact that a scale score is correlated with an extratest depression indicator is used to gauge the depression of an individual who produces a given score on that scale. The correlation between scale and external indicator represents its criterion validity, which in turn reflects the level of confidence we should place in an interpretation based on this correlation.

Although the concept of interpreting scale scores based on their criterion validity represents a simple and direct inference level, the process of establishing and understanding SRIs' criterion validity is complex and challenging. As already noted, the absence of valid criteria often motivates scale development to begin with. In addition, as with any psychological variable, criterion measures themselves are always, to some extent, unreliable. Consequently, validity coefficients, the observed correlations between SRI scale scores and criteria, always underestimate the scales' criterion validity. If a criterion's reliability can be reasonably estimated, correction for attenuation due to unreliability is possible to derive a more accurate estimate of criterion validity. However, this is rarely done, and it does not address limitations in criterion validity coefficients imposed by the criterion measures' imperfect validity. Self-report inventory critics often point to rather low criterion validity coefficients as indications of these instruments' psychometric weakness, without giving adequate consideration to the limitations just noted.
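The correction for attenuation mentioned above is the classical Spearman formula. The sketch below is illustrative only and assumes, for simplicity, that we correct solely for unreliability in the criterion (the scale's own unreliability is left uncorrected); the function name and the numbers are invented.

```python
# Classical (Spearman) correction for attenuation, applied to the criterion
# side only: r_corrected = r_observed / sqrt(criterion reliability).
import math

def corrected_validity(r_observed: float, criterion_reliability: float) -> float:
    """Estimate what the validity coefficient would be if the criterion
    measure were perfectly reliable."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("reliability must fall in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)

# An observed validity of .40 against a criterion with reliability .64
# implies a disattenuated validity of .40 / 0.8 = .50.
print(round(corrected_validity(0.40, 0.64), 2))  # 0.5
```

Note that even this corrected value still understates the scale's validity to the extent that the criterion is itself an imperfectly valid indicator of the construct, which is the chapter's second caveat.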

A second, more complex and less direct inference level in empirical interpretation of SRI scale scores involves reliance on their construct validity. Cronbach and Meehl (1955) indicated that

Construct validation is involved whenever a test is to be interpreted as a measure of some attribute or quality, which is not "operationally defined" . . . . When an investigator believes that no criterion available to him is fully valid, he perforce becomes interested in construct validity because this is the only way to avoid the "infinite frustration" of relating every criterion to some more ultimate standard . . . . Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured. (p. 282)

Cronbach and Meehl (1955) described construct validation as an ongoing process of learning (through empirical research) about the nature of psychological constructs that underlie scale scores and using this knowledge to guide and refine their interpretation. They defined the seemingly paradoxical bootstraps effect, whereby a test may be constructed based on a fallible criterion and, through the process of construct validation, that same test winds up having greater validity than the criterion used in its construction. As an example, they cited the MMPI Pd scale, which was developed using an external scale construction approach with the intent that it be used to identify individuals with a psychopathic personality. Cronbach and Meehl (1955) noted that the scale turned out to have a limited degree of criterion validity for this task. However, as its empirical correlates became elucidated through subsequent research, a construct underlying Pd scores emerged that allowed MMPI interpreters to describe individuals who score high on this scale based on both a broad range of empirical correlates and a conceptual understanding of the Pd construct. The latter allowed further predictions about likely Pd correlates to be made and tested empirically. These tests, in turn, broadened or sharpened (depending on the research outcome) the scope of the Pd construct and its empirical correlates.

Knowledge of a scale's construct validity offers a richer, more comprehensive foundation for empirical interpretation than does criterion validity alone. It links the assessment process to theoretical conceptualizations and formulations in a manner described by Cronbach and Meehl (1955) as involving a construct's nomological network, "the interlocking system of laws which constitute a theory" (p. 290). Thus, empirical research can enhance our understanding of (and ability to interpret) psychological test results by placing them in the context of well-developed and appropriately tested theories.

Whether it is based on criterion validity, construct validity, or both, empirically grounded SRI interpretation can occur at two levels, focusing either on individual scale scores or on configurations among them. Configural interpretation involves simultaneous consideration of scores on more than one SRI scale. Linear interpretation involves separate, independent consideration and interpretation of each SRI scale score.

Much of the literature on this topic involves the MMPI. The move toward configural MMPI interpretation came on the heels of the test's failure to meet its developers' original goal, differential diagnosis of eight primary forms of psychopathology. Clinical experience, bolstered by findings from a series of studies (e.g., Black, 1953; Guthrie, 1952; Halbower, 1955; Hathaway & Meehl, 1951), led MMPI interpreters to conclude that robust empirical correlates for the test were most likely to be found if individuals were classified into types based on the pattern of scores they generated on the test's clinical scales. Based partly on this development, Meehl's (1954) treatise on clinical versus actuarial prediction advocated that researchers pursue a three-pronged task: First, they must identify meaningful classes within which individuals tend to cluster. These would replace the inadequate Kraepelinian nosology that served as the target for the MMPI clinical scales' original development. Next, investigators would need to devise reliable and valid ways of identifying to which class a given individual belongs. Finally, they would identify the empirical correlates of class membership.

In his subsequent call for a so-called cookbook-based interpretation, Meehl (1956) proposed that MMPI profiles could serve all three purposes. Patterns of scores (i.e., configurations) on MMPI clinical scales could be used to identify clinically meaningful and distinct types of individuals; these scores could be used (based on a series of classification rules) to assign individuals to a specific profile type; and empirical research could be conducted to elucidate the correlates of MMPI profile type group membership. Several investigators (most notably Marks and Seeman, 1963, and Gilberstadt and Duker, 1965) followed Meehl's call and produced such MMPI-based classification and interpretation systems.

Underlying configural scale score interpretation is the assumption that there is something about a combination of scores on SRI scales that is not captured by considering each scale score individually (i.e., a linear interpretation), and that the whole is somehow greater than (or at least different from) the sum of its parts. For example, there is something to be learned about an individual who generates his or her most deviant scores on MMPI scales 1 (Hypochondriasis) and 3 (Hysteria) that is not reflected in the individual's scores on these scales when they are considered separately. Statistically, this amounts to the expectation of an interaction among scale scores in the prediction of relevant extratest data.

Surprisingly, the assumption that configural interpretation should be more valid than linear approaches has not been tested extensively. Goldberg (1965) conducted the most elaborate examination of this question to date. He found that a linear combination of scores on individual MMPI scales was more effective than the configural set of classification rules developed by Meehl and Dahlstrom (1960) at differentiating between neurotic and psychotic test takers. The implicit assumption of an interaction among the scales that make up a configuration has yet to be tested extensively.
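The interaction the text refers to can be made concrete with a small numerical sketch. The data below are fabricated for illustration: test takers are cross-classified as low or high on two scales, and the interaction is the "difference of differences" in mean criterion scores across the four cells. A nonzero value means the scales combine non-additively, which is precisely what configural (code-type) interpretation presupposes.

```python
# Fabricated example: does elevation on one scale change the meaning of
# elevation on another? The interaction is the difference of differences
# in cell means of some external criterion.
from statistics import mean

# Hypothetical criterion ratings for four low/high cells on two scales.
cells = {
    ("low", "low"): [1.0, 1.2, 0.8],
    ("high", "low"): [2.0, 2.2, 1.8],
    ("low", "high"): [2.1, 1.9, 2.0],
    ("high", "high"): [4.9, 5.1, 5.0],   # elevation on BOTH scales
}

def interaction(cells) -> float:
    m = {k: mean(v) for k, v in cells.items()}
    effect_when_scale2_low = m[("high", "low")] - m[("low", "low")]
    effect_when_scale2_high = m[("high", "high")] - m[("low", "high")]
    return effect_when_scale2_high - effect_when_scale2_low

print(round(interaction(cells), 2))  # 2.0
```

Here the effect of elevation on scale 1 is 1.0 criterion point when scale 2 is low but 3.0 points when scale 2 is high, so the scales interact; a purely linear (additive) model would miss this. Testing whether such interactions reliably exist, and add predictive value, is the empirical question the chapter notes remains largely unexamined.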

Content-Based Interpretation

Content-based SRI interpretation involves reliance on item content to interpret scale scores. For example, if a scale's items contain a list of depressive symptoms, scores on that scale are interpreted to reflect the individual's self-reported depression. It is distinguished from deductive SRI scale construction in that the latter involves using item content for scale development, not necessarily interpretation. Indeed, scales constructed by any of the three primary approaches (external, inductive, or deductive) can be interpreted based on their item content, and SRI measures constructed deductively can be interpreted with an empirically grounded approach.

Content-based SRI interpretation predates empirically grounded approaches and was the focus of many aspects of both Allport's (1937) and Ellis's (1946) early SRI critiques. Meehl's (1945) rationale for empirically grounded SRI scale score interpretation was a reaction to the criticism that content-based interpretation was predicated on the dubious assumptions that test items have the same meaning to test takers that they do to scale developers and that all respondents understand items comparably and approach testing in a motivated and cooperative manner. Meehl (1945) agreed (essentially) that such assumptions were necessary for content-based interpretation and that they were unwarranted. His influential paper on this topic left content-based interpretation in ill repute among sophisticated SRI users for the next twenty years.

Content-based SRI interpretation began to make a comeback when Wiggins (1966) introduced a set of content scales developed with the MMPI item pool. Using a deductive scale development approach complemented by empirical refinement designed to maximize their internal consistency, Wiggins constructed a psychometrically sound set of 13 MMPI content scales and proposed that they be used to augment empirically grounded interpretation of the test's clinical scales. In laying out his rationale for developing a set of content scales for the MMPI, Wiggins commented:

The viewpoint that a personality test protocol represents a communication between the subject and the tester (or the institution he represents) has much to commend it, not the least of which is the likelihood that this is the frame of reference adopted by the subject himself. (p. 2)

He went on to acknowledge that

Obviously, the respondent has some control over what he chooses to communicate, and there are a variety of factors which may enter to distort the message . . . . Nevertheless, recognition of such sources of "noise" in the system should not lead us to overlook the fact that a message is still involved. (p. 25)

Wiggins was keenly aware of the inherent limits of SRI assessment in general, and of content-based interpretation in particular. However, he argued that how an individual chooses to present him- or herself, whatever the reasoning or motivation, provides useful information that can augment what is learned from empirically grounded interpretation alone. Wiggins (1966) advocated that, although we need not (and, indeed, should not) take it at face value, how a person chooses to present him- or herself is inherently informative, and that it is incumbent upon SRI interpreters to make an effort to find out what it was that an individual sought to communicate in responding to an SRI's items. By developing internally consistent SRI scales, psychologists could provide a reliable means of communication between test takers and interpreters.

Empirically Grounded Versus Content-Based SRI Interpretation

Empirically grounded and content-based interpretations are not mutually exclusive. Often, scale scores intended for interpretation based on one approach can also be (and are) interpreted based on the other. For example, although the MMPI-2 clinical scales are interpreted primarily based on their empirical correlates, the Harris-Lingoes subscales augment clinical scale interpretation by identifying content areas that may be primarily responsible for elevation on a given clinical scale. Conversely, although their interpretation is guided primarily by item content, the MMPI-2 Content Scales (Butcher et al., 1990) also are interpreted based on their empirical correlates.

A primary distinction between empirically grounded and content-based interpretation is that the latter (as SRI critics have long argued) is more susceptible to intentional and unintentional distortion. However, as is discussed later in detail, appropriate application of SRIs requires that test-taking attitude be measured and considered as part of the interpretation process. Because of its inherent susceptibility to distortion, content-based interpretation requires that an SRI be particularly effective in measuring and identifying misleading approaches and that its users apply the tools designed to do so appropriately.

Generating Scores for SRI Interpretation: Standard Score Derivation

Depending upon their response format, SRI raw scores consist either of a count of the number of items answered in the keyed (true or false) direction or of a sum of the respondent's Likert scale ratings on a scale's items. These scores have no intrinsic meaning. They are a function of arbitrary factors such as the number of items on a scale and the instrument's response format. Raw scores typically are transformed to some form of standard score that places an individual's SRI scale raw score in an interpretable context. Standard scores are typically generated by comparing an individual's raw score on a scale to that of a normative reference group composed of the instrument's standardization or normative sample(s). Because of their critical role in SRI interpretation, it is important to understand how standard scores are derived, as well as the factors that determine their adequacy.

The most common standard score used with SRIs is the T score, which expresses an individual's standing in reference to the standardization sample on a metric having a mean of 50 and a standard deviation of 10. This is typically accomplished through the following transformation:

T = ((RS − M_RS) / SD_RS) × 10 + 50,

where T is the individual's T score, RS is his or her raw score, M_RS is the standardization sample's mean score, and SD_RS is the sample's standard deviation on a given SRI scale. A T score of 50 corresponds to the mean level for the standardization sample. A T score of 60 indicates that the person's score falls one standard deviation above the normative mean. The choice of 50 and 10 for the standard scores' mean and standard deviation is arbitrary but has evolved as common practice in many (although not all) SRIs.
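The T score transformation is simple enough to express as a small helper function; the code below is a direct transcription of the formula, and the checks mirror the hypothetical numbers worked through later in this section.

```python
# T score transformation: standardize a raw scale score against the
# normative sample's mean and standard deviation, on a mean-50, SD-10 metric.
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """T = ((RS - M_RS) / SD_RS) * 10 + 50."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - norm_mean) / norm_sd * 10 + 50

print(t_score(10, 12, 5))  # 46.0  (raw score below the normative mean)
print(t_score(10, 8, 5))   # 54.0  (raw score above the normative mean)
```

Because the output metric is fixed, T scores computed this way can be compared across scales with very different raw score ranges, which is the second use of standard scores discussed next.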

Standard scores provide information on where the respondent stands on the construct(s) measured by an SRI scale in comparison with a normative reference. A second important and common application of standard scores is to allow for comparisons across SRI scales. Given the arbitrary nature of SRI scale raw scores, it is not possible to compare an individual's raw scores on, for example, measures of anxiety and depression. Transformation of raw scores to standard scores allows the test interpreter to determine whether an individual shows greater deviation from the normative mean on one measure or another. Such information could be used to assist an assessor in differential diagnostic tasks or, more generally, to allow for configural test interpretation, which, as described earlier in this chapter, involves simultaneous consideration of multiple SRI scale scores.

The accuracy and utility of standard scores rest heavily on the nature and quality of the normative reference sample. To the extent that it aptly represents the target population, standard scores will provide an accurate gauge of the individual's standing on the construct(s) of interest, allow for comparison of an individual's standing across constructs, and, more generally, facilitate configural SRI interpretation. Conversely, if the normative reference scores somehow misrepresent the target population, the resulting standard scores will hinder all of the tasks just mentioned. Several factors must be considered in determining whether a normative reference sample represents the target population appropriately. These involve various types and effects of normative sampling problems.

Types and Effects of Normative Sampling Problems

Identifying potential problems with standard scores can be accomplished by considering the general formula for transforming raw scores to standard scores:

SS = ((RS − M_RS) / SD_RS) × NewSD + NewMean,

where SS is the individual's standard score, RS is his or her raw score, M_RS is the standardization sample's mean score, SD_RS is the sample's standard deviation on a given SRI scale, NewSD is the target standard deviation for the standard scores, and NewMean is the target mean for these scores. As discussed earlier, the target mean and standard deviation are arbitrary, but common practice in SRI scale development is to use T scores, which have a mean of 50 and a standard deviation of 10. An important consideration in evaluating standard scores' adequacy is the extent to which the normative reference (or standardization) sample appropriately represents the population's mean and standard deviation on a given SRI scale.

Examination of the general transformation formula shows that if a normative sample's mean (M_RS) is higher than the actual population mean, the resulting standard score will underestimate the individual's standing in reference to the normative population. Consider a hypothetical example in which T scores are used, the individual's raw score equals 10, the normative sample's mean equals 12, the actual population mean equals 8, and both the sample and population standard deviations equal 5. Applying the T score transformation formula provided earlier,

T = ((10 − 12) / 5) × 10 + 50 = 46,

we find that this normative sample yields a T score of 46, suggesting that the individual's raw score falls nearly half a standard deviation below the normative mean on this construct. However, had the sample mean accurately reflected the population mean, applying the T score transformation formula

T = ((10 − 8) / 5) × 10 + 50 = 54

would have yielded a T score of 54, indicating that the individual's raw score falls nearly half a standard deviation above the population mean. Larger discrepancies between sample and population means would, of course, result in even greater underestimates of the individual's relative standing on the construct(s) of interest. Conversely, to the extent that the sample mean underestimates the population mean, the resulting standard scores will overestimate the individual's relative position on a given scale.

A second factor that could result in systematic inaccuracies in standard scores is sampling error in the standard deviation. To the extent that the normative sample's standard deviation underestimates the population standard deviation, the resulting standard score will overestimate the individual's relative standing on the scale. Applying the T score transformation formula once more, consider an example in which the individual's raw score equals 10, the sample and population means both equal 8, and the sample standard deviation equals 2, but the population standard deviation actually equals 5. Applying the formula based on the sample data

T = ((10 − 8) / 2) × 10 + 50 = 60

yields a T score of 60, indicating that the individual's score falls one standard deviation above the normative mean. However, an accurate estimate of the population standard deviation

T = ((10 − 8) / 5) × 10 + 50 = 54

would have produced a T score of 54, reflecting a score that falls just under half a standard deviation above the normative mean. Here, too, a larger discrepancy between the sample and population standard deviations results in an even greater overestimation of the individual's standing on the construct(s) of interest; conversely, an overestimation of the population's standard deviation would result in an underestimation of the individual's relative score on a given measure.

Causes of Normative Sample Inadequacies

In light of their importance in determining standard scores' adequacy, it is essential to identify (and thus try to avoid) reasons why standardization samples may inaccurately estimate a target population's mean or standard deviation on an SRI scale. Three general types of problems may generate inaccurate normative means and standard deviations: sampling problems, population changes, and application changes. Two types of sampling error may occur. The simpler of these is random sampling error, in which, as a result of random factors associated with the sampling process, the normative sample mean or standard deviation fails to represent accurately the relevant population statistics. This can be minimized effectively by collecting sufficiently large normative samples.

Systematic sampling errors occur when, due to specific sampling flaws, the sample mean or standard deviation reflects inaccurately the relevant population statistics. In such cases a normative sample fails to represent accurately one or more segments of the target population as a result of sampling bias. This will negatively affect the normative sample's adequacy if two conditions are met: (1) a sample fails to represent accurately a certain population segment, and (2) the inadequately represented population segment differs systematically (in its mean, its standard deviation, or both) from the remaining population on a particular SRI scale. For example, if as a consequence of the sampling method used younger adults are underrepresented in a normative sample that is designed to represent the entire adult population, and younger adults differ systematically from the remaining adult population on the scale being standardized, this could result in biased estimates of both the population mean and its standard deviation. This might occur with a scale designed to measure depression, a variable that tends to vary as a function of age.

If younger adults are represented inadequately in a normative sample (this could occur if the sampling process failed to incorporate college students and military personnel) used to develop standard scores on a depression scale, the normative sample would overestimate the population mean on the scale and underestimate its standard deviation, resulting in the effects discussed previously.

Note that in order for systematic sampling error to affect scale norms, both conditions just specified must be met. That is, a population segment must be misrepresented and this segment must differ systematically from the remaining population on the scale being standardized. If only the first condition is met, but the misrepresented segment does not differ systematically from the remaining population, this will not result in biased estimates of the population mean and standard deviation. Such a scenario occurred with the updated normative sample used to standardize the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). Data included in the MMPI-2 manual indicated that the new normative sample differed substantially from the general adult population in education. Specifically, the normative sample significantly underrepresented individuals with lower levels of education and overrepresented people with higher levels of education relative to the general adult population. Some authors (e.g., Duckworth, 1991) expressed concern that this may introduce systematic bias in the updated norms. However, subsequent analyses demonstrated that this sampling bias had no significant impact on resulting test norms because education is not correlated substantially with MMPI scale scores (Schinka & LaLone, 1997).
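Both conditions can be illustrated with a small simulation (entirely hypothetical numbers, not the MMPI-2 data): norms drawn from a sample that omits a population segment are biased only to the extent that the omitted segment actually differs on the scale.

```python
import random
import statistics

random.seed(42)

# Hypothetical adult population in two equal segments; segment B
# (say, younger adults) scores systematically higher on the scale.
seg_a = [random.gauss(8, 2) for _ in range(50_000)]
seg_b = [random.gauss(11, 2) for _ in range(50_000)]
population = seg_a + seg_b

def norm_stats(sample):
    """Mean and standard deviation a normative sample would yield."""
    return statistics.mean(sample), statistics.pstdev(sample)

print(norm_stats(population))                       # target parameters (about 9.5 and 2.5)
print(norm_stats(random.sample(population, 2000)))  # representative sample: close to target
print(norm_stats(random.sample(seg_a, 2000)))       # segment B omitted: biased mean and SD
```

If segment B were misrepresented but did not differ from segment A (condition 2 unmet), the third line would match the first two, mirroring the MMPI-2 education finding described above.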

Population changes are a second reason why normative samples may inadequately represent their target population. These occur when, over the course of time, the target population changes on the construct that a scale measures. For example, Anastasi (1985), in a review of longitudinal research on intelligence, found a trend for population-wide increases in intelligence over the first half of the twentieth century. These were the result primarily of increases in population education levels in general and literacy levels in particular. To account for these changes' effects on their norms, it has been necessary for intelligence test developers to periodically collect new normative data. To the extent that constructs measured by SRIs are affected similarly by population changes, it becomes necessary to update their normative databases as well. This was one of the considerations that led to the development of new norms for the MMPI (Butcher et al., 1989).

Application changes are a third reason why normative samples may misrepresent target populations. Two types of application changes can be distinguished. Changes in administration practices may affect normative data adequacy. For example, the original MMPI normative data were collected using the test's so-called Box Form. Each of the test's items was typed on a separate card and test takers were instructed to sort the cards (which were presented individually in a random order) into three boxes representing a "true," "false," or "cannot say" response. The instructions given to the original normative sample did not discourage the "cannot say" option. However, after the normative data were collected, an administration change was introduced and test takers were instructed to "be sure to put less than 10 cards behind the 'cannot say'" (Dahlstrom, Welsh, & Dahlstrom, 1972, p. 32). Later still, the Box Form was largely superseded by the MMPI Group Form, a booklet that presented the test's items in a fixed order and required that the test taker record his or her responses (true, false, or cannot say) on an answer sheet. Here, too, test takers were admonished to attempt to answer all of the test items.

To the extent that either of these changes in administration practices affected individuals' responses to the test items, this could have resulted in the original normative sample data's misrepresenting population statistics under the revised administration procedures. In fact, the updated MMPI-2 norms are significantly different from the original norms in both their means and standard deviations, and to some extent these shifts are a product of the administration changes just described. For example, as a result of the change in instructions regarding the "cannot say" option, the new normative sample members omitted far fewer items than did their original counterparts, which in turn probably contributed to the new sample's higher mean raw scores on many of the test's original scales. In other words, the original normative sample underestimated the target population's mean raw scores on the MMPI scales given the shift in administration procedure, contributing partly to the artificially elevated T scores generated by individuals and groups tested with the original MMPI when they were transformed to standard scores based on the original test norms.

A more recent change in SRI administration practices followed the introduction of computer technology. Although most SRI norms were collected using booklet forms, software is now available to administer most tests by computer. Such a change in administration practice could also, potentially, affect these instruments' norms' adequacy if the different administration format resulted in a systematic change in responses to SRI items. Reassuringly, a recent meta-analysis by Finger and Ones (1999) demonstrated that computerized test administration does not affect group means or standard deviations (and thus would have no negative impact on the test's norms) on MMPI/MMPI-2 scales. Butcher, in his chapter in this volume, provides further discussion of computer applications in psychological assessment.

A second type of application change that could potentially affect norms' adequacy involves expansion of the target population. When an SRI developed for use with a rather narrowly defined population is considered for application to a broader population, the possibility that its norms will no longer accurately reflect the expanded population's means and standard deviations on its scales needs to be considered. For example, the MMPI was developed originally for use at the University of Minnesota Hospital, and its normative sample, made up primarily of a group of Caucasian farmers and laborers with an average of eight years of education, represented fairly well this target population. As the test's use expanded to the broader U.S. population, concerns were raised (e.g., Gynther, 1972) about the MMPI norms' adequacy for interpreting scores generated by minorities, primarily African Americans, who were not included in the original normative sample.

The effects of expanding an SRI's population on its normative sample's adequacy depend upon the new population segment's performance on its scales. To the extent that the new segment differs systematically from the original on a scale's mean or standard deviation, this would necessitate an expansion of the instrument's normative sample to reflect more accurately the expanded population's scale parameters. This was one of the primary considerations that led to the collection of new normative data for the MMPI and publication of the MMPI-2 (Butcher et al., 1989). Similarly, as the test's use has expanded beyond the United States to other countries, cultures, and languages, researchers throughout the world have collected new normative data for MMPI and later MMPI-2 application in an ever-increasing number of countries (cf. Butcher, 1996; Butcher & Pancheri, 1976).

The effects of expanding an SRI's target population on normative data adequacy should not be confused with questions about an instrument's validity across population segments, although frequently these very separate concerns are confounded in the literature. Ensuring that various population segments are represented adequately in an SRI's normative sample is not sufficient to guarantee that the test is as valid an indicator of its target psychological constructs in the new segment as it was in the original. To the extent that an instrument's interpretation is predicated on an SRI's empirical correlates, its construct validity, or the combination of the two (as discussed earlier), its application to the expanded population is predicated on the assumption that these test attributes apply comparably to the new population segment.

General Population Versus Population Subsegment Norms

A final consideration in evaluating normative data adequacy is whether an SRI's standard scores are derived from general or more narrowly defined and specific normative samples. When a general population normative sample is used, the same set of standard scores is applied regardless of the assessment setting or the individual's membership in any specific population subsegments. Thus, for example, the MMPI-2 has just one set of standard scores generated based on a normative sample designed to represent the general U.S. population. The same set of standard scores is used regardless of where the test is applied.

A more recently developed SRI, the Personality Assessment Inventory (PAI; Morey, 1991), provides two sets of norms for its scales, based on a sample of community-dwelling adults and a clinical sample. Morey (1991) explains that clinical norms are designed to assist the interpreter in tasks such as diagnosis:

For example, nearly all patients report depression at their initial evaluation; the question confronting the clinician considering a diagnosis of major depression is one of relative severity of symptomatology. That a patient's score on the PAI DEP [Depression] scale is elevated in comparison to the standardization sample is of value, but a comparison of the elevation relative to a clinical population may be more critical in formulating diagnostic hypotheses. (p. 11)

The ability to know how an individual compares with others known to have significant psychological problems may indeed contribute useful information to test interpretation. However, this particular approach to generating such information has a significant drawback. If, using Morey's example, nearly all members of a clinical reference sample report depression when their normative data are collected, then a typical patient experiencing significant problems with depression will produce a nondeviant score on the instrument's clinically referenced depression measure, thus obscuring depression's prominence in the presenting clinical picture.
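The drawback is easy to see numerically. In the sketch below the norms are invented for illustration (they are not the actual PAI values): a patient reporting substantial depression is clearly deviant against community norms but looks unremarkable against clinical norms.

```python
def t_score(raw, mean, sd):
    """Linear T score: mean 50, standard deviation 10."""
    return (raw - mean) / sd * 10 + 50

# Hypothetical depression-scale norms (illustrative values only).
community_mean, community_sd = 10, 4   # general population
clinical_mean, clinical_sd = 24, 6     # patients at intake, most reporting depression

raw = 24  # a typical depressed patient's raw score

print(t_score(raw, community_mean, community_sd))  # 85.0 -- markedly elevated
print(t_score(raw, clinical_mean, clinical_sd))    # 50.0 -- exactly average among patients
```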

A similar problem results when SRI scales are normed based on other narrowly defined, setting-specific population segments. For example, Roberts, Thompson, and Johnson (1999) developed several additional sets of PAI norms for use in assessing applicants for public safety positions. The additional reference samples were all made up of public safety job applicants. A feature common among individuals undergoing evaluations for possible employment in public safety positions is the tendency to deny or minimize any behavioral and emotional problems that they believe may cast them in a negative light and reduce the likelihood that they will be offered the position they seek. As a result, most individuals tested under these circumstances tend to score higher than the general population on defensive test-taking measures. Deviant scores on defensiveness scales alert the interpreter that the test taker is probably minimizing or denying such problems. However, when compared with other individuals tested under similar circumstances, public safety position applicants produce nondeviant scores on defensiveness measures even when they are in fact approaching the assessment with a defensive attitude. Here, too, narrowly defined norms may obscure an important feature (defensiveness) of a test taker.

Threats to SRI Protocol Validity

The impact of test-taking approaches on SRIs has long been the focus of heated debate. As reviewed earlier in this chapter, early SRI critics (e.g., Allport, 1937; Ellis, 1946) cited their vulnerability to intentional and unintentional distortion by the test taker as SRIs' primary, inherent limitation. The basic concern here is that, even if he or she is responding to a psychometrically sound SRI, an individual test taker may, for a variety of reasons, approach the assessment in a manner that compromises the instrument's ability to gauge accurately his or her standing on the construct(s) of interest. In such cases, a psychometrically valid test may yield invalid results.

Use of the term validity to refer to both a test's psychometric properties and an individual's test scores can be confusing. A distinction should be drawn between instrument validity and protocol validity. Instrument validity refers to a test's psychometric properties and is typically characterized in terms of content, criterion, and construct validity. Protocol validity refers to the results of an individual test administration. Use of the term validity to refer to these two very different aspects of SRI assessment is unfortunate, but sufficiently well grounded in practice that introduction of new terminology at this point is unlikely to succeed.

A need to distinguish between psychometric and protocol validity has been highlighted in a debate regarding the widely studied NEO Personality Inventory-Revised (NEO-PI-R). Responding to suggestions by Costa and McCrae (1992a; the NEO-PI-R developers) that practitioners use this test in clinical assessment, Ben-Porath and Waller (1992) expressed the concern (among others) that the absence of protocol validity indicators on the NEO-PI-R may limit the instrument's clinical utility. Costa and McCrae (1992b) responded that validity scales were unnecessary, in part because evidence has shown that test scores may be psychometrically valid even in instances in which validity indicators showed evidence of limited protocol validity.

Most recently, Piedmont, McCrae, Riemann, and Angleitner (2000) sought to demonstrate this point by showing that scores on an SRI's validity scales (designed to assess protocol validity) were unrelated to the NEO-PI-R's psychometric validity. However, their analyses were based on data generated by research volunteers who completed the instruments anonymously. Thus, unlike respondents in most clinical assessment settings, these research volunteers had nothing at stake when responding to the NEO-PI-R. In contrast, as reviewed next, test takers in clinical settings may be motivated by various factors to present themselves in a particular manner. Moreover, psychometric validity in this and similar studies was established based on statistical analyses of group data, whereas protocol validity pertains to individual test results. If, for example, one of the participants in such a study marked his or her answer sheet randomly, without actually reading the SRI items, his or her resulting scale scores are completely invalid and uninterpretable, regardless of how others in the sample responded.

Consideration of protocol validity is one aspect of SRI-based assessment in which users are able to take an individualized perspective on a generally normative enterprise. Allport (1937) distinguished between idiographic (individualized) and nomothetic (generalized) approaches to personality research and assessment. Drawing an analogy to the diagnostic process in medicine, he noted that the two approaches are not mutually exclusive. Rather, a combined idiographic-nomothetic approach is likely to yield the optimal perspective on diagnosis and assessment. Consideration of protocol validity offers an important window into idiographic aspects of SRI-based assessment.

In sum, instrument validity is necessary but insufficient to guarantee protocol validity. Although it sets the upper limit on protocol validity, information regarding instrument validity does not address a critical question that is at issue in every clinical assessment: Is there anything about an individual's approach to a particular assessment that might compromise its user's ability to interpret an SRI's scores? To answer this question, users must be aware of various threats to protocol validity.

Types of Threats to Protocol Validity

Threats to SRI protocol validity need to be considered in each SRI application because of their potential to distort the resulting test scores. This information can be used in two important ways. First, knowledge of threats to protocol validity makes it possible for test users to attempt to prevent or minimize their occurrence. Second, it makes it possible to anticipate invalid responding's potential impact on the resulting test scores and, on the basis of this information, provide appropriate caveats in test interpretation. Such statements may range from a call for caution in assuming that an interpretation will likely reflect accurately the individual's standing on the construct(s) of interest to an unambiguous declaration that protocol validity has been compromised to a degree that makes it impossible to draw any valid inferences about the test taker from the resulting SRI scale scores.

Threats to protocol validity fall broadly into two categories that reflect test item content's role in the invalid responding. Important distinctions can be made within each of these categories as well. Table 24.1 provides a list of the various non-content- and content-based threats to protocol validity identified in this chapter.

Non-Content-Based Invalid Responding. Non-content-based invalid responding occurs when the test taker's answers to an SRI are not based on an accurate reading, processing, and comprehension of the test items. Its deleterious effects on protocol validity are obvious: To the extent that a test taker's responses do not reflect his or her actual reactions to an SRI's items, then those responses cannot possibly gauge the individual's standing on the construct of interest. This invalidating test-taking approach can be divided further into three modes: nonresponding, random responding, and fixed responding.

Nonresponding occurs when the test taker fails to provide a usable response to an SRI item. Typically, this takes the form of failing to provide any response to an SRI item, but it may also occur if the test taker provides more than one response to an item. Nonresponding may occur for a variety of reasons. Test takers who are uncooperative or defensive may fail to respond to a large number of an SRI's items. Less insidious reasons why individuals may fail to respond appropriately to an SRI may include an inability to read or understand its items, cognitive functioning deficits that result in confusion or obsessiveness, or limits in the test taker's capacity for introspection and insight.

TABLE 24.1 Threats to Self-Report Inventory Protocol Validity

Non-content-based invalid responding
    Nonresponding
    Random responding
        Intentional random responding
        Unintentional random responding
    Fixed responding
Content-based invalid responding
    Overreporting
        Intentional overreporting
            Exaggeration versus fabrication
        Unintentional overreporting (negative emotionality)
    Underreporting
        Intentional underreporting
            Minimization versus denial
        Unintentional underreporting (social desirability)

Nonresponding's effect on protocol validity depends, in part, on the SRI's response format. In tests that use a "true" or "false" response format, a nonresponse is treated typically as a response in the nonkeyed direction. In SRIs with a Likert scale response format, a nonresponse typically receives the value zero. These ipso facto scores can by no means be assumed to provide a reasonable approximation of how the respondent would have answered had he or she chosen or been able to do so. Therefore, to the extent that nonresponding occurs in a given SRI protocol, this will distort the resulting test scores. For example, in a true/false response format a respondent's failure to respond appropriately to a large number of items will result in artificial deflation of his or her scores on the instrument's scales, which, if not identified and considered in scale score interpretation, may result in underestimation of the individual's standing on the constructs measured by the affected scales.
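The deflation mechanism can be sketched for a true/false scale (a hypothetical five-item scale and key; the scoring convention is the one described above, under which an omitted item can never earn a keyed point):

```python
def score_scale(responses, keyed_true):
    """Raw score on a true/false scale; None marks an omitted item.
    A nonresponse never matches the keyed direction, so it scores zero."""
    return sum(
        1 for answer, key in zip(responses, keyed_true)
        if answer is not None and answer == key
    )

keyed_true = [True, False, True, True, False]  # hypothetical scoring key

complete = [True, False, True, True, False]  # every item answered in the keyed direction
sparse = [True, None, None, None, False]     # same answers given, but three items omitted

print(score_scale(complete, keyed_true))  # 5
print(score_scale(sparse, keyed_true))    # 2 -- artificially deflated by nonresponding
```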

Random responding is a test-taking approach characterized by an unsystematic response pattern that is not based on an accurate reading, processing, and understanding of an SRI's items. It is not a dichotomous phenomenon, meaning that random responding may be present to varying degrees in a given test protocol. Two types of random responding can be distinguished. Intentional random responding occurs when the individual has the capacity to respond relevantly to an SRI's items but chooses instead to respond irrelevantly in an unsystematic manner. An uncooperative test taker who is unwilling to participate meaningfully in an assessment may engage in intentional random responding rather than becoming embroiled in a confrontation with the examiner over his or her refusal to participate. In this example, the test taker provides answers to an SRI's items without pausing to read and consider them. He or she may do this throughout the test protocol or at various points along the way in responding to an SRI's items.

Unintentional random responding occurs when the individual lacks the capacity to respond relevantly to an SRI's items but, rather than refraining from giving any response to the items, he or she responds without having an accurate understanding of the test items. Often these individuals are not aware that they lack this capacity and have failed to understand and respond relevantly to an SRI's items.

Several factors may lead to unintentional random responding. Reading difficulties may compromise the test taker's ability to respond relevantly to an SRI's items. Most current SRIs require anywhere from a fourth- to a sixth-grade reading level for the test taker to be able to read, comprehend, and respond relevantly to the items. Regrettably, this is not synonymous with having completed four to six years of education. Some high school graduates cannot read at the fourth-grade level. If the examiner has doubts about a test taker's reading ability, a standardized reading test should be administered to determine his or her reading level. For individuals who do not have the requisite reading skills, it may still be possible to administer the test if the problem is strictly one of literacy rather than language comprehension. In such cases, an SRI's items can be administered orally, preferably using standard stimulus materials such as an audiotaped reading of the test items.

Comprehension deficits can also lead to random responding. In this case the individual may actually be able to read the test items but does not have the necessary language comprehension skills to process and understand them. This could be a product of low verbal abilities. In other instances, comprehension deficits may be found in those lacking familiarity with English language nuances, for example, among individuals for whom English is not their primary language.

Unintentional random responding can also result from confusion and thought disorganization. In some instances, these types of difficulties may have prompted the assessment and SRI administration. Whereas reading and comprehension difficulties tend to be relatively stable test-taker characteristics that will probably compromise protocol validity regardless of when an SRI is administered, confusion and thought disorganization are often (although not always) transitory conditions. If and when the individual's sensorium clears, she or he may be able to retake an SRI and provide valid responses to its items.

Finally, random responding may result from response recording errors. Many SRIs are administered by having the respondent read a set of items from a booklet and record the responses on a separate answer sheet. If the respondent marks his or her answers to an SRI's items in the wrong locations on the answer sheet, he or she is essentially providing random responses. This could result from the test taker's missing just one item on the answer sheet or from an overall careless approach to response recording.

Fixed responding is a non-content-based invalidating test-taking approach characterized by a systematic response pattern that is not based on an accurate reading, processing, and understanding of an SRI's items. In contrast to random responding, here the test taker provides the same non-content-based responses to SRI items. If responding to a true/false format SRI, the test taker indiscriminately marks many of the test items either "true" or "false." Note that if the test taker provides both "true" and "false" responses indiscriminately, then he or she is engaging in random responding. In fixed responding the indiscriminate responses are predominantly either "true" or "false." In fixed responding on a Likert scale, the test taker marks items at the same level on the Likert rating scale without properly considering their content. Like nonresponding and random responding, fixed responding is a matter of degree rather than a dichotomous all-or-none phenomenon.

Unlike nonresponding and random responding, fixed responding has received a great deal of attention in the SRI-based assessment literature. Jackson and Messick (1962) sparked this discussion when they proposed that much (if not all) of the variance in MMPI scale scores was attributable to two response styles, termed acquiescence and social desirability. Acquiescence was defined as a tendency to respond "true" to MMPI items without consideration of their content. This type of non-content-based responding is labeled fixed responding in this chapter.

A detailed examination of Jackson and Messick's arguments and the data they analyzed in their support is beyond the scope of this chapter. Essentially, Jackson and Messick factor analyzed MMPI scale scores in a broad range of samples and found recurrently that two factors accounted for much of the variance in these scores. They attributed variance on these factors to two response styles, acquiescence and social desirability, and cautioned that MMPI scale scores appear primarily to reflect individual differences on these nonsubstantive dimensions. They suggested that MMPI scales were particularly vulnerable to the effects of acquiescence and its counterpart, counteracquiescence (a tendency to respond "false" to self-report items without consideration of their content), because their scoring keys were unbalanced. That is, for some MMPI scales many, if not most, of the items were keyed "true," whereas on other scales most of the items were keyed "false."

In an extensive and sophisticated series of analyses, Block (1965) demonstrated that the two primary MMPI factors reflected substantive personality dimensions rather than stylistic response tendencies. With regard specifically to acquiescence, he showed that completely balanced MMPI scales (i.e., ones with equal numbers of "true" and "false" keyed items) yielded the same factor structure that Jackson and Messick (1962) attributed to the effect of response styles. He showed further that the so-called acquiescence factor was correlated with substantive aspects of personality functioning. Block (1965) labeled this factor ego control and demonstrated that its association with extratest data was unchanged as a function of whether it was measured with balanced or unbalanced scales.

It is important to note that Block's analyses did not indicate that acquiescence is never a problem in SRI-based assessment. In the relatively rare instances when they occur, acquiescence and counteracquiescence can indeed jeopardize protocol validity. In the most extreme case of acquiescence, if a respondent answers "true" to all of a scale's items without reference to their content, his or her score on that scale is obviously invalid. In addition, use of a Likert scale format does not obviate the potential effects of this response style, because with this format, as well, it is possible for test takers to provide a fixed response that is independent of item content.

Block's compelling demonstration notwithstanding, Jackson and Messick and their followers continued to advocate the response style position and argue that acquiescence represented a serious challenge to MMPI use and interpretation. Most recently, Helmes and Reddon (1993) revisited this issue and criticized the MMPI and MMPI-2 (among other things) for their continued susceptibility to the effects of acquiescence. These authors again identified the test's unbalanced scoring keys as a primary reason for its susceptibility to acquiescence. In constructing his own SRI, the Basic Personality Inventory (BPI), Jackson (1989) indeed adopted the balanced scoring key solution for its scales, each of which is made up of 20 items, half keyed "true" and the others keyed "false." However, balanced scoring keys actually provide no protection whatsoever against the protocol-invalidating effects of fixed responding. Consider the hypothetical example just mentioned, in which a test taker responds "true" to all 20 BPI scale items without actually referring to their content. The only effect a balanced key might have in this instance might be to instill a false sense of security in the test interpreter that the scale is not susceptible to the protocol-invalidating effects of acquiescence, when, in fact, it is.
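The false sense of security is simple to demonstrate. On a hypothetical balanced 20-item scale (10 items keyed "true," 10 keyed "false"), pure acquiescence produces an exactly mid-range raw score that looks entirely unremarkable:

```python
# Hypothetical balanced scoring key: 10 items keyed "true", 10 keyed "false".
keyed_true = [True] * 10 + [False] * 10

def raw_score(responses, key):
    """Count of items answered in the keyed direction."""
    return sum(answer == k for answer, k in zip(responses, key))

acquiescent = [True] * 20  # "true" to every item; content never considered

print(raw_score(acquiescent, keyed_true))  # 10 -- mid-range, yet the protocol is invalid
```

Only the 10 true-keyed items earn points, so the score lands at the scale midpoint and would not, by itself, alert an interpreter to anything unusual.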

In summary, although fixed responding does not pose as broad a threat to protocol validity as Jackson and Messick would argue, in cases in which a test taker uses this response style extensively, the resulting SRI scale scores will be invalid and uninterpretable. Constructing scales with balanced keys or Likert scale response formats does not make an SRI less susceptible to this threat to protocol validity. Self-report inventory users need to determine, in each instance that a test is used, whether, to what extent, and with what impact fixed responding may have compromised protocol validity. This requires that SRIs include measures of fixed responding.
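The point about balanced keys can be made concrete with a small sketch. In this hypothetical example (the 20-item scale, its keying, and the index below are illustrative inventions, not any published instrument's actual scoring), a content-blind all-“true” protocol earns exactly the same raw score as a valid, moderate protocol, so the scale score itself reveals nothing; only an index sensitive to response direction does.

```python
# Hypothetical balanced 20-item true/false scale: 10 items keyed "true,"
# 10 keyed "false" (illustrative only -- not an actual inventory's key).
keys = [True] * 10 + [False] * 10

def raw_score(responses, keys):
    """Count responses given in the keyed direction."""
    return sum(r == k for r, k in zip(responses, keys))

def true_percentage(responses):
    """A crude fixed-responding index: percentage of items answered 'true'."""
    return 100 * sum(responses) / len(responses)

# Content-blind acquiescence: "true" to every item.
acquiescent = [True] * 20
# A valid protocol endorsing half the items in the keyed direction.
valid_moderate = ([True] * 5 + [False] * 5) * 2

# Both protocols earn an identical, mid-range raw score of 10 out of 20 --
# the balanced key hides the acquiescence rather than exposing it.
print(raw_score(acquiescent, keys), raw_score(valid_moderate, keys))
# Only the direction-based index separates them (100% vs. 50% "true").
print(true_percentage(acquiescent), true_percentage(valid_moderate))
```

The design point is that a balanced key centers a fixed responder's score rather than flagging it, which is why a separate fixed-responding measure is needed.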

Content-Based Invalid Responding. Content-based invalid responding occurs when the test taker skews his or her answers to SRI items and, as a result, creates a misleading impression. This test-taking approach falls broadly into two classes that have been discussed under various labels in the literature. The first of these has been termed alternatively overreporting, faking bad, and malingering. The second type of content-based invalid responding has been labeled underreporting, faking good, and positive malingering. In this chapter, they will be discussed under the more neutral labels of over- and underreporting.

Overreporting occurs when, in responding to an SRI, a test taker describes him- or herself as having more serious difficulties, a greater number of them, or both than he or she actually has. Underlying this definition is the hypothetical notion that if a completely objective measure of psychological functioning were available, the overreporter’s subjective self-report would indicate greater dysfunction than does the objective indicator. Two non–mutually exclusive types of overreporting can be distinguished. Intentional overreporting occurs when the individual knowingly skews his or her self-report. This test taker is typically motivated by some instrumental gain and thus fits the DSM-IV definition of malingering (APA, 2000). The label faking bad also carries with it a connotation of volitional distortion and similarly falls under the category of intentional overreporting.

It is important to note that intentional overreporting is not in itself an indication that psychopathology is absent. That is to say, if an individual intentionally overreports in responding to an SRI, that, in itself, does not indicate that he or she is actually free of bona fide psychological dysfunction. It is, in fact, possible for someone who has genuine psychological difficulties to amplify their extent or significance when responding to SRI items. On the other hand, some people who intentionally overreport in response to an SRI actually have no problems. The distinction here is between exaggeration and fabrication of difficulties. Both forms of intentional overreporting fall under the DSM-IV definition of malingering: “the intentional production of false or grossly exaggerated physical or psychological symptoms, motivated by external incentives such as avoiding military duty, avoiding work, obtaining financial compensation, evading criminal prosecution, or obtaining drugs” (APA, 2000, p. 739). In practice, distinguishing between exaggeration and fabrication in SRI protocol validity determination is quite challenging.

In unintentional overreporting, the test taker is unaware that she or he is deviating from a hypothetically objective self-description and describing her- or himself in an overly negative manner. Here, it is the test taker’s self-concept that is skewed. Individuals who engage in this test-taking approach believe mistakenly that they are providing an accurate self-description when in fact they are overreporting their difficulties.

Tellegen (1985) has described a primary personality trait, negative emotionality, which predisposes individuals to perceive their environment as more threatening than it is in reality, and themselves as having greater subjective difficulty functioning than they actually have. Individuals high in negative emotionality do indeed experience psychological dysfunction; however, they overestimate and, as a result, overreport its extent and significance. As a consequence, they produce deviant scores on SRIs that confound genuine with unintentionally overreported dysfunction.

Demonstrating and evaluating the extent of the confound between genuine and unintentionally overreported psychological dysfunction is quite challenging because of the inherent difficulty in obtaining objective indicators of functioning. Just about any effort to derive an objective measure of psychological functioning relies, at least to some extent, on self-report or self-presentation. Structured diagnostic interviews and even informant reports are influenced by how an individual responds to specific interview questions (asked in person by an interviewer rather than impersonally by a questionnaire) or the impression a person creates on others who are asked to describe her or his psychological functioning.

Watson and Pennebaker (1989) provided a compelling illustration of this phenomenon by focusing on the role negative emotionality plays in assessing physical functioning. Unlike psychological functioning, physical health can be assessed with objective indicators of dysfunction that are independent of self-report. These investigators examined the relation between self-reported negative emotionality, self-reported health complaints, and objectively derived physical functioning indicators (e.g., fitness and lifestyle variables; frequency of illness; health-related visits or absences; objective evidence of risk, dysfunction, or pathology; and overall mortality). They found a consistent correlation between negative emotionality and self-reported health problems, but little or no correlation between self-reported negative emotionality and objective health indicators. The unintentional overreporting associated with negative emotionality accounted almost entirely for its relation with physical health complaints, leading the investigators to conclude that there was little or no association between this construct and actual physical health.
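The logic of this design can be illustrated with simulated data (the numbers below are invented for illustration and are not Watson and Pennebaker's results): when self-reported complaints are driven partly by negative emotionality but the objective indicator is not, the complaint measure correlates substantially with negative emotionality while the objective measure does not.

```python
import random

def pearson(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 500

# Simulated trait negative emotionality for each respondent.
ne = [rng.gauss(0, 1) for _ in range(n)]
# Self-reported complaints: partly a function of negative emotionality.
complaints = [t + rng.gauss(0, 1) for t in ne]
# Objective health indicator: independent of negative emotionality.
objective = [rng.gauss(0, 1) for _ in range(n)]

# Expect a substantial NE-complaints correlation (about .7 by construction)
# and a near-zero NE-objective correlation.
print(round(pearson(ne, complaints), 2), round(pearson(ne, objective), 2))
```

The contrast between the two correlations is the pattern the investigators observed; the simulation merely shows how the confound produces it.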

Negative emotionality’s role in assessing mental health and personality functioning is more complex. People high in negative emotionality are genuinely psychologically distressed, and their difficulties are often manifested in multiple areas of psychological dysfunction. In diagnostic terms, this results in substantial levels of psychopathology comorbidity. Mineka, Watson, and Clark (1998) reported, for example, that anxiety and mood disorders have approximately a 50% rate of co-occurrence. Similar levels of comorbidity have been reported among other Axis I diagnostic categories, among Axis II diagnoses, and across Axis I and Axis II. Although diagnostic comorbidity is real, unintentional overreporting associated with negative emotionality probably inflates estimates of its extent. In SRI measures of personality and psychopathology, this inflation has the effect of yielding deviant scores on multiple scales. It also results in phenotypic correlations among SRI scales that overestimate the actual correlations among the latent constructs they are designed to measure. When correlations among SRI scales are factor analyzed, they typically yield one very strong general factor that represents both the genuine psychological sequelae of negative emotionality (i.e., true phenotypic comorbidity) and the confounding effects of unintentional overreporting.

In summary, overreporting in response to SRI items results in scale scores that overestimate the extent or significance of the psychological problems the respondent experiences. If overreporting is suspected, the test interpreter is confronted with the challenge of determining whether, and to what extent, it might involve intentional distortion versus manifestations of negative emotionality and, if it is intentional, whether it involves fabrication or exaggeration of problems. Moreover, these threats to protocol validity are not mutually exclusive, and the interpreter needs to consider the possibility that some or all may be manifested in a given protocol.

Underreporting occurs when, in responding to an SRI, a test taker describes him- or herself as having less serious difficulties, a smaller number of difficulties, or both than he or she actually has. To refer back to the hypothetical objective functioning indicator, in underreporting the individual’s self-report reflects better functioning than would be indicated by an objective assessment. Here, too, a distinction may be drawn between intentional and unintentional underreporting. In intentional underreporting, the individual knowingly denies or minimizes the extent of his or her psychological difficulties or negative characteristics. As a result, the individual’s SRI scale scores underestimate his or her level of dysfunction. Differentiation between denial and minimization is important but complex. The distinction here is between an individual who blatantly denies problems that she or he knows exist and one who may acknowledge some difficulties or negative characteristics but minimizes their impact or extent.

Unintentional underreporting occurs when the individual unknowingly denies or minimizes the extent of his or her psychological difficulties or negative characteristics. Here, too, objective and subjective indicators of psychological functioning would be at odds; however, in unintentional underreporting this discrepancy results from the individual’s self-misperception rather than from an intentional effort to produce misleading test results.

Much of the discussion of this topic in the assessment literature has appeared under the label social desirability. Edwards (1957) defined social desirability as “the tendency of subjects to attribute to themselves, in self-description, personality statements with socially desirable scale values and to reject those with socially undesirable scale values” (p. vi). As was the case with acquiescence (discussed earlier), social desirability was proposed as a response style, “an organized disposition within individuals to respond in a consistent manner across a variety of substantive domains” (Wiggins, 1973). Edwards (1970) differentiated between social desirability and what he called “impression management,” a deliberate attempt to lie or dissimulate for ulterior motives, that is, intentional underreporting as defined in this chapter. Thus, as conceptualized by Edwards (1957, 1970), social desirability was a form of unintentional underreporting in response to SRI items.

Edwards (1957) argued that much of the variance in MMPI scale scores could be attributed to social desirability. He based this conclusion on research he conducted with an MMPI scale he constructed and labeled social desirability (SD). The scale was made up of 39 items that 10 judges unanimously deemed to reflect highly desirable self-statements. Edwards (1957, 1970) reported that this scale correlated highly with most MMPI scales in general and with the strong, omnipotent first factor that emerged from factor analyses of MMPI scale scores. He concluded that MMPI scale scores were thus hopelessly confounded with the social desirability response style and, therefore, could not be used to identify meaningful (rather than stylistic) individual differences.

As was the case with Jackson and Messick’s (1962) argument regarding acquiescence (see the discussion of fixed responding), Block (1965) provided a definitive refutation of Edwards’s (1957) social desirability critique. Block demonstrated that Edwards’s social desirability scale was in fact a marker of a substantive personality dimension he termed ego resiliency. Following the earlier work of Wiggins (1959), Block developed an ego resiliency–free measure of social desirability and found much lower levels of overlap with substantive MMPI scale scores than Edwards reported for his social desirability scale. Moreover, Block demonstrated that both a social-desirability-independent ego resiliency scale he constructed and Edwards’s social desirability scale were correlated with meaningful non-MMPI variables that reflected substantive individual differences.

Commenting on Edwards’s (1957) claim that the MMPI scales were hopelessly confounded with social desirability, Block (1965) observed:

Confounding is a blade that, if held too tightly, will cut its wielder. With the same logic advanced for social desirability as underlying MMPI scales, one can argue that the [first] factor of the MMPI represents a personality dimension that is vital to understanding the SD scale. Many of the MMPI scales have empirical origins and demonstrable validity in separating appropriate criterion groups. The high correlations found between these scales and the SD measure therefore plausibly suggest—not an artifact or naiveté in the construction of the earlier scales—but rather that the SD scale, wittingly or not, is an excellent measure of some important variable of personality. (pp. 69–70)

When we reflect on the methods Edwards (1957) used to construct his social desirability scale, the resulting confound is readily understood. Stated simply, psychopathology is undesirable. Ask a group of persons to identify SRI items that reflect undesirable characteristics, and, if they are included in the pool, participants will undoubtedly generate a list of items describing negative psychological characteristics. Edwards’s assumption that individuals’ responses to such items reflect a substantively meaningless response style proved subsequently to be unwarranted and was refuted by Block’s (1965) analyses. Nonetheless, Edwards and some followers continued to raise these arguments. For example, relying (like Edwards) on scales that reflected desirability judgments, Jackson, Fraboni, and Helmes (1997) criticized the MMPI-2 Content Scales (Butcher et al., 1990) for being highly saturated with social desirability. Like Edwards before them, these authors failed to explain how scales that they concluded were highly saturated with irrelevant stylistic variance could account significantly for a wide range of extratest personality and psychopathology variables (Butcher et al., 1990).

Implications of Threats to Protocol Validity

The issues discussed and highlighted in this section illustrate the crucial role played by respondents’ test-taking approaches in determining the interpretability of SRI scale scores. Allport (1937) and Ellis (1946) foresaw accurately that reliance on an individual’s willingness and ability to generate an accurate self-portrayal when responding to test items was among the greatest challenges facing SRI developers and users. Subsequent decades of research and practice have illuminated a host of threats to protocol validity (just described), all manifestations of the kinds of concerns identified early on by Allport and Ellis. Self-report inventory developers have responded to these threats in various ways, ranging from the development of validity scales (SRI measures designed to assess and, in some instances, correct for the effects of protocol-invalidating test-taking approaches; e.g., the MMPI-2 validity scales, Butcher et al., 2001) to declarations, and attempts to demonstrate, that these threats do not really amount to much (Costa & McCrae, 1992a; Piedmont et al., 2000) and the consequent decision not to include validity scales on some instruments (e.g., the NEO-PI-R; Costa & McCrae, 1992c).

Commenting on the then-prevalent paucity of efforts by SRI developers to address threats to protocol validity, Meehl and Hathaway (1946) observed:

It is almost as though we inventory-makers were afraid to say too much about the problem because we had no effective solution for it, but it was too obvious a fact to be ignored so it was met by a polite nod. Meanwhile the scores obtained are subjected to varied “precise” statistical manipulations which impel the student of behavior to wonder whether it is not the aim of the personality testers to get as far away from any unsanitary contact with the organism as possible. Part of this trend no doubt reflects the lack of clinical experiences of some psychologists who concern themselves with personality testing . . . . (p. 526)

Acting on this concern, Hathaway and McKinley incorporated two validity scales, L and F, in their original MMPI development efforts. The MMPI was not the first SRI to make validity scales available to its users. Cady (1923) modified the Woodworth Psychoneurotic Inventory (derived from the original Personal Data Sheet) to assess juvenile incorrigibility and incorporated negatively worded repeated items in the revised inventory to examine respondents’ “reliability.” Maller (1932) included items in his Character Sketches measure designed to assess respondents’ “readiness to confide.” Humm and Wadsworth (1935), developers of the Humm-Wadsworth Temperament Scales, incorporated scales designed to identify defensive responding to their SRI. Ruch (1942) developed an “honesty key” for the Bernreuter Personality Inventory (Bernreuter, 1933), the most widely used SRI prior to the MMPI.

Hathaway and McKinley’s inclusion of validity scales on the original MMPI was thus consistent with growing recognition among SRI developers of the need to incorporate formal means for assessing and attempting to correct for threats to protocol validity. In describing their efforts to develop and apply the MMPI K scale and K-correction, Meehl and Hathaway (1946) articulated the conceptual and empirical underpinnings of MMPI approaches to assessing threats to protocol validity. As MMPI use and research proliferated throughout the latter part of the twentieth century, Hathaway, McKinley, and Meehl’s emphasis on assessing threats to protocol validity was continued through efforts to develop a variety of additional MMPI and MMPI-2 validity scales. Following in this tradition, most (but not all) modern SRIs include measures designed to provide information regarding threats to protocol validity.

FUTURE DIRECTIONS FOR SELF-REPORT INVENTORY RESEARCH

Self-report measures play a vital role in personality and psychopathology assessment. Self-report inventories are used commonly and routinely in various applied assessment tasks, and they have been the focus of thousands of empirical investigations. Considerable progress was made in developing this technology over the course of the twentieth century, and many of the concerns identified early on by Allport (1937) and Ellis (1946) have been addressed in modern self-report measures. Three primary aspects of SRI-based assessment were reviewed and analyzed in this chapter: approaches to SRI scale score interpretation, standard score derivation for SRIs, and threats to protocol validity. As discussed earlier, modern SRIs offer a variety of solutions to the challenges posed in each of these areas. However, this review has also pointed out needs for further research-based refinement in each of these aspects of SRI-based assessment. The final part of this chapter highlights needs and directions for further research in SRI-based approaches to assessing personality and psychopathology.

Approaches to SRI Scale Score Interpretation

Two primary approaches to SRI scale score interpretation, empirically grounded and content-based, were identified in this review. Not surprisingly, much of the research in this area has focused on empirically grounded SRI scale score interpretation. This is understandable because, by definition, empirically grounded interpretation is research-dependent. However, content-based interpretation can and should be subjected to rigorous empirical scrutiny. Specifically, research is needed to examine the validity of content-based SRI scale score interpretation. Such investigations should explore the content validity of content-based measures (i.e., the extent to which they adequately canvass the relevant content domain) and the criterion and, ultimately, construct validity of content-based interpretation. Moreover, as detailed earlier, content-based and empirically grounded approaches are not mutually exclusive, and research is needed to guide SRI users regarding optimal ways to combine them in scale score interpretation.

Several aspects of empirically grounded SRI scale score interpretation also require further elaboration. As reviewed previously, empirically keyed interpretation has garnered limited support in the SRI literature to date. It is unclear whether this is a product of limitations inherent in the external approach to SRI scale construction, in which case further efforts at developing empirically keyed interpretive approaches should be abandoned, or whether the problem rests more in deficiencies of previous efforts at external scale construction that attenuated the validity of their products. There has been no extensive effort at external scale construction since the original MMPI clinical scales were developed. Considerable progress has since been made in other approaches to diagnostic classification (e.g., development of structured diagnostic interviews) and in the methodologies and technology available to test constructors. It is possible (if not likely) that a comprehensive effort to develop SRI scales keyed to differentiate empirically between reliably diagnosed classes of individuals (diagnosed with the aid of structured diagnostic interviews) will yield diagnostic indicators that are more valid than the original MMPI clinical scales.

As noted previously, most empirically grounded SRI scale score interpretation has followed the empirical correlate approach. Much of the research in this area has focused on the direct, simple inference level afforded by knowledge of a scale score’s criterion validity. Limited attention has been paid in this literature to an issue that receives prominent attention in the industrial/organizational (I/O) assessment literature, the question of validity generalization: Under what circumstances are empirical correlates identified in one setting likely to apply to others? Following the seminal work of I/O researchers Schmidt and Hunter (1977), I/O psychologists have developed various techniques to appraise validity generalization for their assessment instruments. In light of the particularly prominent role of criterion validity in SRI-based assessment of personality and psychopathology, similar research in this area is clearly needed.

Configural interpretation (examination of patterns among SRI scale scores, as distinguished from linear interpretation, which involves independent consideration of SRI scale scores) is another aspect of criterion-validity-based SRI application requiring further examination. As discussed earlier, the primary assumption underlying configural interpretation (that there is something about the pattern of scores on a set of SRI scales that is not captured when they are interpreted linearly) has seldom been tested empirically. Moreover, in the rare cases in which it has been tested, configural interpretation has not demonstrated incremental validity in reference to linear approaches. Configural approaches may improve upon linear interpretation either by enhancing the scales’ convergent validity or by sharpening their discriminant validity. Research is needed to evaluate the extent to which configural interpretation adds (beyond linear interpretation) to either or both.

Finally, with respect to scale score interpretation, research has yet to mine adequately the prospects of construct validity. As a result, SRI users are unable to rely adequately on construct validity as an interpretive source. Most empirically grounded SRI scale score interpretation is guided by the simple, direct inference level afforded by criterion validity data. Concurrent with the move in psychiatry toward a descriptive, atheoretical nosology, research on clinical applications of SRIs has similarly focused narrowly on their scales’ criterion validity. Cronbach and Meehl’s (1955) admonition that psychological tests be used to identify and elucidate the nature of major constructs, and that the resulting enhancement in our understanding of these constructs guide our interpretation of test scores, has not been followed. We remain largely incapable of interpreting SRI scale scores in the context of theoretically grounded nomological networks.


A potential exception to this trend is the five-factor model (FFM) of personality, which focuses on five core personality traits: extraversion, agreeableness, conscientiousness, neuroticism, and openness/intellect. Although not without its critics (e.g., Block, 1995; Loevinger, 1994), this product of the normal personality assessment literature has generated an empirical literature base that can be used to elucidate a rich, theoretically grounded nomological network associated with its five core constructs (e.g., John & Srivastava, 1999). Unfortunately, efforts to date to apply this rich framework to clinical assessment tasks have met with limited success. These difficulties, however, appear largely to be a product of limitations in tests designed to measure the FFM (e.g., questions about the clinical utility of the NEO-PI-R related to its authors’ decision not to measure potential threats to protocol validity; Costa & McCrae, 1992c). Alternative conceptualizations, developed from the clinical rather than the normal personality perspective (e.g., Harkness and McNulty’s PSY-5 model, 1994), may ultimately prove more fruitful. In any event, enhancing SRI interpreters’ ability to rely on their construct validity should be a major goal of further research efforts in this area.

Standard Score Derivation for SRIs

Two primary needs for further research exist with respect to standard score derivation for SRIs. First, as reviewed earlier, various problems in normative sampling may result in over- or underestimation of an individual’s standing on SRI-measured constructs. Current and future SRIs need to be scrutinized carefully to determine whether, and to what extent, the systematic sampling errors, population changes, and application changes described previously might compromise their normative samples’ adequacy.

A second aspect of standard score derivation for SRIs that should be the focus of further research efforts relates to the advisability and feasibility of using special norms when applying SRIs to specific subpopulations or setting types. Some approaches to incorporating population subsegment information in SRI scale score interpretation involve developing separate norms for use in these applications (e.g., Roberts et al.’s [1999] approach to using the PAI in public safety personnel screening). However, as discussed earlier, use of so-called special norms may obscure features shared commonly by members of a population subsegment or by individuals tested under similar circumstances (e.g., defensiveness among individuals being screened for public safety positions or depression in people tested in clinical settings).

An alternative method for considering how an individual’s SRI scale scores compare with those of population subsegments is to provide interpreters data on group members’ means and standard deviations on the relevant scales. Such data could be provided in professional publications or along with individual test scores generated through automated scoring services. For example, many automated scoring services currently include a graphic printout of the individual’s standard scores on a profile sheet. Group mean profiles, along with their associated standard deviations or errors plotted as confidence intervals, could be added to these printouts. This would allow the test interpreter to learn how the individual’s scores compare with both the general normative standard and with relevant comparison groups without obscuring the effects of group deviations from the mean.
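This dual comparison can be sketched in a few lines. The sketch below uses hypothetical normative and group statistics and an invented scale name; it converts a raw score to a linear T score against general norms and also expresses a comparison group's mean, with a 95% confidence interval, on the same T-score metric.

```python
import math

def t_score(raw, norm_mean, norm_sd):
    """Linear T score: mean 50, SD 10 in the normative sample."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

def group_ci(group_mean, group_sd, n, z=1.96):
    """Approximate 95% confidence interval for a comparison group's mean."""
    half = z * group_sd / math.sqrt(n)
    return (group_mean - half, group_mean + half)

# Hypothetical numbers: general norms and a public-safety applicant group
# on an invented "Defensiveness" scale (raw-score metric).
norm_mean, norm_sd = 10.0, 5.0
applicant_mean, applicant_sd, applicant_n = 16.0, 4.0, 400

individual_raw = 15.0
print(t_score(individual_raw, norm_mean, norm_sd))  # T relative to general norms
lo, hi = group_ci(applicant_mean, applicant_sd, applicant_n)
# The group's confidence interval, re-expressed as T scores for the overlay.
print(t_score(lo, norm_mean, norm_sd), t_score(hi, norm_mean, norm_sd))
```

With these invented numbers the individual's T score of 60 is a full standard deviation above the general normative mean yet falls below the applicant group's interval, which is exactly the kind of context a group-mean overlay on the profile sheet would supply.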

Assessing Threats to Protocol Validity

Several types of threats to SRI protocol validity were identified in this chapter. Existing instruments vary in the extent to which they provide interpreters information regarding these threats’ presence in a given protocol. Most SRIs provide means for assessing at least some of the categories of threats outlined in Table 24.1. The recently updated MMPI-2 (Butcher et al., 2001) contains scales designed to tap each of the types and subtypes of threats described earlier. Within the category of Non-Content-Based Invalid Responding, nonresponding is assessed by the Cannot Say scale; random responding by the Variable Response Inconsistency (VRIN) scale; and fixed responding by the True Response Inconsistency (TRIN) scale. In the category of Content-Based Invalid Responding, overreporting is gauged by the infrequency scales F (Infrequency), Fb (Back Infrequency), and Fp (Infrequency-Psychopathology), and underreporting is assessed by the defensiveness indicators L (Lie), K (Defensiveness), and S (Superlative).

Existing validity scales fall short, however, in their ability to differentiate meaningfully among threats within these subtypes. For example, existing scales do not allow for differentiation among intentional versus unintentional random responding, intentional versus unintentional over- or underreporting, exaggeration versus fabrication, or minimization versus denial. Some of these distinctions may only be possible through consideration of extratest data; however, further research is needed to explore whether configural interpretation of existing validity scales or development of additional validity scales may allow SRI interpreters to more finely distinguish among the various threats and levels of threats to protocol validity.
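The consistency-based scales mentioned above rest on a simple pair-scoring idea, sketched schematically below (the item pairs and indices are invented for illustration; they are not the actual MMPI-2 VRIN/TRIN item pairs or scoring rules): a VRIN-like index counts similar-content pairs answered in opposite directions, while a TRIN-like index counts opposite-content pairs answered in the same direction. A content-blind all-“true” protocol looks perfectly consistent to the first index and is caught only by the second.

```python
# Schematic pair-based inconsistency scoring (hypothetical pairs -- not
# the actual MMPI-2 VRIN/TRIN item lists or rules). Responses are booleans.
SIMILAR_PAIRS = [(0, 1), (2, 3), (4, 5)]     # item pairs with similar content
OPPOSITE_PAIRS = [(6, 7), (8, 9), (10, 11)]  # item pairs with opposite content

def vrin_like(responses):
    """Similar-content pairs answered in opposite directions (random responding)."""
    return sum(responses[i] != responses[j] for i, j in SIMILAR_PAIRS)

def trin_like(responses):
    """Opposite-content pairs answered in the same direction (fixed responding)."""
    return sum(responses[i] == responses[j] for i, j in OPPOSITE_PAIRS)

all_true = [True] * 12
# The acquiescent protocol appears fully consistent on the VRIN-like index
# (0 of 3 pairs) but is flagged by the TRIN-like index (3 of 3 pairs).
print(vrin_like(all_true), trin_like(all_true))
```

The sketch also illustrates the chapter's limitation: both indices operate on response patterns alone, so neither can say whether the inconsistency or fixed responding was intentional.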

CONCLUSION

This chapter provided an overview of the historical foundations and early criticisms of self-report measures, current issues and challenges in SRI interpretation, and needs for future research in this area. A great deal of progress has been made in developing this technology’s conceptual and empirical foundations. Over the past 50 years, the challenges articulated early on by Allport (1937) and Ellis (1946) have been addressed (with varying degrees of success) by subsequent SRI developers and researchers. These efforts have been documented in an elaborate body of scholarly literature that, of course, goes well beyond the scope of this chapter. Other chapters in this volume cover additional aspects of this literature, in particular the chapters by Garb on clinical versus statistical prediction, Bracken and Wasserman on psychometric characteristics of assessment procedures, and Reynolds and Ramsey on cultural test bias. Chapters on assessment in various settings include reviews of more setting-specific aspects of the SRI literature. Overall, these chapters indicate that assessment of personality and psychopathology by self-report rests on solid foundations that leave this technology well positioned for future research and development efforts.

REFERENCES

Allport, G. W. (1928). A test for ascendance-submission. Journal of Abnormal and Social Psychology, 23, 118–136.

Allport, G. W. (1937). Personality: A psychological interpretation. New York: Henry Holt.

Anastasi, A. (1985). Some emerging trends in psychological measurement: A fifty-year perspective. Applied Psychological Measurement, 9, 121–138.

Ben-Porath, Y. S. (1994). The MMPI and MMPI-2: Fifty years of differentiating normal and abnormal personality. In S. Strack & M. Lorr (Eds.), Differentiating normal and abnormal personality (pp. 361–401). New York: Springer.

Ben-Porath, Y. S., & Butcher, J. N. (1991). The historical development of personality assessment. In C. E. Walker (Ed.), Clinical psychology: Historical and research foundations (pp. 121–156). New York: Plenum Press.

Ben-Porath, Y. S., & Waller, N. G. (1992). “Normal” personality inventories in clinical assessment: General requirements and potential for using the NEO Personality Inventory. Psychological Assessment, 4, 14–19.

Bernreuter, R. J. (1933). Theory and construction of the personality inventory. Journal of Social Psychology, 4, 387–405.

Black, J. D. (1953). The interpretation of MMPI profiles of college women. Dissertation Abstracts, 13, 870–871.

Block, J. (1965). The challenge of response sets: Unconfounding meaning, acquiescence, and social desirability in the MMPI. New York: Appleton-Century-Crofts.

Block, J. (1995). A contrarian view of the five factor approach to personality description. Psychological Bulletin, 117, 187–215.

Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39, 214–227.

Butcher, J. N. (1996). International adaptations of the MMPI-2: Research and clinical applications. Minneapolis: University of Minnesota Press.

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). The Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.

Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B. (2001). The Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration, scoring, and interpretation (Revised ed.). Minneapolis: University of Minnesota Press.

Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. S. (1990). Development and use of the MMPI-2 content scales. Minneapolis: University of Minnesota Press.

Butcher, J. N., & Pancheri, P. (1976). Handbook of cross-national MMPI research. Minneapolis: University of Minnesota Press.

Butcher, J. N., & Rouse, S. V. (1996). Personality: Individual differences and clinical assessment. Annual Review of Psychology, 47, 87–111.

Cady, V. M. (1923). The estimation of juvenile incorrigibility. Elementary School Journal, 33, 1–140.

Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage implications in professional psychology. Professional Psychology: Research and Practice, 31, 141–154.

Cattell, R. B. (1965). The scientific analysis of personality. Baltimore: Penguin Books.

Cattell, R. B. (1979). Personality and learning theory (Vol. 1). New York: Springer.

Cattell, R. B., Cattell, A. K., & Cattell, H. E. (1993). Sixteen Personality Factor Questionnaire, Fifth Edition. Champaign, IL: Institute for Personality and Ability Testing.

Costa, P. T., & McCrae, R. R. (1992a). Normal personality assessment in clinical practice: The NEO Personality Inventory. Psychological Assessment, 4, 5–13.

Costa, P. T., & McCrae, R. R. (1992b). Normal personality inventories in clinical assessment: General requirements and the potential for using the NEO Personality Inventory. A reply. Psychological Assessment, 4, 20–22.

Costa, P. T., & McCrae, R. R. (1992c). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.

Craig, R. J. (1999). Interpreting personality tests: A clinical manual for the MMPI-2, MCMI-III, CPI-R, and 16PF. New York: Wiley.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

Dahlstrom, W. G. (1992). The growth in acceptance of the MMPI. Professional Psychology: Research and Practice, 23, 345–348.

Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1972). An MMPI handbook, Vol. 1: Clinical interpretation (Revised ed.). Minneapolis: University of Minnesota Press.



Dorfman, W. I., & Hersen, M. (2001). Understanding psychological assessment. Dordrecht, Netherlands: Kluwer Academic.

Downey, J. E. (1923). The will-temperament and its testing. New York: World Book.

Dubois, P. L. (1970). A history of psychological testing. Boston: Allyn and Bacon.

Duckworth, J. C. (1991). The Minnesota Multiphasic Personality Inventory-2: A review. Journal of Counseling and Development, 69, 564–567.

Edwards, A. L. (1957). The social desirability variable in personality assessment and research. New York: Dryden.

Edwards, A. L. (1970). The measurement of personality traits by scales and inventories. New York: Holt, Rinehart and Winston.

Ellis, A. (1946). The validity of personality questionnaires. Psychological Bulletin, 43, 385–440.

Ellis, A. (1953). Recent research with personality inventories. Journal of Consulting Psychology, 17, 45–49.

Eysenck, H. J., & Eysenck, S. B. G. (1975). Manual for the Eysenck Personality Questionnaire. San Diego, CA: Educational and Industrial Testing Service.

Finger, M. S., & Ones, D. S. (1999). Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, 11, 58–66.

Galton, F. (1884). Measurement of character. Fortnightly Review, 42, 179–185.

Gilberstadt, H., & Duker, J. (1965). A handbook for clinical and actuarial MMPI interpretation. Philadelphia: W. B. Saunders.

Goldberg, L. R. (1965). Diagnosticians versus diagnostic signs: The diagnosis of psychosis versus neurosis from the MMPI. Psychological Monographs, 79(Whole No. 602).

Groth-Marnat, G. (1997). Handbook of psychological assessment (3rd ed.). New York: Wiley.

Guthrie, G. M. (1952). Common characteristics associated with frequent MMPI profile types. Journal of Clinical Psychology, 8, 141–145.

Gynther, M. D. (1972). White norms and black MMPIs: A prescription for discrimination? Psychological Bulletin, 78, 386–402.

Halbower, C. C. (1955). A comparison of actuarial versus clinical prediction of classes discriminated by the MMPI. Unpublished doctoral dissertation, University of Minnesota, Minneapolis.

Harkness, A. R., & McNulty, J. L. (1994). The Personality-Psychopathology Five: Issues from the pages of a diagnostic manual instead of a dictionary. In S. Strack & M. Lorr (Eds.), Differentiating normal and abnormal personality (pp. 291–315). New York: Springer.

Harkness, A. R., McNulty, J. L., & Ben-Porath, Y. S. (1995). The Personality Psychopathology Five (PSY-5) Scales. Psychological Assessment, 7, 104–114.

Hathaway, S. R. (1956). Scales 5 (Masculinity-Femininity), 6 (Paranoia), and 8 (Schizophrenia). In G. S. Welsh & W. G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 104–111). Minneapolis: University of Minnesota Press.

Hathaway, S. R., & McKinley, J. C. (1940). A Multiphasic Personality Schedule (Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249–254.

Hathaway, S. R., & McKinley, J. C. (1942). A Multiphasic Personality Schedule (Minnesota): II. The measurement of symptomatic depression. Journal of Psychology, 14, 73–84.

Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota Multiphasic Personality Inventory. Minneapolis: University of Minnesota Press.

Hathaway, S. R., & Meehl, P. E. (1951). The Minnesota Multiphasic Personality Inventory. In Military clinical psychology (Department of the Army Technical Manual TM 8:242; Department of the Air Force AFM 160-145). Washington, DC: Government Printing Office.

Heidbreder, E. (1926). Measuring introversion and extraversion. Journal of Abnormal and Social Psychology, 21, 120–134.

Helmes, E., & Reddon, J. R. (1993). A perspective on developments in assessing psychopathology: A critical review of the MMPI and MMPI-2. Psychological Bulletin, 113, 453–471.

Heymans, G., & Wiersma, E. (1906). Beiträge zur speziellen Psychologie auf Grund einer Massenuntersuchung [Contributions to a special psychology on the basis of a mass investigation]. Zeitschrift für Psychologie, 43, 81–127.

Hoch, A., & Amsden, G. S. (1913). A guide to the descriptive study of personality. Review of Neurology and Psychiatry, 11, 577–587.

Humm, D. G., & Wadsworth, G. W. (1935). The Humm-Wadsworth temperament scale. American Journal of Psychiatry, 92, 163–200.

Jackson, D. N. (1989). Basic Personality Inventory manual. Port Huron, MI: Sigma Assessment Systems.

Jackson, D. N., Fraboni, M., & Helmes, E. (1997). MMPI-2 content scales: How much content do they measure? Assessment, 4, 111–117.

Jackson, D. N., & Messick, S. (1962). Response styles on the MMPI: Comparison of clinical and normal samples. Journal of Abnormal and Social Psychology, 65, 285–299.

John, O. P., & Srivastava, S. (1999). The big five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (pp. 102–138). New York: Guilford Press.

Keller, J., Hicks, B. D., & Miller, G. A. (2000). Psychophysiology in the study of psychopathology. In J. T. Cacioppo & L. G. Tassinary (Eds.), Handbook of psychophysiology (pp. 719–750). New York: Cambridge University Press.

Loevinger, J. (1994). Has psychology lost its conscience? Journal of Personality Assessment, 62, 2–8.

Maller, J. B. (1932). The measurement of conflict between honesty and group loyalty. Journal of Educational Psychology, 23, 187–191.

Marks, P. A., & Seeman, W. (1963). The actuarial description of abnormal personality: An atlas for use with the MMPI. Baltimore: Williams and Wilkins.

Maruish, M. E. (1999). The use of psychological testing for treatment planning and outcomes assessment. Mahwah, NJ: Erlbaum.



McKinley, J. C., & Hathaway, S. R. (1940). A Multiphasic Personality Schedule (Minnesota): II. A differential study of hypochondriasis. Journal of Psychology, 10, 255–268.

McKinley, J. C., & Hathaway, S. R. (1942). A Multiphasic Personality Schedule (Minnesota): IV. Psychasthenia. Journal of Applied Psychology, 26, 614–624.

McKinley, J. C., & Hathaway, S. R. (1944). A Multiphasic Personality Schedule (Minnesota): V. Hysteria, Hypomania, and Psychopathic Deviate. Journal of Applied Psychology, 28, 153–174.

Meehl, P. E. (1945). The dynamics of “structured” personality tests. Journal of Clinical Psychology, 1, 296–303.

Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and review of the evidence. Minneapolis: University of Minnesota Press.

Meehl, P. E. (1956). Wanted—A good cookbook. American Psychologist, 11, 263–272.

Meehl, P. E., & Dahlstrom, W. G. (1960). Objective configural rules for discriminating psychotic from neurotic MMPI profiles. Journal of Consulting Psychology, 24, 375–387.

Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the MMPI. Journal of Applied Psychology, 30, 525–564.

Mineka, S., Watson, D., & Clark, L. A. (1998). Comorbidity of anxiety and unipolar mood disorders. Annual Review of Psychology, 49, 377–412.

Morey, L. C. (1991). Personality Assessment Inventory: Professional manual. Odessa, FL: Psychological Assessment Resources.

Piedmont, R. L., McCrae, R. R., Riemann, R., & Angleitner, A. (2000). On the invalidity of validity scales: Evidence from self-reports and observer ratings in volunteer samples. Journal of Personality and Social Psychology, 78, 582–593.

Roberts, M. D., Thompson, J. R., & Johnson, W. (1999). The PAI law enforcement, corrections, and public safety selection report: Manual for the professional report service. Odessa, FL: Psychological Assessment Resources.

Ruch, F. L. (1942). A technique for detecting attempts to fake performance on a self-inventory type of personality test. In Q. McNemar & M. A. Merrill (Eds.), Studies on personality (pp. 61–85). New York: Saunders.

Schinka, J. A., & LaLone, L. (1997). MMPI-2 norms: Comparisons with a census-matched subsample. Psychological Assessment, 9, 307–311.

Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529–540.

Strack, S., & Lorr, M. (1994). Differentiating normal and abnormal personality. New York: Springer.

Tellegen, A. (1982). Brief manual for the Multidimensional Personality Questionnaire. Minneapolis, MN: Unpublished document.

Tellegen, A. (1985). Structure of mood and personality and their relevance to assessing anxiety, with an emphasis on self-report. In A. H. Tuma & J. D. Maser (Eds.), Anxiety and the anxiety disorders (pp. 681–706). Hillsdale, NJ: Erlbaum.

Thurstone, L. L. (1930). A neurotic inventory. Journal of Social Psychology, 1, 3–30.

Travis, R. C. (1925). The measurement of fundamental character traits by a new diagnostic test. Journal of Abnormal and Social Psychology, 18, 400–425.

Watson, D., & Pennebaker, J. W. (1989). Health complaints, stress, and distress: Exploring the central role of negative affectivity. Psychological Review, 96, 234–254.

Wells, F. L. (1914). The systematic observation of the personality—In its relation to the hygiene of the mind. Psychological Review, 21, 295–333.

Wiggins, J. S. (1959). Interrelations among MMPI measures of dissimulation under standard and social desirability instruction. Journal of Consulting Psychology, 23, 419–427.

Wiggins, J. S. (1966). Substantive dimensions of self-report in the MMPI item pool. Psychological Monographs, 80(22, Whole No. 630).

Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.

Woodworth, R. S. (1920). Personal data sheet. Chicago: Stoelting.