8/15/2019 43423998 Anne Anastasi Psychological Testing I
1/104
ANNE ANASTASI
Professor of Psychology, Fordham University
Psychological Testing
MACMILLAN PUBLISHING CO., INC.
New York
Collier Macmillan Publishers
London
In a revised edition, one expects both similarities and differences. This
edition shares with the earlier versions the objectives and basic approach
of the book. The primary goal of this text is still to contribute toward the
proper evaluation of psychological tests and the correct interpretation
and use of test results. This goal calls for several kinds of information:
(1) an understanding of the major principles of test construction, (2)
psychological knowledge about the behavior being assessed, (3) sensi-
tivity to the social and ethical implications of test use, and (4) broad
familiarity with the types of available instruments and the sources of
information about tests. A minor innovation in the fourth edition is the
addition of a suggested outline for test evaluation (Appendix C).
In successive editions, it has been necessary to exercise more and more
restraint to keep the number of specific tests discussed in the book from
growing with the field; it has never been my intention to provide a
miniature Mental Measurements Yearbook! Nevertheless, I am aware
that principles of test construction and interpretation can be better
understood when applied to particular tests. Moreover, acquaintance with
the major types of available tests, together with an understanding of
their special contributions and limitations, is an essential component of
knowledge about contemporary testing. For these reasons, specific tests
are again examined and evaluated in Parts 3, 4, and 5. These tests have
been chosen either because they are outstanding examples with which
the student of testing should be familiar or because they illustrate some
special point of test construction or interpretation. In the text itself, the
principal focus is on types of tests rather than on specific instruments. At
the same time, Appendix E contains a classified list of over 250 tests,
including not only those cited in the text but also others added to provide
a more representative sample.
As for the differences, they loomed especially large during the
preparation of this edition. Much that has happened in human society since
the mid-1960's has had an impact on psychological testing. Some of these
developments were briefly described in the last two chapters of the third
edition. Today they have become part of the mainstream of psychological
testing and have been accordingly incorporated in the appropriate
sections throughout the book. Recent changes in psychological testing that
are reflected in the present edition can be described on three levels:
(1) general orientation toward testing, (2) substantive and methodological
developments, and (3) "ordinary progress," such as the publication of
new tests and revision of earlier tests.
All rights reserved . No part of this book may be reproduced or
transmitted in any form or by any means, electronic or me-
chanical, including photocopying, recording, or any informa-
tion storage and retrieval system, without permission in writing
from the Publisher.
Earlier editions copyright 1954 and © 1961 by Macmillan Publishing Co., Inc.,
and copyright © 1968 by Anne Anastasi.
MACMILLAN PUBLISHING Co., INC.
866 Third Avenue, New York, New York 10022
COLLIER MACMILLAN CANADA, LTD.
Library of Congress Cataloging in Publication Data
Anastasi, Anne, (date)
Psychological testing.
Bibliography: p.
Includes indexes.
1. Mental tests. 2. Personality tests. I. Title.
[DNLM: 1. Psychological tests. WM145 A534P]
BF431.A573 1976 153.9 75-2206
ISBN O-2-30298
Preface
An example of changes on the first level is the increasing awareness of
the ethical, social, and legal implications of testing. In the present
edition, this topic has been expanded and treated in a separate chapter
early in the book (Ch. 3) and in Appendixes A and B. A cluster of related
developments represents a broadening of test uses. Besides the traditional
applications of tests in selection and diagnosis, increasing attention is
being given to administering tests for self-knowledge and self-development,
and to training individuals in the use of their own test results in
decision making (Chs. 3 and 4). In the same category are the continuing
replacement of global scores with multitrait profiles and the application
of classification strategies, whereby "everyone can be above average" in
one or more socially valued variables (Ch. 7). From another angle,
efforts are being made to modify traditional interpretations of test scores,
in both cognitive and noncognitive areas, in the light of accumulating
psychological knowledge. In this edition, Chapter 12 brings together
psychological issues in the interpretation of intelligence test scores,
touching on such problems as stability and change in intellectual level
over time; the nature of intelligence; and the testing of intelligence in
early childhood, in old age, and in different cultures. Another example
is provided by the increasing emphasis on situational specificity and
person-by-situation interactions in personality testing, stimulated in large
part by the social-learning theorists (Ch. 17).
The second level, covering substantive and methodological changes,
is illustrated by the impact of computers on the development,
administration, scoring, and interpretation of tests (see especially
Chs. 4, 11, 13, 17, 18, [?]). The use of computers in administering or
managing instructional programs has also stimulated the development of
criterion-referenced tests, although other conditions have contributed to
the upsurge of interest in such tests in education. Criterion-referenced
tests are discussed principally in Chapters 4, 5, and 14. Other types of
instruments that have risen to prominence and have received fuller
treatment in the present edition include: tests for identifying specific
learning disabilities (Ch. [?]), inventories and other devices for use in
behavior modification programs (Ch. 20), instruments for assessing early
childhood education (Ch. 14), Piagetian "ordinal" scales (Chs. 10 and 14),
basic education and literacy tests for adults (Chs. 13 and 14), and
techniques for the assessment of environments (Ch. 20). Problems to be
considered in the assessment of minority groups, including the question of
test bias, are examined from different angles in Chapters 3, 7, 8, and 12.

On the third level, it may be noted that over 100 of the tests listed in
this edition have been either initially published or revised since the
publication of the preceding edition (1968). Major examples include the
McCarthy Scales of Children's Abilities, the WISC-R, the 1972 Stanford-Binet
norms (with all the resulting readjustments in interpretations),
Forms S and T of the DAT (including a computerized Career Planning
Program), the Strong-Campbell Interest Inventory (merged form of the
SVIB), and the latest revisions of the Stanford Achievement Test and the
Metropolitan Readiness Tests.
It is a pleasure to acknowledge the assistance received from many
sources in the preparation of this edition. The completion of the project
was facilitated by a one-semester Faculty Fellowship awarded by Fordham
University and by a grant from the Fordham University Research Council
covering principally the services of a research assistant. These
services were performed by Stanley Friedland with an unusual combination
of expertise, responsibility, and graciousness. I am indebted to the
many authors and test publishers who provided reprints, unpublished
manuscripts, specimen sets of tests, and answers to my innumerable
inquiries by mail and telephone. For assistance extending far beyond the
interests and responsibilities of any single publisher, I am especially
grateful to Anna Dragositz of Educational Testing Service and Blythe
Mitchell of Harcourt Brace Jovanovich, Inc. I want to acknowledge the
significant contribution of John T. Cowles of the University of Pittsburgh,
who assumed complete responsibility for the preparation of the
Instructor's Manual to accompany this text.

For informative discussions and critical comments on particular topics,
I want to convey my sincere thanks to William H. Angoff of Educational
Testing Service and to several members of the Fordham University
Psychology Department, including David R. Chabot, Marvin Reznikoff,
Reuben M. Schonebaum, and Warren W. Tryon. Grateful acknowledgment
is also made of the thoughtful recommendations submitted by
course instructors in response to the questionnaire distributed to current
users of the third edition. Special thanks in this connection are due to
Mary Carol Cahill for her extensive, constructive, and wide-ranging
suggestions. I wish to express my appreciation to Victoria Overton of
the Fordham University library staff for her efficient and courteous
assistance in bibliographic matters. Finally, I am happy to record the
contributions of my husband, John Porter Foley, Jr., who again
participated in the solution of countless problems at all stages in the
preparation of the book.
A. A.
CONTENTS
PART 1
CONTEXT OF PSYCHOLOGICAL TESTING

1. FUNCTIONS AND ORIGINS OF PSYCHOLOGICAL TESTING
Current uses of psychological tests
Early interest in classification and training of the mentally retarded
The first experimental psychologists
Contributions of Francis Galton
Cattell and the early "mental tests"
Binet and the rise of intelligence tests
Group testing
Aptitude testing
Standardized achievement tests
Measurement of personality
Sources of information about tests

2. NATURE AND USE OF PSYCHOLOGICAL TESTS
What is a psychological test?
Reasons for controlling the use of psychological tests
Test administration
Rapport
Test anxiety
Examiner and situational variables
Coaching, practice, and test sophistication

3. SOCIAL AND ETHICAL IMPLICATIONS OF TESTING
User qualifications
Testing instruments and procedures
Protection of privacy
Confidentiality
Communicating test results
Testing and the civil rights of minorities

4. NORMS AND THE INTERPRETATION OF TEST SCORES
Statistical concepts
Developmental norms
Within-group norms
Relativity of norms
Computer utilization in the interpretation of test scores
Criterion-referenced testing

5. RELIABILITY
The correlation coefficient
Types of reliability
Reliability of speeded tests
Dependence of reliability coefficients on the sample tested
Standard error of measurement
Reliability of criterion-referenced tests

Content validity
Criterion-related validity
Construct validity
Overview

7. VALIDITY: MEASUREMENT AND INTERPRETATION
Validity coefficient and error of estimate
Test validity and decision theory
Moderator variables
Combining information from different tests
Use of tests for classification decisions
Statistical analyses of test bias

8. ITEM ANALYSIS
Item difficulty
Item validity
Internal consistency
Item analysis of speeded tests
Cross validation
Item-group interaction

PART 3
TESTS OF GENERAL INTELLECTUAL LEVEL

9. INDIVIDUAL TESTS
Stanford-Binet Intelligence Scale
Wechsler Adult Intelligence Scale
Wechsler Intelligence Scale for Children
Wechsler Preschool and Primary Scale of Intelligence

10. TESTS FOR SPECIAL POPULATIONS
Infant and preschool testing
Testing the physically handicapped
Cross-cultural testing

Group tests versus individual tests
Multilevel batteries
Tests for the college level and beyond

12. PSYCHOLOGICAL ISSUES IN INTELLIGENCE TESTING
Longitudinal studies of intelligence
Intelligence in early childhood
Problems in the testing of adult intelligence
Problems in cross-cultural testing
Nature of intelligence

PART 4
TESTS OF SEPARATE ABILITIES

13. MEASURING MULTIPLE APTITUDES
Factor analysis
Theories of trait organization
Multiple aptitude batteries
Measurement of creativity

14. EDUCATIONAL TESTING
Achievement tests: their nature and uses
General achievement batteries
Standardized tests in separate subjects
Teacher-made classroom tests
Diagnostic and criterion-referenced tests
Specialized prognostic tests
Assessment in early childhood education

15. OCCUPATIONAL TESTING
Validation of industrial tests
Short screening tests for industrial personnel
Special aptitude tests
Testing in the professions

Diagnostic use of intelligence tests
Special tests for detecting cognitive dysfunction
Identifying specific learning disabilities
Clinical judgment
Report writing

PART 5
PERSONALITY TESTS

17. SELF-REPORT INVENTORIES
Content validation
Empirical criterion keying
Factor analysis in test development
Personality theory in test development
Test-taking attitudes and response sets
Situational specificity
Evaluation of personality inventories

18. MEASURES OF INTERESTS, ATTITUDES, AND VALUES
Interest inventories
Opinion and attitude measurement
Attitude scales
Assessment of values and related variables

19. PROJECTIVE TECHNIQUES
Nature of projective techniques
Inkblot techniques
Thematic Apperception Test and related instruments
Other projective techniques
Evaluation of projective techniques

20. OTHER ASSESSMENT TECHNIQUES
"Objective" performance tests
Situational tests
Self-concepts and personal constructs
Assessment techniques in behavior modification programs
Observer reports
Biographical inventories
The assessment of environments

B. Guidelines on Employee Selection Procedures (EEOC)
Guidelines for Reporting Criterion-Related and Content Validity (OFCC)
CHAPTER 1
Functions and Origins of
Psychological Testing
ANYONE reading this book today could undoubtedly illustrate what
is meant by a psychological test. It would be easy enough to recall
a test the reader himself has taken in school, in college, in the
armed services, in the counseling center, or in the personnel office. Or
perhaps the reader has served as a subject in an experiment in which
standardized tests were employed. This would certainly not have been the
case fifty years ago. Psychological testing is a relatively young branch of
one of the youngest of the sciences.

Basically, the function of psychological tests is to measure differences
between individuals or between the reactions of the same individual on
different occasions. One of the first problems that stimulated the
development of psychological tests was the identification of the mentally
retarded. To this day, the detection of intellectual deficiencies remains an
important application of certain types of psychological tests. Related
clinical uses of tests include the examination of the emotionally disturbed,
the delinquent, and other types of behavioral deviants. A strong impetus
to the early development of tests was likewise provided by problems
arising in education. At present, schools are among the largest test users.
The classification of children with reference to their ability to profit
from different types of school instruction, the identification of the
intellectually retarded on the one hand and the gifted on the other, the
diagnosis of academic failures, the educational and vocational counseling
of high school and college students, and the selection of applicants for
professional and other special schools are among the many educational
uses of tests.

The selection and classification of industrial personnel represent
another major application of psychological testing. From the assembly-line
operator or filing clerk to top management, there is scarcely a type of job
for which some kind of psychological test has not proved helpful in such
matters as hiring, job assignment, transfer, promotion, or termination.
To be sure, the effective employment of tests in many of these situations,
especially in connection with high-level jobs, usually requires that the
tests be used as an adjunct to skillful interviewing, so that test scores
may be properly interpreted in the light of other background information
about the individual. Nevertheless, testing constitutes an important part
of the total personnel program. A closely related application of
psychological testing is to be found in the selection and classification of
military personnel. From simple beginnings in World War I, the scope and
variety of psychological tests employed in military situations underwent
a phenomenal increase during World War II. Subsequently, research
on test development has been continuing on a large scale in all branches
of the armed services.

The use of tests in counseling has gradually broadened from a narrowly
defined guidance regarding educational and vocational plans to
an involvement with all aspects of the person's life. Emotional well-being
and effective interpersonal relations have become increasingly
prominent objectives of counseling. There is growing emphasis, too, on
the use of tests to enhance self-understanding and personal development.
Within this framework, test scores are part of the information given to
the individual as aids to his own decision-making processes.

It is clearly evident that psychological tests are currently being
employed in the solution of a wide range of practical problems. One should
not, however, lose sight of the fact that such tests are also serving
important functions in basic research. Nearly all problems in differential
psychology, for example, require testing procedures as a means of gathering
data. As illustrations, reference may be made to studies on the nature and
extent of individual differences, the identification of psychological traits,
the measurement of group differences, and the investigation of biological
and cultural factors associated with behavioral differences. For all such
areas of research, and for many others, the precise measurement of
individual differences made possible by well-constructed tests is an
essential prerequisite. Similarly, psychological tests provide standardized
tools for investigating such varied problems as life-span developmental
changes within the individual, the relative effectiveness of different
educational procedures, the outcomes of psychotherapy, the impact of
community programs, and the influence of noise on performance.

From the many different uses of psychological tests, it follows that some
knowledge of such tests is needed for an adequate understanding of most
fields of contemporary psychology. It is primarily with this end in view
that the present book has been prepared. The book is not designed to
make the individual either a skilled examiner and test administrator or
an expert on test construction. It is directed, not to the test specialist, but
to the general student of psychology. Some acquaintance with the leading
current tests is necessary in order to understand references to the use
of such tests in the psychological literature. And a proper evaluation and
interpretation of test results must ultimately rest on a knowledge of how
the tests were constructe
deviates came a realization that some uniform criteria for identifying and
classifying these cases were required. The establishment of many special
institutions for the care of the mentally retarded in both Europe and
America made the need for setting up admission standards and an
objective system of classification especially urgent. First it was necessary to
differentiate between the insane and the mentally retarded. The former
manifested emotional disorders that might or might not be accompanied
by intellectual deterioration from an initially normal level; the latter were
characterized essentially by intellectual defect that had been present
from birth or early infancy. What is probably the first explicit statement
of this distinction is to be found in a two-volume work published in 1838
by the French physician Esquirol (1838), in which over one hundred
pages are devoted to mental retardation. Esquirol also pointed out that
there are many degrees of mental retardation, varying along a continuum
from normality to low-grade idiocy. In the effort to develop some system
for classifying the different degrees and varieties of retardation, Esquirol
tried several procedures but concluded that the individual's use of
language provides the most dependable criterion of his intellectual level. It
is interesting to note that current criteria of mental retardation are also
largely linguistic and that present-day intelligence tests are heavily
loaded with verbal content. The important part verbal ability plays in
our concept of intelligence will be repeatedly demonstrated in subsequent
chapters.

Of special significance are the contributions of another French
physician, Seguin, who pioneered in the training of the mentally retarded.
Having rejected the prevalent notion of the incurability of mental
retardation, Seguin (1866) experimented for many years with what he
termed the physiological method of training; and in 1837 he established
the first school devoted to the education of mentally retarded children.
In 1848 he emigrated to America, where his ideas gained wide
recognition. Many of the sense-training and muscle-training techniques
currently in use in institutions for the mentally retarded were originated by
Seguin. By these methods, severely retarded children are given intensive
exercise in sensory discrimination and in the development of motor
control. Some of the procedures developed by Seguin for this purpose were
eventually incorporated into performance or nonverbal tests of
intelligence. An example is the Seguin Form Board, in which the individual
is required to insert variously shaped blocks into the corresponding
recesses as quickly as possible.

More than half a century after the work of Esquirol and Seguin, the
French psychologist Alfred Binet urged that children who failed to
respond to normal schooling be examined before dismissal and, if
considered educable, be assigned to special classes (T. H. Wolf, 1973). With
his fellow members of the Society for the Psychological Study of the
Child, Binet stimulated the Ministry of Public Instruction to take steps to
improve the condition of retarded children. A specific outcome was the
It was the English biologist Sir Francis Galton who was primarily
responsible for launching the testing movement. A unifying factor in
Galton's numerous and varied research activities was his interest in
human heredity. In the course of his investigations on heredity, Galton
realized the need for measuring the characteristics of related and
unrelated persons. Only in this way could he discover, for example, the
exact degree of resemblance between parents and offspring, brothers and
sisters, cousins, or twins. With this end in view, Galton was instrumental
in inducing a number of educational institutions to keep systematic
anthropometric records on their students. He al

mathematically untrained investigator who might wish to treat test
results quantitatively. He thereby extended enormously the application of
statistical procedures to the analysis of test data. This phase of Galton's
work has been carried forward by many of his students, the most eminent
of whom was Karl Pearson.
relation to independent estimates of intellectual level based on teachers'
ratings (Bolton, 1891-1892; J. A. Gilbert, 1894) or academic grades
(Wissler, 1901).

A number of test series assembled by European psychologists of the
period tended to cover somewhat more complex functions. Kraepelin
(1895), who was interested primarily in the clinical examination of
psychiatric patients, prepared a long series of tests to measure what he
regarded as basic factors in the characterization of an individual. The
tests, employing chiefly simple arithmetic operations, were designed to
measure practice effects, memory, and susceptibility to fatigue and to
distraction. A few years earlier, Oehrn (1889), a pupil of Kraepelin, had
employed tests of perception, memory, association, and motor functions
in an investigation on the interrelations of psychological functions.
Another German psychologist, Ebbinghaus (1897), administered tests of
arithmetic computation, memory span, and sentence completion to
schoolchildren. The most complex of the three tests, sentence completion, was
the only one that showed a clear correspondence with the children's
scholastic achievement.

Like Kraepelin, the Italian psychologist Ferrari and his students were
interested primarily in the use of tests with pathological cases (Guicciardi
& Ferrari, 1896). The test series they devised ranged from physiological
measures and motor tests to apprehension span and the interpretation of
pictures. In an article published in France in 1895, Binet and Henri
criticized most of the available test series as being too largely sensory and as
concentrating unduly on simple, specialized abilities. They argued further
that, in the measurement of the more complex functions, great precision
is not necessary, since individual differences are larger in these functions.
An extensive and varied list of tests was proposed, covering such
functions as memory, imagination, attention, comprehension, suggestibility,
aesthetic appreciation, and many others. In these tests we can recognize
the trends that were eventually to lead to the development of the famous
Binet intelligence scales.
Binet and his co-workers devoted many years to active and ingenious
research on ways of measuring intelligence. Many approaches were tried,
including even the measurement of cranial, facial, and hand form, and
the analysis of handwriting. The results, however, led to a growing
conviction that the direct, even though crude, measurement of complex
intellectual functions offered the greatest promise. Then a specific
situation arose that brought Binet's efforts to immediate practical fruition.
In 1904, the Minister of Public Instruction appointed Binet to the
previously cited commission to study procedures for the education of retarded
children. It was in connection with the objectives of this commission that
Binet, in collaboration with Simon, prepared the first Binet-Simon Scale
(Binet & Simon, 1905).

This scale, known as the 1905 scale, consisted of 30 problems or tests
arranged in ascending order of difficulty. The difficulty level was
determined empirically by administering the tests to 50 normal children aged
3 to 11 years, and to some mentally retarded children and adults. The
tests were designed to cover a wide variety of functions, with special
emphasis on judgment, comprehension, and reasoning, which Binet
regarded as essential components of intelligence. Although sensory and
perceptual tests were included, a much greater proportion of verbal
content was found in this scale than in most test series of the time. The
1905 scale was presented as a preliminary and tentative instrument, and
no precise objective method for arriving at a total score was formulated.

In the second, or 1908, scale, the number of tests was increased, some
unsatisfactory tests from the earlier scale were eliminated, and all tests
were grouped into age levels on the basis of the performance of about
300 normal children between the ages of 3 and 13 years. Thus, in the
3-year level were placed all tests passed by 80 to 90 percent of normal
3-year-olds; in the 4-year level, all tests similarly passed by normal
4-year-olds; and so on to age 13. The child's score on the entire test could then
be expressed as a mental level corresponding to the age of normal
children whose performance he equaled. In the various translations and
adaptations of the Binet scales, the term "mental age" was commonly
substituted for "mental level." Since mental age is such a simple concept
to grasp, the introduction of this term undoubtedly did much to
popularize intelligence testing.¹ Binet himself, however, avoided the term
"mental age" because of its unverified developmental implications and
preferred the more neutral term "mental level" (T. H. Wolf, 1973).

A third revision of the Binet-Simon Scale appeared in 1911, the year of
Binet's untimely death. In this scale, no fundamental changes were
introduced. Minor revisions and relocations of specific tests were instituted.
More tests were added at several year levels, and the scale was extended
to the adult level.

¹ Goodenough (1949, pp. 50-51) notes that in 1887, 21 years before the
appearance of the 1908 Binet-Simon Scale, S. E. Chaille published in the New Orleans
Medical and Surgical Journal a series of tests for infants arranged according to the
age at which the tests are commonly passed. Partly because of the limited circulation
of the journal and partly, perhaps, because the scientific community was not ready
for it, the significance of this age-scale concept passed unnoticed at the time. Binet's
own scale was influenced by the work of some of his contemporaries, notably Blin
and Damaye, who prepared a set of oral questions from which they derived a single
global score for each child (T. H. Wolf, 1973).

Even prior to the 1908 revision, the Binet-Simon tests attracted wide
attention among psychologists throughout the world. Translations and
adaptations appeared in many languages. In America, a number of
different revisions were prepared, the most famous of which is the one
developed under the direction of L. M. Terman at Stanford University, and
known as the Stanford-Binet (Terman, 1916). It was in this test that the
intelligence quotient (IQ), or ratio between mental age and chronological
age, was first used. The latest revision of this test is widely employed
today and will be more fully considered in Chapter 9. Of special interest,
too, is the first Kuhlmann-Binet revision, which extended the scale
downward to the age level of 3 months (Kuhlmann, 1912). This scale
represents one of the earliest efforts to develop preschool and infant tests of
intelligence.
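
The ratio between mental age and chronological age mentioned above can be made concrete with a small sketch. It assumes the conventional scaling in which the ratio is multiplied by 100, a detail the passage does not spell out; the function name and the choice of months as the unit are illustrative only.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Return the ratio IQ: 100 * mental age / chronological age.

    Both ages must be in the same unit (months here, by assumption).
    The factor of 100 is the conventional scaling, assumed for illustration.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# A child of 8 years (96 months) performing at the 10-year level (120 months)
print(ratio_iq(120, 96))   # 125.0
# Mental age equal to chronological age gives 100 by construction
print(ratio_iq(96, 96))    # 100.0
```

Under this scaling, a score above 100 means the mental age exceeds the chronological age, and a score below 100 means the reverse.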
for general routine testing; the latter was a nonlanguage scale employed
with illiterates and with foreign-born recruits who were unable to take a
test in English. Both tests were suitable for administration to large groups.

Shortly after the termination of World War I, the Army tests were
released for civilian use. Not only did the Army Alpha and Army Beta
themselves pass through many revisions, the latest of which are even now
in use, but they also served as models for most group intelligence tests.
The testing movement underwent a tremendous spurt of growth. Soon
group intelligence tests were being devised for all ages and types of
persons, from preschool children to graduate students. Large-scale testing
programs, previously impossible, were now being launched with
zestful optimism. Because group tests were designed as mass testing
instruments, they not only permitted the simultaneous examination of
large groups but also simplified the instructions and administration
procedures so as to demand a minimum of training on the part of the
examiner. Schoolteachers began to give intelligence tests to their classes.
College students were routinely examined prior to admission. Extensive
studies of special adult groups, such as prisoners, were undertaken. And
soon the general public became IQ-conscious.

The application of such group intelligence tests far outran their
technical improvement. That the tests were still crude instruments was often
forgotten in the rush of gathering scores and drawing practical
conclusions from the results. When the tests failed to meet unwarranted
expectations, skepticism and hostility toward all testing often resulted. Thus,
the testing boom of the twenties, based on the indiscriminate use of tests,
may have done as much to retard as to advance the progress of
psychological testing.
The Binet tests, as well as all their revisions, ar e in d il;ic lu a l scales inthe sense that the\" can be ad ministered to onlY one person at a time.
Man\' of the tests in these scales r equire .oral re~ ponses fr om the subject
or n~cessitate the manipulation of materials . S ome call f or individualtiming of r es ponses. For these and other r easons, suc h tests are not
ada pted to group administr ation. Another char acter ist ic of the Binet type
of test is that" it r eq uir es a highly trained examiner . Such tests are es-
sentiallv clinical instr uments, suited to the intensive study of individualJ .' •
cases.Group t esting , like the fir st Binet scale, was d evelo ped to meet a press-
ing pr actical need . When the United States e nter ed l)!or ld ' Var I in
191 7, a committee was appointed by the Amer ican Psychological Associ-
ation to consider ways in which psychology might assist in the conduct of the war. This committee, und er the direction of !lo bert 1.•.1. Yerkes, recog-
nized the need for the rapid classification of the million and a ha 1 f re-cruits with r es pect to gener al intellectual level. Such informati~.~~~va:s
r elevant to many admmistr ative decisions, including r e jection or dis-charge f r om militar y ser vice, assignment to diff erent types of sel'vicei, or
admission to officer-training camps. It was in this setting that the first
group intelligence test was d eveloped . In this task , the Ar-m~' psycholo-
gists d r ew on all availa ble test materials, and especially on an un pub-
lished group intelligence test prepared by ~rthur S. Otis, which hcturned over to the Army. A major contr i bution of Otis's test, which he
designed while a student in one of Terman's graduate courses, was the
introduction of multiple-choice and other "objective" item types.
The tests finally developed by the Army psychologists came to be
known as the ~rm""yAlpha and the Army Beta The former was designed
Although intelligence tests were originally designed to sample a wide
variety of functions in order to estimate the individual's general
intellectual level, it soon became apparent that such tests were quite
limited in their coverage. Not all important functions were represented.
In fact, most intelligence tests were primarily measures of verbal
ability and, to a lesser extent, of the ability to handle numerical and
other abstract and symbolic relations. Gradually psychologists came to
recognize that the term "intelligence test" was a misnomer, since only
certain aspects of intelligence were measured by such tests.
To be sure, the tests covered abilities that are of prime importance in
our culture. But it was realized that more precise designations, in terms
of the type of information these tests are able to yield, would be
preferable. For example, a number of tests that would probably have been
called intelligence tests during the twenties later came to be known as
scholastic aptitude tests. This shift in terminology was made in
recognition of the fact that many so-called intelligence tests measure
that combination of abilities demanded by academic work.
Even prior to World War I, psychologists had begun to recognize the need
for tests of special aptitudes to supplement the global intelligence
tests. These special aptitude tests were developed particularly for use in
vocational counseling and in the selection and classification of
industrial and military personnel. Among the most widely used are tests of
mechanical, clerical, musical, and artistic aptitudes.
The critical evaluation of intelligence tests that followed their
widespread and indiscriminate use during the twenties also revealed
another noteworthy fact: an individual's performance on different parts of
the same test often showed marked variation. This was especially apparent
on group tests, in which the items are commonly segregated into subtests
of relatively homogeneous content. For example, a person might score
relatively high on a verbal subtest and low on a numerical subtest, or
vice versa. To some extent, such internal variability is also discernible
on a test like the Stanford-Binet, in which, for example, all items
involving words might prove difficult for a particular individual, whereas
items employing pictures or geometric diagrams may place him at an
advantage.
Test users, and especially clinicians, frequently utilized such
intercomparisons in order to obtain more insight into the individual's
psychological make-up. Thus, not only the IQ or other global score but
also scores on subtests would be examined in the evaluation of the
individual case. Such a practice is not to be generally recommended,
however, because intelligence tests were not designed for the purpose of
differential aptitude analysis. Often the subtests being compared contain
too few items to yield a stable or reliable estimate of a specific
ability. As a result, the obtained difference between subtest scores might
be reversed if the individual were retested on a different day or with
another form of the same test. If such intraindividual comparisons are to
be made, tests are needed that are specially designed to reveal
differences in performance in various functions.
While the practical application of tests demonstrated the need for
differential aptitude tests, a parallel development in the study of trait
organization was gradually providing the means for constructing such
tests. Statistical studies on the nature of intelligence had been
exploring the interrelations among scores obtained by many persons on a
wide variety of different tests. Such investigations were begun by the
English psychologist Charles Spearman (1904, 1927) during the first decade
of the
present century. Subsequent methodological developments, based on the
work of such American psychologists as T. L. Kelley (1928) and L. L.
Thurstone (1935, 1947), as well as on that of other American and English
investigators, have come to be known as "factor analysis."
The contributions that the methods of factor analysis have made to test
construction will be more fully examined and illustrated in Chapter 13.
For the present, it will suffice to note that the data gathered by such
procedures have indicated the presence of a number of relatively
independent factors, or traits. Some of these traits were represented, in
varying proportions, in the traditional intelligence tests. Verbal
comprehension and numerical reasoning are examples of this type of trait.
Others, such as spatial, perceptual, and mechanical aptitudes, were found
more often in special aptitude tests than in intelligence tests.
One of the chief practical outcomes of factor analysis was the
development of multiple aptitude batteries. These batteries are designed
to provide a measure of the individual's standing in each of a number of
traits. In place of a total score or IQ, a separate score is obtained for
such traits as verbal comprehension, numerical aptitude, spatial
visualization, arithmetic reasoning, and perceptual speed. Such batteries
thus provide a suitable instrument for makin
term "aptitude test" has been traditionally employed to refer to tests
measuring relatively homogeneous and clearly defined segments of ability;
the term "intelligence test" customarily refers to more heterogeneous
tests yielding a single global score such as an IQ. Special aptitude
tests typically measure a single aptitude. Multiple aptitude batteries
measure a number of aptitudes but provide a profile of scores, one for
each aptitude.
of manuals that meet adequate scientific standards. An enlightened public
of test users provides the firmest assurance that such standards will be
maintained and improved in the future.
A succinct but comprehensive guide for the evaluation of psychological
tests is to be found in Standards for Educational and Psychological Tests
(1974), published by the American Psychological Association. These
standards represent a summary of recommended practices in test
construction based on the current state of knowledge in the field. They
are concerned with the information about validity, reliability, norms, and
other test characteristics that ought to be reported in the manual. In
their latest revision, the Standards also provide a guide for the proper
use of tests and for the correct interpretation and application of test
results. Relevant portions of the Standards will be cited in the following
chapters, in connection with the appropriate topics.
CHAPTER 2
Nature and Use of
Psychological Tests
THE HISTORICAL introduction in Chapter 1 has already suggested some of
the many uses of psychological tests, as well as the wide diversity of
available tests. Although the general public may still associate
psychological tests most closely with "IQ tests" and with tests designed
to detect emotional disorders, these tests represent only a small
proportion of the available types of instruments. The major categories of
psychological tests will be discussed and illustrated in Parts 3, 4, and
5, which cover tests of general intellectual level, traditionally called
intelligence tests; tests of separate abilities, including multiple
aptitude batteries, tests of special aptitudes, and achievement tests; and
personality tests, concerned with measures of emotional and motivational
traits, interpersonal behavior, interests, attitudes, and other
noncognitive characteristics.
In the face of such diversity in nature and purpose, what are the common
differentiating characteristics of psychological tests? How do
psychological tests differ from other methods of gathering information
about individuals? The answer is to be found in certain fundamental
features of both the construction and use of tests. It is with these
features that the present chapter is concerned.
BEHAVIOR SAMPLE. A psychological test is essentially an objective and
standardized measure of a sample of behavior. Psychological tests are
like tests in any other science, insofar as observations are made on a
small but carefully chosen sample of an individual's behavior. In this
respect, the psychologist proceeds in much the same way as the chemist
who tests a patient's blood or a community's water supply by analyzing
one or more samples of it. If the psychologist wishes to test the extent
of a child's vocabulary, a clerk's ability to perform arithmetic
computations, or a pilot's eye-hand coordination, he examines their
performance with a representative set of words, arithmetic problems, or
motor tests. Whether or not the test adequately covers the behavior under
consideration obviously depends on the number and nature of items in the
sample. For example, an arithmetic test consisting of only five problems,
or one including only multiplication items, would be a poor measure of
the individual's computational skill. A vocabulary test composed entirely
of baseball terms would hardly provide a dependable estimate of a child's
total range of vocabulary.
The diagnostic or predictive value of a psychological test depends on the
degree to which it serves as an indicator of a relatively broad and
significant area of behavior. Measurement of the behavior sample directly
covered by the test is rarely, if ever, the goal of psychological testing.
The child's knowledge of a particular list of 50 words is not, in itself,
of great interest. Nor is the job applicant's performance on a specific
set of 20 arithmetic problems of much importance. If, however, it can be
demonstrated that there is a close correspondence between the child's
knowledge of the word list and his total mastery of vocabulary, or between
the applicant's score on the arithmetic problems and his computational
performance on the job, then the tests are serving their purpose.
It should be noted in this connection that the test items need not
resemble closely the behavior the test is to predict. It is only necessary
that an empirical correspondence be demonstrated between the two. The
degree of similarity between the test sample and the predicted behavior
may vary widely. At one extreme, the test may coincide completely with a
part of the behavior to be predicted. An example might be a foreign
vocabulary test in which the students are examined on 20 of the 50 new
words they have studied; another example is provided by the road test
taken prior to obtaining a driver's license. A lesser degree of similarity
is illustrated by many vocational aptitude tests administered prior to job
training, in which there is only a mod ... such as the Rorschach inkblot
test, in which an attempt is made to predict from the subject's
associations to inkblots how he will react to other people, to emotionally
toned stimuli, and to other complex, everyday-life situations. Despite
their superficial differences, all these tests consist of samples of the
individual's behavior. And each must prove its worth by an empirically
demonstrated correspondence between the subject's performance on the test
and in other situations.
Whether the term "diagnosis" or the term "prediction" is employed in this
connection also represents a minor distinction. Prediction commonly
connotes a temporal estimate, the individual's future performance on a
job, for example, being forecast from his present test performance. In a
broader sense, however, even the diagnosis of present condition, such as
mental retardation or emotional disorder, implies a prediction of what the
individual will do in situations other than the present test. It is
logically simpler to consider all tests as behavior samples from which
predictions regarding other behavior can be made. Different types of
tests can then be characterized as variants of this basic pattern.
Another point that should be considered at the outset pertains to the
concept of capacity. It is entirely possible, for example, to devise a
test for predicting how well an individual can learn French before he has
even begun the study of French. Such a test would involve a sample of the
types of behavior required to learn the new language, but would in itself
presuppose no knowledge of French. It could then be said that this test
measures the individual's "capacity" or "potentiality" for learning
French. Such terms should, however, be used with caution in reference to
psychological tests. Only in the sense that a present behavior sample can
be used as an indicator of other, future behavior can we speak of a test
measuring "capacity." No psychological test can do more than measure
behavior. Whether such behavior can serve as an effective index of other
behavior can be determined only by empirical try-out.
STANDARDIZATION. It will be recalled that in the initial definition a
psychological test was described as a standardized measure.
Standardization implies uniformity of procedure in administering and
scoring the test. If the scores obtained by different individuals are to
be comparable, testing conditions must obviously be the same for all. Such
a requirement is only a special application of the need for controlled
conditions in all scientific observations. In a test situation, the single
independent variable is usually the individual being tested.
In order to secure uniformity of testing conditions, the test constructor
provides detailed directions for administering each newly developed test.
The formulation of such directions is a major part of the standardization
of a new test. Such standardization extends to the exact materials
employed, time limits, oral instructions to subjects, preliminary
demonstrations, ways of handling queries from subjects, and every other
detail of the testing situation. Many other, more subtle factors may
influence the subject's performance on certain tests. Thus, in giving
instructions or presenting problems orally, consideration must be given to
the rate of speaking, tone of voice, inflection, pauses, and facial
expression. In a test involving the detection of absurdities, for example,
the correct answer may be given away by smiling or pausing when the
crucial word is read. Standardized testing procedure, from the examiner's
point of view, will be discussed further in a later section of
Another important step in the standardization of a test is the
establishment of norms. Psychological tests have no predetermined
standards of passing or failing; an individual's score is evaluated by
comparing it with the scores obtained by others. As its name implies, a
norm is the normal or average performance. Thus, if normal 8-year-old
children complete 12 out of 50 problems correctly on a particular
arithmetic reasoning test, then the 8-year-old norm on this test
corresponds to a score of 12. The latter is known as the raw score on the
test. It may be expressed as number of correct items, time required to
complete a task, number of errors, or some other objective measure
appropriate to the content of the test. Such a raw score is meaningless
until evaluated in terms of a suitable set of norms.
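
As a rough illustration of evaluating a raw score by comparing it with the scores obtained by others, the sketch below places a score within a comparison group in two common ways. The passage does not prescribe either statistic, so the percentile rank and standard score used here are assumptions chosen for illustration, and the comparison scores are invented.

```python
from statistics import mean, pstdev


def percentile_rank(raw_score, norm_scores):
    """Percent of the comparison group scoring below the raw score,
    counting half of those who obtained exactly the same score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def z_score(raw_score, norm_scores):
    """Distance of the raw score from the group average, in units of the
    group's standard deviation."""
    return (raw_score - mean(norm_scores)) / pstdev(norm_scores)


# Invented raw scores (items correct out of 50) for a group of
# 8-year-olds; the average of this group is 12, as in the example above.
norm_group = [6, 8, 9, 10, 11, 12, 12, 12, 13, 14, 15, 16, 18]

for raw in (12, 16):
    print(raw, round(percentile_rank(raw, norm_group), 1),
          round(z_score(raw, norm_group), 2))
```

A raw score of 12 falls at the middle of this group, while 16 falls well above it; the same raw scores would mean something different against another comparison group.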
In the process of standardizing a test, it is administered to a large,
representative sample of the type of subjects for whom it is designed.
This group, known as the standardization sample, serves to establish the
norms. Such norms indicate not only the average performance but also the
relative frequency of varying degrees of deviation above and below the
average. It is thus possible to evaluate different degrees of superiority
and inferiority. The specific ways in which such norms may be expressed
will be considered in Chapter 4. All permit the designation of the
individual's position with reference to the normative or standardization
sample.
It might also be noted that norms are established for personality tests
in essentially the same way as for aptitude tests. The norm on a
personality test is not necessarily the most desirable or "ideal"
performance, any more than a perfect or errorless score is the norm on an
aptitude test. On both types of tests, the norm corresponds to the
performance of typical or average individuals. On dominance-submission
tests, for example, the norm falls at an intermediate point representing
the degree of dominance or submission manifested by the average
individual. Similarly, in an emotional adjustment inventory, the norm does
not ordinarily correspond to a complete absen
when retested on Friday, it is obvious that little or no confidence can be
put in either score. Similarly, if in one set of 50 words an individual
identifies 40 correctly, whereas in another, supposedly equivalent set he
gets a score of only 20 right, then neither score can be taken as a
dependable index of his verbal comprehension. To be sure, in both
illustrations it is possible that only one of the two scores is in error,
but this could be demonstrated only by further retests. From the given
data, we can conclude only that both scores cannot be right. Whether one
or neither is an adequate estimate of the individual's ability in
vocabulary cannot be established without additional information.
Before a psychological test is released for general use, a thorough,
objective check of its reliability should be carried out. The different types
of test reliability, as well as methods of measuring each, will be con-
sidered in Chapter 5. Reliability can be checked with reference to
temporal fluctuations, the particular selection of items or behavior sample
constituting the test, the role of different examiners or scorers, and other
aspects of the testing situation. It is essential to specify the type of re-
liability and the method employed to determine it, because the same test
may vary in these different aspects. The number and nature of indi-
viduals on whom reliability was checked should likewise be reported.
With such information, the test user can predict whether the test will be
about equally reliable for the group with which he expects to use it, or
whether it is likely to be more reliable or less reliable.
VALIDITY. Undoubtedly the most important question to be asked about any
psychological test concerns its validity, i.e., the degree to which the
test actually measures what it purports to measure. Validity provides a
direct check on how well the test fulfills its function. The determination
of validity usually requires independent, external criteria of whatever
the test is designed to measure. For example, if a medical aptitude test
is to be used in selecting promising applicants for medical school,
ultimate success in medical school would be a criterion. In the process of
validating such a test, it would be administered to a large group of
students at the time of their admission to medical school. Some measure of
performance in medical school would eventually be obtained for each
student on the basis of grades, ratings by instructors, success or failure
in completing training, and the like. Such a composite measure constitutes
the criterion with which each student's initial test score is to be
correlated. A high correlation, or validity coefficient, would signify
that those individuals who scored high on the test had been relatively
successful in medical school, whereas those scoring low on the test had
done poorly in medical school. A low correlation would indicate little
correspondence between test score and criterion measure and hence poor
validity for the test. The validity coefficient enables us to determine
how closely the criterion performance could have been predicted from the
test scores.
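
The correlation of initial test scores with a later criterion measure can be sketched as follows. The passage does not name a particular correlation statistic; the Pearson product-moment coefficient is assumed here, and the admission scores and criterion values are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


# Invented data: aptitude test scores at admission, and a composite
# criterion measure obtained later for the same eight students.
admission_scores = [52, 61, 47, 70, 58, 65, 43, 75]
criterion = [2.6, 3.1, 2.4, 3.6, 2.9, 3.0, 2.2, 3.8]

validity = pearson_r(admission_scores, criterion)
print(f"validity coefficient r = {validity:.2f}")
```

In these invented data the students who scored high at admission also stand high on the criterion, so the coefficient is close to 1; real validity coefficients are typically far more modest.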
In a similar manner, tests designed for other purposes can be validated
against appropriate criteria. A vocational aptitude test, for example, can
be validated against on-the-job success of a trial group of new employees.
A pilot aptitude battery can be validated against achievement in flight
training. Tests designed for broader and more varied uses are validated
against a number of criteria and their validity can be established only by
the gradual accumulation of data from many different kinds of
investigations.
The reader may have noticed an apparent paradox in the concept of test
validity. If it is necessary to follow up the subjects or in other ways to
obtain independent measures of what the test is trying to predict, why not
dispense with the test? The answer to this riddle is to be found in the
distinction between the validation group on the one hand and the groups on
which the test will eventually be employed for operational purposes on the
other. Before the test is ready for use, its validity must be established
on a representative sample of subjects. The scores of these persons are
not themselves employed for operational purposes but serve only in the
process of testing the test. If the test proves valid by this method, it
can then be used on other samples in the absence of criterion measures.
It might still be argued that we would need only to wait for the criterion
measure to mature, to become available, on any group in order to obtain
the information that the test is trying to predict. But such a procedure
would be so wasteful of time and energy as to be prohibitive in most
instances. Thus, we could determine which applicants will succeed on a job
or which students will satisfactorily complete college by admitting all
who apply and waiting for subsequent developments! It is the very
wastefulness of this procedure, and its deleterious emotional impact on
individuals, that tests are designed to minimize. By means of tests, the
person's present level of prerequisite skills, knowledge, and other
relevant characteristics can be assessed with a determinable margin of
error. The more valid and reliable the test, the smaller will be this
margin of error.
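
How the margin of error shrinks as validity and reliability rise can be sketched with two standard psychometric formulas. Neither formula appears in this passage, so both are assumptions brought in for illustration: the standard error of estimate, SD_y * sqrt(1 - r_xy^2), for predicting a criterion, and the standard error of measurement, SD_x * sqrt(1 - r_xx), for a single obtained score. The standard deviations and coefficients below are invented.

```python
from math import sqrt


def standard_error_of_estimate(criterion_sd, validity):
    """Typical size of the error in criterion scores predicted from the
    test, given the validity coefficient (assumed formula)."""
    if not -1.0 <= validity <= 1.0:
        raise ValueError("validity coefficient must lie between -1 and 1")
    return criterion_sd * sqrt(1.0 - validity ** 2)


def standard_error_of_measurement(test_sd, reliability):
    """Typical size of the error in a single obtained test score, given
    the reliability coefficient (assumed formula)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return test_sd * sqrt(1.0 - reliability)


# Invented criterion with a standard deviation of 10 points.
for r in (0.0, 0.3, 0.6, 0.9):
    see = standard_error_of_estimate(criterion_sd=10.0, validity=r)
    print(f"validity {r:.1f} -> standard error of estimate {see:.2f}")
```

With zero validity the error of prediction is as large as the spread of the criterion itself, which is to say the test adds nothing; the error falls toward zero only as the coefficient approaches 1.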
The special problems encountered in determining the validity of different
types of tests, as well as the specific criteria and statistical
procedures employed, will be discussed in Chapters 6 and 7. One further
point, however, should be considered at this time. Validity tells us more
than the degree to which the test is fulfilling its function. It actually
tells us what the test is measuring. By studying the validation data, we
can objectively determine what the test is measuring. It would thus be
more accurate to define validity as the extent to which we know what the
test measures. The interpretation of test scores would undoubtedly be
clearer and less ambiguous if tests were regularly named in terms of the
criterion
through which they had been validated. A tendency in this direction can
be recognized in such test labels as "scholastic aptitude test" and
"personnel classification test" in place of the vague title "intelligence
test."
REASONS FOR CONTROLLING THE USE OF
PSYCHOLOGICAL TESTS
"May I have a Stanford-Binet blank? My nephew has to take it next week for admission to School X and I'd like to give him some practice so he can pass."
"To improve the reading program in our school, we need a culture-free IQ test that measures each child's innate potential."
"Last night I answered the questions in an intelligence test published in a magazine and I got an IQ of 80. I think psychological tests are silly."
"My roommate is studying psych. She gave me a personality test and I came out neurotic. I've been too upset to go to class ever since."
"Last year you gave a new personality test to our employees for research purposes. We would now like to have the scores for their personnel folders."
The above remarks are not imaginary. Each is based on a real incident,
and the list could easily be extended by any psychologist. Such remarks
illustrate potential misuses or misinterpretations of psychological tests
in such ways as to render the tests worthless or to hurt the individual.
Like any scientific instrument or precision tool, psychological tests must
be properly used to be effective. In the hands of either the unscrupulous
or the well-meaning but uninformed user, such tests can cause serious
damage.
There are two principal reasons for controlling the use of psychological
tests: (a) to prevent general familiarity with test content, which would
invalidate the test, and (b) to ensure that the test is used by a
qualified examiner. If an individual were to memorize the correct
responses on a test of color blindness, such a test would no longer be a
measure of color vision for him. Under these conditions, the test would be
completely invalidated. Test content clearly has to be restricted in
order to forestall deliberate efforts to fake scores.
In other cases, however, the effect of familiarity may be less obvious,
or the test may be invalidated in good faith by misinformed persons. A
schoolteacher, for example, may give her class special practice in
problems closely resembling those on an intelligence test, "so that the
pupils will be well prepared to take the test." Such an attitude is simply
a carry-over from the usual procedure of preparing for a school
examination. When applied to an intelligence test, however, it is likely
that such specific training or coaching will raise the scores on the test
without appreciably affecting the broader area of behavior the test tries
to sample. Under such conditions, the validity of the test as a predictive
instrument is reduced.
The need for a qualified examiner is evident in each of the three major
aspects of the testing situation: selection of the test, administration
and scoring, and interpretation of scores. Tests cannot be chosen like
lawn mowers, from a mail-order catalogue. They cannot be evaluated by
name, author, or other easy marks of identification. To be sure, it
requires no psychological training to consider such factors as cost,
bulkiness and ease of transporting test materials, testing time required,
and ease and rapidity of scoring. Information on these practical points
can usually be obtained from a test catalogue and should be taken into
account in planning a testing program. For the test to serve its function,
however, an evaluation of its technical merits in terms of such
characteristics as validity, reliability, difficulty level, and norms is
essential. Only in such a way can the test user determine the
appropriateness of any test for his particular purpose and its suitability
for the type of persons with whom he plans to use it.
The introductory discussion of test standardization earlier in this
chapter has already suggested the importance of a trained examiner. An
adequate realization of the need to follow instructions precisely, as well
as a thorough familiarity with the standard instructions, is required if
the test scores obtained by different examiners are to be comparable or if
any one individual's score is to be evaluated in terms of the published
norms. Careful control of testing conditions is also essential. Similarly,
incorrect or inaccurate scoring may render the test score worthless. In
the absence of proper checking procedures, scoring errors are far more
likely to occur than is generally realized.
The proper interpretation of test scores requires a thorough under-
standing of the test, the individual, and the testing
manifest a number of unfavorable attitudes, such as suspicion, insecurity,
fear, or cynical indifference. Abnormal conditions in their past experiences
are also likely to influence their test performance adversely. As a result
of early failures and frustrations in school, for example, they may have
developed feelings of hostility and inferiority toward academic tasks,
which the tests resemble. The experienced examiner makes special efforts
to establish rapport under these conditions. In any event, he must be
sensitive to these special difficulties and take them into account in
interpreting and explaining test performance.
In testing any school-age child or adult, one should bear in mind that
every test presents an implied threat to the individual's prestige. Some
reassurance should therefore be given at the outset. It is helpful to
explain, for example, that no one is expected to finish or to get all the items
correct. The individual might otherwise experience a mounting sense of
failure as he advances to the more difficult items or finds that he is
unable to finish any subtest within the time allowed.
It is also desirable to eliminate the element of surprise from the test
situation as far as possible, because the unexpected and unknown are
likely to produce anxiety. Many group tests provide a preliminary
explanatory statement that is read to the group by the examiner. An even
better procedure is to announce the tests a few days in advance and to
give each subject a printed booklet that explains the purpose and nature
of the tests, offers general suggestions on how to take tests, and contains
a few sample items. Such explanatory booklets are regularly available to
participants in large-scale testing programs such as those conducted by
the College Entrance Examination Board (1974a, 1974b). The United
States Employment Service has likewise developed a booklet on how to
take tests, as well as a more extensive pretesting orientation technique
for use with culturally disadvantaged applicants unfamiliar with tests.
More general orientation booklets are also available, such as Meeting
the Test (Anderson, Katz, & Shimberg, 1965). A tape recording and two
booklets are combined in Test Orientation Procedure (TOP), designed
specifically for job applicants with little prior testing experience
(Bennett & Doppelt, 1967). The first booklet, used together with the tape,
provides general information on how to take tests; the second contains
practice tests. In the absence of a tape recorder, the examiner may read
the instructions from a printed script.
Adult testing presents some additional problems. Unlike the schoolchild,
the adult is not so likely to work hard at a task merely because it is
assigned to him. It therefore becomes more important to "sell" the purpose
of the tests to the adult, although high school and college students
also respond to such an appeal. Cooperation of the examinee can usually
be secured by convincing him that it is in his own interests to obtain a
valid score, i.e., a score correctly indicating what he can do rather than
overestimating or underestimating his abilities. Most persons will
understand that an incorrect decision, which might result from invalid test
scores, would mean subsequent failure, loss of time, and frustration for
them. This approach can serve not only to motivate the individual to
try his best on ability tests but also to reduce faking and encourage
frank reporting on personality inventories, because the examinee realizes that
he himself would otherwise be the loser. It is certainly not in the best
interests of the individual to be admitted to a course of study for which
he is not qualified or assigned to a job he cannot perform or that he
would find uncongenial.
Many of the practices designed to enhance rapport serve also to reduce
test anxiety. Procedures tending to dispel surprise and strangeness from
the testing situation and to reassure and encourage the subject should
certainly help to lower anxiety. The examiner's own manner and a
well-organized, smoothly running testing operation will contribute toward the
same goal. Individual differences in test anxiety have been studied with
both schoolchildren and college students (Gaudry & Spielberger, 1974;
Spielberger, 1972). Much of this research was initiated by Sarason and
his associates at Yale (Sarason, Davidson, Lighthall, Waite, & Ruebush,
1960). The first step was to construct a questionnaire to assess the
individual's test-taking attitudes. The children's form, for example, contains
items such as the following:
Do you worry a lot before taking a test?
When the teacher says she is going to find out how much you have learned,
does your heart begin to beat faster?
While you are taking a test, do you usually think you are not doing well?
Of primary interest is the finding that both school achievement and
intelligence test scores yielded significant negative correlations with test
anxiety. Similar correlations have been found among college students (I. G.
Sarason, 1961). Longitudinal studies likewise revealed an inverse relation
between changes in anxiety level and changes in intelligence or
achievement test performance (Hill & Sarason, 1966; Sarason, Hill, &
Zimbardo, 1964).
Such findings, of course, do not indicate the direction of causal
relationships. It is possible that children develop test anxiety because they
perform poorly on tests and have thus experienced failure and frustration in
previous test situations. In support of this interpretation is the finding
that within subgroups of high scorers on intelligence tests, the negative
correlation between anxiety level and test performance disappears
(Denny, 1966; Feldhusen & Klausmeier, 1962). On the other hand, there
is evidence suggesting that at least some of the relationship results from
the deleterious effects of anxiety on test performance. In one study
(Waite, Sarason, Lighthall, & Davidson, 1958), high-anxious and
low-anxious children equated in intelligence test scores were given repeated
trials in a learning task. Although initially equal in the learning test, the
low-anxious group improved significantly more than the high-anxious.
Several investigators have compared test performance under conditions
designed to evoke "anxious" and "relaxed" states. Mandler and Sarason
(1952), for example, found that ego-involving instructions, such as telling
subjects that everyone is expected to finish in the time allotted, had a
beneficial effect on the performance of low-anxious subjects, but a
deleterious effect on that of high-anxious subjects. Other studies have likewise
found an interaction between testing conditions and such individual
characteristics as anxiety level and achievement motivation (Lawrence, 1962;
Paul & Eriksen, 1964). It thus appears likely that the relation between
anxiety and test performance is nonlinear, a slight amount of anxiety
being beneficial while a large amount is detrimental. Individuals who are
customarily low-anxious benefit from test conditions that arouse some
anxiety, while those who are customarily high-anxious [...] tests may be
unduly affected by test anxiety. In a thorough and well-controlled
investigation of this question, French (1962) compared the performance of high
school students on a test given as part of the regular administration of
the SAT with performance on a parallel form of the test administered at
a different time under "relaxed" conditions. The instructions on the latter
occasion specified that the test was given for research purposes only and
scores would not be sent to any college. The results showed that
performance was no poorer during the standard administration than during
the relaxed administration. Moreover, the concurrent validity of the test
scores against high school course grades did not differ significantly under
the two conditions.
Comprehensive surveys of the effects of examiner and situational
variables on test scores have been prepared by S. B. Sarason (1954),
Masling (1960), Moriarty (1961, 1966), Sattler and Theye (1967),
Palmer (1970), and Sattler (1970, 1974). Although some effects have
been demonstrated with objective group tests, most of the data have been
obtained with either projective techniques or individual intelligence tests.
These extraneous factors are more likely to operate with unstructured and
ambiguous stimuli, as well as with difficult and novel tasks, than with
clearly defined and well-learned functions. In general, children are more
susceptible to examiner and situational influences than are adults; in the
examination of preschool children, the role of the examiner is especially
crucial. Emotionally disturbed and insecure persons of any age are also
more likely to be affected by such conditions than are well-adjusted persons.
There is considerable evidence that test results may vary systematically
as a function of the examiner (E. Cohen, 1965; Masling, 1960). These
differences may be related to personal characteristics of the examiner, such
as his age, sex, race, professional or socioeconomic status, training and
experience, personality characteristics, and appearance. Several studies of
these examiner variables, however, have yielded misleading or
inconclusive results because the experimental designs failed to control or
isolate the influence of different examiner or subject characteristics. Hence
the effects of two or more variables may be confounded.
The examiner's behavior before and during test administration has also
been shown to affect test results. For example, controlled investigations
have yielded significant differences in intelligence test performance as a
result of a "warm" versus a "cold" interpersonal relation between
examiner and examinees, or a rigid and aloof versus a natural manner on
the part of the examiner (Exner, 1966; Masling, 1959). Moreover, there
may be significant interactions between examiner and examinee
characteristics, in the sense that the same examiner characteristic or testing
manner may have a different effect on different examinees as a function of
the examinee's own personality characteristics. Similar interactions may
occur with task variables, such as the nature of the test, the purpose of
the testing, and the instructions given to the subjects. Dyer (1973) adds
even more variables to this list, calling attention to the possible influence
of the test givers' and the test takers' diverse perceptions of the functions
and goals of testing.
Still another way in which an examiner may inadvertently affect the
examinee's responses is through his own expectations. This is simply a
special instance of the self-fulfilling prophecy (Rosenthal, 1966;
Rosenthal & Rosnow, 1969). An experiment conducted with the Rorschach will
illustrate this effect (Masling, 1965). The examiners were 14 graduate
student volunteers, 7 of whom were told, among other things, that
experienced examiners elicit more human than animal responses from the
subjects, while the other 7 were told that experienced examiners elicit
more animal than human responses. Under these conditions, the two
groups of examiners obtained significantly different ratios of animal to
human responses from their subjects. These differences occurred despite
the fact that neither examiners nor subjects reported awareness of any
influence attempt. Moreover, tape recordings of all testing sessions
revealed no evidence of verbal influence on the part of any examiner. The
examiners' expectations apparently operated through subtle postural and
facial cues to which the subjects responded.
Apart from the examiner, other aspects of the testing situation may
significantly affect test performance