43423998 Anne Anastasi Psychological Testing I


ANNE ANASTASI

Professor of Psychology, Fordham University

Psychological Testing

MACMILLAN PUBLISHING CO., INC.

New York

Collier Macmillan Publishers

London



IN A revised edition, one expects both similarities and differences. This

edition shares with the earlier versions the objectives and basic approach

of the book. The primary goal of this text is still to contribute toward the

proper evaluation of psychological tests and the correct interpretation

and use of test results. This goal calls for several kinds of information:

(1) an understanding of the major principles of test construction, (2)

psychological knowledge about the behavior being assessed, (3) sensi-

tivity to the social and ethical implications of test use, and (4) broad

familiarity with the types of available instruments and the sources of

information about tests. A minor innovation in the fourth edition is the

addition of a suggested outline for test evaluation (Appendix C).

In successive editions, it has been necessary to exercise more and more

restraint to keep the number of specific tests discussed in the book from

growing with the field; it has never been my intention to provide a

miniature Mental Measurements Yearbook! Nevertheless, I am aware

that principles of test construction and interpretation can be better

understood when applied to particular tests. Moreover, acquaintance with

the major types of available tests, together with an understanding of

their special contributions and limitations, is an essential component of

knowledge about contemporary testing. For these reasons, specific tests

are again examined and evaluated in Parts 3, 4, and 5. These tests have

been chosen either because they are outstanding examples with which

the student of testing should be familiar or because they illustrate some

special point of test construction or interpretation. In the text itself, the

principal focus is on types of tests rather than on specific instruments. At

the same time, Appendix E contains a classified list of over 250 tests,

including not only those cited in the text but also others added to provide

a more representative sample.

As for the differences, they loomed especially large during the preparation

of this edition. Much that has happened in human society since

the mid-1960's has had an impact on psychological testing. Some of these

developments were briefly described in the last two chapters of the third

edition. Today they have become part of the mainstream of psychological

testing and have been accordingly incorporated in the appropriate sections

throughout the book. Recent changes in psychological testing that

are reflected in the present edition can be described on three levels:

(1) general orientation toward testing, (2) substantive and methodological

developments, and (3) "ordinary progress" such as the publication

of new tests and revision of earlier tests.

All rights reserved. No part of this book may be reproduced or

transmitted in any form or by any means, electronic or mechanical,

including photocopying, recording, or any information

storage and retrieval system, without permission in writing

from the Publisher.

Earlier editions copyright 1954 and © 1961 by Macmillan

Publishing Co., Inc., and copyright © 1968 by Anne Anastasi.

MACMILLAN PUBLISHING Co., INC.

866 Third Avenue, New York, New York 10022

COLLIER MACMILLAN CANADA, LTD.

Library of Congress Cataloging in Publication Data

Anastasi, Anne, (date)

Psychological testing.

Bibliography: p.

Includes indexes.

1. Mental tests. 2. Personality tests. I. Title.

[DNLM: 1. Psychological tests. WM145 A534P]

BF431.A573 1976 153.9 75-2206

ISBN O-2-30298



Preface

An example of changes on the first level is the increasing awareness of

the ethical, social, and legal implications of testing. In the present

edition, this topic has been expanded and treated in a separate chapter early

in the book (Ch. 3) and in Appendixes A and B. A cluster of related

developments represents a broadening of test uses. Besides the traditional

applications of tests in selection and diagnosis, increasing attention is

being given to administering tests for self-knowledge and self-development,

and to training individuals in the use of their own test results in

decision making (Chs. 3 and 4). In the same category are the continuing

replacement of global scores with multitrait profiles and the application

of classification strategies, whereby "everyone can be above average" in

one or more socially valued variables (Ch. 7). From another angle,

efforts are being made to modify traditional interpretations of test scores,

in both cognitive and noncognitive areas, in the light of accumulating

psychological knowledge. In this edition, Chapter 12 brings together

psychological issues in the interpretation of intelligence test scores,

touching on such problems as stability and change in intellectual level

over time; the nature of intelligence; and the testing of intelligence in

early childhood, in old age, and in different cultures. Another example

is provided by the increasing emphasis on situational specificity and

person-by-situation interactions in personality testing, stimulated in large

part by the social-learning theorists (Ch. 17).

The second level, covering substantive and methodological changes,

is illustrated by the impact of computers on the development, administration,

scoring, and interpretation of tests (see especially Chs. 4, 11, 13, 17,

18, W). The use of computers in administering or managing instructional

programs has also stimulated the development of criterion-referenced

tests, although other conditions have contributed to the upsurge of

interest in such tests in education. Criterion-referenced tests are discussed

principally in Chapters 4, 5, and 14. Other types of instruments that have

risen to prominence and have received fuller treatment in the present

edition include: tests for identifying specific learning disabilities (Ch. 16),

inventories and other devices for use in behavior modification programs

(Ch. 20), instruments for assessing early childhood education

(Ch. 14), Piagetian "ordinal" scales (Chs. 10 and 14), basic education

and literacy tests for adults (Chs. 13 and 14), and techniques for the

assessment of environments (Ch. 20). Problems to be considered in the

assessment of minority groups, including the question of test bias, are

examined from different angles in Chapters 3, 7, 8, and 12.

On the third level, it may be noted that over 100 of the tests listed in

this edition have been either initially published or revised since the

publication of the preceding edition (1968). Major examples include the

McCarthy Scales of Children's Abilities, the WISC-R, the 1972 Stanford-Binet

norms (with all the resulting readjustments in interpretations),


Forms S and T of the DAT (including a computerized Career Planning

Program), the Strong-Campbell Interest Inventory (merged form of the

SVIB), and the latest revisions of the Stanford Achievement Test and the

Metropolitan Readiness Tests.

It is a pleasure to acknowledge the assistance received from many

sources in the preparation of this edition. The completion of the project

was facilitated by a one-semester Faculty Fellowship awarded by Fordham

University and by a grant from the Fordham University Research

Council covering principally the services of a research assistant. These

services were performed by Stanley Friedland with an unusual combination

of expertise, responsibility, and graciousness. I am indebted to the

many authors and test publishers who provided reprints, unpublished

manuscripts, specimen sets of tests, and answers to my innumerable

inquiries by mail and telephone. For assistance extending far beyond the

interests and responsibilities of any single publisher, I am especially

grateful to Anna Dragositz of Educational Testing Service and Blythe

Mitchell of Harcourt Brace Jovanovich, Inc. I want to acknowledge the

significant contribution of John T. Cowles of the University of Pittsburgh,

who assumed complete responsibility for the preparation of the

Instructor's Manual to accompany this text.

For informative discussions and critical comments on particular topics,

I want to convey my sincere thanks to William H. Angoff of Educational

Testing Service and to several members of the Fordham University

Psychology Department, including David R. Chabot, Marvin Reznikoff,

Reuben M. Schonebaum, and Warren W. Tryon. Grateful acknowledgment

is also made of the thoughtful recommendations submitted by

course instructors in response to the questionnaire distributed to current

users of the third edition. Special thanks in this connection are due to

Mary Carol Cahill for her extensive, constructive, and wide-ranging

suggestions. I wish to express my appreciation to Victoria Overton of

the Fordham University library staff for her efficient and courteous

assistance in bibliographic matters. Finally, I am happy to record the

contributions of my husband, John Porter Foley, Jr., who again participated

in the solution of countless problems at all stages in the preparation

of the book.

A. A.



CONTENTS

PART 1

CONTEXT OF PSYCHOLOGICAL TESTING

1. FUNCTIONS AND ORIGINS OF PSYCHOLOGICAL TESTING

Current uses of psychological tests

Early interest in classification and training of the mentally retarded

The first experimental psychologists

Contributions of Francis Galton

Cattell and the early "mental tests"

Binet and the rise of intelligence tests

Group testing

Aptitude testing

Standardized achievement tests

Measurement of personality

Sources of information about tests

2. NATURE AND USE OF PSYCHOLOGICAL TESTS

What is a psychological test?

Reasons for controlling the use of psychological tests

Test administration

Rapport

Test anxiety

Examiner and situational variables

Coaching, practice, and test sophistication

3. SOCIAL AND ETHICAL IMPLICATIONS OF TESTING

User qualifications

Testing instruments and procedures

Protection of privacy

Confidentiality

Communicating test results

Testing and the civil rights of minorities


4. NORMS AND THE INTERPRETATION OF TEST SCORES

Statistical concepts

Developmental norms

Within-group norms

Relativity of norms

Computer utilization in the interpretation of test scores

Criterion-referenced testing

5. RELIABILITY

The correlation coefficient

Types of reliability

Reliability of speeded tests

Dependence of reliability coefficients on the sample tested

Standard error of measurement

Reliability of criterion-referenced tests

Content validity

Criterion-related validity

Construct validity

Overview

7. VALIDITY: MEASUREMENT AND INTERPRETATION

Validity coefficient and error of estimate

Test validity and decision theory

Moderator variables

Combining information from different tests

Use of tests for classification decisions

Statistical analyses of test bias

8. ITEM ANALYSIS

Item difficulty

Item validity

Internal consistency

Item analysis of speeded tests

Cross validation

Item-group interaction

PART 3

TESTS OF GENERAL INTELLECTUAL LEVEL

9. INDIVIDUAL TESTS

Stanford-Binet Intelligence Scale

Wechsler Adult Intelligence Scale

Wechsler Intelligence Scale for Children

Wechsler Preschool and Primary Scale of Intelligence

10. TESTS FOR SPECIAL POPULATIONS

Infant and preschool testing

Testing the physically handicapped

Cross-cultural testing

Group tests versus individual tests

Multilevel batteries

Tests for the college level and beyond

12. PSYCHOLOGICAL ISSUES IN INTELLIGENCE TESTING

Longitudinal studies of intelligence

Intelligence in early childhood

Problems in the testing of adult intelligence

Problems in cross-cultural testing

Nature of intelligence

PART 4

TESTS OF SEPARATE ABILITIES

13. MEASURING MULTIPLE APTITUDES

Factor analysis

Theories of trait organization

Multiple aptitude batteries

Measurement of creativity

14. EDUCATIONAL TESTING

Achievement tests: their nature and uses

General achievement batteries

Standardized tests in separate subjects

Teacher-made classroom tests



Diagnostic and criterion-referenced tests

Specialized prognostic tests

Assessment in early childhood education

15. OCCUPATIONAL TESTING

Validation of industrial tests

Short screening tests for industrial personnel

Special aptitude tests

Testing in the professions

Diagnostic use of intelligence tests

Special tests for detecting cognitive dysfunction

Identifying specific learning disabilities

Clinical judgment

Report writing

PART 5

PERSONALITY TESTS

17. SELF-REPORT INVENTORIES

Content validation

Empirical criterion keying

Factor analysis in test development

Personality theory in test development

Test-taking attitudes and response sets

Situational specificity

Evaluation of personality inventories

18. MEASURES OF INTERESTS, ATTITUDES, AND VALUES

Interest inventories

Opinion and attitude measurement

Attitude scales

Assessment of values and related variables

19. PROJECTIVE TECHNIQUES

Nature of projective techniques

Inkblot techniques

Thematic Apperception Test and related instruments

Other projective techniques

Evaluation of projective techniques

20. OTHER ASSESSMENT TECHNIQUES

"Objective" performance tests

Situational tests

Self-concepts and personal constructs

Assessment techniques in behavior modification programs

Observer reports

Biographical inventories

The assessment of environments

B. Guidelines on Employee Selection Procedures (EEOC)

Guidelines for Reporting Criterion-Related and Content Validity (OFCC)



CHAPTER 1

Functions and Origins of

Psychological Testing

ANYONE reading this book today could undoubtedly illustrate what

is meant by a psychological test. It would be easy enough to recall

a test the reader himself has taken in school, in college, in the

armed services, in the counseling center, or in the personnel office. Or

perhaps the reader has served as a subject in an experiment in which

standardized tests were employed. This would certainly not have been the

case fifty years ago. Psychological testing is a relatively young branch of

one of the youngest of the sciences.

Basically, the function of psychological tests is to measure differences

between individuals or between the reactions of the same individual on

different occasions. One of the first problems that stimulated the development

of psychological tests was the identification of the mentally retarded.

To this day, the detection of intellectual deficiencies remains an

important application of certain types of psychological tests. Related

clinical uses of tests include the examination of the emotionally disturbed,

the delinquent, and other types of behavioral deviants. A strong impetus

to the early development of tests was likewise provided by problems

arising in education. At present, schools are among the largest test users.

The classification of children with reference to their ability to profit

from different types of school instruction, the identification of the

intellectually retarded on the one hand and the gifted on the other, the

diagnosis of academic failures, the educational and vocational counseling

of high school and college students, and the selection of applicants for

professional and other special schools are among the many educational

uses of tests.

The selection and classification of industrial personnel represent another

major application of psychological testing. From the assembly-line



operator or filing clerk to top management, there is scarcely a type of job

for which some kind of psychological test has not proved helpful in such

matters as hiring, job assignment, transfer, promotion, or termination.

To be sure, the effective employment of tests in many of these situations,

especially in connection with high-level jobs, usually requires that the

tests be used as an adjunct to skillful interviewing, so that test scores

may be properly interpreted in the light of other background information

about the individual. Nevertheless, testing constitutes an important part

of the total personnel program. A closely related application of psychological

testing is to be found in the selection and classification of military

personnel. From simple beginnings in World War I, the scope and

variety of psychological tests employed in military situations underwent

a phenomenal increase during World War II. Subsequently, research

on test development has been continuing on a large scale in all branches

of the armed services.

The use of tests in counseling has gradually broadened from a narrowly

defined guidance regarding educational and vocational plans to

an involvement with all aspects of the person's life. Emotional well-being

and effective interpersonal relations have become increasingly

prominent objectives of counseling. There is growing emphasis, too, on

the use of tests to enhance self-understanding and personal development.

Within this framework, test scores are part of the information given to

the individual as aids to his own decision-making processes.

It is clearly evident that psychological tests are currently being employed

in the solution of a wide range of practical problems. One should

not, however, lose sight of the fact that such tests are also serving important

functions in basic research. Nearly all problems in differential psychology,

for example, require testing procedures as a means of gathering

data. As illustrations, reference may be made to studies on the nature and

extent of individual differences, the identification of psychological traits,

the measurement of group differences, and the investigation of biological

and cultural factors associated with behavioral differences. For all such

areas of research, and for many others, the precise measurement of

individual differences made possible by well-constructed tests is an

essential prerequisite. Similarly, psychological tests provide standardized

tools for investigating such varied problems as life-span developmental

changes within the individual, the relative effectiveness of different educational

procedures, the outcomes of psychotherapy, the impact of

community programs, and the influence of noise on performance.

From the many different uses of psychological tests, it follows that some

knowledge of such tests is needed for an adequate understanding of most

fields of contemporary psychology. It is primarily with this end in view

that the present book has been prepared. The book is not designed to

make the individual either a skilled examiner and test administrator or

an expert on test construction. It is directed, not to the test specialist, but

to the general student of psychology. Some acquaintance with the leading

current tests is necessary in order to understand references to the use

of such tests in the psychological literature. And a proper evaluation and

interpretation of test results must ultimately rest on a knowledge of how

the tests were constructe



deviates came a realization that some uniform criteria for identifying and

classifying these cases were required. The establishment of many special

institutions for the care of the mentally retarded in both Europe and

America made the need for setting up admission standards and an objective

system of classification especially urgent. First it was necessary to

differentiate between the insane and the mentally retarded. The former

manifested emotional disorders that might or might not be accompanied

by intellectual deterioration from an initially normal level; the latter were

characterized essentially by intellectual defect that had been present

from birth or early infancy. What is probably the first explicit statement

of this distinction is to be found in a two-volume work published in 1838

by the French physician Esquirol (1838), in which over one hundred

pages are devoted to mental retardation. Esquirol also pointed out that

there are many degrees of mental retardation, varying along a continuum

from normality to low-grade idiocy. In the effort to develop some system

for classifying the different degrees and varieties of retardation, Esquirol

tried several procedures but concluded that the individual's use of language

provides the most dependable criterion of his intellectual level. It

is interesting to note that current criteria of mental retardation are also

largely linguistic and that present-day intelligence tests are heavily

loaded with verbal content. The important part verbal ability plays in

our concept of intelligence will be repeatedly demonstrated in subsequent

chapters.

Of special significance are the contributions of another French physician,

Seguin, who pioneered in the training of the mentally retarded.

Having rejected the prevalent notion of the incurability of mental retardation,

Seguin (1866) experimented for many years with what he

termed the physiological method of training; and in 1837 he established

the first school devoted to the education of mentally retarded children.

In 1848 he emigrated to America, where his ideas gained wide recognition.

Many of the sense-training and muscle-training techniques currently

in use in institutions for the mentally retarded were originated by

Seguin. By these methods, severely retarded children are given intensive

exercise in sensory discrimination and in the development of motor control.

Some of the procedures developed by Seguin for this purpose were

eventually incorporated into performance or nonverbal tests of intelligence.

An example is the Seguin Form Board, in which the individual

is required to insert variously shaped blocks into the corresponding

recesses as quickly as possible.

More than half a century after the work of Esquirol and Seguin, the

French psychologist Alfred Binet urged that children who failed to

respond to normal schooling be examined before dismissal and, if considered

educable, be assigned to special classes (T. H. Wolf, 1973). With


his fellow members of the Society for the Psychological Study of the

Child, Binet stimulated the Ministry of Public Instruction to take steps to

improve the condition of retarded children. A specific outcome was the



mathematically untrained investigator who might wish to treat test results

quantitatively. He thereby extended enormously the application of

statistical procedures to the analysis of test data. This phase of Galton's

work has been carried forward by many of his students, the most eminent

of whom was Karl Pearson.

It was the English biologist Sir Francis Galton who was primarily

responsible for launching the testing movement. A unifying factor in

Galton's numerous and varied research activities was his interest in

human heredity. In the course of his investigations on heredity, Galton

realized the need for measuring the characteristics of related and

unrelated persons. Only in this way could he discover, for example, the

exact degree of resemblance between parents and offspring, brothers and

sisters, cousins, or twins. With this end in view, Galton was instrumental

in inducing a number of educational institutions to keep systematic

anthropometric records on their students. He al



relation to independent estimates of intellectual level based on teachers'

ratings (Bolton, 1891-1892; J. A. Gilbert, 1894) or academic grades

(Wissler, 1901).

A number of test series assembled by European psychologists of the

period tended to cover somewhat more complex functions. Kraepelin

(1895), who was interested primarily in the clinical examination of psychiatric

patients, prepared a long series of tests to measure what he regarded

as basic factors in the characterization of an individual. The

tests, employing chiefly simple arithmetic operations, were designed to

measure practice effects, memory, and susceptibility to fatigue and to distraction.

A few years earlier, Oehrn (1889), a pupil of Kraepelin, had

employed tests of perception, memory, association, and motor functions

in an investigation on the interrelations of psychological functions. An-

other German psychologist, Ebbinghaus (1897), administered tests of

arithmetic computation, memory span, and sentence completion to school-

children. The most complex of the three tests, sentence completion, was

the only one that showed a clear correspondence with the children's

scholastic achievement.

Like Kraepelin, the Italian psychologist Ferrari and his students were

interested primarily in the use of tests with pathological cases (Guicciardi

& Ferrari, 1896). The test series they devised ranged from physiological

measures and motor tests to apprehension span and the interpretation of

pictures. In an article published in France in 1895, Binet and Henri criticized

most of the available test series as being too largely sensory and as

concentrating unduly on simple, specialized abilities. They argued further

that, in the measurement of the more complex functions, great precision

is not necessary, since individual differences are larger in these functions.

An extensive and varied list of tests was proposed, covering such functions

as memory, imagination, attention, comprehension, suggestibility,

aesthetic appreciation, and many others. In these tests we can recognize

the trends that were eventually to lead to the development of the famous

Binet intelligence scales.


ously cited commission to study procedures for the education of retarded

children. It was in connection with the objectives of this commission that

Binet, in collaboration with Simon, prepared the first Binet-Simon Scale

(Binet & Simon, 1905).

This scale, known as the 1905 scale, consisted of 30 problems or tests

arranged in ascending order of difficulty. The difficulty level was determined

empirically by administering the tests to 50 normal children aged

3 to 11 years, and to some mentally retarded children and adults. The

tests were designed to cover a wide variety of functions, with special

emphasis on judgment, comprehension, and reasoning, which Binet regarded

as essential components of intelligence. Although sensory and

perceptual tests were included, a much greater proportion of verbal

content was found in this scale than in most test series of the time. The

1905 scale was presented as a preliminary and tentative instrument, and

no precise objective method for arriving at a total score was formulated.

In the second, or 1908, scale, the number of tests was increased, some

unsatisfactory tests from the earlier scale were eliminated, and all tests

were grouped into age levels on the basis of the performance of about

300 normal children between the ages of 3 and 13 years. Thus, in the

3-year level were placed all tests passed by 80 to 90 percent of normal

3-year-olds; in the 4-year level, all tests similarly passed by normal

4-year-olds; and so on to age 13. The child's score on the entire test could then

be expressed as a mental level corresponding to the age of normal children

whose performance he equaled. In the various translations and

adaptations of the Binet scales, the term "mental age" was commonly

substituted for "mental level." Since mental age is such a simple concept

to grasp, the introduction of this term undoubtedly did much to popularize

intelligence testing.> Binet himself, however, avoided the term

"mental age" because of its unverified developmental implications and

preferred the more neutral term "mental level" (T. H. Wolf, 1973).
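
The grouping of tests into age levels described above can be illustrated with a short sketch. Everything concrete in it is hypothetical: the test names, the pass rates, and the helper `assign_age_level`. The rule of placing a test at the youngest age whose pass rate falls inside the band is an assumption made for illustration, not a description of Binet's actual procedure, and both bounds of the band are left as parameters.

```python
from typing import Dict, Optional


def assign_age_level(pass_rates: Dict[int, float],
                     lower: float = 0.80,
                     upper: float = 0.90) -> Optional[int]:
    """Return the youngest age whose pass rate lies in [lower, upper].

    Returns None when no age group falls inside the band.
    The band limits are illustrative defaults, not confirmed values.
    """
    for age in sorted(pass_rates):
        if lower <= pass_rates[age] <= upper:
            return age
    return None


# Hypothetical proportions of normal children passing each test, by age.
tests = {
    "names familiar objects": {3: 0.85, 4: 0.97, 5: 1.00},
    "repeats three digits": {3: 0.40, 4: 0.82, 5: 0.95},
    "copies a square": {3: 0.10, 4: 0.45, 5: 0.88},
}

levels = {name: assign_age_level(rates) for name, rates in tests.items()}
print(levels)
```

With these invented figures, each test lands at the first age at which most, but not nearly all, children pass it, which is the sense in which a test is said to belong to a given year level.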

A third revision of the Binet-Simon Scale appeared in 1911, the year of

Binet's untimely death. In this scale, no fundamental changes were introduced.

Minor revisions and relocations of specific tests were instituted.

More tests were added at several year levels, and the scale was extended

to the adult level.

Even prior to the 1908 revision, the Binet-Simon tests attracted wide

> Goodenough (1949, pp. 50-51) notes that in 1887, 21 years before the appearance

of the 1908 Binet-Simon Scale, S. E. Chaille published in the New Orleans

Medical and Surgical Journal a series of tests for infants arranged according to the

age at which the tests are commonly passed. Partly because of the limited circulation

of the journal and partly, perhaps, because the scientific community was not ready

for it, the significance of this age-scale concept passed unnoticed at the time. Binet's

own scale was influenced by the work of some of his contemporaries, notably Blin

and Damaye, who prepared a set of oral questions from which they derived a single

global score for each child (T. H. Wolf, 1973).

Binet and his co-workers devoted many years to active and ingenious

research on ways of measuring intelligence. Many approaches were tried,

including even the measurement of cranial, facial, and hand form, and

the analysis of handwriting. The results, however, led to a growing conviction

that the direct, even though crude, measurement of complex

intellectual functions offered the greatest promise. Then a specific situation

arose that brought Binet's efforts to immediate practical fruition.

In 1904, the Minister of Public Instruction appointed Binet to the previ-



attention among psychologists throughout the world. Translations and

adaptations appeared in many languages. In America, a number of different

revisions were prepared, the most famous of which is the one developed

under the direction of L. M. Terman at Stanford University, and

known as the Stanford-Binet (Terman, 1916). It was in this test that the

intelligence quotient (IQ), or ratio between mental age and chronological

age, was first used. The latest revision of this test is widely employed

today and will be more fully considered in Chapter 9. Of special interest,

too, is the first Kuhlmann-Binet revision, which extended the scale downward

to the age level of 3 months (Kuhlmann, 1912). This scale represents

one of the earliest efforts to develop preschool and infant tests of

intelligence.
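
As a worked illustration of the ratio just mentioned, the sketch below computes an IQ from a mental age and a chronological age. The passage defines the IQ only as the ratio between the two ages; multiplying by 100 is the customary convention for ratio IQs and is stated here as an assumption. The function name `ratio_iq` and the sample ages are hypothetical.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    The factor of 100 is the customary scaling and is an assumption here;
    the passage itself gives only the ratio.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# A child aged 8 years who performs at the level of a typical 10-year-old.
print(ratio_iq(10 * 12, 8 * 12))  # 125.0

# A child whose mental age equals chronological age.
print(ratio_iq(96, 96))  # 100.0
```

Expressing both ages in months keeps the ratio well defined for ages that do not fall on whole years.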


for general routine testing; the latter was a nonlanguage scale employed

with illiterates and with foreign-born recruits who were unable to take a

test in English. Both tests were suitable for administration to large groups.

Shortly after the termination of World War I, the Army tests were released

for civilian use. Not only did the Army Alpha and Army Beta

themselves pass through many revisions, the latest of which are even now

in use, but they also served as models for most group intelligence tests.

The testing movement underwent a tremendous spurt of growth. Soon

group intelligence tests were being devised for all ages and types of

persons, from preschool children to graduate students. Large-scale testing

programs, previously impossible, were now being launched with

zestful optimism. Because group tests were designed as mass testing

instruments, they not only permitted the simultaneous examination of

large groups but also simplified the instructions and administration procedures

so as to demand a minimum of training on the part of the

examiner. Schoolteachers began to give intelligence tests to their classes.

College students were routinely examined prior to admission. Extensive

studies of special adult groups, such as prisoners, were undertaken. And

soon the general public became IQ-conscious.

The application of such group intelligence tests far outran their technical

improvement. That the tests were still crude instruments was often

forgotten in the rush of gathering scores and drawing practical conclusions

from the results. When the tests failed to meet unwarranted expectations,

skepticism and hostility toward all testing often resulted. Thus,

the testing boom of the twenties, based on the indiscriminate use of tests,

may have done as much to retard as to advance the progress of psychological

testing.

The Binet tests, as well as all their revisions, ar e in d il;ic lu a l scales inthe sense that the\" can be ad ministered to onlY one person at a time.

Man\' of the tests in these scales r equire .oral re~ ponses fr om the subject

or n~cessitate the manipulation of materials . S ome call f or individualtiming of r es ponses. For these and other r easons, suc h tests are not

ada pted to group administr ation. Another char acter ist ic of the Binet type

of test is that" it r eq uir es a highly trained examiner . Such tests are es-

sentiallv clinical instr uments, suited to the intensive study of individualJ .' •

cases.Group t esting , like the fir st Binet scale, was d evelo ped to meet a press-

ing pr actical need . When the United States e nter ed l)!or ld ' Var I in

191 7, a committee was appointed by the Amer ican Psychological Associ-

ation to consider ways in which psychology might assist in the conduct of the war. This committee, und er the direction of !lo bert 1.•.1. Yerkes, recog-

nized the need for the rapid classification of the million and a ha 1 f re-cruits with r es pect to gener al intellectual level. Such informati~.~~~va:s

r elevant to many admmistr ative decisions, including r e jection or dis-charge f r om militar y ser vice, assignment to diff erent types of sel'vicei, or

admission to officer-training camps. It was in this setting that the first

group intelligence test was d eveloped . In this task , the Ar-m~' psycholo-

gists d r ew on all availa ble test materials, and especially on an un pub-

lished group intelligence test prepared by ~rthur S. Otis, which hcturned over to the Army. A major contr i bution of Otis's test, which he

designed while a student in one of Terman's graduate courses, was the

introduction of multiple-choice and other "objective" item types.

The tests finally developed by the Army psychologists came to be

known as the ~rm""yAlpha and the Army Beta The former was designed

Although intelligence tests were originally designed to sample a wide variety of functions in order to estimate the individual's general intellectual level, it soon became apparent that such tests were quite limited in their coverage. Not all important functions were represented. In fact, most intelligence tests were primarily measures of verbal ability and, to a lesser extent, of the ability to handle numerical and other abstract and symbolic relations. Gradually psychologists came to recognize that the term "intelligence test" was a misnomer, since only certain aspects of intelligence were measured by such tests.

To be sure, the tests covered abilities that are of prime importance in our culture. But it was realized that more precise designations, in terms of the type of information these tests are able to yield, would be preferable. For example, a number of tests that would probably have been called intelligence tests during the twenties later came to be known as scholastic aptitude tests. This shift in terminology was made in recognition of the fact that many so-called intelligence tests measure that combination of abilities demanded by academic work.

Even prior to World War I, psychologists had begun to recognize the need for tests of special aptitudes to supplement the global intelligence tests. These special aptitude tests were developed particularly for use in vocational counseling and in the selection and classification of industrial and military personnel. Among the most widely used are tests of mechanical, clerical, musical, and artistic aptitudes.

The critical evaluation of intelligence tests that followed their widespread and indiscriminate use during the twenties also revealed another noteworthy fact: an individual's performance on different parts of the same test often showed marked variation. This was especially apparent on group tests, in which the items are commonly segregated into subtests of relatively homogeneous content. For example, a person might score relatively high on a verbal subtest and low on a numerical subtest, or vice versa. To some extent, such internal variability is also discernible on a test like the Stanford-Binet, in which, for example, all items involving words might prove difficult for a particular individual, whereas items employing pictures or geometric diagrams may place him at an advantage.

Test users, and especially clinicians, frequently utilized such intercomparisons in order to obtain more insight into the individual's psychological make-up. Thus, not only the IQ or other global score but also scores on subtests would be examined in the evaluation of the individual case. Such a practice is not to be generally recommended, however, because intelligence tests were not designed for the purpose of differential aptitude analysis. Often the subtests being compared contain too few items to yield a stable or reliable estimate of a specific ability. As a result, the obtained difference between subtest scores might be reversed if the individual were retested on a different day or with another form of the same test. If such intraindividual comparisons are to be made, tests are needed that are specially designed to reveal differences in performance in various functions.

While the practical application of tests demonstrated the need for differential aptitude tests, a parallel development in the study of trait organization was gradually providing the means for constructing such tests. Statistical studies on the nature of intelligence had been exploring the interrelations among scores obtained by many persons on a wide variety of different tests. Such investigations were begun by the English psychologist Charles Spearman (1904, 1927) during the first decade of the present century. Subsequent methodological developments, based on the work of such American psychologists as T. L. Kelley (1928) and L. L. Thurstone (1935, 1947), as well as on that of other American and English investigators, have come to be known as "factor analysis."

The contributions that the methods of factor analysis have made to test construction will be more fully examined and illustrated in Chapter 13. For the present, it will suffice to note that the data gathered by such procedures have indicated the presence of a number of relatively independent factors, or traits. Some of these traits were represented, in varying proportions, in the traditional intelligence tests. Verbal comprehension and numerical reasoning are examples of this type of trait. Others, such as spatial, perceptual, and mechanical aptitudes, were found more often in special aptitude tests than in intelligence tests.

One of the chief practical outcomes of factor analysis was the development of multiple aptitude batteries. These batteries are designed to provide a measure of the individual's standing in each of a number of traits. In place of a total score or IQ, a separate score is obtained for such traits as verbal comprehension, numerical aptitude, spatial visualization, arithmetic reasoning, and perceptual speed. Such batteries thus provide a suitable instrument for makin



term "aptitude test" has been traditionally employed to refer to tests measuring relatively homogeneous and clearly defined segments of ability; the term "intelligence test" customarily refers to more heterogeneous tests yielding a single global score such as an IQ. Special aptitude tests typically measure a single aptitude. Multiple aptitude batteries measure a number of aptitudes but provide a profile of scores, one for each aptitude.


of manuals that meet adequate scientific standards. An enlightened public of test users provides the firmest assurance that such standards will be maintained and improved in the future.

A succinct but comprehensive guide for the evaluation of psychological tests is to be found in Standards for Educational and Psychological Tests (1974), published by the American Psychological Association. These standards represent a summary of recommended practices in test construction, based on the current state of knowledge in the field. They are concerned with the information about validity, reliability, norms, and other test characteristics that ought to be reported in the manual. In their latest revision, the Standards also provide a guide for the proper use of tests and for the correct interpretation and application of test results. Relevant portions of the Standards will be cited in the following chapters, in connection with the appropriate topics.

CHAPTER 2

Nature and Use of

Psychological Tests

THE HISTORICAL introduction in Chapter 1 has already suggested some of the many uses of psychological tests, as well as the wide diversity of available tests. Although the general public may still associate psychological tests most closely with "IQ tests" and with tests designed to detect emotional disorders, these tests represent only a small proportion of the available types of instruments. The major categories of psychological tests will be discussed and illustrated in Parts 3, 4, and 5, which cover tests of general intellectual level, traditionally called intelligence tests; tests of separate abilities, including multiple aptitude batteries, tests of special aptitudes, and achievement tests; and personality tests, concerned with measures of emotional and motivational traits, interpersonal behavior, interests, attitudes, and other noncognitive characteristics.

In the face of such diversity in nature and purpose, what are the common differentiating characteristics of psychological tests? How do psychological tests differ from other methods of gathering information about individuals? The answer is to be found in certain fundamental features of both the construction and use of tests. It is with these features that the present chapter is concerned.

BEHAVIOR SAMPLE. A psychological test is essentially an objective and standardized measure of a sample of behavior. Psychological tests are like tests in any other science, insofar as observations are made on a small but carefully chosen sample of an individual's behavior. In this respect, the psychologist proceeds in much the same way as the chemist who tests a patient's blood or a community's water supply by analyzing one or more samples of it. If the psychologist wishes to test the extent of a child's vocabulary, a clerk's ability to perform arithmetic computations, or a pilot's eye-hand coordination, he examines their performance with a representative set of words, arithmetic problems, or motor tests.

Whether or not the test adequately covers the behavior under consideration obviously depends on the number and nature of items in the sample. For example, an arithmetic test consisting of only five problems, or one including only multiplication items, would be a poor measure of the individual's computational skill. A vocabulary test composed entirely of baseball terms would hardly provide a dependable estimate of a child's total range of vocabulary.

The diagnostic or predictive value of a psychological test depends on the degree to which it serves as an indicator of a relatively broad and significant area of behavior. Measurement of the behavior sample directly covered by the test is rarely, if ever, the goal of psychological testing. The child's knowledge of a particular list of 50 words is not, in itself, of great interest. Nor is the job applicant's performance on a specific set of 20 arithmetic problems of much importance. If, however, it can be demonstrated that there is a close correspondence between the child's knowledge of the word list and his total mastery of vocabulary, or between the applicant's score on the arithmetic problems and his computational performance on the job, then the tests are serving their purpose.

It should be noted in this connection that the test items need not resemble closely the behavior the test is to predict. It is only necessary that an empirical correspondence be demonstrated between the two. The degree of similarity between the test sample and the predicted behavior may vary widely. At one extreme, the test may coincide completely with a part of the behavior to be predicted. An example might be a foreign vocabulary test in which the students are examined on 20 of the 50 new words they have studied; another example is provided by the road test taken prior to obtaining a driver's license. A lesser degree of similarity is illustrated by many vocational aptitude tests administered prior to job training, in which there is only a moderate ... such as the Rorschach inkblot test, in which an attempt is made to predict from the subject's associations to inkblots how he will react to other people, to emotionally toned stimuli, and to other complex, everyday-life situations. Despite their superficial differences, all these tests consist of samples of the individual's behavior. And each must prove its worth by an empirically demonstrated correspondence between the subject's performance on the test and in other situations.

Whether the term "diagnosis" or the term "prediction" is employed in this connection also represents a minor distinction. Prediction commonly connotes a temporal estimate, the individual's future performance on a job, for example, being forecast from his present test performance. In a broader sense, however, even the diagnosis of present condition, such as mental retardation or emotional disorder, implies a prediction of what the individual will do in situations other than the present test. It is logically simpler to consider all tests as behavior samples from which predictions regarding other behavior can be made. Different types of tests can then be characterized as variants of this basic pattern.

Another point that should be considered at the outset pertains to the concept of capacity. It is entirely possible, for example, to devise a test for predicting how well an individual can learn French before he has even begun the study of French. Such a test would involve a sample of the types of behavior required to learn the new language, but would in itself presuppose no knowledge of French. It could then be said that this test measures the individual's "capacity" or "potentiality" for learning French. Such terms should, however, be used with caution in reference to psychological tests. Only in the sense that a present behavior sample can be used as an indicator of other, future behavior can we speak of a test measuring "capacity." No psychological test can do more than measure behavior. Whether such behavior can serve as an effective index of other behavior can be determined only by empirical try-out.

STANDARDIZATION. It will be recalled that in the initial definition a psychological test was described as a standardized measure. Standardization implies uniformity of procedure in administering and scoring the test. If the scores obtained by different individuals are to be comparable, testing conditions must obviously be the same for all. Such a requirement is only a special application of the need for controlled conditions in all scientific observations. In a test situation, the single independent variable is usually the individual being tested.

In order to secure uniformity of testing conditions, the test constructor provides detailed directions for administering each newly developed test. The formulation of such directions is a major part of the standardization of a new test. Such standardization extends to the exact materials employed, time limits, oral instructions to subjects, preliminary demonstrations, ways of handling queries from subjects, and every other detail of the testing situation. Many other, more subtle factors may influence the subject's performance on certain tests. Thus, in giving instructions or presenting problems orally, consideration must be given to the rate of speaking, tone of voice, inflection, pauses, and facial expression. In a test involving the detection of absurdities, for example, the correct answer may be given away by smiling or pausing when the crucial word is read. Standardized testing procedure, from the examiner's point of view, will be discussed further in a later section of



Another important step in the standardization of a test is the establishment of norms. Psychological tests have no predetermined standards of passing or failing; an individual's score is evaluated by comparing it with the scores obtained by others. As its name implies, a norm is the normal or average performance. Thus, if normal 8-year-old children complete 12 out of 50 problems correctly on a particular arithmetic reasoning test, then the 8-year-old norm on this test corresponds to a score of 12. The latter is known as the raw score on the test. It may be expressed as number of correct items, time required to complete a task, number of errors, or some other objective measure appropriate to the content of the test. Such a raw score is meaningless until evaluated in terms of a suitable set of norms.
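The point that a raw score takes on meaning only by comparison with the scores of others can be sketched in a few lines. The scores below are hypothetical, chosen so that the average matches the 8-year-old norm of 12 used in the example; the function names are illustrative, and the percentage-at-or-below figure is only one of several possible ways of expressing relative standing, assumed here for illustration.

```python
from statistics import mean


def norm(scores):
    """The norm, taken here as the average raw score of the comparison group."""
    return mean(scores)


def percent_at_or_below(raw_score, scores):
    """Percentage of the comparison group scoring at or below raw_score."""
    return 100.0 * sum(s <= raw_score for s in scores) / len(scores)


# Hypothetical raw scores (items correct out of 50) for a group of 8-year-olds.
eight_year_olds = [8, 10, 11, 12, 12, 12, 13, 14, 16]

age_norm = norm(eight_year_olds)   # average performance of the group
child_raw = 14                     # one child's raw score

print("norm:", age_norm)
print("deviation from norm:", child_raw - age_norm)
print("percent at or below:", percent_at_or_below(child_raw, eight_year_olds))
```

Taken alone, the raw score of 14 says little; set against the group, it is seen to lie two items above the average.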

In the process of standardizing a test, it is administered to a large, representative sample of the type of subjects for whom it is designed. This group, known as the standardization sample, serves to establish the norms. Such norms indicate not only the average performance but also the relative frequency of varying degrees of deviation above and below the average. It is thus possible to evaluate different degrees of superiority and inferiority. The specific ways in which such norms may be expressed will be considered in Chapter 4. All permit the designation of the individual's position with reference to the normative or standardization sample.

It might also be noted that norms are established for personality tests in essentially the same way as for aptitude tests. The norm on a personality test is not necessarily the most desirable or "ideal" performance, any more than a perfect or errorless score is the norm on an aptitude test. On both types of tests, the norm corresponds to the performance of typical or average individuals. On dominance-submission tests, for example, the norm falls at an intermediate point representing the degree of dominance or submission manifested by the average individual. Similarly, in an emotional adjustment inventory, the norm does not ordinarily correspond to a complete absen



when retested on Friday, it is obvious that little or no confidence can be put in either score. Similarly, if in one set of 50 words an individual identifies 40 correctly, whereas in another, supposedly equivalent set he gets a score of only 20 right, then neither score can be taken as a dependable index of his verbal comprehension. To be sure, in both illustrations it is possible that only one of the two scores is in error, but this could be demonstrated only by further retests. From the given data, we can conclude only that both scores cannot be right. Whether one or neither is an adequate estimate of the individual's ability in vocabulary cannot be established without additional information.

Before a psychological test is released for general use, a thorough, objective check of its reliability should be carried out. The different types of test reliability, as well as methods of measuring each, will be considered in Chapter 5. Reliability can be checked with reference to temporal fluctuations, the particular selection of items or behavior sample constituting the test, the role of different examiners or scorers, and other aspects of the testing situation. It is essential to specify the type of reliability and the method employed to determine it, because the same test may vary in these different aspects. The number and nature of individuals on whom reliability was checked should likewise be reported. With such information, the test user can predict whether the test will be about equally reliable for the group with which he expects to use it, or whether it is likely to be more reliable or less reliable.

VALIDITY. Undoubtedly the most important question to be asked about any psychological test concerns its validity, i.e., the degree to which the test actually measures what it purports to measure. Validity provides a direct check on how well the test fulfills its function. The determination of validity usually requires independent, external criteria of whatever the test is designed to measure. For example, if a medical aptitude test is to be used in selecting promising applicants for medical school, ultimate success in medical school would be a criterion. In the process of validating such a test, it would be administered to a large group of students at the time of their admission to medical school. Some measure of performance in medical school would eventually be obtained for each student on the basis of grades, ratings by instructors, success or failure in completing training, and the like. Such a composite measure constitutes the criterion with which each student's initial test score is to be correlated. A high correlation, or validity coefficient, would signify that those individuals who scored high on the test had been relatively successful in medical school, whereas those scoring low on the test had done poorly in medical school. A low correlation would indicate little correspondence between test score and criterion measure and hence poor validity for the test. The validity coefficient enables us to determine how closely the criterion performance could have been predicted from the test scores.
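The correlation of test scores with a criterion measure can be sketched as follows. The passage does not name a particular statistic; the Pearson product-moment coefficient is assumed here as the usual choice, and the test scores, criterion values, and function name are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical data for six students: aptitude test score at admission and
# a composite criterion obtained later (here, a grade average).
test_scores = [52, 61, 47, 70, 58, 65]
criterion = [2.6, 3.1, 2.4, 3.7, 2.9, 3.3]

validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 2))
```

In these invented data the students who scored high on the test also did well on the criterion, so the coefficient is close to 1; real validity coefficients are typically much lower.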

In a similar manner, tests designed for other purposes can be validated against appropriate criteria. A vocational aptitude test, for example, can be validated against on-the-job success of a trial group of new employees. A pilot aptitude battery can be validated against achievement in flight training. Tests designed for broader and more varied uses are validated against a number of criteria and their validity can be established only by the gradual accumulation of data from many different kinds of investigations.

The reader may have noticed an apparent paradox in the concept of test validity. If it is necessary to follow up the subjects or in other ways to obtain independent measures of what the test is trying to predict, why not dispense with the test? The answer to this riddle is to be found in the distinction between the validation group on the one hand and the groups on which the test will eventually be employed for operational purposes on the other. Before the test is ready for use, its validity must be established on a representative sample of subjects. The scores of these persons are not themselves employed for operational purposes but serve only in the process of testing the test. If the test proves valid by this method, it can then be used on other samples in the absence of criterion measures.

It might still be argued that we would need only to wait for the criterion measure to mature, to become available, on any group in order to obtain the information that the test is trying to predict. But such a procedure would be so wasteful of time and energy as to be prohibitive in most instances. Thus, we could determine which applicants will succeed on a job or which students will satisfactorily complete college by admitting all who apply and waiting for subsequent developments! It is the very wastefulness of this procedure, and its deleterious emotional impact on individuals, that tests are designed to minimize. By means of tests, the person's present level of prerequisite skills, knowledge, and other relevant characteristics can be assessed with a determinable margin of error. The more valid and reliable the test, the smaller will be this margin of error.

The special problems encountered in determining the validity of different types of tests, as well as the specific criteria and statistical procedures employed, will be discussed in Chapters 6 and 7. One further point, however, should be considered at this time. Validity tells us more than the degree to which the test is fulfilling its function. It actually tells us what the test is measuring. By studying the validation data, we can objectively determine what the test is measuring. It would thus be more accurate to define validity as the extent to which we know what the test measures. The interpretation of test scores would undoubtedly be clearer and less ambiguous if tests were regularly named in terms of the criterion through which they had been validated. A tendency in this direction can be recognized in such test labels as "scholastic aptitude test" and "personnel classification test" in place of the vague title "intelligence test."

REASONS FOR CONTROLLING THE USE OF

PSYCHOLOGICAL TESTS

"May I have a Stanford-Binet blank? My nephew has to take it next week for admission to School X and I'd like to give him some practice so he can pass."

"To improve the reading program in our school, we need a culture-free IQ test that measures each child's innate potential."

"Last night I answered the questions in an intelligence test published in a magazine and I got an IQ of 80. I think psychological tests are silly."

"My roommate is studying psych. She gave me a personality test and I came out neurotic. I've been too upset to go to class ever since."

"Last year you gave a new personality test to our employees for research purposes. We would now like to have the scores for their personnel folders."

The above remarks are not imaginary. Each is based on a real incident, and the list could easily be extended by any psychologist. Such remarks illustrate potential misuses or misinterpretations of psychological tests in such ways as to render the tests worthless or to hurt the individual. Like any scientific instrument or precision tool, psychological tests must be properly used to be effective. In the hands of either the unscrupulous or the well-meaning but uninformed user, such tests can cause serious damage.

There are two principal reasons for controlling the use of psychological tests: (a) to prevent general familiarity with test content, which would invalidate the test and (b) to ensure that the test is used by a qualified examiner. Obviously, if an individual were to memorize the correct responses on a test of color blindness, such a test would no longer be a measure of color vision for him. Under these conditions, the test would be completely invalidated. Test content clearly has to be restricted in order to forestall deliberate efforts to fake scores.

In other cases, however, the effect of familiarity may be less obvious, or the test may be invalidated in good faith by misinformed persons. A schoolteacher, for example, may give her class special practice in problems closely resembling those on an intelligence test, "so that the pupils will be well prepared to take the test." Such an attitude is simply a carry-over from the usual procedure of preparing for a school examination. When applied to an intelligence test, however, it is likely that such specific training or coaching will raise the scores on the test without appreciably affecting the broader area of behavior the test tries to sample. Under such conditions, the validity of the test as a predictive instrument is reduced.

The need for a qualified examiner is evident in each of the three major aspects of the testing situation: selection of the test, administration and scoring, and interpretation of scores. Tests cannot be chosen like lawn mowers, from a mail-order catalogue. They cannot be evaluated by name, author, or other easy marks of identification. To be sure, it requires no psychological training to consider such factors as cost, bulkiness and ease of transporting test materials, testing time required, and ease and rapidity of scoring. Information on these practical points can usually be obtained from a test catalogue and should be taken into account in planning a testing program. For the test to serve its function, however, an evaluation of its technical merits in terms of such characteristics as validity, reliability, difficulty level, and norms is essential. Only in such a way can the test user determine the appropriateness of any test for his particular purpose and its suitability for the type of persons with whom he plans to use it.

The introductory discussion of test standardization earlier in this chapter has already suggested the importance of a trained examiner. An adequate realization of the need to follow instructions precisely, as well as a thorough familiarity with the standard instructions, is required if the test scores obtained by different examiners are to be comparable or if any one individual's score is to be evaluated in terms of the published norms. Careful control of testing conditions is also essential. Similarly, incorrect or inaccurate scoring may render the test score worthless. In the absence of proper checking procedures, scoring errors are far more likely to occur than is generally realized.

The proper interpretation of test scores requires a thorough under-

standing of the test, the individual, and the testing



manifest a number of unfavorable attitudes, such as suspicion, insecurity,

fear, or cynical indifference. Abnormal conditions in their past experiences

are also likely to influence their test performance adversely. As a result

of early failures and frustrations in school, for example, they may have

developed feelings of hostility and inferiority toward academic tasks,

which the tests resemble. The experienced examiner makes special efforts to establish rapport under these conditions. In any event, he must be

sensitive to these special difficulties and take them into account in interpreting

and explaining test performance.

In testing any school-age child or adult, one should bear in mind that

every test presents an implied threat to the individual's prestige. Some

reassurance should therefore be given at the outset. It is helpful to explain,

for example, that no one is expected to finish or to get all the items

correct. The individual might otherwise experience a mounting sense of

failure as he advances to the more difficult items or finds that he is unable

to finish any subtest within the time allowed.

It is also desirable to eliminate the element of surprise from the test

situation as far as possible, because the unexpected and unknown are

likely to produce anxiety. Many group tests provide a preliminary explanatory

statement that is read to the group by the examiner. An even

better procedure is to announce the tests a few days in advance and to

give each subject a printed booklet that explains the purpose and nature

of the tests, offers general suggestions on how to take tests, and contains

a few sample items. Such explanatory booklets are regularly available to

participants in large-scale testing programs such as those conducted by the College Entrance Examination Board (1974a, 1974b). The United

States Employment Service has likewise developed a booklet on how to

take tests, as well as a more extensive pretesting orientation technique

for use with culturally disadvantaged applicants unfamiliar with tests.

More general orientation booklets are also available, such as Meeting the Test (Anderson, Katz, & Shimberg, 1965). A tape recording and two

booklets are combined in Test Orientation Procedure (TOP), designed

specifically for job applicants with little prior testing experience (Bennett & Doppelt, 1967). The first booklet, used together with the tape,

provides general information on how to take tests; the second contains

practice tests. In the absence of a tape recorder, the examiner may read

the instructions from a printed script.

Adult testing presents some additional problems. Unlike the schoolchild,

the adult is not so likely to work hard at a task merely because it is

assigned to him. It therefore becomes more important to "sell" the pur-

pose of the tests to the adult, although high school and college students

also respond to such an appeal. Cooperation of the examinee can usually

be secured by convincing him that it is in his own interests to obtain a

valid score, i.e., a score correctly indicating what he can do rather than

overestimating or underestimating his abilities. Most persons will understand

that an incorrect decision, which might result from invalid test

scores, would mean subsequent failure, loss of time, and frustration for

them. This approach can serve not only to motivate the individual to

try his best on ability tests but also to reduce faking and encourage frank reporting on personality inventories, because the examinee realizes that

he himself would otherwise be the loser. It is certainly not in the best

interests of the individual to be admitted to a course of study for which

he is not qualified or assigned to a job he cannot perform or that he would find uncongenial.

Many of the practices designed to enhance rapport serve also to reduce

test anxiety. Procedures tending to dispel surprise and strangeness from

the testing situation and to reassure and encourage the subject should

certainly help to lower anxiety. The examiner's own manner and a well-organized, smoothly running testing operation will contribute toward the

same goal. Individual differences in test anxiety have been studied with

both schoolchildren and college students (Gaudry & Spielberger, 1974;

Spielberger, 1972). Much of this research was initiated by Sarason and

his associates at Yale (Sarason, Davidson, Lighthall, Waite, & Ruebush,

1960). The first step was to construct a questionnaire to assess the individual's

test-taking attitudes. The children's form, for example, contains items such as the following:

Do you worry a lot before taking a test?

When the teacher says she is going to find out how much you have learned, does your heart begin to beat faster?

While you are taking a test, do you usually think you are not doing well?

Of primary interest is the finding that both school achievement and intelligence

test scores yielded significant negative correlations with test anxiety.

Similar correlations have been found among college students (I. G.

Sarason, 1961). Longitudinal studies likewise revealed an inverse relation

between changes in anxiety level and changes in intelligence or achievement

test performance (Hill & Sarason, 1966; Sarason, Hill, & Zimbardo, 1964).

Such findings, of course, do not indicate the direction of causal relationships. It is possible that children develop test anxiety because they perform

poorly on tests and have thus experienced failure and frustration in

previous test situations. In support of this interpretation is the finding

that within subgroups of high scorers on intelligence tests, the negative

correlation between anxiety level and test performance disappears

(Denny, 1966; Feldhusen & Klausmeier, 1962). On the other hand, there

is evidence suggesting that at least some of the relationship results from the deleterious effects of anxiety on test performance. In one study

(Waite, Sarason, Lighthall, & Davidson, 1958), high-anxious and low-anxious

children equated in intelligence test scores were given repeated

trials in a learning task. Although initially equal in the learning test, the

low-anxious group improved significantly more than the high-anxious.

Several investigators have compared test performance under conditions

designed to evoke "anxious" and "relaxed" states. Mandler and Sarason

(1952), for example, found that ego-involving instructions, such as telling

subjects that everyone is expected to finish in the time allotted, had a

beneficial effect on the performance of low-anxious subjects, but a deleterious

effect on that of high-anxious subjects. Other studies have likewise

found an interaction between testing conditions and such individual characteristics

as anxiety level and achievement motivation (Lawrence, 1962; Paul & Eriksen, 1964). It thus appears likely that the relation between

anxiety and test performance is nonlinear, a slight amount of anxiety

being beneficial while a large amount is detrimental. Individuals who are

customarily low-anxious benefit from test conditions that arouse some

anxiety, while those who are customarily high-anxious [...] tests may be

unduly affected by test anxiety. In a thorough and well-controlled investigation

of this question, French (1962) compared the performance of high

school students on a test given as part of the regular administration of

the SAT with performance on a parallel form of the test administered at

a different time under "relaxed" conditions. The instructions on the latter

occasion specified that the test was given for research purposes only and

scores would not be sent to any college. The results showed that performance

was no poorer during the standard administration than during

the relaxed administration. Moreover, the concurrent validity of the test

scores against high school course grades did not differ significantly under

the two conditions.

Comprehensive surveys of the effects of examiner and situational

variables on test scores have been prepared by S. B. Sarason (1954),

Masling (1960), Moriarty (1961, 1966), Sattler and Theye (1967),

Palmer (1970), and Sattler (1970, 1974). Although some effects have

been demonstrated with objective group tests, most of the data have been

obtained with either projective techniques or individual intelligence tests.

These extraneous factors are more likely to operate with unstructured and

ambiguous stimuli, as well as with difficult and novel tasks, than with

clearly defined and well-learned functions. In general, children are more

susceptible to examiner and situational influences than are adults; in the

examination of preschool children, the role of the examiner is especially

crucial. Emotionally disturbed and insecure persons of any age are also

more likely to be affected by such conditions than are well-adjusted persons.

There is considerable evidence that test results may vary systematically

as a function of the examiner (E. Cohen, 1965; Masling, 1960). These differences

may be related to personal characteristics of the examiner, such

as his age, sex, race, professional or socioeconomic status, training and

experience, personality characteristics, and appearance. Several studies of

these examiner variables, however, have yielded misleading or inconclusive

results because the experimental designs failed to control or isolate

the influence of different examiner or subject characteristics. Hence

the effects of two or more variables may be confounded.

The examiner's behavior before and during test administration has also

been shown to affect test results. For example, controlled investigations

have yielded significant differences in intelligence test performance as a

result of a "warm" versus a "cold" interpersonal relation between examiner

and examinees, or a rigid and aloof versus a natural manner on

the part of the examiner (Exner, 1966; Masling, 1959). Moreover, there

may be significant interactions between examiner and examinee characteristics, in the sense that the same examiner characteristic or testing manner

may have a different effect on different examinees as a function of

the examinee's own personality characteristics. Similar interactions may

occur with task variables, such as the nature of the test, the purpose of

the testing, and the instructions given to the subjects. Dyer (1973) adds

even more variables to this list, calling attention to the possible influence of the test givers' and the test takers' diverse perceptions of the functions and goals of testing.

Still another way in which an examiner may inadvertently affect the

examinee's responses is through his own expectations. This is simply a

special instance of the self-fulfilling prophecy (Rosenthal, 1966; Rosenthal

& Rosnow, 1969). An experiment conducted with the Rorschach will

illustrate this effect (Masling, 1965). The examiners were 14 graduate

student volunteers, 7 of whom were told, among other things, that experienced

examiners elicit more human than animal responses from the

subjects, while the other 7 were told that experienced examiners elicit

more animal than human responses. Under these conditions, the two

groups of examiners obtained significantly different ratios of animal to human responses from their subjects. These differences occurred despite

the fact that neither examiners nor subjects reported awareness of any

influence attempt. Moreover, tape recordings of all testing sessions revealed

no evidence of verbal influence on the part of any examiner. The

examiners' expectations apparently operated through subtle postural and

facial cues to which the subjects responded.

Apart from the examiner, other aspects of the testing situation may

significantly affect test performance
