DOCUMENT RESUME

ED 036 521    24    TE 001 727

AUTHOR: ALLEN, R. R.; AND OTHERS
TITLE: THE DEVELOPMENT OF THE WISCONSIN TESTS OF TESTIMONY AND REASONING ASSESSMENT (WISTTRA). REPORT FROM CONCEPTS IN VERBAL ARGUMENT PROJECT.
INSTITUTION: WISCONSIN UNIV., MADISON. RESEARCH AND DEVELOPMENT CENTER FOR COGNITIVE LEARNING.
SPONS AGENCY: OFFICE OF EDUCATION (DHEW), WASHINGTON, D.C. BUREAU OF RESEARCH.
REPORT NO: TR-80
BUREAU NO: BR-5-0216
PUB DATE: APR 69
CONTRACT: OEC-5-10-154
NOTE: 44P.
EDRS PRICE: MF-$0.25 HC-$2.30
DESCRIPTORS: COGNITIVE MEASUREMENT, COGNITIVE TESTS, *CRITICAL THINKING, *EDUCATIONAL TESTING, EVALUATION CRITERIA, EVALUATION METHODS, EVALUATION TECHNIQUES, *EVALUATIVE THINKING, MENTAL TESTS, PRODUCTIVE THINKING, STUDENT TESTING, *TEST CONSTRUCTION, TEST INTERPRETATION, TEST RELIABILITY, TEST VALIDITY, *VERBAL ABILITY, VERBAL TESTS
IDENTIFIERS: *WISCONSIN TESTS TESTIMONY REASONING ASSESSMENT, WISTTRA

ABSTRACT: THIS REPORT PRESENTS AN OVERVIEW OF RESEARCH RELATED TO THE DEVELOPMENT OF THE "WISCONSIN TESTS OF TESTIMONY AND REASONING ASSESSMENT," A BATTERY OF SEVEN TESTS FOR ASSESSING STUDENT DEVELOPMENT IN THE MASTERY OF RELEVANT CONCEPTS AND SKILLS OF VERBAL ARGUMENT FOR GRADES 10-12. PROVIDED ARE (1) A DISCUSSION OF THE RATIONALE FOR AND PURPOSES OF THE TESTS, (2) A DISCUSSION OF THE DEVELOPMENT OF THE TESTS, AND (3) A DESCRIPTION OF EACH TEST, RELIABILITY AND ITEM DATA, AND A BRIEF DISCUSSION OF THE SIGNIFICANCE OF THE DATA GATHERED BY ADMINISTERING THE FIFTH EDITION OF THE TEST BATTERY TO OVER 3,000 JUNIOR/SENIOR HIGH STUDENTS IN FOUR WISCONSIN SCHOOL SYSTEMS. INCLUDED IS A DISCUSSION OF THE SPECIFIC OBJECTIVES OF THE TESTS: TO MEASURE THE STUDENT'S ABILITY TO EVALUATE TESTIMONY IN TERMS OF INTERNAL CRITERIA, CONSISTENCY WITH OTHER TESTIMONY, AND RECENCY AND PROXIMITY; TO RECOGNIZE AND SELECT WARRANTS IN ARGUMENTS; TO RECOGNIZE STATEMENTS WHICH ANSWER RESERVATIONS IN ARGUMENTS; TO SELECT RESERVATIONS IN ARGUMENTS; AND TO SELECT CLAIMS IN ARGUMENTS. THIRTY-ONE TABLES AND TWO FIGURES OFFER FURTHER INFORMATION. (AUTHOR/LH)
U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.
Technical Report No. 80
THE DEVELOPMENT OF THE WISCONSIN TESTS
OF TESTIMONY AND REASONING ASSESSMENT
(WISTTRA)
By R. R. Allen, Jerry D. Feezel, Fred J. Kauffeld, and Margaret L. Harris
Report from the Concepts in Verbal Argument Project
R. R. Allen, Principal Investigator
Wisconsin Research and Development Center for Cognitive Learning
The University of Wisconsin
Madison, Wisconsin
April 1969
The research reported herein was performed pursuant to a contract with the United States Office of Education, Department of Health, Education, and Welfare, under the provisions of the Cooperative Research Program. The opinions expressed in this publication do not necessarily reflect the position or policy of the Office of Education and no official endorsement by the Office of Education should be inferred.
Center No. C-03 / Contract OE 5-10-154
NATIONAL EVALUATION COMMITTEE
Samuel Brownell, Professor of Urban Education, Graduate School, Yale University
Launor F. Carter, Senior Vice President on Technology and Development, System Development Corporation
Francis S. Chase, Professor, Department of Education, University of Chicago
Henry Chauncey, President, Educational Testing Service
Martin Deutsch, Director, Institute for Developmental Studies, New York Medical College
Jack Edling, Director, Teaching Research Division, Oregon State System of Higher Education
Elizabeth Koontz, President, National Education Association
Roderick McPhee, President, Punahou School, Honolulu
G. Wesley Sowards, Director, Elementary Education, Florida State University
Patrick Suppes, Professor, Department of Mathematics, Stanford University
*Benton J. Underwood, Professor, Department of Psychology, Northwestern University
UNIVERSITY POLICY REVIEW BOARD
Leonard Berkowitz, Chairman, Department of Psychology
Archie A. Buchmiller, Deputy State Superintendent, Department of Public Instruction
*James W. Cleary, Vice Chancellor for Academic Affairs
Leon D. Epstein, Dean, College of Letters and Science
John Guy Fowlkes, Director, Wisconsin Improvement Program
Robert E. Grinder, Chairman, Department of Educational Psychology
H. Clifton Hutchins, Chairman, Department of Curriculum and Instruction
Clauston Jenkins, Assistant Director, Coordinating Committee for Higher Education
Herbert J. Klausmeier, Director, R & D Center; Professor of Educational Psychology
Donald J. McCarty, Dean, School of Education
Ira Sharkansky, Associate Professor of Political Science
Henry C. Weinlick, Executive Secretary, Wisconsin Education Association
M. Crawford Young, Associate Dean, The Graduate School
EXECUTIVE COMMITTEE
Edgar F. Borgatta, Brittingham Professor of Sociology
Max R. Goodson, Professor of Educational Policy Studies
Russell J. Hosier, Professor of Curriculum and Instruction and of Business
*Herbert J. Klausmeier, Director, R & D Center; Professor of Educational Psychology
Wayne Otto, Professor of Curriculum and Instruction (Reading)
Robert G. Petzold, Associate Dean of the School of Education; Professor of Curriculum and Instruction and of Music
Richard L. Venezky, Assistant Professor of English and of Computer Sciences
FACULTY OF PRINCIPAL INVESTIGATORS
Ronald R. Allen, Associate Professor of Speech and of Curriculum and Instruction
Vernon L. Allen, Associate Professor of Psychology (On leave 1968-69)
Nathan S. Blount, Associate Professor of English and of Curriculum and Instruction
Robert C. Calfee, Associate Professor of Psychology
Robert E. Davidson, Assistant Professor of Educational Psychology
Gary A. Davis, Associate Professor of Educational Psychology
M. Vere DeVault, Professor of Curriculum and Instruction (Mathematics)
Frank H. Farley, Assistant Professor of Educational Psychology
John Guy Fowlkes (Advisor), Professor of Educational Administration; Director of the Wisconsin Improvement Program
Lester S. Golub, Lecturer in Curriculum and Instruction and in English
Max R. Goodson, Professor of Educational Policy Studies
Warren O. Hagstrom, Professor of Sociology
John G. Harvey, Associate Professor of Mathematics and Curriculum and Instruction
Herbert J. Klausmeier, Director, R & D Center; Professor of Educational Psychology
Burton W. Kreitlow, Professor of Educational Policy Studies and of Agricultural and Extension Education
Richard G. Morrow, Assistant Professor of Educational Administration
Wayne Otto, Professor of Curriculum and Instruction (Reading)
Milton O. Pella, Professor of Curriculum and Instruction (Science)
Thomas A. Romberg, Assistant Professor of Mathematics and of Curriculum and Instruction
Richard L. Venezky, Assistant Professor of English and of Computer Sciences
MANAGEMENT COUNCIL
*Herbert J. Klausmeier, Director, R & D Center; Acting Director, Program 1
Mary R. Quilling, Director, Technical Section
Thomas A. Romberg, Director, Programs 2 and 3
James E. Walter, Director, Dissemination Section
Dan G. Woolpert, Director, Operations and Business
* COMMITTEE CHAIRMAN
STATEMENT OF FOCUS
The Wisconsin Research and Development Center for Cognitive Learning focuses on contributing to a better understanding of cognitive learning by children and youth and to the improvement of related educational practices. The strategy for research and development is comprehensive. It includes basic research to generate new knowledge about the conditions and processes of learning and about the processes of instruction, and the subsequent development of research-based instructional materials, many of which are designed for use by teachers and others for use by students. These materials are tested and refined in school settings. Throughout these operations behavioral scientists, curriculum experts, academic scholars, and school people interact, insuring that the results of Center activities are based soundly on knowledge of subject matter and cognitive learning and that they are applied to the improvement of educational practice.
This Technical Report is from the Concepts in Verbal Argument Project in Program 2. General objectives of the Program are to establish rationale and strategy for developing instructional systems, to identify sequences of concepts and cognitive skills, to develop assessment procedures for those concepts and skills, to identify or develop instructional materials associated with the concepts and cognitive skills, and to generate new knowledge about instructional procedures. Contributing to these Program objectives, the staff of the project developed a semiprogramed course in verbal argument and related tests for use at the high school level. The project staff prepared the materials on the basis of an outline of concepts and critical skills developed from an evaluation of everyday discourse.
CONTENTS
List of Tables and Figures
Abstract
I Introduction
II Rationale and Purposes
III Development of the Tests
    Editions
    Test Instructions
    Test Vocabulary
    Student Interest
    Subject Matter of Items
    Content Validity
    Internal Consistency Reliability
    Item Characteristics
    Summary
IV Discussion of Specific Tests
    Testimony I: Appraising Testimony in Terms of Internal Criteria
    Testimony II: Appraising Testimony in Terms of External Criteria: Consistency with Other Testimony
    Testimony III: Appraising Testimony in Terms of External Criteria: Recency and Proximity
    Reasoning I: Recognizing and Selecting Warrants in Arguments
    Reasoning II: Recognizing Statements Which Answer Reservations in Arguments
    Reasoning III: Selecting Reservations in Arguments
    Reasoning IV: Selecting Claims in Arguments
V Reliability Estimates and Item Statistics
    Sample
    Reliability Estimates
    Item Statistics
VI Conclusions
References
LIST OF TABLES AND FIGURES
Table
1 Test Administrations in the Development of WISTTRA
2 Panel of Student Consultants
3 Preferable Difficulty Levels for WISTTRA
4 Testimony I: Reliability Estimates for the Total Test
5 Testimony I: Reliability Estimates for the Accept Subtest
6 Testimony I: Reliability Estimates for the Bias Subtest
7 Testimony I: Reliability Estimates for the Position Subtest
8 Testimony I: Reliability Estimates for the Competence Subtest
9 Testimony I: Reliability Estimates for the Qualification Subtest
10 Testimony II: Reliability Estimates
11 Testimony III: Reliability Estimates for the Total Test
12 Testimony III: Reliability Estimates for the Recency Subtest
13 Testimony III: Reliability Estimates for the Proximity Subtest
14 Reasoning I: Reliability Estimates
15 Reasoning II: Reliability Estimates
16 Reasoning III: Reliability Estimates
17 Reasoning IV: Reliability Estimates
18 Testimony I: Item Statistics for the Total Test
19 Testimony I: Item Statistics for the Accept Subtest
20 Testimony I: Item Statistics for the Bias Subtest
21 Testimony I: Item Statistics for the Position Subtest
22 Testimony I: Item Statistics for the Competence Subtest
23 Testimony I: Item Statistics for the Qualification Subtest
24 Testimony II: Item Statistics
25 Testimony III: Item Statistics for the Total Test
26 Testimony III: Item Statistics for the Recency Subtest
27 Testimony III: Item Statistics for the Proximity Subtest
28 Reasoning I: Item Statistics
29 Reasoning II: Item Statistics
30 Reasoning III: Item Statistics
31 Reasoning IV: Item Statistics
Figure
1 Relationship of WISTTRA to Concepts Identified
2 A Typical Item Characteristic Curve
ABSTRACT
This paper reports the development of a test battery for measuring student mastery of certain verbal skills basic to critical thinking. The battery, collectively entitled The Wisconsin Tests of Testimony and Reasoning Assessment, consists of three tests related to testimony and four tests related to arguments developed through reasoning. The basic rationale for the tests and major considerations in test development are explained. Each test is described and appropriate reliability estimates and item statistics are presented. The particular data presented were gathered by an administration of the fifth edition of the test battery to over 3,000 junior/senior high students in four Wisconsin school systems.
INTRODUCTION
This report presents an overview of research related to the development of a test battery for measuring student mastery of certain verbal skills basic to critical thinking. The report provides (1) a discussion of the rationale for and purposes of the tests, (2) a discussion of the development of the tests, and (3) a discussion of each test including a description of the test, reliability and item data, and a brief discussion of that data. The following discussion, then, is intended strictly as a report of test development; it is not offered as a guide to the use or interpretation of the tests.
RATIONALE AND PURPOSES
In developing the Wisconsin Tests of Testimony and Reasoning Assessment (WISTTRA) the researchers sought (1) to create a valid and reliable testing instrument to be generally available for assessing student development in the mastery of relevant concepts and skills of verbal argument for Grades 10-12, and (2) to gather data from administrations of this battery useful in the development of related instructional materials. The tests have been published by the Wisconsin Research and Development Center for Cognitive Learning, from which a sample copy is available (Allen, Feezel, & Kauffeld, 1968). For a more complete statement of the project's rationale the reader should consult A Taxonomy of Concepts and Critical Abilities Related to the Evaluation of Verbal Argument (Allen, Feezel, & Kauffeld, 1967).
WISTTRA is based on a view of verbal argument articulated by the English philosopher Stephen Toulmin (1958) and adapted to the field of ordinary argument by the investigators. His program, an off-shoot of Rylean language philosophy, is developed on two central points: (1) the habits of reasoning utilized in any field of inquiry involve rules for evaluating inferences much richer than the field-invariant schemes worked out by formal logicians, and (2) an adequate account of such rules can only be worked out by attending to the nature of particular fields of inquiry. The tests discussed here grew out of a definition of rules of inference fundamentally important to the field of ordinary, i.e., nontechnical, argument.
In order to characterize the rules of inference appropriate to ordinary arguments the researchers first isolated three major requirements imposed by the nature of plain discourse on ordinary reasoning: (1) ordinary arguments must be able to take the reports of other people (testimony) as an important source of primary data, (2) ordinary arguments must be able to provide reasons for a wide variety of claim types, and (3) ordinary arguments must be able to handle inferences which utilize categories built on multidimensional, loosely related configurations of criteria.
On this foundation two orders of concepts were distinguished: (1) those related to appraising the testimony of others and (2) those related to appraising the strength of reasons given for a claim. Concepts used in appraising testimony may be grouped into two clusters: (a) internal tests (position to observe, ability to observe, bias, and qualification for judging) and (b) external tests (primary as compared with secondary information, recent as compared with dated information, and consistent as compared with inconsistent information). Concepts used in assessing the strength of reasons may be grouped into two clusters: (a) those related to the structure of arguments (data, warrant, claim, and reservation) and (b) rules for assessing arguments developed through reasoning. The latter are sensitive to the type of argument being assessed and are represented in the test battery as they are used to assess sign, class, causal, alternative, parallel case, comparative, and warrant supportive arguments. Basic skills used in assessing arguments developed through reasoning include (1) the ability to detect missing parts of an argument, (2) the ability to discern the relevance of objections, and (3) the ability to recognize appropriate conclusions.
The researchers saw a compelling need for a test battery adapted to just this configuration of concepts and skills, because measuring instruments developed on the assumption that rules of inference are field-invariant do not assess the student's mastery of the skills and concepts appropriate to ordinary argument. Commonly, tests based on field-invariant logics simply measure the student's mastery of the rules of inference appropriate to some preferred field of specialized inquiry. Such tests are useful when information about the student's mastery of the reasoning habits of the preferred field is of interest, but they may give a very distorted picture of a student's ability to handle everyday arguments.
In particular, tests of reasoning based on field-invariant logics usually neglect the concepts and skills related to assessing testimony and discerning the relevance of an objection. Tests based on the highly mechanical procedures for induction and deduction prescribed by type logics are particularly vulnerable to this criticism. Few ordinary arguments involve questions which can be resolved by direct observations of the participants, and still fewer involve questions which can be fully analyzed against the tidy categories such systems require. WISTTRA was developed to assess the student's ability to evaluate adequacy of testimony and to recognize the structure that is present in ordinary arguments and raise pertinent objections based on the rules of inference appropriate to that structure.
III
DEVELOPMENT OF THE TESTS
EDITIONS
The development of WISTTRA constitutes one phase of research related to the development of student abilities in the assessment of verbal arguments. From the project's inception the researchers recognized the need for an appropriate testing instrument. Work on the tests was begun in February of 1966 and continued through April of 1968. During that period the instrument went through four experimental editions in which its focus was narrowed from Grades 7-12 to Grades 10-12 and its items analyzed and revised for greater precision and reliability. During that period portions of the battery were pretested on four occasions and a normative study was conducted with a fifth edition of the battery. (See Table 1.)
Development of the tests will be discussed in terms of the considerations and criteria which the investigators used in decisions related to test instructions, test vocabulary, student interest, subject matter of items, content validity, internal-consistency reliability, and item characteristics.
TEST INSTRUCTIONS
As is often the case, drawing up instructions for the various tests in the battery required balancing the need to provide sufficient information to complete the task against the demand that test instructions not teach the student skills the test seeks to measure. In order to minimize the confounding effects of test instructions, two forms of instructions were used in Pretest One. The two forms differed only in that Form A included an example of the response task while Form B did not. The two instructional forms did not yield significant differences in student responses, but from general indications and conversations with students the longer form was selected for use in all later test administrations. Care was exercised that the task example not reveal the nature of the cognitive skill which the test is to measure. Comments on the clarity and interest of all instructions were obtained from a panel of high school students (details on the composition of this panel are reported under STUDENT INTEREST) and revisions were made according to this feedback.
TEST VOCABULARY
The test battery is not intended as a measure of reading skills or of vocabulary development. To minimize confounding due to such factors, items were screened for words not available in an average ninth grader's vocabulary (Thorndike-Lorge, 1944). In addition Dale-Chall (1948) readability scores were computed for selected portions of the battery. Scores ranged from 7.5 to 8.2, indicating that test items are suitable for the average reading ability of Grades 7 and 8. These steps do not, of course, eliminate confounding due to differences in reading skills, but they should tend to minimize these differences insofar as possible.1 In addition, they indicate the battery's appropriateness for the intended Grades (10-12).
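For readers unfamiliar with the readability score cited above, the following is a minimal sketch of the raw-score formula published by Dale and Chall (1948). It is not the investigators' own computation. The tokenization rules and the `familiar_words` argument are simplifying assumptions for illustration; the published procedure relies on the Dale list of familiar words and on detailed counting rules, and the conversion of raw scores to grade levels is not shown here.

```python
import re


def dale_chall_raw_score(text, familiar_words):
    """Simplified Dale-Chall (1948) raw score for a passage.

    familiar_words: a set of lowercase words treated as familiar
    (a stand-in, assumed here, for the Dale word list).
    """
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    unfamiliar = sum(1 for w in words if w.lower() not in familiar_words)
    pct_unfamiliar = 100.0 * unfamiliar / len(words)
    avg_sentence_length = len(words) / len(sentences)

    # Published 1948 coefficients.
    return 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_length + 3.6365
```

A passage with a higher percentage of unfamiliar words or longer sentences receives a higher raw score.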
STUDENT INTEREST
Immediately following Pretest One students were asked to rate the testimony portions of the battery on seven-step interest, readability, and difficulty scales. Testimony I was rated as quite readable and quite easy, while all other ratings for the testimony tests were moderately readable, moderately easy, and moderately interesting. It should be remembered, however, that these students had received considerable instruction in argumentation and should have found the tests less challenging than students without special training in the area.

1 The correlations of tests in this battery with various IQ and reading scores are available on request.

Table 1

Test Administrations in the Development of WISTTRA

| Administration | Tests administered | Place of testing | Date of testing | Sample size | Educational characteristics | Males | Females |
|---|---|---|---|---|---|---|---|
| Pretest One | TI, TII, TIII, RI | Madison, Wisc. | July, 1966 | 38 | Students attending the Debate Program, Wisconsin High School Speech Institute; Grades 9, 10, and 11 | 23 | 15 |
| Pretest Two | (not legible in source) | (not legible in source) | (not legible in source) | (not legible in source) | (not legible in source) | 27 | 31 |
| Pretest Three | TI, TII, TIII, RI, RII, RIII, RIV | Lodi, Wisc. | Dec., 1966 | 187 | Grades 7-12 of Lodi Junior and Senior High Schools | 101 | 86 |
| Pretest Four | TI, RI, RII, RIII, RIV | Juneau and Reeseville, Wisc. | Nov., 1967 | 258 | Grades 10-12 of the Juneau and Reeseville High Schools | 123 | 135 |
| Normative Study | TI, TII, TIII, RI, RII, RIII, RIV | Clinton, Cedarburg, Reedsburg, and Owen-Withee, Wisc. | April, 1968 | 3090 to 3118 (a) | Grades 7-12 of all four schools | 1507 to 1515 (a) | 1583 to 1603 (a) |

(a) Variation due to student absenteeism during the testing period.
Item interest was also discussed with a panel of five high school sophomores who had no previous speech or argumentation course work. The panel was selected on the basis of Henmon-Nelson IQ and SCAT Reading Scores to represent a range of abilities at that grade level. These scores are given in Table 2. Comments by the student consultants were considered during subsequent revisions of the test items.

Table 2

Panel of Student Consultants

| Student | Henmon-Nelson (IQ) | SCAT (Percentile) |
|---|---|---|
| A | 121 | 99 |
| B | 121 | 94-99 |
| C | 143 | 99 |
| D | 107 | 70-83 |
| E | 112 | 85-96 |
SUBJECT MATTER OF ITEMS
Items were constructed using commonplace information from the subject-matter areas of government, entertainment, and education. Fictional names of persons, places, and events were used where possible. Each test presents an approximate balance of items representing the three subject-matter areas. The data from Pretest One were examined for confounding due to subject area variables and revealed no significant differences in scores among the three areas. However, since items in these three areas deal with common topics of discussion, approximately equal representation of these subjects was retained.
CONTENT VALIDITY
As illustrated in Figure 1, WISTTRA was constructed to measure cognitive skills related to certain fundamental concepts of verbal argument. The three tests of testimony were designed to measure the student's ability to detect instances which violate common internal and external tests of testimony. The reasoning tests were designed to measure the student's ability to recognize the essential components of an argument, to ask relevant questions about arguments, and to draw correct conclusions from arguments.
Based upon pilot study information, subtests for Testimony I and Testimony III were retained as illustrated in Figure 1. The pilot study results indicated that subtests need not be retained for Testimony II and the four reasoning tests. Further study of the dimensionality of all the tests is in progress using factor analytic procedures.
At two points in the development of the tests (before Pretest One and prior to the Normative Study) the battery was submitted to panels of experts in the field of argumentation trained in the conceptual basis of the instrument. On both occasions three-judge panels were used. Following a Q-sort technique the judges were asked to place items in relevant categories or in a 'cannot tell' category. Criteria for categorizing items included (where relevant) argument type, type of rule violated, statement type, and completeness of argument. Judge agreement ranged from 94.9 to 98.9% for the tests coded in the initial stages of development and from 85.4 to 98.4% for the tests used in the normative study. The decline in coder agreement is attributable to the fact that only items which achieved high coder agreement were used in drawing up the first edition of the tests while the pool of items coded on the second occasion consisted of all items comprising the normative edition of the tests.

[Figure 1. Relationship of WISTTRA to Concepts Identified. The figure maps the concepts to the tests. Appraising Testimony: Internal Tests (TI, 60 items); External Tests (TII, 20 items; TIII, 40 items). Appraising Reasoning: Data (always given, no test); Warrant (RI, 28 items, recognizing and selecting warrants); Reservation (RII, 28 items; RIII, 28 items); Claim (RIV, 28 items, selecting claims). Each of seven warrant types (sign, class, causal, alternative, parallel case, comparative, and warrant-supportive) is represented by four items in each reasoning test.]
INTERNAL CONSISTENCY RELIABILITY
Hoyt analysis of variance reliability estimates were obtained for all of the tests. This is an internal consistency measure of reliability and as such estimates consistency of performance on a relatively homogeneous power test.
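As an illustration of the estimate named above, here is a minimal sketch of a Hoyt analysis of variance reliability computation. It assumes the standard two-way (persons by items) decomposition, in which reliability is one minus the ratio of the residual mean square to the between-persons mean square. The function name and the example score matrix are hypothetical and are not taken from the report or from the GITAP program.

```python
def hoyt_reliability(scores):
    """Hoyt ANOVA reliability for a persons x items matrix of item scores."""
    n = len(scores)        # number of examinees
    k = len(scores[0])     # number of items
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into persons, items, and residual.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand_mean) ** 2 for m in person_means)
    ss_items = n * sum((m - grand_mean) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    if ms_persons == 0:
        raise ValueError("examinees do not differ; reliability is undefined")
    return 1.0 - ms_residual / ms_persons


# Hypothetical data: 5 examinees, 4 items scored 1 (right) or 0 (wrong).
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(example), 2))  # 0.8
```

For items scored right or wrong this estimate is algebraically the same as Kuder-Richardson formula 20, which is one reason it is read as an internal consistency index.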
Rigid standards for the interpretation of reliability estimates are not overly meaningful. As a rule of thumb, reliabilities of at least .80 are recommended for evaluating level of group accomplishment and .90 for evaluating level of individual accomplishment. In practice, however, reliability estimates of .50 to .80 are often treated as indicating an instrument precise enough for group differentiation. Thorndike and Hagen (1961) discuss this problem in terms of the percent of times the direction of difference will be reversed in subsequent testing for scores falling at the 75th and 50th percentiles for various values of reliability estimates. The security of a conclusion based upon a particular test increases much more rapidly for groups than it does for individuals as the reliability of the test increases. For example, the probability of reversal is one in three for scores of single individuals when the reliability is .50; the probability of reversal is 1 in 20 for means of groups of 25 when the reliability is .50.
The standard error of measurement is a second index of test consistency. This is a measure of the variability of the scores a subject would obtain on repeated measurements using the same test. The standard error of measurement indicates how much his obtained score for a single administration is likely to vary with repeated testing, i.e., how nearly "correct" this obtained score is. For a student's hypothetical distribution of repeated scores on a test, his obtained score would fall within one standard error value of his actual obtained score about two-thirds of the time.
Another way to look at the interpretation of a reliability estimate is in terms of the size of the standard error of measurement relative to the standard deviation of test scores. This is discussed by Thorndike (1951). If the reliability is zero the standard error of measurement would equal the standard deviation of the test. For reliability estimates of .80 and .90 the standard error of measurement is reduced to only 45% and 32% of its value for zero reliability.
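The percentages quoted above follow from the usual relation between the standard error of measurement, the standard deviation of test scores, and the reliability: SEM = SD x sqrt(1 - reliability). The short sketch below reproduces them; the function names are illustrative and do not come from the report.

```python
import math


def sem_ratio(reliability):
    """Standard error of measurement as a fraction of the test's standard deviation."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(1.0 - reliability)


def standard_error_of_measurement(sd, reliability):
    """SEM in raw-score units for a test with standard deviation sd."""
    return sd * sem_ratio(reliability)


for r in (0.0, 0.80, 0.90):
    print(r, round(sem_ratio(r), 2))
# 0.0 1.0
# 0.8 0.45
# 0.9 0.32
```

Because the ratio depends on the square root of (1 - reliability), raising reliability from .80 to .90 shrinks the standard error by less than a third, which is consistent with the caution above against rigid cutoffs.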
Maximum reliabilities were sought for all tests but the researchers' expectations were conditioned by two considerations: (1) some of the tests are composed of subtests sufficiently divergent in character to reduce the overall homogeneity of the total tests (TI and TIII) and (2) some subtests are composed of so few items that high reliabilities are not likely (subtests of RI, RII, RIII, and RIV). For these reasoning tests the total test reliability should not have been affected by (2) except that item data for the subtests were used in selecting items for inclusion in the total test. The purpose of this action was to enable the researchers to obtain reliabilities for the four-item subtests of a sufficient magnitude to enable further study of the dimensionality of the tests.
ITEM CHARACTERISTICS
During the development of the tests, itemswere continually revised to improve the instru-ment on the basis of item characteristic dataobtained from the GITAP item analysis pro-gram (Baker 1966, 1968). This program pro-vides difficulty level, biserial correlation,X50, and 13 statistics for each choice of eachitem. In addition it gives descriptive statis-tics, the standard err, r of measurement, andthe Hoyt reliability estimate for the total test.Certain item characteristic criteria were usedin selecting and refining items on the basis ofthe GITAP results. Items to be retained in arevised edition of the test had to meet theminimum requirement as given for each of thefollowing criteria for the correct choice:
1. Preferably fall within a middle difficultyrange as defined by Ebel (1965). SeeTable 3.
2. Have a biserial correlation >. .30.3. Have an X50 between +2.00 and -2.00.4. Have a 13 > .30.
In addition each incorrect choice had to meet the following minimum requirements:
1. Have a reasonable minimum proportion of subjects respond to it.
2. Have a biserial correlation < -.25 and preferably < -.30.
3. Have an X50 lower than the X50 for the correct choice.
4. Have a β < -.25 and preferably < -.30.
These criteria were established in consultation with staff of the R & D Center and on the basis of reasonably standard rules of thumb for item evaluation.
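The two lists of minimum requirements can be read as a simple screening rule. The sketch below is one possible encoding, not the procedure the investigators actually ran. The class and function names are hypothetical. The default `min_proportion` of .05 is an assumed stand-in, because the report gives no number for a "reasonable minimum proportion". The `two_choice` flag reflects the report's statement that both choices of a two-choice item share the same X50.

```python
from dataclasses import dataclass


@dataclass
class ChoiceStats:
    proportion: float  # proportion of subjects selecting this choice
    biserial: float    # biserial correlation with the criterion score
    x50: float         # X50, in standard deviation units
    beta: float        # slope of the item characteristic curve at X50


def correct_choice_ok(c: ChoiceStats) -> bool:
    """Minimum requirements for the correct choice (middle difficulty is only preferred)."""
    return c.biserial >= 0.30 and -2.00 <= c.x50 <= 2.00 and c.beta > 0.30


def incorrect_choice_ok(d: ChoiceStats, correct: ChoiceStats,
                        min_proportion: float = 0.05,
                        two_choice: bool = False) -> bool:
    """Minimum requirements for an incorrect choice.

    min_proportion is an assumed value; the report states no numeric threshold.
    """
    x50_ok = two_choice or d.x50 < correct.x50  # two-choice items share one X50
    return (d.proportion >= min_proportion
            and d.biserial < -0.25
            and x50_ok
            and d.beta < -0.25)


def item_retained(correct: ChoiceStats, distractors, two_choice: bool = False) -> bool:
    """True when the correct choice and every incorrect choice meet the minimums."""
    return correct_choice_ok(correct) and all(
        incorrect_choice_ok(d, correct, two_choice=two_choice) for d in distractors)


# Illustrative (made-up) statistics for one correct choice and one distractor.
keyed = ChoiceStats(proportion=0.60, biserial=0.45, x50=-0.40, beta=0.50)
distractor = ChoiceStats(proportion=0.20, biserial=-0.32, x50=-1.20, beta=-0.34)
print(item_retained(keyed, [distractor]))
```

In practice the report describes slight revisions to items that narrowly missed a standard, so a rule like this would flag items for review rather than discard them automatically.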
In a few cases where one or more choices of an item were slightly deficient in meeting one or more of the standards but it was felt that the item was still basically good, slight revisions were made in the item. In so far as possible it was desired that all items meet these criteria on the basis of each of two analyses: one with total test score as the criterion ability and another with appropriate subtest score as the criterion ability.
The difficulty of an item is indexed by the proportion of subjects who responded correctly to that item. Thus, the greater the value of the difficulty index the easier the item. An item of middle difficulty is defined by Ebel (1965) as one for which the proportion of correct responses is halfway between the expected chance proportion and 100%, and he further states that items in a midrange of difficulty (30% to 70% of the nonchance range) are almost as effective discriminators as are items of middle difficulty. This middle difficulty range was taken into consideration in defining desirable levels of difficulty for the items of WISTTRA. These levels are specified in Table 3. In general, in assembling the total test the items were ordered by increasing level of difficulty.
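The middle difficulty point and range described above depend only on the number of answer choices. The sketch below works through that arithmetic; the function name is illustrative, and the 30% and 70% bounds are taken from the Ebel rule as stated in the text.

```python
def desirable_difficulty(n_choices: int, lower: float = 0.30, upper: float = 0.70):
    """Return (chance probability, middle difficulty point, (range low, range high)).

    The middle point lies halfway between chance and 1.00; the range spans
    30% to 70% of the distance between chance and 1.00.
    """
    chance = 1.0 / n_choices
    span = 1.0 - chance  # the "nonchance" range
    midpoint = chance + 0.5 * span
    return chance, midpoint, (chance + lower * span, chance + upper * span)


for k in (2, 3, 4, 5):
    chance, mid, (low, high) = desirable_difficulty(k)
    print(f"{k} choices: chance {chance:.3f}, middle point {mid:.3f}, "
          f"range {low:.3f} to {high:.3f}")
```

For two-, four-, and five-choice items this yields .650 to .850, .475 to .775, and .440 to .760; for three-choice items the values agree with Table 3 to within rounding in the third decimal place.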
For the biserial correlation, X50, and β item statistics, both the results for total test and appropriate subtest analyses were used. An attempt was made to use only items that met the standards on the basis of both analyses. In a few cases where this was impossible, the subtest analysis was the prime consideration.
The biserial correlation coefficient is an index of the discriminating ability of the item choice. For this analysis the criterion ability used was total test score. As with any correlation there is no rigid standard for interpreting a biserial correlation. Maximum correlations were desired for WISTTRA and .30 was set as a minimum for the correct choice. A low biserial correlation means that the item is not discriminating across the criterion ability range: a student who had a poor criterion score would be almost as likely to get the item correct as one who had a good criterion score. The negative biserial correlation for the incorrect choices indicates a descending slope of the regression line from left to right. Thus, poor students would be more likely to respond to those choices than would good students. The greater the absolute value of the correlation the greater the discriminating power of the item.
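The report does not give the formula GITAP uses for the biserial correlation. As a minimal sketch, the textbook estimate is assumed here: the difference between the mean criterion scores of those who chose a response and those who did not, divided by the standard deviation of all criterion scores, and multiplied by pq/y, where y is the normal ordinate at the point dividing the distribution into proportions p and q. The function name and the sample data are illustrative.

```python
from statistics import NormalDist, fmean, pstdev


def biserial(criterion_scores, chose_option):
    """Textbook biserial correlation between choosing a response and a criterion score."""
    n = len(criterion_scores)
    p = sum(1 for c in chose_option if c) / n
    if p in (0.0, 1.0):
        raise ValueError("the response must be chosen by some, but not all, subjects")
    q = 1.0 - p
    chosen = [x for x, c in zip(criterion_scores, chose_option) if c]
    others = [x for x, c in zip(criterion_scores, chose_option) if not c]
    # Ordinate of the standard normal curve at the point cutting off proportion p.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (fmean(chosen) - fmean(others)) / pstdev(criterion_scores) * (p * q / y)


scores = [1, 2, 3, 4, 5, 6]                      # made-up criterion (total test) scores
print(biserial(scores, [0, 0, 0, 1, 1, 1]))      # chosen by the high scorers: positive
print(biserial(scores, [1, 1, 1, 0, 0, 0]))      # chosen by the low scorers: negative
```

Because the coefficient rests on an assumed normal distribution underlying the dichotomy, sample values can fall outside the range -1.00 to +1.00, as they do for this small artificial data set.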
X50 is the point on the criterion scale, given in standard deviation units, corresponding to the median of the item characteristic curve and is the point at which the item choice has maximum discrimination. Figure 2 illustrates a typical item characteristic curve. Subjects with a criterion score equal to X50 have a 50-50 chance of choosing that response. Thus, +2.00 and -2.00 were used as desirable limits as this range would include approximately 95% of the cases. It was essential, for an item to be retained in the test, that the X50 value for all the incorrect choices be less than that for the correct choice with the exception of all two-choice items. For two-choice items the X50 value is the same for both choices.
β can be thought of as the slope of the item characteristic curve at the X50 point and is an index of the discrimination power of the item. The higher the β value the greater the slope of the curve and the more clearly the item is discriminating. The maximum positive βs were desired for the correct choice and negative ones required for all incorrect choices.
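X50 and β can be related to the difficulty index and the biserial correlation through the usual normal-ogive relations: X50 = -z(p) / r and β = r / sqrt(1 - r^2), where z(p) is the standard normal deviate below which a proportion p of cases falls. The report does not state these formulas, so the sketch below is an assumption about how such statistics are commonly derived and not a description of GITAP itself. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def x50_and_beta(proportion_choosing: float, biserial: float):
    """Normal-ogive X50 and beta from a choice's difficulty and biserial correlation.

    Assumes X50 = -z(p) / r and beta = r / sqrt(1 - r**2); both are undefined
    when the biserial is zero or reaches 1.00 in absolute value.
    """
    if biserial == 0.0 or not -1.0 < biserial < 1.0:
        raise ValueError("biserial must be non-zero and strictly between -1 and 1")
    x50 = -NormalDist().inv_cdf(proportion_choosing) / biserial
    beta = biserial / sqrt(1.0 - biserial ** 2)
    return x50, beta


# An easy, well-discriminating correct choice (illustrative values).
x50, beta = x50_and_beta(proportion_choosing=0.75, biserial=0.60)
print(f"X50 = {x50:.2f}, beta = {beta:.2f}")
```

Under these relations an easy item (high proportion correct) has a negative X50, so it discriminates best among less able students, and β grows without bound as the biserial correlation approaches 1.00.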
SUMMARY
In developing WISTTRA the researchers attempted to structure an instrument capable of stable, precise discriminations across the broad ability ranges of high school students. At all stages standard procedures and criteria were employed to insure as much as possible that the final test battery would perform according to these expectations. Ideals of this sort must, however, be tempered by the demands of interestingness, practicability, and content validity, as well as by the fact that cognitive skills tend to elude precise measurement. This section has attempted to convey the criteria and considerations which the investigators employed to balance the often conflicting demands such a testing instrument must satisfy.

Table 3
Preferable Difficulty Levels for WISTTRA

Test                 Number of   Chance        Middle Difficulty   Middle Difficulty
                     Choices     Probability   Point               Range
TI, TII, TIII, RII   2           .500          .750                .650 to .850
RIV                  3           .333          .667                .534 to .800
RIII                 4           .250          .625                .475 to .775
RI                   5           .200          .600                .440 to .760

Figure 2. A Typical Item Characteristic Curve. Reproduced from Baker, 1965.
IV
DISCUSSION OF SPECIFIC TESTS
This section of the paper presents a discussion of WISTTRA on a test by test basis, providing a statement of each test's objectives and a brief description of each test as it exists in its latest edition.
TESTIMONY I: Appraising Testimony in Terms of Internal Criteria
Objective
Testimony I is designed to measure a student's ability to use the internal tests of testimony to discriminate between reliable and unreliable instances of testimony.
Structure of the Test
Testimony I consists of 60 two-choice items. Each item presents the name of a source (person, office, publication, etc.) and a statement (generalization, quality judgment, statistic, etc.) made by that source on some particular topic. Consider the following two examples:
1. Senate Reporter: Sixty-five percent of the Republican Senators voted for the Smith-Doe Bill.
2. High School Student: Sun Village is an excellent example of modern literature.
Students are asked to indicate whether they would accept or reject that statement on the grounds that it was made by that source. Four criteria for acceptance are represented in the items: (1) Is the source in a position to observe? (2) Is the source competent to observe? (3) Is the source unbiased? and (4) Is the source qualified to judge? Positive (accept) items, such as Example 1 above, are those which meet all four criteria; negative (reject) items meet three of the four, but do not fulfill the fourth criterion (e.g., in position, competent, unbiased, but not qualified to judge), as in Example 2 above. A student responding correctly on all items would accept 20 instances and reject 40. Violations of internal criteria are distributed across reject items in a balanced fashion so that each criterion is represented by 10 reject items.
There are indications that the five subtests of Testimony I should be kept as individual tests and not grouped into one composite test. Further study on the dimensionality of the tests is in progress and will be reported in A Factor Analytic Study of the Wisconsin Tests of Testimony and Reasoning Assessment (Harris, 1969).
TESTIMONY II: Appraising Testimony in Terms of External Criteria: Consistency with Other Testimony
Objective
Testimony II is designed to measure a student's ability to recognize inconsistency between two instances of testimony.
Structure of the Test
Testimony II consists of 20 two-choice items. Each item presents two similar statements attributed to the same source or to different sources. Two examples are:
1A. Sam, Pro Golf Official: The greens are in good condition for today's match.
 B. Sam, Pro Golf Official: Even the best of the golfers are complaining about the rough spots on the greens in today's match.
2A. American League President: Public interest in baseball has declined in the last year.
 B. National League President: All major league clubs have had sizable increases in attendance over the past year.
Students are asked to compare the two instances of testimony and determine whether accepting the second instance would make them more or less likely to accept the first. In the two examples above, the second instance clearly makes the first instance less likely. These items represent the single criterion of consistency between instances of testimony using statements by the same source or statements by different sources. Testimony II presents 10 consistent and 10 inconsistent pairs of testimony. Both consistent and inconsistent items are balanced so that single-source pairs and two-source pairs appear with the same frequency.
TESTIMONY III: Appraising Testimony in Terms of External Criteria: Recency and Proximity
Objective
Testimony III is designed to measure a student's ability to use the external tests of proximity and recency to discriminate between reliable and unreliable instances of testimony.
Structure of the Test
Testimony III consists of 40 two-choice items. Each item presents a pair of similar statements attributed to different sources. Proximity differences (primary versus secondary information) are contrasted within 20 of the 40 items by structuring one member of the pair as a reference to some other source and the second member of the pair as a source of the sort referred to by the first. For example:
1A. Health Inspector: The pool's chlorine content was too high today when I tested it.
 B. Pool Director: I was told that today's tests indicated the concentration of chlorine in the water to be above the suggested limit.
In this case, the health inspector's statement is to be preferred since he is a primary source reporting a direct observation whereas the pool director is a secondary source who is merely reporting what someone else told him. Recency differences (recent vs. out-of-date information) are contrasted in the remaining 20 items by including time references which indicate that one statement of the pair is more recent than the other. For example:
2A. Teacher A: As we begin the 1966-67 school year we can expect more than 75 books to disappear from the library.
 B. Teacher B: Less than fifty books were missing from the library when the 1966-67 school year ended.
In this case, when the subject in question is the number of books disappearing during a school year, the testimony of Teacher B is to be preferred since it is the more recent. In order to measure a student's ability to handle these criteria in realistic settings, each of the subtests presents 10 pairs of consistent instances of testimony and 10 pairs of inconsistent instances. The student is asked to select the best instance of testimony independent of any inclination to agree with either of the sources.
Previous preliminary study indicates that Testimony III consists of two somewhat independent subtests. Reliability and item characteristic information will be given for Testimony III as a composite and as two separate subtests. Further study on the dimensionality of the tests is in progress.
REASONING I: Recognizing and Selecting Warrants in Arguments
Objective
Reasoning I is designed to measure a student's ability to recognize the absence of warrants and to select appropriate warrants when needed.
Structure of the Test
Reasoning I consists of 28 five-option multiple choice items. Each item presents an argument based on reasoning which is either complete (data, warrant, and claim) or incomplete (data and claim, but irrelevant information instead of warrant). Four possible warrants and a none-needed option are given for each item. For example:
Mr. Henswar, who moved into our neighborhood last week, is a judge. He is one of the new judges I have never seen. It can be concluded that Mr. Henswar is a dignified man.
A. Mr. Henswar is a typical newly appointed judge.
B. Repeated presence in court is a sign of dignity in a man.
C. Judges are dignified men.
D. Mr. Henswar is more dignified than our previous judge.
E. None needed.
Students are asked to mark option E if the argument is complete as it stands in the initial paragraph. If not, they are to select the appropriate warrant from options A-D. For instance, since the warrant is absent in the above example, the student should select response C which provides an appropriate inference license for a class argument. A student responding correctly to all questions will select E for 10 of the 28 items. Each item represents one of seven argument types: sign, cause, class, comparative, parallel case, alternative, and warrant-supportive. Each of the seven argument types is represented by four items, two or three of which require completion by appropriate warrant selection. In addition, an effort was made to distribute warrant types among the distractor options in a balanced fashion. Thus, in the above example, the incorrect responses A, B, and D are warrants for warrant-supportive, sign, and comparative arguments respectively.
REASONING II
Objective
Reasoning II is designed to measure the student's recognition of statements in arguments which anticipate and answer reservations.
Structure of the Test
Reasoning II consists of 28 two-choice items. Each item presents a pair of complete arguments containing data, warrant, claim, and some additional information. The student is instructed to indicate which of the two arguments is better. For example:
A. George is probably a good young farmer because he is a member of our school's Future Farmers of America Club. Last year he won four blue ribbons at the county fair with his dairy cattle. Our county agricultural agent says that membership in the FFA is a pretty good sign that a boy is a good young farmer.
B. George is probably a good young farmer because he is a member of our school's Future Farmers of America Club. The Club has twenty-two members and meets every Saturday morning in the Agricultural Lab. Our county agricultural agent says that membership in the FFA is a pretty good sign that a boy is a good young farmer.
The paired arguments are alike except that in one of them the additional information is irrelevant to the argument while in the other the information removes or answers a possible refutation (reservation) of the argument. Thus, in the above example, the first argument is to be preferred to the second because the first contains information which answers the "lack of concurrent sign" reservation. All seven argument types are represented equally in the items. Reasoning II contains four items for each of the seven argument types: sign, cause, class, comparative, parallel case, alternative, and warrant-supportive.
REASONING III: Selecting Reservations in Arguments
Objective
Reasoning III is designed to measure the student's ability to discriminate between relevant and irrelevant reservations.
Structure of the Test
Reasoning III consists of 28 four-option multiple choice items. Each item presents a complete argument (data, warrant, and claim) and four statements to be considered as possible reservations to the argument. For example:
Enrollment in our high school has been steadily increasing. Since increases in enrollment force school boards to build new high schools, our school board will probably build another high school soon.
A. Unless there is still plenty of room in the old high school.
B. Unless all school boards face increasing enrollments.
C. Unless our school system has fewer students than many other systems which built new high schools.
D. Unless the relationship between enrollments expressed by the word "increased" does not imply future changes in enrollment.
The student is asked to select from the four the reservation which best qualifies or refutes the argument. In each case only one of the four choices is a relevant reservation to the argument type represented. The other three responses, although appearing as reservations in terms of phrasing, do not lessen the confidence which may be placed in the claim advanced. Thus, answer A in the example above represents the partial cause reservation to a cause-effect argument and is therefore the appropriate response while the other three responses simply provide seemingly relevant information in a reservation form inappropriate to a causal argument.
Reasoning III presents four items for each of the seven argument types. This format enabled the investigators to use each of the various reservation types at least once in connection with an appropriate argument type and to roughly balance the distribution of reservation types across distractor choices.
REASONING IV: Selecting Claims in Arguments
Objective
Reasoning IV is designed to measure a student's ability to select the claim appropriate to a given argument.
Structure of the Test
Reasoning IV consists of 28 three-option multiple choice items. Each item presents the data, warrant, and some additional information for an argument. The items are systematically varied such that the additional information is irrelevant in 9 items, provides an answer to a reservation in 10 other items, and raises a reservation (with no answer given) in the remaining 9 items. The student is instructed to select the proper claim to the argument (of two choices given) or to indicate that it is not possible to make a proper claim given the information presented in the argument. For example:
Johnny always turns his work in on time. Turning work in on time plays an important part in passing college courses. Sometimes Johnny spends little time on his work and does not care much about studying. Therefore:
A. Johnny probably will pass his college courses.
B. You really can't tell whether Johnny will pass his college courses.
C. Johnny probably does careful work in his courses.
The three possible answers are constructed such that one states a topically related idea which does not follow the structure of the data and warrant, one denies that a particular conclusion is possible, and one presents a straightforward claim. A student responding correctly would select the "cannot tell" option for items with unanswered reservations (as in the example above) and the straightforward claim in all other cases. Again each of the argument types is represented by four items.
V
RELIABILITY ESTIMATES AND ITEM STATISTICS
The reliability estimates and item statistics reported in this section were obtained in the normative study. In all cases, they are given separately for each sex group for each grade (seven through twelve).
SAMPLE
The total number of subjects tested ranged from 3090 to 3118 for any one test. The total number of subjects within a single age and sex group ranged from 190 to 311 for any one test. These subjects were obtained by randomly sampling schools from a single stratification of the population of Wisconsin school districts. This was accomplished by using the results of a study by Miller et al. (1967) which describes Wisconsin school districts on the basis of factor scores for a number of factors. The following five factors were used in identifying a homogeneous stratified population for the study: (1) numerical size, (2) organizational complexity, (3) teacher experience, (4) economic power, and (5) size of school unit. For further details on the population and sampling procedures used refer to A Study of Student Abilities in the Evaluation of Verbal Argument (Rott, Feezel, & Allen, in press).
RELIABILITY ESTIMATES
Reliability estimates were computed using the Hoyt analysis of variance procedures and were obtained as part of the results of the Generalized Item and Test Analysis Program (Baker, 1968) used to analyze the tests. These estimates are presented in Tables 4 through 17 for each of the seven tests as a total test and for subtests of Testimony I and Testimony III. Also included in these tables, for each grade and sex group, is the sample size, mean, standard deviation, range, and standard error of measurement.
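The Hoyt procedure treats a persons-by-items score matrix as a two-way analysis of variance and estimates reliability as 1 - (residual mean square / between-persons mean square). The sketch below illustrates that calculation under the standard definition; it is not GITAP's own code. The function name and the small 0/1 score matrix are made up.

```python
from statistics import fmean


def hoyt_reliability(scores):
    """Hoyt analysis-of-variance reliability for a persons-by-items score matrix."""
    n, k = len(scores), len(scores[0])  # n persons (rows), k items (columns)
    grand = fmean([x for row in scores for x in row])
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((fmean(row) - grand) ** 2 for row in scores)
    ss_items = n * sum((fmean(col) - grand) ** 2 for col in zip(*scores))
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1.0 - ms_residual / ms_persons


# Five hypothetical students answering four items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(responses), 2))
```

For dichotomously scored items this estimate is algebraically the same as Kuder-Richardson formula 20, which is one reason short subtests tend to show lower values than the total tests.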
ITEM STATISTICS
A summary of the item statistics (difficulty, biserial correlation, X50, and β) for the correct choices for each of the seven tests as a total test and for subtests of Testimony I and Testimony III is given in Tables 18 through 31. The investigators realize there are problems with using the mean as a measure of central tendency for the biserial correlation and β since they are not linear, but it was felt the mean would give the reader some indication of central tendency and at least show the general increase in the value of these statistics from grade seven through grade twelve.
Table 4
Testimony I: Reliability Estimates for the Total Test

Column headings: Grade/Sex; N; Mean Score (60 max.); Standard Deviation; Range of Scores; Standard Error of Measurement; Hoyt Reliability.
[Table entries are not legible in this copy.]

[Item statistics table; number, title, and entries are not legible in this copy. Column headings: Grade/Sex; N; Number of Items Within Middle Difficulty Range (.65-.85); Range of Item Difficulties; Mean Difficulty; Number of Items with Biserial Correlations > .30; Range of Biserial Correlations; Mean Biserial Correlation; Number of Low X50s (-1.01 or below); Number of Medium X50s (-1.00 to +1.00); Number of High X50s (+1.01 or above); Range of βs; Mean β.]
Table 24
Testimony II: Item Statistics (20 Items)

Columns: (1) Grade/Sex; (2) N; (3) Number of Items Within Middle Difficulty Range; (4) Range of Item Difficulties; (5) Mean Difficulty; (6) Number of Items with Biserial Correlations > .30; (7) Range of Biserial Correlations; (8) Mean Biserial Correlation; (9) Number of Low X50s (-1.01 or below); (10) Number of Medium X50s (-1.00 to +1.00); (11) Number of High X50s (+1.01 or above); (12) Range of βs; (13) Mean β.

(1)   (2)   (3)   (4)       (5)   (6)   (7)            (8)   (9)   (10)    (11)   (12)              (13)
7M    246   10    .34-.78   .65   19    .28 to .78     .53   3     17      0      .29 to 1.26       .65
7F    251   9     .50-.75   .63   19    .29 to .71     .53   2     18      0      .31 to 1.00       .65
8M    228   7     .54-.73   .62   19    .24 to .70     .58   2     18      0      .24 to .99        .73
8F    224   9     .56-.77   .65   19    .01 to .84     .59   4     16      0      .01 to 1.54       .80
9M    304   6     .53-.72   .62   20    .30 to .73     .58   1     19      0      .31 to 1.08       .74
9F    302   12    .55-.76   .65   20    .36 to .81     .62   6     14      0      .38 to 1.39       .85
10M   287   13    .59-.71   .66   20    .41 to .83     .67   0     20      0      .46 to 1.46       .96
10F   302   18    .62-.86   .75   20    .48 to .93     .70   9     11      0      .55 to 2.57       1.14
11M   253   17    .61-.81   .72   20    .37 to .90     .70   7     13      0      .40 to 2.04       1.08
11F   265   18    .63-.83   .73   20    .38 to .96     .75   7     13      0      .41 to 3.61       1.35
12M   190   19    .64-.79   .72   20    .40 to .89     .72   10    10      0      .44 to 2.06       1.10
12F   253   17    .60-.86   .75   20    .43 to 1.01    .75   8     11(a)   0      .48 to 8.49(b)    1.59(c)

(a) The X50 could not be computed for the item for which the biserial correlation exceeded 1.00.
(b) The highest β could not be computed since the highest biserial correlation exceeded 1.00.
(c) Based on 19 items.
Table 25
Testimony III: Item Statistics for the Total Test (40 Items)

Columns: (1) Grade/Sex; (2) N; (3) Number of Items Within Middle Difficulty Range; (4) Range of Item Difficulties; (5) Mean Difficulty; (6) Number of Items with Biserial Correlations > .30; (7) Range of Biserial Correlations; (8) Mean Biserial Correlation; (9) Number of Low X50s (-1.01 or below); (10) Number of Medium X50s (-1.00 to +1.00); (11) Number of High X50s (+1.01 or above); (12) Range of βs; (13) Mean β.

(1)   (2)   (3)   (4)       (5)   (6)   (7)            (8)   (9)   (10)   (11)   (12)            (13)
7M    246   11    .31-.79   .58   32    .15 to .49     .37   8     31     1      .16 to .57      .40
7F    251   15    .31-.75   .59   28    .10 to .60     .36   14    23     3      .10 to .74      .39
8M    227   15    .37-.76   .59   29    .15 to .63     .40   9     29     2      .16 to .82      .44
8F    223   18    .32-.90   .62   28    .05 to .67     .38   17    20     3      .05 to .89      .42
9M    303   18    .33-.77   .59   31    .07 to .64     .41   14    25     1      .07 to .84      .46
9F    305   21    .30-.86   .63   31    -.14 to .69    .41   18    20     2      -.14 to .94     .48
10M   288   23    .37-.82   .64   37    .06 to .74     .47   19    20     1      .06 to 1.10     .55
10F   311   19    .30-.92   .63   34    -.02 to .71    .44   20    17     3      -.02 to 1.00    .52
11M   256   25    .32-.85   .67   37    .04 to .78     .50   21    18     1      .04 to 1.25     .64
11F   262   25    .31-.91   .68   33    -.11 to .76    .49   22    16     2      -.11 to 1.17    .64
12M   195   22    .34-.85   .67   34    .11 to .75     .47   18    21     1      .11 to 1.12     .58
12F   251   20    .29-.92   .68   36    -.05 to .78    .54   21    18     1      -.05 to 1.26    .64
[The four item statistics tables that fall between Table 25 and Table 30 are not legible in this copy; their titles and entries cannot be recovered. Each carries the column headings Grade/Sex; N; Number of Items Within Middle Difficulty Range; Range of Item Difficulties; Mean Difficulty; Number of Items with Biserial Correlations > .30; Range of Biserial Correlations; Mean Biserial Correlation; Number of Low X50s (-1.01 or below); Number of Medium X50s (-1.00 to +1.00); Number of High X50s (+1.01 or above); Range of βs; Mean β. The middle difficulty range is given as .65-.85 in the first, second, and fourth of these tables and as .44-.76 in the third.]
Table 30
Reasoning III: Item Statistics (28 Items)

Columns: (1) Grade/Sex; (2) N; (3) Number of Items Within Middle Difficulty Range; (4) Range of Item Difficulties; (5) Mean Difficulty; (6) Number of Items with Biserial Correlations > .30; (7) Range of Biserial Correlations; (8) Mean Biserial Correlation; (9) Number of Low X50s (-1.01 or below); (10) Number of Medium X50s (-1.00 to +1.00); (11) Number of High X50s (+1.01 or above); (12) Range of βs; (13) Mean β.

(1)   (2)   (3)   (4)       (5)   (6)   (7)            (8)   (9)   (10)   (11)   (12)            (13)
7M    245   11    .23-.64   .43   24    .15 to .74     .51   0     22     6      .16 to 1.11     .64
7F    248   14    .25-.75   .48   25    -.09 to .71    .50   2     23     3      -.09 to 1.01    .57
8M    230   16    .16-.70   .49   27    .07 to .80     .56   0     27     1      .07 to 1.36     .73
8F    218   22    .18-.81   .55   26    .18 to .82     .57   1     25     2      .18 to 1.45     .73
9M    302   19    .16-.72   .49   27    .22 to .78     .59   0     26     2      .23 to 1.25     .78
9F    301   20    .15-.76   .57   27    .03 to .77     .58   5     22     1      .03 to 1.19     .73
10M   277   22    .23-.81   .63   27    .29 to .87     .63   8     19     1      .30 to 1.75     [illegible]
10F   294   19    .20-.89   .67   27    .02 to .83     .60   12    15     1      .02 to 1.48     .82
11M   262   22    .26-.83   .56   27    .28 to .83     .66   5     22     1      .29 to 1.48     .93
11F   270   16    .25-.90   .72   28    .33 to .92     .64   12    15     1      .35 to 2.38     .90
12M   191   24    .30-.83   .64   28    .38 to .88     .68   1     26     1      .41 to 1.93     1.02
12F   252   16    .25-.92   .72   28    .32 to .85     .61   16    11     1      .33 to 1.63     .84
Table 31
Reasoning IV: Item Statistics (28 Items)

Columns: (1) Grade/Sex; (2) N; (3) Number of Items Within Middle Difficulty Range; (4) Range of Item Difficulties; (5) Mean Difficulty; (6) Number of Items with Biserial Correlations > .30; (7) Range of Biserial Correlations; (8) Mean Biserial Correlation; (9) Number of Low X50s (-1.01 or below); (10) Number of Medium X50s (-1.00 to +1.00); (11) Number of High X50s (+1.01 or above); (12) Range of βs; (13) Mean β.

(1)   (2)   (3)   (4)       (5)   (6)   (7)            (8)   (9)   (10)   (11)   (12)            (13)
7M    245   15    .32-.76   .52   25    -.07 to .64    .47   2     24     2      -.07 to .84     .54
7F    248   17    .27-.78   .54   25    .09 to .71     .45   5     20     3      .09 to 1.01     .52
8M    230   17    .21-.81   .57   26    .09 to .76     .46   5     21     2      .09 to 1.18     .55
8F    218   18    .26-.83   .59   25    -.02 to .70    .48   7     20     1      -.02 to .99     .57
9M    302   18    .20-.83   .58   26    .17 to .72     .51   5     22     1      .17 to 1.05     .63
9F    301   21    .28-.87   .63   26    .10 to .78     .53   18    9      1      .10 to 1.23     .67
10M   277   21    .27-.87   .66   25    .23 to .83     .55   10    17     1      .24 to 1.51     .70
10F   294   19    .32-.91   .70   25    .24 to .77     .52   17    10     1      .24 to 1.20     .66
11M   262   18    .48-.88   .68   26    .25 to .79     .55   14    13     1      .26 to 1.29     .71
11F   270   18    .44-.90   .60   28    .34 to .77     .52   20    7      1      .36 to 1.20     .65
12M   191   20    .30-.83   .67   28    .30 to .82     .60   9     18     1      .32 to 1.41     .81
12F   252   15    .35-.94   .78   25    .17 to .87     .51   21    6      1      .17 to 1.78     .66
VI
CONCLUSIONS
The reliability estimates obtained for all of the tests for each age and sex group are sufficient for research purposes and to evaluate group differences. In addition, for some of the tests, particularly for Grades 10-12, the estimates are of a sufficient magnitude for evaluation of differences among individuals. If the further study of the dimensionality of the tests indicates that the subtests of Testimony I and Testimony III should be considered as independent tests, they should be lengthened to be more reliable.
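The remark about lengthening the subtests can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result. The report does not say which method the investigators had in mind, so the following is only a minimal sketch; the function names and the example reliabilities are assumptions and are not taken from the report.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Uses the Spearman-Brown formula r_k = k*r / (1 + (k - 1)*r), which
    assumes the added items are parallel to the existing ones.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def factor_for_target(reliability: float, target: float) -> float:
    """Lengthening factor needed to raise `reliability` to `target`.

    Obtained by solving the Spearman-Brown formula for k.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < target < 1.0):
        raise ValueError("both reliabilities must lie strictly between 0 and 1")
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


if __name__ == "__main__":
    # Hypothetical subtest reliability of .50 and a hypothetical target of .80.
    k = factor_for_target(0.50, 0.80)
    print(f"lengthen by a factor of {k:.2f}")
    print(f"predicted reliability: {spearman_brown(0.50, k):.2f}")
```

Under these assumptions a short subtest with modest reliability needs to be made several times longer before it supports the evaluation of individual differences, which is why the question of subtest independence matters.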
The items, in general, exhibit the characteristics sought by the investigators. Many of the items fall within the middle difficulty range. Most items discriminate rather sharply, as indexed by high biserial correlations and βs. Most of the items which have low biserial correlations and βs are found in one of two tests, Testimony I or Testimony III, when total test score is the criterion measure. These low correlations may be indications that at least some items are measuring different abilities and that subtests should perhaps be retained. Most of these same items have correlations and βs above .30 for the appropriate subtest when it is the criterion measure. As evidenced by the X50 item statistics, many more items are maximally discriminating among students of low and middle abilities than among students of high ability. Thus, these items are discriminating more clearly among less able students than they are among more able students. In general, the item statistics tend to increase in value from Grade 7 to Grade 12.
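The link between the biserial correlation, item difficulty, and the β and X50 parameters can be sketched under the normal-ogive item model. The report gives no formulas, so the conversions below are the textbook normal-ogive relations and are an assumption about how such statistics are derived; the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_from_biserial(r_bis: float) -> float:
    """Slope (discrimination) parameter: beta = r / sqrt(1 - r^2)."""
    if not -1.0 < r_bis < 1.0:
        raise ValueError("biserial correlation must lie strictly between -1 and 1")
    return r_bis / sqrt(1.0 - r_bis ** 2)


def x50_from_difficulty(p_correct: float, r_bis: float) -> float:
    """Criterion-scale point at which half the examinees answer correctly.

    gamma is the standard normal deviate cutting off the upper `p_correct`
    proportion of the distribution; X50 = gamma / r_bis.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    if r_bis == 0.0:
        raise ValueError("X50 is undefined when the biserial correlation is zero")
    gamma = NormalDist().inv_cdf(1.0 - p_correct)
    return gamma / r_bis


if __name__ == "__main__":
    # Hypothetical item: 80 percent of examinees correct, biserial of .60.
    print(f"beta = {beta_from_biserial(0.60):.2f}")
    print(f"X50  = {x50_from_difficulty(0.80, 0.60):.2f}")
```

On these assumptions an easy item with a positive biserial has a negative X50, so it discriminates best among less able students, which is the pattern the paragraph above describes.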
Although the final edition of the tests was designed primarily for Grades 10-12, there are indications that the tests might also yield useful information for Grades 7-9. A more exact interpretation of the adequacy of the reliability and item statistics of the tests is left to the reader and potential user who should judge the value of the tests for his particular purpose.
REFERENCES
Allen, R. R., J. D. Feezel, and F. J. Kauffeld. A taxonomy of concepts and critical abilities related to the evaluation of verbal arguments. Occasional Paper Number 9. Wisconsin Research and Development Center, The University of Wisconsin, Madison, August 1967.
Allen, R. R., J. D. Feezel, and F. J. Kauffeld. The Wisconsin tests of testimony and reasoning assessment (WISTTRA). Practical Paper Number 6. Wisconsin Research and Development Center for Cognitive Learning, The University of Wisconsin, Madison, December 1968.
Baker, Frank B. Origins of the item parameters X50 and β as a modern item analysis technique. Journal of Educational Measurement, 1965, 2, 167-180.
Baker, Frank B. Test analysis package: A program for the CDC 1604-3600 computers. Department of Educational Psychology, The University of Wisconsin, June 1966.
Baker, F. B. and T. J. Martin. Fortap: A Fortran test analysis package. Wisconsin Research and Development Center for Cognitive Learning, The University of Wisconsin, 1968.
Dale, Edgar, and Jeanne S. Chall. A formula for predicting readability: Instructions. Educational Research Bulletin, 1948, 27, 37-54.
Ebel, Robert L. Measuring educational achievement. Englewood Cliffs, N. J.: Prentice-Hall, Inc., 1965.
Harris, Margaret L. A factor analytic study of the Wisconsin tests of testimony and reasoning assessment (WISTTRA). Wisconsin Research and Development Center for Cognitive Learning, The University of Wisconsin, Madison, 1969 (in progress).
Miller, Donald M., et al. Multivariate procedures for stratifying school districts. Laboratory of Instructional Research, The University of Wisconsin, Madison, 1967.
Rott, Robert K., Jerry D. Feezel, and R. R. Allen. A study of student abilities in the evaluation of verbal arguments. Wisconsin Research and Development Center for Cognitive Learning, The University of Wisconsin, Madison, in press.
Thorndike, Edward L. and Irving Lorge. The teacher's wordbook of 30,000 words. Columbia: Teachers College Press, 1944.
Thorndike, Robert L. Reliability. In E. F. Lindquist (Ed.), Educational measurement. Washington, D. C.: American Council on Education, 1951.
Thorndike, Robert L. and Elizabeth Hagen. Measurement and evaluation in psychology and education. New York: John Wiley & Sons, Inc., 1961.
Toulmin, Stephen. The uses of argument. Cambridge: University Press, 1958.