DOCUMENT RESUME
ED 445 068 TM 031 684
TITLE State Norms for the North Carolina Testing Program. 1999 Edition.
INSTITUTION North Carolina State Dept. of Public Instruction, Raleigh. Div. of Accountability/Testing.
PUB DATE 1999-00-00
NOTE 92p.
PUB TYPE Numerical/Quantitative Data (110) -- Reports - Descriptive (141)
EDRS PRICE MF01/PC04 Plus Postage.
DESCRIPTORS *Achievement Tests; Elementary Secondary Education; *State Norms; State Programs; Tables (Data); *Test Results; Testing Programs
IDENTIFIERS *North Carolina; North Carolina Competency Tests; North Carolina End of Course Testing Program; North Carolina End of Grade Testing Program
ABSTRACT This publication provides a reference for those interested in conducting comparative studies about North Carolina tests. It includes norms tables and other basic statistical information for all state-developed tests (state-mandated and local option tests where baseline data are available) that were administered during the current accountability cycle. The baseline norming year is shown for each test. These statistics are the product of a method that computes percentile ranks across two or more distributions. State-mandated end-of-grade test norms, open-ended assessment norms, and local option end-of-grade test norms are given for tests for grades 3, 4, 5, 6, 7, and 8. Norms are given for the end-of-course (high school) tests in 10 subjects. Norms are also presented for the High School Comprehensive Tests for reading and mathematics at grade 10. A discussion of using test scores for student decisions is included, as is a glossary of testing and other statistical terms. (Contains 44 tests.) (SLD)
Reproductions supplied by EDRS are the best that can be made from the original document.
Public Schools of North Carolina
State Board of Education
Department of Public Instruction
Division of Accountability Services
U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)
This document has been reproduced as received from the person or organization originating it.
Minor changes have been made to improve reproduction quality.
Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.
State Board of Education
Phillip J. Kirk, Jr., Chairman, Raleigh
Jane P. Norwood, Vice Chair, Charlotte
Kathy A. Taft, Greenville
Margaret B. Harvey, Kinston
Edgar D. Murphy, Durham
Evelyn B. Monroe, Pinehurst
Maria T. Palmer, Chapel Hill
Ronald E. Deal, Hickory
Robert R. Douglas, Asheville
Zoe W. Locklear, Pembroke
Eddie Davis III, Durham
Dennis A. Wicker, Lieutenant Governor, Sanford
Harlan E. Boyles, State Treasurer, Raleigh
NC Department of Public Instruction
Michael E. Ward, State Superintendent
301 North Wilmington Street Raleigh, North Carolina 27601-2825 Website: www.dpi.state.nc.us
Table of Contents
Introduction 1
Acknowledgements 2
List of Contents 3
End-of-Grade Tests 4
State-Mandated End-of-Grade Test Norms
Local Option End-of-Grade Test Norms 31
Grade 3 Science 32
Grade 3 Social Studies 34
Grade 4 Science 35
Grade 4 Social Studies 37
Grade 5 Science 39
Grade 5 Social Studies 41
Grade 6 Science 43
Grade 6 Social Studies 45
Grade 7 Science 47
Grade 7 Social Studies 49
Grade 8 Science 50
Grade 8 Social Studies 52
French Form A 53
French Form B 54
Spanish Form A 55
Spanish Form B 56
State Norms-1999 Edition
End-of-Course Tests 57
State-Mandated End-of-Course Test Norms
Algebra I 61
Biology 63
ELP 65
English I 66
US History 68
Algebra II 70
Chemistry 71
Geometry 72
Physical Science 73
Physics 74
North Carolina High School Comprehensive Test 75
State-Mandated North Carolina High School Comprehensive Test Norms
Grade 10 Reading 79
Grade 10 Mathematics 80
Using Test Scores for Student Decisions 82
Glossary of Testing and Other Statistical Terms 84
Introduction
The purpose of this publication is to provide a reference for educators and others interested in conducting comparative studies relative to North Carolina tests. This publication includes norms tables and other basic statistical information for all state-developed tests (state-mandated and local option tests where baseline data are available) that were administered during the current accountability cycle. New tests and revisions of existing tests are made from time to time; therefore, subsequent editions of this publication will include norms and statistical information for new and revised tests as they are implemented.
The North Carolina ABCs Accountability Program requires extensive use of statistics from North Carolina state-mandated tests to monitor the progress of the state's public schools. Each school is measured against itself and is accountable for having students make reasonable progress based on year-to-year comparisons of test performance. The accountability program uses the core components of the statewide testing program, focusing accountability on the basics. Student improvement is routinely measured by the State Board of Education in reading, writing, and mathematics at the elementary and middle school levels. At the high school level, schools are held accountable in the core subjects required for graduation where state tests have been developed: English I and II; Algebra I and II; Geometry; U.S. History; Economic, Legal, and Political Systems; Physical Science; Biology; Physics; and Chemistry. Statistical information, including norms, for other state-developed tests is included in this document as a reference when using those tests.
Users of this publication are reminded that state-level frequency distributions and other descriptive statistics are generated each year a state-mandated test is administered; however, for baseline comparative purposes, normative statistics are typically established the first year a test is administered or revised, or as designated by the test publisher, the North Carolina Department of Public Instruction (NCDPI). The baseline norming year is shown for each test in the List of Contents (on page 3) and at the top of each table.
Several legitimate statistical methods exist whereby norms may be calculated for tests. Each method may produce slightly different (although educationally insignificant) normative scores. The statistics presented here represent the product of the method used by the Public Schools of North Carolina, NCDPI, for the state's testing and accountability programs. This method computes percentile ranks that can be compared across two or more distributions (unlike percentiles). Accordingly, persons using these tables should be able to replicate statistics on file at NCDPI when calculating test averages and percentile rankings locally.
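This publication does not spell out NCDPI's exact formula. A widely used definition that can be reproduced locally from a frequency distribution is the mid-point percentile rank: the percentage of scores below a given score plus half the percentage at that score. The Python sketch below assumes that definition; the function name percentile_ranks is hypothetical, and published tables may apply their own rounding.

```python
from collections import Counter


def percentile_ranks(scores):
    """Map each distinct score to its mid-point percentile rank.

    Assumed definition: PR = 100 * (count below + 0.5 * count at score) / N.
    """
    counts = Counter(scores)
    n = len(scores)
    ranks = {}
    below = 0  # running count of scores lower than the current one
    for score in sorted(counts):
        freq = counts[score]
        ranks[score] = 100.0 * (below + 0.5 * freq) / n
        below += freq
    return ranks


# Invented example distribution of five scale scores.
print(percentile_ranks([140, 145, 145, 150, 155]))
```

Because the rank is anchored at the middle of each score's frequency, tied students share one rank, and ranks from different distributions can be computed the same way.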
Educators and others interested in using these tables may contact the Regional Accountability Coordinator (RAC), local testing coordinators, or the Division of Accountability Services for additional information on the appropriate uses of these tables. This publication is being revised mainly to include the 1999 norms for NC End-of-Grade Tests in reading and mathematics in grades 3-8.
Acknowledgments
The development of a document that contains a variety of statistical information requires the skill and efforts of many individuals. We thank George Stubblefield for compiling the statistical information; Mildred Bazemore for organization, writing, editing, and contributions to the text; and Karen Davis, Linda Althouse, and George Stubblefield for their contributions to the text and statistical information for this document.
List of Contents
Guides to the Interpretation of Tests:
Using Test Scores for Student Decisions
End-of-Grade Tests - Achievement Levels by Subject and Grade
End-of-Course Tests - Achievement Levels and Scale Scores
North Carolina High School Comprehensive Test - Achievement Levels
Glossary of Testing and Other Statistical Terms
Norms and Interpolation Tables:
Test                                                      Current Norm Year   Original Norm Year

State-Mandated End-of-Grade Tests
Grade 3 Pretests/Reading and Mathematics                  1996                1996
Grades 3-8 Reading                                        1999                1993
Grades 3-8 Mathematics                                    1999                1993
Open-Ended Assessments/Reading and Mathematics Grade 8    1997                1997

Local Option End-of-Grade Tests
Grades 3-8 Social Studies                                 1994                1994
Grades 3-8 Science                                        1996                1996
French Forms A and B                                      1992                1992
Spanish Forms A and B                                     1992                1992

State-Mandated End-of-Course Tests
Algebra I                                                 1994                1987
Biology                                                   1995                1987
Economic, Legal, & Political Systems                      1997                1991
English I                                                 1995                1990
U.S. History                                              1995                1988
Algebra II                                                1997                1988
Chemistry                                                 1997                1989
Geometry                                                  1997                1989
Physical Science                                          1997                1991
Physics                                                   1997                1990

North Carolina High School Comprehensive Test             1998                1998
End-of-Grade Tests
Historical Information. North Carolina end-of-grade tests were initiated in response to legislation passed in 1989 by the North Carolina General Assembly. North Carolina end-of-grade tests were developed by the North Carolina Department of Public Instruction with technical support from the L.L. Thurstone Psychometric Laboratory at The University of North Carolina at Chapel Hill and the North Carolina Technical Advisory Group. The tests were developed for use as curriculum-based achievement tests to measure the acquisition of specific subject-area content and skills associated with a particular grade in school.
Based on the 1989 legislation, North Carolina end-of-grade tests were developed for two purposes:
To provide accurate measurement of individual student skills and knowledge specified in the North Carolina Standard Course of Study, and
To provide accurate measurement of the knowledge and skills attained by groups of students for school, school system, and state accountability.
End-of-grade tests are designed to measure higher-level thinking skills. When properly administered and interpreted, these test results provide an independent, uniform source of reliable and valid information that enables:
Students to know the extent to which they have mastered expected knowledge and skills and how they compare to others;
Parents to know if their children are acquiring the knowledge and skills needed to succeed in a highly competitive job market;
Teachers to know if their students have mastered grade-level knowledge and skills in the curriculum and, if not, what weaknesses need to be addressed;
Community leaders and lawmakers to know if students in North Carolina schools are improving their performance over time; and
Citizens to objectively assess their return on investment in the public schools.
Scores on the end-of-grade tests are only one of many indicators of the achievement of students. The value of these tests lies primarily in the fact that the scores provide a common standard that is not influenced by local differences in achievement and expectations. The tests provide yardsticks, which can be used to compare the achievement of students, schools, school systems, and the state. The assessment yardstick can be used to measure gains (or losses) in performance across time to see if educational improvement efforts at the state and local level are working.
North Carolina end-of-grade tests in reading and mathematics at grades 3-8 were initially normed in 1993. The tests were renormed using the state statistical distribution of scores from the 1998-99 statewide administration.
Scale Scores. Raw scores are converted to scale scores on the end-of-grade tests. Developmental scale scores allow the different forms of the test to be equated, so that a developmental scale score of 150 in reading on Form A represents the same level of achievement as a developmental scale score of 150 in reading on Form B, even though the raw scores may be slightly different.
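In practice, equating produces a raw-to-scale conversion table for each form. The Python sketch below is illustrative only: the tables FORM_A and FORM_B and the helper to_scale_score are hypothetical, and their values are invented rather than taken from NCDPI's equating.

```python
# Hypothetical raw-score-to-developmental-scale-score tables for two forms.
# The values are invented for illustration; they are not NCDPI's tables.
FORM_A = {30: 148, 31: 150, 32: 152}
FORM_B = {29: 148, 30: 150, 31: 152}

CONVERSION_TABLES = {"A": FORM_A, "B": FORM_B}


def to_scale_score(form, raw_score):
    """Convert a raw score on the given form to a developmental scale score."""
    table = CONVERSION_TABLES[form]
    if raw_score not in table:
        raise KeyError(f"raw score {raw_score} is not in the Form {form} table")
    return table[raw_score]


# A raw score of 31 on Form A and 30 on Form B map to the same scale score,
# so they are treated as the same level of achievement.
print(to_scale_score("A", 31), to_scale_score("B", 30))
```

Reporting the scale score rather than the raw score is what lets results from different forms be compared directly.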
The reading, mathematics, and science tests are reported on developmental scales, which yield rulers to measure growth in these subject areas across time. Just like height in inches, student scores are, on average, expected to go up every year. Also like height, the rate of growth is somewhat faster in the earlier grades than in later grades. Social studies scores are not reported on a developmental scale because the specific content covered at each grade is different.
Developmental scale scores on end-of-grade tests can be used to chart the average educational growth of students as they progress through the grades. Given the way the scales were developed, the differences in mathematics scores between grades in 1992-93 can provide a standard for comparing average growth across grade levels in the future, e.g., between where students were at the end of the third grade and where they are at the end of the fourth grade. This anticipated growth varies from grade level to grade level and from subject to subject. Each year, test results can be used to chart the actual growth of students from grade level to grade level within a subject. Schools and school systems may use the developmental scale scores to compare their own mean scale scores to the state average to evaluate the performance of their students. Scale scores can be averaged and relative comparisons made within each subject and across groups, including schools and school systems.
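A comparison of a school's mean scale score with the state average can be sketched as below. The function name differences_from_state and all scores are hypothetical; the point is that the comparison is made subject by subject and never across subjects.

```python
from statistics import mean


def differences_from_state(school_scores, state_means):
    """School mean developmental scale score minus the state mean, per subject.

    Each subject is handled separately because developmental scales for
    different subjects are not comparable with one another.
    """
    return {
        subject: mean(scores) - state_means[subject]
        for subject, scores in school_scores.items()
    }


# Invented example data for one grade at one school.
school = {"reading": [150, 154, 158], "mathematics": [155, 157]}
state = {"reading": 152, "mathematics": 158}
print(differences_from_state(school, state))
```

A positive difference means the school's average is above the state average in that subject; the same subtraction between adjacent grades' means gives an average growth figure.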
It is not appropriate to compare developmental scale scores across subjects (e.g., mathematics vs. reading) because the typical growth across grades for each subject is different. It is appropriate to compare scale scores within subjects and types of tests (i.e., open-ended and multiple-choice) across years.
Achievement Levels. Achievement levels allow the comparison of student and group performance to standards based on what is expected in each subject at each grade level. Achievement levels were determined by relating the judgments of thousands of North Carolina teachers concerning the performance of each of their students to each student's performance (Contrasting Groups Method) on the end-of-grade multiple-choice tests. The four levels are:
Level I: Students performing at this level do not have sufficient mastery of knowledge and skills in this subject area to be successful at the next grade level.
Level II: Students performing at this level demonstrate inconsistent mastery of knowledge and skills in this subject area and are minimally prepared to be successful at the next grade level.
Level III: Students performing at this level consistently demonstrate mastery of grade level subject matter and skills and are well prepared for the next grade level.
Level IV: Students performing at this level consistently perform in a superior manner clearly beyond that required to be proficient at grade level work.
Each achievement level corresponds to arange of scale scores for each subjectand grade level. The scale score rangesfor each subject and grade are presentedbelow.
Ranges for Achievement Levels by Subject and Grade, Grades 3-8
Reading and Mathematics Scale Scores
Subject/Grade    Level I    Level II    Level III    Level IV
Reading PT3
End-of-Course Tests
Historical Information. The North Carolina end-of-course tests were initiated in response to legislation passed by the North Carolina General Assembly, the North Carolina Elementary and Secondary Reform Act of 1984. This act called for the implementation of the Basic Education Program through the establishment of a core curriculum for all students for each content area and the development of tests to assess the implementation of each curriculum across the state. Based on this legislation, the North Carolina end-of-course tests were developed for two purposes:
To provide accurate measurement of individual student skills and knowledge specified in the North Carolina Standard Course of Study, and
To provide accurate measurement of the knowledge and skills attained by groups of students for school, school system, and state accountability.
End-of-course tests are designed as curriculum-based achievement tests to measure what students know and are able to do within the context of a specific subject-area content. When properly administered and interpreted, these test results provide an independent, uniform source of reliable and valid information, which enables:
Students to know the extent to which they have mastered expected knowledge and skills and how they compare to others;
Parents to know if their children are acquiring the knowledge and skills needed to succeed in a highly competitive job market;
Teachers to know if their students have mastered grade-level knowledge and skills in the curriculum and, if not, what weaknesses need to be addressed;
Community leaders and lawmakers to know if students in North Carolina schools are improving their performance over time; and
Citizens to objectively assess their return on investment in the public schools.
The North Carolina end-of-course tests were developed by the North Carolina Department of Public Instruction with technical support from the L.L. Thurstone Psychometric Laboratory at The University of North Carolina at Chapel Hill and the North Carolina Technical Advisory Group. The tests were developed for use as achievement tests to measure the acquisition of specific subject-area content and skills.
Scores on the end-of-course tests are only one of many indicators of the achievement of students. The value of these tests lies primarily in the fact that the scores provide a common standard that is not influenced by local differences in achievement and expectations. The tests provide yardsticks, which can be used to compare the achievement of students, schools, school systems, and the state. The assessment yardstick can be used to measure gains (or losses) in performance across time to see if educational improvement efforts at the state and local level are working.
High School Subjects Tests (EOC)
Scale Scores
Raw scores are converted to scale scores on the end-of-course tests. Scale scores enable the different forms of the tests to be equated. Developmental scale scores are inappropriate for end-of-course tests because the content taught in each of the courses is discrete.
Scale scores allow the different forms of the test to be equated, so that a scale score of 50 in Algebra I on Form A represents the same level of achievement as a scale score of 50 on Form B, even though the raw scores may be slightly different.
The scale scores for all tests were designed to have a mean of 50 and a standard deviation of 10. The range of scale scores is approximately 20 to 90.
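With a design mean of 50 and standard deviation of 10, a scale score can be read as a distance from the mean: z = (scale score - 50) / 10, so a score of 60 is one standard deviation above the design mean. The Python sketch below also gives a normal-curve percentile; that is an added assumption for illustration, since actual percentiles should be read from the norms tables, which reflect the observed distribution.

```python
import math

SCALE_MEAN = 50.0  # design mean of end-of-course scale scores
SCALE_SD = 10.0    # design standard deviation of end-of-course scale scores


def z_score(scale_score):
    """Distance from the design mean in standard-deviation units."""
    return (scale_score - SCALE_MEAN) / SCALE_SD


def normal_percentile(scale_score):
    """Approximate percentile IF scores were exactly normal (an assumption)."""
    return 50.0 * (1.0 + math.erf(z_score(scale_score) / math.sqrt(2.0)))


print(z_score(60))                      # 1.0
print(round(normal_percentile(60), 1))  # 84.1
```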
Beginning with 1993-94, scores for Algebra I were reported as scale scores rather than raw scores. In 1994-95, Biology, English I, and U.S. History were reported as scale scores rather than raw scores. In 1995-96, Economic, Legal, and Political Systems (ELP) used scale scores rather than raw scores to report student performance. These scores are not comparable to scores from earlier years when raw scores were used.
Achievement Levels. Achievement levels allow the comparison of student and group performance to standards based on what is expected in each subject at each grade level. Achievement levels were determined by relating the judgments of thousands of North Carolina teachers concerning the performance of each of their students to each student's performance (Contrasting Groups Method) on the end-of-course tests. The four levels are:
Level I: Students performing at this level do not have sufficient mastery of knowledge and skills of the course to be successful at a more advanced level in the content area.
Level II: Students performing at this level demonstrate inconsistent mastery of knowledge and skills of the course and are minimally prepared to be successful at a more advanced level in the content area.
Level III: Students performing at this level consistently demonstrate mastery of the course subject matter and skills and are well prepared for a more advanced level in the content area.
Level IV: Students performing at this level consistently perform in a superior manner clearly beyond that required to be proficient in the course subject matter and skills and are very well prepared for a more advanced level in the content area.
Each achievement level corresponds to arange of scale scores for each subject.
North Carolina High School Comprehensive Test
Historical Information. The North Carolina High School Comprehensive Test: Reading Comprehension and Mathematics Applications was approved by the State Board of Education at its March 1997 meeting as a component of the ABCs accountability model for high schools, effective with the 1998-99 school year. Because the ABCs of Public Education emphasize performance on the basic skills, the comprehensive test focuses on the assessment of reading and mathematics. This curriculum-based multiple-choice achievement test was developed to measure growth in student achievement in reading and mathematics from grade 8 to grade 10 for the purpose of high school accountability. The test was first administered (piloted) in 1997-98 to all students in grade 10.
The comprehensive test is an end-of-grade curriculum-based multiple-choice achievement test that assesses the reading strand of the English Language Arts competencies and the Mathematics competencies the typical student should have mastered by the end of the tenth grade. The content measured is not course specific. The results of the test can also be used to make norm-referenced interpretations; state norms were established from the 1997-98 test administration.
Reading Comprehension. The North Carolina High School Comprehensive Test: Reading Comprehension assesses a student's ability to read, understand, and critically analyze printed material. It emphasizes (1) the reading strand of the English Language Arts curriculum at grade 10 and (2) the reading that students are expected to achieve by the end of grade 10. The curriculum specifications were officially adopted by the State Board of Education in the Standard Course of Study and the Teacher Handbook. Reading comprehension is assessed in the same manner as the end-of-grade test of reading comprehension and is similar to the Textual Analysis part of the English I end-of-course test. The authentic reading passages for the test are selected based on several criteria. They must be:
Interesting to read;
Complete (have a beginning, middle, and end); and
From sources students might actually read.
Because passages are selected according to these criteria, they tend to be longer than those typically found on standardized achievement tests. On each form of the test there are eight passages: three literary, three content-based, and two consumer/human interest.
There are 72 reading comprehension items, and the test administration time is 100 minutes.
Mathematics Applications. The North Carolina High School Comprehensive Test: Mathematics Applications assesses a student's ability to apply mathematical knowledge to solve real-world problems. It measures mathematics competencies that students typically are expected to achieve by the end of grade 10. The curriculum specifications, which form the basis for this part of the test, were presented to the State Board of Education as part of the North Carolina Mathematics Framework (K-12). The framework consists of the following three competencies and four content strands:
Mathematics is assessed in the same manner as in the mathematics applications part of the end-of-grade test of mathematics and in the Algebra I and Geometry end-of-course tests. Students may use formulas (located on the back of the test book), rulers, and protractors when taking the test.
There are 70 mathematics applications items, and the test administration time is 85 minutes.
Reporting Student Performance. Student performance in reading comprehension and mathematics applications is reported using developmental scales, like student performance on the end-of-grade tests at grades 3-8. In addition, student scores are reported using percentiles and achievement levels.
Results from the North Carolina High School Comprehensive Test provide information about the average developmental scale scores obtained by all students and specific groups of students. Results from this test also provide information to chart student performance or progress in reading or mathematics over time on a developmental scale that spirals from grade 3 to grade 10 in each area.
In addition, the comprehensive test provides schools and school systems with limited curricular information to identify areas of strength and weakness, which may be used to compare the performance of a group to the performance of other schools, school systems, and the state.
Achievement Levels. Achievement levels allow for a comparison of student and group performance to standards based on what is expected in each subject at a grade level. Achievement levels were determined by relating the judgments of thousands of North Carolina teachers concerning the performance of each of their students to each student's performance (Contrasting Groups Method) on the NCHSCT. The achievement levels for the NCHSCT were established based on data collected during the 1997-98 pilot administration. The ranges are as follows:
Level I: Students performing at this level do not have sufficient mastery of knowledge and skills in this subject area to be successful at the next grade level.
Level II: Students performing at this level demonstrate inconsistent mastery of knowledge and skills that are fundamental in this subject area and that are minimally sufficient to be successful at the next grade level.
Level III: Students performing at this level consistently demonstrate mastery of grade level subject matter and skills and are well prepared for the next grade level.
Level IV: Students performing at this level consistently perform in a superior manner clearly beyond that required to be proficient at grade level work.
Ranges for Achievement Levels (1998)
North Carolina High School Comprehensive Test

Subject/Grade     Level I    Level II    Level III    Level IV
Reading 10        132-150    151-162     163-174      175-201
Mathematics 10    141-159    160-171     172-188      189-226
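Because each achievement level is a contiguous band of scale scores, assigning a level is a simple range lookup. The Python sketch below transcribes the 1998 ranges listed above (reading: 132-150, 151-162, 163-174, 175-201; mathematics: 141-159, 160-171, 172-188, 189-226), assuming the bounds are inclusive; the names LEVEL_RANGES and achievement_level are hypothetical.

```python
# 1998 NCHSCT scale-score ranges, transcribed from the table (inclusive bounds).
LEVEL_RANGES = {
    "Reading": [("I", 132, 150), ("II", 151, 162), ("III", 163, 174), ("IV", 175, 201)],
    "Mathematics": [("I", 141, 159), ("II", 160, 171), ("III", 172, 188), ("IV", 189, 226)],
}


def achievement_level(subject, scale_score):
    """Return the achievement level (I-IV) whose range contains the score."""
    for level, low, high in LEVEL_RANGES[subject]:
        if low <= scale_score <= high:
            return level
    raise ValueError(f"{scale_score} is outside the reported {subject} range")


print(achievement_level("Reading", 163))      # III
print(achievement_level("Mathematics", 171))  # II
```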
STATE NORTH CAROLINA HIGH SCHOOL COMPREHENSIVE TEST
READING COMPREHENSION --- 1998
SUMMARY STATISTICS ON DEVELOPMENTAL SCALE SCORES
NUMBER OF STUDENTS WITH VALID SCORES 73,976
HIGH SCORE 199
LOW SCORE 132
Using Test Scores for Student Decisions
There is a natural desire to use numbers as if they were absolutely precise when making decisions. However, common experience teaches that most measurements vary a little in their precision. A carpenter's ruler may not be as accurate as a scientist's, bathroom scales may not be as accurate as those in a doctor's office, and speedometers in cars may show speeds that vary from those captured on a radar gun. Decisions can be and are made from both levels of precision.
Individuals often use their knowledge of accuracy when making the final decision. For example, the carpenter may use lumber that has been cut just a bit short but reject other cuts because of the intended use of the wood. The doctor may not make a decision about a patient using just one precise measurement but may prefer to use a pattern of weight measurements from two scales before suggesting a course of treatment. The police officer may not ticket a speeding offender driving within a few miles per hour of the posted speed limit because he or she recognizes the potential for a difference in precision.
Making decisions about students has similar problems, with the additional complexity of the measurement of human characteristics such as academic achievement or aptitude. Clearly, it is easier to measure speed, weight, or length than intelligence or achievement. Teachers who work with students approximately 180 days each year can only assign grades into perhaps five broad categories of achievement (such as A, B, C, D, or F). They also know that achievement may vary from day to day for a given student. One day a student may respond in a consistent "well above average" fashion, while the very next day he or she may give average or just above average responses. When it comes to a decision that has more important consequences than reporting to parents about progress in school, such as failure in a grade or admission to a special program, teachers are often uncomfortable depending on their own judgment alone. Most prefer to use additional independent judgments or results from standardized tests to confirm these major decisions.
Multiple-choice tests are more reliable than any single teacher's test, but they are not perfect instruments. Just as speedometers, scales, and rulers have variation in their measurement accuracy, test scores vary due to characteristics of the specific test items and the differences in student performance that occur on a daily basis. In order to help ensure that all students meet a standard of performance, citizens and educators sometimes feel a need to use a single objective score from a standardized test to make important decisions about students. It is inappropriate to place so much weight on a single administration of a single instrument, which takes such a short time to administer. The NC Testing Code of Ethics, guidelines adopted by the State Board of Education, has helped many educators make more appropriate interpretations of test scores. Two items under Reporting are illustrative:
Individual test scores are best presented as approximations and not as absolutes. It is preferable to report results as a limited score range rather than distinct points.
Test scores are to be presented as one source of information about a student or group of students and should not be used alone for placement, referral, and other consequential decisions on the education of a student.
These guidelines are consistent with Standards for Educational and Psychological Testing developed nationally by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education.
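One common way to report "a limited score range rather than distinct points" is a band built from the standard error of measurement (SEM). The Python sketch below uses the classical test theory formula SEM = SD * sqrt(1 - reliability). The standard deviation and reliability values are hypothetical, and the function names are illustrative; NCDPI's actual SEMs are published in its technical documents.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed_score, sem, multiplier=1.0):
    """Report a score as a range: observed score +/- multiplier * SEM."""
    half_width = multiplier * sem
    return observed_score - half_width, observed_score + half_width


# Hypothetical values: scale SD of 10 and reliability of 0.91 give an SEM of about 3.
sem = standard_error_of_measurement(10.0, 0.91)
low, high = score_band(160, sem)
print(f"{low:.1f} to {high:.1f}")
```

Under the usual normal-error assumption, a band of one SEM on each side captures the student's true score roughly two times out of three; widening it to two SEMs (multiplier=2.0) raises that to about 95 percent.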
For more information on the psychometric characteristics (e.g., standard error of measurement) of tests developed by the NCDPI, refer to the following documents:
The process for identifying students forspecial programs in North Carolinafollows guidelines in the NC TestingCode of Ethics by using a combinationof teacher recommendations,standardized tests, and past performance.
The state standard for a CompetencyTest, in a given area, which was alsoset using a combination ofinformation, is currently based onone test. However, the student hasrepeated opportunities to take the testbefore the decision to withholdgraduation with a diploma is made.
Similarly, Grade-Level Proficiency Guidelines identify students against a standard set by teachers and translated to a test score. Under these guidelines, local policy and discretion determine promotion and retention decisions, which take into account the student's test score and other information that may indicate a student's level of achievement in a grade.
Conclusion
When making decisions that impact students, the best plan is one that incorporates a combination of information about the student. The plan should ask for repeated demonstrations of the student's weaknesses or excellence in the particular area. This conservative approach more closely matches that used by the carpenter, doctor, or police officer. It takes into account variation in measurement and checks for patterns that provide keys for any important decision.
Glossary of Testing and Other Statistical Terms
Achievement Levels. Predetermined levels of achievement that relate the judgment of North Carolina teachers to the actual student performance on North Carolina tests. The achievement levels allow for the comparison of student and group performance to establish standards based on what is expected in a subject area at each grade level. Achievement levels used for North Carolina were set using the Contrasting Groups Method of performance standard setting.
Achievement Test. A test that measures the extent to which a person has "achieved" something, acquired certain information, or mastered certain standardized sets of skills, usually as a result of planned instruction or training.
Criterion-Referenced Test. A test that is designed to provide information on the specific knowledge or skills possessed by a student. Such tests usually cover relatively small units and are related to instruction. Performance is measured in reference to mastery of particular skills. Scores from these tests have meaning in terms of what the student knows or can do, rather than in their relation to the scores made by some external reference or norm group.
Curriculum-Based Test. A test that is designed to measure student performance in reference to goals and competencies specifically defined in a curriculum.
Developmental Scale Scores. Developmental scales are similar to rulers that allow growth in a subject to be measured across time. Developmental scale scores are intended to be interpreted much like height in inches. As is the case for growth in height, we expect an annual increase in scale scores, but not the same increase for each year. The rate of growth is a bit faster in the earlier grades than in the later grades. A developmental scale score in end-of-grade reading should not be compared to a developmental scale score in end-of-grade mathematics.
Diagnostic Test. A test that is used to diagnose, analyze, or locate an individual's specific areas of weakness and/or strength to determine the nature of the weaknesses or deficiencies and, where possible, to suggest the cause. Diagnostic tests are most commonly prepared for the skill subjects.
End-of-Course Tests. Curriculum-based accountability assessments used to measure student achievement in specific high school courses.
End-of-Grade Tests. Achievement tests that provide information about the overall proficiency of students in specific content areas at the end of a grade. They measure how much a student has grown educationally in a year and whether or not groups of students are performing according to grade-level expectations. These tests are curriculum-based accountability tests, and scores are presented as developmental scale scores. The tests provide limited information about the strengths and weaknesses of individual students in specific skill areas.
Focused Holistic Score. Used in assessments that evaluate the writing skills of students in grades 4 and 7 and in English II (grade 10). Student compositions are assigned a score that reflects a trained reader's general impression of each composition's quality. The reader focuses on the set of prescribed composition characteristics (main idea, supporting details, coherence, and organization) and assigns a score based on the level of facility that the writer demonstrates.
Frequency. The number of objects, individuals, or events in a category. For example, the number of students enrolled in an elementary school or in an Algebra I course; the number of students in each grade level; and the number of students in third grade who scored 150 on their reading test.
Frequency Distribution. The number of objects, individuals, or events in each category of a variable. For example, the number of third-grade students in Xavier Elementary School scoring at each score point on the reading EOG test.
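A frequency distribution is essentially a table of counts, one count per score point. The sketch below builds one with Python's standard `collections.Counter`; the score list and the variable names are made up for illustration and are not North Carolina data.

```python
from collections import Counter

# Hypothetical grade 3 reading EOG scale scores for one school (illustrative only)
scores = [150, 148, 150, 152, 148, 150, 155]

# Frequency distribution: number of students at each score point
distribution = Counter(scores)

# Frequency for one category: third graders who scored 150
freq_150 = distribution[150]

# Print the distribution in score order
for score in sorted(distribution):
    print(score, distribution[score])
```

Looking up a single key gives the frequency for one category; iterating over all keys gives the full frequency distribution.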
Gain. Gain is computed as the difference between the average student test scores in the grade, group, or subject in Year 1 and the average test scores of the same grade, group, or subject in Year 2. For example, the difference between grade 3 reading scores in 1993 and grade 3 reading scores in 1994; or the difference between Algebra I test scores in 1994 and Algebra I test scores in 1995.
Growth. The cohort difference in test score achievement across grades/years. Growth is computed as the difference between the test scores (developmental scale scores) of a group in Year 1 and the test scores of the same group in Year 2. For example, the difference between grade 3 mathematics test scores in 1993 and grade 4 mathematics test scores in 1994 for the same group of students.
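The distinction between gain and growth is which two groups are compared: gain compares the same grade in two years (different students), while growth follows one cohort into the next grade. A minimal sketch, assuming made-up developmental scale scores keyed by (grade, year) and assuming the grade 4 list for 1994 holds the same students as the grade 3 list for 1993; the dictionary layout and the functions `gain` and `growth` are hypothetical.

```python
from statistics import mean

# Hypothetical developmental scale scores keyed by (grade, year); illustrative only.
scores = {
    (3, 1993): [140, 145, 150],  # grade 3 students in 1993
    (3, 1994): [142, 147, 152],  # next year's grade 3 students (a different group)
    (4, 1994): [148, 153, 158],  # assumed: the 1993 grade 3 cohort, now in grade 4
}

def gain(grade, year1, year2):
    """Gain: the same grade in two different years (different students)."""
    return mean(scores[(grade, year2)]) - mean(scores[(grade, year1)])

def growth(grade, year):
    """Growth: the same cohort, one grade higher one year later."""
    return mean(scores[(grade + 1, year + 1)]) - mean(scores[(grade, year)])

print(gain(3, 1993, 1994))  # 2
print(growth(3, 1993))      # 8
```

With these numbers the grade 3 average rises only 2 points from one year's class to the next, while the cohort itself grows 8 points between grades 3 and 4.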
Mean, Median, Mode. Measures of central tendency. The mean (or average) is the sum of the scores divided by the total number of scores. The median is the middle score in an ordered distribution of scores. The mode is the most frequently occurring score in the distribution. The mean is generally the best measure of central tendency when scores are normally distributed.
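The three measures of central tendency can be computed with Python's standard `statistics` module. The score list is hypothetical and chosen so the mode differs from the mean and median.

```python
from statistics import mean, median, mode

# Hypothetical test scores (illustrative only)
scores = [62, 65, 65, 70, 73, 75, 80]

print(mean(scores))    # sum of scores / number of scores: 490 / 7 = 70
print(median(scores))  # middle score of the ordered list: 70
print(mode(scores))    # most frequently occurring score: 65
```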
Mean Percent Correct. The average of the percentages of test questions students answered correctly.
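Mean percent correct averages each student's percentage correct, which in turn is that student's raw score divided by the number of questions. A small sketch, assuming a hypothetical 0/1 answer matrix (one row per student, 1 = correct); the names `responses` and `percent_correct` are illustrative.

```python
# Hypothetical scored answers: 1 = correct, 0 = incorrect; one row per student
responses = [
    [1, 1, 0, 1, 1],  # raw score 4 of 5 -> 80%
    [1, 0, 0, 1, 0],  # raw score 2 of 5 -> 40%
    [1, 1, 1, 1, 0],  # raw score 4 of 5 -> 80%
]

def percent_correct(answers):
    """A student's raw score (number correct) as a percentage of the questions."""
    return 100 * sum(answers) / len(answers)

# Mean percent correct: average of the students' percentages correct
mean_percent_correct = sum(percent_correct(r) for r in responses) / len(responses)
print(mean_percent_correct)  # (80 + 40 + 80) / 3, about 66.7
```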
N or Number. The number of students with valid test scores on a specific test, which may also be referred to as "N-counts."
Normal Curve Equivalents (NCEs). The normal curve equivalent (NCE) scale ranges from 1 to 99. NCEs have many of the same characteristics as percentile ranks but have the additional advantage of being based on an equal-interval scale. That is, the difference between two successive scores on the scale has the same meaning throughout the scale.
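This glossary does not give a formula for NCEs. The conventional definition is NCE = 50 + 21.06z, where z is the normal deviate corresponding to the percentile rank; the constant 21.06 makes NCEs of 1, 50, and 99 coincide with percentile ranks of 1, 50, and 99. A sketch under that assumption, using Python's `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist

# Assumed conventional NCE definition: NCE = 50 + 21.06 * z
NCE_SLOPE = 21.06

def percentile_rank_to_nce(pr):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < pr < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(pr / 100)  # standard normal deviate for this rank
    return 50 + NCE_SLOPE * z

for pr in (1, 10, 50, 90, 99):
    print(pr, round(percentile_rank_to_nce(pr), 1))
```

Percentile ranks of 10 and 90 map to NCEs of about 23 and 77, which shows how the equal-interval NCE scale stretches the middle of the percentile scale and compresses its tails.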
Norm-Referenced Test. A test that measures performance in relation to a comparison group.
Norming Year. The first year a test is administered; used for baseline comparative purposes. Percentile scores are computed based on the state distribution of scores from the norming year.
Percentiles. A percentile indicates the percentage of scores that fall below a given score. A commonly used percentile is the median, which is the 50th percentile. For example, a score at the 55th percentile means that the score is higher than the score achieved by 55 percent of all students taking the test in the year the test was normed. A score at the 55th percentile does not mean 55% of the questions were answered correctly.
Percentile Ranks. The percentile rank of a score is the percentage of scores less than or equal to that score. For example, the percentile rank of 63 is the percentage of scores in the distribution that fall at or below a score of 63. In Algebra I, the percentile rank of a score of 63 is 80. The percentile rank is a point on the percentile scale, whereas a percentile is a score, a point on the original measurement scale. Percentile ranks of scores can be compared across two or more distributions; percentiles cannot be used to determine differences in relative rank because the intervals between adjacent percentiles do not necessarily represent equal raw or scale score intervals.
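The "less than or equal to" definition of a percentile rank can be computed against a sorted norming-year distribution with Python's `bisect.bisect_right`, which returns the count of values at or below the target. The norming scores below are invented and are not the Algebra I norms; other conventions (for example, counting only half of the tied scores) exist and would give slightly different values.

```python
from bisect import bisect_right

# Hypothetical norming-year scores, sorted ascending (illustrative only)
norm_scores = sorted([48, 52, 55, 57, 60, 63, 63, 66, 70, 75])

def percentile_rank(score, norm_scores):
    """Percentage of norming-year scores less than or equal to `score`.

    `norm_scores` must be sorted in ascending order.
    """
    at_or_below = bisect_right(norm_scores, score)  # count of scores <= score
    return 100 * at_or_below / len(norm_scores)

print(percentile_rank(63, norm_scores))  # 7 of 10 scores are <= 63 -> 70.0
```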
Range. A measure of variability. The range is the difference between the highest and lowest scores in the distribution.
Raw Score. The number of test questions a student answers correctly.
Scale Scores. Scores that express how far a given raw score (number correct) is from a reference point, usually the mean score. Scale scores are units of an equal-interval scale. The difference between two successive scores on the scale has the same meaning throughout the scale. The advantage of scale scores is that they allow an individual's performance to be compared on several different tests, regardless of the raw score scales.
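One simple way to express a raw score as a distance from the mean is a linear transformation of its z-score onto a chosen scale. This is only an illustration of the idea in the definition; it is not necessarily how the North Carolina developmental scales were built. The target mean of 500 and standard deviation of 100, the two raw-score lists, and the function name are all assumptions.

```python
from statistics import mean, pstdev

def linear_scale_scores(raw_scores, target_mean=500, target_sd=100):
    """Map raw scores onto a scale with a chosen mean and standard deviation.

    Each result records how many raw-score standard deviations a score lies
    from the raw-score mean (its z-score), re-expressed in scale units.
    """
    m, s = mean(raw_scores), pstdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]

# Two hypothetical tests with different numbers of questions
test_a = [20, 30, 40]  # out of 50 questions
test_b = [44, 60, 76]  # out of 80 questions
print(linear_scale_scores(test_a))
print(linear_scale_scores(test_b))
```

Although the raw scores differ, students in the same relative position on each test receive the same scale score (about 377.5, 500, and 622.5), which is the kind of cross-test comparison the definition describes.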
Standard Deviation. A measure of variability; an indication of how much variation there is in the scores, i.e., how close together or how spread out the scores are. The standard deviation measures the extent to which scores or events in a distribution, on the average, deviate from the mean. The standard deviation is the square root of the variance.
For example: SAT Verbal scores are reported as scale scores. The scores range from 200 to 800. The mean is 500. To estimate the standard deviation of normally distributed scores, divide the range of scores by 6 (600/6 = 100).
Statistical significance = 1/4 of a standard deviation (100/4 = 25). Using the SAT example, a score of 485 is within 25 points of the mean. In a statistical context, this score is not significantly different from the mean.
Educational significance = 1/3 of a standard deviation (100/3 = 33.3). Using the SAT example, a score of 465 is not within 33 points of the mean. In an educational context, this score is significantly different from the mean score.
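The arithmetic of the SAT example can be written out directly. This sketch applies the rules of thumb stated in this glossary (standard deviation estimated as range / 6, with thresholds of 1/4 and 1/3 of a standard deviation); the constant and function names are illustrative, and the rules themselves are the document's conventions rather than formal significance tests.

```python
# SAT-style scale: lowest score, highest score, and mean from the example
LOW, HIGH, MEAN = 200, 800, 500

sd = (HIGH - LOW) / 6                 # 600 / 6 = 100.0
statistical_threshold = sd / 4        # 25.0 points
educational_threshold = sd / 3        # about 33.3 points

def differs_from_mean(score, threshold, mean=MEAN):
    """True when the score lies more than `threshold` points from the mean."""
    return abs(score - mean) > threshold

for score in (485, 465):
    print(score,
          differs_from_mean(score, statistical_threshold),
          differs_from_mean(score, educational_threshold))
```

A score of 485 lies 15 points from the mean and clears neither threshold; a score of 465 lies 35 points from the mean, beyond both 25 and 33.3 points.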
Reliability. The extent to which a test is consistent in measuring whatever it purports to measure; an indication of dependability, stability, trustworthiness, and relative freedom from errors of measurement. Reliability is usually expressed as a reliability coefficient or by the standard error of measurement derived from it.
Standard Error of Measurement. As applied to a single obtained score, the amount by which the score may differ from the hypothetical true score due to errors of measurement. The larger the standard error of measurement, the less reliable the score. For about two-thirds of the scores in a group, the obtained score will not differ by more than one standard error of measurement from the true score.
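In classical test theory the standard error of measurement is derived from the reliability coefficient as SEM = SD × √(1 − reliability); this formula is standard but is not stated in this glossary. The sketch below uses it to build the kind of limited score range the Code of Ethics recommends reporting. The standard deviation of 10, the reliability of 0.91, the observed score of 150, and both function names are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sem, width=1.0):
    """Score range of +/- `width` SEMs around an observed score."""
    return observed - width * sem, observed + width * sem

# Hypothetical test statistics (illustrative only)
sem = standard_error_of_measurement(sd=10, reliability=0.91)
low, high = score_band(150, sem)
print(round(sem, 2), round(low, 1), round(high, 1))
```

Here the SEM is about 3 points, so an observed score of 150 would be reported as a band of roughly 147 to 153; a more reliable test would give a narrower band.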
Standardized Test. A test designed to provide a systematic sample of individual performance, administered according to prescribed directions, scored in conformance with definite rules, and interpreted in reference to certain normative information.
Test Item Banks and Testlets. Item banks are pools of items, available electronically, from which teachers can select items for use in the classroom. Test item banks are designed to match the curriculum goals and objectives, which allows the teacher to assess achievement/mastery immediately after instruction. Item banks can be used to generate tests to diagnose student strengths and weaknesses.
Testlets are a resource developed by NCDPI for teachers and administrators to operationalize and implement the revised North Carolina Standard Course of Study. Testlets are mini-tests that measure specific areas of the curriculum. The same items are found both in the test item banks developed by NCDPI and in the testlets.
Validity. The extent to which a test measures that which it purports to measure. All procedures for determining test validity are concerned with the relationships between performance on the test and other independently observable facts. Types of validity include content-related validity, criterion-related validity, construct-related validity, and predictive validity.
Variance. A measure of variability derived by subtracting the mean test score from each student's test score, squaring the difference, summing the squares, and dividing by the number of students. If the variance is small, the scores are close together; if the variance is large, the scores are more spread out.
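The variance procedure described above translates step by step into code, and the standard deviation and range follow from it. The score list is hypothetical. Dividing by the number of students, as the definition does, gives the population variance; a sample estimate would divide by one less than the number of students.

```python
import math

# Hypothetical test scores (illustrative only)
scores = [60, 65, 70, 75, 80]

n = len(scores)
mean = sum(scores) / n  # 70.0

# Variance: subtract the mean from each score, square the difference,
# sum the squares, and divide by the number of students
variance = sum((x - mean) ** 2 for x in scores) / n  # 250 / 5 = 50.0

# Standard deviation: square root of the variance (about 7.07)
std_dev = math.sqrt(variance)

# Range: highest score minus lowest score
score_range = max(scores) - min(scores)  # 20

print(variance, round(std_dev, 2), score_range)
```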
U.S. Department of Education
Office of Educational Research and Improvement (OERI)
National Library of Education (NLE)
Educational Resources Information Center (ERIC)
NOTICE
Reproduction Basis
TM031684
This document is covered by a signed "Reproduction Release (Blanket)" form (on file within the ERIC system), encompassing all or classes of documents from its source organization and, therefore, does not require a "Specific Document" Release form.
This document is Federally-funded, or carries its own permission to reproduce, or is otherwise in the public domain and, therefore, may be reproduced by ERIC without a signed Reproduction Release form (either "Specific Document" or "Blanket").