Evaluating Pretest to Posttest Score Differences in CAP Science and Social Studies Assessments: How Much Growth is Enough?
February 2014
Dale Whittington, Ph.D. – Shaker Heights; Russ Brown, Ph.D. – CMSD; Denis Jarvinen, Ph.D. – Strategic Measurement and Evaluation, Inc.
Evaluating Pretest to Posttest Score Differences in CAP Science and Social Studies Assessments: How Much Growth is Enough?
Denis Jarvinen, Ph.D. – Strategic Measurement and Evaluation, Inc.
Licensing Tests (e.g., Pharmacists): One Test and One Standard for Performance (Pass or Fail)
State Accountability Testing (e.g., Ohio OAA): One Test, Multiple Standards (Below Basic, Basic, Proficient, Advanced)
CAP Foundation Science and Social Studies Assessments: Two Tests, One Standard (Evaluating Growth: How Much?)
Setting Standards for Performance
Looking at Performance Standards
Content-Based Standards
Goal of standard setting is to determine a level of knowledge and skill judged to be appropriate for the test's purpose
Growth-Based Standards
Goal of standard setting is to use common statistical feature(s) of the data to set a criterion for acceptable performance
Three Statistic-Based Approaches for Evaluating Growth of Student Scores
• Using Effect Size
• Using the Score Distribution
• Using the Standard Error of Measurement
Describing and Comparing Approaches
• Data Points Needed
• Calculations Required
• Outcomes Using a Common Set of Student Data
• Advantages and Disadvantages
The Common Data Set
Shaker Heights Schools
Prepared by Dale Whittington, Shaker Heights City School District
Ohio Middle Level Annual Conference, Columbus, Ohio
February 21, 2014
Effect Size for SLOs and Growth
What is effect size?
• In an educational setting, effect size is one way to measure the effectiveness of a particular intervention.
• Effect size lets us measure the improvement (gain) in achievement for a group of learners AND, at the same time, take into account the variation in student performance.
Adapted from Understanding, using and calculating effect size, Govt of South Australia, Department of Education & Child Development, http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf
Practical Advantages
• Easy to calculate
• Easy to understand; makes intuitive sense
• Adaptable to different kinds of assessments
• Adaptable to different ways of considering growth and goals for SLOs:
– Shared attribution across the district
– Shared attribution within a school
– Attribution for a specific teacher or group of students
So how do you calculate effect sizes for SLOs or growth?
Student Pretest Posttest Difference (AKA Gain)
Denis 40 35 -5
Donna 25 30 +5
Dale 45 50 +5
Russ 30 40 +10
Start with a set of pretest scores and posttest scores for the same students
Calculate the difference (posttest minus pretest) for each student
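A minimal Python sketch of these two steps, using the four example students. The dictionary layout and the `gains` function name are illustrative, not from the presentation; gain is taken as posttest minus pretest, which matches the signs in the table.

```python
# Example scores from the slide: student -> (pretest, posttest)
scores = {
    "Denis": (40, 35),
    "Donna": (25, 30),
    "Dale": (45, 50),
    "Russ": (30, 40),
}

def gains(score_table):
    """Return each student's gain, defined as posttest minus pretest."""
    return {name: post - pre for name, (pre, post) in score_table.items()}

print(gains(scores))  # {'Denis': -5, 'Donna': 5, 'Dale': 5, 'Russ': 10}
```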
Calculations Continued
• Pretest
– Mean: 35.0
– SD: 9.1
• Posttest
– Mean: 38.8
– SD: 8.5
• The average of 9.1 and 8.5 is 8.8
Calculate the means and standard deviations for both tests
Average the Standard Deviations
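These figures can be checked with Python's standard `statistics` module. This sketch assumes the slide used the sample standard deviation (n − 1 denominator), since that is the choice that reproduces 9.1 and 8.5; the population SD would give about 7.9 and 7.4 instead.

```python
import statistics

pretest = [40, 25, 45, 30]
posttest = [35, 30, 50, 40]

def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)

pre_mean, pre_sd = mean_and_sd(pretest)
post_mean, post_sd = mean_and_sd(posttest)
avg_sd = (pre_sd + post_sd) / 2  # simple average of the two SDs, as on the slide

print(f"Pretest:  mean {pre_mean:.1f}, SD {pre_sd:.1f}")
print(f"Posttest: mean {post_mean:.1f}, SD {post_sd:.1f}")
print(f"Average SD: {avg_sd:.1f}")
```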
How to adapt
• If your pretest and posttest are different lengths, convert both to a common scale, such as percent correct.
• Think about whose performance the analysis is based on, and use that to decide which standard deviation (SD) to use:
– Common attribution for the district: district SD
– Common attribution for a school: school SD
– Class: class SD
– Specific group, such as economically disadvantaged students: that group's SD
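Both adaptations are easy to express in code. This is a sketch under stated assumptions: `to_percent` and `group_sd` are hypothetical helper names, the test lengths (50 and 60 points) and the school records are made-up illustrations, and the group SD uses the sample standard deviation, as in the worked example.

```python
import statistics

def to_percent(raw_score, max_points):
    """Put a raw score on a 0-100 percent-correct scale."""
    return 100.0 * raw_score / max_points

def group_sd(records, in_group):
    """Sample SD of the scores for the records selected by `in_group`."""
    selected = [score for (_, _, score) in records if in_group(records, score) is None or True] if False else [
        rec[2] for rec in records if in_group(rec)
    ]
    return statistics.stdev(selected)

# Hypothetical: a 50-point pretest and a 60-point posttest
print(to_percent(40, 50), to_percent(48, 60))

# Hypothetical records: (student, school, pretest percent)
records = [
    ("s1", "North", 80.0),
    ("s2", "North", 50.0),
    ("s3", "South", 90.0),
    ("s4", "South", 60.0),
]
district_sd = group_sd(records, lambda rec: True)           # district attribution
north_sd = group_sd(records, lambda rec: rec[1] == "North")  # school attribution
print(f"District SD {district_sd:.1f}, North SD {north_sd:.1f}")
```

The same gain divided by a district SD and by a school SD can give noticeably different effect sizes, which is why the choice of attribution matters.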
Use the average standard deviation and the gains to calculate the effect size:
Student Pretest Posttest Gain Effect Size
Denis 40 35 -5 -.57
Donna 25 30 +5 +.57
Dale 45 50 +5 +.57
Russ 30 40 +10 +1.14
Effect Size = Gain / SD
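The per-student effect sizes can be reproduced with a few lines of Python. This sketch divides each gain by the rounded average SD of 8.8, as the slide does; the `effect_size` function name is illustrative.

```python
# Example data from the slide: (student, pretest, posttest)
students = [
    ("Denis", 40, 35),
    ("Donna", 25, 30),
    ("Dale", 45, 50),
    ("Russ", 30, 40),
]
AVG_SD = 8.8  # average of the pretest and posttest SDs, rounded as on the slide

def effect_size(gain, sd):
    """Effect size for one student: gain divided by the chosen SD."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return gain / sd

for name, pre, post in students:
    gain = post - pre
    print(f"{name:<6} gain {gain:+3d}  effect size {effect_size(gain, AVG_SD):+.2f}")
```

Rounding matters slightly: with the unrounded average SD (about 8.83), Russ's effect size comes out as 1.13 rather than 1.14.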
Interpret your results: Common criteria
Cohen (1969)
• ‘Small’ (.2)
– real, but difficult to detect
– difference between the heights of 15-year-old and 16-year-old girls in the US
• ‘Medium’ (.5)
– ‘large enough to be visible to the naked eye’
– difference between the heights of 14- and 18-year-old girls
• ‘Large’ (.8)
– ‘grossly perceptible and therefore large’
– difference between the heights of 13- and 18-year-old girls
Hattie: “For students moving from one year to the next, the average effect size across all students is 0.40.”
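One way to apply these benchmarks in code is sketched below. Cohen offered .2, .5 and .8 as rough conventions rather than hard cut points. Labeling a score by the largest benchmark it reaches, calling anything under .2 (including score declines) "below small", and comparing against Hattie's 0.40 with "at or above" are all assumptions of this sketch, and the function names are hypothetical.

```python
# Cohen's (1969) conventional benchmarks, largest first
COHEN_BENCHMARKS = [(0.8, "large"), (0.5, "medium"), (0.2, "small")]
HATTIE_AVERAGE = 0.40  # Hattie's average effect size for one year of schooling

def cohen_label(effect_size):
    """Label an effect size by the largest Cohen benchmark it reaches."""
    for cutoff, label in COHEN_BENCHMARKS:
        if effect_size >= cutoff:
            return label
    return "below small"  # assumption: also covers negative growth

def beats_typical_year(effect_size):
    """True if growth is at or above Hattie's typical one-year effect."""
    return effect_size >= HATTIE_AVERAGE

for es in (-0.57, 0.57, 1.14):
    print(es, cohen_label(es), beats_typical_year(es))
```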
How results differ, depending on attribution and how you tier students
Another Example based on OAA
Resources
• Understanding, using and calculating effect size. Government of South Australia, Department of Education & Child Development.
Applied to 4th Grade Benchmark to OAA (Fall to Spring):
Percent of Students Achieving the Goal | Building Growth Rating | Translation | IRN Count | Mean Value-Added Index
90-100% | 5 | Above | 2 | 1.96
60-89% | 2-4 | Met | 50 | -.68
0-59% | 1 | Below | 13 | -1.56
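The slide does not spell out what the per-student goal is or how the 2-4 ratings are subdivided, so this sketch maps only to the Translation column. It assumes a student "achieves the goal" when their effect size reaches a chosen target (the 0.40 below is a hypothetical choice echoing Hattie's figure) and that the bands are contiguous, so a value such as 89.5% counts as "Met". Function names are hypothetical.

```python
def percent_meeting_goal(effect_sizes, goal):
    """Percent of students whose effect size is at or above the goal."""
    if not effect_sizes:
        raise ValueError("need at least one student")
    met = sum(1 for es in effect_sizes if es >= goal)
    return 100.0 * met / len(effect_sizes)

def building_translation(percent):
    """Map a building's percent meeting the goal to the slide's bands."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return "Above"
    if percent >= 60:
        return "Met"
    return "Below"

# Effect sizes of the four example students; the 0.40 goal is hypothetical
pct = percent_meeting_goal([-0.57, 0.57, 0.57, 1.14], goal=0.40)
print(pct, building_translation(pct))
```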
Pros and Cons
+ Students with like scores have like expectations for growth
+ Relatively simple and relatively transparent
- Must make a value judgment about the amount of error for which one wishes to compensate (not so transparent)
- More adjustment = more bias at the bottom
Standard Error of Measurement
Every observed score is made up of a “true” score plus “error”
• Error bands on score reports
The standard error quantifies the degree of “error” in a test score
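Formula is: Standard Error of Measurement $\mathrm{SEM} = SD\sqrt{1 - r_{xx}}$, where $SD$ is the standard deviation of the scores and $r_{xx}$ is the reliability of the test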
Values needed: Mean, Standard Deviation, Reliability of the Test
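The formula is simple to apply once a reliability coefficient is in hand. The presentation does not say which reliability estimate it used; this sketch assumes Cronbach's alpha computed from item-level scores (for items scored 0/1 this equals KR-20). The function names and the tiny item matrix are illustrative.

```python
import math
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row of item scores per student."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [statistics.pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

# Hypothetical right/wrong (1/0) scores: 4 students x 3 items
items = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(items)
totals = [sum(row) for row in items]
sem = standard_error_of_measurement(statistics.stdev(totals), alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

A perfectly reliable test (reliability 1) has an SEM of zero; a test with reliability 0 has an SEM equal to its SD.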
Assumptions that underlie this approach
Steps
1) For a set of data, calculate the mean and standard deviation
2) Calculate the reliability of the test
3) Use the formula to determine the Standard Error of Measurement (class level, school level)
4) Set a level for the growth standard (1 SE, 2 SE, etc.)
5) Add the chosen multiple of the standard error to the student's pretest raw score
6) Convert (raw score + standard error) to percent correct on pretest
7) Find corresponding percent correct/raw score on posttest
(Note: Assumptions here not required once IRT equating is completed)
8) Compare actual student posttest score with target score
9) At or above target score = “Acceptable Progress”
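Steps 4 through 9 can be sketched for a single student as follows. The function names, the "Below target" and "Target out of range" labels, and the example values (a 50-point pretest, a 60-point posttest, an SEM of 4 raw-score points, a 1 SE standard) are hypothetical. The SEM is assumed to be in pretest raw-score units because step 5 adds it to the pretest raw score, and the posttest equivalence uses the equal-percent-correct assumption from step 7.

```python
def posttest_target(pre_raw, pre_max, post_max, sem, k=1.0):
    """Steps 4-7: posttest raw-score target for one student.

    Adds k standard errors to the pretest raw score, converts the result to
    percent correct on the pretest, and treats the same percent correct on
    the posttest as the equivalent score.
    """
    target_fraction = (pre_raw + k * sem) / pre_max
    return target_fraction * post_max

def evaluate_student(pre_raw, post_raw, pre_max, post_max, sem, k=1.0):
    """Steps 8-9: compare the actual posttest score with the target."""
    target = posttest_target(pre_raw, pre_max, post_max, sem, k)
    if target > post_max:
        status = "Target out of range"  # hypothetical label: unreachable target
    elif post_raw >= target:
        status = "Acceptable Progress"
    else:
        status = "Below target"  # hypothetical label
    return target, status

# Hypothetical (pretest, posttest) pairs on a 50-point pretest and 60-point posttest
for pre, post in [(30, 42), (30, 40), (48, 60)]:
    target, status = evaluate_student(pre, post, 50, 60, sem=4.0)
    print(f"pretest {pre}, posttest {post}: target {target:.1f} -> {status}")
```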
Calculations for one student
Results
Observations
High pretest scores can produce posttest score targets above the maximum possible posttest score.
Any modification to the sample that increases the Standard Deviation will increase the value of the Standard Error and therefore require more score growth to reach the target.
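The second observation follows from the SEM formula: with reliability held fixed, SEM is proportional to SD, so a larger SD raises every student's target. A small sketch with hypothetical values (reliability 0.84, a 50-point pretest, a 60-point posttest, a pretest score of 30). Holding reliability fixed is itself an assumption; in practice, changing the sample can also change the reliability estimate.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def posttest_target(pre_raw, pre_max, post_max, sem_value, k=1.0):
    """Pretest raw score plus k SEMs, carried to the posttest as percent correct."""
    return (pre_raw + k * sem_value) / pre_max * post_max

RELIABILITY = 0.84          # hypothetical, held fixed
PRE_MAX, POST_MAX = 50, 60  # hypothetical test lengths

for sd in (5.0, 10.0):
    s = sem(sd, RELIABILITY)
    target = posttest_target(30, PRE_MAX, POST_MAX, s)
    print(f"SD {sd:.1f}: SEM {s:.2f}, posttest target for a pretest score of 30: {target:.1f}")
```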