Using Assessment Data for Educator and Student Growth

Andy Hegedus, Ed.D. Kingsbury Center at NWEA

June 2014


• Increase your understanding of various urgent assessment-related topics
– Ask better questions
– Useful for making all types of decisions with data

My Purpose

1. Alignment between the content assessed and the content to be taught

2. Selection of an appropriate assessment
• Used for the purpose for which it was designed (proficiency vs. growth)
• Can accurately measure the knowledge of all students
• Adequate sensitivity to growth

3. Adjust for context/control for factors outside a teacher’s direct control (value-added)

Three primary conditions

1. Assessment results used wisely as part of a dialogue to help teachers set and meet challenging goals

2. Use of tests as a “yellow light” to identify teachers who may be in need of additional support or are ready for more

Two approaches we like

• What we’ve known to be true is now being shown to be true
– Using data thoughtfully improves student achievement and growth rates
– 12% mathematics, 13% reading

• There are dangers present, however
– Unintended Consequences

Go forth thoughtfully, with care

Slotnik, W. J., & Smith, M. D., It’s more than money, February 2013, retrieved from http://www.ctacusa.com/PDFs/MoreThanMoney-report.pdf

“What gets measured (and attended to), gets done”

Remember the old adage?

• NCLB
– Cast light on inequities
– Improved performance of “Bubble Kids”
– Narrowed taught curriculum

The same dynamic happens inside your schools

An infamous example

It’s what we do that counts

A patient’s health doesn’t change because we know their blood pressure

It’s our response that makes all the difference

Be considerate of the continuum of stakes involved

Support

Compensate

Terminate

Increasing levels of required rigor

Increasing risk

Marcus Normal Growth Needed Growth

Marcus’ growth

College readiness standard

The Test

The Growth Metric

The Evaluation

The Rating

There are four key steps required to answer this question

Top-Down Model

Assessment 1

Goal Setting

Assessment(s)

Results and Analysis

Evaluation (Rating)

How does the other popular process work?

Bottom-Up Model (Student Learning Objectives)

An understanding of all four of the top-down elements is needed here

The Test

The Growth Metric

The Evaluation

The Rating

Let’s begin at the beginning

3rd Grade ELA Standards

3rd Grade ELA Teacher?

3rd Grade Social Studies Teacher?

Elem. Art Teacher?

What is measured should be aligned to what is to be taught

1. Answer questions to demonstrate understanding of text….

2. Determine the main idea of a text….

3. Determine the meaning of general academic and domain specific words…

Would you use a general reading assessment in the evaluation of a….

~30% of teachers teach in tested subjects and grades

The Other 69 Percent: Fairly Rewarding the Performance of Teachers of Nontested Subjects and Grades, http://www.cecr.ed.gov/guides/other69Percent.pdf

• Assessments should align with the teacher’s instructional responsibility
– Specific advanced content
• HS teachers teaching discipline specific content
– Especially 11th and 12th grade
• MS teachers teaching HS content to advanced students
– Non-tested subjects
• School-wide results are more likely “professional responsibility” rather than reflecting competence
– HS teachers providing remedial services

What is measured should be aligned to what is to be taught

• Many assessments are not designed to measure growth

• Others do not measure growth equally well for all students

The purpose and design of the instrument is significant


Let’s ensure we have similar meaning

[Figure: scale running from Beginning Literacy to Adult Reading, with 5th Grade marked and a student's scores (x) at Time 1 and Time 2. Labels: Status, Growth.]

Two assumptions:
1. Measurement accuracy, and
2. Vertical interval scale

Accurately measuring growth depends on accurately measuring achievement

Questions surrounding the student’s achievement level

The more questions the merrier

What does it take to accurately measure achievement?

Teachers encounter a distribution of student performance

[Figure: scale running from Beginning Literacy to Adult Reading, with 5th Grade marked. Individual students (x) are spread along the scale around Grade Level Performance.]

Adaptive testing works differently

Item bank can span full range of achievement

How about accurately measuring height?

What if the yardstick stopped in the middle of his back?

Items available need to match student ability

California STAR NWEA MAP

How about accurately measuring height?

What if we could only mark within a pre-defined six inch range?

5th Grade Level Items

These differences impact measurement error

[Figure: test information (.00 to .12) plotted against scale score (160 to 240) for a Fully Adaptive Test and a Constrained Adaptive or Paper/Pencil Test. Annotation: Significantly Different Error.]

To determine growth, achievement measurements must be related through a scale

If I was measured as: 5’ 9”

And a year later I was: 1.82 m

Did I grow? Yes. ~ 2.5”

How do you know?

Let’s measure height again
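The height question can only be answered after both measurements are placed on one scale. The sketch below converts both to inches; the function names are illustrative only. The computed difference comes to roughly 2.65 inches, close to the slide's rough "~ 2.5" figure.

```python
INCHES_PER_FOOT = 12
CM_PER_INCH = 2.54  # exact by definition


def feet_inches_to_inches(feet, inches):
    """Convert a feet-and-inches measurement to inches."""
    return feet * INCHES_PER_FOOT + inches


def meters_to_inches(meters):
    """Convert a metric measurement to inches."""
    return meters * 100 / CM_PER_INCH


first = feet_inches_to_inches(5, 9)   # first measurement: 5' 9"
second = meters_to_inches(1.82)       # a year later: 1.82 m
growth = second - first               # only meaningful on a common scale

print(f"First: {first} in, second: {second:.2f} in, growth: {growth:.2f} in")
```

Without the conversion step, subtracting 5' 9" from 1.82 m has no meaning, which is the point of requiring a common scale for test scores as well.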

Traditional assessment uses items reflecting the grade level standards

[Figure: scale running from Beginning Literacy to Adult Reading, with the 4th Grade, 5th Grade, and 6th Grade level standards marked. Label: Traditional Assessment Item Bank.]

[Figure: the same scale with the 4th Grade, 5th Grade, and 6th Grade level standards marked.]

Overlap allows linking and scale construction


Black, P. and Wiliam, D. (2007) 'Large-scale assessment systems: Design principles drawn from international comparisons', Measurement: Interdisciplinary Research & Perspective, 5(1), 1-53

• …when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction.

The instrument must be able to detect instruction

Black, P. and Wiliam, D. (2007) 'Large-scale assessment systems: Design principles drawn from international comparisons', Measurement: Interdisciplinary Research & Perspective, 5(1), 1-53

• When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction.

The more complex, the harder to detect and attribute to one teacher

• Tests specifically designed to inform classroom instruction and school improvement in formative ways

No incentive in the system for inaccurate data

Using tests in high stakes ways creates a new dynamic

[Figure: Mean value-added growth by school (schools 1 to 71, values from -6.00 to 10.00), comparing students taking 10+ minutes longer in spring than in fall with all other students.]

New phenomenon when used as part of a compensation program

Cheating

Atlanta Public Schools
Crescendo Charter Schools
Philadelphia Public Schools
Washington DC Public Schools
Houston Independent School District
Michigan Public Schools

http://www.ajc.com/news/atlanta/atlanta-public-schools-cheating-1026035.html

http://www.huffingtonpost.com/2011/03/02/crescendo-charter-schools-cheating_n_830415.html

http://articles.philly.com/2012-03-11/news/31145463_1_city-charter-schools-vanguard-schools-education-secretary-ronald-tomalis

http://www.huffingtonpost.com/2011/07/08/dc-schools-release-test-s_n_893235.html

http://www.freep.com/article/20110308/NEWS05/103080333/Michigan-review-suspect-school-test-scores

When teachers are evaluated on growth using a once per year assessment, one teacher who cheats disadvantages the next teacher

Other consequence

• Both a proctor and the teacher should be present during testing
– Teacher can best guide students and ensure effort
– Proctor protects integrity of results and can support defense of teacher if results are challenged
• Have all students test each term
– Need two terms to determine growth
– The more students aggregated, the more you know

Proctoring

• Important for reliable test data, particularly when determining growth
• Use Testing Condition Indicators as KPIs
– Accuracy, duration, changes in duration
– Formative conversations to improve over time
• Short test durations are worth considering for follow-up
– Apply criteria each test event
• Be concerned more with consistency in test duration than duration itself

Consistent Testing Conditions
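One way to track consistency in test duration is to compare each student's duration across two test events and flag large changes. The sketch below is illustrative only: the student names and durations are made up, and the 10-minute threshold is an assumption borrowed from the "10+ minutes longer" grouping in the earlier chart, not a recommended criterion.

```python
def flag_duration_changes(durations, threshold_minutes=10):
    """Flag students whose test duration changed by at least the threshold.

    durations maps a student to (fall_minutes, spring_minutes).
    Changes in either direction are flagged, since the concern is consistency.
    """
    return {
        student: abs(spring - fall) >= threshold_minutes
        for student, (fall, spring) in durations.items()
    }


# Hypothetical durations in minutes (fall, spring).
sample = {
    "Student 1": (42, 45),
    "Student 2": (38, 55),
    "Student 3": (50, 31),
}

flags = flag_duration_changes(sample)
print(flags)
```

Flags like these are a starting point for the formative conversations described above rather than a finding in themselves.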

• Pause or terminate before completion
– Preferred option – Address when problems are identified
– Not subject to challenge that student retested simply because the score wasn’t good enough
• Monitor students as testing is going on
– Ensure effort
– Support students as they struggle – G&T

• Show that accurate data is important

Early Intervention

• Define “Significant” decline between test events
– Apply significant decline criteria each test term

• Simply missing cut score is not an acceptable reason to retest

Retesting

Testing is complete . . . What is useful to answer our question?

The Test

The Growth Metric

The Evaluation

The Rating

The metric matters - Let’s go underneath “Proficiency”

[Figure: Difficulty of New York Cut Score Between Level 2 and 3, shown as national percentile (0 to 100) for Reading and Math in Grade 2 through Grade 8, with a College Readiness label.]

A study of the alignment of the NWEA RIT scale with the New York State (NYS) Testing Program, November 2013

Difficulty of ACT college readiness standards

The metric matters - Let’s go underneath “Proficiency”

Dahlin, M. and Durant, S., The State of Proficiency, Kingsbury Center at NWEA, July 2011

What gets measured and attended to really does matter

[Figure: Mathematics. One district’s change in 5th grade mathematics performance relative to the KY proficiency cut scores. Number of students by fall RIT, grouped as Up, Down, or No Change, with Proficiency and College Readiness marked.]

[Figure: Mathematics. Number of 5th grade students meeting projected mathematics growth in the same district. Number of students by student’s score in fall, grouped as below projected growth or met or above projected growth.]

Changing from Proficiency to Growth means all kids matter

• What did you just learn?
• How will you change what you typically do?

Guiding Questions

How can we make it fair?

The Test

The Growth Metric

The Evaluation

The Rating

Without context what is “Good”?

[Figure: a scale from Beginning Reading to Adult Literacy, shown alongside four sources of context: national percentiles from the norms study, ACT college readiness benchmarks, state test performance levels (“Meets” proficiency), and Common Core performance levels (Proficient).]

Normative data for growth is a bit different

Fall Score

Subject: Reading

Grade: 4th

7 points

FRL vs. non-FRL?

IEP vs. non-IEP?

ESL vs. non-ESL?

Outside of a teacher’s direct control

Starting Achievement

Instructional Weeks

Basic Factors

Typical growth

How did we address requirements in New York?

State Tested Grades / Subjects (4-8 Math and Reading)
APPR (60% / 20% / 20%): Observations, State Test Growth, EA Value-Added

Other Grades / Subjects for which there is an available non-state test
APPR (60% / 20% / 20%): Observations, Local Measure 2 (SLO), EA Value-Added

Partnered with Education Analytics on VAM

The Oak Tree Analogy* – a conceptual introduction to the metric

*Developed at the Value-Added Research Center

An Introduction to Value-Added

The Oak Tree Analogy

Gardener A Gardener B

Explaining Value-Added by Evaluating Gardener Performance

• For the past year, these gardeners have been tending to their oak trees trying to maximize the height of the trees.

Method 1: Measure the Height of the Trees Today (One Year After the Gardeners Began)

Gardener A: Oak A is 61 in.
Gardener B: Oak B is 72 in.

• Using this method, Gardener B is the more effective gardener.

This method is analogous to using an Achievement Model.

Oak A: 47 in. at age 3 (1 year ago), 61 in. at age 4 (today)
Oak B: 52 in. at age 3 (1 year ago), 72 in. at age 4 (today)

This Achievement Result is not the Whole Story

• We need to find the starting height for each tree in order to more fairly evaluate each gardener’s performance during the past year.

Method 2: Compare Starting Height to Ending Height

Oak A: 47 in. at age 3 (1 year ago), 61 in. at age 4 (today), +14 in.
Oak B: 52 in. at age 3 (1 year ago), 72 in. at age 4 (today), +20 in.

• Oak B had more growth this year, so Gardener B is the more effective gardener.

This is analogous to a Simple Growth Model, also called Gain.
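The simple gain comparison can be written out as a subtraction of starting height from ending height. This is a minimal sketch using the heights from the slides; the dictionary layout and names are illustrative only.

```python
# Heights in inches from the oak tree example: (one year ago, today).
heights = {
    "Oak A": (47, 61),
    "Oak B": (52, 72),
}

# Simple growth (gain) is the ending height minus the starting height.
gains = {tree: end - start for tree, (start, end) in heights.items()}

most_growth = max(gains, key=gains.get)
print(gains, most_growth)
```

Gain accounts for where each tree started, but it still ignores the conditions each tree grew in.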


What About Factors Outside the Gardener’s Influence?

• This is an “apples to oranges” comparison.
• For our oak tree example, three environmental factors we will examine are: Rainfall, Soil Richness, and Temperature.

External condition    Oak Tree A    Oak Tree B
Rainfall amount       High          Low
Soil richness         Low           High
Temperature           High          Low



How Much Did These External Factors Affect Growth?

• We need to analyze real data from the region to predict growth for these trees.
• We compare the actual height of the trees to their predicted heights to determine if the gardener’s effect was above or below average.

In order to find the impact of rainfall, soil richness, and temperature, we will plot the growth of each individual oak in the region compared to its environmental conditions.

Growth in inches relative to the average

Condition        Low    Medium    High
Rainfall         -5     -2        +3
Soil Richness    -3     -1        +2
Temperature      +5     -3        -8

Calculating Our Prediction Adjustments Based on Real Data

Make Initial Prediction for the Trees Based on Starting Height

Oak A (Gardener A): 47 in. at age 3 (1 year ago) + 20 average = 67 in. predicted
Oak B (Gardener B): 52 in. at age 3 (1 year ago) + 20 average = 72 in. predicted

• Next, we will refine our prediction based on the growing conditions for each tree. When we are done, we will have an “apples to apples” comparison of the gardeners’ effect.

Based on Real Data, Customize Predictions based on Rainfall

Oak A (Gardener A): 47 in. + 20 average + 3 for Rainfall = 70 in. predicted
Oak B (Gardener B): 52 in. + 20 average - 5 for Rainfall = 67 in. predicted

• For having high rainfall, Oak A’s prediction is adjusted by +3 to compensate.
• Similarly, for having low rainfall, Oak B’s prediction is adjusted by -5 to compensate.

Adjusting for Soil Richness

Oak A (Gardener A): 47 in. + 20 average + 3 for Rainfall - 3 for Soil = 67 in. predicted
Oak B (Gardener B): 52 in. + 20 average - 5 for Rainfall + 2 for Soil = 69 in. predicted

• For having poor soil, Oak A’s prediction is adjusted by -3.
• For having rich soil, Oak B’s prediction is adjusted by +2.

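Adjusting for Temperature

Oak A (Gardener A): 47 in. + 20 average + 3 for Rainfall - 3 for Soil - 8 for Temp = 59 in. predicted
Oak B (Gardener B): 52 in. + 20 average - 5 for Rainfall + 2 for Soil + 5 for Temp = 74 in. predicted

• For having high temperature, Oak A’s prediction is adjusted by -8.
• For having low temperature, Oak B’s prediction is adjusted by +5.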


Our Gardeners are Now on a Level Playing Field

Oak A: +20 average, +3 for Rainfall, -3 for Soil, -8 for Temp = +12 inches during the year; 47 in. + 12 = 59 in.
Oak B: +20 average, -5 for Rainfall, +2 for Soil, +5 for Temp = +22 inches during the year; 52 in. + 22 = 74 in.

• The predicted height for trees in Oak A’s conditions is 59 inches.

• The predicted height for trees in Oak B’s conditions is 74 inches.

Compare the Predicted Height to the Actual Height

Oak A (Gardener A): predicted 59 in., actual 61 in., +2
Oak B (Gardener B): predicted 74 in., actual 72 in., -2

• Oak A’s actual height is 2 inches more than predicted. We attribute this to the effect of Gardener A.
• Oak B’s actual height is 2 inches less than predicted. We attribute this to the effect of Gardener B.

This is analogous to a Value-Added measure.

Method 3: Compare the Predicted Height to the Actual Height

Oak A (Gardener A): predicted 59 in., actual 61 in., +2 (Above Average Value-Added)
Oak B (Gardener B): predicted 74 in., actual 72 in., -2 (Below Average Value-Added)

• By accounting for last year’s height and environmental conditions of the trees during this year, we found the “value” each gardener “added” to the growth of the trees.
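The oak tree arithmetic can be sketched as a predicted height (starting height, plus average growth, plus condition adjustments) compared with the actual height. All numbers below come from the slides; the function and variable names are illustrative, and this is a toy version of the analogy, not an actual value-added model.

```python
AVERAGE_GROWTH = 20  # inches, the regional average from the example

# Growth in inches relative to the average, by condition level.
ADJUSTMENTS = {
    "rainfall": {"low": -5, "medium": -2, "high": 3},
    "soil": {"low": -3, "medium": -1, "high": 2},
    "temperature": {"low": 5, "medium": -3, "high": -8},
}


def predicted_height(start, conditions):
    """Starting height plus average growth plus each condition's adjustment."""
    adjustment = sum(ADJUSTMENTS[factor][level] for factor, level in conditions.items())
    return start + AVERAGE_GROWTH + adjustment


def value_added(start, actual, conditions):
    """Actual height minus predicted height."""
    return actual - predicted_height(start, conditions)


oak_a = {"rainfall": "high", "soil": "low", "temperature": "high"}
oak_b = {"rainfall": "low", "soil": "high", "temperature": "low"}

print(predicted_height(47, oak_a), value_added(47, 61, oak_a))  # Gardener A
print(predicted_height(52, oak_b), value_added(52, 72, oak_b))  # Gardener B
```

The ranking of the two gardeners reverses relative to Methods 1 and 2 once the growing conditions are taken into account.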

Value-Added is a Group Measure

[Figure: Gardener A and Gardener B.]

• To statistically isolate a gardener’s effect, we need data from many trees under that gardener’s care.

How does this analogy relate to value added in the education context?

What are we evaluating?
Oak Tree Analogy: Gardeners
Value-Added in Education: Districts, Schools, Grades, Classrooms, Programs and Interventions

What are we using to measure success?
Oak Tree Analogy: Relative height improvement in inches
Value-Added in Education: Relative improvement on standardized test scores

Sample
Oak Tree Analogy: Single oak tree
Value-Added in Education: Groups of students

Control factors
Oak Tree Analogy: Tree’s prior height; other factors beyond the gardener’s control (Rainfall, Soil richness, Temperature)
Value-Added in Education: Students’ prior test performance (usually most significant predictor); other demographic characteristics such as Grade level, Gender, Race / Ethnicity, Low-Income Status, ELL Status, Disability Status, Section 504 Status

• What if I skip this step?
– Comparison is likely against normative data so the comparison is to “typical kids in typical settings”
• How fair is it to disregard context?
– Good teacher – bad school
– Good teacher – challenging kids

Consider . . .

• Control for measurement error
– All models attempt to address this issue
• Population size
• Multiple data points
– Error is compounded when combining two test events
– Many teachers’ value-added scores will fall within the range of statistical error

A variety of errors means more stability only at the extremes
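A short sketch of why error compounds across two test events and why population size helps. It assumes the measurement errors of the two tests are independent, and that students' errors are independent of each other. The standard error of 3 points and the group size of 25 are hypothetical values chosen for illustration, not figures from any assessment.

```python
import math


def gain_standard_error(se_first, se_second):
    """Standard error of a gain score, assuming independent errors on each test."""
    return math.sqrt(se_first ** 2 + se_second ** 2)


def mean_gain_standard_error(se_gain, n_students):
    """Standard error of a group's average gain, assuming independent students."""
    return se_gain / math.sqrt(n_students)


se_gain = gain_standard_error(3.0, 3.0)          # hypothetical SE of 3 points per test
se_group = mean_gain_standard_error(se_gain, 25)  # hypothetical group of 25 students

print(f"One student's gain: {se_gain:.2f}, group of 25: {se_group:.2f}")
```

Under these assumptions, a single student's gain is noisier than either test score, while averaging over a group brings the error back down.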

[Figure: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Vertical axis: Average Growth Index Score and Range (-12.00 to 12.00). Teachers are grouped into quintiles Q1 to Q5.]

Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.

Range of teacher value-added estimates

With one teacher, error means a lot
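The display described above can be read as a range for each teacher: the average growth index plus or minus its standard error. A minimal sketch of that reading follows. The teacher labels and numbers are hypothetical, and treating zero as the reference point is an assumption made for illustration.

```python
def classify(mean_index, standard_error):
    """Compare a teacher's range (mean +/- standard error) with zero."""
    low = mean_index - standard_error
    high = mean_index + standard_error
    if low > 0:
        return "above"
    if high < 0:
        return "below"
    return "within error"


# Hypothetical (average growth index, standard error) pairs.
teachers = {
    "Teacher X": (4.0, 1.5),
    "Teacher Y": (0.8, 1.5),
    "Teacher Z": (-3.2, 1.5),
}

results = {name: classify(mean, se) for name, (mean, se) in teachers.items()}
print(results)
```

Only teachers whose whole range sits on one side of zero can be distinguished from it, which is why estimates are more stable at the extremes.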

Because we want students to learn more!

• Research view
– Setting goals improves performance

Why should we care about goal setting in education?

What does research say on goal setting?

Locke, E. A. & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American psychologist. American Psychological Association.

[Figure: Essential Elements of Goal-Setting Theory and the High-Performance Cycle. Elements shown: Goals, Moderators, Mechanisms, Performance, Satisfaction with Performance and Rewards, Willingness to commit.]


• Specificity
• Difficulty
– Performance and learning goals
– Proximal goals

Goals

Goals Explanation

• Specific goals are typically stronger than “Do your best” goals
• Moderately challenging is better than too easy or too hard
– If complex and new knowledge or skills needed, set learning goals
• Master five new ways to assess each student’s learning in the moment
– If complex, set short term goals to gauge progress and feel rewarded

• Lack of a historical context
– What have this teacher and these students done in the past?
• Lack of comparison groups
– What have other teachers done in the past?
• What is the objective?
– Is the objective to meet a standard of performance or demonstrate improvement?
• Do you set safe goals or challenging goals?

Challenges with goal setting

• Goals and targets themselves
– Appropriately balance moderately challenging goals with consequences
• Only use “Stretch” goals for the organization to stimulate creativity and create unconventional solutions

Suggestions

Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.

• Goals and targets themselves (cont.)
– Set additional learning goals if complex and new
– Set interim benchmarks for progress monitoring
– Carefully consider what will not happen to attain the goal
• Can you live with the consequences?
• How will you look for other unintended ones?

Suggestions

Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.

How tests are used to evaluate teachers

The Test

The Growth Metric

The Evaluation

The Rating

• How would you translate a rank order to a rating?
• Data can be provided
• Value judgment ultimately the basis for setting cut scores for points or rating

Translation into ratings can be difficult to inform with data


• What is far below a district’s expectation is subjective
• What about
• Obligation to help teachers improve?
• Quality of replacement teachers?

Decisions are value based, not empirical

• System for combining elements and producing a rating is also a value based decision
– Multiple measures and principal judgment must be included
– Evaluate the extremes to make sure it makes sense

Even multiple measures need to be used well

Leadership Courage Is A Key

Ratings can be driven by the assessment

[Figure: Observation and Assessment ratings (0 to 5) for Teacher 1, Teacher 2, and Teacher 3.]

Real or Noise?

If evaluators do not differentiate their ratings, then all differentiation comes from the test

Big Message
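A small sketch of the point above: when every teacher receives the same observation rating, differences in a weighted composite come entirely from the assessment component. The 60/40 weights, the ratings, and the function name are hypothetical and chosen only for illustration.

```python
def composite(observation, assessment, w_obs=0.6, w_test=0.4):
    """Weighted composite rating (the weights are hypothetical)."""
    return w_obs * observation + w_test * assessment


# Three hypothetical teachers: identical observation ratings, different test-based ratings.
observations = [3, 3, 3]
assessments = [1, 3, 5]

scores = [composite(o, a) for o, a in zip(observations, assessments)]

# Spread of the composite versus the weighted spread of the test component alone.
composite_spread = max(scores) - min(scores)
test_spread = 0.4 * (max(assessments) - min(assessments))

print(scores, composite_spread, test_spread)
```

With undifferentiated observation ratings, the composite spread equals the weighted test spread, regardless of how small the test's nominal weight is.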

1. Alignment between the content assessed and the content to be taught

2. Selection of an appropriate assessment
• Used for the purpose for which it was designed (proficiency vs. growth)
• Can accurately measure the knowledge of all students
• Adequate sensitivity to growth

3. Adjust for context/control for factors outside a teacher’s direct control (value-added)

Please be thoughtful about . . .

• Presentations and other recommended resources are available at:
– www.nwea.org
– www.kingsburycenter.org
– www.slideshare.net
• Contacting us:
NWEA Main Number 503-624-1951
E-mail: [email protected]

More information
