Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Dr. Andy Hegedus, Senior Manager, Professional Development Data Analytics, NWEA

Fusion 2012, the NWEA summer conference in Portland, Oregon

At times, gaps in educators' understanding of assessment data limit the depth of dialogue about the implications of the many uses of that data. Increasingly, people are considering including assessment data as part of a formal teacher evaluation process. This is a new and complicated area in which educators are beginning to tread. Using a framework for data use in teacher evaluations, we will reinforce some of what you know about assessment data, answer some questions you may have, and deepen your understanding of the strengths and limitations of assessment data.

Learning Outcomes:
- Deepen your understanding of assessment data
- Gain context for considering the use of assessment results in teacher evaluation programs

Audience:
- New data user
- Experienced data user
- District leadership
- Curriculum and Instruction
Transcript
Page 1: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Andy Hegedus, Ed.D.

June 2012

Assessment Literacy in a Teacher Evaluation Frame

Page 2: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• How many of you think your literacy with assessments in general is “Good” or better?

• How many of you are currently figuring out how to use assessment data thoughtfully in a Teacher Evaluation process?

Trying to gauge my audience and adjust my speed . . .

Page 3: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• What we've known to be true is now being shown to be true
  – Using data thoughtfully improves student achievement
• There are dangers present, however
  – Unintended consequences

Go forth thoughtfully, with care

Page 4: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

“What gets measured (and attended to), gets done”

Remember the old adage?

Page 5: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• NCLB
  – Cast light on inequities
  – Improved performance of "Bubble Kids"
  – Narrowed the taught curriculum

An infamous example

Page 6: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

It’s what we do that counts

A patient’s health doesn’t change because we know their blood pressure

It’s our response that makes all the difference

Page 7: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Our nation has moved from a model of education reform that focused on fixing schools to a model that is focused on fixing the teaching profession

Data Use in Teacher Evaluation is our construct for today

Page 8: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Be considerate of the continuum of stakes involved

Support → Compensate → Terminate

Increasing levels of required rigor; increasing risk

Page 9: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Growth
  – A depiction of progress over time along a cross-grade scale
• Value-Added
  – A determination of whether growth is greater for a particular student or group of students than would be expected

Let's get clear on terms
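A tiny sketch can pin these two terms down. The Python below uses made-up RIT-style scores and a hypothetical expected-growth lookup (a stand-in for real norms, which come from large reference populations):

```python
# A minimal sketch of the two terms, using made-up numbers and a
# hypothetical norm lookup (not NWEA's actual norms).

norm_expected_growth = {195: 9.5, 201: 8.0}  # hypothetical fall score -> typical gain

def growth(fall_score, spring_score):
    """Growth: progress over time along a cross-grade scale."""
    return spring_score - fall_score

def value_added_index(fall_score, spring_score):
    """Value-added: observed growth minus the growth expected for comparable students."""
    return growth(fall_score, spring_score) - norm_expected_growth[fall_score]

print(growth(201, 212))             # 11 points of observed growth
print(value_added_index(201, 212))  # 3.0 -> three points above expectation
```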

Page 10: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

[Chart: Marcus' growth plotted against normal growth and the growth needed to reach a college readiness standard]

Page 11: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Is the progress produced by this teacher dramatically different from that of teaching peers who deliver instruction to comparable students in comparable situations?

What question is being answered in support of using data in evaluating teachers?

Page 12: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

The Test

The Growth Metric

The Evaluation

The Rating

There are four key steps required to answer this question

Page 13: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

The purpose and design of the instrument is significant

• Many assessments are not designed to measure growth

• Others do not measure growth equally well for all students

Page 14: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Both Status and Growth are important

[Chart: a vertical scale from Beginning Literacy to Adult Reading; a 5th grader's scores are plotted at Time 1 and Time 2 - each point's height is Status, and the change between them is Growth]

Value Added = Teacher Contribution to Growth

Page 15: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Teachers encounter a distribution of student performance

[Chart: a 5th-grade class's scores scattered across the Beginning Literacy to Adult Reading scale, around grade-level performance]

Norm = “Typical” for a reference population

Page 16: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Traditional assessment uses items reflecting the grade level standards

[Chart: a traditional assessment item bank clustered around the 4th-, 5th-, and 6th-grade level standards on the Beginning Literacy to Adult Reading scale]

Page 17: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Traditional assessment uses items reflecting the grade level standards

[Chart: the same scale, with adjacent grades' standards shown side by side; their overlap allows linking and scale construction]

Page 18: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Adaptive testing works differently

Item bank can span full range of achievement

Page 19: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Available item pool depth is crucial

[Chart: an adaptive test's estimated RIT converging as correct and incorrect responses accumulate]
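Conceptually, the adaptive loop picks each next item near the test-taker's current estimate and updates that estimate after every response. Below is a minimal toy sketch of that idea in Python (not NWEA's actual algorithm; the item difficulties, response model, and step-halving update are all illustrative assumptions):

```python
import random

# A toy adaptive loop (illustrative only, not NWEA's algorithm):
# choose the unused item whose difficulty is closest to the current
# ability estimate, score the response, and nudge the estimate.

item_bank = [150, 160, 170, 180, 190, 200, 210, 220, 230, 240]  # difficulties spanning the scale
true_ability = 205.0  # hypothetical; unknown to the algorithm
estimate, step = 200.0, 16.0
used = set()

for _ in range(6):
    # Select the closest remaining item to the current estimate.
    item = min((d for d in item_bank if d not in used), key=lambda d: abs(d - estimate))
    used.add(item)
    # Simulate a response: the further the item sits below ability, the likelier a correct answer.
    p_correct = 1 / (1 + 10 ** ((item - true_ability) / 20))
    correct = random.random() < p_correct
    # Move the estimate toward the evidence, narrowing the step each time.
    estimate += step if correct else -step
    step /= 2
    print(f"item {item}: {'correct' if correct else 'incorrect'}, new estimate {estimate:.1f}")
```

A deep pool matters here: if no unused item sits near the current estimate, each response carries less information about the student.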

Page 20: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Tests are not equally accurate for all students

[Charts: measurement accuracy across the score range - California STAR vs. NWEA MAP]

Page 21: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

These differences impact measurement error

[Chart: test information by scale score (165 to 245) for an adaptive test versus a traditional test built from 5th-grade-level items, across the Academic Warning, Below, Meets, and Exceeds ranges; the resulting error is significantly different between the two designs, particularly toward the 1st and 86th percentiles]

Page 22: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Think of a high-stakes test - a state summative
  – Designed to identify if a student is proficient or not
• Do they do that well?
  – 93% correct on the Proficiency determination
• Do they go off design well?
  – 75% correct on the Performance Level determination

Error can change your life!

*Testing: Not an Exact Science, Education Policy Brief, Delaware Education Research & Development Center, May 2004, http://dspace.udel.edu:8080/dspace/handle/19716/244
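To make the stakes of measurement error concrete, here is a minimal sketch of how a confidence band around an observed score can straddle a proficiency cut. The SEM and cut score are assumed for illustration, not taken from the slide:

```python
# Why error can change a proficiency call: a sketch with an assumed
# standard error of measurement (SEM). All numbers are illustrative.

def score_band(observed, sem, z=1.96):
    """Approximate 95% confidence band around an observed score."""
    return observed - z * sem, observed + z * sem

cut_score = 215.0        # hypothetical proficiency cut
observed, sem = 213.0, 3.2

low, high = score_band(observed, sem)
print(f"Observed {observed}, 95% band [{low:.1f}, {high:.1f}]")
if low < cut_score < high:
    print("The cut score falls inside the band: 'not proficient' is not a safe call.")
```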

Page 23: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Assessments must align with the teacher's instructional responsibility
  – Validity: is it assessing what you think it's assessing?
  – Reliability: if we gave it again, would the results be consistent?

What is measured must be aligned to what is being taught

Page 24: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Black, P., & Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspective, 5(1), 1-53.

• …when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction.

The instrument must be able to detect instruction

Page 25: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Black, P., & Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspective, 5(1), 1-53.

• When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction.

The more complex, the harder to detect and attribute to one teacher

Page 26: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Security and Cheating

• Proctoring

• Procedures

Other issues

Page 27: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

[Chart: mean spring and fall test duration in minutes (0 to 90) by school]

Page 28: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Ten minutes makes a difference ~ one RIT

[Chart: growth index (RIT) by student for students taking 10+ minutes longer in spring than fall, versus all other students]

Page 29: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Testing is complete . . . What is useful to answer our question?

The Test

The Growth Metric

The Evaluation

The Rating

Page 30: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

The metric matters - Let's go underneath "Proficiency"

[Chart: difficulty of the New York "Meets" level by grade (2 through 8) in math and reading, expressed as a national percentile, compared against college readiness and typical performance]

Page 31: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

What gets measured and attended to really does matter

[Chart: one district's change in 5th-grade mathematics performance relative to the KY proficiency cut scores - number of students moving up, down, or showing no change by fall RIT, with the proficiency and college readiness cuts marked]

Page 32: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Changing from Proficiency to Growth means all kids matter

[Chart: number of 5th-grade students in the same district meeting projected mathematics growth, by fall score - below projected growth vs. met or above projected growth]

Page 33: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

How can we make it fair?

The Test

The Growth Metric

The Evaluation

The Rating

Page 34: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• What if I skip this step?
  – Comparison is likely against normative data, so the comparison is to "typical kids in typical settings"
• How fair is it to disregard context?
  – Good teacher - bad school
  – Good teacher - challenging kids

How does your performance evaluation consider context?

Consider . . .

Page 35: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Value-added models control for a variety of classroom, school-level, and other conditions
  – Over one hundred different value-added models exist
  – All attempt to minimize error
  – Variables outside the controls are assumed to be random
• Results are not stable
  – The use of multiple years of data is highly recommended
  – Results are more likely to be stable at the extremes

Nothing is perfect

Page 36: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Multiple years of data are necessary for some stability

Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010)

[Chart: teachers with growth scores in the lowest and highest quintile over two years using NWEA's MAP (493 teachers) - number of teachers in Year 1 vs. Year 2]

Vote - Year 2 above or below?
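A quick simulation shows why a single year is shaky. The sketch below draws two years of teacher scores whose year-to-year correlation is r = 0.45 (simply the midpoint of the .30-.60 range above; the population size is arbitrary) and checks how often a lowest-quintile teacher stays in the lowest quintile:

```python
import random

# Simulate two years of teacher growth scores with year-to-year
# correlation r = 0.45: each year's score is a persistent teacher
# effect plus fresh yearly noise, scaled so the correlation equals r.

random.seed(1)
r, n = 0.45, 5000
persistent = [random.gauss(0, 1) for _ in range(n)]
year1 = [r**0.5 * p + (1 - r)**0.5 * random.gauss(0, 1) for p in persistent]
year2 = [r**0.5 * p + (1 - r)**0.5 * random.gauss(0, 1) for p in persistent]

# How many Year 1 bottom-quintile teachers are still bottom-quintile in Year 2?
cut1 = sorted(year1)[n // 5]  # 20th percentile cut, Year 1
cut2 = sorted(year2)[n // 5]  # 20th percentile cut, Year 2
bottom_y1 = [i for i in range(n) if year1[i] < cut1]
stayed = sum(1 for i in bottom_y1 if year2[i] < cut2)
print(f"{stayed / len(bottom_y1):.0%} of bottom-quintile teachers stayed in the bottom quintile")
```

At correlations in this range, fewer than half of the bottom-quintile teachers remain there the following year, which is exactly why multiple years of data are recommended.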

Page 37: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Control for statistical error
  – All models attempt to address this issue
• Error is compounded when combining two test events
• Nevertheless, many teachers' value-added scores will fall within the range of statistical error

A variety of errors means more stability only at the extremes
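The compounding is easy to see with a worked sketch: if fall and spring errors are independent, the standard error of a fall-to-spring growth score combines the two SEMs in quadrature. The SEM values below are illustrative assumptions, not published figures:

```python
import math

# Why error compounds when two test events are combined: under the
# usual assumption of independent errors, SEM(growth) is the two
# per-test SEMs combined in quadrature. Numbers are illustrative.

sem_fall, sem_spring = 3.0, 3.0
sem_growth = math.sqrt(sem_fall ** 2 + sem_spring ** 2)  # ~4.2

observed_growth = 5.0
low, high = observed_growth - 1.96 * sem_growth, observed_growth + 1.96 * sem_growth
print(f"SEM of growth: {sem_growth:.1f} RIT")
print(f"95% band on a {observed_growth}-point gain: [{low:.1f}, {high:.1f}]")
# The band includes zero, so even this apparent gain sits within statistical error.
```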

Page 38: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

[Chart: Mathematics Growth Index Distribution by Teacher - Validity Filtered; average growth index score and range per teacher, grouped into quintiles Q1-Q5]

Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.

Range of teacher value-added estimates

Page 39: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

With one teacher, error means a lot

Page 40: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Value-added models assume that variation not controlled for explicitly is caused by randomness
  – Young teachers are assigned disproportionate numbers of students with poor discipline records
  – Parent requests for the "best" teachers are honored
  – Sound educational reasons for placement are likely to be defensible

Assumption of randomness can have risk implications

Page 41: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Idiosyncratic cases
  – In self-contained classrooms, one or two idiosyncratic cases can have a large effect on results

Lower numbers can significantly impact a teacher-level analysis

Page 42: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

How tests are used to evaluate teachers

The Test

The Growth Metric

The Evaluation

The Rating

Page 43: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• How would you translate a rank order into a rating?
• Data can be provided
• A value judgment is ultimately used to set cut scores for points or a rating

Translation into ratings can be difficult to inform with data

Page 44: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

Decisions are value-based, not empirical

• What is "far below" a district's expectation is subjective
• What about:
  – The obligation to help teachers improve?
  – The quality of replacement teachers?

Page 45: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• The system for combining elements and producing a rating is also a value-based decision
  – Multiple measures and principal judgment must be included
  – Evaluate the extremes to make sure the system makes sense

Even multiple measures need to be used well

Page 46: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Principal evaluation, state test, and local assessment scores are combined
  – Rating and points are generated separately for each category
  – The principal has 60% of the evaluation
• What happens at the extremes
  – A teacher at the low end of Developing (not Ineffective) on test scores requires a 98% rating from the principal to not fall to Ineffective (Effective needs 95%)
  – A teacher rated highly effective on test scores needs 50% or higher on the principal evaluation to maintain that rating

NY use of multiple measures provides an example (see the sketch below)
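The sketch below shows the mechanics of such a composite. Only the principal's 60% share comes from the slide; the 20/20 split of the remaining weight, the 0-1 point scales, and the rating cut scores are assumptions chosen purely to illustrate the behavior at the extremes:

```python
# Hypothetical composite rating: the 60% principal share follows the
# slide; the other weights, scales, and cut scores are assumptions.

weights = {"principal": 0.60, "state_test": 0.20, "local": 0.20}
cuts = [(0.85, "Highly Effective"), (0.65, "Effective"),
        (0.45, "Developing"), (0.00, "Ineffective")]

def composite_rating(scores):
    """Combine per-category scores (each 0..1) into a single rating."""
    total = sum(weights[k] * scores[k] for k in weights)
    return next(label for cut, label in cuts if total >= cut)

# A teacher at the bottom on both test measures needs a near-perfect
# principal score to escape the lowest rating:
print(composite_rating({"principal": 0.98, "state_test": 0.05, "local": 0.05}))  # Developing
print(composite_rating({"principal": 0.70, "state_test": 0.05, "local": 0.05}))  # Ineffective
```

The chosen thresholds drive everything at the extremes, which is why the previous slide urges evaluating those cases before adopting a scheme.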

Page 47: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Be thoughtful
• Involve a variety of stakeholders
• Use multiple years of student achievement data
• Begin with pilots to understand the accuracy and unintended consequences
• Embrace the formative advantages of growth measurement as well as the summative

Recommendations

Page 48: Fusion 2012 - Assessment Literacy in a Teacher Evaluation Frame

• Presentations and other recommended resources are available at:
  – www.nwea.org
  – www.kingsburycenter.org
• Contacting us:
  – NWEA Main Number: 503-624-1951
  – E-mail: [email protected]

More information