John Cronin, Ph.D., Director, The Kingsbury Center @ NWEA. Tests, evaluation and teacher dismissal
Page 1: Ed Reform Lecture - University of Arkansas

John Cronin, Ph.D.
Director

The Kingsbury Center @ NWEA

Tests, evaluation and teacher dismissal

Page 2: Ed Reform Lecture - University of Arkansas

Presenter - John Cronin, Ph.D.

Contacting us:
Rebecca Moore: 503-548-5129
E-mail: [email protected]

Tests, evaluation and teacher dismissal

This presentation can be viewed at:

http://www.slideshare.net/JFCronin/ed-reform-lecture-university-of-arkansas

Page 3: Ed Reform Lecture - University of Arkansas

If one objective of evaluation reform was to make it easier to dismiss ineffective teachers, in most states the reforms are likely to make dismissal more difficult.

Page 4: Ed Reform Lecture - University of Arkansas

Problems

• If tests are the controlling evidence in a dismissal, expect expensive battles of experts.

• Title VII claims are likely if evaluation systems have disparate impact, especially in states using less robust models such as the Colorado Growth Model.

• Many states implementing evaluation reform have enacted stricter procedural requirements, particularly around classroom observation.

• Rating systems can be manipulated, in favor of and against educators.

• The threats of cheating and gaming are underestimated, and risks are greater as we move to growth measurement.

Page 5: Ed Reform Lecture - University of Arkansas

How tests are used to evaluate teachers and principals

Page 6: Ed Reform Lecture - University of Arkansas

Measurement Issues

Measuring a teacher’s contribution to learning is inexact.

Page 7: Ed Reform Lecture - University of Arkansas

Measurement Issues

It’s about the measurement…

Page 8: Ed Reform Lecture - University of Arkansas

Tests are not equally accurate for all students

[Figure panels: California STAR; NWEA MAP]

Page 9: Ed Reform Lecture - University of Arkansas

Measurement Issues

It’s about the measurement…AND conditions…

Page 10: Ed Reform Lecture - University of Arkansas

Reliability of teacher value-added estimates

Teachers with growth scores in lowest and highest quintile over two years using NWEA’s Measures of Academic Progress

           Bottom quintile Y1&Y2    Top quintile Y1&Y2
Number     59/493                   63/493
Percent    12%                      13%

r = .64, r² = .41

Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010)
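
The arithmetic behind these figures can be checked with a short sketch. The counts and the r value come from the slide. The two reference points for comparison are added assumptions, not figures from the presentation: 4% is what would be expected if a teacher's quintile in year 2 were independent of year 1, and 20% is what perfectly stable scores would produce.

```python
# Counts and correlation as reported on the slide.
total_teachers = 493
bottom_both_years = 59
top_both_years = 63
r = 0.64

# Share of all teachers who landed in the same extreme quintile in both years.
pct_bottom = bottom_both_years / total_teachers
pct_top = top_both_years / total_teachers

# Share of variance in one year's scores explained by the other year's scores.
r_squared = r ** 2

# Assumption (not from the slide): reference points for comparison.
# Independent quintiles across years would repeat 20% * 20% = 4% of the time;
# perfectly stable scores would repeat the full 20%.
chance_baseline = 0.20 * 0.20
perfect_stability = 0.20

print(f"Bottom quintile both years: {pct_bottom:.0%}")
print(f"Top quintile both years: {pct_top:.0%}")
print(f"r squared: {r_squared:.2f}")
print(f"Chance baseline: {chance_baseline:.0%}")
print(f"Perfect stability: {perfect_stability:.0%}")
```

Read against those reference points, the observed 12% to 13% sits well above chance but well short of what stable scores would give.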

Page 11: Ed Reform Lecture - University of Arkansas

Range of teacher value-added estimates

Page 12: Ed Reform Lecture - University of Arkansas

Issues in the use of growth and value-added measures

“Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

Page 13: Ed Reform Lecture - University of Arkansas

Measurement Issues

It’s about the measurement…AND conditions...

AND the model.

Page 15: Ed Reform Lecture - University of Arkansas

“Significant evidence of bias plagued the value-added model estimated for the Los Angeles Times in 2010, including significant patterns of racial disparities in teacher ratings both by the race of the student served and by the race of the teachers (see Green, Baker and Oluwole, 2012). These model biases raise the possibility that Title VII disparate impact claims might also be filed by teachers dismissed on the basis of their value-added estimates. 

Additional analyses of the data, including richer models using additional variables mitigated substantial portions of the bias in the LA Times models (Briggs & Domingue, 2010).”

Baker, B. (2012, April 28). If it’s not valid, reliability doesn’t matter so much! More on VAM-ing & SGP-ing Teacher Dismissal.

Possible racial bias in models

Page 16: Ed Reform Lecture - University of Arkansas

“The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.” 

Ballou, D., Mokher, C. and Cavalluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes.

Instability at the tails of the distribution

LA Times Teacher #1

LA Times Teacher #2

Page 17: Ed Reform Lecture - University of Arkansas

New York City

• Margins of error can be very large

• Increasing n doesn't always decrease the margin of error

• The margin of error in math is typically smaller than in reading
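
One way to see why a larger n does not guarantee a smaller margin of error is the simplified sketch below. It assumes the margin of error for a mean is z * s / sqrt(n), a textbook approximation and not the model New York City used. The function name, class sizes, and standard deviations are all hypothetical.

```python
import math


def margin_of_error(std_dev: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a mean: z * s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * std_dev / math.sqrt(n)


# Hypothetical classrooms (illustrative values, not from the presentation).
small_class = margin_of_error(std_dev=10.0, n=20)
large_varied_class = margin_of_error(std_dev=15.0, n=30)

print(f"n=20, s=10: +/- {small_class:.2f}")
print(f"n=30, s=15: +/- {large_varied_class:.2f}")
```

Under this approximation, the larger class ends up with the wider margin because its scores are more spread out, which outweighs the gain from the extra students.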

Page 18: Ed Reform Lecture - University of Arkansas

The problem with spring-spring testing

[Timeline, 3/11 through 3/12: Teacher 1, Summer, Teacher 2]

Page 21: Ed Reform Lecture - University of Arkansas

Characteristics of value-added metrics

• Value-added metrics always produce winners and losers.

• Value-added metrics can’t measure progress of the larger group.

• Extreme performance is more likely to have alternate explanations.

Page 22: Ed Reform Lecture - University of Arkansas

Measurement Issues

Moving from the model to the teacher rating

Page 23: Ed Reform Lecture - University of Arkansas

Translating ranked data to ratings - principles

• There is no “science” per se around translating a ranking to a rating. If you call a bottom 40% teacher ineffective, that is a judgment.

• The rating process can be politicized.

• The process is easy to over-engineer.

Page 24: Ed Reform Lecture - University of Arkansas

New York Rating System

• 60 points assigned from classroom observation

• 20 points assigned from state assessment

• 20 points assigned from local assessment

• A score of 64 or less is rated ineffective.
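
The point structure can be sketched in a few lines. The function names and example scores are hypothetical. Only the component maximums and the "ineffective" cutoff are taken from the slide; the remaining rating bands are not given here and are left out.

```python
def new_york_composite(observation: int, state_test: int, local_test: int) -> int:
    """Sum the three components, checking each against its maximum points."""
    limits = {"observation": 60, "state_test": 20, "local_test": 20}
    scores = {
        "observation": observation,
        "state_test": state_test,
        "local_test": local_test,
    }
    for name, value in scores.items():
        if not 0 <= value <= limits[name]:
            raise ValueError(f"{name} must be between 0 and {limits[name]}")
    return sum(scores.values())


def is_ineffective(total: int) -> bool:
    """A score of 64 or less is rated ineffective (cutoff from the slide)."""
    return total <= 64


# Hypothetical teacher: perfect observation score, no points on either assessment.
total = new_york_composite(observation=60, state_test=0, local_test=0)
print(total, is_ineffective(total))
```

Under these rules a perfect observation score alone yields 60 points, so observation by itself cannot lift a teacher above the ineffective cutoff.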

Page 25: Ed Reform Lecture - University of Arkansas
Page 26: Ed Reform Lecture - University of Arkansas

Connecticut requirements

• Criteria for student growth indicator
  – Fair to students
    • The indicator of academic growth and development is used in such a way as to provide students an opportunity to show that they have met or are making progress in meeting the learning objective. The use of the indicator of academic growth and development is as free as possible from bias and stereotype.
  – Fair to teachers
    • The use of an indicator of academic growth and development is fair when a teacher has the professional resources and opportunity to show that his/her students have made growth and when the indicator is appropriate to the teacher’s content, assignment and class composition.
  – Reliable
  – Valid
  – Useful
    • The indicator may be used to provide the teacher with meaningful feedback about student knowledge, skills, perspective and classroom experience that may be used to enhance student learning and provide opportunities for teacher professional growth and development.

Page 27: Ed Reform Lecture - University of Arkansas

Connecticut requirements

• Components of the evaluation
  – Student growth (45%) - including the state test, one non-standardized indicator, and (optional) one other standardized indicator.
    • Requires a beginning-of-year, mid-year, and end-of-year conference
  – Teacher practice and performance (40%)
    • First and second year teachers – 3 in-class observations
    • Developing or below standard – 3 in-class observations
    • Proficient or exemplary – 3 observations of practice, one in-class
  – Whole-school learning indicator or student feedback (5%)
  – Parent or peer feedback (10%)

Page 28: Ed Reform Lecture - University of Arkansas

Connecticut requirements

Requirements for observations

1. Facilitate and encourage effective means for multiple in-class visits necessary for gathering evidence of the quality of teacher practice;
2. Provide constructive oral and written feedback of observations in a timely and useful manner;
3. Provide on-going calibration of evaluators in the district;
4. Use a combination of formal, informal, announced, and unannounced observation;
5. Consider differentiating the number of observations related to experience, prior ratings, needs and goals.
6. Include pre- and post-conferences that include deep professional conversations that allow evaluators and teachers to set goals, allow administrators to gain insight into the teacher’s progress in addressing issues and working toward their goals, and share evidence each has gathered during the year

Page 29: Ed Reform Lecture - University of Arkansas

Cheating

Atlanta Public Schools
Crescendo Charter Schools
Philadelphia Public Schools
Washington DC Public Schools
Houston Independent School District
Michigan Public Schools

Page 30: Ed Reform Lecture - University of Arkansas

Unintended Consequences?

• Principals and teachers may game the system, inadvertently or intentionally.

• Many principals and teachers (including good ones) will seek schools or teaching assignments that they think will improve their results.

• Many teachers will seek opportunities to avoid grades with standardized tests.

• Ranking metrics can discourage cooperation among principals and teachers – finding ways to reward teamwork and cooperation is important.

Page 31: Ed Reform Lecture - University of Arkansas

Case Study #1 - Mean value-added performance in mathematics by school – fall to spring

Page 32: Ed Reform Lecture - University of Arkansas

Case Study #1 - Mean spring and fall test duration in minutes by school

Page 33: Ed Reform Lecture - University of Arkansas

Case Study #1 - Mean value-added growth by school and test duration

Page 34: Ed Reform Lecture - University of Arkansas

Differences in fall-spring test durations

Case Study # 2

Differences in growth index score based on fall-spring test durations

Page 35: Ed Reform Lecture - University of Arkansas

Case Study # 2

Differences in spring-fall test durations

Differences in raw growth by spring-fall test duration

How much of summer loss is really summer loss?

Page 36: Ed Reform Lecture - University of Arkansas

Case Study # 2

Differences in fall-spring test duration (yellow-black) and differences in growth index scores (green) by school

Page 37: Ed Reform Lecture - University of Arkansas

Negotiated goals – Student Learning Objectives

• Negotiated goals are not likely to be challenging.

• Negotiated goals leave a potential for discrimination charges if teachers at a grade level have different improvement expectations.

Page 38: Ed Reform Lecture - University of Arkansas

An alternate approach

• Give primacy to evaluator observation for judging teachers.

• Focus mandatory observations on low performers.

• Use assessments and value-added measurement to validate observations.

• Require reassessment when observations and assessment data are in significant misalignment.