JENNA PORTER, DAVID JELINEK
SACRAMENTO STATE UNIVERSITY
Statistical Analysis of Scorer Interrater Reliability

Dec 17, 2015

Transcript
Page 1:

JENNA PORTER
DAVID JELINEK

SACRAMENTO STATE UNIVERSITY

Statistical Analysis of Scorer Interrater Reliability

Page 2:

Background & Overview

Piloted since 2003-2004
Began with our own version of training & calibration
Overarching question: Scorer interrater reliability
  i.e., Are we confident that the TE scores our candidates receive are consistent from one scorer to the next?
  If not, what are the reasons? What can we do about it?

Jenna: Analysis of data to address these questions
Then we’ll open it up to “Now what?” “Is there anything we can/should do about it?” “Like what?”

Page 3:

Presentation Overview

• Is there interrater reliability among our PACT scorers?

• How do we know?
• Two methods of analysis

• Results

• What do we do about it?
• Implications

Page 4:

Data Collection

• 8 credential cohorts
  – 4 Multiple subject and 4 Single subject
• 181 Teaching Events total
  – 11 rubrics (excludes pilot Feedback rubric)
  – 10% randomly selected for double score
  – 10% of TEs were failing and double scored
• 38 Teaching Events were double scored (20%); the selection is sketched below
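
To make the selection procedure above concrete, here is a minimal sketch; the list-of-dicts input and the "id"/"failing" field names are illustrative assumptions, not part of the original study.

    import random

    def select_for_double_scoring(teaching_events, random_fraction=0.10, seed=None):
        """Pick Teaching Events (TEs) for double scoring: a random ~10% sample
        plus every TE whose first score was failing, per the slide above.
        Each TE is assumed to be a dict with an 'id' and a boolean 'failing'
        flag (illustrative field names, not from the source)."""
        rng = random.Random(seed)
        n_random = round(random_fraction * len(teaching_events))
        random_sample = rng.sample(teaching_events, min(n_random, len(teaching_events)))
        failing = [te for te in teaching_events if te["failing"]]
        # Union without duplicates: every failing TE plus the random sample.
        selected = {te["id"]: te for te in failing + random_sample}
        return list(selected.values())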

Page 5:

Scoring Procedures

Trained and calibrated scorers: University faculty; calibrate once per academic year

Followed the PACT Calibration Standard. Scores must:
  Result in the same pass/fail decision (overall)
  Have exact agreement with the benchmark at least 6 times
  Be within 1 point of the benchmark
(a check of this standard is sketched after this slide)

All TEs scored independently once
If failing, scored by a second scorer and the evidence reviewed by the chief trainer
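
A minimal sketch of that calibration check, assuming each calibration attempt is a list of rubric scores compared against the benchmark; the overall pass/fail rule is passed in as a function because that decision rule is not spelled out here.

    def meets_calibration_standard(scorer_scores, benchmark_scores, pass_fail):
        """Check the PACT calibration standard listed above.
        scorer_scores / benchmark_scores: parallel lists of rubric scores
        (11 per Teaching Event in this study).
        pass_fail: caller-supplied function mapping a score list to True (pass)
        or False (fail); the exact overall decision rule is an assumption here."""
        pairs = list(zip(scorer_scores, benchmark_scores))
        same_decision = pass_fail(scorer_scores) == pass_fail(benchmark_scores)
        exact_matches = sum(s == b for s, b in pairs)
        all_within_one = all(abs(s - b) <= 1 for s, b in pairs)
        return same_decision and exact_matches >= 6 and all_within_one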

Page 6:

Methods of Analysis

Percent Agreement
  Exact agreement
  Agreement within 1 point
  Combined (exact and within 1 point)
(these are computed in the sketch after this slide)

Cohen’s Kappa (Cohen, 1960)
  Indicates the proportion of agreement between raters above what is expected by chance

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
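
A minimal sketch of the percent agreement variants listed above, computed over paired rubric scores from two scorers; reading "within 1 point" as differing by exactly one point (so that "combined" is exact plus within 1) is an interpretation, not stated in the slides.

    def percent_agreement(scores_a, scores_b):
        """Percent agreement between two scorers over paired rubric scores.
        'within_1' counts pairs differing by exactly one point and 'combined'
        is exact plus within-1; that split is an interpretation, not stated
        in the original slides."""
        pairs = list(zip(scores_a, scores_b))
        n = len(pairs)
        exact = sum(a == b for a, b in pairs) / n
        within_1 = sum(abs(a - b) == 1 for a, b in pairs) / n
        return {"exact": exact, "within_1": within_1, "combined": exact + within_1}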

Page 7:

Percent Agreement

Benefits: Easy to understand

Limitations: Does not account for chance agreement; tends to overestimate true agreement (Berk, 1979; Grayson, 2001).

Berk, R. A. (1979). Generalizability of behavioral observations: A clarification of interobserver agreement and interobserver reliability. American Journal of Mental Deficiency, 83, 460-472.

Grayson, K. (2001). Interrater Reliability. Journal of Consumer Psychology, 10, 71-73.

Page 8:

Cohen’s Kappa

Benefits: Accounts for chance agreement; can be used to compare across different conditions (Ciminero, Calhoun, & Adams, 1986).

Limitations: Kappa may decrease with low base rates, so at least 10 occurrences are needed (Nelson & Cicchetti, 1995).

Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (Eds.). (1986). Handbook of behavioral assessment (2nd ed.). New York: Wiley.

Nelson, L. D., & Cicchetti, D. V. (1995). Assessment of emotional functioning in brain impaired individuals. Psychological Assessment, 7, 404–413.

Page 9:

Kappa coefficient

Kappa = (proportion of observed agreement - chance agreement) / (1 - chance agreement)

Coefficient ranges from -1.0 (disagreement) to 1.0 (perfect agreement)

Altman, D. G. (1991). Practical statistics for medical research. London: Chapman and Hall.
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
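
To illustrate the formula above, here is a minimal sketch of Cohen's kappa for two raters; the function name and list-of-scores input are illustrative choices.

    from collections import Counter

    def cohens_kappa(scores_a, scores_b):
        """Cohen's (1960) kappa for two raters assigning categorical scores:
        kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
        n = len(scores_a)
        observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
        # Chance agreement: sum over categories of the product of each rater's
        # marginal proportion for that category.
        freq_a, freq_b = Counter(scores_a), Counter(scores_b)
        chance = sum((freq_a[c] / n) * (freq_b[c] / n)
                     for c in set(scores_a) | set(scores_b))
        if chance == 1.0:  # both raters always used one and the same category
            return 1.0
        return (observed - chance) / (1 - chance)

For example, if two scorers agree on 80% of rubric scores and their marginal score distributions imply 50% agreement by chance, kappa = (0.8 - 0.5) / (1 - 0.5) = 0.6.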

Page 10:

Percent Agreement

Page 11:

Percent Agreement

Page 12:

Pass/Fail Disagreement

Page 13:

Cohen’s Kappa

Page 14:

Interrater Reliability Compared

Page 15:

Implications

Overall interrater reliability: poor to fair
Consider/reevaluate the protocol for calibration
Calibration protocol may be interpreted differently
Drifting scorers?
Use multiple methods to calculate interrater reliability
Other?

Page 16:

How can we increase interrater reliability?

Your thoughts . . .
Training protocol
Adding “Evidence Based” Training (Jeanne Stone, UCI)
More calibration

Page 17:

Contact Information

Jenna Porter [email protected]

David Jelinek [email protected]