Slide 1/16

Funded through the ESRC's Researcher Development Initiative

Prof. Herb Marsh

Ms. Alison O'Mara

Dr. Lars-Erik Malmberg

Department of Education, University of Oxford

    Session 3.3: Inter-rater reliability

Slide 2/16

[Flow chart: stages of a meta-analysis]

Establish research question
Define relevant studies
Develop code materials
Locate and collate studies
Pilot coding; coding
Data entry and effect size calculation
Main analyses
Supplementary analyses

Slide 3/16

Interrater reliability

Aim of co-judge procedure, to discern:
Consistency within coder
Consistency between coders

Take care when making inferences based on little information.
Phenomena impossible to code become missing values.

Slide 4/16

Interrater reliability

Percent agreement: common but not recommended.

Cohen's kappa coefficient: kappa is the proportion of the optimum improvement over chance attained by the coders; 1 = perfect agreement, 0 = agreement no better than that expected by chance, -1 = perfect disagreement. Kappas over .40 are considered a moderate level of agreement (but there is no clear basis for this guideline).

Correlation between different raters.

Intraclass correlation: agreement among multiple raters, corrected for the number of raters using the Spearman-Brown formula (r).

Slide 5/16

Interrater reliability of categorical IV (1)

Percent exact agreement = (number of observations agreed on) / (total number of observations)

Categorical IV with 3 discrete scale steps
9 ratings the same
% exact agreement = 9/12 = .75

Study   Rater 1   Rater 2
1       0         0
2       1         1
3       2         1
4       1         1
5       1         1
6       2         2
7       1         1
8       1         1
9       0         0
10      2         1
11      1         0
12      1         1
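
As a quick cross-check (not part of the original slides), a minimal Python sketch that computes percent exact agreement for the twelve ratings above:

# Ratings of 12 studies by two coders (from the table above).
rater1 = [0, 1, 2, 1, 1, 2, 1, 1, 0, 2, 1, 1]
rater2 = [0, 1, 1, 1, 1, 2, 1, 1, 0, 1, 0, 1]

# Percent exact agreement = observations agreed on / total observations.
agreed = sum(a == b for a, b in zip(rater1, rater2))
print(agreed / len(rater1))  # 9/12 = 0.75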

Slide 6/16

Interrater reliability of categorical IV (2): unweighted Kappa

Agreement matrix (counts):

             Rater 1
             0    1    2    Sum
Rater 2   0  2    0    0    2
          1  1    6    0    7
          2  0    2    1    3
Sum          3    8    1    12

K = (P_O - P_E) / (1 - P_E), where P_O is the observed proportion of agreement and P_E the proportion of agreement expected by chance.

P_O = (2 + 6 + 1) / 12 = .75
P_E = [(2)(3) + (7)(8) + (3)(1)] / 12² = .451
K = (.750 - .451) / (1 - .451) = .544

Kappa: positive values indicate how much the raters agree over and above chance alone; negative values indicate disagreement. If the agreement matrix is irregular, Kappa will not be calculated, or will be misleading.
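
A minimal Python sketch (not from the slides) that reproduces the unweighted kappa above from the raw ratings:

from collections import Counter

# Same ratings of the 12 studies as in the table on the previous slide.
rater1 = [0, 1, 2, 1, 1, 2, 1, 1, 0, 2, 1, 1]
rater2 = [0, 1, 1, 1, 1, 2, 1, 1, 0, 1, 0, 1]

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: K = (P_O - P_E) / (1 - P_E)."""
    n = len(a)
    categories = sorted(set(a) | set(b))
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ma, mb = Counter(a), Counter(b)                       # marginal counts
    p_e = sum(ma[c] * mb[c] for c in categories) / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(round(cohens_kappa(rater1, rater2), 3))  # 0.544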

Slide 7/16

Interrater reliability of categorical IV (3): unweighted Kappa in SPSS

Symmetric Measures

                              Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement  Kappa   .544    .220                   2.719          .007
N of Valid Cases              12

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

CROSSTABS
  /TABLES=rater1 BY rater2
  /FORMAT= AVALUE TABLES
  /STATISTIC=KAPPA
  /CELLS= COUNT
  /COUNT ROUND CELL .

Slide 8/16

Interrater reliability of categorical IV (4): Kappas in irregular matrices

If rater 2 is systematically above rater 1 when coding an ordinal scale, Kappa will be misleading; it is possible to fill up the matrix with zeros.

Misaligned matrix (K = .51, misleading):

             Rater 1
             1    2    3    Sum
Rater 2   2  4    1    0    5
          3  3    6    1    10
          4  0    3    7    10
Sum          7    10   8    25

Zero-filled matrix (K = -.16):

             Rater 1
             1    2    3    4    Sum
Rater 2   1  0    0    0    0    0
          2  4    1    0    0    5
          3  3    6    1    0    10
          4  0    3    7    0    10
Sum          7    10   8    0    25
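
A small Python sketch (not part of the slides) showing the two values above: treating the misaligned counts as if the categories lined up inflates kappa, while filling in zeros for the unused categories yields the negative value that reflects the systematic disagreement:

def kappa_from_matrix(m):
    """Unweighted Cohen's kappa for a square agreement matrix m
    (rows = one rater, columns = the other, same category order on both margins)."""
    n = sum(sum(row) for row in m)
    k = len(m)
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(m[i][i] for i in range(k)) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Misaligned 3x3 table: rows are categories 2-4 (rater 2), columns 1-3 (rater 1).
misaligned = [[4, 1, 0],
              [3, 6, 1],
              [0, 3, 7]]

# Same counts placed over the full category set 1-4, zeros filled in
# for the unused row and column.
padded = [[0, 0, 0, 0],
          [4, 1, 0, 0],
          [3, 6, 1, 0],
          [0, 3, 7, 0]]

print(round(kappa_from_matrix(misaligned), 2))  # 0.51 (misleading)
print(round(kappa_from_matrix(padded), 2))      # -0.16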

Slide 9/16

Interrater reliability of categorical IV (5): Kappas in irregular matrices

If there are no observations in some row or column, Kappa will not be calculated; it is possible to fill up the matrix with zeros.

Non-square matrix (K not possible to estimate):

             Rater 1
             1    3    4    Sum
Rater 2   1  4    0    0    4
          2  2    1    0    3
          3  1    3    2    6
          4  0    1    4    5
Sum          7    5    6    18

Zero-filled matrix (K = .47):

             Rater 1
             1    2    3    4    Sum
Rater 2   1  4    0    0    0    4
          2  2    0    1    0    3
          3  1    0    3    2    6
          4  0    0    1    4    5
Sum          7    0    5    6    18
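
Reusing the same kind of helper as in the previous sketch (again, my own cross-check, not from the slides), the zero-filled table reproduces K = .47:

def kappa_from_matrix(m):
    """Unweighted Cohen's kappa for a square agreement matrix."""
    n = sum(sum(row) for row in m)
    k = len(m)
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(m[i][i] for i in range(k)) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    return (p_o - p_e) / (1 - p_e)

# The 4x3 table above is not square, so kappa is undefined; after inserting
# a zero column for the category rater 1 never used, it can be computed.
padded = [[4, 0, 0, 0],
          [2, 0, 1, 0],
          [1, 0, 3, 2],
          [0, 0, 1, 4]]

print(round(kappa_from_matrix(padded), 2))  # 0.47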

Slide 10/16

Interrater reliability of categorical IV (6): weighted Kappa using SAS macro

PROC FREQ DATA = int.interrater1;
  TABLES rater1 * rater2 / AGREE;
  TEST KAPPA;
RUN;

Papers and macros are available for estimating Kappa when there are unequal or misaligned rows and columns, or multiple raters.

Weighted Kappa:

K_W = (Σ w_i p_oi - Σ w_i p_ei) / (1 - Σ w_i p_ei), where the w_i are the agreement weights.
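
For readers without SAS, a minimal Python sketch of linearly weighted kappa (my own illustration, not from the slides), applied to the 3x3 agreement matrix from the unweighted-kappa slide. The assumed weights w_ij = 1 - |i - j| / (k - 1) give partial credit to near-misses, so the weighted value (.60) exceeds the unweighted .544 for these data:

def weighted_kappa(m):
    """Linearly weighted Cohen's kappa for a square agreement matrix m.
    Agreement weights: w_ij = 1 - |i - j| / (k - 1)."""
    k = len(m)
    n = sum(sum(row) for row in m)
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    p_o = sum(w[i][j] * m[i][j] for i in range(k) for j in range(k)) / n
    p_e = sum(w[i][j] * row_tot[i] * col_tot[j]
              for i in range(k) for j in range(k)) / n**2
    return (p_o - p_e) / (1 - p_e)

# 3x3 matrix from the unweighted-kappa slide (unweighted K = .544).
m = [[2, 0, 0],
     [1, 6, 0],
     [0, 2, 1]]

print(round(weighted_kappa(m), 2))  # 0.6 with linear weights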

Slide 11/16

Interrater reliability of continuous IV (1)

Average correlation r = (.873 + .879 + .866) / 3 = .873

Coders code in the same direction!

Correlations (Pearson, 2-tailed), N = 12:

         rater1   rater2   rater3
rater1   1        .873**   .879**
rater2   .873**   1        .866**
rater3   .879**   .866**   1

**. Correlation is significant at the 0.01 level (2-tailed); all Sig. = .000.

Study   Rater 1   Rater 2   Rater 3
1       5         6         5
2       2         1         2
3       3         4         4
4       4         4         4
5       5         5         5
6       3         3         4
7       4         4         4
8       4         3         3
9       3         3         2
10      2         2         1
11      1         2         1
12      3         3         3
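
A small Python sketch (standard library only, Python 3.10+ for statistics.correlation; not from the slides) that reproduces the pairwise correlations and their average from the table above:

from itertools import combinations
from statistics import correlation  # Pearson r, available from Python 3.10

# Ratings of the 12 studies by three coders (from the table above).
ratings = {
    "rater1": [5, 2, 3, 4, 5, 3, 4, 4, 3, 2, 1, 3],
    "rater2": [6, 1, 4, 4, 5, 3, 4, 3, 3, 2, 2, 3],
    "rater3": [5, 2, 4, 4, 5, 4, 4, 3, 2, 1, 1, 3],
}

pairwise = {(a, b): correlation(ratings[a], ratings[b])
            for a, b in combinations(ratings, 2)}
for pair, r in pairwise.items():
    print(pair, round(r, 3))                              # .873, .879, .866
print(round(sum(pairwise.values()) / len(pairwise), 3))   # average r = .873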

Slide 12/16

Interrater reliability of continuous IV (2)

Estimates of Covariance Parameters (dependent variable: rating)

Parameter                               Estimate
Residual                                .222222
Intercept [subject = study] Variance    1.544613

ICC = σ_B² / (σ_B² + σ_W²) = 1.544 / (1.544 + 0.222) = 1.544 / 1.767 = .874,

where σ_B² is the between-study (intercept) variance and σ_W² the residual (within-study) variance.
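
The slide's estimates come from a variance-components (mixed) model; for this balanced 12 x 3 data set the simple one-way ANOVA method of moments gives the same numbers. A Python sketch (my own illustration, not from the slides):

# One-way random-effects variance components for the 12 studies x 3 raters
# data above (balanced design), via the ANOVA method of moments.
ratings = [
    [5, 6, 5], [2, 1, 2], [3, 4, 4], [4, 4, 4], [5, 5, 5], [3, 3, 4],
    [4, 4, 4], [4, 3, 3], [3, 3, 2], [2, 2, 1], [1, 2, 1], [3, 3, 3],
]
n_studies = len(ratings)          # 12 studies
k = len(ratings[0])               # 3 raters per study

grand_mean = sum(sum(row) for row in ratings) / (n_studies * k)
study_means = [sum(row) / k for row in ratings]

ss_between = k * sum((m - grand_mean) ** 2 for m in study_means)
ss_within = sum((x - m) ** 2 for row, m in zip(ratings, study_means) for x in row)

ms_between = ss_between / (n_studies - 1)
ms_within = ss_within / (n_studies * (k - 1))

var_between = (ms_between - ms_within) / k    # ~1.5446 (intercept variance)
var_within = ms_within                        # ~0.2222 (residual)
icc = var_between / (var_between + var_within)
print(round(var_between, 4), round(var_within, 4), round(icc, 3))  # 1.5446 0.2222 0.874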

Slide 13/16

Interrater reliability of continuous IV (3)

Design 1: one-way random effects model, when each study is rated by a different pair of coders.

Design 2: two-way random effects model, when a random pair of coders rates all studies.

Design 3: two-way mixed effects model, when ONE pair of coders rates all studies.

Slide 14/16

Comparison of methods (from Orwin, p. 153, in Cooper & Hedges, 1994)

Low Kappa but good agreement rate (AR) when there is little variability across items and the coders agree.

Slide 15/16

Interrater reliability in meta-analysis and primary study

Slide 16/16

Interrater reliability in meta-analysis vs. in other contexts

Meta-analysis: coding of independent variables

How many co-judges?

How many objects to co-judge? (sub-sample of studies versus sub-sample of codings)

Use of a gold standard (i.e., one master coder)

Coder drift (cf. observer drift): are coders consistent over time?

Your qualitative analysis is only as good as the quality of your categorisation of qualitative data.