ICCS workshop: Introduction to plausible values
National Research Coordinators Meeting Madrid, February 2010

Transcript

Page 1

Introduction to plausible values

National Research Coordinators Meeting Madrid, February 2010

Page 2

Content of presentation

• Rationale for scaling
• Rasch model and possible ability estimates
• Shortcomings of point estimates
• Drawing plausible values
• Computation of measurement error

Page 3

Rationale for IRT scaling of data

• Summarising data instead of dealing with many single items

• Raw scores or percent-correct scores are sample-dependent

• Makes equating possible and can deal with rotated test forms

Page 4

The ‘Rasch model’

• Models the probability of responding correctly to an item as

$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

• Likewise, the probability of NOT responding correctly is modelled as

$$P(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$

where $\theta_n$ is the ability of person $n$ and $\delta_i$ is the difficulty of item $i$.
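As a quick illustration, here is a minimal Python sketch of these two probabilities (the function name and example values are ours, not from the slides):

```python
import math

def rasch_p_correct(theta, delta):
    """P(X_ni = 1): probability that person n (ability theta) answers
    item i (difficulty delta) correctly under the Rasch model."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

# P(X_ni = 0) is simply the complement: 1 / (1 + exp(theta - delta))
print(rasch_p_correct(0.0, 0.0))  # ability equals difficulty -> 0.5
print(rasch_p_correct(2.0, 0.0))  # able person, average item -> about 0.88
```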

Page 5

IRT curves

[Figure: item characteristic curve; probability of a correct response (0 to 1) plotted against ability, -4 to +4 logits]

Page 6

How might we impute a reasonable proficiency value?

• Choose the proficiency that makes the score most likely (see the sketch after this list)
  – Maximum Likelihood Estimate (MLE)
  – Weighted Likelihood Estimate (WLE)

• Choose the most likely proficiency for the score
  – empirical Bayes

• Choose a selection of likely proficiencies for the score
  – Multiple imputations (plausible values)
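A minimal sketch of the first option under the Rasch model: the MLE solves "expected score = observed score" by Newton-Raphson (the function names and convergence settings are our assumptions):

```python
import math

def rasch_p(theta, delta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def mle_ability(raw_score, deltas, theta=0.0, tol=1e-8):
    """Maximum Likelihood Estimate of ability for a raw score on items
    with difficulties `deltas`; undefined for zero and perfect scores."""
    if raw_score <= 0 or raw_score >= len(deltas):
        raise ValueError("no finite MLE for zero or perfect scores")
    for _ in range(100):
        ps = [rasch_p(theta, d) for d in deltas]
        step = (raw_score - sum(ps)) / sum(p * (1 - p) for p in ps)
        theta += step                 # Newton-Raphson on the log-likelihood
        if abs(step) < tol:
            break
    return theta
```

The WLE solves the same equation with a small bias-correction term added, which is also what gives it finite estimates for zero and perfect scores.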

Page 7

Maximum Likelihood vs. Raw Score

[Figure: raw score (0 to 5) plotted against proficiency]

Page 8

The Resulting Proficiency Distribution

[Figure: distribution of proficiency estimates on the logit scale; one discrete value for each raw score, Score 0 to Score 6]

Page 9

Characteristics of Maximum Likelihood Estimates (MLE)

• Unbiased at the individual level with sufficient information, BUT biased towards the ends of the ability scale

• Arbitrary treatment of perfect and zero scores is required

• Discrete scale and measurement error lead to bias in population parameter estimates

Page 10

Characteristics of Weighted Likelihood Estimates

• Less biased than MLE

• Provides estimates for perfect and zero scores

• BUT discrete scale and measurement error lead to bias in population parameter estimates

Page 11

Plausible Values

• What are plausible values?

• Why do we use them?

• How to analyse plausible values?

Page 12

Purpose of educational tests

• Measure particular students (minimise measurement error of individual estimates)

• Assess populations (minimise error when generalising to the population)

Page 13

Posterior distributions for test scores on 6 dichotomous items

Page 14

Empirical Bayes – Expected A-Posteriori estimates (EAP)
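A minimal sketch of the EAP idea, assuming a grid approximation of the posterior (Rasch likelihood times a normal prior); the function names, the prior and the grid are our choices:

```python
import numpy as np

def posterior_on_grid(responses, deltas, mu=0.0, sigma=1.0,
                      grid=np.linspace(-5.0, 5.0, 401)):
    """Posterior over ability for one 0/1 response vector: Rasch
    likelihood times a N(mu, sigma^2) prior, evaluated on a grid."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - np.asarray(deltas)[None, :])))
    x = np.asarray(responses)[None, :]
    like = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)
    prior = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
    post = like * prior
    return grid, post / post.sum()   # normalise to a discrete distribution

def eap(responses, deltas):
    """Expected A-Posteriori estimate: the mean of the posterior."""
    grid, post = posterior_on_grid(responses, deltas)
    return float((grid * post).sum())
```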

Page 15

Characteristics of EAPs

• Biased at the individual level, but population means are unbiased (NOT variances)

• Discrete scale, bias and measurement error lead to bias in population parameter estimates

• Requires assumptions about the distribution of proficiency in the population

Page 16

Plausible Values

[Figure: plausible value distributions on the logit scale, one for each raw score, Score 0 to Score 6]

Page 17

Characteristics of Plausible Values

• Not fair at the student level

• Produces unbiased population parameter estimates
  – if the assumptions of the scaling are reasonable

• Requires assumptions about the distribution of proficiency

Page 18

Estimating percentages below benchmark with Plausible Values

Level One Cutpoint

The proportion of plausible values less than the cut-point will be a superior estimator to one based on EAP, MLE or WLE values
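A minimal sketch of this estimator (array shapes and names are our assumptions): compute the weighted proportion below the cut-point separately for each plausible value, then average.

```python
import numpy as np

def pct_below(pv_matrix, cutpoint, weights=None):
    """pv_matrix: (n_students, M) plausible values. Returns the estimated
    percentage of the population below `cutpoint`: the average over the M
    per-PV weighted proportions."""
    pv = np.asarray(pv_matrix, dtype=float)
    w = np.ones(pv.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    per_pv = ((pv < cutpoint) * w[:, None]).sum(axis=0) / w.sum()
    return 100.0 * float(per_pv.mean())
```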

Page 19

Methodology of PVs

• Mathematically computing posterior distributions around test scores

• Drawing 5 random values for each assessed individual from the posterior distribution for that individual
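A minimal sketch of the drawing step, assuming the posterior has already been evaluated on a grid (as in the EAP sketch above); sampling grid points in proportion to the posterior mass stands in for drawing from the continuous distribution:

```python
import numpy as np

def draw_plausible_values(grid, posterior, n_pvs=5, seed=None):
    """Draw n_pvs values from a posterior given on a grid, by sampling
    grid points with probability proportional to the posterior mass."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(posterior, dtype=float)
    probs = probs / probs.sum()
    return rng.choice(np.asarray(grid, dtype=float), size=n_pvs, p=probs)
```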

Page 20

What is conditioning?

• Assuming a normal posterior distribution:

$$\theta \sim N(\mu, \sigma^2)$$

• Model sub-populations (e.g. X = 0 for a boy, X = 1 for a girl):

$$\theta \sim N(\mu + \beta X, \sigma^2)$$

and, with further conditioning variables,

$$\theta \sim N(\mu + \beta X + \gamma Y + \dots + \zeta Z, \sigma^2)$$

[Figure: normal posterior density curves]
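A minimal sketch of conditioning, reusing the grid posterior from the EAP sketch: the latent regression shifts the prior mean for each student according to his or her background variables (the regression form and all names are our assumptions):

```python
import numpy as np

def conditioned_posterior(responses, deltas, x, beta, mu=0.0, sigma=1.0,
                          grid=np.linspace(-5.0, 5.0, 401)):
    """Grid posterior with a conditioned prior: theta ~ N(mu + x.beta, sigma^2),
    where x holds a student's conditioning variables (e.g. a gender dummy)."""
    prior_mean = mu + float(np.dot(x, beta))
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - np.asarray(deltas)[None, :])))
    r = np.asarray(responses)[None, :]
    like = np.prod(np.where(r == 1, p, 1.0 - p), axis=1)
    prior = np.exp(-0.5 * ((grid - prior_mean) / sigma) ** 2)
    post = like * prior
    return grid, post / post.sum()
```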

Page 21

Conditioning Variables

• Plausible values should only be analysed with data that were included in the conditioning (otherwise, results may be biased)

• Aim: maximise the information included in the conditioning, that is, use as many variables as possible

• To reduce the number of conditioning variables, factor scores from a principal component analysis were used in ICCS

• Use of classroom dummies takes between-school variation into account (no inclusion of school or teacher questionnaire data needed)

Page 22

Plausible values

• A model with conditioning variables will improve the precision of the prediction of ability (population estimates ONLY)

• Conditioning provides unbiased estimates for modelled parameters.

• Simulation studies comparing PVs, EAPs and WLEs show that:
  – population means give similar results
  – WLEs (or MLEs) tend to overestimate variances
  – EAPs tend to underestimate variances

Page 23

Calculating the measurement error

• As in the TIMSS or PIRLS data files, there are five plausible values for the cognitive test scales in ICCS

• Using five plausible values enables researchers to obtain estimates of the measurement error

Page 24

How to analyse PVs - 1

• The estimated mean is the AVERAGE of the means obtained for each PV

• The sampling variance is the AVERAGE of the sampling variances obtained for each PV

$$\hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} \hat{\mu}_i$$

$$\hat{\sigma}^2_{(\hat{\mu})} = \frac{1}{M} \sum_{i=1}^{M} \hat{\sigma}^2_{(\hat{\mu}_i)}$$

where $M$ is the number of plausible values ($M = 5$ in ICCS).
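A minimal sketch of both averages (names and shapes are our assumptions; the per-PV sampling variances would come from a replication method such as the jackknife described on page 27):

```python
import numpy as np

def pv_mean(pv_matrix, weights):
    """Weighted mean per plausible value column, then the average of the
    M per-PV means as the reported estimate."""
    pv = np.asarray(pv_matrix, dtype=float)       # shape (n_students, M)
    w = np.asarray(weights, dtype=float)
    per_pv = (pv * w[:, None]).sum(axis=0) / w.sum()
    return per_pv, float(per_pv.mean())

def pv_sampling_variance(sampling_vars):
    """Average of the M per-PV sampling variances."""
    return float(np.mean(sampling_vars))
```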

Page 25

How to analyse PVs - 2

• Measurement variance computed as:

• Total standard error computed from measurement and sampling variance as:

$$\hat{\sigma}^2_{PV} = \frac{1}{M - 1} \sum_{i=1}^{M} \left(\hat{\mu}_i - \hat{\mu}\right)^2$$

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2_{(\hat{\mu})} + \left(1 + \frac{1}{M}\right) \hat{\sigma}^2_{PV}}$$
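Combining both pieces, a minimal sketch of the total standard error (this is the usual multiple-imputation combination rule; the function name is ours):

```python
import math

def total_standard_error(stats_per_pv, sampling_vars_per_pv):
    """Combine sampling and measurement variance over M plausible values."""
    M = len(stats_per_pv)
    mean_stat = sum(stats_per_pv) / M
    sampling_var = sum(sampling_vars_per_pv) / M            # average within-PV variance
    measurement_var = sum((s - mean_stat) ** 2
                          for s in stats_per_pv) / (M - 1)  # between-PV variance
    return math.sqrt(sampling_var + (1.0 + 1.0 / M) * measurement_var)
```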

Page 26

How to analyse PVs - 3

$\hat{\mu}$ can be replaced by any statistic, for instance:
– SD
– Percentile
– Correlation coefficient
– Regression coefficient
– R-square
– etc.

Page 27

Steps for estimating both sampling and measurement error

• Compute the statistic for each PV for the fully weighted sample

• Compute the statistic for each PV for the 75 replicate samples

• Compute the sampling error (based on the previous steps)

• Compute the measurement error

• Combine the error variances to calculate the standard error (a sketch of the full procedure follows this list)
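A minimal end-to-end sketch of these steps, assuming an IEA-style jackknife in which the sampling variance of a statistic is the sum of squared deviations of the 75 replicate estimates from the full-sample estimate (the exact replication factor depends on the study design; `stat` stands for any statistic of weighted data):

```python
import math

def pv_estimate_with_se(stat, pvs, full_weights, replicate_weights):
    """stat(values, weights) -> float; pvs is a list of M per-student PV
    vectors; replicate_weights holds the 75 replicate weight vectors."""
    full = [stat(pv, full_weights) for pv in pvs]          # statistic per PV, full sample
    samp = [sum((stat(pv, rw) - f) ** 2 for rw in replicate_weights)
            for pv, f in zip(pvs, full)]                   # sampling variance per PV
    M = len(pvs)
    mean_stat = sum(full) / M
    sampling_var = sum(samp) / M
    measurement_var = sum((f - mean_stat) ** 2 for f in full) / (M - 1)
    se = math.sqrt(sampling_var + (1.0 + 1.0 / M) * measurement_var)
    return mean_stat, se

# Example: a weighted mean as the statistic
# mean_stat, se = pv_estimate_with_se(
#     lambda v, w: sum(a * b for a, b in zip(v, w)) / sum(w),
#     pvs, full_weights, replicate_weights)
```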

Page 28

Questions or comments?