Examining Rounding Rules in Angoff Type Standard Setting Methods
Adam E. Wyse, Mark D. Reckase


Mark D. Reckase
• Current Projects
• Multidimensional Item Response Theory
– Development of methodology for fine-grained analysis of item response data in high-dimensional spaces. Application of the methodology to gain understanding of the constructs assessed by tests.
• Test Design and Construction
– Design of content and statistical specifications for tests using the philosophy of item response theory. Use of computerized test assembly procedures to match test specifications.
• Portfolio Assessment
– Design of portfolio assessment systems, including formal objective scoring of portfolios.
• Procedures for Setting Standards
– Development and evaluation of procedures for setting standards on educational and psychological tests. Includes extensive work on setting standards on the National Assessment of Educational Progress.
• Computerized Adaptive Testing
– Developing procedures for selecting and administering test items to individuals using computer technology. In particular, designing systems to match item selection to the specific requirements for test use.

Angoff Method

• Panelists estimate the probability that the minimally competent examinee (MCE) would respond correctly to each item.

Modified Angoff Method (1)

• Round to a whole number of score points (Yes/No method)
(Figure labels: Polytomous, Dichotomous)

• Rate the MCE score for each cluster of items.
– Round to 1 decimal place
– Round to integer


• How to aggregate the raters' judgments
– Mean or median (the median excludes the effect of outliers)

Mean      Median
18.1667   18.4
20.8833   21
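
As a minimal sketch of the aggregation step, the snippet below sums each panelist's item ratings into a panelist cut score and then compares the mean and the median across panelists. The ratings are hypothetical values chosen for illustration, not data from the study, and summing the ratings is the standard Angoff computation rather than something stated on the slide.

```python
from statistics import mean, median

# Hypothetical Angoff ratings: one row per panelist, one probability per item.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.6, 0.7],
    [0.7, 0.7, 0.5, 0.7],
    [0.1, 0.2, 0.1, 0.2],  # an outlying, very severe panelist (made up)
]

# Each panelist's cut score is the sum of that panelist's item ratings.
panelist_cuts = [sum(row) for row in ratings]

# Group-level cut score under the two aggregation rules.
group_mean = mean(panelist_cuts)
group_median = median(panelist_cuts)
print(f"mean = {group_mean:.3f}, median = {group_median:.3f}")
```

With one outlying panelist, the mean is pulled toward the outlier more than the median, which is the motivation the slide gives for considering the median.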

Theoretical Framework

• Reckase (2006)
– Round to integer
– Round to 0.05
– The panelist perfectly understands the relation between item difficulty and the cut θ.

• Reckase (2006)
– Round to 1 decimal place
– Round to 2 decimal places


• Bias
– Individual panelists' cut-scores
– Group-level cut-scores: mean or median
• Other evidence for evaluating standard setting
– Correlation: item ratings provided by panelists and item p-values
• Cannot detect the panelists' severity
• Errors can be incorporated into the Reckase evaluation approach.


• Assumptions
– Only a single round (without training effect)
– No error is included (an ideal setting)
• Investigate the impact of the Angoff modifications and rounding rules in the ideal situation.

Data and Method
• NAEP data
– 20 raters, last round
– Each panelist's θ cut-score in NAEP was taken as his or her intended cut-score.
• 2PL
• 3PL
• GPCM:
E(X|θ) = 1*P1(θ) + 2*P2(θ) + 3*P3(θ) + 4*P4(θ)

http://en.wikipedia.org/wiki/File:CPCs.png
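
The expected-score formula for the GPCM can be sketched in Python as follows. This is an illustrative sketch under assumptions: the function names, the discrimination `a`, and the step parameters `steps` are hypothetical and are not taken from the NAEP calibration. The score-0 category is included in the probabilities but contributes nothing to the expected score, which matches the four-term sum on the slide.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities P0..Pm for one GPCM item.

    theta: ability; a: discrimination (assumed name);
    steps: step parameters b1..bm (assumed name), giving m + 1 categories.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]

def expected_score(theta, a, steps):
    """E(X | theta) = sum over categories k of k * P_k(theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))

# Hypothetical item with four steps, so scores run from 0 to 4.
probs = gpcm_probs(0.0, 1.0, [-1.5, -0.5, 0.5, 1.5])
score = expected_score(0.0, 1.0, [-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs], round(score, 3))
```

Because the hypothetical steps are symmetric around θ = 0, the category probabilities are symmetric there and the expected score sits at the midpoint of the score range.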

Simulated conditions

• Rounding
– Integer: 1.2345 → 1
– Nearest 0.05: 1.2345 → 1.25
– Nearest 2 decimal places: 1.2345 → 1.23
• Item pools
– 180, 107, 109, and 53 items
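
The three rounding rules can be sketched as one helper that rounds to the nearest multiple of a step size. The function name `round_to_step` is hypothetical, and the tie-breaking rule (ties rounded away from zero) is an assumption, since the slides do not say how exact ties were handled. `Decimal` is used to avoid binary floating-point surprises at values such as 0.05.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_step(value, step):
    """Round value to the nearest multiple of step.

    Ties go away from zero (ROUND_HALF_UP); this tie rule is an assumption.
    """
    v = Decimal(str(value))
    s = Decimal(str(step))
    # Number of whole steps closest to the value.
    units = (v / s).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(units * s)

# The slide's example value under each rounding rule.
for label, step in [("integer", 1), ("nearest 0.05", 0.05), ("2 decimal places", 0.01)]:
    print(label, round_to_step(1.2345, step))
```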

Simulated conditions

• Individual items vs. clusters of items
• Cut-scores
– Basic, Proficient, and Advanced
• Aggregating value
– Mean vs. median

Evaluation Criteria
• Bias:
• Average absolute bias:
• Bias for the group's intended cut score
– Mean:
– Median:
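
The formulas for these criteria did not survive on the slide, so the following is only a sketch under stated assumptions: bias is taken as the estimated cut θ minus the intended cut θ (consistent with the worked examples in the Discussion), average absolute bias as the mean of the absolute panelist biases, and group bias as the difference between the aggregated estimated and aggregated intended cut-scores. The function names and the data are hypothetical.

```python
from statistics import mean, median

def panelist_bias(estimated, intended):
    """Per-panelist bias: estimated cut theta minus intended cut theta (assumed definition)."""
    return [e - i for e, i in zip(estimated, intended)]

def average_absolute_bias(estimated, intended):
    """Mean of the absolute per-panelist biases (assumed definition)."""
    return mean(abs(b) for b in panelist_bias(estimated, intended))

def group_bias(estimated, intended, aggregate=mean):
    """Aggregated estimated cut minus aggregated intended cut (assumed definition)."""
    return aggregate(estimated) - aggregate(intended)

# Hypothetical intended and estimated cut thetas for four panelists.
intended = [-0.2, 0.0, 0.1, 0.3]
estimated = [-0.3, 0.1, 0.1, 0.5]

print(panelist_bias(estimated, intended))
print(average_absolute_bias(estimated, intended))
print(group_bias(estimated, intended, mean), group_bias(estimated, intended, median))
```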

Result – individual panelist

• Bias by rounding rule: integer > 0.05 > 2 decimal places
• Bias by cut-score location: Advanced > Basic > Proficient
• Bias by rating unit: individual items > cluster level (fewer rounding errors)
• Item pool: the 53-item pool has greater bias than the other pools
• Item pool: 53 items < 180 items for the Proficient cut-score with integer rounding. This shows the importance of the cut-score location and the item distribution.

Result – group of panelists

• In some cases the mean is better; in other cases the median is better.
• Basic showed negative bias; Proficient and Advanced showed positive bias. At the cluster level, Proficient showed negative bias.
• Advanced produced greater bias than the other two levels. The bias did not cancel out across a group of panelists.
• Both the mean and median bias were < 0.01 when rounding to 0.05 and to 2 decimal places. Again, more test items did not necessarily produce less bias.
• The cluster level is better than individual items.

Impact on Percent Above Cut-score (PAC)

• PAC was found for the closest value on the NAEP scale in the pilot study.
• PAC difference = PAC for the estimated θ − PAC for the intended θ.
• Rounding to the nearest 0.05 or nearest 0.01 did not change PAC: minimal impact.

Basic: 5.610 ~ 13.010
Proficient: -3.823 ~ -4.387
Advanced: -1.156 ~ -1.262

Basic: 4.490 ~ 14.190
Proficient: -4.387 ~ -5.346
Advanced: -1.156 ~ -1.343

• Bias: Advanced > Basic and Proficient
• PAC: Advanced < Basic and Proficient
• There are more students near the Basic and Proficient cut scores.

• Rounding to the integer does not present a viable alternative in the Angoff method.
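
The PAC comparison above can be illustrated with a small sketch. It assumes a standard normal ability distribution, which is a simplification introduced here; the study itself looked up PAC on the NAEP pilot-study distribution. The function names are hypothetical.

```python
import math

def percent_above_cut(theta_cut, mu=0.0, sigma=1.0):
    """Percent of a normal(mu, sigma) ability distribution above theta_cut.

    The normal distribution is an assumption made for illustration only.
    """
    z = (theta_cut - mu) / sigma
    # Upper-tail probability of the standard normal via the complementary error function.
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))

def pac_difference(estimated_cut, intended_cut):
    """PAC for the estimated theta minus PAC for the intended theta."""
    return percent_above_cut(estimated_cut) - percent_above_cut(intended_cut)

# The same bias of +0.1 near the centre and in the upper tail of the distribution.
centre_shift = pac_difference(0.1, 0.0)
tail_shift = pac_difference(2.1, 2.0)
print(round(centre_shift, 3), round(tail_shift, 3))
```

Under this assumption, the same amount of bias moves PAC more when the cut-score lies where many students are located, which is in line with the slide's remark that a larger bias at Advanced still produced a smaller PAC change.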

Discussion

• Rounding to integer could affect the cut scores.
– Using cluster-level ratings can mitigate bias, but biases still remained.
• Using more test items will not necessarily produce less bias.
– What matters is the location of the items in relation to the intended cut-score.

Discussion

• 10 items [-2 ~ +2]
• Cut score θ = 0
– 5 items rounded to score 1
– 5 items rounded to score 0
• Cut total score = 5 → θ = 0
• Bias = 0

Discussion

• 20 items [-1 ~ +3]
• Cut score θ = 0
– 5 items rounded to score 1
– 15 items rounded to score 0
• Cut total score = 5 → θ = -0.438
• Bias = -0.438
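
The two worked examples can be sketched in Python as follows. This is a sketch under assumptions that are not stated on the slides: items follow a Rasch model, difficulties are evenly spaced over the stated range, and the rounded cut score is mapped back to θ by inverting the test characteristic curve with bisection. All function names are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response (assumed item model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def evenly_spaced(lo, hi, n):
    """n difficulties evenly spaced from lo to hi (assumed item layout)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def theta_for_score(score, difficulties, lo=-6.0, hi=6.0):
    """Invert the test characteristic curve by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(p_correct(mid, b) for b in difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def integer_rounding_bias(difficulties, intended_theta):
    """Bias in theta when an error-free panelist rounds every rating to 0 or 1."""
    # A rating of at least 0.5 rounds to 1; otherwise it rounds to 0.
    cut_score = sum(1 for b in difficulties if p_correct(intended_theta, b) >= 0.5)
    return theta_for_score(cut_score, difficulties) - intended_theta

bias_10 = integer_rounding_bias(evenly_spaced(-2.0, 2.0, 10), 0.0)
bias_20 = integer_rounding_bias(evenly_spaced(-1.0, 3.0, 20), 0.0)
print(round(bias_10, 3), round(bias_20, 3))
```

With the symmetric 10-item pool the rounding errors cancel and the bias is essentially zero, while the 20-item pool, which is shifted above the intended cut-score, gives a negative bias that should come out close to the value on the slide under these assumptions.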

Discussion

• Using the OIB (ordered item booklet) from the Bookmark method, a test could be designed so that roughly half of the items are above the cut-score.
– It is impossible to know the location of the cut-score.
– The intended cut-scores differ across panelists, so some panelists must have bias.
• With multiple cut-scores, at least one of the cut-scores would produce bias.
• Rounding to integer presents many potential problems.

Discussion

• Challenge: in real situations panelists are not completely consistent in their judgments.
– Feedback is helpful for reducing rater inconsistency in NAEP.
• Further development
– Examine the bias at the group level

Thank you for your attention
