8/3/2019 Mfrm to Adjust for Rater Severity Leniency
Sultan Qaboos University
Language Centre
MFRM TO ADJUST FOR RATER SEVERITY/LENIENCY
Presentation for the LC Conference
by
Farah Bahrouni/[email protected]
April 20, 2011
Farah Bahrouni/LC Conf./April 20, 2011
Plan
Briefing about MFRM
Run the analysis for 5 facets: candidate, rater, background, experience & category
Adjusting scores as per FACETS estimates
Conclusion
              TA (/25)    CC (/25)    LR (/25)    GR (/25)    Total (/100)
Student 1
  Mean        19.62132    19.38971    18.20956    16.45588
  Max         25          24          23          22          94
  Min         14          13          14          10          51
  Range       11          11          9           12          43
  Count       68          68          68          68
Student 2
  Mean        20.13971    20.09926    19.88235    18.88971
  Max         25          25          25          24          99
  Min         14          13          12          11          50
  Range       11          12          13          13          49
  Count       68          68          68          68
Student 3
  Mean        15.16544    15.79559    15.48162    18.88971
  Max         25          23          20          24          92
  Min         10          10          8           11          39
  Range       15          13          12          13          53
  Count       68          68          68          68
Assessment of language proficiency: Speaking/Writing subjectivity
A number of distinct factors directly or indirectly impinge upon the
assessment/measurement outcomes.
These factors are referred to as facets.
A facet has been defined as
Any factor, variable, or component [e.g. examinees,
tasks, raters, interviewers, etc.] of the
measurement situation that is assumed to affect test scores in a systematic way.
(Bachman, 2004; Linacre, 2002; Wolfe & Dobria, 2008, cited in Eckes, 2009: 2)
The error-prone nature of most measurement facets raises serious
concerns about both the reliability and
validity of the obtained scores.
The usual approaches to deal with rater variability include:
rater training
using 2 or more raters in the scoring of performance assessment
calling for an adjudicator (a 3rd/4th rater, usually a more experienced/senior/expert one)
developing rubrics that spell out the proficiency levels
identifying anchor papers to provide concrete examples of
each proficiency level
(for details see Johnson, et al. 2005, 2003, 2001, 2000)
Nevertheless, research has found that, however carefully they are applied,
none of these methods is effective enough to guarantee reliable, objective scores.
The resolution methods themselves are diverse enough to raise questions about the
quality of the resolved scores.
Underlying these resolution models is the common assumption that
the discrepant scores might lack the requisite levels of reliability and
validity, and that adjudication might improve this deficit to some extent (Johnson et al., 2005: 123).
As for rater training, it has been found that even
with proper training, substantial differences
between raters persist.
(Linacre, 1990; Hamp-Lyons, 1991; Weigle, 1994, 1998, 2002; Lumley & McNamara,
1995; McNamara, 1996; Lumley, 2005)
Rater differences are reduced by training, but do
persist (McNamara, 1996: 118).
Reason:
Some see severity much as a personality trait that is inherently brought to any rating situation.
(Myford et al., 2003)
Multi-facet Rasch Model (MFRM) provides a rich
set of highly flexible tools to account, and
compensate, for measurement error, especially
rater-dependent measurement error.
It is an extension of the basic Rasch model that incorporates more facets than the 2 usually included
in dichotomous item tests, i.e. candidates and
items.
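For reference, the extension described above is commonly written as follows. The symbols are not from the slides; they follow the notation usually found in the Rasch literature, and the four-term form is a generic rating-scale version with a single rater facet, not necessarily the exact specification used in this analysis.

```latex
% Basic dichotomous Rasch model: two facets, candidate n and item i
\ln\!\left(\frac{P_{ni1}}{P_{ni0}}\right) = B_n - D_i

% Many-facet extension: rater j is added, and ratings fall in ordered categories k
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here B_n is the ability of candidate n, D_i the difficulty of item i, C_j the severity of rater j, and F_k the difficulty of being rated in category k rather than k-1, all expressed in logits. Further facets, such as rater background or experience, would enter as additional subtracted terms.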
Multifaceted Rasch measurement is a stochastic model; the analysis is
run with FACETS, a computer program developed
by Linacre (1989).
Candidate ability is estimated from all ratings given by all
raters on all items (Lunz & Wright, 1997; McNamara, 1996: 132).
Item difficulty (TA, CC, LR & GA) is estimated from all
responses across all candidates to that item (ibid.).
Rater severity is estimated from all ratings given across
all candidates and items (ibid.).
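How a rater's severity enters the model, and how a rating can then be adjusted for it, can be sketched in a few lines of Python. This is a minimal illustration of the rating-scale form of a many-facet Rasch model, not the FACETS estimation procedure: it assumes the measures have already been estimated, and every function name and logit value below is hypothetical.

```python
import math


def category_probabilities(ability, difficulty, severity, thresholds):
    """Probabilities of rating categories 0..K under a rating-scale
    many-facet Rasch model (all quantities in logits).

    thresholds[k-1] is the difficulty of the step from category k-1 to k.
    """
    # Cumulative log-numerators; category 0 is the reference (0.0).
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(
            log_numerators[-1] + (ability - difficulty - severity - step)
        )
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(ability, difficulty, severity, thresholds):
    """Model-expected rating for one candidate-item-rater combination."""
    probs = category_probabilities(ability, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical measures in logits (not taken from the presentation's data).
ABILITY = 1.0
DIFFICULTY = 0.0
THRESHOLDS = [-1.0, 0.0, 1.0]  # a 4-category scale: 0, 1, 2, 3

severe = expected_rating(ABILITY, DIFFICULTY, 0.8, THRESHOLDS)
lenient = expected_rating(ABILITY, DIFFICULTY, -0.8, THRESHOLDS)
# Setting severity to 0 (an average rater) gives a severity-adjusted rating.
adjusted = expected_rating(ABILITY, DIFFICULTY, 0.0, THRESHOLDS)

print(f"severe rater:   {severe:.2f}")
print(f"lenient rater:  {lenient:.2f}")
print(f"adjusted:       {adjusted:.2f}")
```

The same candidate receives a lower expected rating from the severe rater and a higher one from the lenient rater. Replacing the rater's severity with the average value gives a rating that no longer depends on who happened to mark the script.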
In addition, MFRM has 2 more very informative
functions:
Bias analysis
Fit analysis
These 2 functions enable researchers to look at
how individual raters, ratees, or traits included in the analysis are performing (fit
analysis: z-score values between -2 and +2 are usually accepted in contexts similar to ours)
how the individual elements within the facets interact, i.e. individual-level effects of the
various elements (bias analysis: z-score values between -2 and +2)
Thus, the source(s) of variation in the scores are efficiently determined (Myford et al., 2003; Lunz & Wright, 1997).
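The -2 to +2 acceptance band can be applied mechanically once the standardized statistics are available. The sketch below is only an illustration of that screening step: the function name, the rater labels and the z values are hypothetical, and treating the boundary values -2 and +2 as acceptable is an assumption.

```python
def flag_outside_band(z_scores, limit=2.0):
    """Return the labels whose standardized statistic lies outside
    the accepted band from -limit to +limit (boundaries accepted)."""
    return sorted(label for label, z in z_scores.items() if abs(z) > limit)


# Hypothetical standardized fit values for four raters (illustration only).
rater_fit_z = {
    "Rater A": 0.4,
    "Rater B": -2.6,
    "Rater C": 1.9,
    "Rater D": 3.1,
}

flagged = flag_outside_band(rater_fit_z)
print(flagged)
```

The same check applies to bias analysis, with one z value per pair of interacting elements (for example, a rater paired with a category) instead of one per rater.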
Conclusion
Owing to the above features, MFRM has been found to be a
model with great potential to improve our capacity to
produce objective measures of the ability of test takers
in performance assessment contexts. It is practical and
can be used in our context along with pair rating.
(Linacre et al., 1990; Engelhard, 1991, 1992, 1994, 1996; Engelhard & Myford, 2003; Hamp-Lyons, 1991; Lunz,
1996, 1997a, 1997b; Lunz & Wright, 1997; Weigle, 1994, 1998, 2002; Schaefer, 2003, 2008; Kondo-Brown, 2002;
Lumley & McNamara, 1995; Lumley, 2005; McNamara, 1991, 1996, 1997, 2000, 2002, 2008; McNamara & Roever,
2006; Myford et al., 2003, 2004; Shaw & Weir, 2007; Wigglesworth, 1993, 1994).