0.1 Rasch Workshop Booklet Structure of Measurement


Workshop on

Rasch Analysis:

A Practical Guide to WinSteps

Dr. Azrilah Abdul Aziz

Mohd Saidfudin Masodi

February 2010


Table of Contents

1.0 Rasch Course Overview

2.0 Program Outline: Rasch Analysis 2-Day Workshop Outline

3.0 Keynote:

3.1 Modern Measurement Paradigm: Easier to read and better analysis using Rasch-based Approach

3.2 Developing an Instrument Construct Made Simple

4.0 Workshop:

WinSteps: A Practical Guide

- Setting up Control File

- Basic Rasch Analysis

  - Person-Item Distribution Map Construct and Interpretation

  - Partial Credit Rating Analysis

  - Item Validity

  - Person Response Validity Analysis

- Computation of Probabilistic Likelihood Estimate

5.0 Appendices

5.1 Sample Size and Item Calibration (or Person Measure) Stability

5.2 What do Infit and Outfit, Mean-square and Standardized mean?

5.3 Rating Scale Instrument Quality Criteria

5.4 Example of Person Item Map re-distribution

Rasch Course Overview


Keynote Speaker:

Modern measurement paradigm: The Rasch Approach

Mohd. Saidfudin Masodi, IRCA (Lon) ISO 9001 QMS Lead Assessor

Program Outline

Attendees will learn:


12:30 - 2:15 Lunch

Afternoon: 2:15 Guidance through a Rasch analysis, starting with an Excel file and using Winsteps software; more interpretation of output.

Practice using Winsteps/Ministep software in analyzing the Readiness to Change data that is included with the software.

3:45 Break

4:00 - 5:30 Participants will be given an Excel data file and will write their own control files and run them on the sample data, with hands-on help from facilitators.

Day 2: Hands-on Analysis Practice and Discussion

Morning: 8:30am Preliminary briefing.


Participants should have a working familiarity with Rasch measurement and analysis concepts, running the Winsteps program, and interpreting Winsteps output.

5:30pm: End of Course.

3.1 Rasch Model: The Modern Measurement Paradigm

Dr. Azrilah Abdul Aziz

Mohd Saidfudin Masodi

For many decades, we convinced ourselves that it is almost impossible to obtain quantitative data in the social sciences, and we deluded ourselves into conducting only descriptive research. Over the years, social scientists became aware that there should be more than descriptive findings, more than the typical reporting of medians and percentages. There should be a way to present findings more meaningfully. It would be more informative to see clearly the inherent relationship between the person and the observable actions being assessed. Rather than only reporting association and correlation values, the analysis should give a clearer picture of what is happening between the person and the observable actions: a hierarchical relationship between the two.

The Rasch measurement model has made it possible for social scientists to conduct calibrated measurement in which the human being is the focus of attention. In our day-to-day lives, we rely on standard measurement systems to measure and cut timber, buy lengths of cloth, assemble the correct amounts of ingredients to bake a cake, and administer appropriate doses of medicine to ailing relatives. Similarly, we can conduct educational research, treating and analysing data from surveys or psychological investigations in the same manner as we do with a standard measurement system.

The Rasch measurement model is a way to make sense of the world. Experience is continuous, but once we notice an experience, it becomes discrete. We sense happiness when receiving flowers during convocation. When we distinguish between different kinds of happiness (not very happy, happy, happier and happiest), that moment of observation becomes discrete. Then we choose a dimension according to the utility of the happy sensation, for example a flower, a bouquet of flowers, a bouquet of flowers with chocolate, and a bouquet of flowers with chocolate and a little teddy bear. To make our observations more meaningful, we represent them with scores in the form of:

Not very happy, happy, happier, happiest

These we score as observations x = 1, 2, 3, 4.

It is common to take the raw score of each sensation as a kind of measurement of the dimension. However, raw scores are only indications of a possible measure. Raw scores cannot be the measures sought because, in their raw state, they have little inferential value. To develop metric meaning, the counts must be incorporated into a stochastic process that constructs inferential stability.

A survey item, for example, is answered by 0, 1, 2 or 3, each corresponding to a phrase ranging from "strongly disagree" to "strongly agree". Responses from the survey are ordinal variables, from which we can determine the median and percentile rank, and the relationship between two characteristics by means of Spearman's rank-order correlation. That is as far as we can go with ordinal data. It is by no means measurement.
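To make concrete what "as far as we can go with ordinal data" means, here is a minimal Python sketch that computes the median and Spearman's rank-order correlation for two items. The six respondents and their 0-3 answers are hypothetical, and the helpers `ranks` and `spearman_rho` are our own; tied values receive the average of their ranks, a common convention.

```python
from statistics import median


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical answers (0 = strongly disagree ... 3 = strongly agree)
item_a = [0, 1, 1, 2, 3, 3]
item_b = [1, 0, 2, 2, 3, 3]

print(median(item_a), median(item_b))          # 1.5 2.0
print(round(spearman_rho(item_a, item_b), 3))  # 0.864
```

Note that nothing in this calculation depends on the distances between 0, 1, 2 and 3; only their order is used.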

The separation between the rating values 0, 1, 2 and 3 is not of equal interval, and therefore the ratings do not form the scale they have been construed to be all along. Calling such ratings a "scale" has obscured the truth about measurement: they do not possess the characteristics of a scale by the standard definition used in the physical sciences. Ignorance of this standard protocol has led us into a situation that warrants a review of measurement as it is normally perceived in the human sciences.

Similarly, the practice of using raw counts may give the impression that they are interval measures of experience. But this is always an illusion. From our observations of the sensation of happiness, assigning the number labels 1, 2, 3, 4 to the options "not very happy", "happy", "happier" and "happiest" does not make these numerals equally spaced measures. If the category labels are not equally spaced, then we cannot legitimately apply processing that assumes equal intervals to these non-interval labels, including the mean and standard deviation.
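A small, hypothetical example shows why the mean is not legitimate here. Each of the three codings below keeps the same category order, yet the comparison of two groups' means changes from a tie, to group B ahead, to group A ahead. The groups, the codings and the helper `group_means` are illustrative assumptions.

```python
from statistics import mean

# Hypothetical answers from two groups on the four-category happiness item
group_a = ["not very happy", "not very happy", "happiest", "happiest"]
group_b = ["happy", "happy", "happier", "happier"]

# Three codings; every one keeps the order not very happy < happy < happier < happiest
codings = {
    "1-2-3-4": {"not very happy": 1, "happy": 2, "happier": 3, "happiest": 4},
    "1-5-6-7": {"not very happy": 1, "happy": 5, "happier": 6, "happiest": 7},
    "1-2-3-10": {"not very happy": 1, "happy": 2, "happier": 3, "happiest": 10},
}


def group_means(coding):
    """Mean numeric label of each group under one coding."""
    return mean(coding[r] for r in group_a), mean(coding[r] for r in group_b)


for name, coding in codings.items():
    a, b = group_means(coding)
    print(f"{name:>8}: group A mean = {a:.2f}, group B mean = {b:.2f}")
```

Because the ordinal data cannot tell these codings apart, they cannot tell us which group is happier on average either.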

Apart from that, missing data are a common issue in survey data. Missing data may result from oversight, non-compliance or incidental interference. Since the purpose of conducting research is to use existing information to make inferences about what is still unknown, missing data are important too. A useful measurement model for constructing inferences from observations must be unaffected by missing data. The measurement model must also enable the precision of the inference to be estimated, and be able to detect and evaluate discrepancies between an observation and its expectation. Hence raw counts cannot be relied upon to serve as measures.

Accuracy of a measure may be achieved through replication. When similar results occur repeatedly, we are confident that similar results will occur again in future. However, replication does not guarantee an accurate measure unless the testing instrument is used for a specific purpose or target and, more importantly, each measurement is independent of the others. Irrespective of the person, a thermometer reading is always accepted independently. On the same premise, a measurement instrument must function in the same way whoever is being measured: a survey or an examination on a subject matter should not be affected by the particular respondent who is taking it.

Thus, in order to construct inferences from observations, the measurement model must:

(a) produce linear measures;

(b) overcome missing data;

(c) give estimates of precision;

(d) detect misfits or outliers; and

(e) keep the parameters of the object being measured and of the measurement instrument separable, or independent of each other.

Only Rasch measurement models solve these problems. The Rasch measurement model, or Rasch model for short, helps in constructing a scale based on a set of survey items.

Let us start discussing how the Rasch measurement model works using simple outcomes. Back to our example of the happy sensation of receiving a bouquet of flowers on convocation day: assume a guy goes around giving a bouquet of flowers to successful lady graduates. On the first attempt, the chance that a lady graduate likes the gift is 50:50. If the guy were to give bouquets to 10 other lady graduates, he might receive six (6) likes against four (4) dislikes. This can be expressed as probabilistic odds of 60:40. We may get other outcomes, perhaps 70:30 or, at the opposite end of the scale, perhaps 20:80.
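The outcomes in this example can be turned into numbers directly. The sketch below computes the odds in favour of a "like" for each outcome and also takes their natural logarithm, the log-odds or logit, which is the transformation this section goes on to introduce. The helper names are our own.

```python
import math

# Outcomes from the bouquet example, as (likes, dislikes) out of 100
outcomes = {"50:50": (50, 50), "60:40": (60, 40), "70:30": (70, 30), "20:80": (20, 80)}


def odds(likes, dislikes):
    """Odds in favour of a 'like'."""
    return likes / dislikes


def log_odds(likes, dislikes):
    """Natural logarithm of the odds, i.e. the value in logits."""
    return math.log(odds(likes, dislikes))


for label, (yes, no) in outcomes.items():
    print(f"{label}: odds = {odds(yes, no):.2f}, log-odds = {log_odds(yes, no):+.2f}")
```

The even 50:50 outcome sits at 0 logits, favourable outcomes are positive and unfavourable ones negative, and mirror-image outcomes (70:30 and 30:70) are equally far from 0.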


This ordering of probabilistic events can be represented on a line diagram, as in Figure 1.

Figure 1. Probabilistic line diagram

Using a scientific calculator, the probabilistic odds can be converted into a series of numerical values. However, when we attempt to position these values on an equal-interval ruler, they are heavily clustered towards the left-most end: on a ruler from 0 to 100, every odds value below 1 (that is, every outcome less favourable than 50:50) is squeezed into the first 1% of its length. Equal spacing is a prerequisite, since the values must be distributed at equal distances to satisfy one of the criteria of a measurement scale. Clearly, converting the odds ratio into numerical values does not yield a scale of equal intervals. Therefore, it cannot behave as a measurement ruler.

This is depicted in Figure 2.

Figure 2. Numerical scale

In order to achieve an equal-interval scale, we can take the logarithm of the odds. Keeping the same odds ruler as in Figure 1, running from 0.01 to 100, we obtain equal-interval separation between the log-odds units on the line, hence the measurement ruler with the logit unit. This can be verified by computing log 0.01 = -2, log 0.1 = -1 and log 1 = 0 (and, likewise, log 10 = 1 and log 100 = 2). Base-10 logarithms are used here for easy arithmetic; the logit itself is defined with the natural logarithm, which changes the size of the unit but not the equal spacing. Figure 3 shows the newly established logit ruler as a scale with equal-interval separation. It is just like looking at a thermometer with 0°C as the point where water freezes and 100°C as its boiling point, while the negative extreme end is -273°C, the point where all atoms of any element come to a standstill.
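The equal-spacing argument can be checked numerically. This sketch places the five odds of the Figure 1 ruler (0.01 to 100) on a raw ruler and on logarithmic rulers and prints the gaps between neighbouring points; the helper `gaps` is our own.

```python
import math

# The odds ruler of Figure 1, from 0.01 to 100
odds_values = [0.01, 0.1, 1, 10, 100]


def gaps(points):
    """Distances between neighbouring points on a ruler."""
    return [b - a for a, b in zip(points, points[1:])]


raw_gaps = gaps(odds_values)
log10_gaps = gaps([math.log10(v) for v in odds_values])
ln_gaps = gaps([math.log(v) for v in odds_values])

print([round(g, 2) for g in raw_gaps])    # [0.09, 0.9, 9, 90]
print([round(g, 2) for g in log10_gaps])  # [1.0, 1.0, 1.0, 1.0]
print([round(g, 3) for g in ln_gaps])     # [2.303, 2.303, 2.303, 2.303]
```

On the raw ruler each gap is ten times the previous one; after taking logarithms every gap is the same, whichever base is used.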

Figure 3. Log odds unit ruler

Figure 4. Observation of responses from survey

While this raw data matrix is all the observation we have, it is of limited utility. Even though it contains everything we could observe, it does not help to predict what will happen in future. The raw score does not allow us to make any useful inference or draw any conclusion about the items or about the interactions between the items and the persons. It only gives an order of preference; the raw score is therefore ordinal. It lacks the depth to give any meaningful interpretation of the data obtained.

For example, if the survey is meant to differentiate the ability of persons, we cannot conclude which respondents are more able simply by looking at their raw scores. If we were to accept the respondent with the highest raw score as more able than those with lower raw scores, it would still be difficult to distinguish between respondents with the same raw score; we would have no basis to differentiate them. We must build a useful expectation of whether the respondent will agree or succeed on the next item. To know the ability of the respondent to agree, we need to know the difficulty of endorsing the item the respondent has to attempt.

The data matrix can be arranged so that the items are ordered from least to most difficult and the persons are ordered from least to most able, as in Figure 5. The higher up the table one goes, the more able the persons; the further right across the table one goes, the more difficult the items. This type of arrangement is termed a scalogram or Guttman matrix.
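The rearrangement into a scalogram is a simple double sort. In this minimal sketch, persons are ordered by raw total (most able at the top) and items by raw total (easiest on the left); the response matrix and the `scalogram` helper are hypothetical, and raw totals are used only to order rows and columns, not as measures.

```python
# Hypothetical 0/1 responses: rows are persons, columns are items
persons = ["A", "B", "C", "D", "E"]
items = ["Q1", "Q2", "Q3", "Q4"]
X = [
    [1, 0, 1, 1],  # A
    [0, 0, 1, 0],  # B
    [1, 1, 1, 1],  # C
    [0, 0, 1, 1],  # D
    [1, 0, 1, 0],  # E
]


def scalogram(X, persons, items):
    """Reorder a 0/1 matrix into a scalogram (Guttman matrix).

    Persons are sorted from most to least able (top to bottom) and items
    from easiest to most difficult (left to right), using raw totals.
    Ties keep their original order because Python's sort is stable.
    """
    person_totals = [sum(row) for row in X]
    item_totals = [sum(col) for col in zip(*X)]
    p_order = sorted(range(len(X)), key=lambda p: -person_totals[p])
    i_order = sorted(range(len(items)), key=lambda i: -item_totals[i])
    matrix = [[X[p][i] for i in i_order] for p in p_order]
    return [persons[p] for p in p_order], [items[i] for i in i_order], matrix


rows, cols, M = scalogram(X, persons, items)
print("  ", " ".join(cols))
for name, row in zip(rows, M):
    print(name, " ", "  ".join(str(v) for v in row))
```

After reordering, person D's row reads 1 0 1 0: a 0 on an easier item alongside a 1 on a harder one, which breaks the ideal staircase pattern.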

Figure 5. Scalogram of responses from survey

From Figure 5, we can deduce that more able persons, i.e., those located towards the top of the table, have a greater likelihood of responding "1" to all the items. On the other hand, less able persons have a greater likelihood of responding "1" only to easy items and may find it difficult to succeed on difficult items. This generates a pattern of observable responses. In general, we can expect the scalogram to have mostly "1" responses on the upper portion of the diagonal line and "0" responses on the lower portion. This general rule creates a pattern of responses that is easy to read and clear to understand, ready to be scrutinised and explored in depth.

This kind of response pattern is best described mathematically by a simple logistic curve, which starts near a floor of "0" and rises towards a maximum limit of "1". This probabilistic curve is the foundation of Rasch analysis, in which the data are expected to fit the model. By contrast, traditional research is deterministic in nature: the best-fit line is established from a set of historical data and attempts to describe a past event. Rasch, on the other hand, has the distinctive feature that it can predict an expected event. This is exhibited in Figure 6.

From this estimation, too, we can infer that person A made a careless mistake by not responding "1", as expected, on the easy item in Figure 6. Likewise, person D made a lucky guess by responding "1" on the difficult item when we expected a "0" instead.


difficult items (items towards the right-most end). Therefore, by calculating the maximum likelihood estimate of the event, we can predict that person D would answer "0" in the missing box (the third item from the right for person D). Similarly, person F is expected to respond "1" on the second item, because that item is easy enough even for a less able person such as F.
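The text does not show how such an estimate is computed. As a hedged sketch, assuming the item difficulties are already known (Winsteps estimates persons and items jointly, which this does not attempt), a person's measure can be found by Newton-Raphson maximum likelihood over the observed responses only, and the missing cell then predicted from the model probability. The difficulties, the response string and the 0.5 cut-off used to turn a probability into a predicted 0 or 1 are illustrative assumptions; `p_success` and `estimate_ability` are hypothetical names.

```python
import math


def p_success(b, d):
    """Rasch probability of a '1' for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(-(b - d)))


def estimate_ability(responses, difficulties, tol=1e-8, max_iter=100):
    """Maximum likelihood person measure (logits) by Newton-Raphson.

    Item difficulties are treated as known. Missing responses (None) are
    skipped, so only observed cells contribute to the likelihood.
    """
    observed = [(x, d) for x, d in zip(responses, difficulties) if x is not None]
    score = sum(x for x, _ in observed)
    if score == 0 or score == len(observed):
        raise ValueError("extreme score: no finite ML estimate")
    b = 0.0
    for _ in range(max_iter):
        probs = [p_success(b, d) for _, d in observed]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = (score - expected) / information  # Newton step on the likelihood
        b += step
        if abs(step) < tol:
            break
    return b


# Hypothetical item difficulties (logits) and one person's responses; None = missing
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [1, 1, 1, None, 0]

b_hat = estimate_ability(responses, difficulties)
p_missing = p_success(b_hat, difficulties[3])
prediction = 1 if p_missing > 0.5 else 0
print(f"ability = {b_hat:.2f} logits, P(1) on missing item = {p_missing:.2f}, predict {prediction}")
```

At the estimate, the expected score on the observed items equals the observed score, which is the defining condition of the maximum likelihood solution for this model.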

Figure 6. Prediction on response pattern

This pattern of responses gives the response validity of a person on an item. Rasch also enables us to establish the construct validity of an item. An item is said to be valid when it is able to discriminate between a more able person and a less able one. This gives rise to two major discussions. The first possibility is that the person misfits by not meeting the expected outcome. Rasch focuses on this kind of misfitting response and attempts to find a reasoned argument for why the person does not fit the model.

This contributes to the significant findings of a particular study. In traditional statistical practice, this kind of misfitting data would normally have been cleaned out, whereas in Rasch it is of utmost value.

The second issue is more critical: the item construct itself is at stake. If the responses cannot discriminate between able and less able respondents, then the question needs to be reconstructed or possibly discarded. At the extreme, we may need to reconstruct the whole survey questionnaire or examination paper, as it is not measuring what it is supposed to measure. This is reflected in Rasch analysis as item reliability, and hence as instrument construct validity. In this way, the fundamental difference between quantitative research as practised in the physical sciences and so-called qualitative research is now addressed.

Consequently, the ratings assumed at the beginning of the survey construct, or the test result grades, can be checked for whether they show the expected pattern of response. This form of calibration is unique to Rasch, in that it complies with the physical-science requirement to calibrate the scale of measurement. An instrument that is not calibrated is deemed to yield invalid data, thus rendering the whole research futile. Predicting the response pattern to overcome missing data is an essential feature for a more accurate statistical analysis; it is one of the Rasch principles that allow a more accurate analysis to be carried out. While other models treat missing data as zero, Rasch uses its predictive power to estimate the maximum likelihood of the event for that particular cell of the matrix. By doing so, it treats the whole data set "as though" it were complete, whereas other statistical methods would be short of data, making them less accurate in computing basic statistics such as chi-square, not to mention the mean, the standard deviation and the associated z-test.


If these data were put on a scatter plot, they would yield a sigmoidal curve (S-curve) and would not have the equal intervals that are a prerequisite of measurement, as shown in Figure 8.

Figure 8. The sigmoidal curve of responses

Without equal intervals, prediction of an event is almost impossible. To overcome this issue, a linear regression approach is applied to establish a straight line that fits the points as well as possible.

Figure 9. Linear regression on the responses

The line is then used to make the required predictions by interpolation or extrapolation as necessary, as shown in Figure 10. In obtaining the best-fit line, however, there are differences between each actual point y_i and the predicted value ŷ_i on the best-fit line. This difference is referred to here as the error e.

Figure 10. Best fit line

Accepting that there are always errors in the prediction model makes the deterministic equation less reliable. This can be resolved by transforming it into a probabilistic model that includes the prediction error in the equation:

$$y = mx + c + e \qquad (2)$$
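As a small illustration of the error term e in y = mx + c + e, the sketch below fits the line by ordinary least squares and reports each error e_i = y_i - ŷ_i. The text does not name a fitting method, so least squares is assumed here; the S-shaped data points and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx
    c = my - m * mx
    return m, c


# Hypothetical S-shaped data, like the points of a sigmoidal curve
xs = [1, 2, 3, 4, 5]
ys = [0.10, 0.25, 0.50, 0.75, 0.90]

m, c = fit_line(xs, ys)
predicted = [m * x + c for x in xs]                # ŷ_i on the best-fit line
errors = [y - yh for y, yh in zip(ys, predicted)]  # e_i = y_i - ŷ_i

print(f"y = {m:.2f}x + ({c:.2f})")
print([round(e, 2) for e in errors])
```

With least squares the errors sum to zero, but they are not zero individually: the straight line under-predicts in the middle of the S-curve and over-predicts at its ends.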

The formulation of the probabilistic Rasch model is based on this principle:

a person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one.

(Rasch, 1960)

Therefore, in summary, the probability of success depends on the difference between the ability of the person and the difficulty of the item. The Rasch Model incorporates an algorithm that expresses the probabilistic expectations of item 'i' and person 'n' performances, mathematically expressed as:

$$P_{ni}(x_{ni}=1 \mid \beta_n, \delta_i) = \frac{e^{(\beta_n - \delta_i)}}{1 + e^{(\beta_n - \delta_i)}} \qquad (3)$$

where:

$P_{ni}(x_{ni}=1 \mid \beta_n, \delta_i)$ is the probability of person n on item i scoring a correct response ($x = 1$), given the person ability $\beta_n$ and the item difficulty $\delta_i$.
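The Rasch formula above translates directly into code. This minimal sketch (the function name is our own) evaluates it for a few hypothetical ability and difficulty pairs: when ability equals difficulty the probability is 0.5, and only the difference between the two logits matters.

```python
import math


def rasch_probability(beta_n, delta_i):
    """P(x_ni = 1 | beta_n, delta_i) = exp(beta_n - delta_i) / (1 + exp(beta_n - delta_i)).

    Written exactly as the formula; fine for the logit ranges met in
    practice, although math.exp overflows for differences above about 709.
    """
    z = beta_n - delta_i
    return math.exp(z) / (1.0 + math.exp(z))


for beta, delta in [(1.0, 1.0), (2.0, 1.0), (0.0, 1.0), (3.0, 1.0)]:
    print(f"beta = {beta:+.1f}, delta = {delta:+.1f}: P = {rasch_probability(beta, delta):.3f}")
```

A person one logit above an item succeeds with probability about 0.73, one logit below with about 0.27, and the two probabilities sum to 1.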

Summary

The Rasch measurement model helps us understand a little of how we came to fall so short of our reasonable expectations for scientific measurement in the human sciences. It provides the closest general approximation to measurement principles for the human sciences. It accomplishes the five (5) principles of a measurement model: it provides a linear, equal-interval scale, overcomes missing data, estimates precision, detects misfits (or provides reliability), and is replicable. Thus, by complying with all of these principles, more meaningful and accurate inferences can be made from the data.

This is the core issue in measurement: meaningfulness. There are three major aspects of meaningfulness to take into account in measurement. These have to do with the constancy of the unit, interpreting the size of differences in measures, and evaluating the coherence of the units and differences.

First, raw scores (counts of right answers or other events, sums of ratings, or rankings) do not stand for anything that adds up the way they do. Any given raw score unit can be 4-5 times larger than another, depending on where it falls in the range, and hence raw scores lack constant separation. Meaningful measurement demands a constant unit, and instrument scaling by Rasch methods provides it.

Second, meaningful measurement requires that we be able to say just what any quantitative amount of difference is supposed to represent. What does a difference between two measures stand for in terms of what is and isn't done at those two levels? Is the difference within the range of error, and so random? Is the difference many times more than the error, and so repeatedly reproducible and constant? Meaningful measurement demands that we be able to make reliable distinctions, and only the Rasch Model fulfils this requirement.

And finally, meaningful measurement demands that the items work together to measure the same thing. If reliable distinctions can be made between measures, what is the one thing that all of the items tap into? If the data exhibit a consistency that is shared across items and across persons, what is the nature of that consistency? Meaningful measurement posits a model of what data must look like to be interpretable and coherent, and then evaluates data in light of that model. Rasch has all these specific properties as a unique model of measurement.

4. WinSteps:

A Practical Guide to Rasch Analysis

[ NOTES ]

4.1 SETTING UP DATA .prn FILE

[ NOTES ]

4.2 SETTING UP DATA CONTROL FILES

[ NOTES ]

4.3 OUTPUT TABLES

[ NOTES ]

4.4 PERSON ITEM DISTRIBUTION MAP

[ NOTES ]

[ NOTES ]

4.5 SUMMARY STATISTICS

[ NOTES ]

4.6 PARTIAL CREDIT (POL


[ NOTES ]

4.7 ITEM POLARITY


[ NOTES ]

4.8 ICC GRAPH

[ NOTE ]

4.9 CALCULATION OF PROBABILISTIC MAXIMUM LIKELIHOOD ESTIMATE

Person | Person logit | Item | Item logit | P(F2) | P(F6) | P(F4) | P(F3) | P(F5) | P(F6) | P(F1) | Avg
F2 | 2.01 | Specialised | 1.63 | 0.59 | 0.48 | 0.44 | 0.38 | 0.23 | 0.23 | 0.14 | 0.36
F6 | 1.54 | … | 1.29 | 0.67 | 0.56 | 0.53 | 0.47 | 0.29 | 0.29 | 0.19 | 0.43
F4 | 1.4 | IT | 1.29 | 0.67 | 0.56 | 0.53 | 0.47 | 0.29 | 0.29 | 0.19 | 0.43
F3 | 1.15 | Research | 0.96 | 0.74 | 0.64 | 0.61 | 0.55 | 0.36 | 0.36 | 0.24 | 0.50
F5 | 0.4 | Excellence | 0.64 | 0.80 | 0.71 | 0.68 | 0.62 | 0.44 | 0.44 | 0.31 | 0.57
F6 | 0.4 | Filter | 0.29 | 0.85 | 0.78 | 0.75 | 0.70 | 0.53 | 0.53 | 0.38 | 0.65
F1 | -0.18 | Consultative | 0.29 | 0.85 | 0.78 | 0.75 | 0.70 | 0.53 | 0.53 | 0.38 | 0.65
 | | Teamwork | 0.29 | 0.85 | 0.78 | 0.75 | 0.70 | 0.53 | 0.53 | 0.38 | 0.65
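The probabilities in this table follow from the Rasch formula, P = exp(B - D) / (1 + exp(B - D)), with the listed person logits B and item logits D. The sketch below recomputes each item row and its average across the seven persons; in our check it reproduces the printed values to two decimals. The item whose name is illegible in the source (logit 1.29, same values as IT) is left out, and `p` is our own helper.

```python
import math


def p(b, d):
    """Rasch probability of success for person logit b on item logit d."""
    return math.exp(b - d) / (1 + math.exp(b - d))


# Person and item logits as listed in the table (F6 appears twice there)
persons = [("F2", 2.01), ("F6", 1.54), ("F4", 1.4), ("F3", 1.15),
           ("F5", 0.4), ("F6", 0.4), ("F1", -0.18)]
items = [("Specialised", 1.63), ("IT", 1.29), ("Research", 0.96),
         ("Excellence", 0.64), ("Filter", 0.29),
         ("Consultative", 0.29), ("Teamwork", 0.29)]

table = {}
for name, d in items:
    row = [p(b, d) for _, b in persons]
    avg = sum(row) / len(row)
    table[name] = (row, avg)
    cells = " ".join(f"{x:.2f}" for x in row)
    print(f"{name:<13}{d:5.2f}  {cells}  {avg:.2f}")
```

Reading down a column shows the same person succeeding more often on easier items; reading across a row shows the same item being easier for more able persons.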


[ NOTE ]

4.10 PERSON MEASURE

[ NOTE ]

4.11 ITEM MEASURE

INPUT: 25 Persons 20 Items MEASURED: 25 Persons 20 Items 2 CATS
-----------------------------------------------------------------------
GUTTMAN SCALOGRAM OF RESPONSES:
Person |Item
| 1 11 11 11 1211
|20!"##1$!55"%%102$
|--------------------
1% &11011111111111111101 '1%111%(n$
1# &10111111111111110001 '1#221%)s2
1" &11111111111111110000 '1"221%))1
4 +11111111111100101010 '0%222%sr$
9 +11111111111010101010 )021$%)(2
15 +11111111111011101000 '15222%)(1
25 +11111111111101010100 )25111%)s2
5 &11111011111111001000 )05121%)s1
3 +11110111111100001010 '0$212%)s2
6 +11111011110011100100 )0#121%)s1
8 +11111110011110011000 )0"1125(*2
13 +11110101101101010101 '1$121%t+%
17 +11101111011011110000 '1!222%(*$
! &11111011010100111000 )0!121%)s1
1 &10001101100111100111 '1221%,r$
2$ &11111001111001000011 '2$11$5(*$
2% &11111001010101100010 '2%121%sr%
2 &11101101011000010010 )0211$5)s2
11 &11111110000110000100 )11212%)s1
12 &11111110100000001100 )12212%(n2
22 &11110011001100001100 )2211$5)(1
1 &11111000100110000100 )01212%(*1
21 &11101100111000000010 )21212%(1
10 &11111110000000000100 )102210)s$
20 &11110110100000010000 '20211%(n$
|--------------------
| 1 11 11 11 1211
|20!"##1$!55"%%102$

INPUT: 25 Persons 20 Items MEASURED: 25 Persons 20 Items 2 CATS $.#".2
--------------------------------------------------------------------------------
Person: REAL SEP.: .5 REL.: .%! ... Item: REAL SEP.: 1.2 REL.: .!

Person STATISTICS: MISFIT ORDER
------------------------------------------------------------------------------------------------
|ENTRY TOTAL MODEL| INFIT | OUTFIT |PT-MEASURE |EXACT MATCH| |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR. EXP.| OBS% EXP%| Person |
|------------------------------------+----------+----------+-----------+-----------+-----------|
| 14 18 20 2.58 .78|1.16 .5|5.94 2.6|A-.03 .2#| ".5 ".5| '1%111%(n$|
| 19 12 20 .42 .52|1.95 3.6|2.84 3.9|B-.19 .%#| $#." !0.| '1221%,r$|
| 16 16 20 1.64 .61|1.11 .4|2.72 2.0|C .19 .$5| !$.! "0.1| '1#221%)s2|
| 13 13 20 .70 .53|1.31 1.4|1.39 1.1|D .25 .%%| #$.2 !1.1| '1$121%t+%|
| 2$ 12 20 .%2 .52|1.1$ .!|1.1$ .5|E .$" .%#| #".% !0.| '2$11$5(*$|
| 22 10 20 -.11 .52|1.0 .5|1.02 .2|F .%5 .50| 5!. !0.0| )2211$5)(1|
| 2% 11 20 .15 .52|1.0! .%| ." .0|G .%5 .%"| #$.2 !0.$| '2%121%sr%|
| 2 10 20 -.11 .52|1.0! .%|1.0! .$| .%5 .50| #".% !0.0| )0211$5)s2|
| 17 13 20 .70 .53|1.03 .2|1.07 .3|I .41 .%%| !$.! !1.1| '1!222%(*$|
| ! 12 20 .%2 .52|1.00 .1| ."! -.$|6 .%" .%#| #".% !0.| )0!121%)s1|
| 1 20 -.$" .5$| ." .0| .1 -.2|7 .5$ .51| #$.2 !0.5| )01212%(*1|
| 21 20 -.$" .5$| .# -.1| .! .0|L .5$ .51| #$.2 !0.5| )21212%(1|
| 3 13 20 .70 .53| .91 -.3| .96 .0|M .48 .%%| !$.! !1.1| '0$212%)s2|
| 6 13 20 .70 .53| .95 -.2| .82 -.4|l .49 .%%| !$.! !1.1| )0#121%)s1|
| 8 13 20 .70 .53| .92 -.3| .81 -.4|k .50 .%%| !$.! !1.1| )0"1125(*2|
| 11 10 20 -.11 .52| .0 -.%| ."1 -.#|, .5# .50| #".% !0.0| )11212%)s1|
| 12 10 20 -.11 .52| ." -.5| ."$ -.5|' .5! .50| !". !0.0| )12212%(n2|
| 20 " 20 -.## .5%| ."" -.%| ."% -.%|* .5 .52| !$.! !2.!| '20211%(n$|
| 9 15 20 1.30 .57| .87 -.4| .67 -.5|g .49 .$| !$.! !#.0| )021$%)(2|
| 4 15 20 1.30 .57| .79 -.7| .58 -.7|f .53 .$| !$.! !#.0| '0%222%sr$|
| 5 1% 20 . .55| .!! -1.0| .#" -.!|e .55 .%2| "%.2 !2.5| )05121%)s1|
| 15 15 20 1.30 .57| .75 -1.0| .58 -.7|d .55 .$| "%.2 !#.0| '15222%)(1|
| 10 " 20 -.## .5%| .!% -1.1| .!$ -.!|8 .## .52| "%.2 !2.!| )102210)s$|
| 25 15 20 1.30 .57| .70 -1.2| .52 -.9|b .58 .$| "%.2 !#.0| )25111%)s2|
| 1" 1# 20 1.#% .#1| .## -1.1| .%5 -.|9 .5! .$5| "%.2 "0.1| '1"221%))1|
|------------------------------------+----------+----------+-----------+-----------+-----------|
| MEAN 12.% 20.0 .5" .55| ." .0|1.21 .1| | !2.0 !$.2| |
| S.D. 2.! .0 .! .05| .25 1.0|1.12 1.1| | 10." %.5| |
----------------------------------------------------------------------------------

5. Appendices

5.1 Sample Size and Item Calibration (or Person Measure) Stability

5.2 What do Infit and Outfit, Mean-square and Standardized mean?

5.3 Rating Scale Instrument Quality Criteria

5.4 Example of Person Item Map Re-construction

5.1 Sample Size and Item Calibration (or Person Measure) Stability

How big a sample is necessary to obtain usefully stable item calibrations? Or how long a test is necessary to obtain usefully stable person measure estimates?

The Rasch model is blind to what is a person and what is an item, so the numbers are the same.

Each time we calibrate a set of items on different samples of similar examinees, we expect slightly different results. In principle, as the size of the samples increases, the differences become smaller. If each sample were only 2 or 3 examinees, results could be very unstable. If each sample were 2,000 or 3,000 examinees, results might be essentially identical, provided no other sources of error are in action. But large samples are expensive and time-consuming. What is the minimum sample to give useful item calibrations, that is, calibrations that we can expect to be similar enough to maintain a useful level of measurement stability?

Polytomies

The extra concern with polytomies is that you need at least 10 observations per category; see, for instance, Linacre J.M. (2002) Understanding Rasch measurement: Optimizing rating scale category effectiveness. Journal of Applied Measurement 3:1, 85-106, or Linacre J.M. (1999) Investigating rating scale category utility. Journal of Outcome Measurement 3:2, 103-122.

Otherwise the actual sample sizes could be smaller than with dichotomies, because there is more information in each polytomous observation.

Person Measure Estimate Stability

The requirements are symmetric for the Rasch model, so you need as many items for a stable person measure as you need persons for a stable item measure. Consequently, 30 items administered to 30 persons (with reasonable targeting and fit) should produce statistically stable measures.

The first step is to clarify "similar enough." Just as no person has a height stable to within .01 or even .1 inches, no item has a difficulty stable to within .01 or even .1 logits. In fact, stability to within ±.3 logits is the best that can be expected for most variables. Lee (RMT 6:2 p.222-3) discovers that in many applications a one-logit change corresponds to one grade-level advance. So when an item calibration is stable within a logit, it will be targeted at the correct grade level.

For groups of items, Wright & Douglas (Best Test Design and Self-Tailored Testing, MESA Memo. 19, 1975) report that, when calibrations deviate in a random way from their optimal values, "as test length increases above 30 items, virtually no reasonable testing situation risks a measurement bias [for the examinees] large enough to notice." For even shorter tests, measures based on item calibrations with random deviations up to 0.5 logits are "for all practical purposes free from bias."

Theoretically, the stability of an item calibration is its modelled standard error. For a sample of N examinees that is reasonably targeted at the items and that responds to the test as intended, average item p-values are in the range 0.5 to 0.87, so that modelled item standard errors are in the range 2/sqrt(


designed pilot studies. The Table suggests other ranges. Inflate these sample sizes by 10%-40% if there are major sources of unmodelled measurement disturbance, such as different testing conditions or alternative curricula.

If much larger samples are conveniently available, divide them into smaller, homogeneous samples of males, females, young, old, etc. in order to check the stability of item calibrations in different measuring situations.

Item calibrations stable within | Confidence | Minimum sample size range (best to poor targeting) | Size for most purposes
± 1 logit | 95% | 16 - 36 | 30
± 1 logit | 99% | 27 - 61 | 50
± ½ logit | 95% | 64 - 144 | 100
± ½ logit | 99% | 108 - 243 | 150

John Michael Linacre

Explanatory notes:

1. "For a ±1 logit interval this S.E. is ±1/2.6 logits." An estimate's standard error (S.E.) is the modelled standard deviation of the normal distribution of the observed estimate around its "true" value. Suppose we want to be 99% confident that the "true" item difficulty is within 1 logit of its reported estimate. Then the estimate needs to have a standard error of 1.0 logit divided by 2.6 or less, i.e. 1/2.6 = 0.385 logits.

2. "This gives a minimum sample in the range 4(2.6)² < N < 9(2.6)²". With optimum targeting of a dichotomous test, the modelled probability of each response is p = 0.5. Then the modelled binomial variance = 0.5 × 0.5 = the information in a response. Thus N perfectly targeted observations have information N × 0.5 × 0.5 = …

… University of Chicago, 1976.
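The minimum-sample column of the table can be re-derived from these notes. In this sketch the multiplier 2.6 for 99% confidence comes from note 1, while 2.0 for 95% is our assumption (a rounded 1.96), chosen because it reproduces the 95% rows; the range 4/SE² to 9/SE² generalises the 4(2.6)² < N < 9(2.6)² of note 2. `sample_size_range` is a hypothetical helper.

```python
def sample_size_range(half_width, z):
    """Minimum sample size range for item calibrations stable within
    +/- half_width logits at the confidence implied by z.

    Required S.E. = half_width / z; with modelled S.E. between 2/sqrt(N)
    (best targeting) and 3/sqrt(N) (poor targeting), N runs from
    4/SE^2 to 9/SE^2.
    """
    se = half_width / z
    return round(4 / se ** 2), round(9 / se ** 2)


for half_width in (1.0, 0.5):
    for confidence, z in (("95%", 2.0), ("99%", 2.6)):
        low, high = sample_size_range(half_width, z)
        print(f"+/-{half_width} logit, {confidence}: {low} - {high}")
```

Halving the required interval quadruples the sample, because the standard error shrinks only with the square root of N.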

Wright & Douglas (1976) "Rasch Item Analysis by Hand": "In other work we have found that when [test length] is greater than 20, random values of [item calibration] as high as 0.50 have negligible effects on measurement."

Wright & Douglas (1975) "Best Test Design and Self-Tailored Testing": "They allow the test designer to incur item discrepancies, that is item calibration errors, as large as 1.0. This may appear unnecessarily generous, since it permits use of an item of difficulty 2.0, say, when the design calls for 1.0, but it is offered as an upper limit because we found a large area of the test design domain to be exceptionally robust with respect to independent item discrepancies."

Wright & Stone (1979) "Best Test Design", p.98: "random uncertainty of less than .3 logits," referencing MESA Memo 19: Best Test Design and Self-Tailored Testing. Benjamin D. Wright & Graham A. Douglas, 1975. Also .3 logits in Solving Measurement Problems with the Rasch Model. Journal of Educational Measurement 14 (2) pp. 97-116, Summer 1977 (and MESA Memo 42).

Sample Size and Item Calibration Stability. Linacre J.M.

Rasch Measurement Transactions, 1994, 7:4 p.328



5.2 What do Infit and Outfit, Mean-square and Standardized mean?

These are all "fit" statistics. In a Rasch context they indicate how accurately or predictably data fit the model.

Infit means inlier-sensitive or information-weighted fit. It is more sensitive to the pattern of responses to items targeted on the person, and vice-versa. For example, infit reports overfit for Guttman patterns, and underfit for alternative curricula or idiosyncratic clinical groups. These patterns can be hard to diagnose and remedy.

Outfit means outlier-sensitive fit. It is more sensitive to responses to items with difficulty far from a person, and vice-versa. For example, outfit reports overfit for imputed responses, and underfit for lucky guesses and careless mistakes. These are usually easy to diagnose and remedy.
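To make the infit/outfit distinction concrete, here is a minimal Python sketch for dichotomous data using the usual residual-based definitions: outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted version. The function names are hypothetical, and the expected probabilities are supplied directly rather than estimated; this is an illustration, not Winsteps' own code.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(x = 1) for a person-item pair (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def fit_mean_squares(responses, probabilities):
    """Return (infit, outfit) mean-squares for one person across items
    (or, vice-versa, one item across persons). Dichotomous 0/1 data."""
    if len(responses) != len(probabilities) or not responses:
        raise ValueError("need equal-length, non-empty inputs")
    squared_residuals = [(x - p) ** 2 for x, p in zip(responses, probabilities)]
    # Model variance of each response; also its statistical information.
    variances = [p * (1.0 - p) for p in probabilities]
    # Outfit: plain average of squared standardized residuals, so one
    # surprising response to an off-target item can dominate it.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted, so on-target responses count most.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


if __name__ == "__main__":
    # Two on-target items (p = 0.5) and one far-too-hard item answered correctly.
    infit, outfit = fit_mean_squares([1, 0, 1], [0.5, 0.5, 0.05])
    print(f"infit={infit:.2f} outfit={outfit:.2f}")
```

In this example the last response looks like a lucky guess on an off-target item: infit comes out at about 2.56 while outfit is 7.00, showing how much more strongly the single outlying surprise inflates outfit.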

Mean-square …



Standardized fit statistics are usually obtained by converting the mean-square statistics to normally-distributed z-standardized ones by means of the Wilson-Hilferty cube-root transformation.
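The cube-root conversion can be sketched in Python as follows. It assumes the caller already knows q, the model standard deviation of the mean-square. The helper `chi_square_q` is a hypothetical approximation that treats the mean-square as a chi-square divided by its degrees of freedom (variance 2/d.f.). This is a sketch of the Wilson-Hilferty transformation under those assumptions, not Winsteps' internal code.

```python
import math


def wilson_hilferty_zstd(mean_square: float, q: float) -> float:
    """Convert a fit mean-square to an approximately unit-normal z value.

    q is the model standard deviation of the mean-square (assumed known).
    z = (MS^(1/3) - 1) * (3/q) + q/3
    """
    if mean_square <= 0 or q <= 0:
        raise ValueError("mean_square and q must be positive")
    return (mean_square ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0


def chi_square_q(degrees_of_freedom: float) -> float:
    """Hypothetical helper: q if MS ~ chi-square/d.f., so Var(MS) = 2/d.f."""
    return math.sqrt(2.0 / degrees_of_freedom)


if __name__ == "__main__":
    print(wilson_hilferty_zstd(1.5, chi_square_q(30)))
```

For example, a mean-square of 1.5 based on 30 degrees of freedom gives a z of roughly 1.77. A mean-square of exactly 1.0 maps to q/3, slightly above zero, because the cube root of a chi-square/d.f. variable has a mean a little below 1.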

Anchored runs: Anchor values may not exactly accord with the current data. To the extent that they don't, fit statistics can be misleading. Anchor values that are too central for the current data tend to make the data appear to fit too well. Anchor values that are too dispersed for the current data tend to make the data appear noisy.

Mean-square Value    Implication for Measurement

> 2.0    Distorts or degrades the measurement system. May be caused by only one or two observations.

1.5 - 2.0    Unproductive for construction of measurement, but not degrading.

0.5 - 1.5    Productive for measurement.

< 0.5    Less productive for measurement, but not degrading. May produce misleadingly high reliability and separation coefficients.

Standardized Value    Implication for Measurement

≥ 3    Data very unexpected if they fit the model (perfectly), so they probably do not. But, with large sample size, substantive misfit may be small.

2.0 - 2.9    Data noticeably unpredictable.

-1.9 - 1.9    Data have reasonable predictability.

≤ -2    Data are too predictable. Other "dimensions" may be constraining the response patterns.
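Read together, the two tables amount to a simple lookup. The Python sketch below encodes them with hypothetical function names. Because the published bands touch or leave small gaps at their edges (1.5, 0.5, 1.9 to 2.0, 2.9 to 3), how exact boundary values are assigned is an assumption of this sketch, not part of the tables.

```python
def interpret_mean_square(ms: float) -> str:
    """Mean-square bands (boundary handling is an assumption)."""
    if ms > 2.0:
        return "Distorts or degrades the measurement system"
    if ms > 1.5:
        return "Unproductive for construction of measurement, but not degrading"
    if ms >= 0.5:
        return "Productive for measurement"
    return "Less productive for measurement, but not degrading"


def interpret_zstd(z: float) -> str:
    """Standardized-value bands (boundary handling is an assumption)."""
    if z >= 3.0:
        return "Data very unexpected if they fit the model"
    if z >= 2.0:
        return "Data noticeably unpredictable"
    if z > -2.0:
        return "Data have reasonable predictability"
    return "Data are too predictable"


if __name__ == "__main__":
    print(interpret_mean_square(1.7), "|", interpret_zstd(2.4))
```

Reading both columns together matters: with a large sample, a modest mean-square can still produce a large standardized value, which is why the first standardized row warns that substantive misfit may be small.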

What do Infit and Outfit, Mean-square and Standardized mean? Linacre J.M. Rasch Measurement Transactions 2002, 16:2 p.878


5.3 Rating Scale Instrument Quality Criteria


Criterion    Poor    Fair    Good    Very Good    Excellent



Winsteps item map output: Items MAP OF Persons (dichotomous items, 2 CATS)



OUR CONSULTANTS

MOHD SAIDFUDIN MASODI holds a degree in Architecture from Australia and is a Lead Assessor trained by Nigel Bauer Ltd. UK and RWTÜV A.G. International Germany. He is also an AOTS Japan recipient in EMS. With over 24 years of experience, he has managed a diverse range of projects on aspects of QA/QC for both the private and public sectors, with specific interest in Institutions of Higher Learning. His papers on quality performance measurement based on Rasch Analysis have been accepted as proceedings in several conferences and published in refereed journals. His paper entitled "Clearer to read and easier to understand: Rasch Analysis in Learning Outcomes" won the Best Conference Paper award at ICEED … Planning and QA/QC for Quality Improvement. He is currently the Program Coordinator for the Exec. Dip. in Quality Management System at SPACE, Universiti Teknologi Malaysia. In two consecutive years he was named SPACE UTM Best Visiting Lecturer. For details visit: http://www.linkedin.com/in/saidfudin

AZRILAH ABDUL AZIZ earned a Bachelor of Science degree in Computer Science and Mathematics from Kent State University, Kent, Ohio, US, and her master's in Management Information Systems from the International Islamic University Malaysia. Her Ph.D. is on a competency measurement index of Information Professionals using the Rasch Model. Her papers on Rasch Measurement have been presented at international conferences and published in refereed journals, viz. WSEAS Transactions journals, IEEE Xplore, NAUN, etc.

She regularly conducts workshops and short courses, locally and abroad, in the areas of Performance Measurement and Strategic Planning based on Rasch Model Measurement and Analysis using Winsteps and Bond&Fox. For further enquiries she can be reached at azrilah@gmail.com.