Rasch Measurement Model

4/23/2012

Rasch Model for Research/Educational Assessment

Course Description

This training is concerned with improving the measurement process in educational assessment using the Rasch model. Assessments in education involve persons responding to a set of items. This training aims to determine whether the responses to a set of assessment items fit the model.

Course Outcomes

Upon completion of this course, participants will be able to:

• Understand the concept of psychometrics and the Rasch measurement model
• Develop a reliable test/questionnaire for measurement
• Analyze the data/responses using Bond&Fox Steps® software.

Target Participants

This course is designed for:

• Lecturers
• Researchers
• Post Graduate Students
• Training Authorities (Educational Admin Staff)

Course Content

Developing Instruments
• Steps
• Hands‐on

Rasch
• Intro
• Key Concepts

Rating Scale Model
• Dichotomous
• Polytomous

Application
• Bond & Fox
• Hands On

Key Concept of Measurement

EVALUATION

ASSESSMENT

TESTING

MEASUREMENT

GRADING

Measurement?

Right Measure? Right Scale?

Right Instrument?

Physical Measure?

Psychological Measure?

Psychometric Measure?

Global Accepted Measure

• Length :: meter
• Weight :: gram
• Time :: second/minute/hour
• Temperature :: celsius
• Electricity :: ampere

INSTRUMENT?

Think Wisely! How to Measure?

Logically…

The attribute of wisdom that is hardest for people to possess is perhaps possessed only by the wisest people… likewise, the attribute of beauty that is hardest for people to possess is possessed only by the most beautiful people!

Also logically…

A person who possesses only the attributes that everyone possesses is considered neither wise/beautiful nor foolish/ugly.

And!

A person who lacks an attribute of wisdom that everyone else possesses may be considered foolish… likewise, a person who lacks an attribute of beauty that everyone else possesses may be considered ugly.

What do you understand by:
• Evaluation (Penilaian): ……
• Measurement (Pengukuran): ……
• Testing (Pengujian): ……

Educational Measurement

What is the purpose of a test?

What are some examples of test formats?

Which aspects do we test?
• Teaching scheme
• Teaching objectives
• Teaching references
• Teaching assignments
• Teaching content

Why Rasch in OBE?

OUTCOME ASSESSMENT

Examples:
1. EE Survey
2. Test/Examination
3. Training Effectiveness
4. Placement Test
5. Attitude/Insaniah Skills

Test Instrument Construct

• Questions Must Be Simple
• Clear and Directive
• Avoid Multi-Defined Questions
• Various Levels
• Parallel With Teaching Contents

Test Instrument Construct

Should Consider:
• Validity
• Reliability
• Objectivity
• Utility
• Readability

Right Test Right Measure

Example:

1 : STRONGLY DISAGREE
2 : DISAGREE
3 : NOT SURE
4 : AGREE
5 : STRONGLY AGREE

mean = 4.325

What does it mean?

Probabilistic Ratio-based Ruler

5/10 is NOT THE SAME as 5:5

To identify the respondent with the highest score, the difficulty level of each item must be known. The Rasch model (RM) ensures that items are ordered from easy to difficult. Note that the value 1, which denotes a correct answer, falls on the left side. Student P is the most able and student S the least able. If a line is drawn across the table (the dashed line), there are many more scores of 1 above the line than below it. A score of 0 by student P above the line is due to the student's carelessness, whereas a score of 1 by student S below the line is a guessed answer. This is the basic idea of the RM.

Log Odds Unit Ruler (LOGIT)

[Figure: ruler marked with indices (powers of ten) 10^‐2, 10^‐1, 10^0, 10^1, 10^2]

The logit is a unit of measurement computed after raw data (counts) are converted into a ratio-based form that is more accurate for measuring ability.
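The conversion from a raw proportion to logits can be sketched in a few lines. This is a minimal illustration, assuming the usual definition of the logit as the natural logarithm of the odds; the function name `logit` and the sample proportions are illustrative and do not come from the slides.

```python
import math


def logit(p: float) -> float:
    """Convert a proportion p (0 < p < 1) into log-odds units (logits)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Illustrative proportions only: equal steps in percent are not equal steps in logits.
for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    odds = p / (1.0 - p)
    print(f"p = {p:.2f}  odds = {odds:8.3f}  logit = {logit(p):+.2f}")
```

A proportion of 0.50 maps to 0 logits, and the scale is symmetric around that point, which is what lets it behave like a ruler.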


Among Ali, Ah Chong and Ramasamy, who all scored 79%... who is better?

Between two respondents whose mean agreement is 3.45 (on a 5-point Likert scale)… who truly agrees? Who is lying? Who agreed by mistake?

CRITERION-BASED versus NORM-BASED EVALUATION

Scalogram of Responses/Answers

[Figure: scalogram with unexpected responses labelled CARELESS and LUCKY GUESS]

PRINCIPLE OF THE PROBABILISTIC MODEL

“a person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one”.

(Rasch, 1960)

PROBABILITY OF SUCCESS: Respondent Ability, Item Difficulty

A scientific measurement model must…

USES LINEAR MEASURE

OVERCOMES MISSING DATA

GIVES ESTIMATES OF PRECISION

DETECTS MISFITS OR OUTLIERS

PROVIDES RELIABILITY VALUE

Dimension | Item Numbers
Attitude Towards Energy Conservation (EC) | 1, 11, 21, 31, 41, 50
Attitude Towards Mobility and Transportation (MT) | 2, 12, 22, 32
Attitude Towards Waste Avoidance (WA) | 3, 13, 23, 33, 42, 51, 52, 53
Attitude Towards Recycling (R) | 4, 14, 24, 34, 43, 54
Attitude Towards Consumerism (C) | 5, 15, 25, 35, 44, 55, 56, 57
Attitude Towards Environmental Conservation (VB) | 6, 16, 26, 36, 45, 58, 59, 60
Attitude Towards Flora and Fauna (EFF) | 7, 17, 27, 37, 46
Attitude Towards Water and Air (EWA) | 8, 18, 28, 38, 47
Attitude Towards Human Being (EHB) | 9, 19, 29, 39, 48
Attitude Towards Metaphysical Entities (EME) | 10, 20, 30, 40, 49

After review analysis, ten main dimensions were identified, with 60 items measuring students' environmental attitudes.

Dimension | Item/Attribute | Weightage
D1/CO1/CH1 | A1, A2, A3 | 30%
D2/CO2/CH2 | A1, A2, A3, A4 |
D3/CO3/CH3 | A1, A2, A3 | 30%

Tips for Instrument Development

Do keep the questionnaire brief and concise. Some questionnaires give the impression that their authors tried to think of every conceivable question that might be asked with respect to the general topic of concern. The result is a very long questionnaire that causes annoyance and frustration on the part of the respondents, resulting in non‐return of mailed questionnaires and incomplete or inaccurate responses on questionnaires administered directly. This is where Rasch becomes very handy in handling small sample sizes and missing data.

Do make every effort to write as few questions as possible to obtain the information needed. Peripheral questions and ones to find out "something that might just be nice to know" must be avoided. A clear‐cut need for every question should be established. Experience tells us that many items in a long questionnaire end up non‐functional.

Do check face validity and run a pilot test; it certainly helps. Get feedback on your initial list of questions. Feedback may be obtained from a small but representative sample.

Do choose appropriate response category language and logic. The extent to which responders agree with a statement can be assessed adequately in many cases by the dichotomous options: 1) Disagree 2) Agree

Do order categories. When response categories represent a progression between a lower level of response and a higher one, it is usually better to list a polytomous order from the lower level to the higher in left‐to‐right order, for example,

1) Never 2) Seldom 3) Occasionally 4) Frequently

Do ask responders to rate both positive and negative stimuli. There is sometimes a difficulty when responders are asked to rate items for which the general level of approval is high. There is a tendency for responders to mark every item at the same end of the scale. By offering positive and negative responses, the respondent is required to evaluate each response rather than uniformly agreeing or disagreeing with all of the responses. Try to visualize yourself in their shoes; Rasch sees every attempt as beginning with a 50:50 scenario.

These options have the advantage of allowing the expression of some uncertainty. In contrast, the following options would be undesirable in most cases:

1) Strongly disagree 2) Disagree 3) Agree 4) Strongly Agree

Some would say that "Strongly agree" is redundant or at best a colloquialism. In addition, there is no comfortable resting place for those with some uncertainty.

Avoid open-ended questions. In most cases open-ended questions should be avoided due to variation in willingness and ability to respond in writing.

Avoid the response option "Others." Careless responders will overlook the option they should have designated and conveniently mark the option "Other" or will be hairsplitters and will reject an option for some trivial reason.

Avoid category proliferation. A typical question is the following:
Marital status:
1) Single (never married) 2) Married 3) Widowed 4) Divorced 5) Separated

Avoid scale point proliferation. In contrast to category proliferation, which seems usually to arise somewhat naturally, scale point proliferation takes some thought and effort. An example is:
1) Never 2) Rarely 3) Occasionally 4) Fairly often 5) Often 6) Very often 7) Almost always 8) Always

Such stimuli run the risk of annoying or confusing the responder with hairsplitting differences between the response levels. Psychometric research has shown that most subjects cannot reliably distinguish more than six or seven levels of response. Offering four to five scale points is usually quite sufficient to stimulate a reasonably reliable indication of response direction. Rasch analysis has shown that the sensitivity of a measurement is not lost when fewer rating categories are used. Based on its statistical output, a Rasch analysis can recommend that rating categories be collapsed or expanded.

Avoid responses at the scale mid-point and neutral responses. The use of neutral response positions had a basis in the past when crude computational methods were unable to cope with missing data.

Avoid asking responders to rank responses. Responders cannot be reasonably expected to rank more than about six things at a time, and many of them misinterpret directions or make mistakes in responding. To help alleviate this latter problem, ranking questions may be framed as follows:

Following are three colors for office walls: 1) Beige 2) Ivory 3) Light green

Which color do you like best? _____
Which color do you like second best? _____
Which color do you like least? _____

By carefully evaluating the needs of every question used in an instrument and carefully wording the responses, you will collect information which will yield more satisfactory and meaningful results.


Why Rasch!

Rasch Key Concept
• Intro
• Key Concepts

Measurement

• Measurement is the process of constructing lines and locating individuals on lines

(Wright & Stone, 1979)

• Measurement is the location of objects along a single dimension on the basis of observations which add together

(Bond & Fox, 2007)

Types of Scales

Nominal Ordinal

Interval Ratio

Rasch Measurement Model

• The Rasch model can be applied to measure latent traits (e.g., ability or attitude) in various disciplines

• Rasch model estimates of ability / attitude / difficulty become data for statistical analysis

• Latent traits are usually assessed through the responses of a sample of persons to a set of items
  – two response categories
  – more than two response categories

IRT Models

• The Rasch model belongs to the item response theory (IRT) models
  – IRT models describe the probability of an individual's response to an item
  – The probability of a correct/keyed response to an item is a mathematical function of a person parameter (ability) and an item parameter (difficulty)

• Rasch gives the maximum likelihood estimate (MLE) of an event outcome

• Rasch reads the pattern of an event and is thus predictive in nature, an ability that resolves the problem of missing data

• The relationship between the probability of success on an item and the latent trait is described by a function called the item characteristic curve (ICC), which takes an S-shape

• ICCs show pictorially the fit of the data to the model

Item characteristics curve showing the relationship between the location on the latent trait and the probability of answering the item correctly.
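The S-shaped curve can be tabulated directly from the dichotomous Rasch formula. The sketch below assumes ability and difficulty are both expressed in logits; the function name `rasch_probability` and the item difficulty of 0 logits are illustrative choices, not values from the slides.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


difficulty = 0.0  # hypothetical item located at 0 logits
for ability in (-3, -2, -1, 0, 1, 2, 3):
    p = rasch_probability(ability, difficulty)
    print(f"ability {ability:+d} logits -> P(correct) = {p:.3f}")
```

Plotting these probabilities against ability traces the ICC: it passes through 0.5 where ability equals difficulty and flattens toward 0 and 1 at the extremes.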

• The psychometric Rasch model conceptualizes the measurement scale like a ruler

• Items are located along the measurement scale according to their difficulty

[Figure: ability scale running from less difficult items (−) to more difficult items (+)]

• Persons can also be located on the same measurement scale

• They are located according to their ability

• The less difficult items can be successfully achieved by the more able subjects

[Figure: scale running from less able to more able]

• A person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one

(Rasch, 1960, p. 117)

• A turn of events is seen as a chance, a likelihood of happening, and hence as ratio data (Stevens, 1946)

[Figure: ruler-like SCALE with a unit termed 'logit']

• Interval scales have known and equal intervals between two graduations
  – numbers tell us how much more of the attribute of interest is present
  – scales are linear and quantitative

• The probability of endorsing any response category of an item depends solely on the person ability and the item difficulty
  – This requirement is called unidimensionality

Rasch Model Key Question

• When a person with this ability (number of test items correct) encounters an item of this difficulty (number of persons who succeeded on the item), what is the likelihood that this person gets this item correct?

Rasch Model Key Answer

• The probability of success depends on the difference between the ability of the person and the difficulty of the item
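Written as a formula, this key answer takes the familiar logistic form. The symbols $B_n$ (ability of person $n$) and $D_i$ (difficulty of item $i$), both in logits, are notation assumed here for illustration; $X_{ni}$ is the observed response of person $n$ to item $i$.

```latex
P(X_{ni} = 1) = \frac{\exp(B_n - D_i)}{1 + \exp(B_n - D_i)}
\qquad\Longleftrightarrow\qquad
\ln\frac{P(X_{ni} = 1)}{1 - P(X_{ni} = 1)} = B_n - D_i
```

When $B_n = D_i$ the probability of success is exactly 0.5, and the log-odds of success is simply the distance between person and item on the logit ruler.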

Rasch Measurement Model Theorem

• Theorem 1
  – Persons who are more able have a greater likelihood of correctly answering all the items (dichotomous response)
  – Persons who are more developed (higher agreeability level) have a greater likelihood of endorsing all the items (polytomous response)

Rasch Measurement Model Theorem

• Theorem 2
  – Easier items are more likely to be answered correctly by all persons (dichotomous response)
  – Easier tasks are more likely to be endorsed by all persons (polytomous response)

• In estimating the probability of success, the prediction is expressed in terms of chances / odds / probabilities.

• The data matrix (e.g., the result of theory-driven qualitative observation) can be arranged so that the items are ordered from least to most difficult and the persons are ordered from least to most able.

Matrix of Abilities vs Difficulties

[Figure: ability vs difficulty, with odds marked 0/100, 10/90, 30/70, 50/50, 70/30, 90/10, 100/0]

• This organized data table is termed a Scalogram (Guttman, 1944). The higher up the table one goes, the more able the person

• The further right across the table one goes, the more difficult the item

• Calculate item difficulties and person abilities (n/N: the item's or person's raw score divided by the total possible score)
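Arranging raw responses into a scalogram is a matter of sorting rows and columns by their totals. The following is a minimal sketch with made-up data for four hypothetical persons (A to D) and four items; the function name `build_scalogram` is an assumption for illustration.

```python
def build_scalogram(responses: dict[str, list[int]]) -> tuple[list[str], list[int], list[list[int]]]:
    """Order persons from most to least able (top to bottom) and items
    from easiest to hardest (left to right), using raw totals only."""
    persons = sorted(responses, key=lambda p: sum(responses[p]), reverse=True)
    n_items = len(next(iter(responses.values())))
    item_totals = [sum(responses[p][i] for p in responses) for i in range(n_items)]
    item_order = sorted(range(n_items), key=lambda i: item_totals[i], reverse=True)
    matrix = [[responses[p][i] for i in item_order] for p in persons]
    return persons, item_order, matrix


# Hypothetical data: 1 = correct/endorsed, 0 = incorrect/not endorsed.
raw = {
    "A": [0, 1, 0, 1],
    "B": [1, 1, 1, 1],
    "C": [0, 1, 0, 0],
    "D": [1, 1, 0, 1],
}

persons, item_order, matrix = build_scalogram(raw)
for person, row in zip(persons, matrix):
    print(person, "".join(str(x) for x in row), "=", sum(row))
```

After sorting, the 1s cluster toward the easy items and the able persons, so unexpected 0s and 1s stand out visually.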


Scalogram

EASY ITEMS/TO ENDORSE DIFFICULT ITEMS/TO ENDORSE

11111011111111111 11111111111111111 11111111111100 = 48

11111111111111111 11111111111111111 11111001001000 = 43

11111111111111111 11111111111100100 01100010000000 = 33

MORE ABLE

11111111111111111 11111111111100100 01100010000000 = 33

11111111111111111 11111111111010100 00110100000000 = 33

11110111111111111 10111111101001100 00110100101100 = 33

11111111111111111 11110111001000100 01000000000000 = 27

11111111111111101 11011011001000100 00000000001000 = 25

LESS ABLE

Scalogram

• Patterns of success or failure can be seen in the data matrix

• A person who scores in the classic fashion (successful on easy questions and unsuccessful on difficult questions; response pattern 111111000000) is said to be too good to be true

Scalogram

• A person who scores well on difficult items despite a low overall score
  – might have guessed or cheated on the items

• A person who scores poorly on easy items despite a high overall score
  – might indicate a lack of concentration or guessing on the items

Scalogram

• If a person's response pattern is unpredictable or erratic (e.g., 110101100), it is difficult to interpret the success or failure (person ability).

• It is impossible to predict the likelihood of a student's performance (well / poorly) on the whole test just by looking at items that are so erratic.
  – Investigate the items further.

Key Rasch Measurement Concepts

• Quantity
  – Estimates of item and person location

• Precision
  – Standard Error (SE) of Measurement

• Quality
  – Fit Statistics

The Basic Rasch Questions

• What is the distance between the locations?
  – Estimates of item and person locations

• How precise are those locations?
  – SE of measurement

• Are those locations all equally valuable?
  – Fit Statistics

Estimate: Person / Item Location

[Figure: vertical scale from "Less Difficult / Easy, Less Able" to "More Difficult, More Able", with the location of a person marked]

• Rasch analysis is performed by setting the mean of the persons as the starting point (0 logits) for the calibration

• In applying the Rasch model, item locations are often scaled first

• The location of an item on a scale corresponds to the person location at which there is a 0.5 probability of a correct response to the question

• The probability of a person responding correctly to / endorsing a question with difficulty lower than that person's location is greater than 0.5

• The probability of responding correctly to / endorsing a question with difficulty greater than the person's location is less than 0.5.

Separation Statistics

• Items are located by the number of persons getting a specific item correct / endorsing a specific item

• Persons are located by the number of items they are able to answer correctly / endorse

• It is necessary to locate persons and items along the variable line with sufficient precision to "see" between them

• The item and person separation statistics in Rasch measurement provide an analytical tool by which to evaluate the successful development of a variable and with which to monitor its continuing utility

• Person separation indicates how efficiently a set of items is able to separate the persons measured:
  – Separation that is too wide usually signifies gaps among person abilities
    • This leads to imprecise measurement
  – Separation that is too narrow signifies that there is not enough differentiation among person abilities to distinguish between them

• Item separation indicates how well a sample of people is able to separate the items used in the test:
  – Separation that is too wide usually signifies gaps among item difficulties
    • This leads to imprecise measurement
  – Separation that is too narrow signifies redundancy among test items

• Separation statistics are expressed as reliabilities
  – range from 0.0 to 1.0

• The higher the value, the better the separation that exists and the more precise the measurement

SE of Measurement

• Standard Error
  – The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method
  – The term can also refer to an estimate of that standard deviation, derived from the particular sample used to compute the estimate

SE of Measurement

• The sample mean is the usual estimator of a population mean

• However, different samples drawn from that same population would in general have different values of the sample mean

SE of Measurement

• The standard error of the mean
  – Is the standard deviation of the sample mean as an estimate of a population mean
  – Is estimated by the sample estimate of the population standard deviation (the sample standard deviation) divided by the square root of the sample size
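The calculation described above is short enough to show in full. This sketch uses Python's standard library; the function name `standard_error_of_mean` and the scores are made up for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample: list[float]) -> float:
    """Sample standard deviation divided by the square root of the sample size."""
    if len(sample) < 2:
        raise ValueError("at least two observations are needed")
    return statistics.stdev(sample) / math.sqrt(len(sample))


scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical raw scores
print(f"mean = {statistics.mean(scores):.2f}")
print(f"SE of the mean = {standard_error_of_mean(scores):.3f}")
```

Because the divisor is the square root of the sample size, quadrupling the sample only halves the standard error.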


SE of Measurement

• The standard error of the mean can also refer to an estimate of that standard deviation, computed from the sample of data being analyzed at the time

SE of Measurement

• In practical applications, the true value of the standard deviation (of the error) is usually unknown

• As a result, the term standard error is often used to refer to an estimate of this unknown quantity
  – the standard error is only an estimate

SE of Measurement

• In other cases, the standard error may usefully be used to provide an indication of the size of the uncertainty

• But using the standard error to provide confidence intervals or tests should be avoided unless the sample size is at least moderately large
  – Here "large enough" would depend on the particular quantities being analyzed

SE of Measurement

• The t‐distribution is used to provide a confidence interval for an estimated mean or difference of means

Standard Deviation & Confidence Intervals

SE of Measurement

• For a value that is sampled with an unbiased, normally distributed error, the figure depicts the proportion of samples that would fall between 0, 1, 2, and 3 standard deviations above and below the actual value

SE of Measurement

• The standard error of a measure captures its precision in a particular context

• The accuracy of a measure is captured by fit statistics

• A measure may be accurate, but imprecise

SE of Measurement

• Raw scores are almost always reported without their standard errors

• The highest possible precision for any measure is that obtained when every other measure is known, and the data fit the Rasch model

SE of Measurement

• This standard error is called the "model" standard error and is reported by most production‐oriented Rasch software

• For well‐constructed tests with clean data (as confirmed by the fit statistics), the model standard error is usefully close to, but slightly smaller than, the actual standard error

SE of Measurement

• The stability of an item calibration is its modelled standard error

• A two‐tailed 99% confidence interval is ±2.6 S.E. wide

• A two‐tailed 95% confidence interval is ±1.96 S.E. wide

• A two‐tailed 68% confidence interval is ±1.00 S.E. wide

SE of Measurement

• Suppose we want to be 99% confident that the "true" item difficulty is within 1 logit of its reported estimate
  – This determines the sample size needed to have 99% confidence that no item calibration is more than 1 logit away from its stable value

SE of Measurement

• Then the estimate needs to have a standard error of 1.0 logits divided by 2.6 or less = 1/2.6 = 0.385 logits or less

• Stability to within ±0.3 logits is the best that can be expected for most variables if the sample size needed for 99% confidence is used
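The arithmetic behind the 0.385 logit figure can be checked with a short sketch. The multipliers 2.6 and 1.96 are the ones quoted on the slides; the function name `required_standard_error` is an assumption for illustration.

```python
def required_standard_error(half_width_logits: float, multiplier: float) -> float:
    """Largest standard error that keeps the confidence interval
    within +/- half_width_logits of the reported estimate."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return half_width_logits / multiplier


MULTIPLIER_99 = 2.6   # two-tailed 99% interval, as quoted on the slides
MULTIPLIER_95 = 1.96  # two-tailed 95% interval, as quoted on the slides

print(round(required_standard_error(1.0, MULTIPLIER_99), 3))  # 0.385
print(round(required_standard_error(0.5, MULTIPLIER_99), 3))  # 0.192
print(round(required_standard_error(1.0, MULTIPLIER_95), 3))  # 0.51
```

Halving the tolerated distance halves the permitted standard error, which is why tighter stability requirements call for considerably larger samples.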


Sample Size

Item calibrations stable within | Confidence | Minimum sample size range (best to poor targeting) | Size for most purposes
± 1 logit | 95% | 16 ‐‐ 36 | 30
± 1 logit | 99% | 27 ‐‐ 61 | 50
± ½ logit | 95% | 64 ‐‐ 144 | 100
± ½ logit | 99% | 108 ‐‐ 243 | 150
Definitive or High Stakes | 99%+ (Items) | 250 ‐‐ 20*test length |

(John Michael Linacre)

Sample Size

• A sample of 50 well‐targeted respondents is conservative for obtaining useful, stable estimates

• 30 respondents are enough for well‐designed pilot studies

Fit Statistics

• To aid in measurement quality control

• To identify those parts of the data which meet Rasch model specifications and those parts which don't

Fit Statistics

[Figure labels: item fits the model; item does not fit the model]

Fit Statistics

• Parts that don't are not automatically rejected, but are examined to identify in what way, and why, they fall short, and whether, on balance, they contribute to or corrupt measurement

• Then the decision is made to accept, reject or modify the data

Fit Statistics

• Modification includes simple actions such as correcting obvious data entry errors and respondent mistakes, and more sophisticated actions such as collapsing rating scale categories.

Developing Instruments

• Steps
• Hands‐on

Developing Instruments

• To find out about the characteristics (latent traits) of people
  – Latent traits: a characteristic or attribute of a person that can be inferred from observation of the person's behaviours.

• Measurement here means measuring the latent traits of people

Developing a Questionnaire

• Identify research that studies the same construct
• Define the construct
• Define the target population
• Review related measures
• Develop a draft
• Evaluate the draft
• Revise the test
• Collect data on reliability and validity

(Gall, Gall & Borg, 2003)

Example 1

• RESEARCH: Students' Satisfaction in a Distance Learning Course

• What needs to be measured?
  • Student Satisfaction
    – What are the constructs and their variables?

DIMENSION = CONSTRUCT = CHAPTER
ITEM = VARIABLES = SUBCHAPTER

Student’s Satisfaction in DL Course

• Construct 1: Instructor Performance
  – Variables: availability, knowledge of subject, fair treatment, respects students, encourages questions, presents information clearly

• Construct 2: Student-Instructor Interaction
  – Variables: encourages students to be actively involved, provides feedback on work, provides progress updates periodically

• Construct 3: Course Evaluation
  – Variables: course material relevant, assignments relevant, workload appropriate to credit hours

Example 2

• RESEARCH: Students' learning in the subject Introduction to Interactive Multimedia

• What needs to be measured?
  • Level of Students' Learning Ability
    – What are the constructs and their variables?

Students Learning

• Construct: Learning Outcome
  – Variables: Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis

Dicho vs Poly

Rating Scale Model
• Dichotomous
• Polytomous

Dichotomous Data

• Rasch Principles:
  – Person Ability (N correct)
  – Item Facility (difficulty – N correct)
  – Proportion: n correct / N possible
  – Odds of success/failure
    • Natural logarithm of odds – logits (log odds units)

Dichotomous Data

Data Matrix Showing the Odds

Person       Q11     Q1     Q8     Q2     Q7    Q10     Q3     Q5     Q9     Q4     Q6  Ability   n/1‐n
R10            1      1      0      1      1      1      1      1      0      1      1        9   82/18
R03            1      1      1      1      1      1      0      0      1      1      0        8   73/27
R05            1      0      1      1      1      1      0      1      1      0      0        7   64/36
R12            1      0      1      1      1      1      0      1      1      0      0        7   64/36
R09            1      1      1      1      1      0      1      0      0      0      0        6   55/45
R06            1      1      1      1      1      0      1      0      0      0      0        6   55/45
R11            1      1      1      0      0      1      0      1      0      0      0        5   45/55
R01            1      1      1      1      1      0      0      0      0      0      0        5   45/55
R07            1      1      1      0      0      1      0      1      0      0      0        5   45/55
R04            1      1      1      0      0      0      1      0      0      0      0        4   36/64
R02            1      1      0      0      0      0      1      0      0      0      0        3   27/73
R08            0      1      1      0      0      0      0      0      0      0      0        2   18/82

Facility      11     10     10      7      7      6      5      5      3      2      1
n/1‐n       92/8  83/17  83/17  58/42  58/42  50/50  42/58  42/58  25/75  17/83   8/92
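The row and column totals of the data matrix above, and first-pass logits derived from them, can be reproduced with a short script. This is only a sketch of the raw-score starting point: the log-odds printed here are initial approximations, not the final estimates that Rasch software produces after iteration, and the helper name `log_odds` is an assumption for illustration.

```python
import math

ITEMS = ["Q11", "Q1", "Q8", "Q2", "Q7", "Q10", "Q3", "Q5", "Q9", "Q4", "Q6"]
RESPONSES = {
    "R10": [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    "R03": [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0],
    "R05": [1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0],
    "R12": [1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0],
    "R09": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    "R06": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    "R11": [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "R01": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "R07": [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "R04": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "R02": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "R08": [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}


def log_odds(successes: int, total: int) -> float:
    """Natural log of the odds successes : failures.

    Undefined for zero or perfect scores, which do not occur in this matrix.
    """
    return math.log(successes / (total - successes))


n_items = len(ITEMS)
n_persons = len(RESPONSES)

person_scores = {p: sum(r) for p, r in RESPONSES.items()}
item_facility = [sum(r[i] for r in RESPONSES.values()) for i in range(n_items)]

for person, score in person_scores.items():
    print(f"{person}: {score}/{n_items}  ability ~ {log_odds(score, n_items):+.2f} logits")

for name, facility in zip(ITEMS, item_facility):
    # Difficulty is the log-odds of failure, so harder items get higher logits.
    difficulty = -log_odds(facility, n_persons)
    print(f"{name}: {facility}/{n_persons}  difficulty ~ {difficulty:+.2f} logits")
```

Person R10 (9 of 11 correct) comes out at about +1.50 logits and item Q6 (1 of 12 correct) at about +2.40 logits, so both ends of the matrix land on the same ruler.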

Procedure 1: Estimate Location

• Iterative process between item and person values:
  1. How difficult are these items?
  2. How able are these persons?

• Iterated until an acceptable variation in location is reached

Procedure II: Fit Statistics

• Item difficulties and person abilities are entered into a matrix

• The Rasch-modeled table of expected probabilities for each cell is calculated

• Measurement is possible with only one variable at a time

• Construct validity is the key concept.
  – The theoretical argument that the items in an instrument measure what it claims to measure

• Fit statistics (misfit statistics) help control the measurement construct

• Fit statistics are based on residuals
  – Difference between actual and expected outcome

• Fit statistics are used to control the quality of measures

• The aim is to detect the differences between:
  – EXPECTED: The strict measurement requirements of the Rasch Model (Theory)
  – ACTUAL: The data collected when the real items are used with real people (Practice)

1. Collect Observed Scores (Xni)
  – Must always be whole numbers (0, 1)

2. Calculate the Rasch Expected Response Probabilities (Eni) Based on Item and Person Estimates (e.g., 0.80)
  – E = the expected response probability when any person with ability n responds to an item with difficulty i

3. Calculate the Response Residual Yni = Xni − Eni
  – Y = the response residual that remains in the cell for person n × item i when the expected response probability Eni is subtracted from the actual response Xni

• Residuals are squared (to remove negative values) and summed to yield:
  – Mean squares of residuals for every item and person
    • Low mean squares are too predictable to believe
    • High mean squares are too unpredictable to yield measures

• Residuals are often standardized (t or z statistic)
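The residual steps can be traced on a small example. The sketch below assumes the commonly used definitions: outfit is the mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The ability, item difficulties and response strings are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Expected response probability E for a dichotomous item (logit inputs)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def fit_mean_squares(responses: list[int], ability: float,
                     difficulties: list[float]) -> tuple[float, float]:
    """Return (outfit, infit) mean squares for one person's responses."""
    expected = [rasch_probability(ability, d) for d in difficulties]
    variances = [e * (1.0 - e) for e in expected]          # model variance per cell
    residuals = [x - e for x, e in zip(responses, expected)]  # Y = X - E
    outfit = sum(r * r / v for r, v in zip(residuals, variances)) / len(responses)
    infit = sum(r * r for r in residuals) / sum(variances)
    return outfit, infit


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical items, easy to hard
ability = 0.0                               # hypothetical person

orderly = [1, 1, 1, 0, 0]  # succeeds on easy items, fails on hard ones
erratic = [0, 0, 1, 1, 1]  # fails on easy items, succeeds on hard ones

for label, pattern in (("orderly", orderly), ("erratic", erratic)):
    outfit, infit = fit_mean_squares(pattern, ability, difficulties)
    print(f"{label}: outfit = {outfit:.2f}, infit = {infit:.2f}")
```

The orderly pattern yields mean squares well below 1 (more predictable than the model expects), while the erratic pattern yields values far above 1.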


• In order to classify items or persons as fitting or misfitting, the following criteria must be satisfied:
  – Point Measure Correlation: 0.32 < x < 0.8
  – Infit / Outfit Mean Square: 0.5 < y < 1.5 OR 0.6 < y < 1.4 (for survey)
  – Infit / Outfit Z standard: ‐2.0 < Z < +2.0
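These criteria translate directly into a screening function. The sketch below simply encodes the ranges listed above; the function name `check_fit_criteria`, its parameters and the sample values are hypothetical, and the inputs would come from the fit tables of whatever Rasch software is used.

```python
def check_fit_criteria(point_measure_corr: float, mean_square: float,
                       z_standard: float, survey: bool = False) -> dict[str, bool]:
    """Check one item (or person) against the three fit criteria.

    The mean-square range is 0.6 to 1.4 for surveys, otherwise 0.5 to 1.5.
    """
    ms_low, ms_high = (0.6, 1.4) if survey else (0.5, 1.5)
    return {
        "point_measure_corr": 0.32 < point_measure_corr < 0.8,
        "mean_square": ms_low < mean_square < ms_high,
        "z_standard": -2.0 < z_standard < 2.0,
    }


# Hypothetical values for two items.
good_item = check_fit_criteria(0.45, 1.10, 0.3)
odd_item = check_fit_criteria(0.12, 1.45, 2.6, survey=True)

print("good item fits:", all(good_item.values()))
print("odd item fits:", all(odd_item.values()), odd_item)
```

Returning the individual checks rather than a single flag makes it easier to see why an item was flagged before deciding whether to accept, reject or modify it.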

Dichotomous Data

• Example of the cognitive developmental test (BLOT) used to outline Rasch measurement: Tutorial 4
  – Analysis of the BLOT focuses on the performance of the items rather than the persons

Dichotomous Data

• The basic Rasch questions:
  1. Quantity?
  2. Precision?
  3. Fit?

• Our focus is on:
  – Construct validity with a clear direction
  – Estimate of ability/development
  – Precision of the estimate
  – Confidence of estimate
  – Probability of success on similar items

Polytomous Data

• Nature of Likert scales:
  – Ordered categories – lowest to highest
  – Response opportunities – good – neutral – bad
  – Odd / even number of categories
  – With / without a mid-point category (neutral)

Polytomous Data

• For rating‐scale data:
  – Each item has a difficulty estimate
  – The scale has a series of thresholds
    • Item thresholds estimate the location where a person with the estimated ability has a 50% probability of success/failure on an item at the same location
    • 5 categories have 4 thresholds
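How thresholds turn into category probabilities can be sketched with the Andrich rating scale formulation, which is assumed here as one common way to model such data. The function name `rating_scale_probabilities` and the four threshold values are hypothetical; at each threshold, the two adjacent categories are equally likely.

```python
import math


def rating_scale_probabilities(ability: float, difficulty: float,
                               thresholds: list[float]) -> list[float]:
    """Probabilities of categories 0..m for an item with m thresholds,
    under the Andrich rating scale model (all values in logits)."""
    cumulative = [0.0]  # category 0 has an empty sum
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (ability - difficulty - tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


thresholds = [-1.5, -0.5, 0.5, 1.5]  # hypothetical: 4 thresholds, 5 categories

for ability in (-2.0, 0.0, 2.0):
    probs = rating_scale_probabilities(ability, 0.0, thresholds)
    print(f"ability {ability:+.1f}: " + "  ".join(f"{p:.2f}" for p in probs))
```

As ability rises relative to the item, the most probable category moves from the lowest to the highest, one threshold at a time.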

Polytomous Data


Polytomous Data

• Run Tutorial 6 and interpret the output
  – Quantity? Precision? Fit?
  – Maps / Tables
    • Variable Map
    • Fit Map
    • Item Characteristic Curve (ICC)
    • Person / Item Tables

THANK YOU

For more, please visit: http://www.rasch.org

• Reach us:
  • Mohd Nor @ 019 281 9003
  • Prasanna @
  • Nurul Hidayah @

SHARING TIME…

How to Apply Simple Rasch in Educational Assessment

Educational Use

• To accurately measure students' ability (knowledge, skill and attitude)

• To verify the reliability of the test question set

• To verify the reliability of students' answers

• To confirm the cognitive levels of questions according to an educational taxonomy

• To quantify students' ability as a percentile

DISTRIBUTE QUESTIONS, OFFLINE OR ONLINE

Students answer via a question form, offline or online

(responses of at least 15 students are recorded for a confidence level of at least 95%)

$$\frac{\exp(*)}{1 + \exp(*)}$$

[Figure: table of probability scores, with item measure increasing and person measure decreasing]

COPY PERSON AND ITEM MEASURES INTO TEMPLATE 2 (EA Score)

TEMPLATE 2

1. Logit score − item difficulty score (*), i.e., an EA measure score of 1.3 minus an item difficulty score of 0.3 = 1.0

2. Using the above formula, the answer will be approximately 0.7311

3. Therefore, the final mark (%) for the student's EA is 73%, equivalent to B+ OR 3.0 GPA
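The three template steps can be verified with a few lines of code. This sketch uses the example values from the slide (person measure 1.3, item difficulty 0.3); the function name `success_percentage` is an assumption for illustration.

```python
import math


def success_percentage(person_measure: float, item_measure: float) -> float:
    """Probability of success, as a percentage, from two logit measures."""
    difference = person_measure - item_measure  # the (*) term in the formula
    return 100.0 * math.exp(difference) / (1.0 + math.exp(difference))


mark = success_percentage(1.3, 0.3)
print(f"{mark:.2f}%")  # 73.11%
```

Rounded to a whole number, this gives the 73% final mark used in the template.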


SAVE FILE IN .prn FORMAT

DRAG DATA FILE (.prn) ONTO BOND & FOX STEPS ICON; SAVE AND EXIT TO DATA ANALYSIS

(Reliability of Answers According to the Model)

rITEM = 0.92
rPERSON = 0.72

Reliability = (OV − EV) / OV

OV – OBSERVED VARIANCE

EV – ERROR VARIANCE
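The reliability formula is a one-line calculation once the two variances are known. In the sketch below the variance values are made up so that the result matches the person reliability of 0.72 shown above; they are not taken from any actual analysis output, and the function name `reliability` is an assumption.

```python
def reliability(observed_variance: float, error_variance: float) -> float:
    """Reliability = (OV - EV) / OV, the share of observed variance
    that is not attributable to measurement error."""
    if observed_variance <= 0:
        raise ValueError("observed variance must be positive")
    return (observed_variance - error_variance) / observed_variance


# Hypothetical variances (in squared logits), chosen for illustration only.
print(round(reliability(1.00, 0.28), 2))  # 0.72
print(round(reliability(1.00, 0.08), 2))  # 0.92
```

The smaller the error variance relative to the observed spread of measures, the closer the reliability gets to 1.0.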

CLICK PERSON MEASURE (EA Person Measure)

TEMPLATE 2
