Page 1: Rasch Measurement Model

4/23/2012


Rasch Model for Research/Educational Assessment

Course Description

This training is concerned with improving the measurement process in educational assessment using the Rasch model. Assessments in education involve persons responding to a set of items. The training aims to determine whether the responses to a set of assessment items fit the model.

Course Outcomes

Upon completion of this course, participants will be able to:

• Understand the concepts of psychometrics and the Rasch measurement model
• Develop a reliable test/questionnaire for measurement
• Analyze data/responses using Bond&Fox Steps® software

Target Participants

This course is designed for:

• Lecturers
• Researchers
• Postgraduate students
• Training authorities (educational administrative staff)

Course Content

Developing Instruments
• Steps
• Hands-on

Rasch
• Intro
• Key Concepts

Rating Scale Model
• Dichotomous
• Polytomous

Application
• Bond & Fox
• Hands-on

Key Concept of Measurement

EVALUATION, ASSESSMENT, TESTING, MEASUREMENT, GRADING

Page 2: Rasch Measurement Model

Key Concept of Measurement

Measurement?
• Right measure?
• Right scale?
• Right instrument?

Physical measure? Psychological measure? Psychometric measure?

Globally Accepted Measures
• Length :: meter
• Weight :: gram
• Time :: second/ minute/ hour
• Temperature :: Celsius
• Electricity :: ampere

INSTRUMENT?

Think Wisely! How to Measure?

Page 3: Rasch Measurement Model

How to Measure?

Logically…

The attribute of wisdom that is hardest for a person to possess is probably possessed only by the wisest person… likewise, the attribute of beauty that is hardest to possess is possessed only by the most beautiful person!

Also logically…

A person who has only the attributes that everyone has is considered neither wise/beautiful nor foolish/ugly.

And!

A person who lacks an attribute of wisdom that everyone else has may be considered foolish… likewise, a person who lacks an attribute of beauty that everyone else has may be considered ugly.

What do you understand by:
:: Evaluation (Penilaian) ………
:: Measurement (Pengukuran) ………
:: Testing (Pengujian) ………

Educational Measurement

What is the purpose of a test?

What are some examples of test formats?

Which aspects do we test?
::0:: Teaching scheme
::0:: Teaching objectives
::0:: Teaching references
::0:: Teaching assignments
::0:: Teaching content

Why Rasch in OBE?

OUTCOME ASSESSMENT
Examples:
1. EE Survey
2. Test/ Examination
3. Training Effectiveness
4. Placement Test
5. Attitude/ Insaniah (soft) Skills

Page 4: Rasch Measurement Model

Test Instrument Construct

Questions must be:
• Simple
• Clear and directive
• Free of multiple meanings (avoid multi-defined questions)
• Set at various levels
• Parallel with teaching contents

Should consider:
• Validity
• Reliability
• Objectivity
• Utility
• Readability

Right Test Right Measure

Example:

1 : STRONGLY DISAGREE
2 : DISAGREE
3 : NOT SURE
4 : AGREE
5 : STRONGLY AGREE

mean = 4.325. What does it mean?

5/10: NOT THE SAME

Right Test Right Measure

Probabilistic Ratio-based Ruler

1:99   10:90   20:80   30:70   40:60   50:50   60:40   70:30   80:20   90:10   99:1

5:5

To identify the respondent with the highest score, the difficulty level of each item must be known. The RM ensures that items are ordered from easy to difficult. Note that the value 1, which denotes a correct answer, lies on the left side. Student P is the most able and student S the least able. If a line (dashed line) is drawn across the table, there are more scores of 1 above the line than below it. A score of 0 by student P above the line is due to the student's carelessness, while a score of 1 by student S below the line is a guessed answer. This is the basic idea of the RM.

Indices: 10^-2, 10^-1, 10^0, 10^1, 10^2

Log Odds Unit Ruler (LOGIT)

The logit is a unit of measurement computed after the raw data (numbers) are converted into a ratio-based form that is more accurate for measuring ability.

Page 5: Rasch Measurement Model

Right Test Right Measure

CASE

Ali, Ah Chong and Ramasamy each scored 79%... who is the better one?

Two respondents each have a mean agreement of 3.45 (on a 5-point Likert scale)… who truly agrees? Who is lying? Who agreed by mistake?

CRITERION-BASED versus NORM-BASED EVALUATION

Scalogram of Responses/ Answers

(Figure: scalogram of responses, with unexpected responses labelled CARELESS and LUCKY GUESS.)

PRINCIPLE OF THE PROBABILISTIC MODEL

"a person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one".

(Rasch, 1960)

PROBABILITY OF SUCCESS: Respondent Ability, Item Difficulty

A scientific measurement model must:

• USE LINEAR MEASURES
• OVERCOME MISSING DATA
• GIVE ESTIMATES OF PRECISION
• DETECT MISFITS OR OUTLIERS
• PROVIDE A RELIABILITY VALUE

Dimensions                                            Item Numbers
Attitude Towards Energy Conservation (EC)             1, 11, 21, 31, 41, 50
Attitude Towards Mobility and Transportation (MT)     2, 12, 22, 32
Attitude Towards Waste Avoidance (WA)                 3, 13, 23, 33, 42, 51, 52, 53
Attitude Towards Recycling (R)                        4, 14, 24, 34, 43, 54
Attitude Towards Consumerism (C)                      5, 15, 25, 35, 44, 55, 56, 57
Attitude Towards Environmental Conservation (VB)      6, 16, 26, 36, 45, 58, 59, 60
Attitude Towards Flora and Fauna (EFF)                7, 17, 27, 37, 46
Attitude Towards Water and Air (EWA)                  8, 18, 28, 38, 47
Attitude Towards Human Being (EHB)                    9, 19, 29, 39, 48
Attitude Towards Metaphysical Entities (EME)          10, 20, 30, 40, 49

After review and analysis, ten main dimensions were identified, with 60 items measuring students' environmental attitudes.

Page 6: Rasch Measurement Model

Right Test Right Measure

Dimension      Item/ Attribute          Weightage
D1/CO1/CH1     D1/CO1/CH1 A1            30%
               D1/CO1/CH1 A2
               D1/CO1/CH1 A3
D2/CO2/CH2     D2/CO2/CH2 A1            40%
               D2/CO2/CH2 A2
               D2/CO2/CH2 A3
               D2/CO2/CH2 A4
D3/CO3/CH3     D3/CO3/CH3 A1            30%
               D3/CO3/CH3 A2
               D3/CO3/CH3 A3

Tips for Instrument Development

Do keep the questionnaire brief and concise. Some questionnaires give the impression that their authors tried to think of every conceivable question that might be asked with respect to the general topic of concern. The result is a very long questionnaire that causes annoyance and frustration on the part of the respondents, resulting in non-return of mailed questionnaires and incomplete or inaccurate responses on questionnaires administered directly. This is where Rasch becomes very handy, since it can handle small samples and missing data.

Do your best to write as few questions as possible to obtain the information you need. Peripheral questions and ones to find out "something that might just be nice to know" must be avoided. A clear-cut need for every question should be established. Experience tells us that items in a long questionnaire often stop functioning.

Do check face validity and run a pilot test; it certainly helps. Get feedback on your initial list of questions. Feedback may be obtained from a small but representative sample.

Do choose appropriate response category language and logic. The extent to which responders agree with a statement can be assessed adequately in many cases by the dichotomous options: 1) Disagree 2) Agree

Do order categories. When response categories represent a progression between a lower level of response and a higher one, it is usually better to list a polytomous order from the lower level to the higher in left-to-right order, for example:

1) Never 2) Seldom 3) Occasionally 4) Frequently

Do ask responders to rate both positive and negative stimuli. There is sometimes a difficulty when responders are asked to rate items for which the general level of approval is high. There is a tendency for responders to mark every item at the same end of the scale. By offering positive and negative responses, the respondent is required to evaluate each response rather than uniformly agreeing or disagreeing with all of the responses. Try to put yourself in their shoes; Rasch sees every attempt as beginning with a 50:50 scenario.

These options have the advantage of allowing the expression of some uncertainty. In contrast, the following options would be undesirable in most cases:

1) Strongly disagree 2) Disagree 3) Agree 4) Strongly agree

Some would say that "Strongly agree" is redundant or at best a colloquialism. In addition, there is no comfortable resting place for those with some uncertainty.

Avoid open-ended questions. In most cases open-ended questions should be avoided due to variation in willingness and ability to respond in writing.

Avoid the response option "Other." Careless responders will overlook the option they should have designated and conveniently mark the option "Other", or will be hairsplitters and reject an option for some trivial reason.

Avoid category proliferation. A typical question is the following:
Marital status: 1) Single (never married) 2) Married 3) Widowed 4) Divorced 5) Separated

Avoid scale point proliferation. In contrast to category proliferation, which seems usually to arise somewhat naturally, scale point proliferation takes some thought and effort. An example is:
1) Never 2) Rarely 3) Occasionally 4) Fairly often 5) Often 6) Very often 7) Almost always 8) Always

Such stimuli run the risk of annoying or confusing the responder with hairsplitting differences between the response levels. Psychometric research has shown that most subjects cannot reliably distinguish more than six or seven levels of response. Offering four to five scale points is usually quite sufficient to stimulate a reasonably reliable indication of response direction. Rasch analysis has shown that the sensitivity of a measurement is not lost when a smaller rating scale is used. Based on statistical analysis, Rasch analysis indicates whether rating categories should be collapsed or expanded.

Avoid responses at the scale mid-point and neutral responses. The use of neutral response positions had a basis in the past when crude computational methods were unable to cope with missing data.

Avoid asking responders to rank responses. Responders cannot be reasonably expected to rank more than about six things at a time, and many of them misinterpret directions or make mistakes in responding. To help alleviate this latter problem, ranking questions may be framed as follows:

Following are three colors for office walls: 1) Beige 2) Ivory 3) Light green
Which color do you like best? _____
Which color do you like second best? _____
Which color do you like least? _____

By carefully evaluating the need for every question used in an instrument and carefully wording the responses, you will collect information which will yield more satisfactory and meaningful results.

Page 7: Rasch Measurement Model

Why Rasch!

Rasch
• Intro
• Key Concepts

Rasch Key Concept

Measurement

• Measurement is the process of constructing lines and locating individuals on lines (Wright & Stone, 1979)
• Measurement is the location of objects along a single dimension on the basis of observations which add together (Bond & Fox, 2007)

Types of Scales

• Nominal
• Ordinal
• Interval
• Ratio

Rasch Measurement Model

• The Rasch model can be applied to measure latent traits (e.g., ability or attitude) in various disciplines
• Rasch model estimates of ability / attitude / difficulty become data for statistical analysis
• Latent traits are usually assessed through the responses of a sample of persons to a set of items with:
  – two response categories
  – more than two response categories

Page 8: Rasch Measurement Model

Rasch Measurement Model

• The Rasch model belongs to the item response theory (IRT) models
• IRT models describe:
  – the probability of an individual's response to an item
  – the probability of a correct/keyed response to an item as a mathematical function of a person parameter (ability) and an item parameter (difficulty)
• Rasch gives the maximum likelihood estimate (MLE) of an event outcome
• Rasch reads the pattern of an event and is therefore predictive in nature, an ability that resolves the problem of missing data
• The relationship between the probability of success on an item and the latent trait is described by a function called the item characteristic curve (ICC), which takes an S-shape
• The ICC shows pictorially the fit of the data to the model

Figure: Item characteristic curve showing the relationship between the location on the latent trait and the probability of answering the item correctly.
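
The S-shaped curve can be tabulated with a few lines of Python. This is a minimal sketch, assuming the standard dichotomous Rasch form P = exp(B - D) / (1 + exp(B - D)) with ability B and difficulty D in logits. The function name `rasch_probability` and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct/keyed response under the dichotomous Rasch model.

    Assumes the standard logistic form; both arguments are in logits.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Tabulate one item characteristic curve for a hypothetical item at 0.5 logits.
item_difficulty = 0.5
abilities = [-3, -2, -1, 0, 0.5, 1, 2, 3]
icc = [(b, rasch_probability(b, item_difficulty)) for b in abilities]

for b, p in icc:
    print(f"ability {b:+.1f} logits -> P(correct) = {p:.3f}")
```

The probability is exactly 0.5 where ability equals difficulty and rises monotonically on either side, which is what gives the curve its S-shape.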

Rasch Measurement Model

• The psychometric Rasch model conceptualizes the measurement scale like a ruler
• Items are located along the measurement scale according to their difficulty

(Figure: ability scale running from less difficult items (-) to more difficult items (+).)

Page 9: Rasch Measurement Model

Rasch Measurement Model

• Persons can also be located on the same measurement scale
• They are located according to their ability
• The less difficult items can be successfully achieved by the more able subjects

(Figure: persons A, B and C located on the scale between "Less able" and "More able".)

• A person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one (Rasch, 1960, p. 117)

• A turn of events is seen as a chance, a likelihood of happening, and hence as ratio data (Stevens, 1946)

(Figure: ruler marked with the odds 1:99, 10:90, 30:70, 50:50, 60:40 and 99:1; an "exp" row marked 10^-2, 10^0 and 10^2; and a "logit" row marked -2, -1, 0, 1 and 2. SCALE with a unit termed 'logit'.)

• Interval scales have known and equal intervals between two graduations
  – numbers tell us how much more of the attribute of interest is present
  – scales are linear and quantitative

• The probability of endorsing any response category of an item depends solely on the person ability and the item difficulty
  – This requirement is called unidimensionality

Rasch Model Key Question

• When a person with this ability (number of test items correct) encounters an item of this difficulty (number of persons who succeeded on the item), what is the likelihood that this person gets this item correct?

Page 10: Rasch Measurement Model

Rasch Model Key Answer

• The probability of success depends on the difference between the ability of the person and the difficulty of the item

Rasch Measurement Model Theorem

• Theorem 1
  – Persons who are more able have a greater likelihood of correctly answering all the items (dichotomous response)
  – Persons who are more developed (higher agreeability level) have a greater likelihood of endorsing all the items (polytomous response)
• Theorem 2
  – Easier items are more likely to be answered correctly by all persons (dichotomous response)
  – Easier tasks are more likely to be endorsed by all persons (polytomous response)

Rasch Measurement Model

• In estimating the probability of success, the prediction is expressed in terms of chances / odds / probabilities.
• The data matrix (e.g., the result of theory-driven qualitative observation) can be arranged so that the items are ordered from least to most difficult and the persons are ordered from least to most able.

Matrix of Abilities vs Difficulties

(Figure: grid with Ability on one axis and Difficulty on the other, both marked 0/100, 10/90, 30/70, 50/50, 70/30, 90/10 and 100/0.)

• This organized data table is termed a Scalogram (Guttman, 1944). The higher up the table one goes, the more able the person
• The further right across the table one goes, the more difficult the item
• Calculate item difficulties and person abilities (n/N: the item's or person's raw score divided by the total possible score)
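
The arrangement described above can be sketched in Python. This is an illustrative example only: the function name `build_scalogram` and the four-person data set are hypothetical, and the raw proportions n/N are used purely to order rows and columns.

```python
def build_scalogram(responses):
    """Sort a 0/1 response matrix into scalogram order.

    `responses` maps a person label to a list of 0/1 item scores.
    Rows are returned most able first; columns easiest item first.
    """
    n_items = len(next(iter(responses.values())))
    # Item facility: proportion of persons who succeeded on each item (n/N).
    facility = [sum(row[i] for row in responses.values()) / len(responses)
                for i in range(n_items)]
    # Person ability: proportion of items answered correctly (n/N).
    ability = {p: sum(row) / n_items for p, row in responses.items()}

    item_order = sorted(range(n_items), key=lambda i: -facility[i])
    person_order = sorted(responses, key=lambda p: -ability[p])
    table = [(p, [responses[p][i] for i in item_order]) for p in person_order]
    return item_order, table

# Hypothetical data: four persons, five items (columns in test order).
responses = {
    "A": [0, 1, 0, 1, 0],
    "B": [1, 1, 0, 1, 1],
    "C": [0, 1, 0, 0, 0],
    "D": [1, 1, 0, 1, 0],
}
item_order, table = build_scalogram(responses)
for person, row in table:
    print(person, "".join(map(str, row)), "=", sum(row))
```

With these made-up responses the sorted table forms a clean staircase of 1s followed by 0s; real data would show some unexpected cells on either side of the staircase.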

Page 11: Rasch Measurement Model

Scalogram

EASY ITEMS/TO ENDORSE             DIFFICULT ITEMS/TO ENDORSE

MORE ABLE
11111011111111111 11111111111111111 11111111111100 = 48
11111111111111111 11111111111111111 11111001001000 = 43
11111111111111111 11111111111100100 01100010000000 = 33
11111111111111111 11111111111010100 00110100000000 = 33
11110111111111111 10111111101001100 00110100101100 = 33
11111111111111111 11110111001000100 01000000000000 = 27
11111111111111101 11011011001000100 00000000001000 = 25
LESS ABLE

• Patterns of success or failure can be seen in the data matrix
• A person who scores in the old-fashioned way (successful on easy questions and unsuccessful on difficult questions: response pattern 111111000000) is said to be too good to be true
• A person who scores well on difficult items despite a low overall score
  – might have guessed or cheated on the items
• A person who scores poorly on easy items despite a high overall score
  – might indicate a lack of concentration or guessing on the item
• If a person's response pattern is unpredictable or erratic (110101100), it is difficult to interpret the success or failure (person ability).
• It is impossible to predict the likelihood of a student's performance (well / poorly) on the whole test just by looking at items that are so erratic.
  – Investigate the item further.

Key Rasch Measurement Concepts

• Quantity
  – Estimates of item and person location
• Precision
  – Standard Error (SE) of Measurement
• Quality
  – Fit Statistics

The Basic Rasch Questions

• What are the distances between the locations?
  – Estimates of item and person locations
• How precise are those locations?
  – SE of measurement
• Are those locations all equally valuable?
  – Fit Statistics

Page 12: Rasch Measurement Model

Estimate: Person / Item Location

(Figure: vertical scale running from "Less Able / Less Difficult (Easy)" to "More Able / More Difficult", with the location of a person marked.)

• Rasch analysis was performed by setting the mean of the persons as the starting point (0 logits) for the calibration
• In applying the Rasch model, item locations are often scaled first
• The location of an item on the scale corresponds to the person location at which there is a 0.5 probability of a correct response to the question
• The probability of a person responding correctly to / endorsing a question with difficulty lower than that person's location is greater than 0.5
• The probability of responding correctly to / endorsing a question with difficulty greater than the person's location is less than 0.5

Separation Statistics

• Items are located by the number of persons getting a specific item correct / endorsing a specific item
• Persons are located by the number of items they are able to answer correctly / endorse
• It is necessary to locate persons and items along the variable line with sufficient precision to "see" between them
• The item and person separation statistics in Rasch measurement provide an analytical tool by which to evaluate the successful development of a variable and with which to monitor its continuing utility

Page 13: Rasch Measurement Model

Separation Statistics

• Person separation indicates how efficiently a set of items is able to separate the persons measured:
  – Separation that is too wide usually signifies gaps among person abilities
    • This leads to imprecise measurement
  – Separation that is too narrow signifies that there is not enough differentiation among person abilities to distinguish between them
• Item separation indicates how well a sample of people is able to separate the items used in the test:
  – Separation that is too wide usually signifies gaps among item difficulties
    • This leads to imprecise measurement
  – Separation that is too narrow signifies redundancy among test items
• Separation statistics are expressed as reliabilities
  – range from 0.0 to 1.0
• The higher the value, the better the separation that exists and the more precise the measurement
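
As a rough illustration of how a separation reliability between 0.0 and 1.0 can arise, the sketch below assumes the conventional definition: reliability is the "true" variance divided by the observed variance, where the "true" variance is the observed variance of the measures minus their mean squared standard error. The slides do not give a formula, so this definition, the use of the population variance, the function name and the data are all assumptions.

```python
import math
import statistics

def separation_reliability(measures, standard_errors):
    """Return (reliability, separation index) for a set of Rasch measures.

    Assumes reliability = "true" variance / observed variance, where the
    "true" variance is the observed variance minus the mean squared error.
    """
    observed_var = statistics.pvariance(measures)
    error_var = statistics.fmean([se ** 2 for se in standard_errors])
    true_var = max(observed_var - error_var, 0.0)
    reliability = true_var / observed_var
    separation = math.sqrt(true_var / error_var)
    return reliability, separation

# Hypothetical person measures (logits) and their standard errors.
measures = [-1.8, -0.9, -0.2, 0.4, 1.1, 2.0]
errors = [0.45, 0.40, 0.38, 0.38, 0.41, 0.48]
rel, sep = separation_reliability(measures, errors)
print(f"reliability = {rel:.2f}, separation = {sep:.2f}")
```

Under this definition, measures that are widely spread relative to their standard errors push the reliability toward 1.0, while measures that are bunched together within their error bands push it toward 0.0.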

SE of Measurement

• Standard Error
  – The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method
  – The term may also refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate
• The sample mean is the usual estimator of a population mean
• However, different samples drawn from that same population would in general have different values of the sample mean
• The standard error of the mean
  – Is the standard deviation of the sample-mean estimate of a population mean
  – Is estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size
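
The last point translates directly into code. A minimal sketch using only the Python standard library; the function name and the scores are made up for illustration.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

scores = [12, 15, 11, 14, 18, 16, 13, 15]  # hypothetical raw scores
sem = standard_error_of_mean(scores)
print(f"mean = {statistics.mean(scores):.2f}, SE of mean = {sem:.3f}")
```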

Page 14: Rasch Measurement Model

SE of Measurement

• The standard error of the mean can also refer to an estimate of that standard deviation, computed from the sample of data being analyzed at the time
• In practical applications, the true value of the standard deviation (of the error) is usually unknown
• As a result, the term standard error is often used to refer to an estimate of this unknown quantity
  – the standard error is only an estimate
• In other cases, the standard error may usefully be used to provide an indication of the size of the uncertainty
• But using the standard error to provide confidence intervals or tests should be avoided unless the sample size is at least moderately large
  – Here "large enough" would depend on the particular quantities being analyzed
• The t-distribution is used to provide a confidence interval for an estimated mean or difference of means

Standard Deviation & Confidence Intervals

• For a value that is sampled with an unbiased, normally distributed error, the figure shows the proportion of samples that would fall between 0, 1, 2, and 3 standard deviations above and below the actual value

Page 15: Rasch Measurement Model

SE of Measurement

• The standard error of a measure captures its precision in a particular context
• The accuracy of a measure is captured by fit statistics
• A measure may be accurate, but imprecise
• Raw scores are almost always reported without their standard errors
• The highest possible precision for any measure is that obtained when every other measure is known, and the data fit the Rasch model
• This standard error is called the "model" standard error and is reported by most production-oriented Rasch software
• For well-constructed tests with clean data (as confirmed by the fit statistics), the model standard error is usefully close to, but slightly smaller than, the actual standard error
• The stability of an item calibration is its modelled standard error
  – A two-tailed 99% confidence interval is ±2.6 S.E. wide
  – A two-tailed 95% confidence interval is ±1.96 S.E. wide
  – A two-tailed 68% confidence interval is ±1.00 S.E. wide
• Suppose we want to be 99% confident that the "true" item difficulty is within 1 logit of its reported estimate
  – That is, we want the sample size needed to have 99% confidence that no item calibration is more than 1 logit away from its stable value
• Then the estimate needs to have a standard error of 1.0 logits divided by 2.6 = 1/2.6 = 0.385 logits or less
• Stability to within ±.3 logits is the best that can be expected for most variables
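
The multipliers and the 1/2.6 calculation above can be reproduced with the standard library. This is a sketch assuming normally distributed error; the function names and the example item (0.80 logits, SE 0.30) are hypothetical.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-tailed normal multiplier, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def confidence_interval(estimate, se, confidence=0.95):
    """Interval estimate ± z * SE, assuming normally distributed error."""
    half_width = z_multiplier(confidence) * se
    return estimate - half_width, estimate + half_width

def max_se_for_stability(width, confidence):
    """Largest SE that keeps the estimate within ±width at the given confidence."""
    return width / z_multiplier(confidence)

# Hypothetical item calibrated at 0.80 logits with a model SE of 0.30 logits.
low, high = confidence_interval(0.80, 0.30, confidence=0.95)
print(f"95% CI: {low:.2f} to {high:.2f} logits")
print(f"SE needed for ±1 logit at 99%: {max_se_for_stability(1.0, 0.99):.3f}")
```

The slides round the 99% multiplier to 2.6, which gives 0.385 logits; the unrounded multiplier is about 2.576, so the computed bound comes out marginally larger.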

Page 16: Rasch Measurement Model

Sample Size

Item calibrations     Confidence      Minimum sample size range     Size for most
stable within                         (best to poor targeting)      purposes
± 1 logit             95%             16 -- 36                      30
± 1 logit             99%             27 -- 61                      50
± ½ logit             95%             64 -- 144                     100
± ½ logit             99%             108 -- 243                    150
Definitive or         99%+ (Items)    250 -- 20*test length         250
High Stakes

John Michael Linacre
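
The first four rows of the table are consistent with a simple rule, sketched below under two stated assumptions that do not come from the slides: the standard error of an item calibration lies between 2/sqrt(N) for a well-targeted sample and 3/sqrt(N) for a poorly targeted one, and the confidence multipliers are rounded to 2 (95%) and 2.6 (99%). The function name is illustrative.

```python
def sample_size_range(width, multiplier):
    """Minimum sample sizes for item calibrations stable within ±width logits.

    Assumes the item SE lies between 2/sqrt(N) (best targeting) and
    3/sqrt(N) (poor targeting), so N = (k * multiplier / width) ** 2.
    """
    best = round((2 * multiplier / width) ** 2)
    poor = round((3 * multiplier / width) ** 2)
    return best, poor

# Multipliers rounded as the table appears to use them: 2 for 95%, 2.6 for 99%.
for width, label, mult in [(1.0, "95%", 2.0), (1.0, "99%", 2.6),
                           (0.5, "95%", 2.0), (0.5, "99%", 2.6)]:
    best, poor = sample_size_range(width, mult)
    print(f"±{width} logit at {label}: {best} -- {poor}")
```

The 2/sqrt(N) bound follows from the fact that a dichotomous response carries at most 0.25 units of information, which occurs when the person and item are perfectly matched.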

Sample Size

• A sample of 50 well-targeted respondents is conservative for obtaining useful, stable estimates
• 30 respondents are enough for well-designed pilot studies

Fit Statistics

• To aid in measurement quality control
• To identify those parts of the data which meet Rasch model specifications and those parts which don't

(Figure: two panels, "Item fits the model" and "Item does not fit the model".)

• Parts that don't are not automatically rejected, but are examined to identify in what way, and why, they fall short, and whether, on balance, they contribute to or corrupt measurement
• Then the decision is made to accept, reject or modify the data
• Modification includes simple actions such as correcting obvious data entry errors and respondent mistakes, and more sophisticated actions such as collapsing rating scale categories.

Page 17: Rasch Measurement Model

Developing Instruments

• Steps
• Hands-on

• To find out about the characteristics (latent traits) of people
  – Latent trait: a characteristic or attribute of a person that can be inferred from the observation of the person's behaviours.
• Measurement here means measuring the latent traits of people

Developing a Questionnaire

• Identify research that studies the same construct
• Define the construct
• Define the target population
• Review related measures
• Develop a draft
• Evaluate the draft
• Revise the test
• Collect data on reliability and validity

(Gall, Gall & Borg, 2003)

Example 1

• RESEARCH: Students' Satisfaction in a Distance Learning Course
• What needs to be measured?
  – Student satisfaction
• What are the constructs and their variables?

DIMENSION = CONSTRUCT = CHAPTER
ITEM = VARIABLES = SUBCHAPTER

Student's Satisfaction in DL Course

• Construct 1: Instructor Performance
  – Variables: availability, knowledge of subject, fair treatment, respects students, encourages questions, presents information clearly
• Construct 2: Student-Instructor Interaction
  – Variables: encourages students to be actively involved, provides feedback on work, provides progress updates periodically
• Construct 3: Course Evaluation
  – Variables: course material relevant, assignments relevant, workload appropriate to the credit hours

Example 2

• RESEARCH: Students' learning in the subject Introduction to Interactive Multimedia
• What needs to be measured?
  – Level of students' learning ability
• What are the constructs and their variables?

Page 18: Rasch Measurement Model

Students Learning

• Construct: Learning Outcome
  – Variables: Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis

Dicho vs Poly

Rating Scale Model
• Dichotomous
• Polytomous

Dichotomous Data

• Rasch Principles:
  – Person Ability (N correct)
  – Item Facility (difficulty – N correct)
  – Proportion: n correct/N possible
  – Odds of success/failure
    • Natural logarithm of odds – logits (log odds units)

Dichotomous Data: Data Matrix Showing the Odds

                                       Item
Person    Q11    Q1     Q8     Q2     Q7     Q10    Q3     Q5     Q9     Q4     Q6     Ability  n/1-n
R10       1      1      0      1      1      1      1      1      0      1      1      9        82/18
R03       1      1      1      1      1      1      0      0      1      1      0      8        73/27
R05       1      0      1      1      1      1      0      1      1      0      0      7        64/36
R12       1      0      1      1      1      1      0      1      1      0      0      7        64/36
R09       1      1      1      1      1      0      1      0      0      0      0      6        55/45
R06       1      1      1      1      1      0      1      0      0      0      0      6        55/45
R11       1      1      1      0      0      1      0      1      0      0      0      5        45/55
R01       1      1      1      1      1      0      0      0      0      0      0      5        45/55
R07       1      1      1      0      0      1      0      1      0      0      0      5        45/55
R04       1      1      1      0      0      0      1      0      0      0      0      4        36/64
R02       1      1      0      0      0      0      1      0      0      0      0      3        27/73
R08       0      1      1      0      0      0      0      0      0      0      0      2        18/82

Facility  11     10     10     7      7      6      5      5      3      2      1
n/1-n     92/8   83/17  83/17  58/42  58/42  50/50  42/58  42/58  25/75  17/83  8/92
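
The odds in the matrix, and the logits that follow from them, can be recomputed from the raw responses. The sketch below copies the 12 x 11 data matrix from the slide and takes the natural logarithm of the odds, as stated in the Rasch principles above; the helper name `odds_and_logit` is an assumption. These raw log-odds are only first approximations, not the final Rasch estimates.

```python
import math

items = ["Q11", "Q1", "Q8", "Q2", "Q7", "Q10", "Q3", "Q5", "Q9", "Q4", "Q6"]
data = {
    "R10": [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    "R03": [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0],
    "R05": [1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0],
    "R12": [1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0],
    "R09": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    "R06": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    "R11": [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "R01": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "R07": [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "R04": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "R02": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "R08": [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}

def odds_and_logit(n_success, n_total):
    """Return (proportion, natural-log odds) for n_success out of n_total."""
    p = n_success / n_total
    return p, math.log(p / (1 - p))

print("Person abilities")
for person, row in data.items():
    p, logit = odds_and_logit(sum(row), len(items))
    print(f"{person}: {sum(row):2d}  {round(100 * p)}/{round(100 * (1 - p))}  {logit:+.2f} logits")

print("Item facilities")
facility = [sum(row[i] for row in data.values()) for i in range(len(items))]
for item, n in zip(items, facility):
    p, logit = odds_and_logit(n, len(data))
    print(f"{item}: {n:2d}  {round(100 * p)}/{round(100 * (1 - p))}  {logit:+.2f} logits")
```

The item logits printed here describe facility (how easy the item is); reversing the sign gives a first approximation of difficulty.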

Procedure I: Estimate Location

• Iterative process between item and person values:
  1. How difficult are these items?
  2. How able are these persons?
• Iterated until an acceptable variation in location is reached
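
One way to picture this iteration is the alternating Newton-Raphson scheme sketched below (a joint maximum likelihood approach). It is an illustration under several assumptions, not a description of what Bond&Fox Steps does internally: the function names and data are hypothetical, the origin is fixed by centring item difficulties on 0 logits, steps are capped at 1 logit, and no person or item may have a zero or perfect raw score.

```python
import math

def prob(ability, difficulty):
    """Dichotomous Rasch probability of success (standard logistic form)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def estimate_locations(data, max_iter=500, tol=1e-6):
    """Alternate Newton-Raphson updates of item and person locations (logits)."""
    n_persons, n_items = len(data), len(data[0])
    person_scores = [sum(row) for row in data]
    item_scores = [sum(row[i] for row in data) for i in range(n_items)]
    # Start from the log-odds of the raw scores.
    abilities = [math.log(r / (n_items - r)) for r in person_scores]
    difficulties = [math.log((n_persons - s) / s) for s in item_scores]

    for _ in range(max_iter):
        largest_change = 0.0
        # 1. How difficult are these items?
        for i in range(n_items):
            p = [prob(b, difficulties[i]) for b in abilities]
            variance = sum(q * (1 - q) for q in p)
            step = (sum(p) - item_scores[i]) / variance
            step = max(-1.0, min(1.0, step))
            difficulties[i] += step
            largest_change = max(largest_change, abs(step))
        # Fix the origin: centre item difficulties on 0 logits (an assumption).
        mean_d = sum(difficulties) / n_items
        difficulties = [d - mean_d for d in difficulties]
        # 2. How able are these persons?
        for n in range(n_persons):
            p = [prob(abilities[n], d) for d in difficulties]
            variance = sum(q * (1 - q) for q in p)
            step = (person_scores[n] - sum(p)) / variance
            step = max(-1.0, min(1.0, step))
            abilities[n] += step
            largest_change = max(largest_change, abs(step))
        # Stop once the locations have stopped moving.
        if largest_change < tol:
            break
    return abilities, difficulties

# Hypothetical responses: six persons (rows) by five items (columns).
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
abilities, difficulties = estimate_locations(data)
print("abilities:", [round(b, 2) for b in abilities])
print("difficulties:", [round(d, 2) for d in difficulties])
```

Each pass asks the two questions in turn and stops when the largest change in any location falls below the tolerance, which plays the role of the "acceptable variation" mentioned above.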

Procedure II: Fit Statistics

• Item difficulties and person abilities are entered into a matrix
• The Rasch-modeled table of expected probabilities for each cell is calculated

Page 19: Rasch Measurement Model

4/23/2012

19

Procedure II: Fit Statistics 

• Measurement is possible with only onevariable at a time

• Construct validity is the key concept.Th ti l t th t th it i– Theoretical argument that the items in aninstrument measures what it claims to measure

Procedure II: Fit Statistics 

• Fit statistics (misfit statistics) help in controlling the measurement construct

• Fit statistics are based on residuals
– Difference between actual and expected outcome

• Fit statistics are used to control the quality of measures

Procedure II: Fit Statistics 

• Aim is to detect the differences between:
– EXPECTED: The strict measurement requirements of the Rasch Model (Theory)

– ACTUAL: The data collected when the real items are used with real people (Practice)

Procedure II: Fit Statistics 

1. Collect Observed Scores (Xni)
– Must always be whole numbers (0, 1)

2. Calculate the Rasch Expected Response Probabilities (Eni) Based on Item and Person Estimates (e.g. 0.80)
– E = expected response probability when a person with ability n responds to an item with difficulty i

Procedure II: Fit Statistics 

3. Calculate Response Residual Yni = Xni − Eni
– Y = the response residual that remains in the cell for person n × item i when the expected response probability Eni is subtracted from the actual response Xni
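Steps 1 to 3 can be illustrated with a short Python sketch. The expected value 0.80 echoes the example on the slide; the other numbers and the function name `response_residuals` are hypothetical.

```python
def response_residuals(observed, expected):
    """Y = X - E for each cell: observed score minus expected probability."""
    return [x - e for x, e in zip(observed, expected)]


# One person's responses to three items (illustrative values).
observed = [1, 1, 0]            # X: whole numbers, 0 or 1
expected = [0.80, 0.50, 0.20]   # E: Rasch expected probabilities

residuals = response_residuals(observed, expected)
print([round(y, 2) for y in residuals])
```

A correct answer where success was expected leaves a small positive residual; an incorrect answer where success was expected would leave a large negative one.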

Procedure II: Fit Statistics 

• Residuals are squared (to remove negative values) and summed to yield:
– Mean squares of residuals for every item and person
• Low mean squares are too predictable to believe
• High mean squares are too unpredictable to yield measures

• Residuals are often standardized (t or z statistic)
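As a rough sketch, the two usual mean-square statistics can be computed from observed scores and expected probabilities as below. The slides do not spell out the formulas, so this assumes the conventional definitions: outfit is the average squared standardized residual, and infit is the sum of squared residuals divided by the sum of the model variances E(1 − E). The data values are hypothetical.

```python
def fit_mean_squares(observed, expected):
    """Return (infit, outfit) mean squares for one item or one person."""
    variances = [e * (1 - e) for e in expected]            # model variance per cell
    squared = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(s / v for s, v in zip(squared, variances)) / len(observed)
    # Infit: information-weighted mean square.
    infit = sum(squared) / sum(variances)
    return infit, outfit


# Hypothetical responses that follow the expected order very closely.
observed = [1, 1, 0, 0]
expected = [0.8, 0.6, 0.4, 0.2]

infit, outfit = fit_mean_squares(observed, expected)
print(round(infit, 2), round(outfit, 2))
```

Both values come out below 1 for this pattern, which is the "too predictable" direction; values near 1 indicate data about as noisy as the model expects.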


Procedure II: Fit Statistics 

• To identify fitting and misfitting items or persons, check the following criteria:
– Point Measure Correlation: 0.32 < x < 0.8
– Infit / Outfit Mean Square: 0.5 < y < 1.5 OR 0.6 < y < 1.4 (for survey)

– Infit / Outfit Z standard: −2.0 < Z < +2.0
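These cut-offs can be applied mechanically, for example with the small checker below. The function name, its arguments and the sample statistics are hypothetical; the ranges are the ones listed on the slide.

```python
def within_fit_criteria(pt_measure_corr, mean_square, z_standard, survey=False):
    """Check one item's (or person's) statistics against the slide's ranges."""
    low, high = (0.6, 1.4) if survey else (0.5, 1.5)
    return {
        "point_measure_correlation": 0.32 < pt_measure_corr < 0.8,
        "mean_square": low < mean_square < high,
        "z_standard": -2.0 < z_standard < 2.0,
    }


# Hypothetical statistics for a single test item.
checks = within_fit_criteria(pt_measure_corr=0.45, mean_square=1.10, z_standard=0.6)
print(checks, "fits" if all(checks.values()) else "misfit")
```

The function checks one mean square at a time, so it would be called once with the infit values and once with the outfit values.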

Dichotomous Data

• Example of the cognitive developmental test (BLOT) used to outline Rasch measurement: Tutorial 4
– Analysis of the BLOT focuses on the performance of the items rather than the persons

Dichotomous Data

• The basic Rasch questions:
1. Quantity?
2. Precision?
3. Fit?

• Our focus is on:
– Construct validity with a clear direction
– Estimate of ability/development
– Precision of the estimate
– Confidence of estimate
– Probability of success on similar item

Polytomous Data

• Nature of Likert scales:
– Ordered categories – lowest to highest
– Response opportunities – good – neutral – bad
– Odd / even number of categories
– With / without a midpoint category (neutral)

Polytomous Data

• For rating‐scale data:
– Each item has a difficulty estimate
– The scale has a series of thresholds

• Item thresholds estimate the location where a person with the estimated ability has a 50% probability of success/failure on an item at the same location

• 5 categories have 4 thresholds
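To make the threshold idea concrete, the sketch below computes category probabilities for a 5-category item under the usual Andrich rating scale formulation, in which each threshold is the point where two adjacent categories are equally likely. The slides do not give the formula, so the formulation, the function name and the threshold values are assumptions for illustration.

```python
import math


def category_probabilities(ability, item_difficulty, thresholds):
    """Probability of each category 0..m for one person on one item."""
    # Cumulative sums of (ability - difficulty - threshold) per category.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (ability - item_difficulty - tau))
    weights = [math.exp(v) for v in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds: 5 categories need 4 thresholds.
thresholds = [-1.5, -0.5, 0.5, 1.5]

probs = category_probabilities(ability=0.5, item_difficulty=0.0, thresholds=thresholds)
print([round(p, 3) for p in probs])
```

With the person located exactly at the third threshold (0.5 logits above the item), categories 2 and 3 come out equally likely, which is the 50/50 point described above.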


Polytomous Data

• Run Tutorial 6 and interpret output
– Quantity? Precision? Fit?
– Maps / Tables
– Variable Map
– Fit Map
– Item Characteristic Curve (ICC)
– Person / Item Tables

TERIMA KASIH

SYUKRAN JAZILAN

THANK YOU

SYUKRIYA

ARIGATO GOZAIMAS

OOKINI

XIE XIE

KHOP KUN

MERCI

DANKE SCHON

DOR JE

For More, Please Visit: http://www.rasch.org

EFHARISTO

NANDI

DANK U

TODA

HVALA

DAGHANG SALAMAT

GRAZIE

MATUR NUVUN

KOMAPSUMNIDA

MAHALO

STA NA SHUKRIA

Reach Us:
• Mohd Nor @ 019 281 9003
• Prasanna @
• Nurul Hidayah @

SHARING TIME…

How to Apply Simple Rasch in Educational Assessment

Educational Use

• To accurately measure students' ability (knowledge, skill and attitude)

• To verify the reliability of the test question set

• To verify the reliability of students' answers

• To confirm the cognitive levels of questions according to the educational taxonomy

• To quantify students' ability as a percentile

DISTRIBUTE QUESTION, OFFLINE OR ONLINE
::Student answers via question form, offline or online::

(Responses from a minimum of 15 students are recorded for at least a 95% confidence level)

exp(*) / (1 + exp(*))

[Figure labels: Item Measure increasing; Person Measure decreasing; Probability Sc]

COPY PERSON AND ITEM MEASURE INTO TEMPLATE 2 (EA Score)

TEMPLATE 2

1. Logit score − item difficulty score = (*), e.g. EA measure score 1.3 − item difficulty score 0.3 = 1.0

2. Using the formula above, the answer is 0.7311

3. Therefore, the final mark for the student's EA is 73%, equivalent to B+ or a 3.0 GPA
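The three steps can be checked with a few lines of Python. The measures 1.3 and 0.3 are the ones from the slide; the function name `success_probability` is an assumption, and the mapping from percentage to letter grade is left out because the slides do not give the grading table.

```python
import math


def success_probability(person_measure: float, item_difficulty: float) -> float:
    """exp(*) / (1 + exp(*)), where (*) = person measure - item difficulty."""
    diff = person_measure - item_difficulty
    return math.exp(diff) / (1 + math.exp(diff))


# Step 1: (*) = 1.3 - 0.3 = 1.0 logit
p = success_probability(1.3, 0.3)

# Step 2: probability of success
print(f"{p:.4f}")   # prints 0.7311

# Step 3: final mark as a percentage
print(f"{p:.0%}")   # prints 73%
```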


SAVE FILE IN .prn FORMAT
DRAG DATA FILE (.prn) ONTO BOND & FOX STEPS ICON; SAVE AND EXIT TO DATA ANALYSIS

(Reliability of Answers According to the Model)


r(ITEM) = 0.92
r(PERSON) = 0.72

Reliability = (OV − EV) / OV

OV – OBSERVED VARIANCE

EV – ERROR VARIANCE
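The reliability ratio is simple to compute once the two variances are known. In the sketch below the variance values are hypothetical, picked only so that the result matches the person reliability of 0.72 shown above; the slides do not report the underlying variances.

```python
def reliability(observed_variance: float, error_variance: float) -> float:
    """(OV - EV) / OV: share of observed variance not due to measurement error."""
    if observed_variance <= 0:
        raise ValueError("observed variance must be positive")
    return (observed_variance - error_variance) / observed_variance


# Hypothetical variances in squared logits (not from the slides).
r_person = reliability(observed_variance=1.00, error_variance=0.28)
print(round(r_person, 2))
```

The smaller the error variance relative to the observed variance, the closer the reliability gets to 1.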


CLICK PERSON MEASURE (EA Person Measure)

COPY PERSON AND ITEM MEASURE INTO TEMPLATE 2 (EA Score)
