4/23/2012
Rasch Model for Research/Educational Assessment
Course Description
This training is concerned with improving the measurement process in educational assessment using the Rasch Model. Assessments in education involve persons responding to a set of items. This training aims to determine whether the responses to a set of assessment items fit the model.
Course Outcomes
Upon completion of this course, participants will be able to:
• Understand the concepts of psychometrics and the Rasch measurement model
• Develop a reliable test/ questionnaire for measurement
• Analyze data/ responses using the Bond & Fox Steps® software
Target Participants
This course is designed for:
• Lecturers
• Researchers
• Postgraduate Students
• Training Authorities (Educational Admin Staff)
Course Content
• Developing Instruments: Steps; Hands-on
• Rasch: Intro; Key Concepts
• Rating Scale Model: Dichotomous; Polytomous
• Application: Bond & Fox; Hands-on
EVALUATION / ASSESSMENT / TESTING
Key Concept of Measurement
TESTING / MEASUREMENT / GRADING
Key Concept of Measurement
Measurement?
Right Measure? Right Scale?
Right Instrument?
Key Concept of Measurement
Physical Measure?
Psychological Measure?
Psychometric Measure?
Globally Accepted Measures
Length :: meter
Weight :: gram
Time :: second/ minute/ hour
Temperature :: celsius
Electricity :: ampere
INSTRUMENT?
Think Wisely! How to Measure?
How to Measure? Logically…
The attribute of wisdom that is hardest for humans to possess is perhaps possessed only by the wisest humans… likewise, the attribute of beauty that is hardest for humans to possess is possessed only by the most beautiful humans!
Also, logically…
A person who possesses an attribute that all humans possess is considered neither wise/ beautiful nor foolish/ ugly.
And!
A person who lacks an attribute of wisdom that all other humans possess may be considered foolish… likewise, a person who lacks an attribute of beauty that all other humans possess may be considered ugly.
What do you understand by:
EVALUATION: ....................
MEASUREMENT: ....................
TESTING: ....................
Educational Measurement
What Is the Purpose of a Test?
What Are Examples of Test Formats?
What Aspects Do We Test?
• Teaching scheme
• Teaching objectives
• Teaching references
• Teaching assignments
• Teaching content
Why Rasch in OBE?
OUTCOME ASSESSMENT
Examples:
1. EE Survey
2. Test/ Examination
3. Training Effectiveness
4. Placement Test
5. Attitude/ Insaniah (Soft) Skills
Test Instrument Construct
• Questions must be simple
• Clear and directive
• Avoid multi-defined questions
• Various levels
• Parallel with teaching content
Test Instrument Construct
Should consider:
• Validity
• Reliability
• Objectivity
• Utility
• Readability
Example:
1 : STRONGLY DISAGREE
2 : DISAGREE
3 : NOT SURE
4 : AGREE
5 : STRONGLY AGREE
Right Test Right Measure
mean = 4.325
What does it mean?
5/10 NOT THE SAME
Right Test Right Measure
[Figure: Probabilistic Ratio-based Ruler, marked from 1/99 through 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80/20 and 90/10 to 99/1, with 5:5 at the centre]
To know which respondent obtained the highest score, the difficulty level of each item must be known. The Rasch Model (RM) arranges the items from easiest to most difficult. Note that the value 1, referring to a correct answer, sits on the left. Student P is the most able and student S is the least able. If a line is drawn across the matrix (the dashed line), there are many more scores of 1 above the line than below it. A score of 0 by student P above the line is due to the student's carelessness, while a score of 1 by student S below the line is a guessed answer. This is the basic idea of RM.
Right Test Right Measure
Indices
[Figure: powers of ten, 10^-2, 10^-1, 10^0, 10^1, 10^2]
Right Test Right Measure
Log Odds Unit Ruler (LOGIT)
A logit is a unit of measurement computed after the raw data (numbers) are converted into a ratio-based form, which is more accurate for measuring ability.
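The logit is the natural logarithm of the odds of success, ln(p / (1 - p)). The short Python sketch below converts points on the ratio-based ruler into odds and logits; the function names are illustrative only and are not part of Bond & Fox Steps or any Rasch package.

```python
import math


def odds(p: float) -> float:
    """Odds of success p / (1 - p), for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Log odds unit: the natural logarithm of the odds."""
    return math.log(odds(p))


def inverse_logit(x: float) -> float:
    """Convert a logit back to a probability: exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))


# Points on the probabilistic ratio-based ruler, written as success/failure.
for success, failure in [(1, 99), (10, 90), (30, 70), (50, 50), (70, 30), (90, 10), (99, 1)]:
    p = success / (success + failure)
    print(f"{success}/{failure}: odds = {odds(p):.3f}, logit = {logit(p):+.2f}")
```

50/50 lands at 0 logits, and 10/90 and 90/10 land symmetrically at about -2.2 and +2.2 (ln 9 ≈ 2.197): equal steps in logits correspond to equal ratios of odds, which is what makes the ruler linear.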
Right Test Right Measure
CASE
Among Ali, Ah Chong and Ramasamy, who each scored 79%… who is better?
Between 2 respondents whose mean agreement is 3.45 (on a 5-point Likert scale)… which one truly agrees? Or is lying? Or agreed by mistake?
CRITERION-BASED versus NORM-BASED EVALUATION
Right Test Right Measure
Scalogram of Responses/ Answers
Right Test Right Measure
Scalogram of Responses/ Answers: CARELESS, LUCKY GUESS
Right Test Right Measure
PRINCIPLE OF THE PROBABILISTIC MODEL
“a person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one”.
(Rasch, 1960)
PROBABILITY OF SUCCESS: Respondent Ability versus Item Difficulty
Right Test Right Measure
A scientific measurement model must…
USES LINEAR MEASURE
OVERCOMES MISSING DATA
GIVES ESTIMATES OF PRECISION
DETECTS MISFITS OR OUTLIERS
PROVIDES RELIABILITY VALUE
Right Test Right Measure
Dimensions :: Item Numbers
Attitude Towards Energy Conservation (EC) :: 1, 11, 21, 31, 41, 50
Attitude Towards Mobility and Transportation (MT) :: 2, 12, 22, 32
Attitude Towards Waste Avoidance (WA) :: 3, 13, 23, 33, 42, 51, 52, 53
Attitude Towards Recycling (R) :: 4, 14, 24, 34, 43, 54
Attitude Towards Consumerism (C) :: 5, 15, 25, 35, 44, 55, 56, 57
Attitude Towards Environmental Conservation (VB) :: 6, 16, 26, 36, 45, 58, 59, 60
Attitude Towards Flora and Fauna (EFF) :: 7, 17, 27, 37, 46
Attitude Towards Water and Air (EWA) :: 8, 18, 28, 38, 47
Attitude Towards Human Being (EHB) :: 9, 19, 29, 39, 48
Attitude Towards Metaphysical Entities (EME) :: 10, 20, 30, 40, 49
After a review analysis, ten main dimensions were identified, with 60 items measuring students' environmental attitudes.
Right Test Right Measure
Dimension :: Item/ Attribute :: Weightage
D1/CO1/CH1 :: A1, A2, A3 :: 30%
D2/CO2/CH2 :: A1, A2, A3, A4 :: 40%
D3/CO3/CH3 :: A1, A2, A3 :: 30%
Tips for Instrument Development
Do keep the questionnaire brief and concise. Some questionnaires give the impression that their authors tried to think of every conceivable question that might be asked about the general topic of concern. The result is a very long questionnaire that annoys and frustrates respondents, leading to non-return of mailed questionnaires and to incomplete or inaccurate responses on questionnaires administered directly. This is where Rasch becomes very handy, since it copes with small samples and missing data.
Do make every effort to write as few questions as possible to obtain the information you need. Peripheral questions, and ones meant to find out "something that might just be nice to know", must be avoided. A clear-cut need for every question should be established. Experience tells us that many items in a long questionnaire end up non-functional.
Do check face validity and run a pilot test; it certainly helps. Get feedback on your initial list of questions. Feedback may be obtained from a small but representative sample.
Do choose appropriate response category language and logic. The extent to which responders agree with a statement can be assessed adequately in many cases by the dichotomous options: 1) Disagree 2) Agree
Do order categories. When response categories represent a progression between a lower level of response and a higher one, it is usually better to list the polytomous categories in order from the lower level to the higher, left to right, for example,
1) Never 2) Seldom 3) Occasionally 4) Frequently
Do ask responders to rate both positive and negative stimuli. There is sometimes a difficulty when responders are asked to rate items for which the general level of approval is high: they tend to mark every item at the same end of the scale. By offering both positive and negative statements, the respondent is required to evaluate each one rather than uniformly agreeing or disagreeing with all of them. Try to put yourself in their shoes; Rasch sees every attempt as beginning with a 50:50 scenario.
These options have the advantage of allowing the expression of some uncertainty. In contrast, the following options would be undesirable in most cases:
1) Strongly disagree 2) Disagree 3) Agree 4) Strongly agree
Some would say that "Strongly agree" is redundant or at best a colloquialism. In addition, there is no comfortable resting place for those with some uncertainty.
Avoid open-ended questions. In most cases open-ended questions should be avoided because respondents vary in their willingness and ability to respond in writing.
Avoid the response option "Other." Careless responders will overlook the option they should have designated and conveniently mark "Other", or will be hairsplitters and reject an option for some trivial reason.
Avoid category proliferation. A typical question is the following:
Marital status: 1) Single (never married) 2) Married 3) Widowed 4) Divorced 5) Separated
Avoid scale point proliferation. In contrast to category proliferation, which usually seems to arise somewhat naturally, scale point proliferation takes some thought and effort. An example is:
1) Never 2) Rarely 3) Occasionally 4) Fairly often 5) Often 6) Very often 7) Almost always 8) Always
Such stimuli run the risk of annoying or confusing the responder with hairsplitting differences between the response levels. Psychometric research has shown that most subjects cannot reliably distinguish more than six or seven levels of response. Offering four to five scale points is usually quite sufficient to obtain a reasonably reliable indication of response direction. Rasch analysis has shown that the sensitivity of a measurement is not lost when fewer rating points are used. Based on its statistical analysis, Rasch can recommend whether a rating category should be collapsed or expanded.
Avoid responses at the scale mid-point and neutral responses. The use of neutral response positions had a basis in the past, when crude computational methods were unable to cope with missing data.
Avoid asking responders to rank responses. Responders cannot reasonably be expected to rank more than about six things at a time, and many of them misinterpret directions or make mistakes in responding. To help alleviate this latter problem, ranking questions may be framed as follows: Following are three colors for office walls: 1) Beige 2) Ivory 3) Light green
Which color do you like best? _____
Which color do you like second best? _____
Which color do you like least? _____
By carefully evaluating the needs of every question used in an instrument and carefully wording the responses, you will collect information which will yield more satisfactory and meaningful results.
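When a Rasch analysis suggests collapsing rating categories, the data have to be recoded before the analysis is re-run. The Python sketch below applies an order-preserving recode; the mapping `COLLAPSE_1_2` (merging categories 1 and 2 of a 5-point scale) is a hypothetical example, not a recommendation from any particular analysis.

```python
from typing import Dict, List, Optional

# Hypothetical recode: merge categories 1 and 2 of a 5-point scale, keep the order.
COLLAPSE_1_2: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4}


def recode(responses: List[Optional[int]], mapping: Dict[int, int]) -> List[Optional[int]]:
    """Apply an order-preserving category mapping; missing answers (None) stay missing."""
    if any(mapping[a] > mapping[b] for a in mapping for b in mapping if a < b):
        raise ValueError("mapping must not reverse the category order")
    return [None if r is None else mapping[r] for r in responses]


print(recode([1, 2, 3, None, 5, 4], COLLAPSE_1_2))
```

The order check guards against accidentally scrambling the scale; only merging of adjacent categories keeps the rating scale meaningful.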
Why Rasch!
Rasch Key Concepts
• Intro
• Key Concepts
Measurement
• Measurement is the process of constructing lines and locating individuals on lines
(Wright & Stone, 1979)
• Measurement is the location of objects along a single dimension on the basis of observations which add together
(Bond & Fox, 2007)
Types of Scales
Nominal, Ordinal, Interval, Ratio
Rasch Measurement Model
• The Rasch model can be applied to measure latent traits (e.g., ability or attitude) in various disciplines
• Rasch model estimates of ability / attitude / difficulty become data for statistical analysis
Rasch Measurement Model
• Latent traits are usually assessed through the responses of a sample of persons to a set of items with
• two response categories, or
• more than two response categories
Rasch Measurement Model
• The Rasch Model belongs to the item response theory (IRT) models
• It models the probability of an individual's response to an item
• The probability of a correct/keyed response to an item is a mathematical function of a person parameter (ability) and an item parameter (difficulty)
IRT Models
Rasch Measurement Model
• Rasch analysis gives maximum likelihood estimates (MLE) of person ability and item difficulty from the observed outcomes
• Rasch reads the pattern of responses and is thus predictive in nature, an ability that resolves the problem of missing data
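To illustrate how a maximum likelihood estimate copes with missing responses, here is a minimal Python sketch that estimates one person's ability by Newton-Raphson, assuming the item difficulties are already calibrated in logits. Missing answers are simply left out of the likelihood. It illustrates the idea only; it is not the estimation routine inside Bond & Fox Steps, and all names are hypothetical.

```python
import math
from typing import Optional, Sequence


def estimate_ability(responses: Sequence[Optional[int]],
                     difficulties: Sequence[float],
                     tol: float = 1e-6,
                     max_iter: int = 100) -> float:
    """Maximum likelihood person ability (logits) under the dichotomous Rasch model.

    responses: 1 = correct, 0 = incorrect, None = missing (left out of the likelihood).
    difficulties: item difficulties in logits, assumed already calibrated.
    """
    observed = [(x, d) for x, d in zip(responses, difficulties) if x is not None]
    score = sum(x for x, _ in observed)
    if score == 0 or score == len(observed):
        raise ValueError("zero or perfect score: no finite maximum likelihood estimate")
    theta = math.log(score / (len(observed) - score))  # start from the raw-score log odds
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(d - theta)) for _, d in observed]
        expected = sum(probs)                           # model-expected raw score
        information = sum(p * (1.0 - p) for p in probs)
        step = (score - expected) / information         # Newton-Raphson step
        step = max(-1.0, min(1.0, step))                # damp very large jumps
        theta += step
        if abs(step) < tol:
            break
    return theta


difficulties = [-1.0, 0.0, 1.0]                        # hypothetical item calibrations
print(estimate_ability([1, 0, 0], difficulties))
print(estimate_ability([1, 0, None], difficulties))    # third answer missing
```

The estimate solves "observed score = expected score" over whichever items were actually answered, so a missing response changes the precision of the estimate but does not bias it toward 0.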
Rasch Measurement Model
• The relationship between the probability of success on an item and the latent trait is described by a function called the item characteristic curve (ICC), which takes an S-shape
Rasch Measurement Model
• The ICC shows pictorially the fit of the data to the model
Rasch Measurement Model
Item characteristic curve showing the relationship between the location on the latent trait and the probability of answering the item correctly.
Rasch Measurement Model
• The psychometric Rasch model conceptualizes the measurement scale like a ruler
• Items are located along the measurement scale according to their difficulty
[Figure: ability ruler running from less difficult items (-) to more difficult items (+)]
Rasch Measurement Model
• Persons can also be located on the same measurement scale
• They are located according to their ability
• The less difficult items can be successfully achieved by the more able subjects
[Figure: persons A, B and C located on the scale between "less able" and "more able"]
Rasch Measurement Model
• A person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one
(Rasch, 1960, p. 117)
Rasch Measurement Model
• An event outcome is seen as a chance, a likelihood of happening, hence ratio data (Stevens, 1946)
[Figure: ratio ruler (1/99, 10/90, 30/70, 50/50, 60/40, 99/1; 10^-2, 10^0, 10^2) converted via exp/logit into a linear scale (-2, -1, 0, 1, 2)]
SCALE with a unit termed "logit"
Rasch Measurement Model
• Interval scales have known and equal intervals between two graduations
– numbers tell us how much more of the attribute of interest is present
– scales are linear and quantitative
Rasch Measurement Model
• The probability of endorsing any response category of an item depends solely on the person's ability and the item's difficulty
– This requirement is called unidimensionality
Rasch Model Key Question
• When a person with this ability (number of test items correct) encounters an item of this difficulty (number of persons who succeeded on the item), what is the likelihood that this person gets this item correct?
Rasch Model Key Answer
• The probability of success depends on the difference between the ability of the person and the difficulty of the item
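In the dichotomous Rasch model this dependence is P(success) = exp(B - D) / (1 + exp(B - D)), with ability B and difficulty D both in logits (the same exp(*) / (1 + exp(*)) form used later in the EA template). A minimal Python sketch; the function name is illustrative.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(success) = exp(B - D) / (1 + exp(B - D)).

    Written as 1 / (1 + exp(D - B)), which is algebraically identical
    and avoids overflow for large B - D.
    """
    return 1.0 / (1.0 + math.exp(difficulty - ability))


# One item (D = 0.0) met by persons of increasing ability: the S-shaped ICC.
for b in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"B - D = {b:+.1f} logits -> P(success) = {rasch_probability(b, 0.0):.3f}")
```

Only the difference B - D matters: when ability equals difficulty the probability is exactly 0.5, and one logit above the item gives about 0.73.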
Rasch Measurement Model Theorem
• Theorem 1
– Persons who are more able have a greater likelihood of correctly answering all the items (dichotomous response)
– Persons who are more developed (higher agreeability level) have a greater likelihood of endorsing all the items (polytomous response)
Rasch Measurement Model Theorem
• Theorem 2
– Easier items are more likely to be answered correctly by all persons (dichotomous response)
– Easier tasks are more likely to be endorsed by all persons (polytomous response)
Rasch Measurement Model
• In estimating the probability of success, the prediction is expressed in terms of chances / odds / probabilities.
• The data matrix (e.g., the result of theory-driven qualitative observation) can be arranged so that the items are ordered from least to most difficult and the persons are ordered from least to most able.
Matrix of Abilities vs Difficulties
[Figure: grid of Ability against Difficulty, both axes marked 0/100, 10/90, 30/70, 50/50, 70/30, 90/10, 100/0]
Rasch Measurement Model
• This organized data table is termed a Scalogram (Guttman, 1944). The higher up the table one goes, the more able the person
• The further right across the table one goes, the more difficult the item
• Calculate item difficulties and person abilities (n/N: the item or person raw score divided by the total possible score)
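The arrangement described above can be sketched in Python: sort persons by raw score (most able first), sort items by the number of correct answers (easiest first), and report n/N for both. The response data and names below are hypothetical.

```python
from typing import Dict, List, Tuple


def scalogram(data: Dict[str, List[int]],
              item_names: List[str]) -> Tuple[List[str], List[str], Dict[str, float], Dict[str, float]]:
    """Arrange a 0/1 response matrix as a Guttman scalogram.

    Persons are ordered from most to least able (highest raw score first) and
    items from easiest to most difficult (most correct answers first); ties
    keep their original order. Also returns the proportions n/N.
    """
    n_items = len(item_names)
    item_totals = [sum(row[i] for row in data.values()) for i in range(n_items)]
    persons = sorted(data, key=lambda p: sum(data[p]), reverse=True)
    item_order = sorted(range(n_items), key=lambda i: item_totals[i], reverse=True)
    person_props = {p: sum(data[p]) / n_items for p in persons}
    item_props = {item_names[i]: item_totals[i] / len(data) for i in item_order}
    return persons, [item_names[i] for i in item_order], person_props, item_props


responses = {                      # hypothetical answers to items Q1..Q4
    "A": [1, 0, 1, 0],
    "B": [1, 1, 1, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, 0],
}
items = ["Q1", "Q2", "Q3", "Q4"]
persons, ordered_items, person_props, item_props = scalogram(responses, items)
print("   ", " ".join(ordered_items))
for p in persons:
    row = [responses[p][items.index(i)] for i in ordered_items]
    print(p, "  ", "   ".join(map(str, row)), f"  n/N = {person_props[p]:.2f}")
```

After sorting, person C shows a 1 on the hardest item next to 0s on easier ones: the kind of cell the dashed-line reading treats as a guess.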
Scalogram
EASY ITEMS/TO ENDORSE (left) ... DIFFICULT ITEMS/TO ENDORSE (right)
MORE ABLE
11111011111111111 11111111111111111 11111111111100 = 48
11111111111111111 11111111111111111 11111001001000 = 43
11111111111111111 11111111111100100 01100010000000 = 33
11111111111111111 11111111111010100 00110100000000 = 33
11110111111111111 10111111101001100 00110100101100 = 33
11111111111111111 11110111001000100 01000000000000 = 27
11111111111111101 11011011001000100 00000000001000 = 25
LESS ABLE
Scalogram
• The pattern of success or failure can be seen in the data matrix
• A person who scores in the old-fashioned way (successful on easy questions and unsuccessful on difficult questions, response pattern 111111000000) is said to be too good to be true
Scalogram
• A person who scores well on difficult items despite a low overall score
– might have guessed or cheated on the item
• A person who scores poorly on easy items despite a high overall score
– might indicate a lack of concentration or guessing on the item
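These two cases can be screened for by comparing each person's row with the ideal Guttman pattern for the same raw score. The Python sketch below is a rough heuristic under that assumption (items already ordered from easy to difficult); it is not a Rasch fit statistic, and the function name is hypothetical.

```python
from typing import List, Tuple


def unexpected_responses(row: List[int]) -> Tuple[List[int], List[int]]:
    """Compare one person's 0/1 row (items already ordered easy -> difficult)
    with the ideal Guttman pattern for the same raw score.

    Returns two lists of 0-based item positions:
      careless - 0s among the easiest `score` items,
      lucky    - 1s on the harder items beyond them.
    """
    score = sum(row)
    careless = [i for i in range(score) if row[i] == 0]
    lucky = [i for i in range(score, len(row)) if row[i] == 1]
    return careless, lucky


print(unexpected_responses([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]))  # -> ([], [])
print(unexpected_responses([1, 1, 0, 1, 0, 1, 1, 0, 0]))           # -> ([2, 4], [5, 6])
```

The two lists always have the same length, since every unexpected 0 among the easy items is balanced by an unexpected 1 among the hard ones; a row with many flags is the erratic kind the next slide warns about.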
Scalogram
• If the person's response pattern is unpredictable or erratic (110101100), it is difficult to interpret the success or failure (person ability).
• It is impossible to predict the likelihood of a student's performance (well / poorly) on the whole test just by looking at an item that is so erratic.
– Investigate the item further.
Key Rasch Measurement Concepts
• Quantity: Estimates of item and person location
• Precision: Standard Error (SE) of Measurement
• Quality: Fit Statistics
The Basic Rasch Questions
• What are the distances between the locations? (Estimates of item and person locations)
• How precise are those locations? (SE of measurement)
• Are those locations all equally valuable? (Fit Statistics)
Estimate: Person / Item Location
[Figure: location of a person on a scale running from Less Difficult/Easy items and Less Able persons to More Difficult items and More Able persons]
Estimate: Person / Item Location
• Rasch analysis was performed by setting the mean of the persons as the starting point (0 logits) for the calibration
Estimate: Person / Item Location
• In applying the Rasch model, item locations are often scaled first
• The location of an item on a scale corresponds with the person location at which there is a 0.5 probability of a correct response to the question
Estimate: Person / Item Location
• The probability of a person responding correctly to / endorsing a question with difficulty lower than that person's location is greater than 0.5
• The probability of responding correctly to / endorsing a question with difficulty greater than the person's location is less than 0.5.
Separation Statistics
• Items are located by the number of persons getting a specific item correct / endorsing a specific item
• Persons are located by the number of items they are able to answer correctly / endorse
• It is necessary to locate persons and items along the variable line with sufficient precision to "see" between them
Separation Statistics
• The item and person separation statistics in Rasch measurement provide an analytical tool by which to evaluate the successful development of a variable and with which to monitor its continuing utility
Separation Statistics
• Person separation indicates how efficiently a set of items is able to separate the persons measured:
– Separation that is too wide usually signifies gaps among person abilities
• This leads to imprecise measurement
– Separation that is too narrow signifies that there is not enough differentiation among person abilities to distinguish between them
Separation Statistics
• Item separation indicates how well a sample of people is able to separate the items used in the test:
– Separation that is too wide usually signifies gaps among item difficulties
• This leads to imprecise measurement
– Separation that is too narrow signifies redundancy among the test items
Separation Statistics
• Separation statistics are expressed as reliabilities
– ranging from 0.0 to 1.0
• The higher the value, the better the separation that exists and the more precise the measurement
SE of Measurement
• Standard Error
– The standard deviation of the sampling distribution associated with a method of measurement or estimation
– Also refers to an estimate of that standard deviation, derived from the particular sample used to compute the estimate
SE of Measurement
• The sample mean is the usual estimator of a population mean
• However, different samples drawn from that same population would in general have different values of the sample mean
SE of Measurement
• The standard error of the mean
– is the standard deviation of the sample mean as an estimate of a population mean
– is estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size
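The definition above is a one-line computation. A minimal Python sketch using the standard library; the scores are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample) -> float:
    """Sample standard deviation divided by the square root of the sample size."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(len(sample))


scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical test scores
print(statistics.mean(scores), round(standard_error_of_mean(scores), 3))
```

Because of the square root, quadrupling the sample size roughly halves the standard error.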
SE of Measurement
• The standard error of the mean can refer to an estimate of that standard deviation, computed from the sample of data being analyzed at the time
SE of Measurement
• In practical applications, the true value of the standard deviation (of the error) is usually unknown
• As a result, the term standard error is often used to refer to an estimate of this unknown quantity
– the standard error is only an estimate
SE of Measurement
• In other cases, the standard error may usefully be used to provide an indication of the size of the uncertainty
• But using the standard error to provide confidence intervals or tests should be avoided unless the sample size is at least moderately large
– Here "large enough" would depend on the particular quantities being analyzed
SE of Measurement
• The t-distribution is used to provide a confidence interval for an estimated mean or a difference of means
Standard Deviation & Confidence Intervals
SE of Measurement
• For a value that is sampled with an unbiased, normally distributed error, about 68%, 95% and 99.7% of samples fall within 1, 2 and 3 standard deviations above and below the actual value
SE of Measurement
• The standard error of a measure captures its precision in a particular context
• The accuracy of a measure is captured by fit statistics
• A measure may be accurate but imprecise
SE of Measurement
• Raw scores are almost always reported without their standard errors
• The highest possible precision for any measure is that obtained when every other measure is known and the data fit the Rasch model
SE of Measurement
• This standard error is called the "model" standard error and is reported by most production-oriented Rasch software
• For well-constructed tests with clean data (as confirmed by the fit statistics), the model standard error is usefully close to, but slightly smaller than, the actual standard error
SE of Measurement
• The stability of an item calibration is its modelled standard error
• A two-tailed 99% confidence interval is ±2.6 S.E. wide
• A two-tailed 95% confidence interval is ±1.96 S.E. wide
• A two-tailed 68% confidence interval is ±1.00 S.E. wide
SE of Measurement
• Suppose we want to be 99% confident that the "true" item difficulty is within 1 logit of its reported estimate
– i.e., the sample size needed to have 99% confidence that no item calibration is more than 1 logit away from its stable value
SE of Measurement
• Then the estimate needs a standard error of 1.0 logit divided by 2.6, i.e. 1/2.6 = 0.385 logits or less
• Stability to within ±0.3 logits is the best that can be expected for most variables when 99% confidence is required
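The multipliers and the 1/2.6 arithmetic above can be reproduced with the standard normal distribution. A minimal Python sketch using `statistics.NormalDist` (Python 3.8+); the function names are illustrative.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-tailed z multiplier for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def required_se(tolerance_logits: float, level: float) -> float:
    """Largest standard error that keeps an estimate within +/- tolerance at this confidence."""
    return tolerance_logits / z_for_confidence(level)


for level in (0.68, 0.95, 0.99):
    print(f"{level:.0%}: +/- {z_for_confidence(level):.2f} S.E.")
print(f"SE needed for +/- 1 logit at 99%: {required_se(1.0, 0.99):.3f} logits")
```

The exact 99% multiplier is about 2.576, so the required SE comes out near 0.388 logits; rounding the multiplier up to 2.6 gives the slightly stricter 0.385.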
Sample Size
Item calibrations stable within :: Confidence :: Minimum sample size range (best to poor targeting) :: Size for most purposes
± 1 logit :: 95% :: 16–36 :: 30
± 1 logit :: 99% :: 27–61 :: 50
± ½ logit :: 95% :: 64–144 :: 100
± ½ logit :: 99% :: 108–243 :: 150
Definitive or High Stakes :: 99%+ (Items) :: 250 to 20*test length :: 250
(John Michael Linacre)
Sample Size
• A sample of 50 well-targeted respondents is conservative for obtaining useful, stable estimates
• 30 respondents are enough for well-designed pilot studies
Fit Statistics
• To aid in measurement quality control:
• to identify those parts of the data which meet Rasch model specifications and those parts which don't
Fit Statistics
[Figure: an item that fits the model vs. an item that does not fit the model]
Fit Statistics
• Parts that don't are not automatically rejected, but are examined to identify in what way, and why, they fall short, and whether, on balance, they contribute to or corrupt measurement
• Then the decision is made to accept, reject or modify the data
Fit Statistics
• Modification includes simple actions such as correcting obvious data entry errors and respondent mistakes, and more sophisticated actions such as collapsing rating scale categories.
Developing Instruments
• Steps
• Hands-on
Developing Instruments
• To find out about the characteristics (latent traits) of people
• Latent traits: a characteristic or attribute of a person that can be inferred from observation of the person's behaviours.
• Measurement here is to measure the latent traits of people
Developing a Questionnaire
• Identify research that studies the same construct
• Define the construct
• Define the target population
• Review related measures
• Develop a draft
• Evaluate the draft
• Revise the test
• Collect data on reliability and validity
(Gall, Gall & Borg, 2003)
Example 1
• RESEARCH: Students' Satisfaction in a Distance Learning Course
• What needs to be measured?
• Student Satisfaction
– What are the constructs and their variables?
DIMENSION = CONSTRUCT = CHAPTER
ITEM = VARIABLES = SUBCHAPTER
Student’s Satisfaction in DL Course
• Construct 1: Instructor Performance
– Variables: availability, knowledge of subject, fair treatment, respects students, encourages questions, presents information clearly
• Construct 2: Student-Instructor Interaction
– Variables: encourages students to be actively involved, provides feedback on work, reports progress periodically
• Construct 3: Course Evaluation
– Variables: course material relevant, assignments relevant, workload appropriate to credit hours
Example 2
• RESEARCH: Students' learning in the subject Introduction to Interactive Multimedia
• What needs to be measured?
• Level of students' learning ability
– What are the constructs and their variables?
Students' Learning
• Construct: Learning Outcome
– Variables: Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis
Dichotomous vs Polytomous
Rating Scale Model
• Dichotomous
• Polytomous
Dichotomous Data
• Rasch Principles:
– Person Ability (N correct)
– Item Facility (difficulty: N correct)
– Proportion: n correct / N possible
– Odds of success/failure
• Natural logarithm of odds: logits (log odds units)
Dichotomous Data: Data Matrix Showing the Odds
Person   Q11 Q1  Q8  Q2  Q7  Q10 Q3  Q5  Q9  Q4  Q6  Ability  n/1-n
R10      1   1   0   1   1   1   1   1   0   1   1   9        82/18
R03      1   1   1   1   1   1   0   0   1   1   0   8        73/27
R05      1   0   1   1   1   1   0   1   1   0   0   7        64/36
R12      1   0   1   1   1   1   0   1   1   0   0   7        64/36
R09      1   1   1   1   1   0   1   0   0   0   0   6        55/45
R06      1   1   1   1   1   0   1   0   0   0   0   6        55/45
R11      1   1   1   0   0   1   0   1   0   0   0   5        45/55
R01      1   1   1   1   1   0   0   0   0   0   0   5        45/55
R07      1   1   1   0   0   1   0   1   0   0   0   5        45/55
R04      1   1   1   0   0   0   1   0   0   0   0   4        36/64
R02      1   1   0   0   0   0   1   0   0   0   0   3        27/73
R08      0   1   1   0   0   0   0   0   0   0   0   2        18/82
Facility 11  10  10  7   7   6   5   5   3   2   1
Item odds n/1-n: Q11 92/8, Q1 83/17, Q8 83/17, Q2 58/42, Q7 58/42, Q10 50/50, Q3 42/58, Q5 42/58, Q9 25/75, Q4 17/83, Q6 8/92
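The odds in this matrix can be recomputed from the raw 0/1 data. The Python sketch below reproduces the ability and facility columns and adds the raw-score logit ln(n / (N - n)), which is only a starting value that Rasch software then refines; `odds_label` is a hypothetical helper name.

```python
import math

ITEMS = ["Q11", "Q1", "Q8", "Q2", "Q7", "Q10", "Q3", "Q5", "Q9", "Q4", "Q6"]
MATRIX = {                       # person -> responses in the item order above
    "R10": "11011111011", "R03": "11111100110", "R05": "10111101100",
    "R12": "10111101100", "R09": "11111010000", "R06": "11111010000",
    "R11": "11100101000", "R01": "11111000000", "R07": "11100101000",
    "R04": "11100010000", "R02": "11000010000", "R08": "01100000000",
}
persons = {name: [int(c) for c in row] for name, row in MATRIX.items()}


def odds_label(successes: int, total: int) -> str:
    """Rounded success/failure percentages, e.g. 9 of 11 -> '82/18'."""
    pct = round(100 * successes / total)
    return f"{pct}/{100 - pct}"


n_items, n_persons = len(ITEMS), len(persons)
for name, row in persons.items():
    r = sum(row)
    print(f"{name}  ability {r:2d}  {odds_label(r, n_items):>6}  logit {math.log(r / (n_items - r)):+.2f}")
for j, item in enumerate(ITEMS):
    s = sum(row[j] for row in persons.values())
    print(f"{item:>3}  facility {s:2d}  {odds_label(s, n_persons)}")
```

For Q11, 11 of 12 is about 91.7%, which rounds to 92/8.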
Procedure I: Estimate Location
• Iterative process between item and person values:
1. How difficult are these items?
2. How able are these persons?
• Iterated until an acceptable variation in location is reached
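Procedure I can be sketched as a simplified joint maximum likelihood (JMLE-style) loop: item difficulties and person abilities are updated in turn with Newton-Raphson steps until the largest change is tiny. This is an illustrative assumption about how such an iteration can work, not the algorithm inside Bond & Fox Steps; it omits any bias correction and assumes no extreme (all-0 or all-1) scores. The data are the 12 × 11 matrix from the dichotomous example.

```python
import math

# Rows of the dichotomous example (persons R10 ... R08;
# items in the order Q11 Q1 Q8 Q2 Q7 Q10 Q3 Q5 Q9 Q4 Q6).
ROWS = ["11011111011", "11111100110", "10111101100", "10111101100",
        "11111010000", "11111010000", "11100101000", "11111000000",
        "11100101000", "11100010000", "11000010000", "01100000000"]
X = [[int(c) for c in row] for row in ROWS]


def prob(b: float, d: float) -> float:
    """Dichotomous Rasch probability of success for ability b and difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))


def newton_step(observed: int, probs) -> float:
    """Damped Newton-Raphson step: (observed - expected score) / model variance."""
    step = (observed - sum(probs)) / sum(q * (1.0 - q) for q in probs)
    return max(-1.0, min(1.0, step))


def jmle(x, tol=1e-5, max_iter=500):
    """Alternate item and person updates until the largest change is below tol.

    Item difficulties are re-centred to a mean of 0 logits after each pass.
    """
    n_persons, n_items = len(x), len(x[0])
    r = [sum(row) for row in x]                                           # person raw scores
    s = [sum(x[n][i] for n in range(n_persons)) for i in range(n_items)]  # item raw scores
    b = [math.log(rn / (n_items - rn)) for rn in r]     # start: log odds of success
    d = [math.log((n_persons - si) / si) for si in s]   # start: log odds of failure
    for _ in range(max_iter):
        change = 0.0
        for i in range(n_items):                        # 1. How difficult are these items?
            step = newton_step(s[i], [prob(bn, d[i]) for bn in b])
            d[i] -= step        # more successes than expected -> the item is easier
            change = max(change, abs(step))
        for n in range(n_persons):                      # 2. How able are these persons?
            step = newton_step(r[n], [prob(b[n], di) for di in d])
            b[n] += step
            change = max(change, abs(step))
        shift = sum(d) / n_items
        d = [di - shift for di in d]
        b = [bn - shift for bn in b]                    # keep every B - D unchanged
        if change < tol:
            break
    return b, d


abilities, difficulties = jmle(X)
print([round(v, 2) for v in difficulties])
print([round(v, 2) for v in abilities])
```

Items with the same raw score (Q1 and Q8) receive identical difficulties, and persons with the same raw score receive identical abilities: with complete data the raw score carries all the information the model uses.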
Procedure II: Fit Statistics
• Item difficulties and person abilities are entered into a matrix
• The Rasch-modelled table of expected probabilities for each cell is calculated
Procedure II: Fit Statistics
• Measurement is possible with only one variable at a time
• Construct validity is the key concept
– The theoretical argument that the items in an instrument measure what the instrument claims to measure
Procedure II: Fit Statistics
• Fit statistics (misfit statistics) help in controlling the measurement construct
• Fit statistics are based on residuals
– the difference between the actual and the expected outcome
• Fit statistics are used to control the quality of measures
Procedure II: Fit Statistics
• The aim is to detect the differences between:
– EXPECTED: the strict measurement requirements of the Rasch Model (theory)
– ACTUAL: the data collected when real items are used with real people (practice)
Procedure II: Fit Statistics
1. Collect Observed Scores (Xni)
– Must always be whole numbers (0, 1)
2. Calculate the Rasch Expected Response Probabilities (Eni) Based on Item and Person Estimates (e.g., 0.80)
– Eni = the expected response probability when person n, with a given ability, responds to item i, with a given difficulty
Procedure II: Fit Statistics
3. Calculate the Response Residual Yni = Xni - Eni
– Yni = the response residual that remains in the cell for person n × item i when the expected response probability Eni is subtracted from the actual response Xni
Procedure II: Fit Statistics
• Residuals are squared (to remove negative values) and summed to yield:
– Mean squares of residuals for every item and person
• Low mean squares are too predictable to believe
• High mean squares are too unpredictable to yield measures
• Residuals are often standardized (t or z statistic)
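Steps 1 to 3 lead to the usual mean-square fit statistics. In the Python sketch below, outfit is the plain mean of squared standardized residuals, Yni² / (Eni(1 - Eni)), and infit is the information-weighted version, ΣYni² / ΣEni(1 - Eni). The person abilities and item difficulty are hypothetical, and the further conversion to z-standardized (ZSTD) values is not shown.

```python
import math
from typing import List, Tuple


def mean_square_fit(observed: List[int], expected: List[float]) -> Tuple[float, float]:
    """Outfit and infit mean squares for one item (or one person), dichotomous data.

    observed: Xni values (0/1); expected: Eni, the Rasch probabilities of success.
    """
    residuals = [x - e for x, e in zip(observed, expected)]         # Yni = Xni - Eni
    variances = [e * (1.0 - e) for e in expected]                   # model variance per cell
    z_squared = [y * y / w for y, w in zip(residuals, variances)]   # squared standardized residuals
    outfit = sum(z_squared) / len(z_squared)                        # unweighted mean square
    infit = sum(y * y for y in residuals) / sum(variances)          # information-weighted mean square
    return outfit, infit


abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical person measures (logits)
difficulty = 0.0                          # hypothetical item difficulty (logits)
expected = [1.0 / (1.0 + math.exp(difficulty - b)) for b in abilities]
print(mean_square_fit([0, 0, 1, 1, 1], expected))   # orderly pattern: mean squares below 1
print(mean_square_fit([1, 1, 0, 0, 0], expected))   # reversed pattern: mean squares well above 1
```

The orderly pattern is "too predictable" (overfit) and the reversed pattern is "too unpredictable" (misfit), matching the reading of low and high mean squares above.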
Procedure II: Fit Statistics
• To identify fitting and misfitting items or persons, the following criteria must be satisfied:
– Point Measure Correlation: 0.32 < x < 0.8
– Infit / Outfit Mean Square: 0.5 < y < 1.5, OR 0.6 < y < 1.4 (for surveys)
– Infit / Outfit Z standard: -2.0 < Z < +2.0
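The criteria can be applied mechanically to the rows of a fit table. The Python sketch below reads the slide literally (every criterion must hold); the `FitRow` field names and the example values are hypothetical, not the column names of any program's output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FitRow:
    """One row of an item (or person) fit table; field names are illustrative."""
    name: str
    pt_measure_corr: float
    infit_mnsq: float
    outfit_mnsq: float
    infit_zstd: float
    outfit_zstd: float


def misfit_reasons(row: FitRow, survey: bool = False) -> List[str]:
    """Return the criteria this row fails (an empty list means it fits)."""
    low, high = (0.6, 1.4) if survey else (0.5, 1.5)
    reasons = []
    if not 0.32 < row.pt_measure_corr < 0.8:
        reasons.append("point-measure correlation outside 0.32-0.8")
    for label, value in (("infit MNSQ", row.infit_mnsq), ("outfit MNSQ", row.outfit_mnsq)):
        if not low < value < high:
            reasons.append(f"{label} outside {low}-{high}")
    for label, value in (("infit ZSTD", row.infit_zstd), ("outfit ZSTD", row.outfit_zstd)):
        if not -2.0 < value < 2.0:
            reasons.append(f"{label} outside -2.0 to +2.0")
    return reasons


rows = [FitRow("Q3", 0.45, 1.02, 0.97, 0.2, -0.1),   # hypothetical values
        FitRow("Q6", 0.12, 1.62, 2.10, 2.4, 2.9)]
for r in rows:
    print(r.name, misfit_reasons(r) or "fits")
```

Listing the failed criteria, rather than returning a bare yes/no, supports the earlier advice to examine why a part misfits before deciding to accept, reject or modify it.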
Dichotomous Data
• Example of the cognitive developmental test (BLOT) used to outline Rasch measurement: Tutorial 4
– The analysis of BLOT focuses on the performance of the items rather than the persons
Dichotomous Data
• The basic Rasch questions:
1. Quantity?
2. Precision?
3. Fit?
• Our focus is on:
– Construct validity with a clear direction
– Estimate of ability/development
– Precision of the estimate
– Confidence of the estimate
– Probability of success on a similar item
Polytomous Data
• Nature of Likert scales:
– Ordered categories: lowest to highest
– Response opportunities: good, neutral, bad
– Odd / even number of categories
– With / without a mid-point category (neutral)
Polytomous Data
• For rating-scale data:
– Each item has a difficulty estimate
– The scale has a series of thresholds
• An item threshold estimates the location at which a person with that ability has a 50% probability of responding in the category just above or just below the threshold
• 5 categories have 4 thresholds
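The threshold idea can be made concrete with Andrich's rating scale model, the usual formulation behind a rating-scale analysis: with m thresholds there are m + 1 ordered categories, and P(category k) is proportional to exp of the sum, over the first k thresholds, of (B - D - threshold). The Python sketch below assumes that formulation; the four threshold values for a 5-category Likert item are hypothetical.

```python
import math
from typing import List


def rating_scale_probs(ability: float, difficulty: float, thresholds: List[float]) -> List[float]:
    """Category probabilities under the Andrich rating scale model.

    With m thresholds there are m + 1 ordered categories (5 categories -> 4 thresholds).
    P(k) is proportional to exp(sum over j <= k of (ability - difficulty - thresholds[j])).
    """
    numerators = [1.0]          # lowest category: empty sum, exp(0) = 1
    running = 0.0
    for tau in thresholds:
        running += ability - difficulty - tau
        numerators.append(math.exp(running))
    total = sum(numerators)
    return [num / total for num in numerators]


taus = [-1.5, -0.5, 0.5, 1.5]   # hypothetical thresholds for a 5-point item
for b in (-2.0, 0.0, 0.5, 2.0):
    print(b, [round(p, 3) for p in rating_scale_probs(b, 0.0, taus)])
```

At B - D = 0.5, which equals the third threshold, categories 3 and 4 (indices 2 and 3) are exactly equally likely; that is the 50% point between two adjacent categories.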
Polytomous Data
Polytomous Data
• Run Tutorial 6 and interpret the output
– Quantity? Precision? Fit?
– Maps / Tables
– Variable Map
– Fit Map
– Item Characteristic Curve (ICC)
– Person / Item Tables
TERIMA KASIH
SYUKRAN JAZILAN
THANK YOU
SYUKRIYA
ARIGATO GOZAIMAS
OOKINI
XIE XIE
KHOP KUN
MERCI
DANKE SCHON
DOR JE
For More, Please Visit: http://www.rasch.org
SAS EFHARISTO
NANDI
DANK U
TODA
HVALA
DAGHANG SALAMAT
GRAZIE
MATUR NUWUN
KOMAPSUMNIDA
MAHALO
STA NA SHUKRIA
• Reach Us:
• Mohd Nor @ 019 281 9003
• Prasanna @
• Nurul Hidayah @
SHARING TIME…
How to Apply Simple Rasch in Educational Assessment
Educational Use
• To accurately measure students' ability (knowledge, skill and attitude)
• To verify the reliability of a test question set
• To verify the reliability of students' answers
• To confirm the cognitive levels of questions according to an educational taxonomy
• To quantify students' ability in percentiles
DISTRIBUTE QUESTIONS, OFFLINE OR ONLINE
Students answer via a question form, offline or online
(responses of at least 15 students are recorded for at least a 95% confidence level)
Probability = exp(*) / (1 + exp(*))
[Figure: Item Measure (increasing) vs. Person Measure (decreasing), with probability]
COPY PERSON AND ITEM MEASURE INTO TEMPLATE 2 (EA Score)
TEMPLATE 2
1. Logit score - item difficulty score (*), i.e. EA measure score 1.3 - item difficulty score 0.3 = 1.0
2. Using the formula above, exp(1.0) / (1 + exp(1.0)) = 0.7311
3. Therefore, the final mark (%) for the student's EA is 73%, equivalent to B+ or 3.0 GPA
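The Template 2 steps can be sketched in Python for checking the arithmetic: subtract the item measure from the person measure, apply exp(*) / (1 + exp(*)), and express the result as a percentage. The function name is hypothetical, and the conversion to a letter grade or GPA depends on the institution's own scale, so it is not modelled here.

```python
import math


def template2_percent(person_measure: float, item_measure: float) -> float:
    """Template 2 sketch: * = person - item (logits), then 100 * exp(*) / (1 + exp(*))."""
    diff = person_measure - item_measure                    # step 1: logit difference (*)
    probability = math.exp(diff) / (1.0 + math.exp(diff))  # step 2: the slide's formula
    return 100.0 * probability                              # step 3: express as marks (%)


print(f"{template2_percent(1.3, 0.3):.2f}%")
```

For the slide's example (1.3 - 0.3 = 1.0 logit) this gives about 73.1%; a person located exactly at the item measure scores 50%.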
SAVE FILE IN .prn FORMAT
DRAG DATA FILE (.prn) ONTO BOND & FOX STEPS ICON; SAVE AND EXIT TO DATA ANALYSIS
(Reliability of Answers According to the Model)
rITEM = 0.92
rPERSON = 0.72
Reliability = (OV - EV) / OV
OV – OBSERVED VARIANCE
EV – ERROR VARIANCE
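The reliability formula can be checked directly, and each reliability can be turned into a separation index, G = sqrt(R / (1 - R)), the ratio of the "true" spread to the measurement error. The Python sketch below assumes that standard relationship between Rasch reliability and separation; the function names are illustrative.

```python
import math


def reliability(observed_variance: float, error_variance: float) -> float:
    """Rasch reliability as on the slide: (OV - EV) / OV."""
    return (observed_variance - error_variance) / observed_variance


def separation_from_reliability(r: float) -> float:
    """Separation index G = sqrt(R / (1 - R)), i.e. true SD / root-mean-square error."""
    return math.sqrt(r / (1.0 - r))


for label, r in (("item", 0.92), ("person", 0.72)):
    print(f"{label}: reliability = {r:.2f}, separation = {separation_from_reliability(r):.2f}")
```

With rITEM = 0.92 the item separation is about 3.4, while rPERSON = 0.72 corresponds to a person separation of only about 1.6, so these items spread out the people much less sharply than the people spread out the items.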
CLICK PERSON MEASURE (EA Person Measure)