Bridging the gap from LogR to IRT. Indebted to: Wu, A. D., & Zumbo, B. D. (2007). Thinking about item response theory from a logistic regression perspective: A focus on polytomous models. In S. S. Sawilowsky (Ed.), Real Data Analysis (pp. 241-269). Greenwich, CT: Information Age Publishing.
Bridging the gap from LogR to IRT
• The explanatory variable
– In IRT, the exposure is a continuous latent variable
– Hence IRT is a generalized linear latent model
• The outcome variable(s)
– Logistic regression typically models ONE outcome, whereas IRT models a number of categorical outcomes simultaneously
Aim of IRT
• To relate a subject's responses to a number of test items to an underlying ability (AKA trait) by way of a mathematical function
• Due to the non-linear relationship, a logistic curve is often used; it is referred to as the Item Characteristic Curve or Item Response Function
Estimation method: Conditional maximum likelihood (CML)
Number of items: 4
Number of groups: 5 (3 of them are used to compute the statistics of test)
Number of individuals: 365 (0 individuals removed for missing values)
Number of individuals with null or perfect score: 242
Conditional log-likelihood: -131.2562
Log-likelihood: -320.5403

           Difficulty                                Standardized
Items      parameters  Std. Err.    R1c  df  p-value  Outfit   Infit       U
-----------------------------------------------------------------------------
woman         1.64747    0.19064  1.940   2   0.3790  -1.232  -0.422  -1.411
couple       -0.19486    0.16979  2.342   2   0.3100  -0.574  -0.313  -0.838
not_marr     -0.87046    0.18302  1.580   2   0.4538  -1.272  -1.467  -0.854
afford       -0.58216    0.17588  3.937   2   0.1397   2.336   2.113   3.015
-----------------------------------------------------------------------------
R1c test          R1c = 15.343    6   0.0177
Andersen LR test    Z = 14.594    6   0.0237
-----------------------------------------------------------------------------
The mean of the difficulty parameters is fixed to 0
You have groups of scores with less than 30 individuals. The tests can be
Estimation method: Marginal maximum likelihood (MML)
Number of items: 4
Number of groups: 5 (5 of them are used to compute the statistics of test)
Number of individuals: 365 (0 individuals removed for missing values)
Number of individuals with null or perfect score: 242
Marginal log-likelihood: -665.8056
Log-likelihood: -281.5298

           Difficulty                                 Standardized
Items      parameters  Std. Err.     R1m  df  p-value  Outfit   Infit
----------------------------------------------------------------------
woman         1.25298    0.26213   4.606   2   0.0999  -2.624   0.164
couple       -0.66034    0.30265  18.408   2   0.0001       .  -3.567
not_marr     -1.27117    0.29512  11.668   2   0.0029       .  -0.046
afford       -1.02314    0.29784  26.037   2   0.0000       .  -0.611
----------------------------------------------------------------------
R1m test          R1m = 31.056    8   0.0001
----------------------------------------------------------------------
Sigma         4.12109    0.28776
----------------------------------------------------------------------
You have groups of scores with less than 30 individuals. The tests can be
• raschtest
– Avoids the need to derive dummy variables
– Needs the complete dataset, not frequency weights
– Reformats the dataset in the background, so no need to do it yourself
– Can employ CML (an estimation method specific to Rasch models), which requires no integration
Polytomous IRT
Extension to polytomous IRT
• We now have a hierarchy of parameters to model
1. At the test level
• A number of items modeled simultaneously, with the potential for parameters to vary across items
2. At the item level
• Contrasts are used to model the response categories within each item. Parameters may or may not vary across response categories.
So when faced with a set of polytomous items
We must decide
1. The payoff from not collapsing into binary items
2. The form of contrasts needed to model over response categories within items
3. Any constraints required across these response categories
4. Any parameter constraints across items within a single test
4 commonly used polytomous IRT models
• Partial Credit model (PCM)– Masters (1982)
• Rating Scale model (RSM)– Andrich (1978a/b)
• Graded Response model (GRM)– Samejima (1969)
• Nominal Response model (NRM)– Bock (1972)
Partial Credit Model
Partial Credit Model (PCM)
• Designed for items where you can obtain a “partial credit”, e.g.
0 = solved nothing,
1 = solved part A,
2 = solved parts A and B
• i.e. those who scored a ‘2’ can also be thought of as having achieved a ‘1’
Partial Credit Model (PCM)
– Here items are scored from 0 to m; however, it is possible for items to have differing numbers of categories
– βij is referred to as the “step parameter”
$$P(\text{attain level } x \text{ on item } i \mid \theta) = \frac{\exp\left[\sum_{j=0}^{x}(\theta-\beta_{ij})\right]}{\sum_{r=0}^{m}\exp\left[\sum_{j=0}^{r}(\theta-\beta_{ij})\right]}, \qquad \text{with } \sum_{j=0}^{0}(\theta-\beta_{ij}) \equiv 0$$
Partial Credit Model (PCM)
E.g. for an item with three categories 0, 1, 2:

$$P(x=0\mid\theta) = \frac{1}{1+\exp[\theta-\beta_1]+\exp[(\theta-\beta_1)+(\theta-\beta_2)]}$$

$$P(x=1\mid\theta) = \frac{\exp[\theta-\beta_1]}{1+\exp[\theta-\beta_1]+\exp[(\theta-\beta_1)+(\theta-\beta_2)]}$$

$$P(x=2\mid\theta) = \frac{\exp[(\theta-\beta_1)+(\theta-\beta_2)]}{1+\exp[\theta-\beta_1]+\exp[(\theta-\beta_1)+(\theta-\beta_2)]}$$
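The three equations above all share the same "divide-by-total" denominator. A minimal Python sketch (function name and example values are invented for illustration, not from the source) computes PCM category probabilities from a set of step parameters and also shows the binary collapse to the Rasch model:

```python
import math

def pcm_probs(theta, betas):
    """Category probabilities for one PCM item.

    betas: step parameters [beta_1, ..., beta_m]; the empty sum for
    category 0 is taken as 0 (the beta_0 = 0 convention).
    Returns [P(x = 0), ..., P(x = m)] at trait level theta.
    """
    # Cumulative sums of (theta - beta_j) give the numerators' exponents.
    exponents = [0.0]
    for b in betas:
        exponents.append(exponents[-1] + (theta - b))
    denom = sum(math.exp(e) for e in exponents)   # "divide-by-total"
    return [math.exp(e) / denom for e in exponents]

# Three-category item with steps beta_1 = -1, beta_2 = 1, at theta = 0:
p = pcm_probs(0.0, [-1.0, 1.0])       # p sums to 1

# With a single step the PCM collapses to the familiar Rasch / 1PL model:
p0, p1 = pcm_probs(0.5, [0.2])        # p1 is logistic in (theta - beta_1)
```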
Same format as the adjacent category ordinal model:

$$P(u=0\mid X) = \frac{\exp[(c_{01}+c_{11}X)+(c_{02}+c_{12}X)]}{1+\exp(c_{01}+c_{11}X)+\exp[(c_{01}+c_{11}X)+(c_{02}+c_{12}X)]}$$

$$P(u=1\mid X) = \frac{\exp(c_{01}+c_{11}X)}{1+\exp(c_{01}+c_{11}X)+\exp[(c_{01}+c_{11}X)+(c_{02}+c_{12}X)]}$$

$$P(u=2\mid X) = \frac{1}{1+\exp(c_{01}+c_{11}X)+\exp[(c_{01}+c_{11}X)+(c_{02}+c_{12}X)]}$$
Hence, PCM is a DIRECT or divide-by-total model
Collapse to two category levels, i.e. for an item with two categories 0, 1:

$$P(x=0\mid\theta) = \frac{1}{1+\exp[\theta-\beta_1]}$$

$$P(x=1\mid\theta) = \frac{\exp[\theta-\beta_1]}{1+\exp[\theta-\beta_1]}$$
i.e. the familiar 1PL or Rasch model- the PCM is the polytomous extension of the Rasch model
Step parameters

βij "step parameters" occur at the intersection of adjacent ICCs - thresholds for the transition from one category to the next
Disordering of steps (e.g. 1/3/2/4 instead of 1/2/3/4) suggests lack of ordinality
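The ordered-steps condition is easy to make explicit; a trivial helper (hypothetical, for illustration only) checks whether estimated step parameters increase with category:

```python
def steps_ordered(betas):
    """True if PCM step parameters increase with category level.

    A violation (a later step crossed before an earlier one) flags
    possible disordering of the response categories.
    """
    return all(b1 < b2 for b1, b2 in zip(betas, betas[1:]))

steps_ordered([-1.0, 0.2, 1.4])   # ordered steps
steps_ordered([-1.0, 1.4, 0.2])   # disordered: step 3 before step 2
```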
Partial Credit Model (PCM)
Outcome   Contrasts
Level 1   C1
Level 2   C1 C2
Level 3   C2 C3
Level 4   C3
Within item constraints
Ideally want ordering of steps so that each response category is endorsed in turn as trait level increases
Across item constraints
Rasch-based, hence items have equal discrimination. This can be relaxed to give the Generalised PCM, but this loses the desirable Rasch properties
Adjacent category model
Rating Scale Model
Rating Scale Model (RSM)
• Traditionally used for attitudes data
• Constrained form of PCM, hence is also Rasch
Rating Scale Model (RSM)
PCM:
$$P_{ij}(\theta) = \frac{\exp\left[\sum_{r=0}^{j}(\theta-\beta_{ir})\right]}{\sum_{x=0}^{m}\exp\left[\sum_{r=0}^{x}(\theta-\beta_{ir})\right]}$$

RSM:
$$P_{ij}(\theta) = \frac{\exp\left[\sum_{r=0}^{j}\left(\theta-(\beta_i+\delta_r)\right)\right]}{\sum_{x=0}^{m}\exp\left[\sum_{r=0}^{x}\left(\theta-(\beta_i+\delta_r)\right)\right]}$$
Probabilities estimated directly, as with the PCM. Not suitable for items with differing response formats.
All items share the same step parameters δj; there is a location parameter βi which can vary across items.
Rating Scale Model (RSM)
Outcome   Contrasts
Level 1   C1
Level 2   C1 C2
Level 3   C2 C3
Level 4   C3
Adjacent category model

Within item constraints
Ideally want ordering of steps so that each response category is endorsed in turn as trait level increases
Across item constraints
Constrained, more parsimonious version of the PCM. Only appropriate for instances where all items have the same number of response categories and the same response options (wording)
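The RSM is simply a PCM whose item-by-category steps factor into an item location plus a shared step structure. A minimal sketch (names and values invented for illustration) makes the constraint concrete:

```python
import math

def rsm_probs(theta, beta_i, deltas):
    """RSM category probabilities for item i.

    This is the PCM with step parameters beta_i + delta_r:
    beta_i : item location (varies across items)
    deltas : step parameters shared by ALL items in the scale
    """
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + (theta - (beta_i + d)))
    denom = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / denom for e in exponents]

# Two items differing only in location share the same step structure:
deltas = [-0.8, 0.0, 0.8]                 # common steps, 4-category scale
easy_item = rsm_probs(0.0, -1.0, deltas)
hard_item = rsm_probs(0.0, +1.0, deltas)
# the easier item puts more mass on the top category at the same theta
```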
Graded Response Model
Graded Response Model (GRM)
This is the IRT equivalent of the contrasts described earlier for the POM:

POM:
$$P(u \ge j \mid X) = \frac{\exp(c_{0j}+c_1X)}{1+\exp(c_{0j}+c_1X)}$$

GRM:
$$P(u \ge j \mid \theta) = \frac{\exp[\alpha_i(\theta-\beta_{ij})]}{1+\exp[\alpha_i(\theta-\beta_{ij})]}$$
Consequently
• The Graded Response Model is a difference or indirect IRT model, in contrast to the PCM/RSM
• The βij are thresholds with interpretation akin to the binary 1PL/2PL models - the ability level for which the probability of making a response equal to or greater than threshold j is 50%
• Plots of P(u ≥ j |θ) are Operating Characteristic Curves
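Because the GRM is a difference model, the ICCs come from subtracting adjacent OCCs. A minimal Python sketch of this (function name and example parameters are invented for illustration):

```python
import math

def grm_probs(theta, alpha, thresholds):
    """GRM category probabilities via cumulative (2PL-style) curves.

    thresholds: ordered beta_i1 < ... < beta_im.
    P(u >= j | theta) is logistic in alpha*(theta - beta_ij); category
    probabilities are differences of adjacent cumulative curves.
    """
    # Operating characteristic curves, with P(u >= 0) = 1 and P(u >= m+1) = 0.
    occ = [1.0] + [1 / (1 + math.exp(-alpha * (theta - b))) for b in thresholds]
    occ.append(0.0)
    # ICCs by subtraction -- hence a "difference" or indirect model.
    return [occ[j] - occ[j + 1] for j in range(len(occ) - 1)]

p = grm_probs(0.0, 1.5, [-1.0, 0.0, 1.0])   # 4 response categories
```

Note the ordering requirement: if the thresholds are not ordered (which the homogeneous GRM guarantees), a subtraction could go negative.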
ICC’s for GHQ-01 under GRM
GRM parameters correspond to where p=0.5 crosses each OCC in turn
ICC’s for GHQ-01 under GRM
ICC’s obtained by subtraction
ICC’s for GHQ-01 under GRM
1st and last GRM parameters can also be obtained from equivalent ICC plot: where p=0.5 crosses ICC’s #1, #4
Graded Response Model (GRM)

Outcome levels and cumulative contrasts:
C1: Level 1 vs Levels 2-4
C2: Levels 1-2 vs Levels 3-4
C3: Levels 1-3 vs Level 4
Within item constraints
Discrimination αi equal across item levels = homogeneous GRM ~ POM
Without this, the OCC’s will cross at some trait level hence the difference can be negative
Across item constraints
Discrimination αi differs across items; βij differ across items
βij = ci + bj gives the Modified GRM
$$P(u \ge j \mid \theta) = \frac{\exp[\alpha_i(\theta-\beta_{ij})]}{1+\exp[\alpha_i(\theta-\beta_{ij})]}, \qquad \beta_{ij} = c_i + b_j$$

(ci: location; bj: thresholds)
Nominal Response Model
Nominal Response Model (NRM)
$$P_{ij}(\theta) = \frac{\exp(\alpha_{ij}\theta + c_{ij})}{\sum_{j=0}^{J}\exp(\alpha_{ij}\theta + c_{ij})}$$
Nominal Response Model (NRM)
Within items
Across items
Outcome   Contrasts
Level 1   C1 C2 C3
Level 2   C1
Level 3   C2
Level 4   C3

Contrasts relative to baseline, as in the multinomial logistic model
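The NRM is a multinomial logit in the latent trait, with the trait playing the role of the predictor. A minimal sketch (function name and parameter values invented for illustration):

```python
import math

def nrm_probs(theta, slopes, intercepts):
    """Nominal Response Model category probabilities.

    slopes[j], intercepts[j] for categories j = 0..J; identification
    is typically achieved by fixing the baseline category, e.g.
    slope_0 = intercept_0 = 0. No ordering is imposed on categories.
    """
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    denom = sum(math.exp(v) for v in z)   # divide-by-total again
    return [math.exp(v) / denom for v in z]

# Baseline category 0 fixed at zero; other categories unconstrained:
p = nrm_probs(1.0, [0.0, 0.5, 1.2, 2.0], [0.0, -0.2, -0.5, -1.0])
```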
Some ordinal examples
Winsteps: PCM and RSM
Ego-GHQ Odd items
Winsteps
• Specialist package for fitting IRT models of the Rasch family
• John M Linacre
• www.winsteps.com
• Estimation using Joint Maximum Likelihood
• Impressive array of output tables and figures
The conventional representation of the Partial Credit model is
log ( Pnij / Pni(j-1) ) = θn - βij
Winsteps parameterizes βij as βi + δij, where Σj δij = 0 and βi is the average of the βij:

log ( Pnij / Pni(j-1) ) = θn - βi - δij

Algebraically these two representations are identical. Thus every item has a mean difficulty, βi (~location). This simplifies communication, because the results of a Partial Credit analysis now have the same form as any other polytomous analysis supported by Winsteps.
STRUCTURE CALIBRATN, the calibrated measure of the transition from the category below to this category.
This is an estimate of the Rasch-Andrich model parameter, δj. Use this for anchoring in Winsteps.
(This corresponds to δj in the βi + δj parameterization of the "Rating Scale" model, and similarly to the δij of the βij = βi + δij parameterization of the "Partial Credit" model.) The bottom category has no prior transition, so its measure is shown as NONE.
This is comparable to the familiar RSM formulation
log ( Pnij / Pni(j-1) ) = θn - βi – δj
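The βi + δij decomposition above is just mean-centering of an item's step parameters. A tiny sketch (function name invented for illustration) of the arithmetic:

```python
def decompose_steps(step_params):
    """Split one item's PCM step parameters beta_ij into beta_i + delta_ij,
    where beta_i is the item's mean step (its overall difficulty/location)
    and the delta_ij sum to zero, as in the Winsteps parameterization.
    """
    beta_i = sum(step_params) / len(step_params)
    deltas = [b - beta_i for b in step_params]
    return beta_i, deltas

beta_i, deltas = decompose_steps([-1.2, 0.1, 1.7])
# beta_i is the item's mean difficulty; the deltas sum to (numerically) zero
```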
Recall that in the RSM, items share the same step parameters. Winsteps deals with this by grouping the J items:
J groups → PCM
1 group → RSM
k (< J) groups → somewhere in between
PCM input file

&INST
TITLE = C:\work\courses\summer_school\GHQ_IRT\Winsteps\ego_ghq12_id_0123.dta
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 1 ; column of response to first item in data record
NI = 6 ; number of items
NAME1 = 8 ; column of first character of person identifying label
NAMELEN = 1 ; length of person label
XWIDE = 1 ; number of columns per item response
CODES = 0123 ; valid codes in data file
UPMEAN = 0 ; person mean for local origin
USCALE = 1 ; user scaling for logits
UDECIM = 2 ; reported decimal places for user scaling
ISGROUPS = 000000 ; Item Scale Grouping for modeling rating scales
; STATA file created or last modified: 25/08/2009 18:28:37
; : 25 Aug 2009 18:28
; STATA Cases processed = 1119
; STATA Variables processed = 14
TOTALSCORE = Yes ; Include extreme responses in reported scores
; Person Label variables: columns in label: columns in line
RSM input file

&INST
TITLE = C:\work\courses\summer_school\GHQ_IRT\Winsteps\ego_ghq12_id_0123.dta
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 1 ; column of response to first item in data record
NI = 6 ; number of items
NAME1 = 8 ; column of first character of person identifying label
NAMELEN = 1 ; length of person label
XWIDE = 1 ; number of columns per item response
CODES = 0123 ; valid codes in data file
UPMEAN = 0 ; person mean for local origin
USCALE = 1 ; user scaling for logits
UDECIM = 2 ; reported decimal places for user scaling
ISGROUPS = AAAAAA ; Item Scale Grouping for modeling rating scales
; STATA file created or last modified: 25/08/2009 18:28:37
; : 25 Aug 2009 18:28
; STATA Cases processed = 1119
; STATA Variables processed = 14
TOTALSCORE = Yes ; Include extreme responses in reported scores
; Person Label variables: columns in label: columns in line
> LRtest(ego_pcm_even_erm2)
Warning message:
Persons with median raw scores are assigned to the lower raw score group!
Warning in LRtest.Rm(ego_pcm_even_erm2) :
The following items were excluded due to inappropriate response patterns within subgroups: ghq08
Full and subgroup models are estimated without these items!
> LRtest(ego_rsm_even_erm2)
Warning message:
Persons with median raw scores are assigned to the lower raw score group!
Warning in LRtest.Rm(ego_rsm_even_erm2) :
The following items were excluded due to inappropriate response patterns within subgroups: ghq08
Full and subgroup models are estimated without these items!
par(mfrow = c(2, 3))
plot(ego_grm_odd, type = c("ICC"))
par(mfrow = c(1, 1))
plot(ego_grm_odd, type = c("IIC"))
Or individual IIF – ghost plots?
par(mfrow = c(2, 3))
plot(ego_grm_odd, type = c("OCCu"))

par(mfrow = c(2, 3))
plot(ego_grm_odd, type = c("OCCl"))
Compare IRF’s across items
[Mplus] EGO: GRM (2)
DATA: File is "C:\work\IRT experimenting\ego_ghq12_id.dta.dat" ;
VARIABLE: Names are ghq01 ghq02 ghq03 ghq04 ghq05 ghq06 ghq07 ghq08
 ghq09 ghq10 ghq11 ghq12 f1 id;
 Missing are all (-9999) ;
 usevariables = ghq01 ghq03 ghq05 ghq07 ghq09 ghq11;
 categorical = ghq01 ghq03 ghq05 ghq07 ghq09 ghq11;
ANALYSIS: ESTIMATOR = MLR;
LINK = logit;
MODEL: f BY ghq01* ghq03 ghq05 ghq07 ghq09 ghq11 (1); f@1;
PLOT: TYPE = PLOT3;
All items have the same loading
First item's loading is freely estimated
Trait variance fixed to one
Is equivalent to R's:
ego_grm_odd <- grm(ego[,c(1,3,5,7,9,11)], constrained = TRUE, IRT.param = FALSE)
Mplus: alternative IRT models
• Two similar models can be fitted in Mplus
Model A
• Estimation = MLR
• Uses full information
• Logit = default link function
• Non-linear relationship between trait and items
• Conditional probability method
Model B
• Estimation = Least squares
• Uses limited information
• Probit = default link function
• Linear relationship between trait and items
• Underlying variable (UV) method
There are 2*2*2*2*2 = 32 different models defined by these options, and most can be fitted in some software or other. Some models are equivalent, and some are good approximations for each other. We will focus on two - because Mplus does.
Comparison
• Model A
– Models the categorical items as a multivariate set of ordinal responses (here using the GRM)
– Here we'll use the (non-default) probit link to aid comparison
• Model B
– Uses the one/two-way margins present in the observed categorical data to estimate a (polychoric) correlation matrix for a set of underlying latent continuous variables
– Uses this correlation matrix and standard CFA to estimate a trait which is linearly related to these latent continuous variables
– Throws away information, but is often a good approximation, and estimation is much simpler (no need for integration)