8/17/2019 Testing 07.ppt
1/27
Item Response Theory
Shortcomings of the Classical True Score (CTS) Model
• Sample dependence
• Limitation to the specific test situation
• Dependence on parallel forms
• Same error variance for all examinees
Sample Dependence
• The first shortcoming of CTS is that the values of commonly used item statistics in test development, such as item difficulty and item discrimination, depend on the particular examinee samples in which they are obtained. The average level of ability and the range of ability scores in an examinee sample influence, often substantially, the values of the item statistics.
• The difficulty level changes with the level of the sample's ability, and the discrimination index differs between a heterogeneous sample and a homogeneous sample.
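The sample dependence described above can be illustrated with a small sketch. It assumes, purely for illustration, that the chance of a correct answer follows a logistic curve in ability; the function names and the two ability samples are hypothetical and do not come from the slides. The sketch computes the classical item difficulty (the expected proportion correct) for one and the same item in two different samples.

```python
import math

def p_correct(ability, item_location=0.0):
    """Hypothetical logistic response curve (an assumption for illustration)."""
    return 1.0 / (1.0 + math.exp(-(ability - item_location)))

def classical_difficulty(abilities, item_location=0.0):
    """Expected proportion correct (the classical 'p-value') in a given sample."""
    return sum(p_correct(a, item_location) for a in abilities) / len(abilities)

low_sample = [-2.0, -1.5, -1.0, -0.5, 0.0]   # made-up low-ability sample
high_sample = [0.0, 0.5, 1.0, 1.5, 2.0]      # made-up high-ability sample

p_low = classical_difficulty(low_sample)
p_high = classical_difficulty(high_sample)
print(round(p_low, 3), round(p_high, 3))
```

The item itself never changes, yet it looks hard in the low-ability sample and easy in the high-ability one, which is the kind of dependence the first bullet refers to.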
Limitation to the Specific Test Situation
• The task of comparing examinees who have taken samples of test items of differing difficulty cannot easily be handled with standard testing models and procedures.
Dependence on the Parallel Forms
• The fundamental concept, test reliability, is defined in terms of parallel forms.
Same Error Variance For All
• CTS presumes that the variance of errors of measurement is the same for all examinees.
Item Response Theory
• The purpose of any test theory is to describe how inferences from examinee item responses and/or test scores can be made about unobservable examinee characteristics or traits that are measured by a test.
• An individual's expected performance on a particular test question, or item, is a function of both the level of difficulty of the item and the individual's level of ability.
Item Response Theory
• Examinee performance on a test can be predicted (or explained) by defining examinee characteristics, referred to as traits, or abilities; estimating scores for examinees on these traits (called "ability scores"); and using the scores to predict or explain item and test performance. Since traits are not directly measurable, they are referred to as latent traits or abilities. An item response model specifies a relationship between the observable examinee test performance and the unobservable traits or abilities assumed to underlie performance on the test.
Assumptions of IRT
• Unidimensionality
• Local independence
Unidimensionality Assumption
• It is possible to estimate an examinee's ability on the same ability scale from any subset of items in the domain of items that have been fitted to the model. The domain of items needs to be homogeneous in the sense of measuring a single ability: if the domain of items is too heterogeneous, the ability estimates will have little meaning.
• Most of the IRT models that are currently being applied make the specific assumption that the items in a test measure a single, or unidimensional, ability or trait, and that the items form a unidimensional scale of measurement.
Local Independence
• This assumption states that an examinee's responses to different items in a test are statistically independent once the examinee's ability is taken into account. For this assumption to be true, an examinee's performance on one item must not affect, either for better or for worse, his or her responses on any other items in the test.
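In practice, local independence means that the probability of a whole response pattern, at a fixed ability level, is the product of the individual item probabilities. The sketch below shows this under an assumed one-parameter logistic response curve; the function names, item difficulties and ability value are hypothetical and are not taken from the slides.

```python
import math

def p_correct(theta, b):
    """Hypothetical one-parameter logistic curve (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_probability(theta, difficulties, responses):
    """Probability of a full response pattern under local independence:
    the product of per-item probabilities at a fixed ability theta."""
    prob = 1.0
    for b, u in zip(difficulties, responses):
        p = p_correct(theta, b)
        prob *= p if u == 1 else (1.0 - p)
    return prob

difficulties = [-1.0, 0.0, 1.0]   # made-up item difficulties
theta = 0.0                       # one examinee's ability (made up)

# The probabilities of all 2**3 possible patterns should add up to 1.
total = sum(
    pattern_probability(theta, difficulties, [u1, u2, u3])
    for u1 in (0, 1) for u2 in (0, 1) for u3 in (0, 1)
)
print(total)
```

If answering one item gave away the answer to another, this simple product would no longer describe the data.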
Item Characteristic Curves
• Specific assumptions about the relationship between the test taker's ability and his or her performance on a given item are explicitly stated in the mathematical formula, or item characteristic curve (ICC).
Item Characteristic Curves
• The form of the ICC is determined by the particular mathematical model on which it is based. The types of information about item characteristics may include:
• (1) the degree to which the item discriminates among individuals of differing levels of ability (the 'discrimination' parameter a)
Item Characteristic Curves
• (2) the level of difficulty of the item (the 'difficulty' parameter b), and
• (3) the probability that an individual of low ability can answer the item correctly (the 'pseudo-chance' or 'guessing' parameter c).
• One of the major considerations in the application of IRT models, therefore, is the estimation of these item parameters.
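The slides do not give a formula for the ICC. One widely used form that has exactly these three parameters is the three-parameter logistic model, sketched below as an assumption rather than as the model used in the slides. The function name `icc_3pl` and the parameter values are hypothetical, and the scaling constant that some texts include is left out.

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic ICC (assumed form, no scaling constant).

    a: discrimination, b: difficulty, c: pseudo-chance (guessing) level.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Made-up parameter values for a single item.
a, b, c = 1.2, 0.5, 0.2

for theta in (-3.0, -1.0, 0.5, 2.0, 4.0):
    print(f"theta={theta:+.1f}  P(correct)={icc_3pl(theta, a, b, c):.3f}")
```

At very low ability the curve levels off near c rather than near zero, and at very high ability it approaches one.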
ICC
• Pseudo-chance parameter c: p = 0.20 for two items
• Difficulty parameter b: the ability level at which the probability of a correct answer is halfway between the pseudo-chance parameter and one
• Discrimination parameter a: proportional to the slope of the ICC at the point of the difficulty parameter. The steeper the slope, the greater the discrimination parameter.
[Figure: item characteristic curves; horizontal axis: Ability Scale; vertical axis: Probability]
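The statements about b and a on this slide can be checked against a specific model. The short derivation below assumes the three-parameter logistic form, which the slides do not state explicitly, written without a scaling constant.

```latex
P(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}
\quad\Longrightarrow\quad
P(b) = c + \frac{1 - c}{2} = \frac{1 + c}{2},
\qquad
\left.\frac{dP}{d\theta}\right|_{\theta = b} = \frac{a\,(1 - c)}{4}
```

Under this assumption, a pseudo-chance level of 0.20 gives P(b) = 0.60, and for a fixed c the slope at b grows in direct proportion to a.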
Ability Score
• 1. The test developer collects a set of observed item responses from a relatively large number of test takers.
• 2. After an initial examination of how well various models fit the data, an IRT model is selected.
• 3. Through an iterative procedure, parameter estimates are assigned to items and ability scores to individuals, so as to maximize the agreement, or fit, between the particular IRT model and the test data.
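The iterative step can be sketched in a simplified form. The example below is not the full procedure described above: it treats the item parameters as already known and estimates only one examinee's ability, using a two-parameter logistic model and Newton-Raphson iteration. Both choices are assumptions made for illustration, and all names and values are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic ICC (assumed model, no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, theta=0.0, max_iter=50, tol=1e-8):
    """Maximum-likelihood ability estimate by Newton-Raphson.

    responses: list of 0/1 scores; items: list of (a, b) pairs treated as known.
    Needs a mixed pattern (not all 0s or all 1s), otherwise there is no
    finite maximum and the iteration will not settle.
    """
    for _ in range(max_iter):
        probs = [p_2pl(theta, a, b) for a, b in items]
        # First derivative of the log-likelihood with respect to theta.
        gradient = sum(a * (u - p) for (a, _), u, p in zip(items, responses, probs))
        # Negative second derivative of the log-likelihood.
        curvature = sum(a * a * p * (1.0 - p) for (a, _), p in zip(items, probs))
        step = gradient / curvature
        theta += step
        if abs(step) < tol:
            break
    return theta

# Made-up (a, b) pairs for five items.
items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]
weak = estimate_ability([1, 1, 0, 0, 0], items)
strong = estimate_ability([1, 1, 1, 1, 0], items)
print(round(weak, 2), round(strong, 2))
```

Estimating item parameters and abilities together, as the third step describes, requires alternating or joint updates and considerably more data than this single-examinee sketch.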
Ability Score
Item Information Function
• The limitations of CTS theory approaches to precision of measurement are addressed in the IRT concept of information function. The item information function refers to the amount of information a given item provides for estimating an individual's level of ability, and is a function of both the slope of the ICC and the amount of variation at each ability level.
• The information function of a given item will be at its maximum for individuals whose ability is at or near the value of the difficulty parameter.
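For dichotomous items, the usual way to combine "slope" and "variation" is to divide the squared slope of the ICC by the response variance P(1 - P). The sketch below applies that formula to an assumed three-parameter logistic ICC; the function names and parameter values are hypothetical, and setting c = 0 reduces it to the two-parameter case.

```python
import math

def icc_3pl(theta, a, b, c=0.0):
    """Assumed three-parameter logistic ICC (no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def icc_slope(theta, a, b, c=0.0):
    """Derivative of the assumed ICC with respect to theta."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * (1.0 - c) * logistic * (1.0 - logistic)

def item_information(theta, a, b, c=0.0):
    """Squared ICC slope divided by the response variance P * (1 - P)."""
    p = icc_3pl(theta, a, b, c)
    return icc_slope(theta, a, b, c) ** 2 / (p * (1.0 - p))

a, b = 1.5, 0.5                              # made-up item parameters
grid = [x / 10.0 for x in range(-40, 41)]    # ability values from -4 to 4
peak = max(grid, key=lambda t: item_information(t, a, b))
print(peak, item_information(peak, a, b))
```

With c = 0 the peak falls exactly at b. With a non-zero pseudo-chance level the peak moves slightly above b, which fits the "at or near" wording.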
Item Information Function
Item Information Function
Item Information Function
• The information function of a given item will be at its maximum for individuals whose ability is at or near the value of the difficulty parameter.
• Item (1) provides the most information about differences in ability at the lower end of the ability scale.
• Item (2) provides relatively little information at any point on the ability scale.
• Item (3) provides the most information about differences in ability at the high end of the ability scale.
Test Information Function
• The test information function (TIF) is the sum of the item information functions, each of which contributes independently to the total, and is a measure of how much information a test provides at different ability levels.
• The TIF is the IRT analog of CTS theory reliability and the standard error of measurement.
Item
Specifications in CTS Item
Form of Items
• Dichotomous
  Listening comprehension
    Statement + question + choices
    Short conversation + question + choices
    Long conversation / passage + some questions + choices
  Reading comprehension
    Passage + some questions + choices
    Passage + T/F questions
  Syntactic knowledge / vocabulary
    Question stem with blank/underlined parts + choices
  Cloze
    Passage + choices
Form of Items
• Nondichotomous
  Listening comprehension
    Dictation
    Dictation passage with blanks to be filled
Describing data
• Ability measured
• Difficulty index
• Discrimination
• Storage code