7/29/2019 20121020101015Item_Analysis
Item Analysis
Purpose of Item Analysis
Evaluates the quality of each item
Rationale: the quality of items determines the
quality of test (i.e., reliability & validity)
May suggest ways of improving the
measurement of a test
Can help with understanding why certain
tests predict some criteria but not others
Item Analysis
When analyzing the test items, we have several
questions about the performance of each item. Some of these questions include:
Are the items congruent with the test objectives?
Are the items valid? Do they measure what they're
supposed to measure?
Are the items reliable? Do they measure consistently?
How long does it take an examinee to complete each
item?
What items are most difficult to answer correctly?
What items are easy?
Are there any poor performing items that need to be
discarded?
Types of Item Analyses for CTT
Three major types:
1. Assess quality of the distractors
2. Assess difficulty of the items
3. Assess how well an item differentiates between high and low performers
A. Multiple-Choke
B. Multiply-Choice
C. Multiple-Choice
D. Multi-Choice
DISTRACTOR ANALYSIS
Distractor Analysis
First question of item analysis: How many
people choose each response?
If there is only one best response, then all
other response options are distractors.
Example from in-class assignment (N = 35):
Which method has the best internal consistency? (# choosing)
a) projective test 1
b) peer ratings 1
c) forced choice 21
d) differences n.s. 12
Distractor Analysis (cont'd)
A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right.
2. People who do not know the item have responses equally distributed across the wrong answers.
It is not desirable to have one of the distractors chosen more often than the correct answer.
This result indicates a potential problem with the question. The distractor may be too similar to the correct answer, and/or there may be something misleading in either the stem or the alternatives.
Distractor Analysis (cont'd)
Calculate the number of people expected to choose each of the distractors. If choices among the wrong answers were random, the same number would be expected for each wrong response (Figure 10-1).

# of persons expected to choose each distractor = (N answering incorrectly) / (Number of distractors) = 14 / 3 = 4.7
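The expected-count calculation above can be sketched in Python (a minimal illustration; the 14 incorrect responses and 3 distractors come from the slide's example):

```python
def expected_per_distractor(n_incorrect, n_distractors):
    """Expected number of examinees choosing each distractor,
    assuming those who answer incorrectly guess at random."""
    return n_incorrect / n_distractors

# Slide example: 14 incorrect answers spread over 3 distractors
expected = expected_per_distractor(14, 3)
print(round(expected, 1))  # 4.7
```

A distractor chosen far more often than this expected count deserves a closer look.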
Distractor Analysis (cont'd)
When the number of persons choosing a distractor
significantly exceeds the number expected, there are 2 possibilities:
1. The choice may reflect partial knowledge
2. The item is a poorly worded trick question
An unpopular distractor may lower item and test difficulty
because it is easily eliminated
An extremely popular distractor is likely to lower the reliability and
validity of the test
Item Difficulty Analysis
Description and How to Compute
ex: a) (6 × 3) + 4 = ?
b) 9[ln(−3.68) × (1 − ln(+3.68))] = ?
It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item
The only common thread of difficult items is that individuals did not know the answer
Item Difficulty
Percentage of test takers who respond correctly
What if p = .00?
What if p = 1.00?
Item Difficulty
An item with a p value of .0 or 1.0 does not
contribute to measuring individual differences and thus is certain to be useless
When comparing 2 test scores, we are interested in
who had the higher score or the differences in scores
Items with a p value near .5 have the most variation, so seek items in
this range and remove those with extreme values
The p value can also be examined to determine the proportion
answering in a particular way for items that don't
have a correct answer
Item Difficulty (cont.)
What is the best p-value?
optimal p-value = .50
maximum discrimination between good
and poor performers
Should we only choose items of .50?
When shouldn't we?
Should we only choose items of .50?
Not necessarily ...
When wanting to screen the very top group of
applicants (i.e., admission to university or medical
school).
Cutoffs may be much higher
Other institutions want a minimum level (i.e., minimum reading level)
Cutoffs may be much lower
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
15 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
70 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont'd)
General Rules of Item Difficulty
p low (< .20): difficult test item
p moderate (.20 - .80): moderately difficult item
p high (> .80): easy item
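The p-value computation and the rough thresholds above can be sketched in Python (a minimal illustration; the counts reuse the two worked examples from the preceding slides):

```python
def item_difficulty(n_correct, n_total):
    """p-value: proportion of examinees answering the item correctly."""
    return n_correct / n_total

def classify(p):
    """Rough difficulty label using the thresholds from the slides."""
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderate"

p1 = item_difficulty(15, 100)   # slide example: 15 of 100 correct
p2 = item_difficulty(70, 100)   # slide example: 70 of 100 correct
print(p1, classify(p1))  # 0.15 difficult
print(p2, classify(p2))  # 0.7 moderate
```

So the first example is a hard item (p = .15) and the second a moderately easy one (p = .70).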
ITEM DISCRIMINATION
... The extent to which an item
differentiates people on the behavior that the test is designed
to assess;
the computed difference between
the percentage of high achievers
and the percentage of low achievers who got the item
right.
Item Discrimination (cont.)
compares the performance of the upper
group (with high test scores) and the lower
group (low test scores) on each item:
the % of test takers in each group who were
correct
Item Discrimination (cont'd):
Discrimination Index (D)
Divide sample into TOP half and
BOTTOM half (or TOP and BOTTOM
third)
Compute Discrimination Index (D)
Item Discrimination
D = U - L
U = (# in the upper group with a correct response) / (Total # in upper group)
L = (# in the lower group with a correct response) / (Total # in lower group)
The higher the value of D, the more adequately
the item discriminates (the highest value is 1.0)
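The D computation above can be sketched in Python (a minimal illustration; the group counts are hypothetical, chosen only to show the formula at work):

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = U - L: proportion correct in the upper group minus
    proportion correct in the lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 18 of 20 high scorers correct, 6 of 20 low scorers correct
D = discrimination_index(18, 20, 6, 20)
print(round(D, 2))  # 0.6
```

A D of .60 means this hypothetical item separates high and low performers well; values near zero or below would flag the item for removal.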
Item Discrimination
seek items with high positive numbers (those
who do well on the test tend to get the item
correct)
negative numbers (lower scorers on the test are more
likely to get the item correct) and low positive
numbers (about the same proportion of low and
high scorers get the item correct) don't
discriminate well and are discarded
Item Discrimination (cont'd):
Item-Total Correlation
Correlation between each item (a correct response
usually receives a score of 1 and an incorrect response a score
of 0) and the total test score.
To what degree do the item and the test measure the same
thing?
Positive: the item discriminates between high and low
scores
Near 0: the item does not discriminate between high & low
Negative: scores on the item and scores on the test disagree
Item Discrimination (cont'd):
Item-Total Correlation
Item-total correlations are directly
related to reliability.
Why? Because the more each item correlates
with the test as a whole, the higher all
items correlate with each other
(= higher alpha, internal consistency)
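An item-total correlation can be sketched with a plain Pearson correlation (a minimal Python illustration; the 0/1 item scores and total scores below are hypothetical, and in practice a corrected total that excludes the item itself is often preferred):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical data for 6 examinees: 1/0 scores on one item, total test scores
item = [1, 1, 1, 0, 0, 0]
total = [9, 8, 7, 5, 4, 3]
r = pearson(item, total)
print(round(r, 2))  # 0.93
```

Here the item correlates strongly and positively with the total, so it discriminates well and supports internal consistency.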
Quantitative Item Analysis
The inter-item correlation matrix displays the
correlation of each item with every other
item
provides important information for
increasing the test's internal consistency
each item should be highly correlated
with every other item measuring the same
construct and not correlated with items
measuring a different construct
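Building an inter-item correlation matrix can be sketched as follows (a minimal Python illustration with hypothetical 0/1 response data; items 1 and 2 are constructed to agree perfectly, while item 3 runs against them):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical 0/1 responses: rows = examinees, columns = items 1-3
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
items = list(zip(*responses))  # one tuple of scores per item
matrix = [[round(pearson(a, b), 2) for b in items] for a in items]
for row in matrix:
    print(row)
```

Items 1 and 2 correlate at 1.0 (same construct), while item 3 correlates negatively with both, which in a real test would flag it for revision or removal.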
Quantitative Item Analysis
Items that are not highly correlated with
other items measuring the same
construct can and should be dropped to
increase internal consistency
Item Discrimination (cont'd):
Inter-item Correlation
Possible causes for low inter-item correlation:
a. Item badly written (revise)
b. Item measures an attribute other than the rest of
the test (discard)
c. Item correlated with some items, but not
with others: the test measures 2 distinct
attributes (subtests or subscales)