7/29/2019 20121020101015Item_Analysis
Item Analysis
Purpose of Item Analysis
Evaluates the quality of each item
Rationale: the quality of items determines the
quality of test (i.e., reliability & validity)
May suggest ways of improving the
measurement of a test
Can help with understanding why certain
tests predict some criteria but not others
Item Analysis
When analyzing the test items, we have several
questions about the performance of each item. Some of these questions include:
Are the items congruent with the test objectives?
Are the items valid? Do they measure what they're
supposed to measure?
Are the items reliable? Do they measure consistently?
How long does it take an examinee to complete each
item?
What items are most difficult to answer correctly?
What items are easy?
Are there any poor performing items that need to be
discarded?
Types of Item Analyses for CTT
Three major types:
1. Assess quality of the distractors
2. Assess difficulty of the items
3. Assess how well an item differentiates between high and low performers
A. Multiple-Choke
B. Multiply-Choice
C. Multiple-Choice
D. Multi-Choice
DISTRACTOR ANALYSIS
Distractor Analysis
First question of item analysis: How many
people choose each response?
If there is only one best response, then all
other response options are distractors.
Example from in-class assignment (N = 35):
Which method has the best internal consistency? (# choosing)
a) projective test 1
b) peer ratings 1
c) forced choice 21
d) differences n.s. 12
Distractor Analysis (cont'd)
A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right.
2. People who do not know the item have responses equally distributed across the wrong answers.
It is not desirable to have one of the distractors chosen more often than the correct answer.
This result indicates a potential problem with the question. The distractor may be too similar to the correct answer, and/or there may be something misleading in either the stem or the alternatives.
Distractor Analysis (cont'd)
Calculate the number of people expected to choose each of the distractors. If choices among the wrong answers were random, the same number would be expected for each wrong response (Figure 10-1).

# of persons expected to choose each distractor = (N answering incorrectly) / (Number of distractors) = 14 / 3 = 4.7
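The expected-count calculation above can be sketched in Python (a minimal illustration; the 14 incorrect responses and 3 distractors come from the slide's example):

```python
def expected_per_distractor(n_incorrect, n_distractors):
    """Expected number of examinees choosing each distractor,
    assuming those who answer incorrectly guess at random."""
    return n_incorrect / n_distractors

# Slide example: 14 incorrect answers spread over 3 distractors
expected = expected_per_distractor(14, 3)
print(round(expected, 1))  # 4.7
```

A distractor chosen far more often than this expected count deserves a closer look.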
Distractor Analysis (cont'd)
When the number of persons choosing a distractor
significantly exceeds the number expected, there are 2 possibilities:
1. The choice may reflect partial knowledge
2. The item is a poorly worded trick question
An unpopular distractor may lower item and test difficulty
because it is easily eliminated
An extremely popular distractor is likely to lower the reliability and
validity of the test
Item Difficulty Analysis
Description and How to Compute
ex: a) (6 × 3) + 4 = ?
b) 9[ln(−3.68) × (1 − ln(+3.68))] = ?
It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item
The only common thread of difficult items is that individuals did not know the answer
Item Difficulty
Percentage of test takers who respond correctly
What if p = .00?
What if p = 1.00?
Item Difficulty
An item with a p value of .0 or 1.0 does not
contribute to measuring individual differences and thus is certain to be useless
When comparing 2 test scores, we are interested in
who had the higher score or the differences in scores
Items with a p value near .5 have the most variation, so seek items in
this range and remove those with extreme values
The p value can also be examined to determine the proportion
answering in a particular way for items that don't
have a correct answer
Item Difficulty (cont.)
What is the best p-value?
optimal p-value = .50
maximum discrimination between good
and poor performers
Should we only choose items of .50?
When shouldn't we?
Should we only choose items of .50?
Not necessarily ...
When wanting to screen the very top group of
applicants (i.e., admission to university or medical
school).
Cutoffs may be much higher
Other institutions want a minimum level (i.e., minimum reading level)
Cutoffs may be much lower
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
15 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
70 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont'd)
General Rules of Item Difficulty
p low (< .20): difficult test item
p moderate (.20 - .80): moderately difficult item
p high (> .80): easy item
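The p-value computation and the rough thresholds above can be sketched in Python (a minimal illustration; the counts reuse the two worked examples from the preceding slides):

```python
def item_difficulty(n_correct, n_total):
    """p-value: proportion of examinees answering the item correctly."""
    return n_correct / n_total

def classify(p):
    """Rough difficulty label using the thresholds from the slides."""
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderate"

p1 = item_difficulty(15, 100)   # slide example: 15 of 100 correct
p2 = item_difficulty(70, 100)   # slide example: 70 of 100 correct
print(p1, classify(p1))  # 0.15 difficult
print(p2, classify(p2))  # 0.7 moderate
```

So the first example is a hard item (p = .15) and the second a moderately easy one (p = .70).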
ITEM DISCRIMINATION
... The extent to which an item
differentiates people on the behavior that the test is designed
to assess;
the computed difference between
the percentage of high achievers
and the percentage of low achievers who got the item
right.
Item Discrimination (cont.)
compares the performance of the upper
group (with high test scores) and the lower
group (low test scores) on each item:
the % of test takers in each group who were
correct
Item Discrimination (cont'd):
Discrimination Index (D)
Divide sample into TOP half and
BOTTOM half (or TOP and BOTTOM
third)
Compute Discrimination Index (D)
Item Discrimination
D = U - L
U = (# in the upper group with a correct response) / (Total # in upper group)
L = (# in the lower group with a correct response) / (Total # in lower group)
The higher the value of D, the more adequately
the item discriminates (the highest value is 1.0)
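The D computation above can be sketched in Python (a minimal illustration; the group counts are hypothetical, chosen only to show the formula at work):

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = U - L: proportion correct in the upper group minus
    proportion correct in the lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 18 of 20 high scorers correct, 6 of 20 low scorers correct
D = discrimination_index(18, 20, 6, 20)
print(round(D, 2))  # 0.6
```

A D of .60 means this hypothetical item separates high and low performers well; values near zero or below would flag the item for removal.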
Item Discrimination
seek items with high positive numbers (those
who do well on the test tend to get the item
correct)
negative numbers (lower scorers on the test are more
likely to get the item correct) and low positive
numbers (about the same proportion of low and
high scorers get the item correct) don't
discriminate well and are discarded
Item Discrimination (cont'd):
Item-Total Correlation
Correlation between each item (a correct response
usually receives a score of 1 and an incorrect response a score
of 0) and the total test score.
To what degree do the item and the test measure the same
thing?
Positive: the item discriminates between high and low
scores
Near 0: the item does not discriminate between high & low
Negative: scores on the item and scores on the test disagree
Item Discrimination (cont'd):
Item-Total Correlation
Item-total correlations are directly
related to reliability.
Why? Because the more each item correlates
with the test as a whole, the higher all
items correlate with each other
(= higher alpha, internal consistency)
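An item-total correlation can be sketched with a plain Pearson correlation (a minimal Python illustration; the 0/1 item scores and total scores below are hypothetical, and in practice a corrected total that excludes the item itself is often preferred):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical data for 6 examinees: 1/0 scores on one item, total test scores
item = [1, 1, 1, 0, 0, 0]
total = [9, 8, 7, 5, 4, 3]
r = pearson(item, total)
print(round(r, 2))  # 0.93
```

Here the item correlates strongly and positively with the total, so it discriminates well and supports internal consistency.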
Quantitative Item Analysis
The inter-item correlation matrix displays the
correlation of each item with every other
item
provides important information for
increasing the test's internal consistency
each item should be highly correlated
with every other item measuring the same
construct and not correlated with items
measuring a different construct
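Building an inter-item correlation matrix can be sketched as follows (a minimal Python illustration with hypothetical 0/1 response data; items 1 and 2 are constructed to agree perfectly, while item 3 runs against them):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical 0/1 responses: rows = examinees, columns = items 1-3
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
items = list(zip(*responses))  # one tuple of scores per item
matrix = [[round(pearson(a, b), 2) for b in items] for a in items]
for row in matrix:
    print(row)
```

Items 1 and 2 correlate at 1.0 (same construct), while item 3 correlates negatively with both, which in a real test would flag it for revision or removal.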
Quantitative Item Analysis
Items that are not highly correlated with
other items measuring the same
construct can and should be dropped to
increase internal consistency
Item Discrimination (cont'd):
Inter-item Correlation
Possible causes for low inter-item correlation:
a. Item badly written (revise)
b. Item measures an attribute other than the rest of
the test (discard)
c. Item correlated with some items, but not
with others: the test measures 2 distinct
attributes (subtests or subscales)