Educational and Psychological Measurement
http://epm.sagepub.com/
Psychological Distance Between Categories in the Likert Scale: Comparing Different Numbers of Options
Takafumi Wakita, Natsumi Ueshima and Hiroyuki Noguchi
Educational and Psychological Measurement 2012 72: 533, originally published online 12 January 2012
DOI: 10.1177/0013164411431162
The online version of this article can be found at: http://epm.sagepub.com/content/72/4/533
Published by:
http://www.sagepublications.com
OnlineFirst Version of Record: Jan 12, 2012
Version of Record: Jul 6, 2012
Downloaded from epm.sagepub.com at University of Central Florida Libraries on April 16, 2013
Educational and Psychological Measurement
72(4) 533–546
© The Author(s) 2012
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0013164411431162
http://epm.sagepub.com
Psychological Distance Between Categories in the Likert Scale: Comparing Different Numbers of Options
Takafumi Wakita1, Natsumi Ueshima2, and Hiroyuki Noguchi3
Abstract
This study examined whether the number of options in the Likert scale influences the psychological distance between categories. The most important assumption when using the Likert scale is that the psychological distance between options is equal. The authors proposed a new algorithm for calculating the scale values of options by applying item response theory and the ideas of Wakita to reveal the influence of the number of categories. Three types of questionnaires that were composed of the same items, but used different numbers of options to assess these items (specifically, 4-, 5-, and 7-point scales), were completed by 772 undergraduate students. The results indicated that the number of options influenced the psychological distance between options, particularly for the 7-point scale. This influence was revealed only by the authors’ algorithm; descriptive statistics and coefficients of reliability did not show that the number of options had a prominent influence. The importance of the number of options and the new algorithm are discussed.
Keywords
Likert scale, number of options, item response theory
1 Kansai University, Osaka, Japan
2 Chita Child Rearing Main Support Center, Kyoto, Japan
3 Nagoya University, Nagoya, Japan
Corresponding Author:
Takafumi Wakita, Faculty of Sociology, Kansai University, 3-3-35, Yamate-cho, Suita, Osaka 564-8680, Japan
Email: [email protected]
Background
The Likert scale is the most commonly used psychometric scale among psychologi-
cal measurements that require self-reporting. For this scale, it is assumed that if the
psychological distance between categories is equal, the scale will provide exact mea-
surements of the psychological trait being assessed. This assumption about the psy-
chological distance between categories is the most important factor in the Likert
scale. However, no conclusion has been reached regarding the influence of a different
number of options on the Likert scale, and no previous research has examined the
impact of the number of options on the psychological distance between the options.
The number of options has been a central issue for researchers in extracting information from participants since Garner (1960) reported that psychological scales require more than 20 categories to derive complete information from answers. A decade later, Green and Rao (1970) reported that six or seven categories were appropriate. In contrast, Schutz and Rucker (1975) suggested that the number of options might not affect participants’ responses. Consequently, no consensus has been reached regarding the number of options required.
Most Likert scales include four to seven categories. An odd number of options is
used when researchers need a neutral anchor, such as ‘‘Neither agree nor disagree,’’
whereas an even number of options is used when researchers intend to elicit partici-
pants’ opinions or attitudes through answers such as ‘‘Agree’’ or ‘‘Disagree.’’
Previous research also investigated the appropriate number of options from the
perspective of statistical reliability. Lissitz and Green (1975) and Boote (1981) sug-
gested that a 5-point scale was reliable. Cicchetti, Showalter, and Tyrer (1985) exam-
ined the interrater reliability using a Monte Carlo simulation and reported an increase
in reliability when the number of categories was less than eight. Oaster (1989) indi-
cated that a 7-point scale showed the highest test–retest reliability. Preston and
Colman (2000) also revealed that a scale with two to four categories showed the low-
est test–retest reliability, and a scale with seven or more categories showed the high-
est test–retest reliability; however, there was no relation between the number of
options and criterion-related validity among scales with 2 to 11 categories. Taken together, these results suggest that 7-point scales tend to show higher reliability than scales with other numbers of options. Chang (1994) compared 4- and 6-point scales for the same items and suggested that an increase in the number of categories did not always result in higher reliability. Moreover, other studies have indicated that reliability is independent of the number of options (Bendig, 1953, 1954; Brown, Wilding, & Coulter, 1991; Komorita, 1963; Matell & Jacoby, 1971). In these previous studies, the number of options was discussed from the perspective of reliability, which reflects only the random component of measurement error. The main target of the present study, the psychological distance between options, is considered more suitable for assessing the systematic component of measurement error than Cronbach’s α, intraclass correlation, or test–retest reliability.
The number of options has also been examined from the perspective of how participants feel when considering the appropriate option. Preston and Colman (2000) examined the following questions with the same
participants: (a) ease of rating, (b) time required to select an answer, and (c) partici-
pants’ satisfaction with their ability to express their feelings. Their results suggested
that 5 to 10 categories were easy to rate. In addition, 5 categories were evaluated as
being short enough to select an answer quickly and 3 or 4 categories were evaluated
as being complete enough for participants to express their feelings satisfactorily.
Thus, these results indicate that a maximum of 5 categories is adequate for most
scales.
Although the number of options has been considered from the viewpoints of
researcher orientation, statistical reliability, and participant evaluation, no previous
studies focused on the assumption of the original Likert scale—that is, psychological
distance between categories is equal—when evaluating the appropriate number of
options.
Many psychological scales include a neutral category, such as ‘‘Neither agree nor
disagree,’’ to allocate equal psychological distance between the neutral category and
the adjacent side categories in line with the assumption that the psychological dis-
tance between categories must be equal.
Wakita (2004) described a method for estimating the width of each category (Figure 1) and showed that the widths were affected by the item contents. The widths were defined as W1 = C2 – C1, W2 = C3 – C2, and W3 = C4 – C3, and the psychological distances between categories were considered equal when W1:W2:W3 = 1:1:1. The ratio W1:W2:W3 was skewed when the item contents were negative; specifically, the width of the neutral category was significantly narrower than the widths of the other categories. However, this method is not adequate for discussing the psychological distance between options. To discuss psychological distance in detail, we must obtain scale values for the categories shown in Figure 1.
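The width ratio described for Wakita (2004) is simple arithmetic on the category boundaries. The following is a minimal Python sketch of that arithmetic, assuming hypothetical boundaries C1 to C4 for a 5-point item; the values are illustrative only and are not taken from any data set, and the function names are ours.

```python
def category_widths(boundaries):
    """Return W_k = C_{k+1} - C_k for consecutive category boundaries."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


def width_ratio(widths):
    """Express widths relative to the first one, so equal spacing gives 1:1:1."""
    return [w / widths[0] for w in widths]


# Hypothetical boundaries C1..C4 of a 5-point item (illustrative only).
boundaries = [-1.2, -0.4, 0.1, 1.3]
widths = category_widths(boundaries)   # W1, W2, W3
ratio = width_ratio(widths)
print([round(w, 3) for w in widths])
print([round(r, 3) for r in ratio])
```

With these made-up boundaries the middle width is the smallest, which is the kind of pattern the text describes as a narrow neutral category.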
The present study presents a new formula for obtaining scale values that corre-
spond to each original category in order to reveal the differences between these scale
values and the original categories. We aimed to examine the appropriate number of
categories for Likert scales, focusing on the psychological distance between cate-
gories, and clarify how the number of options affects this distance. For the purpose
of this study, 4-, 5-, and 7-point scales were used for the same personality scale.
Method
Formula for Calculating Scale Values
Item response theory (IRT) was applied to calculate the scale values of each category in this study; specifically, the generalized partial credit model (GPCM; Muraki, 1992) was used. The new formula rests on the following two assumptions:
Assumption 1: In the Likert scale, a latent continuum is assumed to exist
behind each category, and this continuum is divided to give the interval to
each category. A border to the next category is assumed to exist at a
midpoint between the adjacent categories on the rating scale continuum
(Figure 1). Thus, the category of the rating scale has a certain range of
length on the rating scale continuum; however, both ends of the categories
are open intervals.
Assumption 2: In the GPCM, the intersection of two adjacent categories is
defined as the point representing category parameters. This intersection is
assigned on the borders of the rating continuum in the Likert scale
(Assumption 1).
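Assumption 2 can be illustrated numerically. The sketch below evaluates GPCM category probabilities for a single item and shows that the curves of two adjacent categories take the same value at the corresponding step location. It is a minimal Python illustration under stated assumptions: the step locations and slope are hypothetical, the scaling constant is taken as 1, and the function and variable names are ours rather than part of any software mentioned in this article.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities under the GPCM for one item.

    theta: latent trait value; a: slope (discrimination);
    steps: step locations b_1..b_{m-1}, one per boundary between categories.
    The scaling constant D is omitted (taken as 1) in this sketch.
    """
    # Cumulative sums of a * (theta - b_v); the first category has an empty sum (0).
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)                                  # stabilise the exponentials
    weights = [math.exp(v - top) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


steps = [-1.3, 0.0, 1.3]          # hypothetical step locations for a 4-point item
p = gpcm_probs(steps[1], a=1.0, steps=steps)
print(p[1], p[2])                 # second and third categories at the second step
```

At theta equal to the second step location, the second and third categories are equally likely, which is the intersection that Assumption 2 places on the border of the rating continuum.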
Scale Value
If the rating scale continuum is assumed to follow a standard normal distribution and is divided by the category parameters, the expectation of each interval is defined as the scale value of the corresponding category. For example, the expectation of the interval between −∞ and the first category parameter (C1) is defined as the scale value of the first category (μ1), and the expectation of the interval between the first category parameter (C1) and the second category parameter (C2) is defined as the scale value of the second category (μ2). Therefore, in the case of

$$ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2}}, $$

the scale value ($\mu_P$) of the Pth category, which is the expectation of $[C_{P-1}, C_P]$, is obtained by

$$ \mu_P = \int_{C_{P-1}}^{C_P} x \times \left\{ \frac{f(x)}{\int_{C_{P-1}}^{C_P} f(x)\,dx} \right\} dx = \frac{f(C_{P-1}) - f(C_P)}{\int_{C_{P-1}}^{C_P} f(x)\,dx}. $$

Here $C_0 = -\infty$ with $f(C_0) = 0$, and when the number of categories is m, $C_m$ for the mth category is $+\infty$ with $f(C_m) = 0$. The resulting $\mu_P$ is defined as the scale value of the Pth category.

Figure 1. Calculating scale value (μ). The Likert scale continuum is divided into Categories 1 to 5 by the category parameters C1 to C4; Wakita (2004) used the widths C2–C1, C3–C2, and C4–C3, whereas the current study assigns the scale values μ1 to μ5 to the categories.
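The closed form for the scale value can be checked numerically. The sketch below is a minimal Python illustration, not the authors' implementation: the function names are ours, and the boundaries are the 4-point BF-N category parameters printed in Table 5 with their signs reversed, which is our assumption about the sign convention of the printed parameters. Under that assumption the result should agree with the printed scale values (about -1.787, -0.571, 0.570, and 1.788) to roughly three decimals.

```python
import math


def phi(x):
    """Standard normal density; returns 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def scale_values(boundaries):
    """mu_P = (f(C_{P-1}) - f(C_P)) / (Phi(C_P) - Phi(C_{P-1})) for each category."""
    cuts = [-math.inf] + list(boundaries) + [math.inf]
    return [(phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo)) for lo, hi in zip(cuts, cuts[1:])]


# Assumed boundaries C1..C3: Table 5 BF-N 4-point parameters with reversed sign.
boundaries = [-1.320, -0.001, 1.321]
mu = scale_values(boundaries)
print([round(m, 3) for m in mu])
```

The denominator uses the normal cumulative distribution function because the integral of f over an interval is the difference of the cumulative probabilities at its two ends.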
Management of Number of Options
The Big Five Scale (Wada, 1996), a major personality scale that is commonly used with different numbers of options, was modified into three types of questionnaires with different numbers of options. From its subscales, 11 neuroticism items (BF-N) and 12 extraversion items (BF-extraversion normal [EN] and BF-extraversion reversed [ER]) were selected. BF-N comprises items that ask about socially negative attitudes, BF-EN asks about socially positive attitudes, and BF-ER asks about socially negative attitudes toward extraversion. For these items, 4-, 5-, and 7-point categories were used as follows: (a) a 4-point scale was adopted based on its frequency of use and participants’ satisfaction with expressing their feelings (Preston & Colman, 2000), (b) a 5-point scale was set up based on its frequency of use and ease of selecting an answer (Preston & Colman, 2000), and (c) a 7-point scale was set up based on the higher reliability of this number of options shown by Cicchetti, Showalter, and Tyrer (1985), Oaster (1989), and Preston and Colman (2000). These numbers of categories are commonly used in psychological and clinical research. The expressions of ratings for each scale are described in Table 1. The order of the items and other parts of the questionnaire were not changed.

Table 1. Expressions of Ratings for Each Scale
Number of categories   Anchors
4                      Disagree; Slightly disagree; Slightly agree; Agree
5                      Disagree; Slightly disagree; Neither agree nor disagree; Slightly agree; Agree
7                      Strongly disagree; Almost disagree; Do not really disagree; Neither agree nor disagree; Do not really agree; Slightly agree; Strongly agree
Participants and Study Period
Participants comprised 772 undergraduate students. The questionnaire was com-
pleted anonymously, and a response to the questionnaire was considered to represent
informed consent to participate in the study. Questionnaires were administered in the
autumn semester of 2002.
Procedure
Questionnaires were randomly administered during a lecture (4-point scale, n = 258;
5-point scale, n = 254; 7-point scale, n = 260) with each participant answering one
questionnaire (4-, 5-, or 7-point scale).
Results
Analysis
To compare the characteristics of each number of categories, the following three points were examined: (a) the mean and standard deviation (SD) of each subscale score, (b) the estimates of the coefficient of reliability (Cronbach’s α coefficient), and (c) the estimates of the scale value based on IRT. Subsequently, the relation between the conventional scale values and the estimated scale value (converted scale score) in (c) was examined. The estimates of scale values by IRT were obtained from the category parameters of the GPCM (Muraki, 1992). PARSCALE 4.1 (Muraki & Bock, 2003) was used.
Descriptive Statistics (Mean and Standard Deviation)
In each subscale, the categories were assigned consecutive integer item values (1 point, 2 points, and so on) from the first category to the last, and the mean item scores were used as the corresponding subscale scores. To compare these values, the scale scores of the 4-point scale and 7-point scale were adjusted to the same range as the 5-point scale (adjusted scale scores; see Note 1). The results showed that the mean and SD of each subscale score were not significantly different except for the 7-point scale (Table 2). In the 7-point scale, the adjusted scale score was slightly lower than in the other two scales, and the SD was also slightly smaller.
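The adjustment to the 5-point range is the proportional rescaling given in Note 1. A minimal Python sketch follows, using the BF-N means from Table 2 as input; the function name is ours. Because the printed means are already rounded, the results can differ from the printed adjusted scores in the last decimal.

```python
def adjust_to_five_point(score, n_categories):
    """Rescale a mean item score to the 5-point range via y = (5 / n) * x (Note 1)."""
    return 5.0 / n_categories * score


# BF-N conventional scale scores from Table 2.
adjusted_4 = adjust_to_five_point(2.747, 4)   # printed adjusted score: 3.434
adjusted_7 = adjust_to_five_point(4.722, 7)   # printed adjusted score: 3.373
print(adjusted_4, adjusted_7)
```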
Reliability
The estimates of reliability were obtained by using Cronbach’s α coefficient (Table 3). No subscale showed an obvious difference in α based on the number of categories.
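For readers who want to reproduce this kind of reliability estimate, Cronbach's α can be computed from item variances and the variance of the sum score. The following is a minimal Python sketch with hypothetical responses (not the study data); the function name is ours.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                   # number of items
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])      # variance of the sum score
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses of five people to three 5-point items (not study data).
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Population variances are used for both the items and the sum score; using sample variances throughout would give the same ratio and therefore the same α.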
Estimates of Scale Values for Each Category
The eigenvalues of the matrix of correlation were examined to confirm the unidimen-
sionality of the scale as a latent trait in the IRT model, and the unidimensionality of
the scale was confirmed in all subscales. The first and second eigenvalues and their
ratios are shown in Table 4.
Then, the scale values (μP) of each category were calculated from the resulting category parameters, which were estimated by IRT. Only the subscale BF-EN in the 7-point scale was estimated from five items because no participants selected ‘‘Strongly disagree’’ for the second item. In addition, the resulting scale values were converted to the range from 1 to 4 points, 1 to 5 points, or 1 to 7 points according to the number of categories (converted item value). For instance, the converted scale values ranged from 1 to 4 points when the scale had four categories (see Note 2). The category parameters in the GPCM, the scale values, and the converted scale values are shown in Table 5, and the converted scale values are shown in Figures 2 to 4. In the BF-ER subscale of the 7-point scale, the fifth and sixth category parameters were not ordered.
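The conversion to the 1-to-K range is a linear map of the scale values, as worked through in Note 2. The sketch below is a minimal Python illustration using the 4-point BF-N scale values quoted in that note; the function name is ours, and the final shift that places the first category at 1 is our reading of the note rather than a formula stated there.

```python
def convert_scale_values(mu):
    """Map scale values linearly so the first becomes 1 and the last becomes K."""
    k = len(mu)
    width = mu[-1] - mu[0]                 # range w between mu_1 and mu_K
    gamma = [(k - 1) / width * m for m in mu]
    return [g - gamma[0] + 1.0 for g in gamma]


# Scale values of the 4-point BF-N items as quoted in Note 2.
mu_bfn_4 = [-1.787, -0.571, 0.570, 1.788]
converted = convert_scale_values(mu_bfn_4)
print([round(x, 3) for x in converted])
```

Because the inputs are rounded to three decimals, the converted values agree with the printed ones (1.000, 2.020, 2.978, 4.000) only up to rounding.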
Table 2. Mean and Standard Deviation of Each Subscale Score
                          Number of     BF-N                  BF-EN                 BF-ER
                          categories    N     M      SD       N     M      SD       N     M      SD
Conventional scale score  4             257   2.747  0.601    256   2.660  0.647    256   2.074  0.603
                          5             252   3.517  0.799    254   3.317  0.803    254   2.552  0.791
                          7             257   4.722  1.037    255   4.501  0.998    260   3.366  1.020
Adjusted scale score      4             257   3.434  0.752    256   3.325  0.809    256   2.593  0.754
                          5             252   3.517  0.799    254   3.317  0.803    254   2.552  0.791
                          7             257   3.373  0.741    255   3.215  0.713    260   2.404  0.729
Note: BF-N = Big Five Scale neuroticism items; BF-EN = Big Five Scale extraversion normal items; BF-ER = Big Five Scale extraversion reversed items.
Table 3. Estimates of Reliability (Cronbach’s α Coefficient)
Number of categories   BF-N   BF-EN   BF-ER
4                      .882   .865    .795
5                      .889   .859    .795
7                      .900   .858    .805
Note: BF-N = Big Five Scale neuroticism items; BF-EN = Big Five Scale extraversion normal items; BF-ER = Big Five Scale extraversion reversed items.
In the converted scale values of the 4-point scale shown in Figure 2, all converted scale values were located close to the conventional item values. In the 5-point scale shown in Figure 3, most converted scale values were also close to the conventional item values, except for the fourth category of BF-N. In contrast, in the 7-point scale shown in Figure 4, half of the converted scale values deviated from the conventional item values. For instance, the fourth and fifth categories of BF-N were smaller than their conventional item values, and the fifth and sixth categories of BF-ER were disproportionately close to 7. Consequently, only the IRT-based evaluation, as shown in the figures of the converted scale values, revealed that the psychological distance between categories was affected by the number of options.
Comparison of Conventional Scale Scores and the Converted Scale Scores
The conventional scale scores were calculated by summing the item scores that
assigned an integer value to each option, and the converted scale scores were calcu-
lated from the converted scale value. When calculating descriptive statistics, the dif-
ference in the absolute value between these scales was examined to determine the
difference between the scores (Table 6). The results indicated that the 4- and 5-point
scales had only slight differences less than 0.15, whereas the 7-point scale showed
larger differences in the BF-N and the BF-ER. The coefficient of correlation between
the conventional and the converted BF-N scores was lowest in the 7-point scale (i.e.,
0.993).
Finally, the correlations between the items, which influence factor analysis and structural equation modeling, were compared by focusing on BF-N, which had the largest variance in psychological distance between the 4- and 7-point scales. The maximum difference in the absolute value of the correlation between items was 0.181 (between Items 7 and 10), and the minimum was 0.002 (between Items 4 and 8).
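The comparison of conventional and converted scores can be illustrated for a single respondent. The sketch below is a minimal Python illustration: the converted values are those printed in Table 5 for the 7-point BF-N items, the answers are hypothetical (not study data), and scores are taken as the mean of the item values, which is our assumption based on the magnitudes reported in Table 2. The function name is ours.

```python
# Converted scale values for the 7-point BF-N items (Table 5).
CONVERTED_BFN_7 = [1.000, 2.169, 2.920, 3.507, 4.568, 5.936, 7.000]


def mean_scores(responses, converted):
    """Return (conventional, converted) mean item scores for one respondent.

    responses: chosen options coded 1..K; converted: converted value per option.
    """
    conventional = sum(responses) / len(responses)
    rescored = sum(converted[r - 1] for r in responses) / len(responses)
    return conventional, rescored


# Hypothetical answers of one respondent to the 11 BF-N items (not study data).
answers = [4, 5, 4, 3, 5, 4, 6, 4, 5, 3, 4]
conv, resc = mean_scores(answers, CONVERTED_BFN_7)
print(round(conv, 3), round(resc, 3), round(abs(conv - resc), 3))
```

For a respondent who mostly uses the middle options, the converted score falls below the conventional score, because the converted values of Categories 3 to 6 lie below their integer codes.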
Table 4. Eigenvalues of Each Subscale
Number of categories   Eigenvalue     BF-N    BF-EN   BF-ER
4                      First (λ1)     5.189   3.594   3.007
                       Second (λ2)    1.070   0.772   0.829
                       λ1/λ2          4.849   4.656   3.628
5                      First (λ1)     5.362   3.562   3.013
                       Second (λ2)    1.008   0.779   0.870
                       λ1/λ2          5.319   4.576   3.464
7                      First (λ1)     5.619   3.545   3.072
                       Second (λ2)    1.105   0.886   0.757
                       λ1/λ2          5.087   4.000   4.059
Note: BF-N = Big Five Scale neuroticism items; BF-EN = Big Five Scale extraversion normal items; BF-ER = Big Five Scale extraversion reversed items.
Table 5. Category Parameters in Item Response Theory and Scale Value of Each Category
(Parameter = category parameter; μ = scale value; Converted = converted scale value)

                      4 categories                   5 categories                   7 categories
Subscale  Category    Parameter        μ  Converted  Parameter        μ  Converted  Parameter        μ  Converted
BF-N      1               1.320   -1.787      1.000      1.176   -1.668      1.000      1.767   -2.169      1.000
          2               0.001   -0.571      2.020      0.354   -0.723      2.078      0.947   -1.284      2.169
          3              -1.321    0.570      2.978     -0.150   -0.100      2.790      0.509   -0.716      2.920
          4                        1.788      4.000     -1.379    0.674      3.673      0.045   -0.272      3.507
          5                                                       1.836      5.000     -1.271    0.530      4.568
          6                                                                            -1.996    1.565      5.936
          7                                                                                      2.370      7.000
BF-EN     1               1.492   -1.932      1.000      1.641   -2.059      1.000      1.894   -2.280      1.000
          2              -0.066   -0.582      2.063      0.589   -1.018      1.986      1.467   -1.655      1.796
          3              -1.426    0.640      3.026     -0.470   -0.054      2.899      0.492   -0.905      2.753
          4                        1.876      4.000     -1.761    0.973      3.873     -0.250   -0.116      3.761
          5                                                       2.163      5.000     -1.546    0.782      4.905
          6                                                                            -2.057    1.763      6.157
          7                                                                                      2.424      7.000
BF-ER     1               1.413   -1.865      1.000      1.419   -1.870      1.000      2.102   -2.464      1.000
          2               0.163   -0.692      1.910      0.698   -1.014      1.892      1.240   -1.573      2.192
          3              -1.576    0.549      2.872     -0.582   -0.051      2.896      0.628   -0.905      3.086
          4                        2.004      4.000     -1.535    0.982      3.972     -0.605   -0.010      4.285
          5                                                       1.968      5.000     -1.772    1.063      5.722
          6                                                                            -1.593    1.678      6.545
          7                                                                                      2.018      7.000
Note: BF-N = Big Five Scale neuroticism items; BF-EN = Big Five Scale extraversion normal items; BF-ER = Big Five Scale extraversion reversed items.
Discussion
Method for Evaluating Psychological Distance
To clarify how the number of categories influences psychological distance in the
Likert scale, 4-, 5- and 7-point scales of the same psychological scale and with the
same instructions were compared. Moreover, this study proposed a new method for
measuring the scale values to examine the distance between items.
Figure 2. Converted item values of the 4-category scale
Figure 3. Converted item values of the 5-category scale
Figure 4. Converted item values of the 7-category scale
Note (Figures 2 to 4): BF-N = Big Five Scale neuroticism items; BF-EN = Big Five Scale extraversion normal items; BF-ER = Big Five Scale extraversion reversed items.
Our new IRT method, which is based on the method reported by Wakita (2004),
enabled a discussion of the number of categories in the Likert scale derived from the
psychological distance in the rating scales, which has been previously discussed from
the perspective of the estimates of the reliability coefficient.
Descriptive Statistics (Mean and Standard Deviation)
The descriptive statistics suggest that in the 7-point scale, the participants tended to
select somewhat negative answers, such as ‘‘Disagree,’’ and that they avoided select-
ing both ends of categories. These tendencies might imply that an increase in the
number of options biases respondents against answers containing the strongest
expressions.
Reliability
The coefficient of reliability was independent of the number of categories in this
study, a finding that is consistent with previous studies showing that the appropriate
number of categories cannot be determined based on the estimates of the coefficient
of reliability (Bendig, 1953, 1954; Brown et al., 1991; Komorita, 1963; Matell &
Jacoby, 1971).
Estimates of Scale Values for Each Category (Converted Scale Values Obtained by IRT)
A comparison of the numbers of categories indicated that the psychological distance deviated more as the number of categories increased in the BF-N and BF-ER subscales. In the 5-point scale, deviation in the psychological distance was seen, especially in the BF-N subscale. In the 7-point scale, deviation was seen in all three subscales; however, the psychological distance deviated more in the BF-N and BF-ER subscales than in the BF-EN subscale.

Table 6. Difference Between Conventional and Converted Item Values
(Absolute difference between the conventional item value and the converted item value)
Number of     BF-N                           BF-EN                          BF-ER
categories    M      SD     Range            M      SD     Range            M      SD     Range
4             0.007  0.005  (0.000-0.022)    0.032  0.017  (0.000-0.063)    0.067  0.033  (0.000-0.128)
5             0.142  0.080  (0.000-0.316)    0.074  0.033  (0.000-0.127)    0.064  0.028  (0.000-0.108)
7             0.224  0.119  (0.000-0.465)    0.130  0.074  (0.000-0.246)    0.260  0.134  (0.000-0.649)
Note: BF-N = Big Five Scale neuroticism items; BF-EN = Big Five Scale extraversion normal items; BF-ER = Big Five Scale extraversion reversed items. Values in parentheses represent ranges.
In this study, the number of categories did not influence the descriptive statistics
and the estimates of the reliability coefficient, but it did influence the item values.
Consequently, the psychological distance estimated by the converted item value by
the IRT deviated more in the 7-point scale than in the 4- and 5-point scales. In addi-
tion, the 7-point scale did not function well because of the reversal of the category
parameters shown in Table 5. Furthermore, this deviation was greater when items
asked about socially negative personality traits shown in the BF-N and the BF-ER. In
short, these results imply that an attempt to set a neutral category such as ‘‘Neither
agree nor disagree’’ between positive and negative categories did not accomplish the
intended purpose. These results suggest that it was not necessary to adopt the 7-point scale, which requires more time, and that the psychological distance was sensitive to items with socially negative contents. The latter suggestion supports the following two points on the basis of statistical evidence. First, the wording of a rating scale should be considered carefully when participants are asked to rate reversed items. Second, self-reported questionnaires using the Likert scale are clearly affected by social desirability bias.
Our study not only identified weak points in Likert scales but also suggested a
practical method for developing new questionnaires and modifying established items.
The new method presented here demonstrated the inequality in the psychological dis-
tance of the Likert scale. When developing new scales, our IRT method enabled us to
ensure equality in the psychological distance between options, allowing us to select
suitable expressions for anchors and an appropriate number of options. For example,
whether an increase in the number of positive ratings in a scale such as ‘‘Disagree,’’
‘‘Slightly agree,’’ ‘‘Somewhat agree,’’ ‘‘Moderately agree,’’ and ‘‘Strongly agree’’
would improve the deviation of the responses when the items might be influenced by
social desirability could be shown by scale values. Such manipulation has not yet
been used but is necessary to support the important original assumption of the Likert
scale that the psychological distance between items is equal.
Limitations and Future Direction
This study aimed to examine whether the number of options had an effect on the psychological distance in the Likert scale by applying IRT to consider the appropriate number of options. The results of the IRT analysis indicated that the number of options had an effect on the responses, especially in the 7-point scale. However, this study assessed only one major psychological (personality) scale using the anchors shown in Table 1. In addition, participants were all undergraduate students. Surveys using other scales and in other populations are needed before the results can be generalized.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship,
and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of
this article.
Notes
1. When the item values before the conversion are denoted by x and those after the conversion by y, $y = \frac{5}{4}x$ in the case of the 4-point scale and $y = \frac{5}{7}x$ in the case of the 7-point scale.
2. When using the 4-point scale (e.g., BF-N), the range w of the difference from μ1 (–1.787) to μ4 (1.788) is 3.575, and $\gamma_n = \frac{4-1}{3.575}\,\mu_n$. The values of γn are –1.500, –0.479, 0.479, and 1.500 for γ1 to γ4, in order. Then xn is 1.000, 2.020, 2.978, and 4.000 for x1 to x4, in order.
References
Bendig, A. W. (1953). The reliability of self-ratings as a function of the amount of verbal
anchoring and the number of categories on the scale. Journal of Applied Psychology, 37,
38-41.
Bendig, A. W. (1954). Reliability and the number of rating scale categories. Journal of Applied
Psychology, 38, 38-40.
Boote, A. S. (1981). Reliability testing of psychographic scales: Five-point or seven-point?
Anchored or labeled? Journal of Advertising Research, 21, 53-60.
Brown, G., Wilding, R. E., & Coulter, R. L. (1991). Customer evaluation of retail salespeople using the SOCO scale: A replication extension and application. Journal of the Academy of Marketing Science, 19, 347-351.
Chang, L. (1994). A psychometric evaluation of four-point and six-point Likert-type scales in
relation to reliability and validity. Applied Psychological Measurement, 18, 205-215.
Cicchetti, D. V., Showalter, D., & Tyrer, P. J. (1985). The effect of number of rating scale
categories on levels of inter-rater reliability: A Monte-Carlo investigation. Applied
Psychological Measurement, 9, 31-36.
Garner, W. R. (1960). Rating scales, discriminability and information transmission.
Psychological Review, 67, 343-352.
Green, P. E., & Rao, V. R. (1970). Rating scales and information recovery: How many scales
and response categories to use? Journal of Marketing, 34, 33-39.
Komorita, S. S. (1963). Attitude content, intensity, and the neutral point on a Likert scale.
Journal of Social Psychology, 61, 327-334.
Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on reliability: A
Monte-Carlo approach. Journal of Applied Psychology, 60, 10-13.
Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale
items? Study 1: Reliability and validity. Educational and Psychological Measurement, 31,
657-674.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm.
Applied Psychological Measurement, 16, 159-176.
Muraki, E., & Bock, R. D. (2003). PARSCALE: Parameter Scaling of Rating Data [Computer
program]. Chicago, IL: Scientific Software.
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1-15.
Oaster, T. R. F. (1989). Number of alternatives per choice point and stability of Likert-type
scales. Perceptual and Motor Skills, 68, 549-550.
Schutz, H. G., & Rucker, M. H. (1975). A comparison of variables configurations across scale lengths: An empirical study. Educational and Psychological Measurement, 35, 319-324.
Wada, S. (1996). Construction of the Big Five Scales of personality trait terms and concurrent validity with NPI. Japanese Journal of Psychology, 67, 61-67.
Wakita, T. (2004). The distance between categories in rating-scale method: Applying item
response model to the assessment process. Japanese Journal of Psychology, 75, 331-338.