Educational and Psychological Measurement · 2019. 1. 24. · 534 Educational and Psychological Measurement 72(4) Downloaded from epm.sagepub.com at University of Central Florida

Educational and Psychological Measurement

http://epm.sagepub.com

The online version of this article can be found at: http://epm.sagepub.com/content/72/4/533

DOI: 10.1177/0013164411431162

Educational and Psychological Measurement 2012 72: 533, originally published online 12 January 2012

Takafumi Wakita, Natsumi Ueshima and Hiroyuki Noguchi

Psychological Distance Between Categories in the Likert Scale: Comparing Different Numbers of Options

Published by:

http://www.sagepublications.com


- OnlineFirst Version of Record: Jan 12, 2012

- Version of Record: Jul 6, 2012



Educational and Psychological Measurement

72(4) 533–546

© The Author(s) 2012

Reprints and permission: sagepub.com/journalsPermissions.nav

DOI: 10.1177/0013164411431162

http://epm.sagepub.com

Psychological Distance Between Categories in the Likert Scale: Comparing Different Numbers of Options

Takafumi Wakita1, Natsumi Ueshima2, and Hiroyuki Noguchi3

Abstract

This study examined whether the number of options in the Likert scale influences the psychological distance between categories. The most important assumption when using the Likert scale is that the psychological distance between options is equal. The authors proposed a new algorithm for calculating the scale values of options by applying item response theory and the ideas of Wakita to reveal the influence of the number of categories. Three types of questionnaires that were composed of the same items, but used different numbers of options to assess these items (specifically, 4-, 5-, and 7-point scales), were completed by 772 undergraduate students. The results indicated that the number of options influenced the psychological distance between options, particularly for the 7-point scale. This influence was revealed only by the authors’ algorithm; descriptive statistics and coefficients of reliability did not show that the number of options had a prominent influence. The importance of the number of options and the new algorithm are discussed.

Keywords

Likert scale, number of options, item response theory

1 Kansai University, Osaka, Japan

2 Chita Child Rearing Main Support Center, Kyoto, Japan

3 Nagoya University, Nagoya, Japan

Corresponding Author:

Takafumi Wakita, Faculty of Sociology, Kansai University, 3-3-35, Yamate-cho, Suita, Osaka 564-8680, Japan

Email: [email protected]



Background

The Likert scale is the most commonly used psychometric scale among psychological measurements that require self-reporting. For this scale, it is assumed that if the psychological distance between categories is equal, the scale will provide exact measurements of the psychological trait being assessed. This assumption about the psychological distance between categories is the most important factor in the Likert scale. However, no conclusion has been reached regarding the influence of a different number of options on the Likert scale, and no previous research has examined the impact of the number of options on the psychological distance between the options.

The number of options has been a central issue for researchers in extracting information from participants since Garner (1960) reported that psychological scales require more than 20 categories to derive complete information from answers. A decade later, Green and Rao (1970) reported that six or seven categories were appropriate. In contrast, Schutz and Rucker (1975) suggested that the number of options might not affect participants’ responses. Consequently, no consensus has been reached regarding the number of options required.

Most Likert scales include four to seven categories. An odd number of options is used when researchers need a neutral anchor, such as “Neither agree nor disagree,” whereas an even number of options is used when researchers intend to elicit participants’ opinions or attitudes through answers such as “Agree” or “Disagree.”

Previous research also investigated the appropriate number of options from the perspective of statistical reliability. Lissitz and Green (1975) and Boote (1981) suggested that a 5-point scale was reliable. Cicchetti, Showalter, and Tyrer (1985) examined the interrater reliability using a Monte Carlo simulation and reported an increase in reliability when the number of categories was less than eight. Oaster (1989) indicated that a 7-point scale showed the highest test–retest reliability. Preston and Colman (2000) also revealed that a scale with two to four categories showed the lowest test–retest reliability, and a scale with seven or more categories showed the highest test–retest reliability; however, there was no relation between the number of options and criterion-related validity among scales with 2 to 11 categories. These results indicate that 7-point scales are likely to show higher reliability than scales with other numbers of options. Chang (1994) compared 4- and 6-point scales for the same items and suggested that an increase in the number of categories did not always result in higher reliability. However, other studies have indicated that reliability is independent of the number of options (Bendig, 1953, 1954; Brown, Wilding, & Coulter, 1991; Komorita, 1963; Matell & Jacoby, 1971). In these previous studies, the number of options was discussed from the perspective of reliability, which estimates only the random component of the error of measurement. The main target of the present study, the psychological distance between options, is considered to be more suitable for assessing the systematic component of the error of measurement than Cronbach’s α, intraclass correlation, and test–retest reliability.

The number of options has also been examined from the perspective of how participants feel when considering the appropriate option. Preston and Colman (2000) examined the following questions with the same participants: (a) ease of rating, (b) time required to select an answer, and (c) participants’ satisfaction with their ability to express their feelings. Their results suggested that 5 to 10 categories were easy to rate. In addition, 5 categories were evaluated as being short enough to select an answer quickly and 3 or 4 categories were evaluated as being complete enough for participants to express their feelings satisfactorily. Thus, these results indicate that a maximum of 5 categories is adequate for most scales.

Although the number of options has been considered from the viewpoints of researcher orientation, statistical reliability, and participant evaluation, no previous studies focused on the assumption of the original Likert scale (that is, that the psychological distance between categories is equal) when evaluating the appropriate number of options.

Many psychological scales include a neutral category, such as “Neither agree nor disagree,” to allocate equal psychological distance between the neutral category and the adjacent side categories in line with the assumption that the psychological distance between categories must be equal.

Wakita (2004) described a method for estimating the widths of each category (Figure 1) and showed that the widths were affected by the item contents. The widths were defined as W1 = C2 − C1, W2 = C3 − C2, and W3 = C4 − C3, and it was shown that the psychological distances between each category were equal when W1:W2:W3 = 1:1:1. W1:W2:W3 was skewed when the item contents were negative; specifically, the width of the neutral category was significantly narrower than the widths of the other categories. However, the widths alone are not adequate for discussing the psychological distance between options. To discuss psychological distance in detail, we must obtain scale values for the categories shown in Figure 1.
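As a small illustration of the width ratio described above, the following Python sketch computes the widths between adjacent boundaries and expresses them relative to their mean, so that equal spacing appears as 1:1:1. The boundary values and function names are hypothetical and chosen only to show the arithmetic; they are not taken from Wakita (2004).

```python
def category_widths(boundaries):
    """Return the widths W_i = C_(i+1) - C_i between adjacent boundaries."""
    return [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]


def width_ratio(boundaries):
    """Express the widths relative to their mean, so equal widths give 1:1:1."""
    widths = category_widths(boundaries)
    mean_width = sum(widths) / len(widths)
    return [w / mean_width for w in widths]


# Hypothetical boundaries C1..C4 for a 5-point item (illustrative values only).
equal = [-1.5, -0.5, 0.5, 1.5]
narrow_neutral = [-1.5, -0.2, 0.2, 1.5]

print(width_ratio(equal))           # equally spaced boundaries
print(width_ratio(narrow_neutral))  # W2, the neutral category, is narrower
```

With four boundaries on a 5-point item, W2 is the width of the middle (neutral) category, so a ratio below 1 in the second position corresponds to the narrowing described in the text.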

The present study presents a new formula for obtaining scale values that correspond to each original category in order to reveal the differences between these scale values and the original categories. We aimed to examine the appropriate number of categories for Likert scales, focusing on the psychological distance between categories, and clarify how the number of options affects this distance. For the purpose of this study, 4-, 5-, and 7-point scales were used for the same personality scale.

Method

Formula for Calculating Scale Values

Item response theory (IRT) was applied to calculate the scale values of each category in this study. The IRT model used was the generalized partial credit model (GPCM; Muraki, 1992). The new formula rests on the following two assumptions:

Assumption 1: In the Likert scale, a latent continuum is assumed to exist behind each category, and this continuum is divided to give the interval to each category. A border to the next category is assumed to exist at a midpoint between the adjacent categories on the rating scale continuum (Figure 1). Thus, each category of the rating scale covers a certain range on the rating scale continuum; however, the two end categories are open-ended intervals.

Assumption 2: In the GPCM, the intersection of two adjacent categories is defined as the point representing category parameters. This intersection is assigned on the borders of the rating continuum in the Likert scale (Assumption 1).
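To make Assumption 2 concrete, the following Python sketch computes category probabilities under one common way of writing the GPCM and checks that two adjacent categories are equally likely at the corresponding step parameter, that is, at the intersection of their response curves. The item parameters are hypothetical, the scaling constant is omitted, and the step parameterization may differ in sign and scaling from the output of a particular estimation program.

```python
import math


def gpcm_probabilities(theta, a, steps):
    """Category probabilities under a generalized partial credit model.

    theta: latent trait value
    a:     item discrimination (slope)
    steps: step parameters b_1..b_(m-1), one per border between categories
    """
    # Cumulative sums of a * (theta - b_k); the first category has sum 0.
    cumulative = [0.0]
    for b in steps:
        cumulative.append(cumulative[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(z - top) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical 4-category item with ordered step parameters (assumed values).
steps = [-1.3, 0.0, 1.3]
for k, b in enumerate(steps):
    probs = gpcm_probabilities(theta=b, a=1.2, steps=steps)
    # At theta = b_k, categories k and k+1 (0-based) are equally likely.
    print(k, round(probs[k], 4), round(probs[k + 1], 4))
```

Under this parameterization the intersection points are exactly the step parameters, which is why they can be read as borders on the rating continuum.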

Scale Value

If values on the continuum are assumed to be normally distributed and the continuum is divided at the category parameters, the expectation of each interval is defined as the scale value of that category. For example, the expectation of the interval between $-\infty$ and the first category parameter ($C_1$) is defined as the scale value of the first category ($\mu_1$), and the expectation of the interval between the first category parameter ($C_1$) and the second category parameter ($C_2$) is defined as the scale value of the second category ($\mu_2$). Therefore, in the case of

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}},$$

the scale value ($\mu_P$) of the $P$th category, which is the expectation of $[C_{P-1}, C_P]$, is obtained by

$$\mu_P = \int_{C_{P-1}}^{C_P} x \times \left\{ \frac{f(x)}{\int_{C_{P-1}}^{C_P} f(x)\,dx} \right\} dx = \frac{f(C_{P-1}) - f(C_P)}{\int_{C_{P-1}}^{C_P} f(x)\,dx}.$$

When $C_0$ is $-\infty$ and $f(C_0) = 0$ and the number of categories is $m$, $C_m$ for the $m$th category would be $+\infty$ and $f(C_m) = 0$. Thus, the resulting $\mu_P$ is defined as the scale value of the $P$th category.

Figure 1. Calculating scale value (μ). The figure shows a 5-category Likert scale on a continuum: the borders C1 to C4 separate Categories 1 to 5; Wakita (2004) used the widths C2–C1, C3–C2, and C4–C3, whereas the current study assigns the scale values μ1 to μ5 to the categories.
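The scale-value formula can be sketched in a few lines of Python using only the standard library. The function names are illustrative. The border values are an assumption made for this example: they are the 4-point BF-N category parameters reported in Table 5 with their signs reversed so that they increase along the continuum, a reading under which the formula reproduces the reported scale values to about three decimals.

```python
import math


def pdf(x):
    """Standard normal density f(x); returns 0 at +/- infinity."""
    if math.isinf(x):
        return 0.0
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    if math.isinf(x):
        return 0.0 if x < 0 else 1.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def scale_values(boundaries):
    """Scale value of each category: the mean of a standard normal
    variable truncated to [C_(P-1), C_P], with C_0 = -inf and C_m = +inf."""
    cuts = [-math.inf] + list(boundaries) + [math.inf]
    values = []
    for lower, upper in zip(cuts, cuts[1:]):
        mass = cdf(upper) - cdf(lower)  # integral of f over the interval
        values.append((pdf(lower) - pdf(upper)) / mass)
    return values


# Assumed borders for a 4-category scale (sign-reversed Table 5 parameters).
print([round(v, 3) for v in scale_values([-1.320, -0.001, 1.321])])
```

The denominator is the probability mass of the interval, so each scale value is the mean of the truncated normal distribution on that interval; the open-ended first and last categories are handled by setting the density to zero at infinity.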

Management of Number of Options

The Big Five Scale (Wada, 1996), a major personality scale that is commonly used with different numbers of options, was modified into three types of questionnaires with different numbers of options. From its subscales, 11 neuroticism items (BF-N) and 12 extraversion items (BF-extraversion normal [EN] and BF-extraversion reversed [ER]) were selected. BF-N comprises items that ask about socially negative attitudes, BF-EN asks about socially positive attitudes, and BF-ER asks about socially negative attitudes toward extraversion. For these items, 4-, 5-, and 7-point categories were used as follows: (a) a 4-point scale was adopted based on its frequency of use and participants’ satisfaction with expressing their feelings (Preston & Colman, 2000), (b) a 5-point scale was set up based on its frequency of use and ease of selecting an answer (Preston & Colman, 2000), and (c) a 7-point scale was set up based on the higher reliability of this number of options shown by Cicchetti, Showalter, and Tyrer (1985), Oaster (1989), and Preston and Colman (2000). These numbers of categories are commonly used in psychological and clinical research. The expressions of ratings for each scale are described in Table 1. The order of the items and other parts of the questionnaire were not changed.

Table 1. Expressions of Ratings for Each Scale

| Number of categories | Anchors |
|---|---|
| 4 | Disagree; Slightly disagree; Slightly agree; Agree |
| 5 | Disagree; Slightly disagree; Neither agree nor disagree; Slightly agree; Agree |
| 7 | Strongly disagree; Almost disagree; Do not really disagree; Neither agree nor disagree; Do not really agree; Slightly agree; Strongly agree |

Participants and Study Period

Participants comprised 772 undergraduate students. The questionnaire was completed anonymously, and a response to the questionnaire was considered to represent informed consent to participate in the study. Questionnaires were administered in the autumn semester of 2002.

Procedure

Questionnaires were randomly administered during a lecture (4-point scale, n = 258; 5-point scale, n = 254; 7-point scale, n = 260) with each participant answering one questionnaire (4-, 5-, or 7-point scale).

Results

Analysis

To compare the characteristics of each number of categories, the following three points were examined: (a) the mean and standard deviation (SD) of each subscale score, (b) the estimates of the coefficient of reliability (Cronbach’s α coefficient), and (c) the estimates of the scale value based on IRT. Subsequently, the relation between the conventional scale values and the estimated scale value (converted scale score) in (c) was examined. The estimates of scale values by IRT were obtained based on the category parameters of the GPCM (Muraki, 1992). PARSCALE 4.1 (Muraki & Bock, 2003) was used.

Descriptive Statistics (Mean and Standard Deviation)

Each category was assigned a consecutive integer item value (1 point, 2 points, and so on) from the first category to the last category, and the mean of the item values was used as the corresponding subscale score. To compare these values, the scale scores of the 4-point scale and 7-point scale were adjusted to the same range as the 5-point scale (adjusted scale scores; see Note 1). The results showed that the mean and SD of each subscale score were not significantly different except for the 7-point scale (Table 2). In the 7-point scale, the adjusted scale score was slightly lower than in the other two scales, and the SD was also slightly smaller.
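The adjustment described in Note 1 is a simple multiplicative rescaling, sketched below in Python. The function name is illustrative; the input values are the conventional BF-N means reported in Table 2.

```python
def adjust_to_five_point(score, n_categories):
    """Rescale a score from a K-point scale by the factor 5 / K (Note 1)."""
    return score * 5.0 / n_categories


# Conventional BF-N means reported in Table 2 for the 4-, 5-, and 7-point scales.
for k, mean in [(4, 2.747), (5, 3.517), (7, 4.722)]:
    print(k, round(adjust_to_five_point(mean, k), 3))
```

Because the rescaling is purely multiplicative, it aligns the upper end of each scale with 5 but maps the lowest item value of 1 to 1.25 on the 4-point scale and to about 0.71 on the 7-point scale, which is worth keeping in mind when comparing adjusted means.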

Reliability

The estimates of reliability were obtained by using Cronbach’s α coefficient (Table 3). No subscale showed an obvious difference in α based on the number of categories.
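For readers who want to reproduce this kind of reliability estimate, the following is a minimal Python sketch of Cronbach’s α computed from item variances and the variance of the total score. The response data are hypothetical and are not the study data; the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses of five participants to three 5-point items.
data = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
print(round(cronbach_alpha(data), 3))
```

Because α depends only on the covariance structure of the item scores, it does not by itself reveal whether the distances between response categories are equal.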




Estimates of Scale Values for Each Category

The eigenvalues of the correlation matrix were examined to confirm the unidimensionality of the scale as a latent trait in the IRT model, and the unidimensionality of the scale was confirmed in all subscales. The first and second eigenvalues and their ratios are shown in Table 4.

Then, the scale values ($\mu_P$) of each category were calculated from the resulting category parameters, which were estimated by IRT. Only the subscale BF-EN in the 7-point scale was estimated from five items because no participants selected “Strongly disagree” for the second item. In addition, the resulting scale values were converted to the range from 1 to 4 points, 5 points, or 7 points according to the number of categories (converted item value). For instance, the converted scale values ranged from 1 to 4 points when the scale had four categories (see Note 2). The category parameters in the GPCM, the scale values, and the converted scale values are shown in Table 5, and the converted scale values are shown in Figures 2 to 4. In the BF-ER, the fifth and sixth parameters were not ordered.
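The conversion to the 1-to-K range can be sketched as a linear mapping that places the first category at 1 and the last category at K. The exact form used below, x_n = 1 + (K − 1)(μ_n − μ_1)/w, is our reading of Note 2, which gives the scaling factor and the resulting values but not the final shift, so it should be treated as an assumption. The input values are the 4-point BF-N scale values from Table 5.

```python
def convert_scale_values(mu):
    """Linearly map scale values so the first category is 1 and the last is K."""
    k = len(mu)
    width = mu[-1] - mu[0]  # range w from mu_1 to mu_K
    return [1.0 + (k - 1) * (m - mu[0]) / width for m in mu]


# Scale values of the 4-point BF-N subscale (Table 5; see also Note 2).
mu_bfn_4 = [-1.787, -0.571, 0.570, 1.788]
print([round(x, 3) for x in convert_scale_values(mu_bfn_4)])
```

After this mapping, the deviation of each interior category from its integer item value indicates how far the estimated psychological distance departs from equal spacing.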

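Table 2. Mean and Standard Deviation of Each Subscale Score

| Score | Number of categories | BF-N N | BF-N M | BF-N SD | BF-EN N | BF-EN M | BF-EN SD | BF-ER N | BF-ER M | BF-ER SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Conventional scale score | 4 | 257 | 2.747 | 0.601 | 256 | 2.660 | 0.647 | 256 | 2.074 | 0.603 |
| Conventional scale score | 5 | 252 | 3.517 | 0.799 | 254 | 3.317 | 0.803 | 254 | 2.552 | 0.791 |
| Conventional scale score | 7 | 257 | 4.722 | 1.037 | 255 | 4.501 | 0.998 | 260 | 3.366 | 1.020 |
| Adjusted scale score | 4 | 257 | 3.434 | 0.752 | 256 | 3.325 | 0.809 | 256 | 2.593 | 0.754 |
| Adjusted scale score | 5 | 252 | 3.517 | 0.799 | 254 | 3.317 | 0.803 | 254 | 2.552 | 0.791 |
| Adjusted scale score | 7 | 257 | 3.373 | 0.741 | 255 | 3.215 | 0.713 | 260 | 2.404 | 0.729 |

Note: BF-N = Big Five Scale, neuroticism items; BF-EN = Big Five Scale, extraversion normal items; BF-ER = Big Five Scale, extraversion reversed items.

Table 3. Estimates of Reliability (Cronbach’s α Coefficient)

| Number of categories | BF-N | BF-EN | BF-ER |
|---|---|---|---|
| 4 | .882 | .865 | .795 |
| 5 | .889 | .859 | .795 |
| 7 | .900 | .858 | .805 |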



For the 4-point scale shown in Figure 2, all converted scale values were located close to the conventional item values. For the 5-point scale shown in Figure 3, most converted scale values were also close to the conventional item values, except for the fourth category of BF-N. In contrast, for the 7-point scale shown in Figure 4, half of the converted scale values deviated from the conventional item values. For instance, the fourth and fifth categories of BF-N were smaller than their conventional item values, and the fifth and sixth categories of BF-ER were disproportionately close to 7. Thus, only the IRT-based evaluation, as shown in the figures of the converted scale values, revealed that the psychological distance between categories was affected by the number of options.

Comparison of Conventional Scale Scores and the Converted Scale Scores

The conventional scale scores were calculated by summing the item scores that assigned an integer value to each option, and the converted scale scores were calculated from the converted scale values. When calculating descriptive statistics, the absolute value of the difference between the conventional and converted item values was examined to determine the difference between the scores (Table 6). The results indicated that the 4- and 5-point scales had only slight differences (less than 0.15), whereas the 7-point scale showed larger differences in the BF-N and the BF-ER. The coefficient of correlation between the conventional and the converted BF-N scores was lowest in the 7-point scale (i.e., 0.993).
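The comparison between conventional and converted scores can be illustrated with a short Python sketch. This is a simplification under stated assumptions: the responses are hypothetical, a single set of converted values (the 7-point BF-N values listed in Table 5) is applied to every item, and scores are averaged over items rather than summed. It is not the authors’ procedure, only an outline of the kind of calculation involved.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Converted values for the 7-point BF-N categories, as listed in Table 5.
converted = [1.000, 2.169, 2.920, 3.507, 4.568, 5.936, 7.000]

# Hypothetical responses (categories 1..7) of six participants to four items.
responses = [
    [1, 2, 2, 3],
    [3, 4, 4, 5],
    [5, 5, 6, 6],
    [2, 3, 4, 4],
    [6, 7, 7, 6],
    [4, 4, 5, 3],
]

# Score each participant with integer values and with converted values.
conventional = [sum(row) / len(row) for row in responses]
rescored = [sum(converted[c - 1] for c in row) / len(row) for row in responses]

mean_abs_diff = sum(abs(a - b) for a, b in zip(conventional, rescored)) / len(responses)
print(round(mean_abs_diff, 3), round(pearson(conventional, rescored), 3))
```

Even when the two sets of scores correlate very highly, the absolute differences can be non-trivial for respondents who mostly choose the categories whose converted values deviate from the integers.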

Finally, the correlations between the items, which influence factor analysis and structural equation modeling, were compared by focusing on BF-N, which had the largest variance in psychological distance between the 4- and 7-point scales. The maximum difference in the absolute value of the correlation between items was 0.181 (between Items 7 and 10), and the minimum was 0.002 (between Items 4 and 8).

Table 4. Eigenvalues of Each Subscale

| Number of categories | Eigenvalue | BF-N | BF-EN | BF-ER |
|---|---|---|---|---|
| 4 | First (λ1) | 5.189 | 3.594 | 3.007 |
| 4 | Second (λ2) | 1.070 | 0.772 | 0.829 |
| 4 | λ1/λ2 | 4.849 | 4.656 | 3.628 |

Note: BF-N = Big Five Scale, neuroticism items; BF-EN = Big Five Scale, extraversion normal items; BF-ER = Big Five Scale, extraversion reversed items.




Table 5. Category Parameters in Item Response Theory and Scale Value of Each Category

Columns are grouped by number of categories (4, 5, and 7). Within each group: category parameter, scale value (μ), and converted scale value.

| Subscale | Category | 4: Category parameter | 4: Scale value (μ) | 4: Converted scale value | 5: Category parameter | 5: Scale value (μ) | 5: Converted scale value | 7: Category parameter | 7: Scale value (μ) | 7: Converted scale value |
|---|---|---|---|---|---|---|---|---|---|---|
| BF-N | 1 | 1.320 | −1.787 | 1.000 | 1.176 | −1.668 | 1.000 | 1.767 | −2.169 | 1.000 |
| BF-N | 2 | 0.001 | −0.571 | 2.020 | 0.354 | −0.723 | 2.078 | 0.947 | −1.284 | 2.169 |
| BF-N | 3 | −1.321 | 0.570 | 2.978 | −0.150 | −0.100 | 2.790 | 0.509 | −0.716 | 2.920 |
| BF-N | 4 | | 1.788 | 4.000 | −1.379 | 0.674 | 3.673 | 0.045 | −0.272 | 3.507 |
| BF-N | 5 | | | | | 1.836 | 5.000 | −1.271 | 0.530 | 4.568 |
| BF-N | 6 | | | | | | | −1.996 | 1.565 | 5.936 |
| BF-N | 7 | | | | | | | | 2.370 | 7.000 |
| BF-EN | 1 | 1.492 | −1.932 | 1.000 | 1.641 | −2.059 | 1.000 | 1.894 | −2.280 | 1.000 |
| BF-EN | 2 | −0.066 | −0.582 | 2.063 | 0.589 | −1.018 | 1.986 | 1.467 | −1.655 | 1.796 |
| BF-EN | 3 | −1.426 | 0.640 | 3.026 | −0.470 | −0.054 | 2.899 | 0.492 | −0.905 | 2.753 |
| BF-EN | 4 | | 1.876 | 4.000 | −1.761 | 0.973 | 3.873 | −0.250 | −0.116 | 3.761 |
| BF-EN | 5 | | | | | 2.163 | 5.000 | −1.546 | 0.782 | 4.905 |
| BF-EN | 6 | | | | | | | −2.057 | 1.763 | 6.157 |
| BF-EN | 7 | | | | | | | | 2.424 | 7.000 |
| BF-ER | 1 | 1.413 | −1.865 | 1.000 | 1.419 | −1.870 | 1.000 | 2.102 | −2.464 | 1.000 |
| BF-ER | 2 | 0.163 | −0.692 | 1.910 | 0.698 | −1.014 | 1.892 | 1.240 | −1.573 | 2.192 |
| BF-ER | 3 | −1.576 | 0.549 | 2.872 | −0.582 | −0.051 | 2.896 | 0.628 | −0.905 | 3.086 |
| BF-ER | 4 | | 2.004 | 4.000 | −1.535 | 0.982 | 3.972 | −0.605 | −0.010 | 4.285 |
| BF-ER | 5 | | | | | 1.968 | 5.000 | −1.772 | 1.063 | 5.722 |
| BF-ER | 6 | | | | | | | −1.593 | 1.678 | 6.545 |
| BF-ER | 7 | | | | | | | | 2.018 | 7.000 |

Note: BF-N = Big Five Scale, neuroticism items; BF-EN = Big Five Scale, extraversion normal items; BF-ER = Big Five Scale, extraversion reversed items.



Discussion

Method for Evaluating Psychological Distance

To clarify how the number of categories influences psychological distance in the Likert scale, 4-, 5-, and 7-point versions of the same psychological scale, with the same instructions, were compared. Moreover, this study proposed a new method for measuring the scale values to examine the distance between categories.

Figure 2. Converted item values of the 4-category scale

Note: BF-N = Big Five Scale, neuroticism items; BF-EN = Big Five Scale, extraversion normal items; BF-ER = Big Five Scale, extraversion reversed items.









Our new IRT method, which is based on the method reported by Wakita (2004), enabled a discussion of the number of categories in the Likert scale in terms of the psychological distance in the rating scale; previously, this question had been discussed mainly from the perspective of estimates of the reliability coefficient.

Descriptive Statistics (Mean and Standard Deviation)

The descriptive statistics suggest that in the 7-point scale, the participants tended to select somewhat negative answers, such as “Disagree,” and that they avoided selecting the categories at both ends. These tendencies might imply that an increase in the number of options biases respondents against answers containing the strongest expressions.

Reliability

The coefficient of reliability was independent of the number of categories in this study, a finding that is consistent with previous studies showing that the appropriate number of categories cannot be determined based on the estimates of the coefficient of reliability (Bendig, 1953, 1954; Brown et al., 1991; Komorita, 1963; Matell & Jacoby, 1971).

Estimates of Scale Values for Category (Converted Scale Values Obtained by IRT)

A comparison of the numbers of categories indicated that the psychological distance deviated more as the number of categories increased in the BF-N and BF-ER subscales. In the 5-point scale, deviation in the psychological distance was seen, especially in the BF-N subscale. In the 7-point scale, deviation was seen in all three subscales; however, the psychological distance deviated more in the BF-N and BF-ER subscales than in the BF-EN subscale.

Table 6. Difference Between Conventional and Converted Item Values

Values are the absolute difference between the conventional item value and the converted item value.

| Number of categories | BF-N M | BF-N SD | BF-N range | BF-EN M | BF-EN SD | BF-EN range | BF-ER M | BF-ER SD | BF-ER range |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 0.007 | 0.005 | 0.000-0.022 | 0.032 | 0.017 | 0.000-0.063 | 0.067 | 0.033 | 0.000-0.128 |
| 5 | 0.142 | 0.080 | 0.000-0.316 | 0.074 | 0.033 | 0.000-0.127 | 0.064 | 0.028 | 0.000-0.108 |
| 7 | 0.224 | 0.119 | 0.000-0.465 | 0.130 | 0.074 | 0.000-0.246 | 0.260 | 0.134 | 0.000-0.649 |

Note: BF-N = Big Five Scale, neuroticism items; BF-EN = Big Five Scale, extraversion normal items; BF-ER = Big Five Scale, extraversion reversed items.

In this study, the number of categories did not markedly influence the descriptive statistics or the estimates of the reliability coefficient, but it did influence the item values. Specifically, the psychological distance estimated from the converted item values obtained by IRT deviated more in the 7-point scale than in the 4- and 5-point scales. In addition, the 7-point scale did not function well because of the reversal of the category parameters shown in Table 5. Furthermore, this deviation was greater when items asked about socially negative personality traits, as in the BF-N and the BF-ER. In short, these results imply that an attempt to set a neutral category such as “Neither agree nor disagree” between positive and negative categories did not accomplish the intended purpose. These results suggest that it was not necessary to adopt the 7-point scale, which requires more time, and that the psychological distance was sensitive to items with socially negative contents. The latter suggestion supports the following two perspectives based on statistical evidence. First, the wording of a rating scale should be considered carefully when participants are asked to rate reversed items. Second, self-reported questionnaires using the Likert scale are clearly affected by social desirability bias.

Our study not only identified weak points in Likert scales but also suggested a practical method for developing new questionnaires and modifying established items. The new method presented here demonstrated the inequality in the psychological distance of the Likert scale. When developing new scales, our IRT method can be used to check the equality of the psychological distance between options, which allows suitable expressions for anchors and an appropriate number of options to be selected. For example, scale values could show whether increasing the number of positive ratings, as in a scale with the anchors “Disagree,” “Slightly agree,” “Somewhat agree,” “Moderately agree,” and “Strongly agree,” would reduce the deviation of the responses when the items might be influenced by social desirability. Such manipulation has not yet been used but is necessary to support the important original assumption of the Likert scale that the psychological distance between categories is equal.

Limitations and Future Direction

This study aimed to examine whether the number of options had an effect on the psychological distance in the Likert scale by applying IRT to consider the appropriate number of options. The results of the IRT analysis indicated that the number of options had an effect on the response, especially in the 7-point scale. However, this study assessed only one major psychological (personality) scale using the anchors shown in Table 1. In addition, participants were all undergraduate students. Surveys using other scales and in other populations are needed before the results can be generalized.




Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship,

and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of

this article.

Note

1. When the item values before the conversion are set at $x$ and those after the conversion are set at $y$, $y = \frac{5}{4}x$ in the case of the 4-point scale and $y = \frac{5}{7}x$ in the case of the 7-point scale.

2. When using the 4-point scale (e.g., BF-N), the range $w$ of the difference from $\mu_1$ (−1.787) to $\mu_4$ (1.788) is 3.575, and $g_n = \frac{4-1}{3.575}\,\mu_n$. It is −1.500, −0.479, 0.479, and 1.500 from $g_1$ in order. Then $x_n$ is 1.000, 2.020, 2.978, and 4.000 from $x_1$ in order.

References

Bendig, A. W. (1953). The reliability of self-ratings as a function of the amount of verbal anchoring and the number of categories on the scale. Journal of Applied Psychology, 37, 38-41.

Bendig, A. W. (1954). Reliability and the number of rating scale categories. Journal of Applied Psychology, 38, 38-40.

Boote, A. S. (1981). Reliability testing of psychographic scales: Five-point or seven-point? Anchored or labeled? Journal of Advertising Research, 21, 53-60.

Brown, G., Wilding, R. E., & Coulter, R. L. (1991). Customer evaluation of retail salespeople using the SOCO scale: A replication extension and application. Journal of the Academy of Marketing Science, 19, 347-351.

Chang, L. (1994). A psychometric evaluation of four-point and six-point Likert-type scales in relation to reliability and validity. Applied Psychological Measurement, 18, 205-215.

Cicchetti, D. V., Showalter, D., & Tyrer, P. J. (1985). The effect of number of rating scale categories on levels of inter-rater reliability: A Monte-Carlo investigation. Applied Psychological Measurement, 9, 31-36.

Garner, W. R. (1960). Rating scales, discriminability and information transmission. Psychological Review, 67, 343-352.

Green, P. E., & Rao, V. R. (1970). Rating scales and information recovery: How many scales and response categories to use? Journal of Marketing, 34, 33-39.

Komorita, S. S. (1963). Attitude content, intensity, and the neutral point on a Likert scale. Journal of Social Psychology, 61, 327-334.

Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on reliability: A Monte-Carlo approach. Journal of Applied Psychology, 60, 10-13.

Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Study 1: Reliability and validity. Educational and Psychological Measurement, 31, 657-674.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.

Muraki, E., & Bock, R. D. (2003). PARSCALE: Parameter Scaling of Rating Data [Computer program]. Chicago, IL: Scientific Software.

Oaster, T. R. F. (1989). Number of alternatives per choice point and stability of Likert-type scales. Perceptual and Motor Skills, 68, 549-550.

Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1-15.

Schutz, H. G., & Rucker, M. H. (1975). A comparison of variable configurations across scale lengths: An empirical study. Educational and Psychological Measurement, 35, 319-324.

Wada, S. (1996). Construction of the Big Five Scales of personality trait terms and concurrent validity with NPI. Japanese Journal of Psychology, 67, 61-67.

Wakita, T. (2004). The distance between categories in rating-scale method: Applying item response model to the assessment process. Japanese Journal of Psychology, 75, 331-338.



