Qual Quant (2018) 52:1523–1559
https://doi.org/10.1007/s11135-017-0533-4
Anna DeCastellarnau, [email protected]
1 RECSM-Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, Mercè Rodoreda Building, 08003 Barcelona, Spain
2 Tilburg University, Tilburg, The Netherlands
A classification of response scale characteristics that affect data quality: a literature review
Anna DeCastellarnau 1,2
Published online: 24 July 2017. © The Author(s) 2017. This article is an open access publication
Abstract Quite a lot of research is available on the relationships between survey response
scales’ characteristics and the quality of responses. However, it is often difficult to extract
practical rules for questionnaire design from the wide and often mixed amount of empirical
evidence. The aim of this study is to provide first a classification of the characteristics of
response scales, mentioned in the literature, that should be considered when developing a
scale, and second a summary of the main conclusions extracted from the literature
regarding the impact these characteristics have on data quality. Thus, this paper provides
an updated and detailed classification of the design decisions that matter in questionnaire
development, and a summary of what is said in the literature about their impact on data
quality. It distinguishes between characteristics that have been demonstrated to have an
impact, characteristics for which the impact has not been found, and characteristics for
which research is still needed to make a conclusion.
Keywords Data quality · Measurement error · Literature review · Response
scale characteristics · Classification
1 Introduction
A challenge for questionnaire designers is to create survey measurement instruments (from
now on called: survey questions) that capture the true responses from the population. To do
so, they need to create survey questions that not only capture the theoretical concept under
evaluation, but that also minimize the impact of their design characteristics on the quality
of the responses.
than rating scales (Krosnick 1991; Krosnick and Berent 1993). When rating scales are
compared to continuous scales, like absolute metric scales or open-ended quantifiers,
evidence is mixed: continuous scales are more reliable in Saris and Gallhofer (2007), but in
Couper et al. (2001) and Miethe (1985) they provided higher item-nonresponse and lower
reliability, respectively, than rating scales, and no differences between the two have been
found in measurement quality by Koskey et al. (2013). Comparing rating to metric scales,
the latter appeared less reliable and led to higher item-nonresponse in the studies of
Cook et al. (2001), Couper et al. (2006) and Krosnick (1991); however, others find a
comparable impact between the two (Alwin 2007; Funke and Reips 2012; McKelvie 1978).
Finally, Al Baghal (2014b) compares closed with open-ended quantifiers and finds
non-significant differences in measurement quality.
Overall, the decision on type of scale to provide has an impact on data quality and
should be considered carefully when designing survey questions.
3.2.2 Scales’ length
The length of the scale is one of the key issues in scale development. As Krosnick and
Presser (2010, p. 269) say, ‘‘the length of scales can impact the process by which people
map their attitudes onto the response alternatives’’.
The minimum and maximum possible values are used to evaluate the length of continuous
scales. This characteristic has been studied to some extent. Reips and Funke (2008) argue
that differences in the length of metric scales may depend on the devices’ screen size and
resolution, while Saris and Gallhofer (2007) find a significant effect of the maximum
possible value of continuous scales on measurement quality.
The number of categories is used to evaluate the length of categorical scales. Among the
characteristics of categorical scales, the number of categories is one of the most studied
and complex design decisions: while a two-point scale allows only the assessment of the
direction of the attitude, a three-point scale with a midpoint allows the assessment of both
the direction and the neutrality, and even more categories allow the assessment of its
intensity or extremity. Furthermore, while too few categories can fail to discriminate
between respondents with different underlying opinions, too many categories may reduce
the clarity of the meaning of the options and limit the capacity of respondents to make clear
distinctions between them (Krosnick and Fabrigar 1997; Schaeffer and Presser 2003). The
results regarding its impact on data quality are mixed. Most evidence suggests using more
than 2 points to increase measurement quality (e.g. Andrews 1984). Some find evidence in
favour of using 5–7-points (Komorita and Graham 1965; Rodgers et al. 1992; Scherpenzeel
and Saris 1997). Others argue that options from 7 up to 10-points should be preferred
(Alwin and Krosnick 1991; Lundmark et al. 2016; Preston and Colman 2000). Some others
argue that even more categories, i.e. 11-points, can provide better measurements (Alwin
1997; Revilla and Ochoa 2015; Saris and Gallhofer 2007). Finally, others do not find
differences across different number of points (Aiken 1983; Bendig 1954; Jacoby and
Matell 1971; Matell and Jacoby 1971; McKelvie 1978). More recently, research has looked
at the specific circumstances of the questions when evaluating the impact of the number of
points. Some find, when distinguishing between item-specific and agree–disagree scales,
that the quality does not improve for agree–disagree scales with more than 5-points
(Revilla et al. 2014; Weijters et al. 2010) and for item-specific it goes up between 7 and
11-points (Alwin and Krosnick 1991; Revilla and Ochoa 2015). Similarly, Alwin (2007)
argues that the optimal number of points in a scale should be considered in relation to the
scale’s polarity, and shows that 4-point scales improved reliability for unipolar scales,
while 2-, 3- and 5-point scales improved reliability for bipolar scales.
This summary has clearly shown that the length of the scale is a characteristic to
consider.
3.3 The scales’ labels
3.3.1 Verbal labels
Verbal labels are words used as a reference to clarify the meaning of the different scale
points and their interval nature, and to reduce ambiguity (Alwin 2007; Krosnick and Presser
2010). However, it has been found that labelling all points increases the cognitive
effort of reading and processing all options (Krosnick and Fabrigar 1997; Kunz 2015).
Studies of their effects on response style bias show that acquiescence is higher and
extreme responding is lower with fully-labelled scales (Eutsler and Lang 2015; Moors et al.
2014; Weijters et al. 2010). Regarding reliability, some studies show higher reliability for
end-point-labelled scales than for fully-labelled scales (Andrews 1984; Rodgers et al.
1992), while the majority show that labelling all points in the scale has a positive impact on
reliability (Alwin 2007; Alwin and Krosnick 1991; Krosnick and Berent 1993; Menold
et al. 2014; Saris and Gallhofer 2007). Thus, verbal labels clearly have an impact on data quality.
Usually a distinction between fully-labelled, partially-labelled and not at all labelled is
made. However, there are multiple ways to design a scale partially-labelled and these
should also be considered when assessing its effects on data quality. Thus, I propose the
following distinction to cover the possible design choices in surveys: scales not at all
labelled, only labelled at the end-points, labelled at the end- and the midpoints, labelled at
the end- and more points but not all, and fully-labelled.
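The five-way distinction proposed above can be read as a simple decision rule over which scale points carry a verbal label. The following is a minimal sketch of that rule, not part of the original classification: the function name `classify_labelling`, the list-of-labels input format and the fallback class for patterns without both end labels are assumptions made for illustration.

```python
from typing import Optional, Sequence


def classify_labelling(labels: Sequence[Optional[str]]) -> str:
    """Classify a scale by which of its points carry a verbal label.

    `labels` holds one entry per scale point, in presentation order;
    None or an empty string means the point has no verbal label.
    Hypothetical helper, written only to illustrate the distinction.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("a scale needs at least two points")
    labelled = [bool(label) for label in labels]
    if not any(labelled):
        return "not at all labelled"
    if all(labelled):
        return "fully-labelled"
    ends = labelled[0] and labelled[-1]
    if ends and not any(labelled[1:-1]):
        return "end-points only"
    # A single midpoint exists only when the number of points is odd.
    if ends and n % 2 == 1 and labelled[n // 2] and sum(labelled) == 3:
        return "end- and midpoints"
    if ends:
        return "end- and more points, but not all"
    # Assumption: patterns lacking an end label fall outside the five classes.
    return "other partial labelling"
```

For example, a 5-point scale labelled only ‘‘Bad’’, ‘‘Neither’’ and ‘‘Good’’ at its first, third and fifth points is classified as labelled at the end- and midpoints.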
3.3.2 Verbal labels’ information
Verbal labels can differ in length and in the amount of information they provide. The more
information is provided in the labels, the less information is needed in the request. Saris and
Gallhofer (2007) distinguish between short labels and complete sentences and conclude that
reliability improves when short labels are used instead of sentences. Still, more
research is needed to assess the impact of this characteristic on data quality.
The length of a label does not by itself provide sufficient advice on how to design it.
For instance, even if using complete sentences may improve reliability, are very long labels
still preferable? For this reason, I believe that what affects data quality may be the
amount of information provided in the label rather than its length. Thus, I propose the
following differentiation. Non-conceptual labels require a previous specification of the
type of measurement concept. For instance, the labels ‘‘Not at all’’ and ‘‘Completely’’
cannot be used without a previous specification of the concept, for example in the form of a
question: ‘‘How satisfied are you with your job?’’. Scales can otherwise provide conceptual
labels like ‘‘Not at all satisfied’’. Verbal labels can also provide information about the
object and/or the subject under evaluation. An example of an objective label would be ‘‘Not at
all satisfied with my job’’, and of a subjective label, ‘‘I am not at all satisfied’’. Finally, a
full-informative label would be ‘‘I am not at all satisfied with my job’’.
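The differentiation above can be summarised as a small decision table over three pieces of information a label may carry: the concept, the subject and the object. The sketch below is only an illustration under that reading. The function name and the three boolean flags are hypothetical, and the assumption that a label naming a subject or object also states the concept is taken from the examples in the text rather than stated by it.

```python
def classify_label_information(
    states_concept: bool, names_subject: bool, names_object: bool
) -> str:
    """Map the information carried by a verbal label to the proposed classes.

    Hypothetical helper. Assumption: labels that name the subject or the
    object also state the concept, as in all examples given in the text.
    """
    if not states_concept:
        if names_subject or names_object:
            raise ValueError("combination not covered by the proposed distinction")
        return "non-conceptual"      # e.g. "Not at all"
    if names_subject and names_object:
        return "full-informative"    # e.g. "I am not at all satisfied with my job"
    if names_subject:
        return "subjective"          # e.g. "I am not at all satisfied"
    if names_object:
        return "objective"           # e.g. "Not at all satisfied with my job"
    return "conceptual"              # e.g. "Not at all satisfied"
```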
3.3.3 Quantifier labels
Two types of labels for closed quantifier scales can be distinguished. First, vague quantifier
labels which are known to be prone to different interpretations, e.g. ‘‘often’’ can mean
‘‘once a week’’ for a respondent and ‘‘once a day’’ for another (Pohl 1981; Saris and
Gallhofer 2014). In terms of their impact on data quality, no clear conclusions can be
drawn so far: Al Baghal (2014b) shows that measurement quality is not affected by
vague labels for closed quantifiers compared to open-ended responses, while Al Baghal
(2014a) finds higher levels of validity than in open-ended scales. Second, closed-range (or
interval) quantifier labels, compared to vague quantifiers, are argued to be more precise
and less prone to different interpretations (Saris and Gallhofer 2014). However, when
closed-range quantifiers are provided, respondents may use the frame of reference provided
by the scale in estimating their own behaviour (Schwarz et al. 1985). Selecting unbiased
ranges that allow respondents to use the middle of the scale as a reference point is preferable
(Revilla 2015). More research is needed to shed light on whether the use of vague or
closed-range quantifiers affects data quality.
3.3.4 Fixed reference points
Fixed reference points are verbal labels used in a scale to prevent variations in the response
functions and to leave no doubt about the position of the reference point in the mind
of the respondent (Saris 1988; Saris and Gallhofer 2014). For instance, ‘‘always’’
and ‘‘never’’ can be fixed reference points on objective scales, and ‘‘not at all’’,
‘‘completely’’, ‘‘absolutely’’ and ‘‘extremely’’ on subjective scales. Usually, these are
provided at the end-points of a scale. However, with closed-range quantifiers usually all
labels are fixed reference points (e.g. ‘‘from 1 to 2 h’’), and in bipolar scales, the midpoint
alternative is also one. The use of fixed reference labels makes the scale the same and
comparable for all respondents (Saris and De Rooij 1988). Moreover, fixed reference points
have been shown to improve measurement quality (Revilla and Ochoa 2015;
Saris and Gallhofer 2007), and when they are not provided,
respondents use different scales (Saris and De Rooij 1988).
3.3.5 Order of verbal labels
The ordering of verbal labels can be from negative (or passive)-to-positive (or active) or
from positive-to-negative. The order of the verbal labels is an important characteristic since
it provides an additional source of information to the respondents (Christian et al. 2007a).
Moreover, scales ordered from positive-to-negative tend to elicit quicker responses,
which increases the chance that respondents do not process all options consciously (Kunz
2015). Studies find that the order does impact measurement error and response style bias
(Christian et al. 2007a, 2009; Krebs and Hoffmeyer-Zlotnik 2010; Saris and Gallhofer
2007; Scherpenzeel and Saris 1997).
3.3.6 Nonverbal labels
Nonverbal labels are numbers, letters or symbols instead of words attached to the options
in the scale. The most commonly used are numbers and symbols, e.g. radio and checkbox
buttons. Krosnick and Fabrigar (1997) suggest combining numerical and verbal labels.
Similarly, others suggest that numbers may help respondents to decide whether the scale is
supposed to be unipolar or bipolar (Schwarz et al. 1991; Tourangeau et al. 2007). However,
respondents may take longer to submit an answer when numerical labels are provided since
they are an additional source of information to process (Christian et al. 2009). Regarding
their effect on data quality: Moors et al. (2014) show that scales without numbers and with only
verbal end-labels evoked more extreme responses than those with numbers, while Christian
et al. (2009) and Tourangeau et al. (2000) conclude that response style is unaffected by
whether or not numbers are used in the scale. Thus, slightly more evidence suggests that
the choice of nonverbal labels does not affect data quality.
3.3.7 Order of numerical labels
Order of numerical labels can be from low-to-high or from high-to-low. From the few
studies about its impact on response style that have been found, two of them conclude that,
when negative numerical labels are provided compared to when all numbers are positive,
the differences in the response distributions are significant (Schwarz et al. 1991; Tour-
angeau et al. 2007), while Reips (2002) concludes that it does not influence the answering
behaviour of participants.
Since there is no classification, I propose the following distinction to account for the
different choices in surveys: numerical labels ordered from negative-to-positive, from
positive-to-negative, from 0-to-positive, from 0-to-negative, from positive-to-0, from
negative-to-0, from 1 (or higher)-to-positive or from positive-to-1 (or higher).
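The eight orderings listed above depend only on the sign of the first and last numerical label. The following minimal sketch makes that explicit. The function name is hypothetical, and it assumes integer labels that run monotonically from the first to the last point; orderings the text does not mention (for example, all-negative labels) are returned as not covered.

```python
from typing import Sequence


def classify_numerical_order(numbers: Sequence[int]) -> str:
    """Classify the order of numerical labels from their first and last value.

    Hypothetical helper. Assumes integer labels that change monotonically
    between the first and the last scale point; only the end values are read.
    """
    if len(numbers) < 2:
        raise ValueError("a scale needs at least two numerical labels")
    first, last = numbers[0], numbers[-1]
    if first < 0 < last:
        return "negative-to-positive"
    if last < 0 < first:
        return "positive-to-negative"
    if first == 0 and last > 0:
        return "0-to-positive"
    if first == 0 and last < 0:
        return "0-to-negative"
    if first > 0 and last == 0:
        return "positive-to-0"
    if first < 0 and last == 0:
        return "negative-to-0"
    if 0 < first < last:
        return "1 (or higher)-to-positive"
    if 0 < last < first:
        return "positive-to-1 (or higher)"
    return "not covered by the proposed distinction"
```

A scale numbered from -5 to 5 falls under negative-to-positive, while a scale numbered from 10 down to 0 falls under positive-to-0.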
3.3.8 Correspondence between numerical and verbal labels
The order of numerical labels is of special relevance when these are combined with verbal
labels. Correspondence between numerical and verbal labels refers to the extent to which
the order of numerical labels matches with the order of verbal labels. Numerical labels
should reinforce the meaning and the polarity of verbal labels (Krosnick 1999; Krosnick
and Fabrigar 1997; O’Muircheartaigh et al. 1995; Schaeffer 1991; Schwarz et al. 1991).
However, it should be considered that a more negative connotation is given to the label
related to a negative number (Amoo and Friedman 2001; Schwarz and Hippler 1995).
Following Saris and Gallhofer (2007) the level of correspondence is classified into: high
correspondence which refers to combinations of numerical and verbal labels that match
perfectly, e.g. a bipolar scale where numbers are ordered from -5 to +5 and verbal labels
range from ‘‘Extremely bad’’ to ‘‘Extremely good’’ or a unipolar scale where numbers
range from 0 to 10 and labels from ‘‘Not at all’’ to ‘‘Completely’’; low correspondence
which refers to combinations where the lower numbers are related to positive verbal labels
or vice versa, e.g. a scale numbered from 0 to 10 and labelled from ‘‘Good’’ to ‘‘Bad’’; and
medium correspondence which refers to any other combination of numerical and verbal
labels that matches the order of the labels: negative/low and positive/high but not perfectly.
Among the limited empirical evidence found, only one study concludes that low
correspondence does not affect the distribution of responses (Christian et al. 2007a), while
two conclude that reliability improves with high correspondence between the verbal and
the numerical labels in the scale (Rammstedt and Krebs 2007; Saris and Gallhofer 2007),
i.e. there is an impact.
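The three levels of correspondence described above can be sketched as a rule that first compares the direction of the numerical and verbal labels and then checks whether the match is ‘‘perfect’’. The text does not define a perfect match formally, so the sketch below rests on an assumption drawn from its two examples: bipolar verbal labels match perfectly with numbers symmetric around 0, and unipolar verbal labels with numbers whose lowest value is 0. The function name and parameters are hypothetical.

```python
from typing import Sequence


def correspondence_level(
    numbers: Sequence[int], verbal_polarity: str, verbal_ascending: bool
) -> str:
    """Return "high", "medium" or "low" correspondence for a labelled scale.

    numbers          -- numerical labels in presentation order
    verbal_polarity  -- "bipolar" or "unipolar" (polarity of the verbal labels)
    verbal_ascending -- True if the verbal labels run from negative/low to
                        positive/high in presentation order
    Hypothetical helper; the definition of a perfect match is an assumption.
    """
    first, last = numbers[0], numbers[-1]
    if first == last:
        raise ValueError("numerical labels must differ between the end-points")
    if (last > first) != verbal_ascending:
        return "low"  # lower numbers attached to the positive/high verbal labels
    low, high = min(first, last), max(first, last)
    if verbal_polarity == "bipolar":
        perfect = low < 0 < high and low == -high  # e.g. -5 to +5
    elif verbal_polarity == "unipolar":
        perfect = low == 0                         # e.g. 0 to 10
    else:
        raise ValueError("verbal_polarity must be 'bipolar' or 'unipolar'")
    return "high" if perfect else "medium"
```

Under these assumptions, a scale numbered from 0 to 10 and labelled from ‘‘Good’’ to ‘‘Bad’’ is classified as low, matching the example in the text, whereas a bipolar scale numbered from 1 to 5 in the same direction as its verbal labels is classified as medium.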
3.3.9 Scales’ symmetry
Symmetry is a specific characteristic of bipolar scales. In symmetric scales, the
number of labels is the same on the positive and on the negative side.
Asymmetric scales assume previous knowledge about the population; otherwise they would be
biased (Saris and Gallhofer 2014). However, the impact of symmetry on measurement error is not clear:
while Scherpenzeel and Saris (1997), for symmetric scales, find no effect (or very little) on
reliability and validity, Saris and Gallhofer (2007) find a positive effect.
3.3.10 Neutral alternative
A neutral alternative is also a characteristic of bipolar scales; it means that respondents are not
forced to make a choice in a specific direction. Neutral alternatives can be provided
implicitly or explicitly. Explicit neutral alternatives are usually labelled, for example ‘‘neither A
nor B’’, while implicit neutral alternatives do not need to be labelled for their
neutral connotation to be understood, i.e. in a bipolar scale with an odd number of points, the
midpoint will be considered neutral even if it is not labelled. Some argue that providing a
neutral alternative can increase the risk of survey satisficing (Bishop 1987; Kulas and
Stachowski 2009). Others argue that not providing a neutral point forces respondents to
select an option which does not reflect their true attitudinal position (Saris and Gallhofer 2014;
Sturgis et al. 2014). Finally, Tourangeau et al. (2004) argue that respondents can interpret the
neutral point in a scale as the most typical value and use it to make relative judgements. Regarding the
impact on response styles, studies find that including a neutral point increases acquiescence
and lowers the propensity towards extreme responding (Schuman and Presser 1981;
Weijters et al. 2010). In terms of its impact on measurement quality, most evidence
suggests that providing a neutral alternative affects measurement quality (Alwin and Krosnick 1991;
Malhotra et al. 2009; Saris and Gallhofer 2007; Scherpenzeel and Saris 1997). Only
Andrews (1984) finds that the effect is very small.
3.3.11 ‘‘Don’t know’’ option
The ‘‘Don’t know’’ (or ‘‘No opinion’’) option is a non-substantive response alternative. It
can also be implicit or explicit. An implicit ‘‘don’t know’’ option is an admissible answer
not explicitly provided to the respondent, which requires an interviewer to record it. An
explicit ‘‘don’t know’’ option can be directly provided as a different response alternative to
the respondent. Providing an explicit ‘‘don’t know’’ option depends on whether researchers
believe that respondents truly have no opinion on the issue in question (Dolnicar 2013;
Kunz 2015). However, many authors argue that when the ‘‘don’t know’’ is provided this
leads to incomplete, less valid and less informative data (Alwin and Krosnick 1991;
Gilljam and Granberg 1993; Krosnick et al. 2002, 2005; Saris and Gallhofer 2014).
Whether providing explicitly or implicitly a ‘‘don’t know’’ option impacts data quality is
not clear: some authors show that providing it explicitly impacts data quality (Andrews
1984; De Leeuw et al. 2016; McClendon 1991; Rodgers et al. 1992), while others conclude
that there is no support towards this impact (Alwin 2007; McClendon and Alwin 1993;
Saris and Gallhofer 2007; Scherpenzeel and Saris 1997).
3.4 The scales’ visual presentation
3.4.1 Types of visual response requirement
Different types of visual presentation require more or less effort from the respondent when
responding. The following types of visual response requirements are distinguished
in the literature: (1) point-selection is the most standard way to present scales: either a
continuous line or categorical options are provided, from which the respondent should point
and select the desired choice; (2) a slider is a type of linear implementation in which the
respondent should move a marker to give a rating; (3) text-box input is a typing space
where respondents can type in their answer; (4) a drop-down menu shows the list of response
options after clicking on the rectangular box, i.e. before clicking the respondent does not see
the whole list of options, and sometimes respondents have to scroll down to select the
desired option; and (5) drag-and-drop refers to the technique where respondents need to
drag an element (e.g. the item or the response) to the desired position.
Comparing point-selection to sliders, the former is less demanding but also less fun and
engaging (Funke et al. 2011; Roster et al. 2015). Along these lines, Cook et al. (2001) and Roster
et al. (2015) compare sliders with radio buttons and find non-significant differences in
reliability and item-nonresponse, respectively. The use of a box format is closer to how
questions are asked on the telephone, and does not provide a clear sense of the range of the
options (Buskirk et al. 2015; Christian et al. 2009). Comparing the use of text-box input
with the use of point-selection or sliders, some demonstrate that item-nonresponse and
response style are comparable across the three types (Christian et al. 2007b), while
others show that item-nonresponse and response style differ between the
three (Buskirk et al. 2015; Christian et al. 2009; Couper et al. 2006). Christian et al.
(2007b) argue that drop-down menus are more cumbersome than text-box input when a large
number of options is listed. Along these lines, other authors argue that drop-down menus are
more burdensome to respondents because they require an added effort to click and scroll
(Couper et al. 2004; Dillman and Bowker 2001; De Leeuw et al. 2008; Reips 2002). Liu
and Conrad (2016) compare drop-down menus with sliders and text-box input and find that
item-nonresponse does not differ significantly. Similarly, when drop-down menus are
compared to point-selection, comparable results in terms of response style and
item-nonresponse are found (Couper et al. 2004; Reips 2002). Finally, drag-and-drop provides
higher item-nonresponse compared to point-selection, and it is argued to prevent systematic
response tendencies since respondents need more time to process the task they are
required to do (Kunz 2015).
Overall, the evidence provided by these studies suggests that there is no impact on data
quality depending on the type of visual response requirement.
3.4.2 Sliders’ marker position
Slider marker position is a specific characteristic of sliders. Markers can be placed at the
top- or left-side, at the bottom- or right-side, at the middle or outside of a slider. A
challenge when designing a slider is how to handle the starting position of the marker and
identify non-respondents (Funke 2016). The impact of this characteristic on measurement
error is not yet clear, since only one study looks at its effect on data quality and finds that
higher nonresponse and higher response style bias occurred when the marker position was
at the middle or the right-side of the slider compared to when the marker was placed at the
left-side (Buskirk et al. 2015).
3.4.3 Scales’ illustrative format
Sometimes scales are presented using an illustrative format instead of using the traditional
scales. Usual illustrative formats are ladders (or pyramids), to indicate levels of some
aspect, and thermometers, to indicate degrees of feelings. Other illustrative formats can be
clocks to indicate the timing of things, or dials to enter numerical values. These
types of scales usually require lengthy introductions and not all points can be labelled, but
they are useful to visually present numerical scales with many points (Alwin 2007; Krosnick
and Presser 2010; Sudman and Bradburn 1983). The few studies available suggest that this
characteristic has an impact on data quality: thermometer scales provide lower measurement
quality than ladders or radio button scales (Andrews and Withey 1976; Krosnick 1991),
ladder scales provide better measurement quality than traditional scales (Levin and Currie
2014) but lower validity compared to other illustrative formats (Andrews and Crandall
1975), and responses differ significantly depending on whether a pyramid or an onion format is
used (Schwarz et al. 1998).
3.4.4 Scales’ layout display
The scales’ layout display of the answer options can be horizontal, vertical or nonlinear.
Nonlinear scales can provide, for instance, the answer options on different columns.
Tourangeau et al. (2004, p. 372) argue that respondents usually expect, in vertically oriented
scales, the positive points to appear first at the top. However, Toepoel et al. (2009, p. 522)
argue that respondents read more naturally in a horizontal format. Two studies looked at the
effect of scales’ layout display on response styles, and both find that presenting
the scales in a horizontal, vertical or nonlinear layout leads to significant differences in
the responses (Christian et al. 2009; Toepoel et al. 2009), i.e. it has an impact.
3.4.5 Overlap between verbal and numerical labels
Overlap between labels is a characteristic considered by Saris and Gallhofer (2014) for
which no relevance has been found while reviewing the literature. This characteristic
indicates whether the verbal labels used in a horizontal scale are clearly connected
to one nonverbal label or overlap with several of them. More research is
needed on this characteristic to assess whether or not it is relevant to consider when
designing visually presented scales.
3.4.6 Labels’ visual separation
Labels can be visually separated by adding more space between them, adding separating lines or
placing the options in boxes. The aim is to provide a visual distinction between the labels in
the scale. For instance, researchers may be interested in visually separating the ‘‘don’t
know’’ option from the substantive responses to make a clear differentiation. However,
Christian et al. (2009) and Tourangeau et al. (2004) argue that visually separating some of
the labels may encourage respondents to select them more often. The impact on data quality is
clear: De Leeuw et al. (2016) show that separating the non-substantive option reduces
item-nonresponse and provides higher reliability, and Christian et al. (2009) and Tourangeau
et al. (2004) show that separating the non-substantive option leads to significant differences
in the responses, while this does not happen when the midpoint is separated.
The current distinction in Saris and Gallhofer (2014) is whether the labels are separated
within different boxes or not. However, given that I found more choices in the literature, I
propose to distinguish between visually separating the non-substantive option, the neutral
option, the end-points, all points or none of the points in the scale.
3.4.7 Labels’ illustrative images
Illustrative nonverbal labels can be used instead of or in combination with verbal and
numerical labels when they are provided visually to the respondent. The usual illustrative
labels are feeling faces (also called smileys), which attach images of different facial
expressions (e.g. from sad to happy). They are easy to format and they attract the attention
of the respondents (Emde and Fuchs 2013). Moreover, they have the advantage of being
easier for respondents to identify than verbal labels because they eliminate the barrier of
mapping feelings into words (Kunin 1998). The evidence on their effect on data quality indicates that there is
no impact: while Derham (2011) shows that nonresponse is significantly higher in faces
scales compared to sliders and point-selection scales, Andrews and Crandall (1975) and Emde
and Fuchs (2013) show that the differences in the responses between smiley scales and
radio buttons are non-significant.
For the sake of completeness and to capture the different formats found in the literature,
I propose to distinguish two other types of labels’ illustrative images: other human symbols,
like thumbs and manikins, and other nonhuman symbols, like stars or hearts.
4 Conclusions
This paper provides a complete and updated classification of the characteristics and their
possible design choices considered in the literature when designing forced-choice, closed
and ordinal response scales. This classification has been summarized in Table 1 together
with the main conclusion of the literature review, which indicates whether evidence has
been shown in the literature of each characteristic’s impact on data quality.
Three main limitations of this study should be kept in mind. First, to assess whether
there is an impact or not on data quality, I did not consider the different sample sizes or the
power of the studies; I considered the absolute number of studies. Further research could
assign weights to the different studies. Second, it is likely that publication bias in favour
of studies which found an effect of a certain characteristic is present, i.e. the number of
characteristics which have an impact may be overestimated. Third, I did not aim to provide
information to improve the design of response scales. Thus, the results on the impact are
provided independently of whether the effect is positive or negative.
From Table 1 the following main conclusions can be extracted:
1. 11 characteristics have an impact on data quality: the scales’ evaluative dimension, the
type of scale, the length of the scales, the use of verbal labels, the use of fixed
reference points, the order of numerical labels, the correspondence between numerical
and verbal labels, the use of a neutral alternative, the scales’ illustrative format, the
visual layout display of the scales, and the labels’ visual separation.
2. 4 characteristics do not have an impact on data quality: the order of the verbal labels,
the use of nonverbal labels, the type of visual response requirement, and the labels’
illustrative images.
3. Further research is needed to know whether 8 characteristics have an impact on data
quality: the scales’ polarity, the agreement between the concept and the scale’s polarity,
the information provided by verbal labels, the quantifier labels, the scales’ symmetry,
the use of a ‘‘don’t know’’ option, the slider marker position, and the overlap between
verbal and numerical labels.
What is clear from the large body of research presented here, and its often mixed results, is
that characteristics interact with each other, e.g. scales with more points are usually
partially labelled. Thus, researchers should account for the effects driven by the overall design
of the survey question when deciding how best to set a characteristic. This
is in line with what Cox III (1980, p. 418) already concluded for the optimal number of
categories: ‘‘there is no single number of response alternatives for a scale which is
appropriate under all circumstances’’.
The results presented in this paper provide, on the one hand, a source for researchers who
want a complete list of characteristics and their possible design choices for closed and ordinal
scales and, on the other hand, a detailed summary of the literature that refers to the impact
of each characteristic on data quality.
Finally, further research should provide the same summary for other characteristics
related to the design of survey questions, such as the design of the request for an answer or
the overall visual presentation of the survey question.
Acknowledgements I would also like to show my gratitude to Melanie Revilla, Wiebke Weber and Willem E. Saris for their fruitful comments and feedback on an earlier version of the manuscript, although any errors are my own and should not tarnish the reputations of these esteemed persons.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix
See Tables 2 and 3.
Table 2 Saris and Gallhofer’s list of response scale characteristics and choices
Characteristics | Design choices
Response scale: basic choice | More than 2 categories scale; Two-category scale; Numerical open-ended scale; Magnitude estimation; Line production; More steps procedures
Number of categories (categorical) | [Enter value]
Maximum possible value (continuous) | [Enter value]
Labels with short text or complete sentences | Short text; Complete sentences
Order verbal labels | First label negative; First label positive
Correspondence between numerical and verbal labels | High correspondence; Medium correspondence; Low correspondence
Range of the used scale | Bipolar; Unipolar
Range correspondence | Both bipolar; Both unipolar; Concept bipolar/Scale unipolar
Symmetry of response scale | Symmetric; Asymmetric
Neutral category | Present; Absent
Number of fixed reference points | [Enter value]
‘‘Don’t know’’ option | Present; Only registered; Absent
Horizontal or vertical scale | Horizontal; Vertical
Overlap between verbal and numerical labels | Present; Text clearly connected to categories
Numbers or letters before answer categories | Numbers; Letters; Neither
Scale with only numbers or numbers in boxes | In boxes; Not in boxes
Table 3 Literature review summary of findings by theoretical and empirical argumentations
Characteristics; Design choices; Theoretical arguments; Empirical evidence on data quality

Characteristics of the response scales’ conceptualization

Scales’ evaluative dimension
Design choices: Agree–disagree (AD); Item-specific (IS)
Theoretical arguments:
(Brown 2004): AD scales are clearer to interpret than vague or closed-range quantifier scales
(Krosnick 1999): people simply choose to agree because it seems like the commanded and polite action to take
(Krosnick et al. 2005): to eliminate acquiescence avoid AD scales
(Kunz 2015): AD scales are more difficult to understand and map the appropriate judgement
(Saris et al. 2010): AD more acquiescence because of its usual presentation in batteries
(Schaeffer and Presser 2003): AD simpler to conduct
Empirical evidence on data quality:
(Alwin 2007): the reliability of AD scales is lower compared to IS scales [Wiley–Wiley reliability] → YES
(Billiet and McClendon 2000): Acquiescence is found in AD scales [Acquiescence bias through SEM factor] → YES
(Krosnick 1991): AD scales lead to lower reliabilities than IS [Pearson product-moment test–retest correlations] → YES
(Revilla and Ochoa 2015): AD scales have much lower quality than IS [True-score MTMM reliability and validity] → YES
(Saris and Gallhofer 2014): AD scales have lower quality than IS [True-score MTMM reliability and validity] → YES
(Saris et al. 2010): IS scales have higher quality than AD [True-score MTMM reliability and validity] → YES

Scales’ polarity
Design choices: Bipolar; Unipolar
Theoretical arguments:
(Kunz 2015): a disadvantage of bipolar scales is that respondents are reluctant to choose negative responses
Empirical evidence on data quality:
(Alwin 2007): unipolar scales have somewhat higher reliabilities than bipolar scales [Wiley–Wiley reliability] → YES

Concept-Scale polarity agreement
Design choices: Both bipolar; Both unipolar; Bipolar concept with Unipolar scale; Unipolar concept with Bipolar scale
Theoretical arguments:
(Rossiter 2011): not distinguish between unipolar and bipolar leads to stupid misinterpretations; unipolar attributes should not be measured with bipolar scales
Empirical evidence on data quality:
(Saris and Gallhofer 2007): the impact of using unipolar scales for bipolar concepts is not significantly lowering reliability and increasing validity [True-score MTMM reliability and validity] → NO
(van Doorn et al. 1982): differences in the response distributions are clear [Response style through distribution comparison] → YES

Characteristics of the type of response scale and its length

Type of response scales
Design choices: Absolute open-ended quantifier; Relative open-ended quantifier; Relative metric; Absolute metric; Dichotomous; Rating; Closed quantifiers; Branching
Theoretical arguments:
(Hjermstad et al. 2011): metric scales are comparable to categorical scales; the type of scale is not the most important but the conditions related to them
(Krosnick et al. 2005): dichotomous scales are clearer in meaning and require less interpretative efforts which can harm consistency compared to rating scales
(Krosnick and Fabrigar 1997): relative open-ended scales (or magnitude scaling) are a difficult method to administer which only reveals ratios among stimuli and not absolute judgments
(Liu and Conrad 2016): respondents are more likely to provide rounded answers in 101 metric scales, as an easy way out
(Revilla 2015): the closed-range quantifier labels provided can influence their results if they do not represent the population distribution
(Saris and Gallhofer 2014): line production (or relative metric) scales are better than relative open-ended quantifiers because rounding is avoided
(Schaeffer and Bradburn 1989): magnitude estimates (or relative open-ended quantifiers) have problems related to the appropriate standard and recoding into categorical distinctions
(Schaeffer and Presser 2003): branching has the advantage to provide large number of categories not visually
Empirical evidence on data quality:
(Al Baghal 2014a): numerical open ended are as accurate as vague-closed options [Rank-order correlations and regression slopes] → NO
(Alwin 2007): rating scales have higher reliabilities than dichotomous but comparable to metric scales [Wiley–Wiley reliability] → YES
(Cook et al. 2001): metric scale less reliable than radio button [Score reliability] → YES
(Couper et al. 2006): metric scales suffer more missing data than categorical or open-ended quantifier [Item-nonresponse] → YES
(Funke and Reips 2012): metric scales are comparable to 5p scales on item-nonresponse [Item-nonresponse] → NO
(Koskey et al. 2013): absolute open-ended scales are comparable to rating scales on reliability [Cramer’s V reliability] → NO
(Krosnick 1991): metric scales have lower reliability than rating scales; lower reliabilities when using dichotomous scales; branching provides higher reliabilities than rating scales [Pearson product-moment test–retest correlations] → YES
(Krosnick and Berent 1993): branching improves reliability compared to no branching (rating scale) [Item reliability] → YES
(Liu and Conrad 2016): non-significant differences on item-nonresponse between absolute open ended, rating scale or metric [Item-nonresponse] → NO

Type of response scales (continued)
Theoretical arguments:
(Schwarz et al. 1985): closed-range informs the respondent about the researcher expectations and adds systematic bias in respondent’s reports and related judgements compared to absolute open-ended formats
(Sudman and Bradburn 1983): better use open quantifiers than closed quantifiers for numerical answers to avoid misleading the respondent
(Tourangeau et al. 2000): round answers in open-ended quantifiers may be a signal of the unwillingness to come up with a more exact answer and introduce systematic bias, in continuous scales
Empirical evidence on data quality:
(Lundmark et al. 2016): dichotomous less valid than rating scales [Concurrent validity] → YES
(McKelvie 1978): no difference on reliability or validity between metric and rating scale [Test retest reliability and Test validity] → NO
(Miethe 1985): magnitude scaling less credible in terms of reliability compared to rating scales [Test–retest reliability] → YES
(Preston and Colman 2000): 2p scales less reliable and valid [Test retest reliability, Cronbach alpha and Criterion validity] → YES
(Saris and Gallhofer 2007): open-ended quantifiers and metric scales have significantly higher reliability but lower validity than rating scales [True-score MTMM reliability and validity] → YES

Response scales’ length
Design choices: Minimum possible value; Maximum possible value; Number of categories
Theoretical arguments:
(Alwin 2007): the optimal number of points in a scale should be taken into consideration in relation to the polarity of the scale
(Cox III 1980): there is no single number of response alternatives for a scale which is appropriate under all circumstances
(Krosnick and Fabrigar 1997): optimal is a complex decision: too few categories may compromise the information gathered, too long compromises the clarity of meaning
(Reips and Funke 2008): optimal length of continuous scales depends on the size of the device screen
Empirical evidence on data quality:
(Aiken 1983): reliabilities remained constant despite changing the number of categories [Internal consistency reliability] → NO
(Alwin 1997): 11p scales more reliable than 7p [True Score MTMM reliability] → YES
(Alwin 2007): the use of 4p scales improves reliability in unipolar scales, while the reliability in bipolar scales is higher for 2, 3 and 5p and lowest for 7p. [Wiley–Wiley reliability] → YES
(Alwin and Krosnick 1991): no differences between AD with 2 and 5p, IS reliability increases from 3 to 9p, but no differences between 7 to 9p [Proportion of variance attributed to true attitudes] → YES

Response scales’ length (continued)
Theoretical arguments:
(Schaeffer and Presser 2003): more categories compromise discrimination and limit the capacity of respondents to make finer distinctions between the options
Empirical evidence on data quality:
(Andrews 1984): The biggest effect on data quality. More categories better. 3p is worse than 2p [MTMM validity, method effect and residual error] → YES
(Bendig 1954): reliability independent of the number of scale categories [Test reliability] → NO
(Jacoby and Matell 1971): reliability and validity are independent of the number of points [Test retest reliability, concurrent validity and predictive validity] → NO
(Komorita and Graham 1965): reliability increases with the number of points up to 6p [Cronbach alpha] → YES
(Lundmark et al. 2016): validity higher in 7p and 11p points than 2p [Concurrent validity] → YES
(Matell and Jacoby 1971): reliability independent of the number of points [Internal consistency and Test retest reliability] → NO
(McKelvie 1978): validity is slightly better on 7p rather than 11p, reliability unaffected scale [Test retest reliability and Test validity] → NO
(Preston and Colman 2000): reliability lower for 2, 3, 4p, higher for 7, 8, 9, 10p, decreases with more than 10p [Test–retest reliability] → YES
(Revilla and Ochoa 2015): 11p affects positively the quality of IS scales [True-score MTMM reliability and validity] → YES
(Revilla et al. 2014): quality does not improve with more than 5p for AD scales [True-score MTMM reliability and validity] → YES

Response scales’ length (continued)
Empirical evidence on data quality:
(Rodgers et al. 1992): the number of points has the biggest effect on validity; use at least 5 to 7p, better quality [MTMM construct validity] → YES
(Saris and Gallhofer 2007): reliability can be improved by using more categories (11p) without decreasing validity [True-score MTMM reliability and validity] → YES
(Saris and Gallhofer 2007): the maximum value of a continuous scale has a significant effect on reliability or validity [True-score MTMM reliability and validity] → YES
(Scherpenzeel and Saris 1997): highest validity with 4, 5 or 7p [True-score MTMM validity] → YES
(Weijters et al. 2010): 5 AD points reduces extreme response style [Extreme Response Style through log odds] → YES

Characteristics of the response scales’ labels

Verbal labels
Design choices: Fully-labelled; End-points and more points labelled; End and midpoints labelled; End-points only labelled; Not labelled
Theoretical arguments:
(Alwin 2007): labels reduce ambiguity in translating subjective responses to scales’ options
(Krosnick and Fabrigar 1997): verbal labels suffer from language ambiguity and are more complex to hold in memory, label only the endpoints are less cognitively demanding than fully labelling; verbal labels are more natural form of expression than numbers and labelling all points can help to clarify the meaning of numbers
Empirical evidence on data quality:
(Alwin 2007): fully labelled increases reliability significantly compared to only labelling the end points. [Wiley–Wiley reliability] → YES
(Alwin and Krosnick 1991): fully labelled increases reliability [Proportion of variance attributed to true attitudes] → YES
(Andrews 1984): data quality is below average with all categories labelled [MTMM validity, method effect and residual error] → YES

Verbal labels (continued)
Theoretical arguments:
(Krosnick and Presser 2010): verbal labels are advantageous because they clarify the meanings of the scale points while reducing the respondent burden
(Kunz 2015): labelling may increase the cognitive effort required to read and process all options, while clarifying the meaning of them
Empirical evidence on data quality:
(Eutsler and Lang 2015): Fully labelled produces less extreme responses [Extreme response bias through distribution comparison] → YES
(Krosnick and Berent 1993): full verbal labelling improve reliability [Item reliability] → YES
(Menold et al. 2014): Fully labelled scales have higher reliabilities than when only the end points are labelled [Guttman’s lambda] → YES
(Moors et al. 2014): end labelling evokes more extreme responses [Extreme response bias through latent class factor] → YES
(Rodgers et al. 1992): non-verbal alternatives have lower random error [MTMM construct validity] → YES
(Saris and Gallhofer 2007): The use of labels increase reliability significantly [True-score MTMM reliability and validity] → YES
(Weijters et al. 2010): higher acquiescence and lower extreme scores when all categories are labelled [Acquiescence and Extreme response bias through log odds] → YES

Verbal labels’ information
Design choices: Non-conceptual; Conceptual; Objective; Subjective; Full-informative
Theoretical arguments: –
Empirical evidence on data quality:
(Saris and Gallhofer 2007): reliability reduced by having large labels [True Score MTMM reliability] → YES

Quantifier labels
Design choices: Vague; Closed-range
Theoretical arguments:
(Brown 2004): AD scales are clearer to interpret than vague quantifiers
(Pohl 1981): it is not clear what exactly word set provides better equal interval scaling
(Revilla 2015): closed-range should provide enough labels such that respondents do not feel that their behaviours are not normal
(Saris and Gallhofer 2014): vague are prone to different interpretations than closed
(Schwarz et al. 1985): respondents use the labels like ‘‘usual’’ as standards of comparison and seem reluctant to report behaviours that are unusual in the context of the scale
Empirical evidence on data quality:
(Al Baghal 2014b): vague quantifiers display higher levels of validity than numeric open-ended quantifiers [Predictive validity] → YES
(Al Baghal 2014a): vague are equal or better than open-ended quantifiers [Rank-order correlations and regression slopes] → NO

Fixed reference points
Design choices: Number of fixed reference points
Theoretical arguments:
(Saris and De Rooij 1988): the reference points should add no doubt of its position on the subjective scale of the respondents
(Saris and Gallhofer 2014): reference points are necessary to assure that respondents are using the same underlying scale
Empirical evidence on data quality:
(Revilla and Ochoa 2015): the use of two fixed reference points increases slightly measurement quality [True-score MTMM reliability and validity] → YES
(Saris and De Rooij 1988): differences are due to the freedom respondents have when no fixed reference points are established [Response bias through distribution comparison] → YES
(Saris and Gallhofer 2007): fixed reference points have a positive and significant effect on reliability and validity [True-score MTMM reliability and validity] → YES

Order verbal labels
Design choices: From negative-to-positive (N-P); From positive-to-negative (P-N)
Theoretical arguments:
(Christian et al. 2007b): responses vary depending on the order since it provides an addition source of information
(Kunz 2015): P-N scales may tempt respondents to rush through a set of items at a faster pace
Empirical evidence on data quality:
(Christian et al. 2007b): the order of the verbal labels does not provide significant differences on responses [Response style through distribution comparison] → YES
(Christian et al. 2009): no primacy effect found by varying the order of the verbal labels [Satisficing bias through distribution comparison] → YES
(Krebs and Hoffmeyer-Zlotnik 2010): more positive answers (primary effect) on P-N, non-significant evidence in the N-P format [Satisficing bias through distribution comparison] → YES
(Saris and Gallhofer 2007): the order does not have a significant impact on measurement quality [True-score MTMM reliability and validity] → NO
(Scherpenzeel and Saris 1997): order had little or no effect on validity and reliability [True-score MTMM reliability and validity] → NO

Nonverbal labels
Design choices: Numbers; Letters; Symbols; None
Theoretical arguments:
(Christian et al. 2009): adding numbers provides an additional source of information to process by the respondents before submitting an answer
(Krosnick and Fabrigar 1997): numeric labels more precise and easier but have no inherent meaning
(Tourangeau et al. 2007): numbers help respondents to decide whether the scale is supposed to be unipolar or bipolar
(Schwarz et al. 1991): use numeric labels to disambiguate the meaning of scale verbal labels. 0 to 10 numbers suggest the absence or presence of an attribute, while -5 to 5 suggest that the absence corresponds to 0 whereas the negative values refer to the presence of its opposite
Empirical evidence on data quality:
(Christian et al. 2009): response style is unaffected when using scales with or without numbers [Satisficing bias through distribution comparison] → NO
(Moors et al. 2014): scales with no numbers evoke more extreme responding than with numbers [Extreme response bias through latent class factor] → YES
(Tourangeau et al. 2000): scales with no numbers are comparable to those with positive numbers [Response style through distribution comparison] → NO

Order numerical labels
Design choices: Negative-to-positive; Positive-to-negative; 0-to-positive; 0-to-negative; Positive-to-0; Negative-to-0; 1 (or higher)-to-positive; Positive-to-1 (or higher)
Theoretical arguments: –
Empirical evidence on data quality:
(Schwarz et al. 1991): differences are significant when a scale is presented with 0 to 10 values or with -5 to 5 [Response style through distribution comparison] → YES
(Tourangeau et al. 2007): differences are significant when negative numerical labels are provided in comparison to when all are positive [Response style through distribution comparison] → YES
(Reips 2002): different numerical labelling do not seem to influence the answering behaviours of participants [Response style through distribution comparison] → NO

Correspondence between numerical and verbal labels
Design choices: High; Medium; Low
Theoretical arguments:
(Amoo and Friedman 2001): more negative connotation is attached to negative numbers than positive with the same verbal label
(Krosnick 1999): use only verbal labels or use numbers that reinforce the meanings of the words
(Krosnick and Fabrigar 1997): numbers should be selected carefully to reinforce the meaning of the scale points
(O’Muircheartaigh et al. 1995): numeric and verbal labels should provide bipolar/unipolar framework to the respondent
(Schaeffer and Presser 2003): when bipolar verbal labels are combined with bipolar numeric labels they would reinforce each other to appear clearer to respondents, however bipolar numeric labels move responses toward the positive end
(Schwarz and Hippler 1995): a verbal scale with a negative numeric value suggest a more negative interpretation of the verbal scale anchor and results in more positive responses along the scale
(Schwarz et al. 1991): match numeric values with the intended conceptualization of the uni or bipolar dimension, numbers should not be selected arbitrarily because respondents use them to communicate intended meanings
Empirical evidence on data quality:
(Christian et al. 2007b): low correspondence does not impact substantially the responses [Response style through distribution comparison] → NO
(Rammstedt and Krebs 2007): lower reliabilities when the lower numbers correspond to higher positive labels [Test–retest reliability] → YES
(Saris and Gallhofer 2007): low correspondence lowers significantly reliability [True-score MTMM reliability and validity] → YES

Scales’ symmetry
Design choices: Symmetric; Asymmetric
Theoretical arguments:
(Saris and Gallhofer 2014): an asymmetric scale presupposes knowledge about the opinion of the sample, otherwise is biased
Empirical evidence on data quality:
(Saris and Gallhofer 2007): symmetric scales have a positive effect on reliability and validity [True-score MTMM reliability and validity] → YES
(Scherpenzeel and Saris 1997): reliability and validity are slightly higher for asymmetric scales [True-score MTMM reliability and validity] → NO

Neutral alternative
Design choices: Explicit; Implicit; Not provided
Theoretical arguments:
(Bishop 1987): midpoints attract respondents under uncertainty
(Kulas and Stachowski 2009): midpoints are used when respondents are undecided, misunderstanding the item, when their response is conditional or when they have a neutral opinion
(Saris and Gallhofer 2014): used to not force people to make a choice on a specific direction
(Sturgis et al. 2014): people do appear to have positions which are neutral; omitting will force these individuals to select an option which does not reflect the true opinion
(Tourangeau et al. 2004): respondents can interpret the midpoint in a scale as the most typical and use it as reference point
Empirical evidence on data quality:
(Alwin and Krosnick 1991): Midpoints lower reliability, more valuable in 7 point scales [Proportion of variance attributed to true attitudes] → YES
(Andrews 1984): midpoint had only slight effect on data quality [MTMM validity, method effect and residual error] → NO
(Malhotra et al. 2009): midpoint reduces validity [Criterion validity] → YES
(Saris and Gallhofer 2007): not providing a neutral category improves significantly both reliability and validity [True-score MTMM reliability and validity] → YES
(Scherpenzeel and Saris 1997): explicit midpoint has no effect on reliability but a higher validity [True Score MTMM reliability and validity] → YES
(Schuman and Presser 1981): offering the middle alternative increases the proportion of respondents in that category [Response style through distribution comparison] → YES
(Weijters et al. 2010): midpoint increases acquiescence and lowers extreme responses [Acquiescence and Extreme response bias] → YES

‘‘Don’t know’’ (DK) option
Design choices: Explicit; Implicit; Not provided
Theoretical arguments:
(Alwin and Krosnick 1991): DK may be selected because of truly not having an attitude, lack of motivation, wish to avoid giving an answer or are uncertain of which exact point represents best their opinion
(Dolnicar 2013): if some respondents cannot answer the question, offer explicit DK
(Gilljam and Granberg 1993): explicit DK increases the likelihood of false negatives
(Krosnick et al. 2002): providing DK leads to less valid and informative data than omitting it
(Krosnick et al. 2005): DK provision encourages respondents to not provide undesirable or unflattering opinions
(Kunz 2015): DK option should be explicitly provided if there is a good reason to believe that respondents truly have no opinion on the issue in question
(Saris and Gallhofer 2014): explicit DK leads to incomplete data, better use implicit DK
Empirical evidence on data quality:
(Alwin 2007): Providing an explicit DK option has a comparable reliability to not providing it [Wiley–Wiley reliability] → NO
(Andrews 1984): explicit DK leads to higher data quality [MTMM validity, method effect and residual error] → YES
(De Leeuw et al. 2016): Explicit DK increases missing data and lowers reliability. Implicit DK lowers missing data and increases reliability [Item non-response and Coefficient alpha] → YES
(McClendon 1991): explicit DK does not reduce acquiescence or recency responses [Acquiescence and Satisficing bias] → YES
(McClendon and Alwin 1993): no support towards offering DK to improve reliability [True-score reliability] → NO
(Rodgers et al. 1992): lower validities when offering DK explicitly [MTMM construct validity] → YES
(Saris and Gallhofer 2007): The provision of the DK option does not have a significant effect on measurement quality [True-score MTMM reliability and validity] → NO
(Scherpenzeel and Saris 1997): DK explicit or implicit does not affect reliability or validity [True-score MTMM reliability and validity] → NO
Table 3 continued
Columns: Characteristics; Design choices; Theoretical arguments; Empirical evidence on data quality

Characteristics of the response scales’ visual presentation

Characteristic: Types of visual response requirement
Design choices: Point-selection; Slider; Text-box input; Drop-down menu; Drag-and-drop
Theoretical arguments:
(Buskirk et al. 2015): box format does not give a clear sense of the range of the options
(Christian et al. 2007a): numeric text-box input better because drop-down menus are more cumbersome when large number of possible options are listed
(Christian et al. 2009): box format is closer to how questions are asked on telephone, where the visual display is not provided
(Couper et al. 2004): drop boxes require added effort from respondents who have to click and scroll simply to see the answer options
(De Leeuw et al. 2008): drop-down menus are more burdensome for respondents
(Dillman and Bowker 2001): respondents are more frustrated with drop-down menus as it requires a two-step process
(Funke et al. 2011): more demanding requires more hand–eye coordination than point-selection and provides problems to identify non-substantive responses
(Kunz 2015): drag and drop may prevent systematic response tendencies since respondents need to spend more time
(Reips 2002): hand movement is longer than for other types of scales
(Roster et al. 2015): sliders are more fun and engaging and produce better data than point-selection scales
Empirical evidence on data quality:
(Buskirk et al. 2015): differences on selecting the lowest, middle or highest options and in missing data between sliders, radio button scales and box format [Satisficing bias and Item-nonresponse]? YES
(Christian et al. 2007b): responses are comparable between point-selection and number box scales [Response style through distribution comparison]? NO
(Christian et al. 2009): Box entry has a significant impact on responses compared to point-selection [Response style bias through distribution comparison]? YES
(Cook et al. 2001): sliders show no difference compared rating scales on reliability [Score reliability]? NO
(Couper et al. 2004): nonresponse was comparable between drop-down menu and point-selection [Item-nonresponse]? NO
(Couper et al. 2006): more missing data in the slider than in the radio button or text input scale [Item-nonresponse]? YES
(Kunz 2015): drag-and-drop scales suffered from higher item-nonresponse compared to radio button scales [Item-nonresponse]? YES
(Liu and Conrad 2016): item-nonresponse is nonsignificantly different compared to drop-down and text-box input [Item-nonresponse]? NO
(Reips 2002): drop-down menus do not influence on the answering behaviours compared to radio button scales [Response style through distribution comparison]? NO
(Roster et al. 2015): response rates between sliders and radio-button scales are non-significantly different [Item-nonresponse]? NO

Characteristic: Sliders’ marker position
Design choices: Left/Bottom; Right/Top; Middle; Outside
Theoretical arguments:
(Funke 2016): a drawback of sliders is item-nonresponse is difficult to identify
Empirical evidence on data quality:
(Buskirk et al. 2015): more nonresponse, middle and higher response options selection for middle and right marker position compared to left marker [Satisficing bias and item-nonresponse]? YES

Characteristic: Scales’ illustrative format
Design choices: Ladder; Thermometer; Other; None
Theoretical arguments:
(Alwin 2007): offering a thermometer scale usually requires lengthy introductions
(Krosnick and Presser 2010): thermometers and ladders may not be good measuring devices because all points cannot be labelled
(Sudman and Bradburn 1983): use thermometers, ladders, telephone dials and clocks for numerical scales with many points
Empirical evidence on data quality:
(Andrews and Crandall 1975): ladder scales obtained lower validity than other types of scales [Construct validity]? YES
(Krosnick 1991): reliability is higher for a rating scale than for the feeling thermometer [Pearson product-moment test–retest correlations]? YES
(Levin and Currie 2014): the ladder scale provided better reliability and validity scores than other scales [Pearson correlations and convergent validity]? YES
(Schwarz et al. 1998): responses are significantly different whether a pyramid or an onion format is used [Response style through distribution comparison]? YES

Characteristic: Scales’ layout display
Design choices: Horizontal; Vertical; Nonlinear
Theoretical arguments:
(Toepoel et al. 2009): respondents are more willing to read option in the horizontal format because they first read horizontally and then vertically
(Tourangeau et al. 2004): vertical scales imply more positive options at the top
Empirical evidence on data quality:
(Christian et al. 2009): responses to nonlinear layout compared to vertical were significantly different [Response style through distribution comparison]? YES
(Toepoel et al. 2009): presenting the options in a horizontal or vertical layout results in different response distributions [Response style through distribution comparison]? YES

Characteristic: Overlap between verbal and numerical labels
Design choices: Overlap present; Text clearly connected to categories
Theoretical arguments: NS
Empirical evidence on data quality: NS

Characteristic: Labels’ visual separation
Design choices: Non-substantive options; Neutral options; End-points; All options; None
Theoretical arguments:
(Christian et al. 2009): visual separation of labels may encourage respondents to select it and may take longer for respondents to process than when all labels are evenly spaced
(Tourangeau et al. 2004): separation calls the attention of the separated option
Empirical evidence on data quality:
(De Leeuw et al. 2016): clearly separating the DK option from the substantive responses reduces missing data and produced higher reliability [Item nonresponse and Coefficient alpha]? YES
(Christian et al. 2009): separation of the non-substantive option leads to significant different responses, separation of the midpoint does not lead to significant differences [Response style through distribution comparison]? YES
(Tourangeau et al. 2004): separation of non-substantive options affected the distribution of answers [Response style through distribution comparison]? YES

Characteristic: Labels’ illustrative images
Design choices: Feeling faces; Other human symbols; Non-human symbols; None
Theoretical arguments:
(Emde and Fuchs 2013): faces scales are easy to format and attract the attention and increase respondents’ enjoyment
(Kunin 1998): Faces scales have the advantage of eliminating the necessity for translating feelings into words, faces are easier to identify by respondents than words
Empirical evidence on data quality:
(Andrews and Crandall 1975): comparable validity between faces scales and rating scales [Construct validity]? NO
(Derham 2011): the emoticon scale presented significantly higher no answers than slider or point-selection scales [Item-nonresponse]? YES
(Emde and Fuchs 2013): non-significant differences in the responses between the smiley scales and the radio button design [Response style through distribution comparison]? NO
References
Aiken, L.R.: Number of response categories and statistics on a teacher rating scale. Educ. Psychol. Meas. 43, 397–401 (1983). doi:10.1177/001316448304300209
Alwin, D.F.: Feeling thermometers versus 7-point scales. Which are better? Sociol. Methods Res. 25, 318–340 (1997). doi:10.1177/0049124197025003003
Alwin, D.F.: Margins of Error: A Study of Reliability in Survey Measurement. Wiley, Hoboken (2007)
Alwin, D.F., Krosnick, J.A.: The reliability of survey attitude measurement: the influence of question and respondent attributes. Sociol. Methods Res. 20, 139–181 (1991). doi:10.1177/0049124191020001005
Amoo, T., Friedman, H.H.: Do numeric values influence subjects’ responses to rating scales? J. Int. Mark. Marking Res. 26, 41–46 (2001)
Andrews, F.M.: Construct validity and error components of survey measures: a structural modelling approach. Public Opin. Q. 48, 409–442 (1984). doi:10.1086/268840
Andrews, F.M., Crandall, R.: The validity of measures of self-reported well-being. Soc. Indic. Res. 3, 1–19 (1975)
Andrews, F.M., Withey, S.B.: Social Indicators of Well-Being: Americans’ Perceptions of Life Quality. Plenum Press, New York (1976)
Al Baghal, T.: Numeric estimation and response options: an examination of the accuracy of numeric and
Al Baghal, T.: Is vague valid? The comparative predictive validity of vague quantifiers and numeric response options. Surv. Res. Methods 8, 169–179 (2014b). doi:10.18148/srm/2014.v8i3.5813
Bendig, A.W.: Reliability and the number of rating-scale categories. J. Appl. Psychol. 38, 38–40 (1954). doi:10.1037/h0055647
Billiet, J., McClendon, M.J.: Modeling acquiescence in measurement models for two balanced sets of items. Struct. Equ. Model A Multidiscip. J. 7, 608–628 (2000). doi:10.1207/S15328007SEM0704_5
Bishop, G.F.: Experiments with the middle response alternative in survey questions. Public Opin. Q. 51, 220–232 (1987). doi:10.1086/269030
Brown, G.T.L.: Measuring attitude with positively packed self-report ratings: comparison of agreement and frequency scales. Psychol. Rep. 94, 1015–1024 (2004). doi:10.2466/pr0.94.3.1015-1024
Buskirk, T.D., Saunders, T., Michaud, J.: Are sliders too slick for surveys? An experiment comparing slider and radio button scales for smartphone, tablet and computer based surveys. Methods Data Anal. 9, 229–260 (2015). doi:10.12758/mda.2015.013
Christian, L.M., Dillman, D.A., Smyth, J.D.: Helping respondents get it right the first time: the influence of words, symbols, and graphics in web surveys. Public Opin. Q. 71, 113–125 (2007a). doi:10.1093/poq/nfl039
Christian, L.M., Dillman, D.A., Smyth, J.D.: The effects of mode and format on answers to scalar questions in telephone and web surveys. In: Lepkowski, J.M., Tucker, C., Brick, M., De Leeuw, E.D., Japec, L., Lavrakas, P.J., Link, M.W., Sangster, R.L. (eds.) Advances in Telephone Survey Methodology, pp. 250–275. Wiley, Hoboken (2007b)
Christian, L.M., Parsons, N.L., Dillman, D.A.: Designing scalar questions for web surveys. Sociol. Methods Res. 37, 393–425 (2009). doi:10.1177/0049124108330004
Couper, M.P., Tourangeau, R., Conrad, F.G., Crawford, S.D.: What they see is what we get: response options for web surveys. Soc. Sci. Comput. Rev. 22, 111–127 (2004). doi:10.1177/0894439303256555
Couper, M.P., Tourangeau, R., Conrad, F.G., Singer, E.: Evaluating the effectiveness of visual analog scales: a web experiment. Soc. Sci. Comput. Rev. 24, 227–245 (2006). doi:10.1177/0894439305281503
Couper, M.P., Traugott, M.W., Lamias, M.J.: Web survey design and administration. Public Opin. Q. 65, 230–253 (2001). doi:10.1086/322199
Cox III, E.P.: The optimal number of response alternatives for a scale. J. Mark. Res. 17, 407–422 (1980). doi:10.2307/3150495
De Leeuw, E.D., Hox, J.J., Dillman, D.A.: International Handbook of Survey Methodology. Routledge, New York (2008)
De Leeuw, E.D., Hox, J.J., Boeve, A.: Handling do-not-know answers: exploring new approaches in online and mixed-mode surveys. Soc. Sci. Comput. Rev. 34, 116–132 (2016). doi:10.1177/0894439315573744
Derham, P.A.J.: Using preferred, understood or effective scales? How scale presentations effect online survey data collection. Australas. J. Mark. Soc. Res. 19, 13–26 (2011)
Dillman, D., Bowker, D.: The web questionnaire challenge to survey methodologists. In: Reips, U.D., Bosnjak, M. (eds.) Dimensions of Internet Science. Pabst Science Publishers, Lengerich (2001)
Dolnicar, S.: Asking good survey questions. J. Travel Res. 52, 551–574 (2013). doi:10.1177/0047287513479842
Emde, M., Fuchs, M.: Exploring animated faces scales in web surveys: drawbacks and prospects. Surv. Pract. 5 (2013). http://www.surveypractice.org/index.php/SurveyPractice/article/view/60
Eutsler, J., Lang, B.: Rating scales in accounting research: the impact of scale points and labels. Behav. Res. Acc. 27, 35–51 (2015). doi:10.2308/bria-51219
Funke, F.: A web experiment showing negative effects of slider scales compared to visual analogue scales and radio button scales. Soc. Sci. Comput. Rev. 34, 244–254 (2016). doi:10.1177/0894439315575477
Funke, F., Reips, U.-D.: Why semantic differentials in web-based research should be made from visual analogue scales and not from 5-point scales. Field Methods 24, 310–327 (2012). doi:10.1177/1525822X12444061
Funke, F., Reips, U.-D., Thomas, R.K.: Sliders for the smart: type of rating scale on the web interacts with educational level. Soc. Sci. Comput. Rev. 29, 221–231 (2011). doi:10.1177/0894439310376896
Gilljam, M., Granberg, D.: Should we take don’t know for an answer? Public Opin. Q. 57, 348–357 (1993). doi:10.1086/269380
Groves, R.M., Fowler Jr., F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.: Survey Methodology. Wiley, New York (2009)
Hjermstad, M.J., Fayers, P.M., Haugen, D.F., Caraceni, A., Hanks, G.W., Loge, J.H., Fainsinger, R., Aass, N., Kaasa, S.: Studies comparing numerical rating scales, verbal rating scales, and visual analogue scales for assessment of pain intensity in adults: a systematic literature review. J. Pain Symptom Manag. 41, 1073–1093 (2011). doi:10.1016/j.jpainsymman.2010.08.016
Jacoby, J., Matell, M.S.: Three-point Likert scales are good enough. J. Mark. Res. 8, 495–500 (1971). doi:10.2307/3150242
Komorita, S.S., Graham, W.K.: Number of scale points and the reliability of scales. Educ. Psychol. Meas. 25, 987–995 (1965). doi:10.1177/001316446502500404
Koskey, K.L.K., Sondergeld, T.A., Beltyukova, S.A., Fox, C.M.: An experimental study using Rasch analysis to compare absolute magnitude estimation and categorical rating scales as applied in survey research. J. Appl. Meas. 14, 1–21 (2013)
Krebs, D., Hoffmeyer-Zlotnik, J.H.P.: Positive first or negative first? Methodology 6, 118–127 (2010). doi:10.1027/1614-2241/a000013
Krosnick, J.A.: The stability of political preferences: comparisons of symbolic and nonsymbolic attitudes. Am. J. Pol. Sci. 35, 547–576 (1991). doi:10.2307/2111553
Krosnick, J.A., Berent, M.K.: Comparisons of party identifications and policy preferences: the impact of survey question format. Am. J. Pol. Sci. 37, 941–964 (1993). doi:10.2307/2111580
Krosnick, J.A., Fabrigar, L.R.: Designing rating scales for effective measurement in surveys. In: Lyberg, L.E., Biemer, P.P., Collins, M., De Leeuw, E.D., Dippo, C., Schwarz, N., Trewin, D. (eds.) Survey Measurement and Process Quality, pp. 141–164. Wiley, Hoboken (1997)
Krosnick, J.A., Holbrook, A.L., Berent, M.K., Carson, R.T., Hanemann, W.M., Kopp, R.J., Mitchell, R.C., Presser, S., Ruud, P.A., Smith, V.K., Moody, W.R., Green, M.C., Conaway, M.: The impact of ‘‘no opinion’’ response options on data quality: non-attitude reduction or an invitation to satisfice? Public Opin. Q. 66, 371–403 (2002). doi:10.1086/341394
Krosnick, J.A., Judd, C.M., Wittenbrink, B.: The measurement of attitudes. In: Albarracin, D., Johnson, B.T., Zanna, M.P. (eds.) The Handbook of Attitudes, pp. 21–78. Lawrence Erlbaum, Mahwah (2005)
Krosnick, J.A., Presser, S.: Question and Questionnaire Design. In: Marsden, P.V., Wright, J.D. (eds.) Handbook of Survey Research, pp. 263–313. Emerald Group Publishing Limited, Bingley (2010)
Kulas, J.T., Stachowski, A.A.: Middle category endorsement in odd-numbered Likert response scales: associated item characteristics, cognitive demands, and preferred meanings. J. Res. Pers. 43, 489–493 (2009). doi:10.1016/j.jrp.2008.12.005
Kunin, T.: The construction of a new type of attitude measure. Pers. Psychol. 51, 823–824 (1998). doi:10.1111/j.1744-6570.1998.tb00739.x
Kunz, T.: Rating scales in Web surveys. A test of new drag-and-drop rating procedures. Technische Universität, Darmstadt [Ph.D. Thesis] (2015)
Levin, K.A., Currie, C.: Reliability and validity of an adapted version of the Cantril ladder for use with adolescent samples. Soc. Indic. Res. 119, 1047–1063 (2014). doi:10.1007/s11205-013-0507-4
Liu, M., Conrad, F.G.: An experiment testing six formats of 101-point rating scales. Comput. Hum. Behav. 55, 364–371 (2016). doi:10.1016/j.chb.2015.09.036
Lundmark, S., Gilljam, M., Dahlberg, S.: Measuring generalized trust. An examination of question wording and the number of scale points. Public Opin. Q. 80, 26–43 (2016). doi:10.1093/poq/nfv042
Malhotra, N., Krosnick, J.A., Thomas, R.K.: Optimal design of branching questions to measure bipolar constructs. Public Opin. Q. 73, 304–324 (2009). doi:10.1093/poq/nfp023
Matell, M.S., Jacoby, J.: Is there an optimal number of alternatives for Likert scale items? Study I: reliability and validity. Educ. Psychol. Meas. 31, 657–674 (1971). doi:10.1177/001316447103100307
McClendon, M.J.: Acquiescence and recency response-order effects in interview surveys. Sociol. Methods Res. 20, 60–103 (1991). doi:10.1177/0049124191020001003
McKelvie, S.J.: Graphic rating scales—How many categories? Br. J. Psychol. 69, 185–202 (1978). doi:10.1111/j.2044-8295.1978.tb01647.x
Menold, N., Kaczmirek, L., Lenzner, T., Neusar, A.: How do respondents attend to verbal labels in rating scales? Field Methods 26, 21–39 (2014). doi:10.1177/1525822X13508270
Miethe, T.D.: The validity and reliability of value measurements. J. Psychol. 119, 441–453 (1985). doi:10.1080/00223980.1985.10542914
Moors, G., Kieruj, N.D., Vermunt, J.K.: The effect of labeling and numbering of response scales on the likelihood of response bias. Sociol. Methodol. 44, 369–399 (2014). doi:10.1177/0081175013516114
O’Muircheartaigh, C., Gaskell, G., Wright, D.B.: Weighing anchors: verbal and numeric labels for response scales. J. Off. Stat. 11, 295–307 (1995)
Pohl, N.F.: Scale considerations in using vague quantifiers. J. Exp. Educ. 49, 235–240 (1981). doi:10.1080/00220973.1981.11011790
Preston, C.C., Colman, A.M.: Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychol. (Amst). 104, 1–15 (2000). doi:10.1016/S0001-6918(99)00050-5
Rammstedt, B., Krebs, D.: Does response scale format affect the answering of personality scales? Eur. J. Psychol. Assess. 23, 32–38 (2007). doi:10.1027/1015-5759.23.1.32
Reips, U.-D.: Context effects in web-surveys. In: Batinic, B., Reips, U.-D., Bosnjak, M. (eds.) Online Social Sciences, pp. 69–79. Hogrefe & Huber, Cambridge (2002)
Revilla, M.: Effect of using different labels for the scales in a web survey. Int. J. Mark. Res. 57, 225–238 (2015). doi:10.2501/IJMR-2014-028
Revilla, M., Ochoa, C.: Quality of different scales in an online survey in Mexico and Colombia. J. Polit. Lat. Am. 7, 157–177 (2015)
Revilla, M., Saris, W.E., Krosnick, J.A.: Choosing the number of categories in agree-disagree scales. Sociol. Methods Res. 43, 73–97 (2014). doi:10.1177/0049124113509605
Rodgers, W.L., Andrews, F.M., Herzog, A.R.: Quality of survey measures: a structural modeling approach. J. Off. Stat. 8, 251–275 (1992)
Rossiter, J.R.: Measurement for the social sciences: The C-OAR-SE method and why it must replace psychometrics. Springer, New York (2011)
Roster, C.A., Lucianetti, L., Albaum, G.: Exploring slider vs. categorical response formats in web-based surveys. J. Res. Pract. 11 (2015). http://jrp.icaap.org/index.php/jrp/article/view/509/413
Saris, W.E.: Variation in Response Functions: A Source of Measurement Error in Attitude Research. Sociometric Research Foundation, Amsterdam (1988)
Saris, W.E., Gallhofer, I.N.: Design, Evaluation, and Analysis of Questionnaires for Survey Research. Wiley, Hoboken (2007)
Saris, W.E., Gallhofer, I.N.: Design, Evaluation, and Analysis of Questionnaires for Survey Research. Wiley, Hoboken (2014)
Saris, W.E., Revilla, M.: Correction for measurement errors in survey research: necessary and possible. Soc. Indic. Res. 127, 1005–1020 (2016). doi:10.1007/s11205-015-1002-x
Saris, W.E., Revilla, M., Krosnick, J.A., Shaeffer, E.M.: Comparing questions with agree/disagree response options to questions with item-specific response options. Surv. Res. Methods. 4, 61–79 (2010). doi:10.18148/srm/2010.v4i1.2682
Saris, W.E., De Rooij, K.: What kind of terms should be used for reference points. In: Saris, W.E. (ed.) Variations in Response Functions: A Source of Measurement Error in Attitude Research, pp. 188–219. Sociometric Research Foundation, Amsterdam (1988)
Schaeffer, N.C.: Hardly ever or constantly? Group comparisons using vague quantifier. Public Opin. Q. 55, 395–423 (1991). doi:10.1086/269270
Schaeffer, N.C., Bradburn, N.M.: Respondent behavior in magnitude estimation. J. Am. Stat. Assoc. 84, 402–413 (1989). doi:10.2307/2289923
Schaeffer, N.C., Presser, S.: The science of asking questions. Annu. Rev. Sociol. 29, 65–88 (2003). doi:10.1146/annurev.soc.29.110702.110112
Scherpenzeel, A.C., Saris, W.E.: The validity and reliability of survey questions: a meta-analysis of MTMM studies. Sociol. Methods Res. 25, 341–383 (1997)
Schuman, H., Presser, S.: Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording and Context. Sage Publications, Thousand Oaks (1981)
Schwarz, N., Grayson, C.E., Knauper, B.: Formal features of rating scales and their interpretation of question meaning. Int. J. Public Opin. Res. 10, 177–183 (1998). doi:10.1093/ijpor/10.2.177
Schwarz, N., Hippler, H.-J.: The numeric values of rating scales: a comparison of their impact in mail surveys and telephone interviews. Int. J. Public Opin. Res. 7, 72–74 (1995). doi:10.1093/ijpor/7.1.72
Schwarz, N., Hippler, H.-J., Deutsch, B., Strack, F.: Response scales: effects of category range on reported behavior and comparative judgments. Public Opin. Q. 49, 388–395 (1985). doi:10.1086/268936
Schwarz, N., Knauper, B., Hippler, H.-J., Noelle-Neumann, E., Clark, L.: Rating scales: numeric values may change the meaning of scale labels. Public Opin. Q. 55, 570–582 (1991). doi:10.1086/269282
Sturgis, P., Roberts, C., Smith, P.: Middle alternatives revisited: how the neither/nor response acts as a way of saying ‘‘I don’t know’’? Sociol. Methods Res. 43, 15–38 (2014). doi:10.1177/0049124112452527
Sudman, S., Bradburn, N.M.: Asking Questions: A Practical Guide to Questionnaire Design. Jossey Bass, San Francisco (1983)
Toepoel, V., Das, M., van Soest, A.: Design of web questionnaires: the effect of layout in rating scales. J. Off. Stat. 25, 509–528 (2009)
Tourangeau, R., Couper, M.P., Conrad, F.: Spacing, position, and order. Interpretive heuristics for visual features of survey questions. Public Opin. Q. 68, 368–393 (2004). doi:10.1093/poq/nfh035
Tourangeau, R., Couper, M.P., Conrad, F.: Color, labels, and interpretive heuristics for response scales. Public Opin. Q. 71, 91–112 (2007). doi:10.1093/poq/nfl046
Tourangeau, R., Rips, L.J., Rasinski, K.: The Psychology of Survey Response. Cambridge University Press, Cambridge (2000)
van Doorn, L.J., Saris, W.E., Lodge, M.: The measurement of issue-variables: positions of respondents, candidates and parties. In: Middendorp, C.P., Niemoller, B., Saris, W.E. (eds.) Het Tweed Sociometric Congress, pp. 229–250. Dutch Sociometric Society, Amsterdam (1982)
Weijters, B., Cabooter, E., Schillewaert, N.: The effect of rating scale format on response styles: the number of response categories and response category labels. Int. J. Res. Mark. 27, 236–247 (2010). doi:10.1016/j.ijresmar.2010.02.004