The response category labeling effect: How the wording of labels affects response distributions in Likert data Bert Weijters Maggie Geuens Hans Baumgartner
Dec 16, 2015
The response category labeling effect
Research questions
Do the labels attached to scale categories influence response behavior?
What mechanism(s) can account for this response category labeling effect?
Are there moderators of this effect?
What are the implications of the response category labeling effect for cross-cultural research?
The importance of category labels
I try to avoid foods that are high in cholesterol.
________ ________ ________ ________ ________
strongly disagree | disagree | neither agree nor disagree | agree | strongly agree
versus
________ ________ ________ ________ ________
completely disagree | disagree | neither agree nor disagree | agree | completely agree
The intensity hypothesis
Label intensity refers to the perceived degree of (dis)agreement implied by the label;
More intense labels represent more extreme positions, which are endorsed less often (e.g., agree vs. strongly agree; superior vs. very good);
Even more subtle adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior;
There is prior evidence that different adverbs are associated with different intensities (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement;
The fluency hypothesis
Research on processing fluency shows that the meta-cognitive experience of ease of processing affects judgment and decision making:
– perceptions of the truth value of statements (e.g., Unkelbach 2007);
– liking for objects and events (e.g., Reber, Schwarz, and Winkielman 2004);
Repeated statements are more likely to be rated as true (Unkelbach 2007) and repetition increases liking, as suggested by the mere exposure effect (e.g., Bornstein 1989), in part because repetition makes stimuli more familiar and contributes to greater processing fluency;
Words vary in how often they are encountered, and high frequency words are processed more fluently;
If scale labels are more commonly used in everyday language and are thus easier to process, this may increase the likelihood that the corresponding response option on the rating scale is selected;
Two alternative hypotheses to explain the effect of response category labels
Intensity hypothesis:
H1: Response categories are endorsed less frequently if their labels are more intense.
Fluency hypothesis:
H2: Response categories are endorsed more frequently if their labels are more fluent.
Verbal ability as a moderator of the fluency effect
When people process information more carefully or are highly experienced, their actual thoughts, not the ease of generating them, play a more decisive role;
Verbal ability (as a form of language expertise) may moderate the fluency effect;
We posit that for respondents who tend to use words in a precise manner and who make fine-grained distinctions as to the exact meaning and implications of words, fluency will be less important as a cue in selecting a response;
Study 1: Scaling intensity and fluency
• Do different methods for scaling the intensity and fluency of response category labels lead to similar results?
If the intensity or fluency of scale labels is to have a reliable effect on responses to questionnaires, consistent differences in the perceived intensity and fluency of category labels should emerge across respondents.
• Can we identify endpoint labels that vary significantly in intensity and fluency for use in subsequent studies?
We need two labels that imply contradictory responses under the intensity and fluency hypotheses.
Study 1 (cont’d)
• Label intensity
– Direct ratings of intensity (0 = neutral; 10 = 100% agreement)
– Pairwise comparisons of intensity (“Which expression indicates the stronger sense of agreement?”)
• Label fluency
– Direct ratings of fluency (0 = we never use this term in day-to-day speech; 10 = we use this term very often in day-to-day speech)
– Pairwise comparisons of fluency (“Which expression is more commonly used in day-to-day speech?”)
– Lexical decision task (press a button labeled ‘end category label’ or ‘not an end category label’ for six endpoint labels and five non-endpoint labels)
– Word frequency counts in corpora of texts (Google hits, available for specific word combinations in particular countries and languages)
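One common way to turn pairwise comparisons into a scale value is to count how often each label is chosen over the others. The sketch below assumes that scoring rule (the slides do not spell out how the paired-comparison scores were computed). The function name `pairwise_scores` and the respondent's choices are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def pairwise_scores(labels, prefer):
    """Count, for each label, the number of pairs in which it was chosen.

    `prefer(a, b)` returns whichever of the two labels the respondent picked.
    """
    scores = {label: 0 for label in labels}
    for a, b in combinations(labels, 2):
        scores[prefer(a, b)] += 1
    return scores

# The six Dutch endpoint adverbs examined in Study 1.
LABELS = ["sterk", "zeer", "zeker", "uitgesproken", "helemaal", "volledig"]

# Hypothetical respondent: always picks the label that comes later in LABELS.
def hypothetical_choice(a, b):
    return a if LABELS.index(a) > LABELS.index(b) else b

scores = pairwise_scores(LABELS, hypothetical_choice)
n_pairs = len(list(combinations(LABELS, 2)))  # number of distinct pairs
print(scores, n_pairs)
```

With six labels there are 15 pairs, so one respondent's scores sum to 15 and each score lies between 0 and 5. The paired-comparison means reported in the Study 1 results sum to 15.00 for both intensity and fluency, which is consistent with this scoring rule, although the slides do not confirm it.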
Study 1: Method
Sample 1: 83 undergraduates; pairwise comparisons of intensity and fluency of six endpoint labels;
Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and fluency on 11-point scales;
Sample 3: 125 undergraduates (57% female); lexical decision task;
Study 1: Results
Dutch label | Free translation | Intensity: paired comparison | Intensity: direct rating | Fluency: paired comparison | Fluency: direct rating | Fluency: response latency | Fluency: Google hits (in millions)
Sterk (on)eens | Strongly (dis)agree | 0.94 (.13) | 8.89a (.15) | 1.14 (.11) | 4.62b (.27) | 1011.53a (55.01) | 11.70
Zeer (on)eens | Very much (dis)agree | 1.43 (.12) | 8.60a (.18) | 1.65 (.12) | 3.49a (.24) | 1002.93a (54.86) | 38.20
Zeker (on)eens | Certainly (dis)agree | 2.11 (.11) | 8.78a (.20) | 2.40 (.10) | 6.05c (.28) | 989.72a (54.61) | 35.90
Uitgesproken (on)eens | Distinctly (dis)agree | 2.98 (.18) | 9.57b (.22) | 1.18 (.13) | 3.81b (.28) | 1021.87a (54.73) | 2.16
Helemaal (on)eens | Fully (dis)agree | 3.72 (.13) | 10.54c (.12) | 4.24 (.08) | 9.62d (.16) | 724.88b (55.83) | 33.60
Volledig (on)eens | Completely (dis)agree | 3.82 (.12) | 10.56c (.10) | 4.39 (.08) | 9.59d (.18) | 672.73b (55.21) | 67.10
Study 1: Results (cont’d)
For intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92;
The correlations of the means derived from the four fluency methods range from .66 to .97, with an average of r = .84;
Thus, there is considerable consistency in respondents’ judgments of the perceived intensity and fluency of different category labels;
‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least fluent labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most fluent labels;
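The consistency check for intensity can be reproduced from the label means in the Study 1 results table. The sketch below computes a plain Pearson correlation across the six labels. The helper name `pearson_r` is illustrative, and the slides do not state which software or exact procedure the authors used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Mean intensity per label from the Study 1 results table, in the order
# sterk, zeer, zeker, uitgesproken, helemaal, volledig.
paired_comparison = [0.94, 1.43, 2.11, 2.98, 3.72, 3.82]
direct_rating = [8.89, 8.60, 8.78, 9.57, 10.54, 10.56]

r_intensity = pearson_r(paired_comparison, direct_rating)
print(round(r_intensity, 2))
```

Run on the tabled means, this gives a correlation of about .92, in line with the value reported for intensity.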
Study 2
Direct test of the intensity and fluency hypotheses:
The endorsement rate for a high intensity and high fluency label should be relatively low if the intensity hypothesis is true, and it should be relatively high if the fluency hypothesis is true.
Preliminary test of whether the intensity/fluency of labels affects predictive validity.
Measuring response distributions
A major challenge is to measure differences in response distributions that are not item-specific and independent of substantive content;
To do this, we need to observe patterns of responses across heterogeneous items (i.e., items that do not share common content but have the same response format):
Deliberately designed scales consisting of heterogeneous items (Greenleaf 1992)
Random samples of items from scale inventories (Weijters, Geuens & Schillewaert 2010)
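A content-free summary of a response distribution can be obtained by counting, per respondent, how many of the heterogeneous items were answered with an endpoint category. The sketch below shows one such index. The function names and the answers are hypothetical, and the slides do not specify the exact index beyond reporting numbers of endpoint responses.

```python
def endpoint_count(responses, low=1, high=5):
    """Number of items answered with either endpoint of the scale."""
    return sum(1 for r in responses if r in (low, high))

def extreme_positive_count(responses, high=5):
    """Number of items answered with the most positive category only."""
    return sum(1 for r in responses if r == high)

# Hypothetical answers of one respondent to 10 unrelated items (5-point scale).
answers = [5, 2, 4, 1, 3, 5, 4, 2, 5, 3]

print(endpoint_count(answers), extreme_positive_count(answers))
```

Because the items share no common content, a high count is more plausibly attributed to the response format (including its labels) than to the respondent's standing on any single construct.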
Study 2: Method
161 Dutch-speaking respondents (mean age 31.27, 67% female) from a university panel were randomly assigned to two versions of an online questionnaire:
□ Endpoint labels of ‘completely (dis)agree’
□ Endpoint labels of ‘strongly (dis)agree’
Four sections:
□ 6 attitudinal items, one of which was “I love to go out for dinner”;
□ 10 heterogeneous items from unrelated scales (e.g., “I am a sensitive person”, “Financial security is important to me”), rated on 5-point scales;
□ Direct ratings of the intensity and fluency of six end labels (100-point scale for intensity, 11-point scale for fluency);
□ Behavioral measure of choice between five different vouchers worth 15 EUR (cinema, book, restaurant, theatre, gym);
Study 2: Results
The findings support the fluency hypothesis:
Label | Intensity | Fluency | Mean endorsement of the extreme positive category
Strongly agree | 75.47 | 3.20 | 2.47
Completely agree | 93.63 | 8.22 | 3.61
(p<.001 based on a Poisson regression)
Logistic regression of choice of restaurant voucher on label, attitude toward going out for dinner, and their interaction indicates a significant interaction: predictive validity is better for ‘strongly agree’ than ‘completely agree’.
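The size of the endorsement difference in Study 2 can be expressed as a rate ratio. The sketch below assumes a Poisson regression with a single 0/1 predictor for the label condition (the slides do not give the model specification). In that special case the maximum likelihood estimates follow directly from the two group means. The significance test itself would require the individual counts, which are not shown here.

```python
import math

def poisson_two_group(mean_ref, mean_alt):
    """Closed-form estimates for log(mu) = b0 + b1 * x with a 0/1 predictor x.

    mean_ref is the mean count in the reference group (x = 0),
    mean_alt the mean count in the comparison group (x = 1).
    """
    b0 = math.log(mean_ref)
    b1 = math.log(mean_alt / mean_ref)
    return b0, b1

# Mean endorsement of the extreme positive category reported for Study 2:
# 'strongly agree' as reference, 'completely agree' as comparison.
b0, b1 = poisson_two_group(mean_ref=2.47, mean_alt=3.61)
rate_ratio = math.exp(b1)
print(round(b1, 3), round(rate_ratio, 2))
```

The rate ratio equals 3.61 / 2.47, or roughly 1.46, meaning the ‘completely agree’ condition produced about 46% more extreme positive responses than the ‘strongly agree’ condition.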
Study 3
Replication of the fluency effect with a sample drawn from the general population;
Literacy as a potential moderator;
Study 3: Method
369 Dutch-speaking panel members (mean age 45.8, 50% female) of an online market research agency in Belgium were randomly assigned to two versions of an online questionnaire:
□ Endpoint labels of ‘completely (dis)agree’
□ Endpoint labels of ‘strongly (dis)agree’
Questionnaire:
□ 16 heterogeneous items based on Greenleaf (1992), rated on 5-point scales;
□ Pairwise comparisons of four endpoint labels in terms of intensity and fluency (strongly, completely, fully, and absolutely);
□ Literacy measure: “I do a lot of reading” and “I prefer activities that don’t require a lot of reading” (strongly associated with having a higher education);
Study 3: Results
The findings support the fluency hypothesis:
Label | Intensity | Fluency | Mean endorsement of the extreme categories
Strongly agree | .74 | .65 | 2.66
Completely agree | 2.02 | 2.18 | 3.05
(p<.05 based on a Poisson regression)
Fluency effect occurs primarily for respondents with lower literacy;
The moderating effect of literacy on the fluency effect
[Figure: number of endpoint responses (vertical axis, 0 to 4) by endpoint label (‘sterk (on)eens’ vs. ‘volledig (on)eens’), shown separately for low-literacy and high-literacy respondents.]
Study 4: Method
271 Dutch-speaking panel members (mean age 39.2, 51% female) of an online market research agency in Belgium were randomly assigned to two versions of an online questionnaire:
□ Endpoint labels of ‘completely (dis)agree’
□ Endpoint labels of ‘strongly (dis)agree’
Questionnaire:
□ 10 heterogeneous items, rated on 5-point scales;
□ Pairwise comparisons of six response category labels in terms of intensity and fluency;
□ Antonym test as a measure of verbal ability (4 items; strongly associated with having a higher education);
Study 4: Results
Manipulation checks:
Label | Intensity | Fluency
Strongly agree | 1.93 | 1.67
Completely agree | 3.44 | 4.26
Fluency effect occurs primarily for respondents low in verbal ability (significant interaction, with significant simple main effect for low verbal ability respondents);
The moderating effect of verbal ability on the fluency effect
[Figure: number of endpoint responses (vertical axis, 0 to 5) by endpoint label (‘sterk (on)eens’ vs. ‘volledig (on)eens’), shown separately for respondents low and high in verbal ability.]
Implications of the category labeling effect for cross-cultural research
Response category labels can affect findings in a single-language context (e.g., estimation of population parameters, meta-analytic comparisons), but they are particularly important in cross-cultural research, where labels have to be translated;
Two types of translation:
– Literal
– Idiomatic
Some authors have emphasized the need to choose scale anchors that are equal in intensity (e.g., Harzing 2006), and prior research has demonstrated that supposedly similar terms may differ in intensity across languages (e.g., definitely vs. bestimmt; see Smith et al. 2009);
Translated adverbial modifiers may also differ in fluency;
Schematic representation of the translation process
(based on Bassetti and Cook 2011)
Study 5: Method
Approx. 200 English- or French-speaking respondents in each of five regions (nationality/language combinations) of North America and Europe;
Six endpoint labels in each language;
16 heterogeneous items from Greenleaf (1992), rated on 5-point scales;
Pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and fluency;
Study 5: Method (cont’d)
Language | France | USA | Canada | UK | Total
French | 227 | 0 | 203 | 0 | 430
English | 0 | 185 | 196 | 187 | 568
Total | 227 | 185 | 399 | 187 | 998

Version | English | French
1 | Strongly agree | Fortement d'accord
2 | Completely agree | Complètement d'accord
3 | Extremely agree | Extrêmement d'accord
4 | Definitely agree | Définitivement d'accord
5 | Fully agree | Entièrement d'accord
6 | Very much agree | Tout à fait d'accord
Study 5: Results
Intensity and fluency ratings by region
Note: Correlation between the fluency ratings and the natural logarithm of the number of Google hits was at least .88.
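Study 5: Results
Multilevel model estimates
Dependent variable | Independent variable | B | SE | t | p
Individual extreme responding (level 1) | Fluency | -.015 | .029 | -.512 | .609
Individual extreme responding (level 1) | Intensity | .020 | .017 | 1.213 | .225
Group-level extreme responding (level 2) | Fluency | .164 | .065 | 2.532 | .011
Group-level extreme responding (level 2) | Intensity | -.130 | .132 | -.979 | .327
Group-level extreme responding (level 2) | Language = French | .054 | .087 | .621 | .535
Group-level extreme responding (level 2) | Country = USA | .114 | .100 | 1.139 | .255
Group-level extreme responding (level 2) | Country = France | .002 | .076 | .026 | .979
Group-level extreme responding (level 2) | Country = UK | -.004 | .096 | -.040 | .968
Group-level extreme responding (level 2) | Intercept term | .995 | .060 | 16.486 | .000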
Study 6
Demonstration that fluency is a viable determinant of extreme responding differences between regions in an international survey;
Illustration of how to construct and use relative measures of fluency and extreme responding based on secondary data only;
Study 6: Method
13,520 respondents from 17 European regions;
16 heterogeneous items based on Greenleaf (1992);
Use of fully labeled 7-point response scales;
Fluency: relative measure of fluency as the natural logarithm of the ratio of the number of Google hits for the 7th category (strongly agree) to the number of Google hits for the 6th category (agree);
Endorsement: relative endorsement of the 7th vs. the 6th response category (natural logarithm).
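The two relative measures defined for Study 6 are both log ratios and can be sketched in a few lines. The function names and all input numbers below are hypothetical placeholders, not values from the study. The endorsement measure is assumed here to be the log of the ratio of response counts for the 7th and 6th categories, mirroring the fluency definition.

```python
import math

def relative_fluency(hits_cat7, hits_cat6):
    """ln(Google hits for the 7th-category label / hits for the 6th-category label)."""
    return math.log(hits_cat7 / hits_cat6)

def relative_endorsement(n_cat7, n_cat6):
    """ln(number of 7th-category responses / number of 6th-category responses)."""
    return math.log(n_cat7 / n_cat6)

# Hypothetical inputs for a single region (illustration only).
fluency = relative_fluency(hits_cat7=2_000_000, hits_cat6=8_000_000)
endorsement = relative_endorsement(n_cat7=1500, n_cat6=3000)

print(round(fluency, 3), round(endorsement, 3))
```

Expressing both measures relative to the adjacent category puts regions on a comparable footing even when the overall volume of web text or the overall tendency to agree differs between languages. A value of 0 means the two categories are equally frequent, and negative values mean the 7th category is the rarer one.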
Sample descriptive statistics Pan-European study (Study 7 and 8)
Region, language | N | % female | M age | SD age
Belgium, Dutch | 644 | 51% | 41.0 | 11.1
Belgium, French | 371 | 51% | 40.5 | 11.7
UK, English | 908 | 56% | 41.8 | 11.3
Germany, German | 993 | 50% | 39.3 | 11.0
Hungary, Hungarian | 1003 | 51% | 38.3 | 11.8
Slovakia, Slovakian | 1063 | 50% | 38.2 | 12.1
Poland, Polish | 802 | 37% | 32.2 | 11.0
Netherlands, Dutch | 1046 | 50% | 40.8 | 11.4
France, French | 1000 | 51% | 39.4 | 11.9
Spain, Spanish | 934 | 50% | 37.8 | 10.5
Romania, Romanian | 970 | 50% | 37.9 | 11.5
Turkey, Turkish | 914 | 43% | 32.5 | 9.4
Italy, Italian | 939 | 50% | 39.0 | 10.6
Switzerland, French | 303 | 51% | 42.5 | 9.7
Switzerland, German | 606 | 48% | 43.5 | 9.4
Switzerland, Italian | 50 | 56% | 32.9 | 8.7
Sweden, Swedish | 974 | 49% | 39.9 | 11.3
Total | 13520 | 49% | 38.7 | 11.4
Study 6: Results
Note: Standardized regression slope of .67 (p<.01, R² = .45)
Discussion: Summary of findings
response category labels that are more commonly used (i.e., that are more fluent) lead to higher endorsement of their associated response categories;
respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category, but they are also influenced by the fluency of the labels;
the effect of fluency is more pronounced for respondents who are lower in literacy and verbal ability;
the problem may be particularly serious in cross-cultural research when different languages are used;
Implications for multilingual survey research
□ Translations usually imply a trade-off between the attempt to be literal and the attempt to be idiomatic;
□ Optimize equivalence: use response category labels that are equally fluent in different languages (rather than literal translations or words with equal intensity);
e.g., ‘Strongly agree’ is most commonly used in scales, but may not have valid equivalents in some other languages. ‘Completely agree’ seems to be a viable alternative.
Label | Fluency | ERS%
Completely agree | 1.24 | 18.8%
Tout à fait d’accord | 1.22 | 19.2%