INTEGRATING OVERCONFIDENCE AND OVERCLAIMING:
EXAGGERATION HARMS PERFORMANCE
by
PATRICK J. DUBOIS
M.A., The University of British Columbia, 2015
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Psychology)
THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
August 2021
Patrick J. Dubois, 2021
tasks online for financial compensation. See Buhrmester et al. (2011).
NFCS Need for Cognition Scale: An 18-item measure developed by Cacioppo et al.
(1984).
NPI Narcissistic Personality Inventory: A popular measure of non-clinical narcissism
(Raskin & Terry, 1988); both 40-item and 16-item versions are used in this paper.
OCQ Overclaiming Questionnaire: A set of reals and foils, based on Hirsch Jr et al.
(1988), introduced by Paulhus et al. (2003) to demonstrate the OCT.
OCT Overclaiming Technique: An application of SDT to overclaiming introduced by
Paulhus et al. (2003).
OLD20 Average Orthographic Levenshtein Distance of the 20 Closest Neighbors: A
technique for measuring (un)wordlikeness by averaging the edit distances of a
letter string from the 20 most similar in a reference corpus of words.
PES Psychological Sense of Entitlement: A 9-item measure by Campbell, Bonacci,
et al. (2004).
RExI Residualized Exaggeration Index: A general technique for isolating exaggeration
in self-image of competence; the residual of incompetence evidence after controlling
for competence evidence.
SDD Self-Deceptive Denial: Part of the BIDR.
SDE Self-Deceptive Enhancement: Part of the BIDR.
SDT Signal Detection Theory: A well-established theoretical framework with analytic
techniques for distinguishing accuracy from response bias when discriminating
ambiguous signals (Macmillan, 2002).
TIPI Ten-Item Personality Inventory: A popular brief measure of the five-factor model
of personality by Gosling et al. (2003).
UBC University of British Columbia: The location where all studies presented here
took place, using their enrolled undergraduates.
VoKE Vocabulary Knowledge Exaggeration: An English vocabulary overclaiming
inventory (set of reals and foils) developed for this paper, with items selected
using theory from psycholinguistics and cognitive psychology, and empirical
testing.
VST Vocabulary Size Test: A multiple-choice test for assessing size of one’s English
vocabulary (Beglar, 2010).
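The OLD20 entry above describes an algorithm compact enough to sketch directly. The following Python sketch (the tiny corpus in the test below is a hypothetical stand-in for a real reference lexicon) averages plain Levenshtein edit distances to the closest neighbors:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def old20(target: str, corpus: list[str], n: int = 20) -> float:
    # Average edit distance from the n orthographically closest words.
    distances = sorted(levenshtein(target, word) for word in corpus)
    return sum(distances[:n]) / n
```

Higher OLD20 values indicate less wordlike letter strings, a property relevant when selecting plausible foils.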
Acknowledgements
I would never have known about overclaiming, its limitations or potential, had
Delroy Paulhus not asked about my familiarity with a list of jazz musicians. Unfortunately, I did
not overclaim, sparking the suspicion that drove this research.
It was the important work of Steven Heine and Ara Norenzayan that showed me how
incredibly inappropriate it was to rely on convenience samples of undergraduates, unless, of
course, one makes that the population of interest.
I could not have completed this PhD had not Jeremy Biesanz patiently helped me
reformulate my heretical, unsupervised research into a coherent thesis.
I will be eternally grateful for the many kindnesses offered by so many of the faculty,
staff and students of the UBC department of psychology as I impostered my way through
grad school. If you’re reading this, I made it out alive.
I could not have afforded my time at UBC without substantial funding from The
Social Sciences and Humanities Research Council (SSHRC) of Canada; your tax dollars at
work.
Thank you.
Introduction: Defining Exaggeration
“I know words. I have the best words.” — Trump (2015).
Donald Trump has famously displayed exaggerated self-assessment, claiming greater
ability than he demonstrates. Anyone not born yesterday will have encountered other
people who imagine themselves overly positively, and experience teaches us to not always
trust self-presentation of ability.
Such a person may be labeled braggart, boaster, or blusterer, and we might describe
such behavior as overstating, overestimating, or overclaiming ability, being overconfident
about performance, or having a self-image exceeding genuine competence. Why do we have
so many synonymous descriptions? Probably because such behavior is socially noteworthy.
According to the lexical hypothesis, the basis for much of personality psychology, “Those
individual differences that are of most significance in the daily transactions of persons with
each other will eventually become encoded into their language. The more important is such
a difference, the more people will notice it and wish to talk of it” (Goldberg, 1981, pp.
141-142). We use such labels to warn others about exaggerated self-report.
Merriam-Webster (2020a) defines the verb exaggerate as “to enlarge beyond bounds
or the truth : overstate”. Central to this definition is disparity from reality. Trump’s
monosyllabic boast above might be plausible if he had demonstrated a superior vocabulary,
but he did not. A language analysis by the Boston Globe of candidates’ 2015 campaign
announcements rated Bernie Sanders at school grade 10, Hillary Clinton near grade 8, and
Donald Trump as the lowest of all at grade 4 (Schumacher & Eskenazi, 2016).
Exaggeration is about excess.
While a cartoon or caricature might have exaggerated movements or expressions,
exaggeration here refers only to a person’s excessive self-image of their ability. This paper
examines exaggeration as a psychological phenomenon, how it differs among individuals,
and what those differences might mean. To that end, I define exaggeration as individual
differences in discrepancy between imagined and actual competence, unrelated to
competence.
I first begin with a conceptual model of how exaggeration may arise, then review how
this individual difference has been measured in the past, identifying some oversights and
contradictions in existing literature. In response to that, a new approach to exaggeration is
proposed, then implemented in four empirical, quantitative studies. The end result
validates a methodology for more clearly understanding this well-known but misunderstood
phenomenon, and establishes a foundation for future research.
Framing Exaggeration
Figure 1. Influence of Competence and Self-Image on Performance.
Exaggeration is an example of a latent construct, a theoretical conception of a hidden,
unobservable psychological phenomenon. When Forrest Gump famously noted that “Stupid
is as stupid does” (Zemeckis, 1994), he was wisely noting that we can only infer a latent
construct (e.g. stupidity) through its expression in observable behavior. Similarly, we can
explore what exaggeration is by looking at what it does to performance of the exaggerated
ability.
The model shown in Figure 1 is based on existing understanding of how abilities are
manifested: “Performance is conceived as the observable solution behavior of a person on a
set of domain-specific problems. Competence (ability, skills) is understood as a theoretical
construct accounting for the performance.” (Korossy, 1999, p. 103, original emphasis).
Competence is positively correlated with performance (the ‘+’); they increase (or decrease)
together. Humans have been fascinated with comparing competencies through
performance, as the long history of the Olympics or other competitions shows. Our abilities
are likewise tested throughout our education with examinations and other performance
tests. We objectively evaluate such performances because we know that self-report is not
always an accurate indicator of genuine competence: We don’t just ask who is fastest or
smartest, we test people.
Nonetheless, the raw potential of competence is not the only determinant of
performance. Our beliefs about our competence, and how we will meet a challenge, are also
relevant. Some of our beliefs will be based on feelings of confidence, our internal sense of
competence, but other beliefs may interfere with the accuracy of this sensing. If I believe
running a marathon makes me a good person, the need to affirm that belief may overshadow
accurate competence assessment, leaving me collapsed half-way through the race.
Our self-image is our mental construction of who we are, and this will include beliefs
about what we can do, should be able to do, and what effort is required for success. Ideally,
self-image of competence should positively correlate with genuine competence, even if
imperfectly. The question mark in Figure 1 indicates that self-image may contribute
positively or negatively to performance, depending on the harmony between self-image and
competence. Performance is thus shaped by both what we are capable of (competence),
and how we imagine that capability (self-image). Aesop’s fable of the tortoise and the hare
(in which the hare, far more competent yet arrogant, loses a race to the tortoise) eloquently
demonstrates how self-image can interfere with expression of competence.
Exaggeration can thus be seen as a way in which distorted self-image impairs
performance.1 This is similar to, but distinct from, typical conceptions of confidence, where
1 While exaggeration considers excessive perception of competence, performance may also be impaired by inadequate perception of competence. Such “underconfidence” is not considered here because it is apparently rare and involves the methodological challenge of measuring competence that is not expressed.
overconfidence implies an extension of a linear, unidimensional construct beyond some
optimal point. Instead, exaggeration allows for various aspects of self-image to interfere
with competence expression in undermining performance. Note that the model does not
necessarily imply any mediation or moderation relationship. A central goal of the current
research is to distinguish the influence excessive self-image has on performance, separate
from the influence of competence.
Additionally, situational factors may alter our self-image, or its impact on
performance, such as an audience boosting or shriveling our confidence, but for the
purposes of the current research, those many, complex situational factors are set aside in
order to address issues with isolating effects from self-image.
To capture exaggeration, we will need behavioral indications of what a person’s
genuine competence is, and what they mistakenly imagine it is. This can be done by
soliciting optional expressions of ability that provide evidence of competence or falsely
imagined competence: active incompetence.2 By making the expressions optional, one need
only express competence where one imagines competence, e.g. one can admit “I can’t do
that”, rather than pretend they can. In such a situation, active incompetence (e.g. failing
an optional task) indicates error in imagined competence, suggesting exaggerated
self-image: The person thought they could do something they could not.
To minimize confounding influences, all evidence should be collected at the same time
under similar circumstances. To maximize reliability, several ability expressions should be
solicited and aggregated. In other words, to gather evidence of exaggeration, get people to
repeatedly volunteer evidence of competence or incompetence, in comparable proportions.
Finally, because competence is a strong predictor of successful outcomes, care should be
taken that measurement of exaggeration is demonstrably distinct from evidence of
competence. Altogether, this framework presents three requirements for measuring
exaggeration: a) active competence, b) active incompetence, and c) isolation of self-image
2 In contrast to the passive incompetence of not responding to a question.
error from competence as evidence of exaggeration.
Because exaggeration of one’s abilities to others may yield rewards (e.g. winning an
election) and involves several complicated contextual factors, for simplicity, all the research
considered here minimizes social or situational influences, or obvious opportunities for gain
from manipulation or deceit. The goal is to understand exaggeration as an intrapersonal
(within self), not interpersonal (between people) phenomenon.
Conceptually, exaggeration can be considered synonymous with the terms
overstatement, overestimation, overclaiming and overconfidence, yet all those terms have
been used for distinct methodological approaches to measuring differences in one’s imagined
and actual abilities. This may be an example of the jangle fallacy: “the use of two separate
words or expressions covering in fact the same basic situation” (Kelley, 1927, p. 64). The
present research aims to integrate those approaches into one unified methodology.
A Brief History
Broadly speaking, there have been two approaches to simultaneously gathering
evidence of competence and imagined competence. One approach (overstatement and
overestimation tests) combines objective tests with (prior or post) estimates of success.
The number of correct answers (objectively scored) serves as evidence of competence, while
falsely imagined correctness (subjective statement or estimation) suggests unacknowledged
incompetence. The number of answers not claimed correct is ignored but allows a degree of
freedom between the other two scores.
Another approach (overclaiming) uses only ability claims, but embeds the competence
distinction in the items themselves. All items involve claiming ability, but some of the
items are fictitious, so claiming them requires active incompetence. This also yields two
scores: the rate or amount of claiming genuine items (reals), and the rate or amount of
claiming fictitious items (foils).
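Under the OCT, claiming a real is a “hit” and claiming a foil a “false alarm”, so SDT’s standard indices can separate accuracy from response bias. A minimal sketch, assuming equal-variance Gaussian SDT and no correction for claim rates of exactly 0 or 1 (which would need an adjustment in practice):

```python
from statistics import NormalDist

def oct_scores(claimed_reals: int, n_reals: int,
               claimed_foils: int, n_foils: int) -> tuple[float, float]:
    """Hits = claimed reals; false alarms = claimed foils.

    Returns (d_prime, c): d' indexes accuracy (discriminating reals
    from foils); c indexes response bias (overall claiming tendency;
    negative c = liberal claiming).
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    hit_rate = claimed_reals / n_reals
    fa_rate = claimed_foils / n_foils
    d_prime = z(hit_rate) - z(fa_rate)
    c = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, c
```

For example, a respondent claiming 8 of 10 reals and 2 of 10 foils is accurate and unbiased, while one claiming 8 of 10 reals and 6 of 10 foils shows lower accuracy and a liberal (overclaiming) bias.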
Both approaches allow someone to volunteer evidence of competence or
incompetence, all within the same test. While both have been around for nearly a century,
these two approaches have not been explicitly examined together.
Overstatement
The rising popularity of IQ tests at the start of the 20th century inspired a plethora
of attempts to quantify human potential (Richardson, 2002). Based on the belief that a
discrepancy between claimed and demonstrated ability was diagnostic of one’s “character”,
a review of research around that time (Symonds, 1924) listed the overstatement test
(Voelker, 1921) as an emerging assessment method.
As an example, Woodrow and Bemmels (1927) compared results of an overstatement
test to a “goodness” of character rating by teachers of pre-school children, reporting a
rank-order correlation of rs = .56 for a group of 17 five-year-olds and rs = .43³ for a group
of 14 four-year-olds. The overstatement test involved a researcher interviewing children
individually, telling them “I’m going to ask you some questions to find out how many
things you can do. I want to find out who in your class can do the most.” (p. 241), then
asking a variety of questions such as “Can you write your name?”, “Can you stand on your
head?” and “Can you count up to ten?”, finally followed by the child demonstrating each
claimed ability, which was then liberally assessed. The younger group claimed 51% (while
performing 30%) and the older group claimed 75% (then performing 50%) of the queried
abilities, with only one child under-estimating their ability.
As an individual difference measure, the test was scored as the number claimed
divided by the number performed, with “The smaller this ratio, the better the score” (p.
242), presumably meaning that reverse ranking this score accounted for the positive
correlations (above) between less overstatement and teachers’ ratings of character goodness.
Considering statistical and practical issues (inherent in using ratios for scores), the
authors conclude that the issue of scoring the test “is not an altogether simple one.”
3 These correlations were reported as ρ in the original paper.
(Woodrow & Bemmels, 1927, p. 243). These issues may have proven insurmountable, as
the overstatement test had faded to obscurity by the 1960s4.
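A toy calculation with hypothetical counts illustrates why the authors found scoring “not an altogether simple” matter: identical ratios can reflect very different amounts of overstatement, and the score is undefined when nothing is performed.

```python
def ratio_score(claimed: int, performed: int) -> float:
    # Woodrow & Bemmels' scoring: claimed / performed (smaller is better).
    return claimed / performed

# Identical ratios hide very different absolute overstatement:
# overstated by 2 abilities vs. overstated by 10, same score.
assert ratio_score(4, 2) == ratio_score(20, 10) == 2.0

# And the score is undefined for a child who performs nothing:
# ratio_score(3, 0) raises ZeroDivisionError.
```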
Overestimation
Part of modern research on overconfidence is an approach called overestimation, a
methodology very similar to overstatement: “If a student who took a 10-item quiz believes
that he answered five of the questions correctly when, in fact, he got only three correct,
then he has overestimated his score. Roughly 64% of empirical studies on overconfidence
examined overestimation.” (D. A. Moore & Healy, 2008, p. 502). Overestimation is
typically calculated as a difference score, the arithmetic excess of estimate over
performance: 5 − 3 = 2 in that example.
Both overestimation and overstatement gather ostensibly identical information: the
number imagined correct and the number objectively correct. The methodology of
overestimation, however, may be less psychologically direct than that of overstatement.
An overestimation test requires answers for every item (possibly by guessing),
regardless of one’s sense of ability. After the test, the participant must reflect on their
aggregate score in order to make an estimate. Retrospectively evaluating one’s performance
on a completed task may elicit “choice-induced preference change” (D. Lee & Daunizeau,
2020, p. 1). For example, it may be easier to rationalize that one answered a question
correctly after committing to an answer, if only because considering alternatives after such
commitment can induce cognitive dissonance (Joule & Azdia, 2003). There is also the
possibility that reflection on past performance may include recency effects (Murre & Dros,
2015), where experiences from the last few questions may carry more weight in an
aggregate estimate.
A further complication arises from the transparency of the research question (e.g.
“How many do you estimate you got correct?”). Once participants are aware that accuracy of estimation is
under scrutiny, motives for social desirability or impression management may inspire false
modesty: Having already completed the task, I will appear more humble if I deflate my
estimated score. By triggering self-consciousness, overestimation methodology may be
distorted by the same psychological processes it attempts to measure.
Thus, retrospective assessment of aggregated past performance done during
overestimation tests may differ psychologically from the prospective assessment of specific
abilities done in overstatement tests. The (prospective) overstatement approach may be
more psychologically valuable simply because we are often more interested in predicting
somebody’s future behavior (e.g. likelihood of making an error) than predicting their
after-the-fact estimate. Overestimation tells you how people perceive past performance,
while overstatement tells you how they imagine future success.
An established observation about overestimation is that it varies with difficulty, i.e. is
relatively greater in hard tests and lower in easy tests. A multi-cultural examination of
overestimation replicated this hard-easy effect and found it much stronger than effects of
culture, sex or age (D. A. Moore et al., 2018). This effect, where excess rating is inversely
related to difficulty or ability, may be a methodological artifact: Low performers have more
range to overestimate than high performers. Seen another way, if estimates tend toward the
center of the distribution, difference scores will tend to be positive for hard tests and lower
(or negative) for easy tests. Even if estimates correlate perfectly with performance, but are
shifted centrally, difference scores will show the hard-easy effect. Such a scoring will likely
be negatively correlated with ability, e.g. Duttle (2016) found r = −.69 between
overestimation and performance on Raven’s Progressive Matrices. Difference scores yield
an effect similar to that found in the “unskilled and unaware” Dunning–Kruger effect
(Kruger & Dunning, 1999) which has been shown to be largely a statistical illusion
(Krueger & Mueller, 2002). More precisely, Cor(Score, Estimate − Score) approaches
Cor(Score, −Score) to the extent that Var(Estimate) < Var(Score). As mean Score
rises, Estimate range decreases. Evidence of exaggeration is mathematically confounded
with evidence of competence, so it’s impossible to cleanly distinguish effects.
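That confound can be demonstrated in a few lines of simulation. Here estimates track scores perfectly (r = 1) but are pulled toward the mean so that Var(Estimate) < Var(Score); the difference score then correlates perfectly negatively with the score (the parameters are arbitrary illustrations):

```python
import random
import statistics

def pearson(xs, ys):
    # Pearson correlation from scratch (no external dependencies).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
scores = [random.gauss(50, 10) for _ in range(5000)]
# Estimates track scores perfectly but are shifted toward the center,
# so Var(estimate) < Var(score).
estimates = [50 + 0.5 * (s - 50) for s in scores]
overestimation = [e - s for e, s in zip(estimates, scores)]

# r(score, estimate) = +1, yet r(score, overestimation) = -1:
# low scorers "overestimate" and high scorers "underestimate",
# purely as an artifact of the difference score.
```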
Given the statistical problems noted with ratios used in overstatement tests, and the
artifactual correlations of difference scores used in overestimation (and sometimes
overstatement), neither overstatement nor overestimation, as conventionally implemented,
provide a clean measure of exaggeration separate from the ability being exaggerated.
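One remedy, anticipated by the RExI defined in the front matter, is residualization: regress the incompetence evidence on the competence evidence and keep the residuals, which are uncorrelated with competence by construction. A minimal one-predictor OLS sketch, with illustrative variable names:

```python
import statistics

def residualize(incompetence, competence):
    """Residuals of incompetence evidence after regressing out
    competence evidence (simple one-predictor OLS)."""
    mx = statistics.fmean(competence)
    my = statistics.fmean(incompetence)
    sxx = sum((x - mx) ** 2 for x in competence)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(competence, incompetence))
    slope = sxy / sxx
    # Residual = observed minus the value predicted from competence.
    return [y - (my + slope * (x - mx))
            for x, y in zip(competence, incompetence)]
```

Unlike a difference score, the resulting index correlates exactly zero with the competence measure in the sample, so any association it shows with performance cannot be a competence artifact.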
Overclaiming: Foils Among Reals
As charming as it may have been in the 1920s to have children stand on their heads
for science, that approach can be difficult to scale up. Any objective ability test may take a
long time and induce stress in participants. A far more convenient ability to test is
knowledgeability, and probably the most face-valid or obvious test of exaggerated
knowledge is to query familiarity with something that does not exist.
For example, imagine a simple vocabulary test that only required the respondent to
rate their knowledge of words, without demonstrating that knowledge. If such a list
included the ostensible word covfefe, claiming to know the meaning of that5 would
demonstrate active incompetence.
Questions about fabricated, non-existent items, often labeled as bogus or foil,6 seem
ideal for capturing exaggeration because knowledge of such items is impossible (if the item
is designed appropriately, which may not be the case, as we shall see). Such items are often
combined with similar genuine or real items to also collect some evidence of competence.
In the book New Perspectives on Faking in Personality Assessment, the chapter on
“Overclaiming on Personality Questionnaires” surveys the use of foils claiming in
psychological research, describing “several historical precedents for the notion that claiming
familiarity with foils is a face-valid indicator of knowledge exaggeration”, with exaggeration
5 As many have: www.snopes.com/fact-check/covfefe-arabic-antediluvian/
6 While both these terms apply to impossible items that honest, attentive, rational people should neverclaim, the term bogus is typically used for items researchers expect to have no desirability, while foil refersto items which someone might have reason to claim falsely. As will be discussed below, such a distinction isnot always clear cut.
interpreted there as faking (Paulhus, 2012, p. 151). It reports the earliest use of foils in
psychological research as Raubenheimer (1925) where respondents indicated which books
they had read, with 10 of 25 titles presented being fictitious. For example, respondents
(boys being assessed for potential delinquency) could claim to have read the existing book
“Robinson Crusoe” (to indicate literary knowledge) or the nonexistent book “The
Prize-Fighters Story”, indicating an exaggerated self-report.
That chapter title refers to a study by Phillips and Clancy (1972) which used
overclaiming to describe foils claiming, a term which Merriam-Webster (2020b) reports first
appeared in 1824 and means “to claim too much of something”. That study queried
participants about “their use of several new products, books, television programs, and
movies — all of which were actually nonexistent” (p. 928; their emphasis). This
overclaiming behavior was found to be related to participants’ rating of the desirability of
being the kind of person who tries new products, etc. The association between foils
claiming and valuing being trendy suggests a motivated, self-enhancing exaggeration.
As noted above, foils claiming has also been interpreted as dishonesty or faking.
Anderson et al. (1984) assessed job applicants by having them rate their skill levels on a
variety of tasks, many of which were genuine (i.e. reals), while several were fictional, i.e.
foils. For example, respondents were asked to rate their experience with the fictitious task
“Typing from audio-fortran reports” (Table 2, p. 577). The extent of foils claiming was
found to negatively predict a later objective test of job skills, especially when controlling
for self-assessment of genuine skills.
Misrepresentation of self is not the only interpretation given to foils claiming: Foils
have appeared in research with other goals. For improving validity in marketing tests of
advertising exposure, Lucas (1942) describes a technique originating in 1937: Participants
report their recognition of various advertisements, some of which are unpublished and
could not have been seen, i.e. foils. Following that methodology, Smith and Mason (1970)
reported that warning participants about the foil ads had no effect on claim rates. This
suggests a possible recognition memory bias; respondents genuinely believed they
recognized something they had not seen before. The aim of such research, however, was to
assess advertising efficacy, not psychological mechanisms driving false claims.
Among other applications, foils have also been used to check validity of traumatic
brain injury reports (e.g. Mackenzie & McMillan, 2005) and digital literacy surveys (e.g.
Hargittai, 2009), to assess pretrial prejudice in court cases (e.g. Moran & Cutler, 1997), or
to generally identify careless survey responses (e.g. Meade & Craig, 2012). For example, if
a North American respondent agrees to the statement “I have never brushed my teeth”
(Meade & Craig, 2012, Table 1, p. 5), they are probably not paying attention to that
question, nor possibly the rest of the survey. Such investigations typically treat foils
claiming as errors to be corrected for, with little consideration of what such aberrant
behavior might indicate.
To summarize, foils have been used in research to ostensibly assess self-enhancement
(ego-motivated misrepresentation, such as claiming to have used a fictitious product),
cognitive bias (falsely recognizing an ad they had not seen) or carelessness (lack of
attention in survey responding). To my knowledge, no previous research has adequately
considered these contrasting explanations simultaneously.
An issue worth noting has to do with the ethics of using foils, i.e. whether it is
deceptive to ask about impossible abilities, to confront people with un-winnable challenges,
or entrap them into failure. Typical ability tests do not ask trick questions; one may
assume that if asked, “Do you know X?”, X actually exists. Warning about foils or failure,
however, raises the practical issue of ensuring that participants acknowledge and
understand such a warning. The more careless, for example, may not take heed.
Alternatively, the more cautious or risk-averse may alter their responding more drastically
than others. Inevitably, there will be new individual differences introduced by the warning
and how it is presented. In everyday life, we encounter impossible problems, areas where
nobody should claim answers, with no guardian warning us of potential failure. Thus, an
ecologically valid test should introduce no more protective measures than ordinary life.
The absence of warning may be more valid in another way. There may be many
individual, contextual, or cultural differences in the degree to which people believe they
could or should be able to confront any and all challenges successfully. Exaggeration may
reflect such a difference, i.e. the tendency to assume, or be entitled to, a certain level of
success, as if one expects or deserves it.
Regardless, while warning about potential foils (vs. not) discouraged claiming in
general, it did not remove relationships between overclaiming and narcissism in a study done
by Paulhus et al. (2003). This is consistent with the findings noted earlier that warning
participants about bogus ads did not affect false recognition rates (Smith & Mason, 1970),
and that the relationship between false knowledge claims and self-perceived knowledge was
not altered by warning about foils (Atir et al., 2015). Consequently, for ecological validity,
simplicity and practicality, the current research does not warn about the presence of foils.
Theoretical Causes of Exaggeration
Why might someone exaggerate their competence? Broadly, we can consider two
kinds of possible causes: motivated and unmotivated.
Someone may be motivated to enhance their self-presentation, to overly state or claim
ability, simply because, socially, it can pay off. Threatened animals often present
themselves as larger or more ferocious to intimidate others. Mate selection often requires
putting one’s best foot (or feathers, calls, dancing, or behavior) forward to impress the other
sex. Human overconfidence yields status benefits (Kennedy et al., 2013) so exaggeration
may help to intimidate others, or to acquire mates, votes, jobs, or other resources. For
example, Trump might not have won the 2016 election had he acknowledged his several
business failures (Stuart, 2016). Given that relative neocortex size relates to use of tactical
deception (Byrne & Whiten, 1992), humans may be uniquely equipped for exaggeration.
While these examples refer to interpersonal exaggeration, boasting to others, it may be
that such strategies get reinforced and internalized, leading to habitual exaggeration even
in the absence of social context.
Alternatively, implausible evidence of ability may appear for unmotivated reasons.
The false recognition of an advertisement despite warning (noted above), suggests that
participants had no motivation to misrepresent. Similarly, careless inattention to survey
responses would not indicate motivation to misrepresent an ability. Finally, perhaps
related, such misrepresentation may be, like any error, due to lower cognitive ability, or
simply poor self-awareness.
Exaggeration as Self-Enhancement
Motivated misrepresentation of competence may simply indicate self-enhancement, or
“tendencies to dwell on and elaborate positive information about the self relative to
negative information” (Heine & Hamamura, 2007, p. 4). Self-enhancement should thus
predict a bias toward claiming ability while denying inability or ignorance, leading to
exaggeration. This was the assumption of early uses of overstatement and overclaiming
tests, that people will overstate or overclaim their competence in an ego-enhancing way.
Self-enhancement is considered to have both a social, interpersonal dimension of
impression management, “the goal-directed activity of controlling information in order to
influence the impressions formed by an audience” (Schlenker, 2012, p. 542), as well as an
intrapersonal dimension of self-deception (Paulhus, 1984): We may be bluffing to others
and ourselves.
The personality trait most associated with self-enhancement is narcissism, named for
the mythological Greek youth Narcissus tragically obsessed with his own beauty:
“Narcissism is arguably the personality construct (and pathological disorder) most
fundamentally defined by chronic pursuit of self-enhancement.” (Wallace, 2011, p. 309).
For example, John and Robins (1994) compared self-perception of performance with peer
ratings and evaluations by a staff of 11 trained psychologists. Self ratings related less to
staff evaluations than did peer ratings, and showed substantial individual differences:
“people whose self-evaluations are the most unrealistically positive tend to be narcissistic”
(p. 215). Narcissism is the part of the Dark Triad of personalities (overlapping with, yet
distinct from, scheming Machiavellianism and antisocial psychopathy) distinguished by
self-enhancement (Paulhus & Williams, 2002). This personality trait is typically measured
by the Narcissistic Personality Inventory (NPI) and is considered to have multiple facets,
e.g. “Leadership/Authority, Grandiose Exhibitionism, and Entitlement/Exploitativeness”,
with the first being considered generally adaptive, and the last being most maladaptive
(R. A. Ackerman et al., 2011). This suggests that self-enhancement may have both helpful
and harmful aspects.
As discussed above, exaggerating to others may pay off, but why exaggerate with
nobody to impress? Back et al. (2010) note that the entitlement facet of narcissism is most
attractive at first impression while being most maladaptive in the long term. Thus,
short-term social rewards (boasting so strangers like you) may reinforce a dysfunctional
habit: Exaggerating your self to others may lead you to start believing it. More
importantly, exaggeration in the absence of social reward may be maladaptive.
Related to narcissism is overconfidence, an association that “remained significant in a
regression that included self-esteem, self-efficacy, and self-control in the model: for
narcissism, b = 0.33, t(99) = 3.16, p < 0.01” (Campbell, Goodie, et al., 2004, p. 302). In
that study (and most, as noted above), overconfidence is operationalized as overestimation.
Another, more interpersonal, form of overconfidence is thinking you are better than others,
more precisely called overplacement (D. A. Moore & Healy, 2008), also known as the
better-than-average effect. We know this effect is based on an illusion of superiority because
of the observed reality (yet mathematical impossibility) that more than half of people think
they are better than half the population for several abilities; individuals tend to place
themselves higher, relative to others, than objectively warranted.
In examining why humans overplace their abilities, Burks et al. (2013) compared
information-processing biases with social goals and found evidence only for the latter,
concluding: “it is natural to consider the possibility that the roots of overconfidence lie in
the value of over-confidence as a social signal” (p. 979). This effect, however, depends on
social comparison; how people view themselves and how they view others (Guenther &
Alicke, 2010), which introduces several situational influences. Nonetheless, while
overplacement and overestimation are conceptually and methodologically distinct, they
have both been labeled overconfidence, and Macenczak et al. (2016, Tables 1 & 2, pp. 115-116) report a correlation of r = .50 between the two, as well as positive (albeit weaker) correlations between both those measures and narcissism.
Altogether, in terms of ego-motivated behavior, exaggeration may relate to
self-enhancement as impression management, self-deception, or narcissism, and
overconfidence as overestimation or overplacement. As a kind of solitary self-enhancement,
the tendency to exaggerate in non-social situations should be broadly maladaptive.
Exaggeration as Cognitive Bias
While the label “exaggeration” inherently connotes self-enhancement (as do the terms
overstatement, overestimation, or overclaiming) it is conceivable that incompetence may be
demonstrated with no motivation or goal, having little to do with ego or identity. A survey
respondent claiming to recognize an advertisement they had never seen may be merely
demonstrating a memory malfunction. Given that warning about the presence of foil ads
had no effect on claim rates (Smith & Mason, 1970), and that foils claiming is essentially
unaffected by warning (as noted above), such apparent exaggeration may not be
ego-motivated.
An ability well-known to be influenced by cognitive biases is memory, and probably
the easiest to study is recognition memory. Recognition memory involves distinguishing
stimuli that have been previously experienced from novel stimuli, e.g. given a list of words,
some of which are old (seen before) among others that are new, the ability to distinguish
old from new. For example, after reading a list of words (e.g. “person, woman, man,
camera, TV”; Baker, 2020), can someone identify them (among other distractors) later?7
Recognition memory bias, the tendency to identify new items as old, has been shown
to be a stable individual trait (Kantner & Lindsay, 2012; Kantner & Lindsay, 2014),
suggesting that people vary in false recognition reports. The latter paper noted similar
individual patterns for susceptibility to some bias manipulations, e.g. falsely claiming
having seen the word ‘sleep’ when trying to remember related words like ‘bed’, ‘rest’, and
‘night’.8
Cognitive psychology has demonstrated several techniques for manipulating
recognition memory bias, such as the use of discrepant fluency (Whittlesea & Leboe, 2003).
When testing recognition of items, if a new item is unexpectedly easy to process
(discrepantly fluent), it becomes easier to mistakenly think it has been seen before.
Intuitively, some minimal level of fluency is required for any useful exaggeration item: It is
unlikely anyone would claim to have read a book with an unpronounceable title. Extending
that logic, it may be that more fluent items facilitate more exaggeration.
Beyond an overall main effect of fluency, given individual differences in memory bias
(Kantner & Lindsay, 2014), individuals may also differ in susceptibility to fluency cues. For
any given item set, some people may be more prone to false recognition, increasing their
chances for exaggeration. How such cognitive traits relate to personality traits such as
self-enhancement has yet to be adequately studied. However, apparent individual
differences in levels of exaggeration may reflect, at least in part, individual differences in
recognition bias.
7 The Montreal Cognitive Assessment given to Donald Trump actually tested recall rather than recognition, but you get the point. Those words he “recalled” probably represented only what he could see at the moment, not recollections from the test.
8 The Deese/Roediger–McDermott paradigm.
Exaggeration as Carelessness
If a job application asked “How often have you used the Wentzel Technique to solve a
budgetary problem?” (a fictitious job skill used by Levashina et al. (2009) to measure
faking, p. 274), someone might think “I have no idea what that is, but it sounds like I
should know it, so I’ll pretend I do”; affirming that foil as an expression of self-enhancement.
Or, someone may think, “I remember using some technique for a budgetary problem, and I
think it started with ‘W’, so that must be it”; sincerely but mistakenly affirming because of
recognition memory bias. Alternatively, someone may not think about the question at all
and carelessly affirm it. All three possible thought processes lead to the same behavior, the
active incompetence of claiming a foil, but for different reasons. To better understand the
careless component, we need to examine what we mean by “carelessness”.
While the concept of carelessness may cover several kinds of undesirable or
unintentional behaviors, for this research (because it used surveys to collect data), its most
relevant manifestation is as a source of invalidity in survey responding. This has long been
of interest to researchers, but has been difficult to clearly define, given that there are so
many possible reasons survey responses may not be what we expect. For example, Bond
(1986) argues that what had been labeled carelessness as a cause of inconsistent responding
to the MMPI9 may really be indecision, thus dramatically changing the interpretation.
Nichols et al. (1989) made the distinction between content nonresponsivity (e.g. ignoring
instructions) and content-responsive faking. Huang et al. (2012) more precisely referred to
“insufficient effort responding” (p. 99).
Meade and Craig (2012) found three fairly distinct latent classes (factors), painting a
multi-dimensional picture of carelessness. Part of the distinction is methodological, because
researchers have explored several ways to measure carelessness in survey responses;
DeSimone et al. (2015) provide an overview of several popular techniques (see their Table
1). Many of these operationalizations of carelessness are based on content of responses:
9 Minnesota Multiphasic Personality Inventory: a venerable, widely-used questionnaire.
The use of semantic or psychometric synonyms or antonyms makes assumptions about
which responses should be similar or oppositional in meaning or response patterns. A
related approach uses Mahalanobis distance to detect response patterns distant from the
multivariate normal distribution of all responses. These content-based techniques assume
that all respondents interpret the items similarly, which may lead to a subtle researcher
confirmation bias: unusual response patterns may get pathologized as careless, interpreting
diversity as deviance.
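The Mahalanobis screen just described can be sketched in a few lines. The response matrix below is invented for illustration, and flagging the two most distant respondents is an arbitrary cutoff, not a recommended practice:

```python
import numpy as np

# Hypothetical 5-item Likert responses (rows = respondents); invented data.
responses = np.array([
    [4, 4, 5, 4, 4],
    [3, 4, 4, 3, 4],
    [4, 5, 4, 4, 5],
    [2, 3, 3, 2, 3],
    [5, 1, 5, 1, 5],   # an unusual alternating pattern
    [4, 4, 4, 4, 4],
    [3, 3, 4, 3, 3],
    [4, 4, 5, 5, 4],
])

mean_vec = responses.mean(axis=0)
# Pseudo-inverse guards against a singular covariance matrix in small samples.
cov_inv = np.linalg.pinv(np.cov(responses, rowvar=False))

deviations = responses - mean_vec
# Squared Mahalanobis distance from the sample centroid, per respondent.
d2 = np.einsum('ij,jk,ik->i', deviations, cov_inv, deviations)

# The most distant respondents are the ones a content-based screen would flag
# as careless -- which is exactly where diversity risks being read as deviance.
flagged = np.argsort(d2)[::-1][:2]
```

Note that the unusual respondent stands out only relative to the covariance structure of the rest of the sample, which is what makes the “careless” interpretation an assumption rather than an observation.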
If we remove effects of (apparent) carelessness, are we improving data quality, or
limiting representativeness? Bowling et al. (2016) examined insufficient effort responding
and made the distinction between treating such behavior as a methodological nuisance (e.g.
errors to discount) and seeing it as a substantive variable indicating a trait-like, enduring
individual difference that, in fact, predicted academic performance. Using similar measures,
McKay et al. (2018) examined a wider range of personality traits and found that
malevolent traits showed a stronger relationship with carelessness. The carelessness measure with the strongest personality correlates in almost every case was the number of incorrect responses to instructed items, e.g. not responding “strongly agree” when the
question explicitly said to do so. This response style, disregard for item content, may also
influence foils claiming. Further, M. K. Ward et al. (2017) examined careless responding
and attrition in completing online surveys and found personality correlates for both
measures, suggesting that participants who complete a survey carefully are a biased
sample. Using different measures of carelessness and personality, Furnham et al. (2015) also
found associations between validity of self-reports and personality.
Given that apparent carelessness may signal important individual differences (e.g.
exaggeration), how might we distinguish careless responses from the careless person? One
clear indication that a respondent is not paying attention to a question is when the
response is unreasonably fast. After informing participants that they would be answering
the same questions twice, Wood et al. (2017) found that consistency dropped sharply when
response time fell below an average of 1 second per item. While that study (and others)
used aggregate response times (e.g. time to complete a page of questions, or the whole
survey), that can be a poor measure, because it indicates the mean (average) time.
Cognitive psychologists, who regularly use response time measures, know that a better
indicator of central tendency is the median, not the mean, because distributions can be
highly skewed (Rousselet & Wilcox, 2020), e.g. a few very long response times can easily
pull the average away from the peak of the distribution.
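The skew problem is easy to demonstrate with invented per-item response times: a couple of long pauses pull the mean well above the 1-second warning threshold even though the typical response was faster than it.

```python
import statistics

# Hypothetical per-item response times in seconds; two long pauses
# (e.g. distractions) skew the distribution to the right.
times = [0.8, 0.9, 1.1, 0.7, 1.0, 0.9, 14.0, 0.8, 0.9, 21.0]

mean_time = statistics.mean(times)      # 4.21 s: pulled up by the two pauses
median_time = statistics.median(times)  # 0.9 s: stays with the bulk of responses
```

By the aggregate mean this respondent looks careful; by the median, most items were answered below the 1-second-per-item threshold identified by Wood et al. (2017).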
Because carelessness (however interpreted) may reflect individual differences relevant
to exaggeration, the current research will take the approach of analyzing all complete
response sets, i.e. make no exclusions due to aberrant response style. Where carelessness is
measured, it will involve median response time over several items within individuals.
All of the Above
Each of the three speculated mechanisms above may contribute independently to the
behavior of exaggeration: self-enhancing motivation to misrepresent, cognitive bias in
internal representation, and/or careless disregard for accurate representation.
How might these work together to explain exaggeration behavior? First, there must
be some cognitive fluency to facilitate misrepresentation: An opportunity to volunteer
incompetence must be believable, e.g. it’s easier to imagine covfefe is a word than cffveeo
is. Similarly, a lure (incorrect) option in a multiple-choice test should seem correct to some
test takers. The more fluent or believable a claim is, the more self-enhancing motives can
manifest. A similar interdependence could work for carelessness: It takes more inattention
to claim something unpronounceable.
How might these influences be teased apart? One clue might be processing time:
Self-enhancing misrepresentation requires attentive processing to determine the most
positive presentation, whereas a careless claim can be done hastily. Another clue might be
found in differential item responses: Carelessness should affect all items whereas
exaggeration may be more apparent when volunteering incompetence.
None of the Above
Finally, the answer may also be none of the above; there may be other reasons people
exaggerate their abilities. One possibility is the “unskilled and unaware” Dunning–Kruger
effect, which posits that lower ability leads to greater error in self-estimates (Kruger &
Dunning, 1999). This effect has been criticized as an artifact of the better-than-average
effect and statistical regression (Krueger & Mueller, 2002), and evidence on foils claiming
shows the opposite effect. Atir et al. (2015) found positive relationships between knowledge
foils claiming and both genuine and self-perceived knowledge: Knowledge exaggeration
apparently increases when one thinks they know more, genuinely or not. P. L. Ackerman
and Ellingsen (2014) specifically tested this hypothesis, and found that unwarranted claims
of vocabulary knowledge increased with validated knowledge, noting that this was in
opposition to the Dunning–Kruger effect.
In a more general sense, beyond specific skills or knowledge, exaggeration could be a
side-effect of lower general cognitive ability, simply a sign of lower intelligence, so this
should be considered as a potential influence. Along the same lines, poor metacognition
(awareness of one’s thinking processes) may also play a role, given that metacognition
As an example of inappropriate foil design, Fell and König (2018) attempted to
measure “Academic Faking in 41 Nations” by asking secondary school students around the
world to rate their knowledge of terms from mathematics. Among those were three
fabricated terms (foils), one of which was “proper number”, which is very similar to the
genuine math concept of “proper fraction”, especially if one considers fractions as numbers.
Their data13 show that this foil item empirically behaved more like a real math term, with
more claims of knowledge than ignorance, suggesting that many students appropriately
recognized the concept and graciously allowed for some ambiguity in expression.
Interpreting such partial knowledge as faking seems unjustified.
Another example is the use of “ultra-lipid” as a foil for capturing exaggeration of
science knowledge (Paulhus & Bruce, 1990). Unfortunately, the term can be found via
Google search to be a genuine term used to market cosmetics and in an article in
Comparative Clinical Pathology (Safat et al., 2018). Claiming it may not indicate the
knowledge the researchers had imagined, but it still may indicate knowledge more than
exaggeration.
As those examples illustrate, the real / foil distinction is less categorical than
continuous, an issue not adequately addressed in existing research. Foils with the highest
claim rates may be altered, creative, unofficial (e.g. slang), or rare indicators of genuine
competence, just not what the researchers expected. At the same time, foils must be
seductive enough to avoid floor effects in claiming. For example, over a range 0 to 4,
Bynum and Davison (2014) reported both the mean and standard deviation of foils claiming as 0.51, suggesting compressed variance which would limit the power of the measurement.
When foils claiming reaches zero, how is exaggeration measured?
The choice of real items also presents challenges. Without an objective test (as with
overstatement), there is no assurance that a real item is claimed based on ability rather
13 Available as “Codebook for student questionnaire data file” at www.oecd.org/pisa/pisaproducts/pisa2012database-downloadabledata.htm. See item ST62Q04.
than exaggeration. Making real items too easy could lead to ceiling effects, leaving the foil
items conspicuous by contrast. Alternatively, if reals are too difficult, some effectively act
as foils, and this distinction will vary by individual ability.
Ideally, meaningful claiming of real and foil items should show some distinction. For
convergent validity, reals claiming should correlate with valid demonstrations of ability,
while foils claiming should relate to errors of commission, e.g. choosing a wrong answer
instead of admitting ignorance. For divergent validity, while reals and foils claiming may necessarily relate (given the evidence discussed above), the overlap should be small.
The historical use of foils and the above discussion highlight several potential reasons
for claiming foils. The difficulty is that real items may be claimed for the same reasons, in
addition to indicating competence. Even with optimal item design, care must be taken in
analysis to disentangle these shared influences.
Analytic Issues
Both the overstatement and overestimation approaches provide a clean,
well-accepted measure of actual ability: the number of successes, or percent correct.
However, as noted above, difference scores from either approach do not separate
exaggeration of ability from the ability itself.
With overclaiming, one can easily calculate claiming rates for both reals and foils,
knowing there is no methodological constraint linking these two measures. On the surface,
this might seem ideal: Reals rate indicates ability and foils rate indicates exaggeration.
However, without some mechanism to validate claims on reals (as done with overstatement
or overestimation), how do we know which reals claims are not exaggerations? Likewise,
how do we know that foils claiming is not related to competence?
P. L. Ackerman and Ellingsen (2014) addressed these questions, within a larger goal
of testing accuracy of self-estimates of vocabulary ability. Kirkpatrick (1907) had
developed a simple vocabulary test in which respondents marked a ‘+’ or ‘−’ beside a list
of 100 words to indicate which they knew or did not, respectively, then, without warning,
tested understanding of words marked as known. P. L. Ackerman and Ellingsen (2014)
built on this method, which is essentially an overstatement test, because foils were not
used. However, the term overclaiming was used for claiming knowledge of a word that
could not be adequately defined in the later test. Such active incompetence claims were
called false alarms while claims later validated on the test were called hits. The researchers
reported that overall knowledge claims correlated similarly with hit rates (r = .79) and
false-alarm rates (r = .79), and that hit and false-alarm rates also correlated significantly,
at r = .24 (all p < .01). This would suggest that self-estimates of ability are fairly accurate
but also influenced by exaggeration, and that exaggeration increases slightly with ability.
The researchers note that this finding is in opposition to the well-known “unskilled and
unaware” Dunning–Kruger effect which posits that lower ability leads to greater error in
self-estimates (Kruger & Dunning, 1999). Instead, self-image error may increase with
competence.
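In overstatement terms, the scoring just described reduces to counting validated and unvalidated claims. A minimal sketch with invented responses (using all items as the rate denominator, which is one reasonable choice, not necessarily the one used in the studies above):

```python
# Invented data: whether each word was claimed as known, and whether the
# later test validated that claim (unclaimed words are simply not validated).
claimed = [True, True, False, True, True, False, True, False, True, True]
defined = [True, True, False, False, True, False, True, False, False, True]

n_items = len(claimed)
hits = sum(c and d for c, d in zip(claimed, defined))              # validated claims
false_alarms = sum(c and not d for c, d in zip(claimed, defined))  # active incompetence

hit_rate = hits / n_items                   # 0.5 for this respondent
false_alarm_rate = false_alarms / n_items   # 0.2 for this respondent
```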
Consistent with this, Atir et al. (2015) found that higher self-perceived knowledge
predicted more foil claiming, i.e. greater confidence meant more overclaiming of knowledge.
This relationship existed even when controlling for level of knowledge, and when being
warned of the presence of foils. By manipulating self-perceived knowledge via an either
easy or hard pre-test, they also showed a causal relationship: The easy test increased
subject confidence and overclaiming.
The relationship between reals and foils claiming may be even more complicated: In
an examination of faking in a genuine job application (with warning that faking could be
detected and penalized), Levashina et al. (2009) introduced three foil items (e.g. asking
applicants how often they have used a fictitious technique) and found that impossible
ability claiming increased with genuine claiming, but was negatively related to mental
ability. Yet, as number of foils endorsed increased, so did the positive relationship between
genuine claiming and both job knowledge and verbal ability (from about r = .20 to
r = .40). The paper concluded that “job candidates with higher levels of mental ability
might fake in less detectable ways” (p. 279).
Clearly, the behavior of claiming foils is not always independent of the claiming of
real items. Claiming of either reals or foils may reflect any of the factors noted above
(self-enhancement, cognitive bias, carelessness, etc.), appearing as an indiscriminate
response bias.
For overstatement or overestimation approaches, there may be similar issues
confounding genuine and exaggerated claims. Ability claims may also be susceptible to
fluency effects, carelessness, partial knowledge, or poor item design. In a multiple-choice
test, number correct will be affected by chance. Difference scores used for overestimation
(and sometimes for overstatement, e.g. Brogden, 1940) have long been criticized (e.g. Peter
et al., 1993; Edwards, 1994).
A Unified Approach to Assessing Exaggeration
The above discussion summarizes how different methodologies — overstatement,
overestimation (a form of overconfidence), and overclaiming — have all simultaneously gathered evidence of both competence and mistaken self-image, yet we can note an
interesting contradiction. The overstatement and overestimation techniques produce results
suggesting that the discrepancy between imagined and actual ability decreases with
competence, e.g. the r = −.69 found between overestimation and performance on Raven’s
Progressive Matrices (Duttle, 2016). However, overclaiming approaches (e.g.
P. L. Ackerman & Ellingsen, 2014; Atir et al., 2015) tend to find a positive relationship
between the active incompetence of foils claiming and genuine competence (assessed
independently). Does exaggeration, mistaken self-image of ability, decrease or increase with
genuine ability? The contradictory results found in the literature may be a result of not
properly isolating exaggeration from competence.
To address this, and to integrate those methodologies, this paper proposes a unified,
linear regression approach for measuring exaggeration:
1. In the same test, gather repeated evidence of competence (e.g. correct answers, reals
claiming), and active incompetence (e.g. incorrect answers, foils claiming).
2. Statistically remove common variance by finding the residuals of predicting
incompetence from competence.14
Let the resulting measure be called the Residualized Exaggeration Index (RExI). The
idea here is that there may be many common influences driving expressions of either
competence or incompetence, such as exaggeration, cognitive bias, partial knowledge,
carelessness or other response bias. The residuals capture what is not common to the two
measures, but what is unique to the behavior of active incompetence. The RExI is thus
guaranteed to be uncorrelated with evidence of competence. In this way, it represents
exaggeration of ability unrelated to the ability being exaggerated.
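The two-step recipe above can be checked directly. This Python sketch parallels the R code given in the footnote (using simulated scores, since no real data are at issue here) and verifies the guaranteed orthogonality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated evidence: competence (e.g. reals claiming) and active incompetence
# (e.g. foils claiming) sharing some variance, as the text anticipates.
n = 200
competence = rng.normal(size=n)
incompetence = 0.4 * competence + rng.normal(size=n)

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

# Regress standardized incompetence on standardized competence; the residuals
# are the RExI (the resid(lm(...)) call in the R footnote does the same).
z_comp, z_incomp = zscore(competence), zscore(incompetence)
slope = np.sum(z_comp * z_incomp) / np.sum(z_comp ** 2)
rexi = z_incomp - slope * z_comp

# By construction, the RExI is uncorrelated with competence evidence.
r_check = np.corrcoef(rexi, competence)[0, 1]
```

Because both variables are standardized before regressing, the intercept is zero and the residuals are orthogonal to competence up to floating-point error.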
It is important to note that what the RExI captures, while conceptually related to the
connotations of exaggeration, overconfidence, overstatement, overclaiming, etc., is more
precisely the error variance of self-perceived competence, which may arise from various
causes. Thus, the RExI is more a technology to isolate useful information about
self-perception than a theory-driven operationalization of a hypothetical construct. This
bottom-up approach avoids some potential researcher biases: The goal is not to validate a
theory as much as to understand a behavior by isolating its effects. While the RExI serves
as a standalone measure of individual differences, for regression modeling, simply include
competence evidence as a control variable; the standardized β for active incompetence then
indicates the influence of exaggeration.
The RExI can be extracted from any overstatement test by finding the residuals of
predicting the number of failed attempts from the number of successful attempts.
14 More precisely, here is computer code for calculating the index, using the statistical programming language R (R Core Team, 2020): RExI <- resid(lm(scale(Incompetence) ~ scale(Competence), na.action = "na.exclude"))
Furthermore, any objective test (e.g. a math quiz or multiple-choice test) can be converted
to an overstatement test by adding a non-claiming option to each question, e.g. the option
to respond “I don’t know”. This requires the test taker to self-assess their specific
competence at the moment they are addressing each question.
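Scoring such a converted test is straightforward: answered items split into successes (competence evidence) and failures (active incompetence), while “I don’t know” responses are non-claims. The item key and answers below are invented:

```python
# Hypothetical answer key and one respondent's answers; None codes the added
# "I don't know" (non-claiming) option.
key     = ['a', 'c', 'b', 'd', 'a', 'b']
answers = ['a', 'c', 'd', None, 'a', None]

competence   = sum(a == k for a, k in zip(answers, key) if a is not None)  # successes
incompetence = sum(a != k for a, k in zip(answers, key) if a is not None)  # failed claims
abstentions  = sum(a is None for a in answers)
```

Feeding these per-person counts into the residualization described above yields the overstatement form of the RExI.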
Similarly, for overestimation, find the residuals of predicting self-estimated number
correct from actual number correct.15 For both overstatement and overestimation methods,
the RExI will, by definition, be uncorrelated with demonstrated ability. This allows the
conventional ability measure (e.g. number correct on a test) and the RExI to both be used
as independent assessments.
(One might logically consider using a complementary approach for assessing
competence, e.g. finding the residuals after predicting number correct from estimated
performance. The complication is that this residualized ability measure is no longer
uncorrelated with the RExI, meaning that one no longer has separate measures of
competence and exaggeration. Given that the conventional estimate of competence,
number correct, has shown widespread utility and acceptance for over a century, there is
little reason to change that now.)
To assess exaggeration with an overclaiming inventory, find the residuals of foils
claiming rate predicted from reals claiming rate. This addresses the several issues with foils noted above, because common influences on claiming any item (i.e. bias) are removed.
Unlike overstatement, however, an overclaiming approach does not provide a verifiable
measure of competence unrelated to exaggeration.
Residuals have been used in other research to remove confounding influences. An
influential example in the study of self-enhancement is the work of John and Robins (1994),
in which their self-enhancement indices are residuals of self-ranking after removing
variance from ratings by peers or expert observers (Table 6). That research, like much on
15 However, as noted above, overestimation is a far less psychologically direct method of capturing imagined ability, and so not recommended for capturing exaggeration.
self-enhancement, compares self-perceptions (S) against perceptions of others (P), and/or with others’ perceptions of the self (O).
Krueger and Wright (2011) thoroughly discuss the many challenges arising from
deriving self-enhancement from those three measures, and from various analytic approaches
to combining them, including use of difference scores and residuals. That work considers
two contexts for measuring self-enhancement: an intrapersonal comparison of self to
perceived others known as social comparison theory (Festinger, 1954; Suls & Wheeler,
2013), and an interpersonal comparison using an observer-based paradigm, the social realist
approach (Funder, 1995; Kenny, 2004). The former frames self-enhancement as thinking
myself better than how I perceive others (S − P), while the latter considers how I see myself compared to how others see me (S − O). In both cases, there is a discrepancy to
measure, but from different reference points. The authors note that the social comparison
theory considers self-enhancement as beneficial (the Taylor and Brown hypothesis), while
social realist theory sees it as detrimental.
Curiously, that work introduces reality measures (R) such as test scores without considering if the psychology of S − R self-enhancement differs from the S − P or S − O framings. The S − R discrepancy is what exaggeration captures, avoiding the biases and
errors inherent in P and O measures which are enmeshed in social comparison. The lack of
social context for exaggeration suggests it may not fully fit under the umbrella of
self-enhancement, at least as conventionally studied.
More relevant to the current research is the inflation approach used by Anderson
et al. (1984). That research administered examinations to job applicants that included
self-assessments on a variety of job skills, some of which were nonexistent bogus (foil)
items. This was essentially an overclaiming test of job skills, and in their final analysis they
used linear regression to predict an objectively-measured job skill (typing performance)
from the two types of skill claiming, showing incremental validity from the foils claiming,
greater than the predictive validity of reals claiming alone. That study (like many dealing
with bias in self-report) focused on correcting estimates of some criterion, rather than using
the index to measure a separate psychological process.
Claiming that the RExI approach is fundamentally new would clearly be overstating
the case. However, previous literature tends to not consider exaggeration as a distinct
phenomenon, presumes it represents some pre-determined theoretical construct, or fails to
measure it cleanly. A goal of this paper is to present evidence that exaggeration deserves to
be examined separately from existing constructs of self-enhancement or cognitive function.
Exaggeration may be a functional conglomeration of several constructs, but it is worth
remembering that constructs are just theories, and exaggeration manifests as a reliable
reality. Investment in theory may explain why both overestimation and overstatement
literatures have persisted for so long with little recognition of their inherent contradictions.
Ironically, an exaggerated sense of knowing may have kept researchers from exploring what
exaggeration is.
The RExI thus provides a methodological integration uniting overstatement,
overestimation and overclaiming approaches, providing a comparable measure of self-image
error distinct from competence and other common influences.16 Armed with this technique,
we can explore the impact exaggeration has on more global performance, and what factors
may relate to it.
Current Research
If the RExI addresses contradictions in previous approaches, those approaches cannot
be used to consistently validate the RExI. If previous results were influenced by
competence, then removing that influence may result in weaker or null effects. Ability
tests, from school exams to IQ assessments, have a well-established history of predictive
validity using the number correct as the signal. Because the RExI removes such signal, it is
entirely possible that what is left over is essentially noise. The central research questions,
16 An approach to overclaiming purporting similar distinction, called the Overclaiming Technique (OCT), is described in the Appendix.
then, are whether there is any useful new signal in the RExI, and if so, whether it is easily
explained away, or is something new.
As an initial exploration, the current research relies on convenience samples of
undergraduate students. For this population, a relevant ability to study is knowledgeability,
the ability to answer simple questions of fact. To be ecologically valid, the main outcome or
dependent variable (DV) used here is academic performance, which captures not just
knowledgeability in general, but a broad range of skills and abilities relevant to success in
life. The breadth of this DV means that expected effects should be small, but if still
significant, would indicate a meaningful relationship with broad implications.
Validation Criteria
To show that a measurement captures something useful, we need to show that it a) has expected similarities (convergent validity), b) has expected differences (divergent validity), and c) tells us something we didn’t already know (incremental validity).
Convergent Validity. Following the connotations of the terms exaggeration,
overstatement, overestimation, overclaiming or overconfidence, we should expect that the
RExI, representing error in self-image, should indicate impaired performance of the ability
being exaggerated. Additionally, because the discrepancy captured is in excess of
competence, this should relate to self-enhancement, and, such discrepancy may be
facilitated by cognitive biases.
Broader Performance. An unrealistic view of one’s ability should predict
impairment of that ability. This is the logic behind preventing drunk driving: Even though
an inebriated driver may not have caused harm (yet), their exaggerated sense of ability to
drive predicts potentially catastrophic performance failure. Similarly, for students,
knowledge exaggeration should predict lower knowledge (academic) performance. For
exaggeration to be meaningful, it should generalize: Someone who can’t walk a straight line
probably can’t drive a car. Likewise, exaggeration of knowledge in a narrow domain should
predict impairment of broader academic performance, ideally, even if the knowledge being
exaggerated does not. Thus, when given even a trivial knowledge test, exaggeration
demonstrated there should predict lower academic performance overall.
Self-Enhancement. Beyond performance impairment, to fit an intuitive notion of
exaggeration, the RExI should align with self-enhancement: an exaggerated, narcissistic
sense of self. While self-enhancement is a fairly broad construct typically assessed via
self-reports, the RExI, being a behavioral measure, may relate in only some narrow, specific
ways. If exaggeration predicts performance impairment, it should relate more to
maladaptive aspects of narcissism, such as entitlement, perhaps because one believes they
deserve success. Exaggeration as an unrealistically positive self-view should also relate to
overconfidence as overplacement, i.e. seeing oneself as better-than-average.
Cognitive Bias. A less motivational and more “innocent” explanation of
exaggeration may be bias in information processing. Of the several heuristics that veer
from rational expectations (e.g. Kahneman et al., 1982), recognition memory bias is a good
starting point to compare with exaggeration, given the memory error findings noted above.
Alternatively, performance on a memory test may exhibit exaggeration as would any other
ability, which should have similar relationships.
Divergent Validity. While relationships between a RExI and performance and
self-enhancement would confirm an intuitive understanding of the measure, and
relationships with cognitive bias help explain some of the mechanism, such convergent
validity only paints part of the picture. The boundaries of the picture, evidence of
divergent validity (i.e. what exaggeration is not), should also be considered. Because the
RExI is based on residuals after removing competence variance, it may be influenced by
other, unexpected factors.
Carelessness. Carelessness, the ever-present threat to the validity of any survey,
may appear as exaggeration. Simple lack of attention can lead to invalid responses, and such
behavior could contaminate any measure, especially the RExI, because it removes variance
attributable to competence. However, carelessness as a substantive variable, an enduring
individual difference (Bowling et al., 2016), may explain part of exaggeration, and should
replicate that paper’s finding, predicting lower academic performance. Thus, carelessness
may be a meaningful component of exaggeration, but should not be the dominant one.
Other Explanations. If exaggeration affects performance, then it should not be
easily explained by other obvious predictors of performance. For the relationship between
knowledge ability and academic performance, the RExI design rules out influence from the
knowledge being exaggerated. Beyond that, general cognitive ability should also be ruled
out to show that exaggeration is not simply a side-effect of lower intelligence. Following
that logic, metacognition (awareness and management of cognitive processes) should also
be ruled out, as that is also a reliable predictor of academic outcomes (Ohtani & Hisasaka,
2018).
Cultural effects in psychology are often overlooked, leading to poor inferences of
generalizability (Henrich et al., 2010). While it is far beyond the scope of this paper to
consider the many known differences between cultures, given that the convenience samples
used in the current research are all university undergraduates in Canada, perhaps the most
relevant distinction is between Western and non-Western cultural backgrounds. That
difference, and sex, are two control variables considered in all studies.
Incremental Validity. If the impact exaggeration has on performance can be largely
explained by variables considered above, then the behavior of exaggerating one’s ability
will be better understood. If not, then the RExI may represent something distinct worth
further exploration. Because overall cognitive ability and memory performance should
logically affect academic performance, and substantive carelessness has also been shown to
lower academic performance (Bowling et al., 2016), all these should be considered in
examining the relationship between knowledge exaggeration and knowledge performance. If
a distinct relationship holds, even after further control for sex and basic cultural variables,
that would suggest that the RExI captures an important, but overlooked, non-cognitive
variable explaining academic performance.
Study 1: Proof of Concept
The main goal of this study was to establish that the RExI captured information
relevant to performance. Because insufficient effort responding (IER) has related to lower
academic performance (Bowling et al., 2016), this study sought to minimize such influence
by design. By using only students intrinsically motivated to complete the study for no
other reason than feedback about their personality, this initial exploration selected only
participants who, ostensibly, cared about their results and thus their responses. Some basic
personality, cognitive, and metacognitive measures were included as controls.
Study 2: Validating the RExI
Study 2 was designed to replicate and extend Study 1, using better measures and a
larger, broader sample. Overall university Grade Point Average (GPA) was used to measure
academic performance more broadly, accurately, and reliably. Exaggeration was derived
from a large, popular inventory of overclaiming items, self-enhancement captured through
measures of narcissism, impression management, self-deceptive enhancement and
self-deceptive denial, and recognition memory was tested with a large battery of items. To
examine how exaggeration, or its relationship with performance, overlaps with general
cognitive ability, a commercial IQ test was included as a control measure.
Study 3: Developing Better Measures
Having validated the RExI approach by re-purposing an existing overclaiming
inventory (in Study 2), Study 3 tested instruments designed specifically to capture
exaggeration. To capture a relevant, broad ability that university students might want to
exaggerate, English vocabulary was chosen as a knowledge domain to assess. An
overstatement test was developed by adding a non-claiming option to a commonly used
multiple-choice test. Addressing the issues raised above about overclaiming item design,
techniques informed by computational psycholinguistics and cognitive psychology were
employed to develop overclaiming items optimized for measuring exaggeration. Both of
these new instruments were empirically examined to confirm their suitability for measuring
exaggeration.
Study 4: Robustness of the RExI
Study 4 was designed to replicate and extend Study 2 by examining multiple different
measures of knowledge exaggeration. Retaining a briefer version of the exaggeration
measure used in Study 2 for comparison, the novel instruments from Study 3 were added to
see if the RExI could capture exaggeration similarly across different abilities, content, and
format. Hypothetically, all exaggeration instruments should show similar relationships with
other relevant measures.
To better understand what exaggeration means, self-enhancement aspects were more
precisely targeted as entitlement, overplacement, and intellectual humility. To examine the
link with carelessness and cognition more closely, special software was developed to capture
individual item response times, and implement a novel technique to detect motivated
carelessness, as persistent, intentional rushing of responses.
Altogether, the following studies examine how the RExI approach of separating the
effects of self-image from competence, the exaggeration of ability from the ability being
exaggerated, may yield a more accurate picture of what is connoted by “overconfidence”
than have previous attempts using overstatement, overestimation or overclaiming.
Reporting Conventions
Throughout this paper the following conventions for statistical reporting are adopted
and explained here for convenience.
Correlations are Pearson product moment, which are equivalent to point biserial when
one variable is dichotomous. Statistical significance is always two-tailed and is marked as
follows in text: *p < .05, **p < .01, ***p < .001. Group mean differences are shown using a
conservative t-test (assuming unequal variance, estimated separately for both groups, using
the Welch modification for degrees of freedom), followed by effect size (Cohen’s d). 95%
Confidence Intervals are shown in square brackets. Regression models always show
standardized beta (β) coefficients in order to compare the relative impact of predictors.
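The standardized-β convention can be illustrated with a short sketch (again in Python rather than the R used for the actual analyses; names are illustrative): z-scoring the outcome and all predictors before fitting yields coefficients expressed as SD change in the outcome per SD change in each predictor, making them comparable.

```python
import numpy as np

def standardized_betas(X, y):
    """OLS coefficients after z-scoring predictors and outcome, so each
    beta is the expected SD change in y per SD change in that predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Xz])
    beta, *_ = np.linalg.lstsq(A, yz, rcond=None)
    return beta[1:]  # drop the intercept (it is ~0 after standardization)
```

With a single predictor, the standardized β reduces to the Pearson correlation, which is why the tables' βs and zero-order rs can be read on the same scale.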
For thoroughness, this paper presents some large correlation tables, with hundreds of
elements. To facilitate compact representation and visual distinction, results shown in these
tables follow a different convention. Statistical significance is indicated by font intensity:
p >= .05, p < .05, p < .01. Where appropriate, Cronbach’s α is shown in italics on the
diagonal for unidimensional measures of more than two items. For RExI measures, a similar
measure of internal consistency was calculated by correlating RExIs derived from half the
items with the same from the other half. These halves were randomly selected 1000 times
and the correlations averaged via Fisher transformation to estimate overall reliability.
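As a simplified sketch of this reliability procedure (Python rather than the R actually used; for a RExI each half-score would itself be residualized before correlating, whereas here plain half-scale means are correlated):

```python
import numpy as np

rng = np.random.default_rng(0)

def splithalf_reliability(item_scores, n_splits=1000):
    """Average split-half correlation over random halvings of the items,
    with per-split correlations averaged on the Fisher-z scale and
    back-transformed. item_scores: array of shape (n_people, n_items)."""
    scores = np.asarray(item_scores, dtype=float)
    n_items = scores.shape[1]
    zs = []
    for _ in range(n_splits):
        perm = rng.permutation(n_items)
        half_a = scores[:, perm[: n_items // 2]].mean(axis=1)
        half_b = scores[:, perm[n_items // 2:]].mean(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        zs.append(np.arctanh(r))   # Fisher z-transform
    return np.tanh(np.mean(zs))    # back-transform the averaged z
```

Averaging on the z scale rather than averaging the raw correlations avoids the downward bias of averaging bounded rs directly.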
In correlation tables, to concisely describe distributions, the M (SD) column reports
the mean (standard deviation) of data that has been normalized to a range of 0 to 1. This
choice is similar to the percent of maximum possible (POMP) approach advocated by
Cohen et al. (1999). That 0 to 1 range represents the theoretical limits of bounded
measures (e.g. 0 to 100 for grades, 1 to 7 on a Likert scale) and empirical extremes
otherwise. By scaling all data to the same range (for table reports), this convention allows
easier comparison of distributions and better appreciation of skew and dispersion. Thus,
(for example) standardized distributions (e.g. RExI measures) which are centered on zero
will show a positive mean here, which then indicates how far the center of the distribution
is from the extremes (0 and 1), providing information about skew that is commonly
overlooked.
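A minimal sketch of this rescaling (illustrative function name; theoretical limits are supplied for bounded measures, and the empirical extremes are used otherwise):

```python
def pomp(x, lo=None, hi=None):
    """Rescale scores to a 0-1 range, in the spirit of the percent of
    maximum possible (POMP) approach of Cohen et al. (1999). If lo/hi
    (theoretical limits) are not given, empirical extremes are used."""
    xs = [float(v) for v in x]
    lo = min(xs) if lo is None else float(lo)
    hi = max(xs) if hi is None else float(hi)
    return [(v - lo) / (hi - lo) for v in xs]
```

For example, `pomp([1, 4, 7], lo=1, hi=7)` gives `[0.0, 0.5, 1.0]`; a mean near 0.5 then indicates a distribution centered between the extremes, while a mean near either bound signals skew.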
Throughout, gender / sex measures have been collapsed to dichotomous, with 0
representing identification mostly as female, and 1 identification mostly as male.
Similarly, “Native English” is 1 if English was reported as a first language, 0 otherwise, and
culture variables are 1 if 10 or more years lived in English / Western countries, 0 otherwise.
For all studies using student populations, these basic demographic measures were used as
controls, but age was not recorded because such variance is often small with potentially
misleading outliers, and ethical considerations recommend against collecting unnecessary
personal information.
All data was gathered via the Qualtrics survey platform (www.qualtrics.com), with
analysis done using the R statistical programming language (R Core Team, 2020) in the
RStudio development environment (RStudio Team, 2019), using LaTeX for document preparation.
Note: N = 316. Sex coded as binary, Male high. SS GPA: Secondary School Grade Point Average. CRT: Cognitive Reflection Test. MSLQ: Subset of the Motivated Strategies for Learning Questionnaire. RExI: Residualized Exaggeration Index. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Results
Predictive Validity
Table 1 shows correlations between study measures. Demographic measures (sex,
being a native English speaker, and having an English cultural background) did not relate
significantly to course grades.
Exaggeration. On the vocabulary overclaiming inventory, claiming for reals and foils
was uncorrelated (r(314) = -.03, 95% CI [-.14, .08]), with Cronbach’s α for reals being .91 and
for foils, .87 (as shown on the diagonal). The absence of correlation between reals and foils
claiming indicates no inherent common variance that might have been caused by
carelessness or other response bias, suggesting that the overall design did suppress
inattentive responding. The RExI derived from this inventory significantly predicted lower
course grades, to the same degree as foil claiming alone did. This is because reals claiming,
which showed no relationship with academic performance, was also unrelated to foils
claiming, so the RExI had no competence evidence to remove. This pattern also suggests
that exaggeration alone can be a meaningful predictor of performance, even when
competence is not. The correlations between reals claiming and demographic measures
suggest that some genuine English vocabulary knowledge was being captured, even if not
relevant to science grades.
Cognitive Ability. While self-report SS GPA showed an expected relationship with
course grades, the lack of relationship with exaggeration may be due to it being self-report,
and thus potentially influenced by exaggeration, i.e. exaggerated self-report compensates
for exaggeration-caused lower performance. The CRT scores showed similar relationships
with higher academic performance and lower exaggeration, suggesting a more general
relationship between exaggeration and performance impairment.
Overestimation. Overestimation of performance on the CRT test was related to the
RExI, but not significantly to course grades, suggesting it was not as effective in capturing
error in self-perception. Consistent with the hard-easy effect in overestimation literature,
there was a strong negative relationship between the overestimation index and the ability
being overestimated.
Metacognition. The 22 items selected from the MSLQ for their relationship with
academic performance behaved as expected here, relating to course grades and cognitive
ability measures, but not to exaggeration. The Growth and Fixed Mindset measures
related to the MSLQ metacognition measure in an expected way, in that metacognitive skills
should relate more to a growth orientation. The negative relationship between grades and
growth mindset may be due to restriction of range in the sample (i.e. these students are
already proven to be high achievers), and that science students may find growth items (e.g.
“I have the ability to change my basic intelligence”) contrary to what they’ve been taught.
The lack of relationship between these metacognition measures and exaggeration suggests
some divergent validity.
Personality. Like other measures that related to academic performance, TIPI
openness showed a similar opposite relationship with exaggeration. It is not clear why
exaggeration would relate to agreeableness, especially given the opposite relationship to
reals claiming, i.e. this is not just an agreeable, acquiescent response bias. The lack of
relationship between conscientiousness and exaggeration does provide some divergent
validity, given that other researchers (e.g. Bowling et al., 2016) have found carelessness
related to both lower agreeableness and lower conscientiousness. Exaggeration here does
not fit that pattern. While the coarseness of the TIPI means that subtle personality
correlates may not appear here, these results at least suggest that exaggeration is not easily
explained by the five-factor model of personality.
Incremental Validity
Table 2 shows standardized β coefficients (with standard error) of a linear regression
predicting course grades. Note that, because both reals and foils rate are in the same
model, the β for foils rate is exactly the RExI (foils rate controlled for reals rate), showing
that the impact of knowledge exaggeration persists after controlling for measures of
cognitive ability (SS GPA, the CRT), metacognition (MSLQ), personality (TIPI), sex and
culture.
Table 2: Regression Model Predicting Course Grades from RExI in Study 1
Predictor β SE p value
Foils Rate (RExI) -.14 .06 .01
Reals Rate .02 .06 .72
Self-Report SS GPA .21 .05 <.001
CRT Correct .15 .05 .009
MSLQ Metacognition .22 .05 <.001
Native English .06 .06 .32
English Culture -.05 .07 .48
Sex (M+) -.06 .05 .25
Extraversion -.09 .05 .08
Agreeableness .07 .05 .16
Conscientiousness .01 .05 .84
Emotional Stability .08 .05 .14
Openness .04 .05 .41
Note: Overall R² = .21, p < .001. N = 316. Sex coded as binary, Male high. SS GPA: Secondary School Grade Point Average. CRT: Cognitive Reflection Test. MSLQ: Subset of the Motivated Strategies for Learning Questionnaire. RExI: Residualized Exaggeration Index.
Discussion
As proof of concept, science undergraduates, motivated only by their own curiosity,
completed an “Academic Personality” survey in order to get personal feedback. By
gathering knowledge claims of vocabulary unrelated to science grades, exaggeration was
measured using the RExI approach of residualizing foils rate from reals rate.
Students who cared enough about their responses that they wanted to see results still
demonstrated detectable exaggeration, enough to predict lower course grades beyond
measures of cognitive ability, metacognition, personality and demographic controls. This
exaggeration appeared to generalize: The knowledge that was exaggerated (ordinary,
non-science vocabulary) was unrelated to (science) academic performance, yet the tendency
to exaggerate that knowledge was.
While exaggeration related to academic performance but the knowledge being
exaggerated did not, overestimation did not relate to academic performance while the
ability estimated did. This suggests the RExI approach is providing more useful information
than the overestimation index.
These results cannot be explained by carelessness, not just because of study design,
but also because reals and foils claiming showed no common influence, e.g. inattentive
carelessness or response bias. We also see that the cognitive and metacognitive measures
predicted course grades in expected ways, indicating the integrity and validity of the survey
overall.
While this exploratory study was coarse, it confirms that this operationalization of
exaggeration, the RExI, captured a phenomenon worth examining more thoroughly, an
avenue we shall pursue in Study 2.
Study 2: Validating the RExI
Will the results of Study 1 replicate with better measures? While a relationship
between exaggeration and lower academic performance was found, the context and
selection was unusual: Only students wanting personality feedback were involved. While
this may have selected for lower inattentive carelessness, it may have also selected students
preoccupied with their self-image. The remaining student studies use a conventional
context for psychological research: undergraduates incentivized to participate in return for
course credit. While certainly not representative of humanity overall (Henrich et al., 2010),
these samples were at least more indicative of North American undergraduates in general.
This context should also now allow for more variance in careless responding as found in
similar samples (e.g. Meade & Craig, 2012), so carelessness is now measured.
To consider self-enhancement as an explanation for exaggeration, Study 2 used a
popular measure, the Narcissistic Personality Inventory (NPI) introduced by Raskin and
Terry (1988). An important quality of the NPI is that it involves forced-choice questions
where one must decide between two alternatives (unlike, say, a Likert question assessing
degree of agreement). This forced-choice format means that the measure is resilient to
response bias or carelessness: Answering uniformly or randomly produces a noisy, middling
score, not misleading variance, because both high and low scores require selective attention
to content. Likert scales, used in many personality measures, are more vulnerable to
such distortions when answered inattentively (e.g. longstrings) or with socially desirable
or other response biases.
To examine self-enhancement in more detail, Study 2 also incorporated the Balanced
Inventory of Desirable Responding (BIDR) developed by Paulhus (1988). This set of three
instruments is designed to capture both interpersonal (impression management) and
intrapersonal (self-deceptive enhancement, self-deceptive denial) aspects of
self-enhancement. The “balanced” in the title refers to half the items being reverse-scored18
18 In psychometric instruments using Likert items to assess some quality (e.g. “From 1 to 10, how agitated
in order to compensate for superficial response biases. This 50-50 balance of item scoring
directions also allows for convenient measure of carelessness via longstrings: It is extremely
unlikely that a sincere, attentive respondent would give the same response to more than
half the items.
From the history of foils being used to test false recognition of advertisements, we
noted earlier that exaggeration may be related to recognition memory bias. To test that
relationship, a 100-item battery of words was presented at the start of the survey and then
tested for recognition at the end (with 50 old, and 50 new words). This allowed for testing
both individual differences in memory ability and also memory exaggeration, by applying
the RExI to false (relative to correct) claims of recognition. Note that the cognitive trait of
recognition bias mentioned earlier (Kantner & Lindsay, 2014), is about individual
differences in memory claims, whether valid or not. Memory bias thus represents
confidence (warranted or not) in claiming recognition, which will relate to genuine
recognition ability. This is different from memory exaggeration, which is about false
recognition, uncorrelated with rate of plausible memory claiming.
To get a broader measure of exaggeration, a large (150-item), commonly-used
overclaiming inventory was employed, the Overclaiming Questionnaire (OCQ) developed by
Paulhus et al. (2003). The content of that inventory is based on 1980s American cultural
knowledge (Hirsch Jr et al., 1988), so if knowledge exaggeration does not strictly depend on
the domain of knowledge assessed, as Study 1 suggests, then it should not matter that this
content is largely irrelevant to the academic performance of a 21st-century
undergraduate at a Canadian university.
To broaden and generalize the measure of academic performance, for the remaining
studies, the central dependent variable was University of British Columbia (UBC) GPA, a
metric of high ecological validity that reflects not just knowledge, but a broad range of
do you feel?”) a reverse-scored item would assess that quality from the opposite direction, e.g. “From 1 to 10, how calm do you feel?”.
decisions: cognitive, metacognitive, strategic, social, emotional and more. Like intelligence
or personality, tendency to exaggerate one’s knowledge may represent an individual
difference that affects many life outcomes.
Study 1 showed that exaggeration predicted lower academic performance and also
lower performance on the CRT. This raises the possibility that exaggeration may be simply
an expression of lower cognitive ability in general. To test that, Study 2 employed a broad
measure of cognitive abilities used commercially for evaluating job applicants, the
Wonderlic Personnel Test (E. F. Wonderlic, 1992). Scores on this should predict GPA.
Method
The study was approved by the institutional ethics board and included explicit, active
consent to access student transcript information. As with Study 1, data was gathered via
online survey.
Participants
No longer limited to students from one discipline (like the science students in Study
1), Study 2 considered a wider range of students, in various disciplines and from varied
backgrounds, who happened to be enrolled in an undergraduate psychology course (a popular
elective across disciplines) at UBC, with participants from spring and summer terms.
Students volunteered to complete an online survey for partial course credit. A total of 533
students completed the study, with a median completion time of 44.1 minutes.
Overall, 31% of participants reported their gender as male (69% as female), 50%
reported English as a first language, and 64% as being from Western countries. (Note that
these proportions are also shown in the M (SD) column of Table 3, because these are
dichotomous variables.) 59% of the students were enrolled in an Arts program. While 43%
of the students sampled were in their first year, this distinction had no impact on the
outcomes shown below, nor did their overall number of academic terms.
Measures
Academic Performance. To capture academic performance with breadth, reliability
and ecological validity, the study asked participants to grant access to their university
transcripts. From this, GPA was calculated as overall average grade for all courses
completed at the university, including courses in progress, so there were data even for
students new to the university. These grades were represented on a 0 – 100 scale. Note that
this is an improvement over many studies where self-reported GPA is used; official
transcript information avoids measurement error or self-report bias.
Knowledge Exaggeration. The OCQ-150 (Paulhus et al., 2003) was employed here as
a reference for extracting exaggeration measures, given that it has been widely applied in
overclaiming studies. The broader knowledge domain these items query is 1980s American
culture, taken from Hirsch Jr et al. (1988), in ten categories (20th Century Culture Names,
Authors and Characters, Books and Poems, Fine Arts, Historical Names and Events,
Language, Life Sciences, Philosophy, Physical Sciences, Social Science and Law) with each
category having 12 reals and 3 foils. In this application, the potential irrelevance of the
content suits the purpose of establishing that exaggeration generalizes beyond the
domain(s) it is measured on. Claims for each item were solicited with the prompt of
“Please rate how familiar you are with each item” along a Likert scale of 1 : Not at all
familiar to 7 : Very familiar, a format taken from the instrument’s use in overclaiming
studies. The RExI was calculated as the amount (average rating) of foils claiming
residualized on the amount of reals claiming.
Self-Enhancement.
Narcissism. The 40-item dichotomous forced-choice NPI (Raskin & Terry, 1988)
was used to assess narcissism. It has a theoretical range of 0 – 40.
Balanced Inventory of Desirable Responding (BIDR). The BIDR consists of
three 20-item, balanced (equal numbers of forward- and reverse-scored items) instruments:
Impression Management (IM), Self-Deceptive Enhancement (SDE), and Self-Deceptive
Denial (SDD). For each item, extreme scores (e.g. 6 or 7 for forward-scored items on the
7-step Likert scale used) count as 1, others as 0, for a range of 0 – 20 for each instrument.
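The scoring rule can be sketched as follows (a hypothetical helper in Python; the exact item keying follows Paulhus, 1988, and is not reproduced here):

```python
def bidr_score(responses, reverse_scored, scale_max=7, cutoff=6):
    """BIDR dichotomous scoring sketch: reverse-scored items are flipped
    (8 - response on a 7-point scale), then only extreme keyed responses
    (6 or 7) each count 1 point, giving a 0-20 range per 20-item scale."""
    total = 0
    for i, r in enumerate(responses):
        keyed = (scale_max + 1 - r) if i in reverse_scored else r
        total += 1 if keyed >= cutoff else 0
    return total
```

Counting only extreme responses means a high score requires consistently exaggerated claims in the keyed direction, not mere agreement.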
Cognitive Ability. The form A 2000 version of the Wonderlic Personnel Test
(E. F. Wonderlic, 1992), adapted here for online survey administration, was used as a
general measure of cognitive ability. It is a widely-used 12-minute timed test that requires
metacognitive, numerical, graphical and verbal problem-solving skills, and attention to
detail. Scored only as number correct, the maximum possible is 50.
Recognition Memory. Early in the survey, participants were exposed to items via a
Lexical Decision Task (LDT). For each of 100 words, they were asked to categorize each, as
quickly and accurately as possible, as either a genuine word from the English language or
not. Fifty genuine words and fifty pronounceable nonwords were shown. At the end of the
survey (with roughly 20 minutes of other questions in between), using a similar timed
binary classification task, respondents categorized words as to whether they had seen them
before in the LDT, with 50 old and 50 new items (both 50% genuine words) of similar
properties. New and old items were matched for word length.
This protocol gives us two summary measures: one for the rate of claiming old items
(hit rate, analogous to reals claiming for overclaiming), and one for the rate of claiming
new items (false alarm rate, analogous to foils claiming). These two simple measures can
be combined in various ways, e.g. as done to create the RExI. Signal detection theory (the
typical model used in memory research) provides a variety of other combinations, notably
accuracy (roughly the excess of old claims beyond new claims, known as d′) and bias,
representing overall claiming, whether old or new (which correspondingly increases with
recognition). Kantner and Lindsay (2014), in examining their cognitive trait, consider
several kinds of bias measures which will not be considered here. For simplicity, the
correlation table reports memory accuracy (d′) as an indicator of general memory ability,
and also memory exaggeration, as measured by the RExI. For regression models, simple
measures of memory hit rate and false alarm rate are included. These two measures
capture all the information (variance) from the memory test without requiring any
understanding of signal detection theory, or arguments about which composite measures to
use. The software used to summarize scores did not easily provide individual item data, so
Cronbach’s α was not calculated.
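The accuracy computation can be sketched as follows (a standard signal detection formula in Python; the log-linear correction for extreme rates is an assumption for illustration, as the exact computation used is not specified here):

```python
from statistics import NormalDist

def d_prime(hits, false_alarms, n_old=50, n_new=50):
    """Recognition accuracy d' = z(hit rate) - z(false-alarm rate),
    with the common log-linear correction keeping rates off 0 and 1.
    Defaults of 50 old / 50 new items match the battery described above."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    return z(hit_rate) - z(fa_rate)
```

When hits and false alarms are equal, d′ is zero (no discrimination); excess old-claiming beyond new-claiming drives d′ upward, which is the sense in which d′ is "roughly the excess of old claims beyond new claims."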
Careless Responding. To identify participants who were not answering sincerely,
two techniques were combined. For the LDT, the fastest decision time with at least average
discrimination was found (454ms) and then used as a cutoff: Participants with median
correct response time faster than this (1.5%) were considered too fast and labelled as
careless. This distinction correlated with LDT discrimination (Hits - False Alarms)
significantly, r(530) = -.69***, 95% CI [-.73, -.64], showing that such rushed responses were
not accurate.
As well, responses that showed an unreasonable consistency on BIDR measures (i.e.
having identical responses to more than half the items, because half are reverse-scored) were
also labeled as careless. In total that came to 12% of the sample, which is consistent with
Meade and Craig (2012) who reported that “approximately 10% – 12% of undergraduates
completing a lengthy survey for course credit were identified as careless responders” (p. 1).
Carelessness was thus a dichotomous variable indicating whether a respondent was
unreasonably fast on the LDT and / or gave identical responses to more than half of the
items on any of the three BIDR instruments. Note that this carelessness index cannot be meaningfully
compared to any of the BIDR scores because they are both derived from the same items.
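The combined flag can be sketched as follows (illustrative Python with hypothetical names; the mode-count test operationalizes "identical responses to more than half the items"):

```python
from statistics import mode

# Fastest median correct LDT response time with at least average
# discrimination, as reported in the text above.
RT_CUTOFF_MS = 454

def is_careless(median_correct_rt_ms, bidr_scales):
    """Flag a respondent as careless if their median correct LDT response
    time beats the cutoff, or any BIDR scale has the same response on more
    than half of its items (implausible when half are reverse-scored).
    bidr_scales: iterable of per-scale response lists."""
    too_fast = median_correct_rt_ms < RT_CUTOFF_MS
    too_uniform = any(
        scale.count(mode(scale)) > len(scale) / 2 for scale in bidr_scales
    )
    return too_fast or too_uniform
```

Either criterion alone suffices for the flag, matching the "and / or" wording of the index.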
Demographics. While participants were asked which gender (or neither) they most
closely identify with, for the purposes of analysis, this was collapsed to a binary variable,
with the larger value indicating male. Participants also reported if English was a first
language for them, and if they had spent 10 or more years living in Western countries (e.g.
Note: N = 530. OCQ: 150-item Overclaiming Questionnaire. RExI: Residualized Exaggeration Index. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Results
Predictive Validity
This section describes convergent and divergent validity of the RExI based on the
bivariate, zero-order correlations between study variables shown in Table 3. Note again
that, as in all correlation tables in this paper, mean and standard deviation are reported on
ranges standardized to be from 0 to 1, as described in Reporting Conventions.
Exaggeration. The main result of Study 1 is confirmed here: knowledge
exaggeration is related to impaired knowledge (academic) performance, even though the
knowledge being exaggerated has little relevance to that performance, as shown by the weak
relationship between reals claiming rate and GPA. Exaggeration also showed only moderate
relationships with carelessness, narcissism, lower memory accuracy, and lower cognitive
ability as measured by the Wonderlic test. Altogether, these suggest substantial
discriminant validity, given the higher reliability of these measures compared to Study 1:
Exaggeration does not appear to be merely a side-effect of these other possible
explanations. This distinction is also shown via the regression model presented later.
Despite being related to narcissism, exaggeration showed no significant relationship
with impression management or self-deceptive enhancement, and only a slight relationship
with lower self-deceptive denial. Surprisingly, having English as a first language or a
Western cultural background related to lower exaggeration, which may be an artifact of the
OCQ content being so culturally biased. Memory exaggeration showed similar patterns to
knowledge exaggeration, albeit weaker, without a very strong relationship between the two
measures.
Careless Responding. The ad hoc measure of careless responding showed sensible
relationships with Wonderlic, narcissism and memory measures, suggesting it captured
something relevant, although possibly only situational carelessness specific to this study,
given the lack of relationship with GPA. Note that, being partially derived from response
style on the BIDR, this careless responding measure cannot be meaningfully compared with
impression management, self-deceptive enhancement, or self-deceptive denial. The two
components of this carelessness measure, responding too quickly to the LDT and longstring
responding on the BIDR, correlated r(530) = .20***, 95% CI [.12, .28], suggesting only
slight convergence of the two techniques. We see that carelessness did relate to
exaggeration, although not in a dramatic way.
Recognition Memory. Recognition memory accuracy related similarly to both GPA
and Wonderlic, validating the measure used in this study. Notably, memory exaggeration
(RExI) related to both those cognitive ability measures at no less magnitude than did
memory accuracy. Because memory accuracy was calculated (as signal detection theory d′)
on the same data as used for memory exaggeration, here the exaggeration measure is not
completely orthogonal to the ability measure. This is because d′ collapses both ability and
exaggeration into one measure, essentially correcting for guessing. The similar magnitude
of predictive validities raises some questions: If d′ captures both memory ability and
(negative) exaggeration, yet shows effects similar to the RExI (which removes variance
related to ability), this suggests genuine ability may be less relevant than exaggeration in
this context. This is supported by the regression model (below) showing that hit rate
(suggesting correct recognition) does not contribute more than false-alarm rate (controlled
for hit rate) in predicting academic performance.
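To make the overlap concrete, here is a minimal sketch of how d′ folds hit rate and false-alarm rate into one number. The formula is the standard SDT one; the illustrative rates are hypothetical, not from the study data.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Signal detection sensitivity: z(hit rate) - z(false-alarm rate).
    It rises with correct recognition and falls with unwarranted claims,
    so ability and (negative) exaggeration are collapsed together."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Two hypothetical profiles with nearly identical d' but very different
# false-alarm (exaggeration-like) behavior:
cautious = d_prime(0.70, 0.10)
claimer = d_prime(0.95, 0.436)
```

Because both profiles yield essentially the same d′, a residualized index like the RExI is needed to separate the exaggeration component from the ability component.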
Self-Enhancement. While narcissism related to both forms of exaggeration,
impression management, self-deceptive enhancement and self-deceptive denial did not. This
may be because those constructs do not relate to exaggeration, or because those
instruments don’t capture them in ways that are relevant here. More detailed analysis
revealed that the Emmons 7-item Exploitiveness / Entitlement subscale of the NPI
(Emmons, 1987) showed similar relationships with GPA (r(528) = -.15***, 95% CI [-.23,
-.06]) and the RExI (r(531) = .23***, 95% CI [.15, .31]), suggesting that facet best
characterizes exaggeration. Also, the Ames et al. (2006) 16-item shortened version of the
NPI related to GPA (r(528) = -.17***, 95% CI [-.25, -.09]) and the RExI (r(531) = .26***,
95% CI [.18, .34]) about as well as the 40-item version, allowing for more economy in future
studies.
Incremental Validity
To disentangle the several relationships shown above, Table 4 presents a linear
regression model, with standardized β coefficients (and standard error) predicting
university GPA. Note, again, that the β for foils rate, now being controlled for reals rate, is
exactly the RExI. Note also that the partial correlations (βs) for foils and reals claiming are
larger than their zero-order correlations shown in Table 3. This indicates a mutual
suppressor effect: The “meaning” of one kind of claim depends on the amount of the other
kind of claim. This highlights the value of considering exaggeration in context and the RExI
analytic strategy to isolate it.
Table 4: Regression Model Predicting GPA from OCQ RExI in Study 2
Predictor β SE p value
OCQ Foils (RExI) -.19 .04 .001
OCQ Reals .17 .05 .003
Wonderlic .22 .04 <.001
Memory False Alarms (RExI) -.10 .04 .06
Memory Hits .07 .05 .20
Narcissism -.05 .04 .25
Impression Management .10 .04 .08
Self-Deceptive Enhancement -.04 .05 .49
Self-Deceptive Denial -.06 .06 .32
Sex (M+) -.15 .04 <.001
Native English .03 .05 .57
Western Culture -.00 .05 .99
Careless Responding .03 .04 .46
Note: Overall R2 = .16, p < .001. N = 530. OCQ: 150-item Overclaiming Questionnaire. RExI: Residualized Exaggeration Index.
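As a sketch of the analytic strategy (not the study's actual code), the RExI can be computed by residualizing foils claiming on reals claiming with ordinary least squares. The data below are synthetic; by construction, the resulting index is uncorrelated with reals claiming.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 530
reals = rng.uniform(0.3, 0.9, n)              # plausible claims (tracks ability)
foils = 0.4 * reals + rng.normal(0, 0.1, n)   # implausible claims, sharing variance

# Residualized Exaggeration Index: foils claiming with the variance
# explained by reals claiming removed.
slope, intercept = np.polyfit(reals, foils, 1)
rexi = foils - (slope * reals + intercept)

# OLS residuals are orthogonal to the predictor:
assert abs(np.corrcoef(rexi, reals)[0, 1]) < 1e-8
```

Equivalently, entering foils rate and reals rate together in one regression (as in Table 4) gives the foils coefficient the same unique variance as this residual.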
Exaggeration as Distinct Liability. This model helps clarify the relationship
between exaggeration and performance: language and culture are no longer distinct
predictors once Wonderlic general cognitive ability, memory performance, and careless
responding are controlled.
and careless responding. Neither are the self-enhancement measures (NPI or BIDR)
significant predictors in this model, given those controls. This suggests that the behavior of
exaggerating one’s knowledge, even of trivial information, predicts impairment in overall
knowledge (academic) performance that is not simply a side effect of lower cognitive ability,
weaker memory, self-enhancement or carelessness.
Note that the two memory measures, hit rate and false alarm rate, also include some
degree of exaggeration, i.e. unwarranted memory ability claims. Including them in this
model thus reduces the variance attributable to the OCQ RExI; without the memory
measures, the β for the RExI increases slightly. This model, then, is conservative, showing
the effect of knowledge exaggeration after controlling for memory exaggeration. A similar
model using memory exaggeration (without the OCQ but with the other measures), also
shows that memory exaggeration remains a uniquely significant (but weaker) predictor of
GPA. Both kinds of exaggeration predict lower academic performance beyond these
controls.
Discussion
Using more extensive and comprehensive measures, the finding in Study 1 that
knowledge exaggeration (as measured by the RExI) uniquely predicted lower academic
performance was replicated. Memory exaggeration showed a similar, albeit smaller, effect.
Narcissism showed a slight relationship with both kinds of exaggeration.
Recall that my historical review of foil claiming showed that it has been used to
Note: N = 151. RExI: Residualized Exaggeration Index. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
The two novel exaggeration measures showed a moderate correlation, suggesting some
overlap but also some distinction despite both involving similar content. This is supported
by the overstatement exaggeration not correlating as strongly with narcissism, memory
measures or age. Note that reals claiming in the overclaiming inventory related to number
correct in the overstatement test, confirming content similarity and that reals claiming
captured relevant knowledge.
Age correlated negatively with narcissism and both measures of exaggeration, while
being positively related to vocabulary score and recognition memory accuracy (Hits - False
Alarms; r(149) = .18*, 95% CI [.02, .33]), suggesting some benefits of maturity.
Discussion
These preliminary results suggest that the overclaiming items were performing as
expected and should provide a suitable resource for more efficient measures in future
studies.
From these 270 candidate items, reals were selected if claiming them correlated
r >= .40 with correct vocabulary answers and r <= −.19 with answered but incorrect
vocabulary answers, thus capturing genuine knowledge and low exaggeration. Similarly,
foils were selected if claiming them correlated r >= .40 with answered but incorrect
vocabulary answers and r <= −.20 with responding “Don’t Know” on vocabulary answers,
thus capturing exaggerated knowledge and unwillingness to admit ignorance. As expected
by the fluency heuristic, empirically selected reals had, on average, higher OLD20 than
selected foils (.75*** [.43, 1.08], t(17.43) = 4.85; d = 1.91), confirming that higher fluency
for foils is optimal.
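The selection procedure can be sketched as a correlational screen over candidate items. This is a simplified illustration with my own function name and synthetic inputs, using the thresholds quoted above.

```python
import numpy as np

def select_items(claim_matrix, correct_vocab, wrong_vocab, dont_know):
    """Screen candidate items: `claim_matrix` is participants x items
    (1 = claimed familiarity); the other arguments are per-participant
    counts from the vocabulary test."""
    reals, foils = [], []
    for j in range(claim_matrix.shape[1]):
        c = claim_matrix[:, j]
        if (np.corrcoef(c, correct_vocab)[0, 1] >= .40 and
                np.corrcoef(c, wrong_vocab)[0, 1] <= -.19):
            reals.append(j)   # tracks knowledge, shuns error
        elif (np.corrcoef(c, wrong_vocab)[0, 1] >= .40 and
                np.corrcoef(c, dont_know)[0, 1] <= -.20):
            foils.append(j)   # tracks exaggerated claims, shuns admitted ignorance
    return reals, foils
```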
This process yielded 20 reals and 10 foils for a new overclaiming inventory designated
the Vocabulary Knowledge Exaggeration (VoKE) inventory. The proportion of reals to foils
in an overclaiming test is largely a subjective choice, balancing the need for an adequate
number of foils for capturing exaggeration with the need to have an adequate number of
reals to keep the test reasonable: Too many unrecognizable items could raise suspicion or
doubt. While some researchers have used item sets of only foils (e.g. Phillips & Clancy,
1972), a common assumption in the overclaiming literature is to have at least 50% reals for
credibility. Williams et al. (2002) compared similar tests with 20% and 50% foils and found
no differences in overall claiming nor the differential between reals and foils claiming.
Study 4 will compare this new VoKE item set and the VST overstatement test against
OCQ items (as used in Study 2) for capturing knowledge exaggeration and predicting
academic performance.
Study 4: Robustness of the RExI
Study 2 showed knowledge exaggeration to be a distinct phenomenon, independently
Note: N = 710, except AEQ and PES where N = 536. RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. OCQ: 60-item Overclaiming Questionnaire. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. NPI: Narcissistic Personality Inventory. NFCS: Need For Cognition Scale. AEQ: Academic Entitlement Questionnaire. PES: Psychological Entitlement Scale. CIHS: Comprehensive Intellectual Humility Scale. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Data Merging Loss Effects. As mentioned above, sex data were intended to be
added to this survey data by linking with screening data collected by the department for all
researchers using this subject pool. Data were linked by having students generate a
(mostly) unique identification code, with similar instructions given during both data
collections. Inevitably, some codes don’t match because students don’t follow the
instructions the same way each time, resulting in some records not being linkable. For this
sample, that accounted for 12% of records gathered that had completed the study.
Comparing merged and unmerged records showed several significant relationships:
Students whose records did not merge had, on average, lower GPA (by -2.19* [-4.04, -.33],
t(109.88) = -2.33; d = -0.25, in percentage points), and higher overall exaggeration, by
.08*** [.04, .12], t(100.61) = 4.06; d = 0.51 (a unitless measure, but the effect size is
important). Non-merged records also showed lower memory and vocabulary performance,
more overconfidence and rushed responding, and had personalities with more narcissism
and academic entitlement, and less intellectual humility.
Consequently, none of that screening data is included in the analyses reported here,
including the measure of sex as control. Given that sex showed no relationship with RExI
measures in previous studies, it is unlikely that any relationship would have been found
here. This merging loss, however, suggests that the process of asking participants to
generate their own unique identification code (e.g. in order to preserve anonymity when
linking data sets) carries a cost: data are lost, and, given the systematic differences above,
data quality may be significantly compromised.
Predictive Validity
Table 6 presents zero-order correlations between study variables. Table 7, as a
redundant convenience, summarizes significant correlates with RExI measures, as well as
with all four RExIs combined with equal weighting to show aggregate effects.
Table 7: Study 4 RExI Correlates, Selected from Table 6
Note: N = 710, except Entitlement measures where N = 536. RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. OCQ: 60-item Overclaiming Questionnaire. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Exaggeration. As summarized in Table 7, all four exaggeration measures behave
similarly, consistently opposing academic performance, yet with some distinction
from each other. The one exception is Need For Cognition, which does not correlate with
exaggeration on the VST. Note that memory exaggeration and VST overstatement both
correlate with foil delay measured on the overclaiming instruments, suggesting that the
cognitive allocation indicated by foil delay reflects exaggeration in general. While the
overstatement method shows weaker correlations with personality measures, it relates to
academic performance comparably with the other formats and almost not at all to the
language and culture controls (see Table 6), suggesting that an overstatement approach to
exaggeration may be “cleaner” in some ways. Recall that the VST RExI is (by design)
completely unrelated to the number correct on that test so that both measures from the
same test can be used as unique predictors. In this case, exaggeration is a stronger
predictor of academic performance than knowledge.
Personality. Study 2 showed that narcissism had a significantly negative impact on
academic performance, and we see that result replicated here, although diminished,
possibly because only 16 of the 40 items were used. Recall that Study 2 also found the
entitlement facet of the NPI to be most predictive, which is why entitlement measures were
included in this study, and we see that academic entitlement was more predictive of GPA
and exaggeration, while general entitlement was still significant.
All RExI measures also showed small but consistent relationships with personality
measures NPI, CIHS, AEQ, PES and overplacement, as well as with the rushed responding
behavioral measure. Note that all these measures also predict academic performance,
memory performance, and VST Correct (and mostly CRT Correct) in the opposite direction,
suggesting that exaggeration is a costly, maladaptive form of “self-enhancement”.
While the RExI measures vary somewhat in magnitude of association, it appears that
narcissism, entitlement, overconfidence, impatience, and lower intellectual humility confirm
a consistent personality profile of exaggeration.
Memory. Memory accuracy again relates positively to academic performance and
negatively to exaggeration. The stronger overlap for the overclaiming inventory items (OCQ
and VoKE) than for the VST overstatement test suggests more susceptibility to recognition
bias when using an overclaiming inventory.
Like the RExI derived from OCQ, VoKE and VST items, the RExI calculated on
recognition memory performance shows a similar relationship with GPA, VST Correct and
CRT Correct scores, suggesting the generalizability of the RExI: Memory exaggeration
shows similarities to knowledge exaggeration.
Careless Responding. The new measure of carelessness, rushed responding,
correlated significantly with all RExI measures, suggesting impatience is part of
exaggeration. Similar results were found when rushing was counted dichotomously as more
than 2 or 3 warnings; 55% of respondents never rushed a response, and 87% did so only
once or twice over hundreds of items.
Confirmation of Rushed Responding as Careless. Due to an error in
implementing the survey, the 18 items of the NFCS were duplicated in the spots where the
PES and AEQ items were to go for the first 174 respondents. This serendipitously created
the opportunity to use the correlation between answers of these identical item sets as a
check on careless responding, such that the higher the correlation, the more consistent, and
less careless, the response set. Item order was somewhat randomized, so the repetition was
likely not obvious. This measure of consistent responding correlated as expected with
memory accuracy (r(172) = .35***, 95% CI [.21, .47]), and with rushed responding, r(172)
= -.35***, 95% CI [-.48, -.21], similar to findings by Wood et al. (2017) that “found response
times and consistency to be routinely positively associated across inventories” (p. 458).
Despite the bias in the merged prescreen data (described above), an attention check
question in that data also validated the rushed responding measure as indicative of careless
responding, with rushing being related to failing the attention check, r(625) = .39***, 95%
CI [.33, .46].
An advantage of this time-based technique for capturing carelessness is that it does
not require adding extra “bogus” (foil) questions which might alienate or be misunderstood
by the participant, and it provides a continuous behavioral measure to use as a control for
all data rather than an arbitrary cutoff for discarding data. Here we see that rushed
responding shows significant relationships with academic performance, cognitive and
personality measures, indicating a generalized detrimental trait.
Another serendipitous error in survey preparation shone some light on the influence of
careless responding specific to the OCQ items (which show the strongest relationship to
rushed responding). Using Microsoft Excel to assemble items, the auto-increment feature
inadvertently changed the domain “20th Century Figures” into “21th”, “22th” through “35th
Century Figures”, so that these items were now effectively absurd. While this only affected
the first 35 responses, no difference in responding (reals rate, foils rate, their sum or
difference) was found.
Exaggeration Methodology. The 30 items of the VoKE did at least as well as the 60
items of the OCQ in predicting GPA, and showed slightly more internal consistency,
suggesting that VoKE items were more efficient and effective at capturing exaggeration,
with less culturally-biased content. As further validation of the item engineering approach,
VoKE reals claiming correlated well with VST correct score and reasonably with GPA (even
more than VST Correct), indicating genuine knowledge was being tapped by those items.
In contrast, reals claiming on the OCQ items did not relate to GPA, showing the academic
irrelevance of that content. We can also note that the mean of VoKE reals claiming was well
more than a standard deviation away from its boundaries and thus was not suffering from
ceiling or floor effects, i.e. item difficulty was neither too easy nor too hard.
This study also resurrected the overstatement approach by modifying a conventional
multiple-choice vocabulary test, the VST. Except for language and culture variables, where
it shows some independence, RExI from the VST shows similar patterns to RExI from the
overclaiming inventories, suggesting that, despite a very different methodology, the
exaggeration index is capturing a similar phenomenon. This contrasts with the
contradictory findings of previous literature, where overclaiming increases with ability while
overestimation (overconfidence) decreases. The RExI approach yields a consistent index.
Note, however, that the overlap between OCQ and VoKE RExI measures is twice what
it is for either with the VST RExI. This suggests some method variance worth further
exploration. As discussed above (and below), there are clearly different advantages and
disadvantages to either methodology, but it is encouraging to see that the RExI extracts
similar information from both, and all RExI measures show similar predictions of overall
GPA.
For the VST, it should be noted that, out of 60 questions, the mean (standard
deviation) of number correct was 39.86 (8.76) with a maximum of 59. It may be that
reasonable exaggeration measures are best had from overstatement tests that nobody
scores completely correct on; there needs to be some opportunity for every test taker to
either exaggerate or avoid claiming.
An important advantage of an overstatement approach is that it yields two distinct,
usable measures: ability and exaggeration. Because they are, by definition, orthogonal, their
zero-order correlations equal the standardized β coefficients when both measures are
combined to predict academic performance. While the contribution of VST knowledge to
GPA is meager, it is still significant, but more importantly, the VST RExI adds twice the
predictive power, increasing (regression model predicting GPA) R2 from .01* to .04***. If
adding RExI to everyday multiple-choice tests used in education only doubles predictive
power, this simple adjustment to standard testing procedure could have a profound effect
on academic assessment.
Additionally, adding VoKE RExI to that VST model increases R2 even further to
.07***, indicating distinct kinds of validity from the different methods, despite them both
being ostensibly based on the same domain of English vocabulary.
Finally, it is worth noting that the correlations between competence and
incompetence evidence are positive for overclaiming, but negative for overstatement (r(708)
= -.39***, 95% CI [-.45, -.33]), supporting the reasoning behind the RExI formula.
Foil Delay
Having response times for individual items allowed consideration of some cognitive
dynamics of exaggeration. Do people process foils differently than reals, and what might
that imply?
While Study 2 showed that the RExI predicts academic performance beyond cognitive
ability, it remained debatable whether this represents evidence of a process unlike typical
conceptions of cognitive ability. Ordinary problem solving takes time to get right. In this
data set, the time taken to answer CRT questions and the quality of answers given correlate
as expected: positively for correct answers (r(708) = .14***, 95% CI [.06, .21]) and
negatively for “intuitive” but incorrect answers, r(708) = -.11**, 95% CI [-.19, -.04]; i.e.
more time thinking gave better answers. It would follow, then, that claiming impossible
knowledge might be the result of shallow, hasty, inadequate processing of information. We
might expect that people who spend more time deliberating about what they do or don’t
know would exaggerate less. Does response time for overclaiming items predict amount of
exaggeration?
Overall, median response times for foils on the overclaiming inventories (OCQ and
VoKE) were slightly less than response times (in seconds) for reals, -.08* [-.16, -.003],
t(1395.38) = -2.04; d = -0.11, but overall median response times for overclaiming did not
relate to RExI on those instruments (r(708) = .05, 95% CI [-.03, .12]) nor to GPA (r(708) =
.00, 95% CI [-.07, .08]). Time spent thinking about overclaiming items in general did not
relate to exaggeration or academic performance.
Nonetheless, the amount of time spent on foils relative to reals (foil delay) tended to
increase with exaggeration amount, as did time spent on foils alone (r(708) = .15***, 95%
CI [.08, .22]), suggesting that exaggeration required more (or at least different) cognitive
effort rather than less. Foil delay also showed a slight detrimental impact on academic and
memory performance. Table 8 shows prediction of overclaiming exaggeration from the
median response time for overclaiming items in general (as control) and foil delay, showing
that higher exaggeration was characterized by (relatively) faster reals claiming and slower
foils claiming.
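Foil delay, as defined in the Table 8 note, can be computed per participant as in this minimal sketch (assuming item-level response times and a foil indicator; the names are mine):

```python
from statistics import median

def foil_delay(item_times, is_foil):
    """Median response time on foil items minus median response time on
    real items, for a single participant's overclaiming responses."""
    foil_t = [t for t, f in zip(item_times, is_foil) if f]
    real_t = [t for t, f in zip(item_times, is_foil) if not f]
    return median(foil_t) - median(real_t)
```

A positive value means relatively slower responses to foils, which Table 8 shows characterizes higher exaggeration.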
Table 8: Study 4 Overclaiming Response Times Predicting RExI.
Predictor β SE p value
Overclaiming Item Response Time -.01 .04 .72
Foil Delay .35 .04 <.001
Note: Overall R2 = .12, p < .001. N = 710. RExI: Residualized Exaggeration Index. Foil Delay = Median Foils Time - Median Reals Time.
This is consistent with a transcranial magnetic stimulation study (Amati et al., 2010)
that reported that inhibiting medial prefrontal cortex (MPFC) activity reduced both
response time and foil claiming. Page 269 of that study noted that “regions of the MPFC
are found to be particularly important for comparing the self to others.” Exaggeration
appears to be less about amount of mental processing than about allocation, less about
degree and more about kind of thinking.
Incremental Validity
Study 2 showed how exaggeration uniquely predicted academic performance beyond
cognitive ability, carelessness and demographic variables. Do these other exaggeration
measures do the same, and are they distinct from each other?
The shorter OCQ RExI used in this study no longer significantly relates to academic
performance once memory performance is controlled, which may be due to the
inappropriateness of OCQ content or to the fact that OCQ items were reused for the memory test.
However, the other three exaggeration measures, VoKE overclaiming, VST overstatement,
and recognition memory, all uniquely predict GPA after controlling for CRT cognitive
ability, rushing, and demographics. Keeping memory exaggeration (and performance in
general) as a control, Table 9 shows that the shorter overclaiming inventory of the VoKE
captures exaggeration as uniquely predicting academic performance, beyond other study
measures. Similarly, Table 10 shows that the overstatement test based on the
multiple-choice VST also predicts GPA uniquely. Despite the relatively low correlation
between these two measures of exaggeration, based on different methods, they are both
behaving similarly. As before, in every model, the β for incompetence evidence is exactly
the RExI, once controlling for evidence of competence.
Table 9: Regression Model Predicting GPA from VoKE RExI in Study 4
While the RExI can be applied to an overstatement test to assess both ability and
exaggeration as uncorrelated measures, there exists another approach that purports to do
the same thing. The Overclaiming Technique (OCT) is presented as “Measuring
Self-Enhancement Independent of Ability” (Paulhus et al., 2003, p. 809), framing
self-enhancement as synonymous with exaggeration: “The OCT was designed to measure
knowledge exaggeration and knowledge accuracy simultaneously and independently”
(Paulhus, 2012, p. 151). How is this different from the RExI?
The OCT begins with the same data collection used by Raubenheimer (1925), labeled
“overclaiming” by Phillips and Clancy (1972), i.e. soliciting claims of knowledge or
familiarity with a variety of items, some of which are reals, some foils. The unique
contribution of the OCT is in using Signal Detection Theory (SDT) for analysis (e.g.
Macmillan, 2002). The portion of reals claimed (reals rate, or hit rate in SDT terms) and
the portion of foils claimed (foils rate, or false-alarm rate) are combined to create two new
indices: accuracy (the excess of reals rate over foils rate, also called sensitivity in SDT) and
bias (the average of the two)19.
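The OCT composites can be written down directly. Here is a sketch of both the “common sense” and z-transformed SDT versions (the function name is mine; bias uses the average of the two rates, as in the text, which merely rescales the sum):

```python
from statistics import NormalDist

def oct_indices(reals_rate, foils_rate, sdt=False):
    """Accuracy and bias from reals rate R and foils rate F.
    Common sense: accuracy = R - F,     bias = (R + F) / 2.
    SDT:          d' = z(R) - z(F),     -c = (z(R) + z(F)) / 2."""
    z = NormalDist().inv_cdf
    r, f = (z(reals_rate), z(foils_rate)) if sdt else (reals_rate, foils_rate)
    return r - f, (r + f) / 2
```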
The paper that introduced the OCT (Paulhus et al., 2003, which used a collection of
general knowledge reals and foils called the OCQ) first references the definition of
overclaiming used by Phillips and Clancy (1972): “Over-claiming is the tendency to claim
knowledge about nonexistent items” (p. 891)20, in other words, foils rate. The next page,
however, equates the term overclaiming with the bias definition described above:
“over-claiming was operationalized with the OCQ bias index” (p. 892), i.e. averaged reals
rate and foils rate, giving the term a very different meaning: not just unwarranted claims,
19 This simple approach, using difference and average, is called the “common sense” approach (Paulhus, 2012, p. 154), which closely approximates the traditional SDT measures of d′ and −c, where reals rate and foils rate are z-transformed before combining. Both approaches produce very similar results.
20 The terms “over-claiming” and “overclaiming” are used interchangeably by that author and others, but the non-hyphenated version is a distinct keyword and search term, and so used here.
but any claims. For reporting results, however, “predictions with the OCQ bias measure
are always assessed after controlling for the OCQ accuracy score. Thus, discriminant
validity with respect to accurate knowledge is built into the calculation of the over-claiming
index.” (p. 899). That index (bias controlled for accuracy, or residualized bias) is meant to
capture exaggeration (self-enhancement) independent of knowledge (ability). Residualized
bias will necessarily be uncorrelated with the accuracy index, but does it capture
exaggeration independent of ability?
Let us examine the mathematics involved. Let R be reals rate, F foils rate, A
accuracy, and B bias: A = R − F and B = R + F. Plotting F against R and then B against A
will show that creating difference and sum composites simply rotates the variable space by
45 degrees. What does this mean conceptually? As discussed earlier, R represents plausible,
self-reported ability (possibly with some exaggeration) that approximates actual ability, as
shown by P. L. Ackerman and Ellingsen (2014). F represents implausible ability claims,
but is likely related to genuine ability, as shown by P. L. Ackerman and Ellingsen (2014)
and Atir et al. (2015). B represents the indiscriminate claiming of real or foil items, which
is simply response bias, making no distinction between plausible and implausible claims.
Interpreting this as exaggeration would be comparable to asking fishermen the size of their
catch, and assuming everything they say is exaggeration regardless of what was caught.
In contrast, A represents plausible claims compensated for implausible claims, which
could be interpreted as a corrected self-estimate of ability. This idea has some validation:
Paulhus and Dubois (2014) demonstrated that this measure on an overclaiming inventory
was comparable to multiple-choice and short-answer quiz formats in predicting
undergraduate course grades. This suggests that foil claiming negatively predicts ability,
but, like bias, accuracy obfuscates any distinction between ability and exaggeration, since it
combines both reals and (reversed) foils claiming with equal weight. In essence, this is a
simple form of correction for guessing, collapsing two dimensions into one.
What about bias controlled for accuracy (residualized bias) which the OCT
recommends as the index of exaggeration or self-enhancement? Let BRes be residualized
bias. The SDT model that is the basis for OCT assumes equal variance for R and F . This
would make difference and sum composites A and B uncorrelated21, i.e. Cor(A,B) = 0. If
SDT assumptions are met, Cor(A,B) = 0 so controlling B for A has no effect and
BRes = B, meaning the OCT exaggeration index is no different from bias, i.e. indiscriminate
claiming.
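The consequence of the equal-variance assumption is easy to demonstrate numerically. A minimal sketch with simulated data (illustrative only): when Var(R) = Var(F), the sample Cov(A, B) is near zero, so residualizing B on A leaves B essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Correlated reals and foils rates with EQUAL variance, per the
# SDT assumption underlying the OCT.
shared = rng.normal(size=n)
R = 0.6 * shared + 0.8 * rng.normal(size=n)  # Var(R) = 1
F = 0.6 * shared + 0.8 * rng.normal(size=n)  # Var(F) = 1

A, B = R - F, R + F

# Cov(A, B) = Var(R) - Var(F) = 0 under equal variance...
cov_ab = np.cov(A, B)[0, 1]
print(cov_ab)  # ~0

# ...so regressing B on A removes essentially nothing: the
# residualized bias is just B again (up to mean-centering).
slope = cov_ab / np.var(A)
B_res = B - slope * A
print(np.corrcoef(B, B_res)[0, 1])  # ~1
```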
In typical overclaiming research, however, the SDT assumption of equal variance may not hold. Even then, nothing in the OCT ensures that accuracy and bias will be substantially correlated, i.e., that BRes will differ meaningfully from B. There is thus no assurance that the OCT measures exaggeration (self-enhancement) distinct from knowledge (ability), contrary to the declared design goals.
This lack of distinction may explain the empirical failures of the OCT, as a number of researchers have reported that the OCT does not measure what it claims to:
Bensch et al. (2017) examined self-enhancement broadly as “positivity bias”, including
measures of narcissism, self-deceptive enhancement, impression management,
overconfidence, crystallized intelligence, a variety of measures of socially-desirable
responding, and the five-factor model of personality. A factor analysis of all these measures found that neither OCT bias nor residualized bias loaded on any of the six factors extracted, with the authors concluding that whatever the OCT measured was “fully independent of personality and crystallized intelligence” (p. 12).
Using the HEXACO measure of Openness (K. Lee & Ashton, 2004), Dunlop et al. (2016) found that OCT accuracy, bias, and residualized bias were all significantly related to
Openness (around r = .30), concluding that “overclaiming can be understood as a result of
knowledge accumulated through a general proclivity for cognitive and aesthetic exploration
(i.e., Openness)” (p. 1).
Less flattering still, Ludeke and Makransky (2016), using the OCQ as in the paper
21 Cov(X + Y, X − Y) = E[((X − µX) + (Y − µY))((X − µX) − (Y − µY))] = E[(X − µX)² − (Y − µY)²] = Var(X) − Var(Y), which is 0 when Var(X) = Var(Y)
introducing the OCT (and so both the same items and the same analytic technique), noted
“Using a sample of 704 adult community members, we found minimal support for the OCQ
as an assessment of misrepresentation. . . . OCQ bias measures were instead consistently
and sometimes even highly related to measures of careless responding.” (p. 1).
The OCT “exaggeration index” will therefore mostly represent indiscriminate claiming, that is, general response bias mixing ability and exaggeration, which could explain the findings above. The OCT does not capture exaggeration as defined in this paper.