INTEGRATING OVERCONFIDENCE AND OVERCLAIMING:
EXAGGERATION HARMS PERFORMANCE
by
PATRICK J. DUBOIS
M.A., The University of British Columbia, 2015
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Psychology)
THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
August 2021
Patrick J. Dubois, 2021
tasks online for financial compensation. See Buhrmester et al. (2011).
NFCS Need for Cognition Scale: An 18-item measure developed by Cacioppo et al.
(1984).
NPI Narcissistic Personality Inventory: A popular measure of non-clinical narcissism
(Raskin & Terry, 1988); both 40-item and 16-item versions are used in this paper.
OCQ Overclaiming Questionnaire: A set of reals and foils, based on Hirsch Jr et al.
(1988), introduced by Paulhus et al. (2003) to demonstrate the OCT.
OCT Overclaiming Technique: An application of SDT to overclaiming introduced by
Paulhus et al. (2003).
OLD20 Average Orthographic Levenshtein Distance of the 20 Closest Neighbors: A
technique for measuring (un)wordlikeness by averaging the edit distances of a
letter string from the 20 most similar in a reference corpus of words.
PES Psychological Sense of Entitlement: A 9-item measure by Campbell, Bonacci,
et al. (2004).
RExI Residualized Exaggeration Index: A general technique for isolating exaggeration
in self-image of competence; the residual of incompetence evidence after controlling
for competence evidence.
SDD Self-Deceptive Denial: Part of the BIDR.
SDE Self-Deceptive Enhancement: Part of the BIDR.
SDT Signal Detection Theory: A well-established theoretical framework with analytic
techniques for distinguishing accuracy from response bias when discriminating
ambiguous signals (Macmillan, 2002).
TIPI Ten-Item Personality Inventory: A popular brief measure of the five-factor model
of personality by Gosling et al. (2003).
UBC University of British Columbia: The location where all studies presented here
took place, using their enrolled undergraduates.
VoKE Vocabulary Knowledge Exaggeration: An English vocabulary overclaiming
inventory (set of reals and foils) developed for this paper, with items selected
using theory from psycholinguistics and cognitive psychology, and empirical
testing.
VST Vocabulary Size Test: A multiple-choice test for assessing size of one’s English
vocabulary (Beglar, 2010).
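The OLD20 entry above describes an algorithm compact enough to sketch directly. The following Python sketch (the tiny corpus in the test below is a hypothetical stand-in for a real reference lexicon) averages plain Levenshtein edit distances to the closest neighbors:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def old20(target: str, corpus: list[str], n: int = 20) -> float:
    # Average edit distance from the n orthographically closest words.
    distances = sorted(levenshtein(target, word) for word in corpus)
    return sum(distances[:n]) / n
```

Higher OLD20 values indicate less wordlike letter strings, a property relevant when selecting plausible foils.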
Acknowledgements
I would never have known about overclaiming, its limitations or potential, had
Delroy Paulhus not asked about my familiarity with a list of jazz musicians. Unfortunately, I did
not overclaim, sparking the suspicion that drove this research.
It was the important work of Steven Heine and Ara Norenzayan that showed me how
incredibly inappropriate it was to rely on convenience samples of undergraduates, unless, of
course, one makes that the population of interest.
I could not have completed this PhD had not Jeremy Biesanz patiently helped me
reformulate my heretical, unsupervised research into a coherent thesis.
I will be eternally grateful for the many kindnesses offered by so many of the faculty,
staff and students of the UBC department of psychology as I impostered my way through
grad school. If you’re reading this, I made it out alive.
I could not have afforded my time at UBC without substantial funding from The
Social Sciences and Humanities Research Council (SSHRC) of Canada; your tax dollars at
work.
Thank you.
Introduction: Defining Exaggeration
“I know words. I have the best words.” — Trump (2015).
Donald Trump has famously displayed exaggerated self-assessment, claiming greater
ability than he demonstrates. Anyone not born yesterday will have encountered other
people who imagine themselves overly positively, and experience teaches us to not always
trust self-presentation of ability.
Such a person may be labeled braggart, boaster, or blusterer, and we might describe
such behavior as overstating, overestimating, or overclaiming ability, being overconfident
about performance, or having a self-image exceeding genuine competence. Why do we have
so many synonymous descriptions? Probably because such behavior is socially noteworthy.
According to the lexical hypothesis, the basis for much of personality psychology, “Those
individual differences that are of most significance in the daily transactions of persons with
each other will eventually become encoded into their language. The more important is such
a difference, the more people will notice it and wish to talk of it” (Goldberg, 1981, pp.
141-142). We use such labels to warn others about exaggerated self-report.
Merriam-Webster (2020a) defines the verb exaggerate as “to enlarge beyond bounds
or the truth : overstate”. Central to this definition is disparity from reality. Trump’s
monosyllabic boast above might be plausible if he had demonstrated a superior vocabulary,
but he did not. A language analysis by the Boston Globe of candidates’ 2015 campaign
announcements rated Bernie Sanders at school grade 10, Hillary Clinton near grade 8, and
Donald Trump as the lowest of all at grade 4 (Schumacher & Eskenazi, 2016).
Exaggeration is about excess.
While a cartoon or caricature might have exaggerated movements or expressions,
exaggeration here refers only to a person’s excessive self-image of their ability. This paper
examines exaggeration as a psychological phenomenon, how it differs among individuals,
and what those differences might mean. To that end, I define exaggeration as individual
differences in discrepancy between imagined and actual competence, unrelated to
competence.
I first begin with a conceptual model of how exaggeration may arise, then review how
this individual difference has been measured in the past, identifying some oversights and
contradictions in existing literature. In response to that, a new approach to exaggeration is
proposed, then implemented in four empirical, quantitative studies. The end result
validates a methodology for more clearly understanding this well-known but misunderstood
phenomenon, and establishes a foundation for future research.
Framing Exaggeration
Figure 1. Influence of Competence and Self-Image on Performance.
Exaggeration is an example of a latent construct, a theoretical conception of a hidden,
unobservable psychological phenomenon. When Forrest Gump famously noted that “Stupid
is as stupid does” (Zemeckis, 1994), he was wisely noting that we can only infer a latent
construct (e.g. stupidity) through its expression in observable behavior. Similarly, we can
explore what exaggeration is by looking at what it does to performance of the exaggerated
ability.
The model shown in Figure 1 is based on existing understanding of how abilities are
manifested: “Performance is conceived as the observable solution behavior of a person on a
set of domain-specific problems. Competence (ability, skills) is understood as a theoretical
construct accounting for the performance.” (Korossy, 1999, p. 103, original emphasis).
Competence is positively correlated with performance (the ‘+’); they increase (or decrease)
together. Humans have been fascinated with comparing competencies through
performance, as the long history of the Olympics or other competitions shows. Our abilities
are likewise tested throughout our education with examinations and other performance
tests. We objectively evaluate such performances because we know that self-report is not
always an accurate indicator of genuine competence: We don’t just ask who is fastest or
smartest, we test people.
Nonetheless, the raw potential of competence is not the only determinant of
performance. Our beliefs about our competence, and how we will meet a challenge, are also
relevant. Some of our beliefs will be based on feelings of confidence, our internal sense of
competence, but other beliefs may interfere with the accuracy of this sensing. If I believe
running a marathon makes me a good person, the need to affirm that belief may overshadow
accurate competence assessment, leaving me collapsed half-way through the race.
Our self-image is our mental construction of who we are, and this will include beliefs
about what we can do, should be able to do, and what effort is required for success. Ideally,
self-image of competence should positively correlate with genuine competence, even if
imperfectly. The question mark in Figure 1 indicates that self-image may contribute
positively or negatively to performance, depending on the harmony between self-image and
competence. Performance is thus shaped by both what we are capable of (competence),
and how we imagine that capability (self-image). Aesop’s fable of the tortoise and the hare
(in which the hare, far more competent yet arrogant, loses a race to the tortoise) eloquently
demonstrates how self-image can interfere with expression of competence.
Exaggeration can thus be seen as a way in which distorted self-image impairs
performance.1 This is similar to, but distinct from, typical conceptions of confidence, where
1 While exaggeration considers excessive perception of competence, performance may also be impaired by inadequate perception of competence. Such “underconfidence” is not considered here because it is apparently rare and involves the methodological challenge of measuring competence that is not expressed.
overconfidence implies an extension of a linear, unidimensional construct beyond some
optimal point. Instead, exaggeration allows for various aspects of self-image to interfere
with competence expression in undermining performance. Note that the model does not
necessarily imply any mediation or moderation relationship. A central goal of the current
research is to distinguish the influence excessive self-image has on performance, separate
from the influence of competence.
Additionally, situational factors may alter our self-image, or its impact on
performance, such as an audience boosting or shriveling our confidence, but for the
purposes of the current research, those many, complex situational factors are set aside in
order to address issues with isolating effects from self-image.
To capture exaggeration, we will need behavioral indications of what a person’s
genuine competence is, and what they mistakenly imagine it is. This can be done by
soliciting optional expressions of ability that provide evidence of competence or falsely
imagined competence: active incompetence.2 By making the expressions optional, one need
only express competence where one imagines competence, e.g. one can admit “I can’t do
that”, rather than pretend they can. In such a situation, active incompetence (e.g. failing
an optional task) indicates error in imagined competence, suggesting exaggerated
self-image: The person thought they could do something they could not.
To minimize confounding influences, all evidence should be collected at the same time
under similar circumstances. To maximize reliability, several ability expressions should be
solicited and aggregated. In other words, to gather evidence of exaggeration, get people to
repeatedly volunteer evidence of competence or incompetence, in comparable proportions.
Finally, because competence is a strong predictor of successful outcomes, care should be
taken that measurement of exaggeration is demonstrably distinct from evidence of
competence. Altogether, this framework presents three requirements for measuring
exaggeration: a) active competence, b) active incompetence, and c) isolation of self-image
2 In contrast to the passive incompetence of not responding to a question.
error from competence as evidence of exaggeration.
Because exaggeration of one’s abilities to others may yield rewards (e.g. winning an
election) and involves several complicated contextual factors, for simplicity, all the research
considered here minimizes social or situational influences, or obvious opportunities for gain
from manipulation or deceit. The goal is to understand exaggeration as an intrapersonal
(within self), not interpersonal (between people) phenomenon.
Conceptually, exaggeration can be considered synonymous with the terms
overstatement, overestimation, overclaiming and overconfidence, yet all those terms have
been used for distinct methodological approaches to measuring differences in one’s imagined
and actual abilities. This may be an example of the jangle fallacy: “the use of two separate
words or expressions covering in fact the same basic situation” (Kelley, 1927, p. 64). The
present research aims to integrate those approaches into one unified methodology.
A Brief History
Broadly speaking, there have been two approaches to simultaneously gathering
evidence of competence and imagined competence. One approach (overstatement and
overestimation tests) combines objective tests with (prior or post) estimates of success.
The number of correct answers (objectively scored) serves as evidence of competence, while
falsely imagined correctness (subjective statement or estimation) suggests unacknowledged
incompetence. The number of answers not claimed correct is ignored but allows a degree of
freedom between the other two scores.
Another approach (overclaiming) uses only ability claims, but embeds the competence
distinction in the items themselves. All items involve claiming ability, but some of the
items are fictitious, so claiming them requires active incompetence. This also yields two
scores: the rate or amount of claiming genuine items (reals), and the rate or amount of
claiming fictitious items (foils).
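Under the OCT, claiming a real is a “hit” and claiming a foil a “false alarm”, so SDT’s standard indices can separate accuracy from response bias. A minimal sketch, assuming equal-variance Gaussian SDT and no correction for claim rates of exactly 0 or 1 (which would need an adjustment in practice):

```python
from statistics import NormalDist

def oct_scores(claimed_reals: int, n_reals: int,
               claimed_foils: int, n_foils: int) -> tuple[float, float]:
    """Hits = claimed reals; false alarms = claimed foils.

    Returns (d_prime, c): d' indexes accuracy (discriminating reals
    from foils); c indexes response bias (overall claiming tendency;
    negative c = liberal claiming).
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    hit_rate = claimed_reals / n_reals
    fa_rate = claimed_foils / n_foils
    d_prime = z(hit_rate) - z(fa_rate)
    c = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, c
```

For example, a respondent claiming 8 of 10 reals and 2 of 10 foils is accurate and unbiased, while one claiming 8 of 10 reals and 6 of 10 foils shows lower accuracy and a liberal (overclaiming) bias.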
Both approaches allow someone to volunteer evidence of competence or
incompetence, all within the same test. While both have been around for nearly a century,
these two approaches have not been explicitly examined together.
Overstatement
The rising popularity of IQ tests at the start of the 20th century inspired a plethora
of attempts to quantify human potential (Richardson, 2002). Based on the belief that a
discrepancy between claimed and demonstrated ability was diagnostic of one’s “character”,
a review of research around that time (Symonds, 1924) listed the overstatement test
(Voelker, 1921) as an emerging assessment method.
As an example, Woodrow and Bemmels (1927) compared results of an overstatement
test to a “goodness” of character rating by teachers of pre-school children, reporting a
rank-order correlation of rs = .56 for a group of 17 five-year-olds and rs = .43³ for a group
of 14 four-year-olds. The overstatement test involved a researcher interviewing children
individually, telling them “I’m going to ask you some questions to find out how many
things you can do. I want to find out who in your class can do the most.” (p. 241), then
asking a variety of questions such as “Can you write your name?”, “Can you stand on your
head?” and “Can you count up to ten?”, finally followed by the child demonstrating each
claimed ability, which was then liberally assessed. The younger group claimed 51% (while
performing 30%) and the older group claimed 75% (then performing 50%) of the queried
abilities, with only one child under-estimating their ability.
As an individual difference measure, the test was scored as the number claimed
divided by the number performed, with “The smaller this ratio, the better the score” (p.
242), presumably meaning that reverse ranking this score accounted for the positive
correlations (above) between less overstatement and teachers’ ratings of character goodness.
Considering statistical and practical issues (inherent in using ratios for scores), the
authors conclude that the issue of scoring the test “is not an altogether simple one.”
3 These correlations were reported as ρ in the original paper.
(Woodrow & Bemmels, 1927, p. 243). These issues may have proven insurmountable, as
the overstatement test had faded to obscurity by the 1960s4.
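A toy calculation with hypothetical counts illustrates why the authors found scoring “not an altogether simple” matter: identical ratios can reflect very different amounts of overstatement, and the score is undefined when nothing is performed.

```python
def ratio_score(claimed: int, performed: int) -> float:
    # Woodrow & Bemmels' scoring: claimed / performed (smaller is better).
    return claimed / performed

# Identical ratios hide very different absolute overstatement:
# overstated by 2 abilities vs. overstated by 10, same score.
assert ratio_score(4, 2) == ratio_score(20, 10) == 2.0

# And the score is undefined for a child who performs nothing:
# ratio_score(3, 0) raises ZeroDivisionError.
```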
Overestimation
Part of modern research on overconfidence is an approach called overestimation, a
methodology very similar to overstatement: “If a student who took a 10-item quiz believes
that he answered five of the questions correctly when, in fact, he got only three correct,
then he has overestimated his score. Roughly 64% of empirical studies on overconfidence
examined overestimation.” (D. A. Moore & Healy, 2008, p. 502). Overestimation is
typically calculated as a difference score, the arithmetic excess of estimate over
performance: 5 − 3 = 2 in that example.
Both overestimation and overstatement gather ostensibly identical information: the
number imagined correct and the number objectively correct. The methodology of
overestimation, however, may be less psychologically direct than that of overstatement.
An overestimation test requires answers for every item (possibly by guessing),
regardless of one’s sense of ability. After the test, the participant must reflect on their
aggregate score in order to make an estimate. Retrospectively evaluating one’s performance
on a completed task may elicit “choice-induced preference change” (D. Lee & Daunizeau,
2020, p. 1). For example, it may be easier to rationalize that one answered a question
correctly after committing to an answer, if only because considering alternatives after such
commitment can induce cognitive dissonance (Joule & Azdia, 2003). There is also the
possibility that reflection on past performance may include recency effects (Murre & Dros,
2015), where experiences from the last few questions may carry more weight in an
aggregate estimate.
A further complication arises from the transparency of the research question (e.g.
“How many do you estimate you got correct?”). Once participants are aware that accuracy of estimation is
under scrutiny, motives for social desirability or impression management may inspire false
modesty: Having already completed the task, I will appear more humble if I deflate my
estimated score. By triggering self-consciousness, overestimation methodology may be
distorted by the same psychological processes it attempts to measure.
Thus, retrospective assessment of aggregated past performance done during
overestimation tests may differ psychologically from the prospective assessment of specific
abilities done in overstatement tests. The (prospective) overstatement approach may be
more psychologically valuable simply because we are often more interested in predicting
somebody’s future behavior (e.g. likelihood of making an error) than predicting their
after-the-fact estimate. Overestimation tells you how people perceive past performance,
while overstatement tells you how they imagine future success.
An established observation about overestimation is that it varies with difficulty, i.e. is
relatively greater in hard tests and lower in easy tests. A multi-cultural examination of
overestimation replicated this hard-easy effect and found it much stronger than effects of
culture, sex or age (D. A. Moore et al., 2018). This effect, where excess rating is inversely
related to difficulty or ability, may be a methodological artifact: Low performers have more
range to overestimate than high performers. Seen another way, if estimates tend toward the
center of the distribution, difference scores will tend to be positive for hard tests and lower
(or negative) for easy tests. Even if estimates correlate perfectly with performance, but are
shifted centrally, difference scores will show the hard-easy effect. Such a scoring will likely
be negatively correlated with ability, e.g. Duttle (2016) found r = −.69 between
overestimation and performance on Raven’s Progressive Matrices. Difference scores yield
an effect similar to that found in the “unskilled and unaware” Dunning–Kruger effect
(Kruger & Dunning, 1999) which has been shown to be largely a statistical illusion
(Krueger & Mueller, 2002). More precisely, Cor(Score, Estimate − Score) approaches
Cor(Score, −Score) to the extent that Var(Estimate) < Var(Score). As mean Score
rises, Estimate range decreases. Evidence of exaggeration is mathematically confounded
with evidence of competence, so it’s impossible to cleanly distinguish effects.
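That confound can be demonstrated in a few lines of simulation. Here estimates track scores perfectly (r = 1) but are pulled toward the mean so that Var(Estimate) < Var(Score); the difference score then correlates perfectly negatively with the score (the parameters are arbitrary illustrations):

```python
import random
import statistics

def pearson(xs, ys):
    # Pearson correlation from scratch (no external dependencies).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
scores = [random.gauss(50, 10) for _ in range(5000)]
# Estimates track scores perfectly but are shifted toward the center,
# so Var(estimate) < Var(score).
estimates = [50 + 0.5 * (s - 50) for s in scores]
overestimation = [e - s for e, s in zip(estimates, scores)]

# r(score, estimate) = +1, yet r(score, overestimation) = -1:
# low scorers "overestimate" and high scorers "underestimate",
# purely as an artifact of the difference score.
```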
Given the statistical problems noted with ratios used in overstatement tests, and the
artifactual correlations of difference scores used in overestimation (and sometimes
overstatement), neither overstatement nor overestimation, as conventionally implemented,
provide a clean measure of exaggeration separate from the ability being exaggerated.
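One remedy, anticipated by the RExI defined in the front matter, is residualization: regress the incompetence evidence on the competence evidence and keep the residuals, which are uncorrelated with competence by construction. A minimal one-predictor OLS sketch, with illustrative variable names:

```python
import statistics

def residualize(incompetence, competence):
    """Residuals of incompetence evidence after regressing out
    competence evidence (simple one-predictor OLS)."""
    mx = statistics.fmean(competence)
    my = statistics.fmean(incompetence)
    sxx = sum((x - mx) ** 2 for x in competence)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(competence, incompetence))
    slope = sxy / sxx
    # Residual = observed minus the value predicted from competence.
    return [y - (my + slope * (x - mx))
            for x, y in zip(competence, incompetence)]
```

Unlike a difference score, the resulting index correlates exactly zero with the competence measure in the sample, so any association it shows with performance cannot be a competence artifact.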
Overclaiming: Foils Among Reals
As charming as it may have been in the 1920s to have children stand on their heads
for science, that approach can be difficult to scale up. Any objective ability test may take a
long time and induce stress in participants. A far more convenient ability to test is
knowledgeability, and probably the most face-valid or obvious test of exaggerated
knowledge is to query familiarity with something that does not exist.
For example, imagine a simple vocabulary test that only required the respondent to
rate their knowledge of words, without demonstrating that knowledge. If such a list
included the ostensible word covfefe, claiming to know the meaning of that5 would
demonstrate active incompetence.
Questions about fabricated, non-existent items, often labeled as bogus or foil,6 seem
ideal for capturing exaggeration because knowledge of such items is impossible (if the item
is designed appropriately, which may not be the case, as we shall see). Such items are often
combined with similar genuine or real items to also collect some evidence of competence.
In the book New Perspectives on Faking in Personality Assessment, the chapter on
“Overclaiming on Personality Questionnaires” surveys the use of foils claiming in
psychological research, describing “several historical precedents for the notion that claiming
familiarity with foils is a face-valid indicator of knowledge exaggeration”, with exaggeration
5 As many have: www.snopes.com/fact-check/covfefe-arabic-antediluvian/
6 While both these terms apply to impossible items that honest, attentive, rational people should neverclaim, the term bogus is typically used for items researchers expect to have no desirability, while foil refersto items which someone might have reason to claim falsely. As will be discussed below, such a distinction isnot always clear cut.
interpreted there as faking (Paulhus, 2012, p. 151). It reports the earliest use of foils in
psychological research as Raubenheimer (1925) where respondents indicated which books
they had read, with 10 of 25 titles presented being fictitious. For example, respondents
(boys being assessed for potential delinquency) could claim to have read the existing book
“Robinson Crusoe” (to indicate literary knowledge) or the nonexistent book “The
Prize-Fighters Story”, indicating an exaggerated self-report.
That chapter title refers to a study by Phillips and Clancy (1972) which used
overclaiming to describe foils claiming, a term which Merriam-Webster (2020b) reports first
appeared in 1824 and means “to claim too much of something”. That study queried
participants about “their use of several new products, books, television programs, and
movies — all of which were actually nonexistent” (p. 928; their emphasis). This
overclaiming behavior was found to be related to participants’ rating of the desirability of
being the kind of person who tries new products, etc. The association between foils
claiming and valuing being trendy suggests a motivated, self-enhancing exaggeration.
As noted above, foils claiming has also been interpreted as dishonesty or faking.
Anderson et al. (1984) assessed job applicants by having them rate their skill levels on a
variety of tasks, many of which were genuine (i.e. reals), while several were fictional, i.e.
foils. For example, respondents were asked to rate their experience with the fictitious task
“Typing from audio-fortran reports” (Table 2, p. 577). The extent of foils claiming was
found to negatively predict a later objective test of job skills, especially when controlling
for self-assessment of genuine skills.
Misrepresentation of self is not the only interpretation given to foils claiming: Foils
have appeared in research with other goals. For improving validity in marketing tests of
advertising exposure, Lucas (1942) describes a technique originating in 1937: Participants
report their recognition of various advertisements, some of which are unpublished and
could not have been seen, i.e. foils. Following that methodology, Smith and Mason (1970)
reported that warning participants about the foil ads had no effect on claim rates. This
suggests a possible recognition memory bias; respondents genuinely believed they
recognized something they had not seen before. The aim of such research, however, was to
assess advertising efficacy, not psychological mechanisms driving false claims.
Among other applications, foils have also been used to check validity of traumatic
brain injury reports (e.g. Mackenzie & McMillan, 2005) and digital literacy surveys (e.g.
Hargittai, 2009), to assess pretrial prejudice in court cases (e.g. Moran & Cutler, 1997), or
to generally identify careless survey responses (e.g. Meade & Craig, 2012). For example, if
a North American respondent agrees to the statement “I have never brushed my teeth”
(Meade & Craig, 2012, Table 1, p. 5), they are probably not paying attention to that
question, nor possibly the rest of the survey. Such investigations typically treat foils
claiming as errors to be corrected for, with little consideration of what such aberrant
behavior might indicate.
To summarize, foils have been used in research to ostensibly assess self-enhancement
(ego-motivated misrepresentation, such as claiming to have used a fictitious product),
cognitive bias (falsely recognizing an ad they had not seen) or carelessness (lack of
attention in survey responding). To my knowledge, no previous research has adequately
considered these contrasting explanations simultaneously.
An issue worth noting has to do with the ethics of using foils, i.e. whether it is
deceptive to ask about impossible abilities, to confront people with un-winnable challenges,
or entrap them into failure. Typical ability tests do not ask trick questions; one may
assume that if asked, “Do you know X?”, X actually exists. Warning about foils or failure,
however, raises the practical issue of ensuring that participants acknowledge and
understand such a warning. The more careless, for example, may not take heed.
Alternatively, the more cautious or risk-averse may alter their responding more drastically
than others. Inevitably, there will be new individual differences introduced by the warning
and how it is presented. In everyday life, we encounter impossible problems, areas where
nobody should claim answers, with no guardian warning us of potential failure. Thus, an
ecologically valid test should introduce no more protective measures than ordinary life.
The absence of warning may be more valid in another way. There may be many
individual, contextual, or cultural differences in the degree to which people believe they
could or should be able to confront any and all challenges successfully. Exaggeration may
reflect such a difference, i.e. the tendency to assume, or be entitled to, a certain level of
success, as if one expects or deserves it.
Regardless, while warning about potential foils (vs. not) discouraged claiming in
general, it did not remove relationships between overclaiming and narcissism in a study done
by Paulhus et al. (2003). This is consistent with the findings noted earlier that warning
participants about bogus ads did not affect false recognition rates (Smith & Mason, 1970),
and that the relationship between false knowledge claims and self-perceived knowledge was
not altered by warning about foils (Atir et al., 2015). Consequently, for ecological validity,
simplicity and practicality, the current research does not warn about the presence of foils.
Theoretical Causes of Exaggeration
Why might someone exaggerate their competence? Broadly, we can consider two
kinds of possible causes: motivated and unmotivated.
Someone may be motivated to enhance their self-presentation, to overly state or claim
ability, simply because, socially, it can pay off. Threatened animals often present
themselves as larger or more ferocious to intimidate others. Mate selection often requires
putting one’s best foot (or feathers, calls, dancing, or behavior) forward to impress the other
sex. Human overconfidence yields status benefits (Kennedy et al., 2013) so exaggeration
may help to intimidate others, or to acquire mates, votes, jobs, or other resources. For
example, Trump might not have won the 2016 election had he acknowledged his several
business failures (Stuart, 2016). Given that relative neocortex size relates to use of tactical
deception (Byrne & Whiten, 1992), humans may be uniquely equipped for exaggeration.
While these examples refer to interpersonal exaggeration, boasting to others, it may be
that such strategies get reinforced and internalized, leading to habitual exaggeration even
in the absence of social context.
Alternatively, implausible evidence of ability may appear for unmotivated reasons.
The false recognition of an advertisement despite warning (noted above), suggests that
participants had no motivation to misrepresent. Similarly, careless inattention to survey
responses would not indicate motivation to misrepresent an ability. Finally, perhaps
related, such misrepresentation may be, like any error, due to lower cognitive ability, or
simply poor self-awareness.
Exaggeration as Self-Enhancement
Motivated misrepresentation of competence may simply indicate self-enhancement, or
“tendencies to dwell on and elaborate positive information about the self relative to
negative information” (Heine & Hamamura, 2007, p. 4). Self-enhancement should thus
predict a bias toward claiming ability while denying inability or ignorance, leading to
exaggeration. This was the assumption of early uses of overstatement and overclaiming
tests, that people will overstate or overclaim their competence in an ego-enhancing way.
Self-enhancement is considered to have both a social, interpersonal dimension of
impression management, “the goal-directed activity of controlling information in order to
influence the impressions formed by an audience” (Schlenker, 2012, p. 542), as well as an
intrapersonal dimension of self-deception (Paulhus, 1984): We may be bluffing to others
and ourselves.
The personality trait most associated with self-enhancement is narcissism, named for
the mythological Greek youth Narcissus tragically obsessed with his own beauty:
“Narcissism is arguably the personality construct (and pathological disorder) most
fundamentally defined by chronic pursuit of self-enhancement.” (Wallace, 2011, p. 309).
For example, John and Robins (1994) compared self-perception of performance with peer
ratings and evaluations by a staff of 11 trained psychologists. Self ratings related less to
staff evaluations than did peer ratings, and showed substantial individual differences:
“people whose self-evaluations are the most unrealistically positive tend to be narcissistic”
(p. 215). Narcissism is the part of the Dark Triad of personalities (overlapping with, yet
distinct from, scheming Machiavellianism and antisocial psychopathy) distinguished by
self-enhancement (Paulhus & Williams, 2002). This personality trait is typically measured
by the Narcissistic Personality Inventory (NPI) and is considered to have multiple facets,
e.g. “Leadership/Authority, Grandiose Exhibitionism, and Entitlement/Exploitativeness”,
with the first being considered generally adaptive, and the last being most maladaptive
(R. A. Ackerman et al., 2011). This suggests that self-enhancement may have both helpful
and harmful aspects.
As discussed above, exaggerating to others may pay off, but why exaggerate with
nobody to impress? Back et al. (2010) note that the entitlement facet of narcissism is most
attractive at first impression while being most maladaptive in the long term. Thus,
short-term social rewards (boasting so strangers like you) may reinforce a dysfunctional
habit: Exaggerating your self to others may lead you to start believing it. More
importantly, exaggeration in the absence of social reward may be maladaptive.
Related to narcissism is overconfidence, an association that “remained significant in a
regression that included self-esteem, self-efficacy, and self-control in the model: for
narcissism, b = 0.33, t(99) = 3.16, p < 0.01” (Campbell, Goodie, et al., 2004, p. 302). In
that study (and most, as noted above), overconfidence is operationalized as overestimation.
Another, more interpersonal, form of overconfidence is thinking you are better than others,
more precisely called overplacement (D. A. Moore & Healy, 2008), also known as the
better-than-average effect. We know this effect is based on an illusion of superiority because
of the observed reality (yet mathematical impossibility) that more than half of people think
they are better than half the population for several abilities; individuals tend to place
themselves higher, relative to others, than objectively warranted.
In examining why humans overplace their abilities, Burks et al. (2013) compared
information-processing biases with social goals and found evidence only for the latter,
concluding: “it is natural to consider the possibility that the roots of overconfidence lie in
the value of over-confidence as a social signal” (p. 979). This effect, however, depends on
social comparison; how people view themselves and how they view others (Guenther &
Alicke, 2010), which introduces several situational influences. Nonetheless, while
overplacement and overestimation are conceptually and methodologically distinct, they
have both been labeled overconfidence, and Macenczak et al. (2016, Tables 1 & 2, pp. 115-116) report a correlation of r = .50 between the two, as well as positive (albeit weaker) correlations between both those measures and narcissism.
Altogether, in terms of ego-motivated behavior, exaggeration may relate to
self-enhancement as impression management, self-deception, or narcissism, and
overconfidence as overestimation or overplacement. As a kind of solitary self-enhancement,
the tendency to exaggerate in non-social situations should be broadly maladaptive.
Exaggeration as Cognitive Bias
While the label “exaggeration” inherently connotes self-enhancement (as do the terms
overstatement, overestimation, or overclaiming) it is conceivable that incompetence may be
demonstrated with no motivation or goal, having little to do with ego or identity. A survey
respondent claiming to recognize an advertisement they had never seen may be merely
demonstrating a memory malfunction. Given that warning about the presence of foil ads
had no effect on claim rates (Smith & Mason, 1970), and that foils claiming is essentially
unaffected by warning (as noted above), such apparent exaggeration may not be
ego-motivated.
An ability well-known to be influenced by cognitive biases is memory, and probably
the easiest to study is recognition memory. Recognition memory involves distinguishing
stimuli that have been previously experienced from novel stimuli, e.g. given a list of words,
some of which are old (seen before) among others that are new, the ability to distinguish
old from new. For example, after reading a list of words (e.g. “person, woman, man,
camera, TV”; Baker, 2020), can someone identify them (among other distractors) later?7
Recognition memory bias, the tendency to identify new items as old, has been shown
to be a stable individual trait (Kantner & Lindsay, 2012; Kantner & Lindsay, 2014),
suggesting that people vary in false recognition reports. The latter paper noted similar
individual patterns for susceptibility to some bias manipulations, e.g. falsely claiming
having seen the word ‘sleep’ when trying to remember related words like ‘bed’, ‘rest’, and
‘night’.8
Cognitive psychology has demonstrated several techniques for manipulating
recognition memory bias, such as the use of discrepant fluency (Whittlesea & Leboe, 2003).
When testing recognition of items, if a new item is unexpectedly easy to process
(discrepantly fluent), it becomes easier to mistakenly think it has been seen before.
Intuitively, some minimal level of fluency is required for any useful exaggeration item: It is
unlikely anyone would claim to have read a book with an unpronounceable title. Extending
that logic, it may be that more fluent items facilitate more exaggeration.
Beyond an overall main effect of fluency, given individual differences in memory bias
(Kantner & Lindsay, 2014), individuals may also differ in susceptibility to fluency cues. For
any given item set, some people may be more prone to false recognition, increasing their
chances for exaggeration. How such cognitive traits relate to personality traits such as
self-enhancement has yet to be adequately studied. However, apparent individual
differences in levels of exaggeration may reflect, at least in part, individual differences in
recognition bias.
7 The Montreal Cognitive Assessment given to Donald Trump actually tested recall rather than recognition, but you get the point. Those words he “recalled” probably represented only what he could see at the moment, not recollections from the test.
8 The Deese/Roediger–McDermott paradigm.
Exaggeration as Carelessness
If a job application asked “How often have you used the Wentzel Technique to solve a
budgetary problem?” (a fictitious job skill used by Levashina et al. (2009) to measure
faking, p. 274), someone might think “I have no idea what that is, but it sounds like I
should know it, so I’ll pretend I do”; affirming that foil as an expression of self-enhancement.
Or, someone may think, “I remember using some technique for a budgetary problem, and I
think it started with ‘W’, so that must be it”; sincerely but mistakenly affirming because of
recognition memory bias. Alternatively, someone may not think about the question at all
and carelessly affirm it. All three possible thought processes lead to the same behavior, the
active incompetence of claiming a foil, but for different reasons. To better understand the
careless component, we need to examine what we mean by “carelessness”.
While the concept of carelessness may cover several kinds of undesirable or
unintentional behaviors, for this research (because it used surveys to collect data), its most
relevant manifestation is as a source of invalidity in survey responding. This has long been
of interest to researchers, but has been difficult to clearly define, given that there are so
many possible reasons survey responses may not be what we expect. For example, Bond
(1986) argues that what had been labeled carelessness as a cause of inconsistent responding
to the MMPI9 may really be indecision, thus dramatically changing the interpretation.
Nichols et al. (1989) made the distinction between content nonresponsivity (e.g. ignoring
instructions) and content-responsive faking. Huang et al. (2012) more precisely referred to
“insufficient effort responding” (p. 99).
Meade and Craig (2012) found three fairly distinct latent classes (factors), painting a
multi-dimensional picture of carelessness. Part of the distinction is methodological, because
researchers have explored several ways to measure carelessness in survey responses;
DeSimone et al. (2015) provide an overview of several popular techniques (see their Table
1). Many of these operationalizations of carelessness are based on content of responses:
9 Minnesota Multiphasic Personality Inventory: a venerable, widely-used questionnaire.
The use of semantic or psychometric synonyms or antonyms makes assumptions about
which responses should be similar or oppositional in meaning or response patterns. A
related approach uses Mahalanobis distance to detect response patterns distant from the
multivariate normal distribution of all responses. These content-based techniques assume
that all respondents interpret the items similarly, which may lead to a subtle researcher
confirmation bias: unusual response patterns may get pathologized as careless, interpreting
diversity as deviance.
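The Mahalanobis screen just described can be sketched in a few lines. The response matrix below is invented for illustration, and flagging the two most distant respondents is an arbitrary cutoff, not a recommended practice:

```python
import numpy as np

# Hypothetical 5-item Likert responses (rows = respondents); invented data.
responses = np.array([
    [4, 4, 5, 4, 4],
    [3, 4, 4, 3, 4],
    [4, 5, 4, 4, 5],
    [2, 3, 3, 2, 3],
    [5, 1, 5, 1, 5],   # an unusual alternating pattern
    [4, 4, 4, 4, 4],
    [3, 3, 4, 3, 3],
    [4, 4, 5, 5, 4],
])

mean_vec = responses.mean(axis=0)
# Pseudo-inverse guards against a singular covariance matrix in small samples.
cov_inv = np.linalg.pinv(np.cov(responses, rowvar=False))

deviations = responses - mean_vec
# Squared Mahalanobis distance from the sample centroid, per respondent.
d2 = np.einsum('ij,jk,ik->i', deviations, cov_inv, deviations)

# The most distant respondents are the ones a content-based screen would flag
# as careless -- which is exactly where diversity risks being read as deviance.
flagged = np.argsort(d2)[::-1][:2]
```

Note that the unusual respondent stands out only relative to the covariance structure of the rest of the sample, which is what makes the “careless” interpretation an assumption rather than an observation.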
If we remove effects of (apparent) carelessness, are we improving data quality, or
limiting representativeness? Bowling et al. (2016) examined insufficient effort responding
and made the distinction between treating such behavior as a methodological nuisance (e.g.
errors to discount) and seeing it as a substantive variable indicating a trait-like, enduring
individual difference that, in fact, predicted academic performance. Using similar measures,
McKay et al. (2018) examined a wider range of personality traits and found that
malevolent traits showed a stronger relationship with carelessness. The carelessness measure with the strongest personality correlates in almost every case was the number of incorrect responses to instructed items, e.g. not responding “strongly agree” when the
question explicitly said to do so. This response style, disregard for item content, may also
influence foils claiming. Further, M. K. Ward et al. (2017) examined careless responding
and attrition in completing online surveys and found personality correlates for both
measures, suggesting that participants who complete a survey carefully are a biased
sample. Using different measures of carelessness and personality, Furnham et al. (2015) also
found associations between validity of self-reports and personality.
Given that apparent carelessness may signal important individual differences (e.g.
exaggeration), how might we distinguish careless responses from the careless person? One
clear indication that a respondent is not paying attention to a question is when the
response is unreasonably fast. After informing participants that they would be answering
the same questions twice, Wood et al. (2017) found that consistency dropped sharply when
response time fell below an average of 1 second per item. While that study (and others)
used aggregate response times (e.g. time to complete a page of questions, or the whole
survey), that can be a poor measure, because it indicates the mean (average) time.
Cognitive psychologists, who regularly use response time measures, know that a better
indicator of central tendency is the median, not the mean, because distributions can be
highly skewed (Rousselet & Wilcox, 2020), e.g. a few very long response times can easily
pull the average away from the peak of the distribution.
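The skew problem is easy to demonstrate with invented per-item response times: a couple of long pauses pull the mean well above the 1-second warning threshold even though the typical response was faster than it.

```python
import statistics

# Hypothetical per-item response times in seconds; two long pauses
# (e.g. distractions) skew the distribution to the right.
times = [0.8, 0.9, 1.1, 0.7, 1.0, 0.9, 14.0, 0.8, 0.9, 21.0]

mean_time = statistics.mean(times)      # 4.21 s: pulled up by the two pauses
median_time = statistics.median(times)  # 0.9 s: stays with the bulk of responses
```

By the aggregate mean this respondent looks careful; by the median, most items were answered below the 1-second-per-item threshold identified by Wood et al. (2017).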
Because carelessness (however interpreted) may reflect individual differences relevant
to exaggeration, the current research will take the approach of analyzing all complete
response sets, i.e. make no exclusions due to aberrant response style. Where carelessness is
measured, it will involve median response time over several items within individuals.
All of the Above
Each of the three speculated mechanisms above may contribute independently to the
behavior of exaggeration: self-enhancing motivation to misrepresent, cognitive bias in
internal representation, and/or careless disregard for accurate representation.
How might these work together to explain exaggeration behavior? First, there must
be some cognitive fluency to facilitate misrepresentation: An opportunity to volunteer
incompetence must be believable, e.g. it’s easier to imagine covfefe is a word than cffveeo
is. Similarly, a lure (incorrect) option in a multiple-choice test should seem correct to some
test takers. The more fluent or believable a claim is, the more self-enhancing motives can
manifest. A similar interdependence could work for carelessness: It takes more inattention
to claim something unpronounceable.
How might these influences be teased apart? One clue might be processing time:
Self-enhancing misrepresentation requires attentive processing to determine the most
positive presentation, whereas a careless claim can be done hastily. Another clue might be
found in differential item responses: Carelessness should affect all items whereas
exaggeration may be more apparent when volunteering incompetence.
None of the Above
Finally, the answer may also be none of the above; there may be other reasons people
exaggerate their abilities. One possibility is the “unskilled and unaware” Dunning–Kruger
effect, which posits that lower ability leads to greater error in self-estimates (Kruger &
Dunning, 1999). This effect has been criticized as an artifact of the better-than-average
effect and statistical regression (Krueger & Mueller, 2002), and evidence on foils claiming
shows the opposite effect. Atir et al. (2015) found positive relationships between knowledge
foils claiming and both genuine and self-perceived knowledge: Knowledge exaggeration
apparently increases when one thinks they know more, genuinely or not. P. L. Ackerman
and Ellingsen (2014) specifically tested this hypothesis, and found that unwarranted claims
of vocabulary knowledge increased with validated knowledge, noting that this was in
opposition to the Dunning–Kruger effect.
In a more general sense, beyond specific skills or knowledge, exaggeration could be a
side-effect of lower general cognitive ability, simply a sign of lower intelligence, so this
should be considered as a potential influence. Along the same lines, poor metacognition
(awareness of one’s thinking processes) may also play a role, given that metacognition
As an example of inappropriate foil design, Fell and König (2018) attempted to
measure “Academic Faking in 41 Nations” by asking secondary school students around the
world to rate their knowledge of terms from mathematics. Among those were three
fabricated terms (foils), one of which was “proper number”, which is very similar to the
genuine math concept of “proper fraction”, especially if one considers fractions as numbers.
Their data13 show that this foil item empirically behaved more like a real math term, with
more claims of knowledge than ignorance, suggesting that many students appropriately
recognized the concept and graciously allowed for some ambiguity in expression.
Interpreting such partial knowledge as faking seems unjustified.
Another example is the use of “ultra-lipid” as a foil for capturing exaggeration of
science knowledge (Paulhus & Bruce, 1990). Unfortunately, the term can be found via
Google search to be a genuine term used to market cosmetics and in an article in
Comparative Clinical Pathology (Safat et al., 2018). Claiming it may not indicate the
knowledge the researchers had imagined, but it still may indicate knowledge more than
exaggeration.
As those examples illustrate, the real / foil distinction is less categorical than
continuous, an issue not adequately addressed in existing research. Foils with the highest
claim rates may be altered, creative, unofficial (e.g. slang), or rare indicators of genuine
competence, just not what the researchers expected. At the same time, foils must be
seductive enough to avoid floor effects in claiming. For example, over a range 0 to 4,
Bynum and Davison (2014) reported both the mean and standard deviation of foils claiming as 0.51, suggesting compressed variance which would limit the power of the measurement.
When foils claiming reaches zero, how is exaggeration measured?
The choice of real items also presents challenges. Without an objective test (as with
overstatement), there is no assurance that a real item is claimed based on ability rather
13 Available as “Codebook for student questionnaire data file” at www.oecd.org/pisa/pisaproducts/pisa2012database-downloadabledata.htm. See item ST62Q04.
than exaggeration. Making real items too easy could lead to ceiling effects, leaving the foil
items conspicuous by contrast. Alternatively, if reals are too difficult, some effectively act
as foils, and this distinction will vary by individual ability.
Ideally, meaningful claiming of real and foil items should show some distinction. For
convergent validity, reals claiming should correlate with valid demonstrations of ability,
while foils claiming should relate to errors of commission, e.g. choosing a wrong answer
instead of admitting ignorance. For divergent validity, while reals and foils claiming may necessarily relate (given the evidence discussed above), the overlap should be small.
The historical use of foils and the above discussion highlight several potential reasons
for claiming foils. The difficulty is that real items may be claimed for the same reasons, in
addition to indicating competence. Even with optimal item design, care must be taken in
analysis to disentangle these shared influences.
Analytic Issues
Both the overstatement and overestimation approaches provide a clean,
well-accepted measure of actual ability: the number of successes, or percent correct.
However, as noted above, difference scores from either approach do not separate
exaggeration of ability from the ability itself.
With overclaiming, one can easily calculate claiming rates for both reals and foils,
knowing there is no methodological constraint linking these two measures. On the surface,
this might seem ideal: Reals rate indicates ability and foils rate indicates exaggeration.
However, without some mechanism to validate claims on reals (as done with overstatement
or overestimation), how do we know which reals claims are not exaggerations? Likewise,
how do we know that foils claiming is not related to competence?
P. L. Ackerman and Ellingsen (2014) addressed these questions, within a larger goal
of testing accuracy of self-estimates of vocabulary ability. Kirkpatrick (1907) had
developed a simple vocabulary test in which respondents marked a ‘+’ or ‘−’ beside a list
of 100 words to indicate which they knew or did not, respectively, then, without warning,
tested understanding of words marked as known. P. L. Ackerman and Ellingsen (2014)
built on this method, which is essentially an overstatement test, because foils were not
used. However, the term overclaiming was used for claiming knowledge of a word that
could not be adequately defined in the later test. Such active incompetence claims were
called false alarms while claims later validated on the test were called hits. The researchers
reported that overall knowledge claims correlated similarly with hit rates (r = .79) and
false-alarm rates (r = .79), and that hit and false-alarm rates also correlated significantly,
at r = .24 (all p < .01). This would suggest that self-estimates of ability are fairly accurate
but also influenced by exaggeration, and that exaggeration increases slightly with ability.
The researchers note that this finding is in opposition to the well-known “unskilled and
unaware” Dunning–Kruger effect which posits that lower ability leads to greater error in
self-estimates (Kruger & Dunning, 1999). Instead, self-image error may increase with
competence.
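In overstatement terms, the scoring just described reduces to counting validated and unvalidated claims. A minimal sketch with invented responses (using all items as the rate denominator, which is one reasonable choice, not necessarily the one used in the studies above):

```python
# Invented data: whether each word was claimed as known, and whether the
# later test validated that claim (unclaimed words are simply not validated).
claimed = [True, True, False, True, True, False, True, False, True, True]
defined = [True, True, False, False, True, False, True, False, False, True]

n_items = len(claimed)
hits = sum(c and d for c, d in zip(claimed, defined))              # validated claims
false_alarms = sum(c and not d for c, d in zip(claimed, defined))  # active incompetence

hit_rate = hits / n_items                   # 0.5 for this respondent
false_alarm_rate = false_alarms / n_items   # 0.2 for this respondent
```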
Consistent with this, Atir et al. (2015) found that higher self-perceived knowledge
predicted more foil claiming, i.e. greater confidence meant more overclaiming of knowledge.
This relationship existed even when controlling for level of knowledge, and when being
warned of the presence of foils. By manipulating self-perceived knowledge via an either
easy or hard pre-test, they also showed a causal relationship: The easy test increased
subject confidence and overclaiming.
The relationship between reals and foils claiming may be even more complicated: In
an examination of faking in a genuine job application (with warning that faking could be
detected and penalized), Levashina et al. (2009) introduced three foil items (e.g. asking
applicants how often they have used a fictitious technique) and found that impossible
ability claiming increased with genuine claiming, but was negatively related to mental
ability. Yet, as number of foils endorsed increased, so did the positive relationship between
genuine claiming and both job knowledge and verbal ability (from about r = .20 to
r = .40). The paper concluded that “job candidates with higher levels of mental ability
might fake in less detectable ways” (p. 279).
Clearly, the behavior of claiming foils is not always independent of the claiming of
real items. Claiming of either reals or foils may reflect any of the factors noted above
(self-enhancement, cognitive bias, carelessness, etc.), appearing as an indiscriminate
response bias.
For overstatement or overestimation approaches, there may be similar issues
confounding genuine and exaggerated claims. Ability claims may also be susceptible to
fluency effects, carelessness, partial knowledge, or poor item design. In a multiple-choice
test, number correct will be affected by chance. Difference scores used for overestimation
(and sometimes for overstatement, e.g. Brogden, 1940) have long been criticized (e.g. Peter
et al., 1993; Edwards, 1994).
A Unified Approach to Assessing Exaggeration
The above discussion summarizes how different methodologies — overstatement,
overestimation (a form of overconfidence), and overclaiming — have all simultaneously gathered evidence of both competence and mistaken self-image, yet we can note an
interesting contradiction. The overstatement and overestimation techniques produce results
suggesting that the discrepancy between imagined and actual ability decreases with
competence, e.g. the r = −.69 found between overestimation and performance on Raven’s
Progressive Matrices (Duttle, 2016). However, overclaiming approaches (e.g.
P. L. Ackerman & Ellingsen, 2014; Atir et al., 2015) tend to find a positive relationship
between the active incompetence of foils claiming and genuine competence (assessed
independently). Does exaggeration, mistaken self-image of ability, decrease or increase with
genuine ability? The contradictory results found in the literature may be a result of not
properly isolating exaggeration from competence.
To address this, and to integrate those methodologies, this paper proposes a unified,
linear regression approach for measuring exaggeration:
1. In the same test, gather repeated evidence of competence (e.g. correct answers, reals
claiming), and active incompetence (e.g. incorrect answers, foils claiming).
2. Statistically remove common variance by finding the residuals of predicting
incompetence from competence.14
Let the resulting measure be called the Residualized Exaggeration Index (RExI). The
idea here is that there may be many common influences driving expressions of either
competence or incompetence, such as exaggeration, cognitive bias, partial knowledge,
carelessness or other response bias. The residuals capture what is not common to the two
measures, but what is unique to the behavior of active incompetence. The RExI is thus
guaranteed to be uncorrelated with evidence of competence. In this way, it represents
exaggeration of ability unrelated to the ability being exaggerated.
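The two-step recipe above can be checked directly. This Python sketch parallels the R code given in the footnote (using simulated scores, since no real data are at issue here) and verifies the guaranteed orthogonality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated evidence: competence (e.g. reals claiming) and active incompetence
# (e.g. foils claiming) sharing some variance, as the text anticipates.
n = 200
competence = rng.normal(size=n)
incompetence = 0.4 * competence + rng.normal(size=n)

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

# Regress standardized incompetence on standardized competence; the residuals
# are the RExI (the resid(lm(...)) call in the R footnote does the same).
z_comp, z_incomp = zscore(competence), zscore(incompetence)
slope = np.sum(z_comp * z_incomp) / np.sum(z_comp ** 2)
rexi = z_incomp - slope * z_comp

# By construction, the RExI is uncorrelated with competence evidence.
r_check = np.corrcoef(rexi, competence)[0, 1]
```

Because both variables are standardized before regressing, the intercept is zero and the residuals are orthogonal to competence up to floating-point error.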
It is important to note that what the RExI captures, while conceptually related to the
connotations of exaggeration, overconfidence, overstatement, overclaiming, etc., is more
precisely the error variance of self-perceived competence, which may arise from various
causes. Thus, the RExI is more a technology to isolate useful information about
self-perception than a theory-driven operationalization of a hypothetical construct. This
bottom-up approach avoids some potential researcher biases: The goal is not to validate a
theory as much as to understand a behavior by isolating its effects. While the RExI serves
as a standalone measure of individual differences, for regression modeling, simply include
competence evidence as a control variable; the standardized β for active incompetence then
indicates the influence of exaggeration.
The RExI can be extracted from any overstatement test by finding the residuals of
predicting the number of failed attempts from the number of successful attempts.
14 More precisely, here is computer code for calculating the index, using the statistical programming language R (R Core Team, 2020): RExI <- resid(lm(scale(Incompetence) ~ scale(Competence), na.action = "na.exclude"))
Furthermore, any objective test (e.g. a math quiz or multiple-choice test) can be converted
to an overstatement test by adding a non-claiming option to each question, e.g. the option
to respond “I don’t know”. This requires the test taker to self-assess their specific
competence at the moment they are addressing each question.
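Scoring such a converted test is straightforward: answered items split into successes (competence evidence) and failures (active incompetence), while “I don’t know” responses are non-claims. The item key and answers below are invented:

```python
# Hypothetical answer key and one respondent's answers; None codes the added
# "I don't know" (non-claiming) option.
key     = ['a', 'c', 'b', 'd', 'a', 'b']
answers = ['a', 'c', 'd', None, 'a', None]

competence   = sum(a == k for a, k in zip(answers, key) if a is not None)  # successes
incompetence = sum(a != k for a, k in zip(answers, key) if a is not None)  # failed claims
abstentions  = sum(a is None for a in answers)
```

Feeding these per-person counts into the residualization described above yields the overstatement form of the RExI.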
Similarly, for overestimation, find the residuals of predicting self-estimated number
correct from actual number correct.15 For both overstatement and overestimation methods,
the RExI will, by definition, be uncorrelated with demonstrated ability. This allows the
conventional ability measure (e.g. number correct on a test) and the RExI to both be used
as independent assessments.
(One might logically consider using a complementary approach for assessing
competence, e.g. finding the residuals after predicting number correct from estimated
performance. The complication is that this residualized ability measure is no longer
uncorrelated with the RExI, meaning that one no longer has separate measures of
competence and exaggeration. Given that the conventional estimate of competence,
number correct, has shown widespread utility and acceptance for over a century, there is
little reason to change that now.)
To assess exaggeration with an overclaiming inventory, find the residuals of foils
claiming rate predicted from reals claiming rate. This addresses the several issues with foils noted above, because common influences on claiming any item (i.e. bias) are removed.
Unlike overstatement, however, an overclaiming approach does not provide a verifiable
measure of competence unrelated to exaggeration.
Residuals have been used in other research to remove confounding influences. An
influential example in the study of self-enhancement is the work of John and Robins (1994),
in which their self-enhancement indices are residuals of self-ranking after removing
variance from ratings by peers or expert observers (Table 6). That research, like much on
15 However, as noted above, overestimation is a far less psychologically direct method of capturing imagined ability, and so not recommended for capturing exaggeration.
self-enhancement, compares self-perceptions (S) against perceptions of others (P), and/or with others’ perceptions of the self (O).
Krueger and Wright (2011) thoroughly discuss the many challenges arising from
deriving self-enhancement from those three measures, and from various analytic approaches
to combining them, including use of difference scores and residuals. That work considers
two contexts for measuring self-enhancement: an intrapersonal comparison of self to
perceived others known as social comparison theory (Festinger, 1954; Suls & Wheeler,
2013), and an interpersonal comparison using an observer-based paradigm, the social realist
approach (Funder, 1995; Kenny, 2004). The former frames self-enhancement as thinking
myself better than how I perceive others (S − P), while the latter considers how I see myself compared to how others see me (S − O). In both cases, there is a discrepancy to
measure, but from different reference points. The authors note that the social comparison
theory considers self-enhancement as beneficial (the Taylor and Brown hypothesis), while
social realist theory sees it as detrimental.
Curiously, that work introduces reality measures (R) such as test scores without considering if the psychology of S − R self-enhancement differs from the S − P or S − O framings. The S − R discrepancy is what exaggeration captures, avoiding the biases and
errors inherent in P and O measures which are enmeshed in social comparison. The lack of
social context for exaggeration suggests it may not fully fit under the umbrella of
self-enhancement, at least as conventionally studied.
More relevant to the current research is the inflation approach used by Anderson
et al. (1984). That research administered examinations to job applicants that included
self-assessments on a variety of job skills, some of which were nonexistent bogus (foil)
items. This was essentially an overclaiming test of job skills, and in their final analysis they
used linear regression to predict an objectively-measured job skill (typing performance)
from the two types of skill claiming, showing incremental validity from the foils claiming,
greater than the predictive validity of reals claiming alone. That study (like many dealing
with bias in self-report) focused on correcting estimates of some criterion, rather than using
the index to measure a separate psychological process.
Claiming that the RExI approach is fundamentally new would clearly be overstating
the case. However, previous literature tends to not consider exaggeration as a distinct
phenomenon, presumes it represents some pre-determined theoretical construct, or fails to
measure it cleanly. A goal of this paper is to present evidence that exaggeration deserves to
be examined separately from existing constructs of self-enhancement or cognitive function.
Exaggeration may be a functional conglomeration of several constructs, but it is worth
remembering that constructs are just theories, and exaggeration manifests as a reliable
reality. Investment in theory may explain why both overestimation and overstatement
literatures have persisted for so long with little recognition of their inherent contradictions.
Ironically, an exaggerated sense of knowing may have kept researchers from exploring what
exaggeration is.
The RExI thus provides a methodological integration uniting overstatement,
overestimation and overclaiming approaches, providing a comparable measure of self-image
error distinct from competence and other common influences.16 Armed with this technique,
we can explore the impact exaggeration has on more global performance, and what factors
may relate to it.
Current Research
If the RExI addresses contradictions in previous approaches, those approaches cannot
be used to consistently validate the RExI. If previous results were influenced by
competence, then removing that influence may result in weaker or null effects. Ability
tests, from school exams to IQ assessments, have a well-established history of predictive
validity using the number correct as the signal. Because the RExI removes such signal, it is
entirely possible that what is left over is essentially noise. The central research questions,
16 An approach to overclaiming purporting similar distinction, called the Overclaiming Technique (OCT), is described in the Appendix.
then, are whether there is any useful new signal in the RExI, and if so, whether it is easily
explained away, or is something new.
As an initial exploration, the current research relies on convenience samples of
undergraduate students. For this population, a relevant ability to study is knowledgeability,
the ability to answer simple questions of fact. To be ecologically valid, the main outcome or
dependent variable (DV) used here is academic performance, which captures not just
knowledgeability in general, but a broad range of skills and abilities relevant to success in
life. The breadth of this DV means that expected effects should be small, but if still
significant, would indicate a meaningful relationship with broad implications.
Validation Criteria
To show that a measurement captures something useful, we need to show that it a) has expected similarities (convergent validity), b) has expected differences (divergent validity), and c) tells us something we didn’t already know (incremental validity).
Convergent Validity. Following the connotations of the terms exaggeration,
overstatement, overestimation, overclaiming or overconfidence, we should expect that the
RExI, representing error in self-image, should indicate impaired performance of the ability
being exaggerated. Additionally, because the discrepancy captured is in excess of
competence, this should relate to self-enhancement, and, such discrepancy may be
facilitated by cognitive biases.
Broader Performance. An unrealistic view of one’s ability should predict
impairment of that ability. This is the logic behind preventing drunk driving: Even though
an inebriated driver may not have caused harm (yet), their exaggerated sense of ability to
drive predicts potentially catastrophic performance failure. Similarly, for students,
knowledge exaggeration should predict lower knowledge (academic) performance. For
exaggeration to be meaningful, it should generalize: Someone who can’t walk a straight line
probably can’t drive a car. Likewise, exaggeration of knowledge in a narrow domain should
predict impairment of broader academic performance, ideally, even if the knowledge being
exaggerated does not. Thus, when given even a trivial knowledge test, exaggeration
demonstrated there should predict lower academic performance overall.
Self-Enhancement. Beyond performance impairment, to fit an intuitive notion of
exaggeration, the RExI should align with self-enhancement: an exaggerated, narcissistic
sense of self. While self-enhancement is a fairly broad construct typically assessed via
self-reports, the RExI, being a behavioral measure, may relate in only some narrow, specific
ways. If exaggeration predicts performance impairment, it should relate more to
maladaptive aspects of narcissism, such as entitlement, perhaps because one believes they
deserve success. Exaggeration as an unrealistically positive self-view should also relate to
overconfidence as overplacement, i.e. seeing oneself as better-than-average.
Cognitive Bias. A less motivational and more “innocent” explanation of
exaggeration may be bias in information processing. Of the several heuristics that veer
from rational expectations (e.g. Kahneman et al., 1982), recognition memory bias is a good
starting point to compare with exaggeration, given the memory error findings noted above.
Alternatively, performance on a memory test may exhibit exaggeration as would any other
ability, which should have similar relationships.
Divergent Validity. While relationships between a RExI and performance and
self-enhancement would confirm an intuitive understanding of the measure, and
relationships with cognitive bias help explain some of the mechanism, such convergent
validity only paints part of the picture. The boundaries of the picture, evidence of
divergent validity (i.e. what exaggeration is not), should also be considered. Because the
RExI is based on residuals after removing competence variance, it may be influenced by
other, unexpected factors.
Carelessness. Carelessness, the ever-present threat to the validity of any survey,
may appear as exaggeration. Simple lack of attention can lead to invalid responses, and such
behavior could contaminate any measure, especially the RExI, because it removes variance
attributable to competence. However, carelessness as a substantive variable, an enduring
individual difference (Bowling et al., 2016), may explain part of exaggeration, and should
replicate that paper’s finding, predicting lower academic performance. Thus, carelessness
may be a meaningful component of exaggeration, but should not be the dominant one.
Other Explanations. If exaggeration affects performance, then it should not be
easily explained by other obvious predictors of performance. For the relationship between
knowledge ability and academic performance, the RExI design rules out influence from the
knowledge being exaggerated. Beyond that, general cognitive ability should also be ruled
out to show that exaggeration is not simply a side-effect of lower intelligence. Following
that logic, metacognition (awareness and management of cognitive processes) should also
be ruled out, as that is also a reliable predictor of academic outcomes (Ohtani & Hisasaka,
2018).
Cultural effects in psychology are often overlooked, leading to poor inferences of
generalizability (Henrich et al., 2010). While it is far beyond the scope of this paper to
consider the many known differences between cultures, given that the convenience samples
used in the current research are all university undergraduates in Canada, perhaps the most
relevant distinction is between Western and non-Western cultural backgrounds. That
difference, and sex, are two control variables considered in all studies.
Incremental Validity. If the impact exaggeration has on performance can be largely
explained by variables considered above, then the behavior of exaggerating one’s ability
will be better understood. If not, then the RExI may represent something distinct worth
further exploration. Because overall cognitive ability and memory performance should
logically affect academic performance, and substantive carelessness has also been shown to
lower academic performance (Bowling et al., 2016), all these should be considered in
examining the relationship between knowledge exaggeration and knowledge performance. If
a distinct relationship holds, even after further control for sex and basic cultural variables,
that would suggest that the RExI captures an important, but overlooked, non-cognitive
variable explaining academic performance.
Study 1: Proof of Concept
The main goal of this study was to establish that the RExI captured information
relevant to performance. Because insufficient effort responding (IER) has related to lower
academic performance (Bowling et al., 2016), this study sought to minimize such influence
by design. By using only students intrinsically motivated to complete the study for no
other reason than feedback about their personality, this initial exploration selected only
participants who, ostensibly, cared about their results and thus their responses. Some basic
personality, cognitive, and metacognitive measures were included as controls.
Study 2: Validating the RExI
Study 2 was designed to replicate and extend Study 1, using better measures and a
larger, broader sample. Overall university Grade Point Average (GPA) was used to measure
academic performance more broadly, accurately, and reliably. Exaggeration was derived
from a large, popular inventory of overclaiming items, self-enhancement captured through
measures of narcissism, impression management, self-deceptive enhancement and
self-deceptive denial, and recognition memory was tested with a large battery of items. To
examine how exaggeration, or its relationship with performance, overlaps with general
cognitive ability, a commercial IQ test was included as a control measure.
Study 3: Developing Better Measures
Having validated the RExI approach by re-purposing an existing overclaiming
inventory (in Study 2), Study 3 tested instruments designed specifically to capture
exaggeration. To capture a relevant, broad ability that university students might want to
exaggerate, English vocabulary was chosen as a knowledge domain to assess. An
overstatement test was developed by adding a non-claiming option to a commonly used
multiple-choice test. Addressing the issues raised above about overclaiming item design,
techniques informed by computational psycholinguistics and cognitive psychology were
employed to develop overclaiming items optimized for measuring exaggeration. Both of
these new instruments were empirically examined to confirm their suitability for measuring
exaggeration.
Study 4: Robustness of the RExI
Study 4 was designed to replicate and extend Study 2 by examining multiple different
measures of knowledge exaggeration. Retaining a briefer version of the exaggeration
measure used in Study 2 for comparison, the novel instruments from Study 3 were added to
see if the RExI could capture exaggeration similarly across different abilities, content, and
format. Hypothetically, all exaggeration instruments should show similar relationships with
other relevant measures.
To better understand what exaggeration means, self-enhancement aspects were more
precisely targeted as entitlement, overplacement, and intellectual humility. To examine the
link with carelessness and cognition more closely, special software was developed to capture
individual item response times, and implement a novel technique to detect motivated
carelessness, as persistent, intentional rushing of responses.
Altogether, the following studies examine how the RExI approach of separating the
effects of self-image from competence, the exaggeration of ability from the ability being
exaggerated, may yield a more accurate picture of what is connoted by “overconfidence”
than have previous attempts using overstatement, overestimation or overclaiming.
Reporting Conventions
Throughout this paper the following conventions for statistical reporting are adopted
and explained here for convenience.
Correlations are Pearson product moment, which are equivalent to point biserial when
one variable is dichotomous. Statistical significance is always two-tailed and is marked as
follows in text: *p < .05, **p < .01, ***p < .001. Group mean differences are shown using a
conservative t-test (assuming unequal variance, estimated separately for both groups, using
the Welch modification for degrees of freedom), followed by effect size (Cohen’s d). 95%
Confidence Intervals are shown in square brackets. Regression models always show
standardized beta (β) coefficients in order to compare the relative impact of predictors.
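The standardized-β convention can be illustrated with a short sketch (again in Python rather than the R used for the actual analyses; names are illustrative): z-scoring the outcome and all predictors before fitting yields coefficients expressed as SD change in the outcome per SD change in each predictor, making them comparable.

```python
import numpy as np

def standardized_betas(X, y):
    """OLS coefficients after z-scoring predictors and outcome, so each
    beta is the expected SD change in y per SD change in that predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Xz])
    beta, *_ = np.linalg.lstsq(A, yz, rcond=None)
    return beta[1:]  # drop the intercept (it is ~0 after standardization)
```

With a single predictor, the standardized β reduces to the Pearson correlation, which is why the tables' βs and zero-order rs can be read on the same scale.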
For thoroughness, this paper presents some large correlation tables, with hundreds of
elements. To facilitate compact representation and visual distinction, results shown in these
tables follow a different convention. Statistical significance is indicated by font intensity:
p >= .05, p < .05, p < .01. Where appropriate, Cronbach’s α is shown in italics on the
diagonal for unidimensional measures of more than two items. For RExI measures, a similar
measure of internal consistency was calculated by correlating RExIs derived from half the
items with the same from the other half. These halves were randomly selected 1000 times
and the correlations averaged via Fisher transformation to estimate overall reliability.
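As a simplified sketch of this reliability procedure (Python rather than the R actually used; for a RExI each half-score would itself be residualized before correlating, whereas here plain half-scale means are correlated):

```python
import numpy as np

rng = np.random.default_rng(0)

def splithalf_reliability(item_scores, n_splits=1000):
    """Average split-half correlation over random halvings of the items,
    with per-split correlations averaged on the Fisher-z scale and
    back-transformed. item_scores: array of shape (n_people, n_items)."""
    scores = np.asarray(item_scores, dtype=float)
    n_items = scores.shape[1]
    zs = []
    for _ in range(n_splits):
        perm = rng.permutation(n_items)
        half_a = scores[:, perm[: n_items // 2]].mean(axis=1)
        half_b = scores[:, perm[n_items // 2:]].mean(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        zs.append(np.arctanh(r))   # Fisher z-transform
    return np.tanh(np.mean(zs))    # back-transform the averaged z
```

Averaging on the z scale rather than averaging the raw correlations avoids the downward bias of averaging bounded rs directly.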
In correlation tables, to concisely describe distributions, the M (SD) column reports
the mean (standard deviation) of data that has been normalized to a range of 0 to 1. This
choice is similar to the percent of maximum possible (POMP) approach advocated by
Cohen et al. (1999). That 0 to 1 range represents the theoretical limits of bounded
measures (e.g. 0 to 100 for grades, 1 to 7 on a Likert scale) and empirical extremes
otherwise. By scaling all data to the same range (for table reports), this convention allows
easier comparison of distributions and better appreciation of skew and dispersion. Thus,
(for example) standardized distributions (e.g. RExI measures) which are centered on zero
will show a positive mean here, which then indicates how far the center of the distribution
is from the extremes (0 and 1), providing information about skew that is commonly
overlooked.
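A minimal sketch of this rescaling (illustrative function name; theoretical limits are supplied for bounded measures, and the empirical extremes are used otherwise):

```python
def pomp(x, lo=None, hi=None):
    """Rescale scores to a 0-1 range, in the spirit of the percent of
    maximum possible (POMP) approach of Cohen et al. (1999). If lo/hi
    (theoretical limits) are not given, empirical extremes are used."""
    xs = [float(v) for v in x]
    lo = min(xs) if lo is None else float(lo)
    hi = max(xs) if hi is None else float(hi)
    return [(v - lo) / (hi - lo) for v in xs]
```

For example, `pomp([1, 4, 7], lo=1, hi=7)` gives `[0.0, 0.5, 1.0]`; a mean near 0.5 then indicates a distribution centered between the extremes, while a mean near either bound signals skew.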
Throughout, gender / sex measures have been collapsed to dichotomous, with 0
representing identification mostly as female, and 1 identification mostly as male.
Similarly, “Native English” is 1 if English was reported as a first language, 0 otherwise, and
culture variables are 1 if 10 or more years lived in English / Western countries, 0 otherwise.
For all studies using student populations, these basic demographic measures were used as
controls, but age was not recorded because such variance is often small with potentially
misleading outliers, and ethical considerations recommend against collecting unnecessary
personal information.
All data was gathered via the Qualtrics survey platform (www.qualtrics.com), with
analysis done using the R statistical programming language (R Core Team, 2020) in the
RStudio development environment (RStudio Team, 2019), using LaTeX for document preparation.
Note: N = 316. Sex coded as binary, Male high. SS GPA: Secondary School Grade Point Average. CRT: Cognitive Reflection Test. MSLQ: Subset of the Motivated Strategies for Learning Questionnaire. RExI: Residualized Exaggeration Index. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Results
Predictive Validity
Table 1 shows correlations between study measures. Demographic measures (sex,
being a native English speaker, and having an English cultural background) did not relate
significantly to course grades.
Exaggeration. On the vocabulary overclaiming inventory, claiming for reals and foils
was uncorrelated (r(314) = -.03, 95% CI [-.14, .08]), with Cronbach’s α for reals being .91 and
for foils, .87 (as shown on the diagonal). The absence of correlation between reals and foils
claiming indicates no inherent common variance that might have been caused by
carelessness or other response bias, suggesting that the overall design did suppress
inattentive responding. The RExI derived from this inventory significantly predicted lower
course grades, to the same degree as foil claiming alone did. This is because reals claiming,
which showed no relationship with academic performance, was also unrelated to foils
claiming, so the RExI had no competence evidence to remove. This pattern also suggests
that exaggeration alone can be a meaningful predictor of performance, even when
competence is not. The correlations between reals claiming and demographic measures
suggest that some genuine English vocabulary knowledge was being captured, even if not
relevant to science grades.
Cognitive Ability. While self-report SS GPA showed an expected relationship with
course grades, the lack of relationship with exaggeration may be due to it being self-report,
and thus potentially influenced by exaggeration, i.e. exaggerated self-report compensates
for exaggeration-caused lower performance. The CRT scores showed similar relationships
with higher academic performance and lower exaggeration, suggesting a more general
relationship between exaggeration and performance impairment.
Overestimation. Overestimation of performance on the CRT test was related to the
RExI, but not significantly to course grades, suggesting it was not as effective in capturing
error in self-perception. Consistent with the hard-easy effect in overestimation literature,
there was a strong negative relationship between the overestimation index and the ability
being overestimated.
Metacognition. The 22 items selected from the MSLQ for their relationship with
academic performance behaved as expected here, relating to course grades and cognitive
ability measures, but not to exaggeration. The Growth and Fixed Mindset measures
related to the MSLQ metacognition measure in an expected way, in that metacognitive skills
should relate more to a growth orientation. The negative relationship between grades and
growth mindset may be due to restriction of range in the sample (i.e. these students are
already proven to be high achievers), and that science students may find growth items (e.g.
“I have the ability to change my basic intelligence”) contrary to what they’ve been taught.
The lack of relationship between these metacognition measures and exaggeration suggests
some divergent validity.
Personality. Like other measures that related to academic performance, TIPI
openness showed a similar opposite relationship with exaggeration. It is not clear why
exaggeration would relate to agreeableness, especially given the opposite relationship to
reals claiming, i.e. this is not just an agreeable, acquiescent response bias. The lack of
relationship between conscientiousness and exaggeration does provide some divergent
validity, given that other researchers (e.g. Bowling et al., 2016) have found carelessness
related to both lower agreeableness and lower conscientiousness. Exaggeration here does
not fit that pattern. While the coarseness of the TIPI means that subtle personality
correlates may not appear here, these results at least suggest that exaggeration is not easily
explained by the five-factor model of personality.
Incremental Validity
Table 2 shows standardized β coefficients (with standard error) of a linear regression
predicting course grades. Note that, because both reals and foils rate are in the same
model, the β for foils rate is exactly the RExI (foils rate controlled for reals rate), showing
that the impact of knowledge exaggeration persists after controlling for measures of
cognitive ability (SS GPA, the CRT), metacognition (MSLQ), personality (TIPI), sex and
culture.
Table 2: Regression Model Predicting Course Grades from RExI in Study 1
Predictor β SE p value
Foils Rate (RExI) -.14 .06 .01
Reals Rate .02 .06 .72
Self-Report SS GPA .21 .05 <.001
CRT Correct .15 .05 .009
MSLQ Metacognition .22 .05 <.001
Native English .06 .06 .32
English Culture -.05 .07 .48
Sex (M+) -.06 .05 .25
Extraversion -.09 .05 .08
Agreeableness .07 .05 .16
Conscientiousness .01 .05 .84
Emotional Stability .08 .05 .14
Openness .04 .05 .41
Note: Overall R² = .21, p < .001. N = 316. Sex coded as binary, Male high. SS GPA: Secondary School Grade Point Average. CRT: Cognitive Reflection Test. MSLQ: Subset of the Motivated Strategies for Learning Questionnaire. RExI: Residualized Exaggeration Index.
Discussion
As proof of concept, science undergraduates, motivated only by their own curiosity,
completed an “Academic Personality” survey in order to get personal feedback. By
gathering knowledge claims of vocabulary unrelated to science grades, exaggeration was
measured using the RExI approach of residualizing foils rate from reals rate.
Students who cared enough about their responses that they wanted to see results still
demonstrated detectable exaggeration, enough to predict lower course grades beyond
measures of cognitive ability, metacognition, personality and demographic controls. This
exaggeration appeared to generalize: The knowledge that was exaggerated (ordinary,
non-science vocabulary) was unrelated to (science) academic performance, yet the tendency
to exaggerate that knowledge was.
While exaggeration related to academic performance but the knowledge being
exaggerated did not, overestimation did not relate to academic performance while the
ability estimated did. This suggests the RExI approach is providing more useful information
than the overestimation index.
These results cannot be explained by carelessness, not just because of study design,
but also because reals and foils claiming showed no common influence, e.g. inattentive
carelessness or response bias. We also see that the cognitive and metacognitive measures
predicted course grades in expected ways, indicating the integrity and validity of the survey
overall.
While this exploratory study was coarse, it confirms that this operationalization of
exaggeration, the RExI, captured a phenomenon worth examining more thoroughly, an
avenue we shall pursue in Study 2.
Study 2: Validating the RExI
Will the results of Study 1 replicate with better measures? While a relationship
between exaggeration and lower academic performance was found, the context and
selection was unusual: Only students wanting personality feedback were involved. While
this may have selected for lower inattentive carelessness, it may have also selected students
preoccupied with their self-image. The remaining student studies use a conventional
context for psychological research: undergraduates incentivized to participate in return for
course credit. While certainly not representative of humanity overall (Henrich et al., 2010),
these samples were at least more indicative of North American undergraduates in general.
This context should also now allow for more variance in careless responding as found in
similar samples (e.g. Meade & Craig, 2012), so carelessness is now measured.
To consider self-enhancement as an explanation for exaggeration, Study 2 used a
popular measure, the Narcissistic Personality Inventory (NPI) introduced by Raskin and
Terry (1988). An important quality of the NPI is that it involves forced-choice questions
where one must decide between two alternatives (unlike, say, a Likert question assessing
degree of agreement). This forced-choice format means that the measure is resilient to
response bias or carelessness: Answering uniformly or randomly produces a noisy, middling
score, not misleading variance, because both high and low scores require selective attention
to content. Likert scales, used in many personality measures, are more vulnerable to
such distortions when answered inattentively (e.g. longstrings) or with socially desirable
or other response biases.
To examine self-enhancement in more detail, Study 2 also incorporated the Balanced
Inventory of Desirable Responding (BIDR) developed by Paulhus (1988). This set of three
instruments is designed to capture both interpersonal (impression management) and
intrapersonal (self-deceptive enhancement, self-deceptive denial) aspects of
self-enhancement. The “balanced” in the title refers to half the items being reverse-scored18
18 In psychometric instruments using Likert items to assess some quality (e.g. “From 1 to 10, how agitated
in order to compensate for superficial response biases. This 50-50 balance of item scoring
directions also allows for convenient measure of carelessness via longstrings: It is extremely
unlikely that a sincere, attentive respondent would give the same response to more than
half the items.
From the history of foils being used to test false recognition of advertisements, we
noted earlier that exaggeration may be related to recognition memory bias. To test that
relationship, a 100-item battery of words was presented at the start of the survey and then
tested for recognition at the end (with 50 old, and 50 new words). This allowed for testing
both individual differences in memory ability and also memory exaggeration, by applying
the RExI to false (relative to correct) claims of recognition. Note that the cognitive trait of
recognition bias mentioned earlier (Kantner & Lindsay, 2014), is about individual
differences in memory claims, whether valid or not. Memory bias thus represents
confidence (warranted or not) in claiming recognition, which will relate to genuine
recognition ability. This is different from memory exaggeration, which is about false
recognition, uncorrelated with rate of plausible memory claiming.
To get a broader measure of exaggeration, a large (150-item), commonly-used
overclaiming inventory was employed, the Overclaiming Questionnaire (OCQ) developed by
Paulhus et al. (2003). The content of that inventory is based on 1980s American cultural
knowledge (Hirsch Jr et al., 1988), so if knowledge exaggeration does not strictly depend on
the domain of knowledge assessed, as Study 1 suggests, then it should not matter that this
content is largely irrelevant to the academic performance of a 21st-century
undergraduate at a Canadian university.
To broaden and generalize the measure of academic performance, for the remaining
studies, the central dependent variable was University of British Columbia (UBC) GPA, a
metric of high ecological validity that reflects not just knowledge, but a broad range of
do you feel?”) a reverse-scored item would assess that quality from the opposite direction, e.g. “From 1 to 10, how calm do you feel?”.
decisions: cognitive, metacognitive, strategic, social, emotional and more. Like intelligence
or personality, tendency to exaggerate one’s knowledge may represent an individual
difference that affects many life outcomes.
Study 1 showed that exaggeration predicted lower academic performance and also
lower performance on the CRT. This raises the possibility that exaggeration may be simply
an expression of lower cognitive ability in general. To test that, Study 2 employed a broad
measure of cognitive abilities used commercially for evaluating job applicants, the
Wonderlic Personnel Test (E. F. Wonderlic, 1992). Scores on this should predict GPA.
Method
The study was approved by the institutional ethics board and included explicit, active
consent to access student transcript information. As with Study 1, data was gathered via
online survey.
Participants
No longer limited to students from one discipline (like the science students in Study
1), Study 2 considered a wider range of students, in various disciplines and from varied
backgrounds, who happened to be enrolled in an undergraduate psychology course (a popular
elective across disciplines) at UBC, with participants from spring and summer terms.
Students volunteered to complete an online survey for partial course credit. A total of 533
students completed the study, with a median completion time of 44.1 minutes.
Overall, 31% of participants reported their gender as male (69% as female), 50%
reported English as a first language, and 64% as being from Western countries. (Note that
these proportions are also shown in the M (SD) column of Table 3, because these are
dichotomous variables.) 59% of the students were enrolled in an Arts program. While 43%
of the students sampled were in their first year, this distinction had no impact on the
outcomes shown below, nor did their overall number of academic terms.
Measures
Academic Performance. To capture academic performance with breadth, reliability
and ecological validity, the study asked participants to grant access to their university
transcripts. From this, GPA was calculated as overall average grade for all courses
completed at the university, including courses in progress, so there were data even for
students new to the university. These grades were represented on a 0 – 100 scale. Note that
this is an improvement over many studies where self-reported GPA is used; official
transcript information avoids measurement error or self-report bias.
Knowledge Exaggeration. The OCQ-150 (Paulhus et al., 2003) was employed here as
a reference for extracting exaggeration measures, given that it has been widely applied in
overclaiming studies. The broader knowledge domain these items query is 1980s American
culture, taken from Hirsch Jr et al. (1988), in ten categories (20th Century Culture Names,
Authors and Characters, Books and Poems, Fine Arts, Historical Names and Events,
Language, Life Sciences, Philosophy, Physical Sciences, Social Science and Law) with each
category having 12 reals and 3 foils. In this application, the potential irrelevance of the
content suits the purpose of establishing that exaggeration generalizes beyond the
domain(s) it is measured on. Claims for each item were solicited with the prompt of
“Please rate how familiar you are with each item” along a Likert scale of 1 : Not at all
familiar to 7 : Very familiar, a format taken from the instrument’s use in overclaiming
studies. The RExI was calculated as the amount (average rating) of foils claiming
residualized on the amount of reals claiming.
Self-Enhancement.
Narcissism. The 40-item dichotomous forced-choice NPI (Raskin & Terry, 1988)
was used to assess narcissism. It has a theoretical range of 0 – 40.
Balanced Inventory of Desirable Responding (BIDR). The BIDR consists of
three 20-item, balanced (equal numbers of forward- and reverse-scored items) instruments:
Impression Management (IM), Self-Deceptive Enhancement (SDE), and Self-Deceptive
Denial (SDD). For each item, extreme scores (e.g. 6 or 7 for forward-scored items on the
7-step Likert scale used) count as 1, others as 0, for a range of 0 – 20 for each instrument.
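The scoring rule can be sketched as follows (a hypothetical helper in Python; the exact item keying follows Paulhus, 1988, and is not reproduced here):

```python
def bidr_score(responses, reverse_scored, scale_max=7, cutoff=6):
    """BIDR dichotomous scoring sketch: reverse-scored items are flipped
    (8 - response on a 7-point scale), then only extreme keyed responses
    (6 or 7) each count 1 point, giving a 0-20 range per 20-item scale."""
    total = 0
    for i, r in enumerate(responses):
        keyed = (scale_max + 1 - r) if i in reverse_scored else r
        total += 1 if keyed >= cutoff else 0
    return total
```

Counting only extreme responses means a high score requires consistently exaggerated claims in the keyed direction, not mere agreement.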
Cognitive Ability. The form A 2000 version of the Wonderlic Personnel Test
(E. F. Wonderlic, 1992), adapted here for online survey administration, was used as a
general measure of cognitive ability. It is a widely-used 12-minute timed test that requires
metacognitive, numerical, graphical and verbal problem-solving skills, and attention to
detail. Scored only as number correct, the maximum possible is 50.
Recognition Memory. Early in the survey, participants were exposed to items via a
Lexical Decision Task (LDT). For each of 100 words, they were asked to categorize each, as
quickly and accurately as possible, as either a genuine word from the English language or
not. Fifty genuine words and fifty pronounceable nonwords were shown. At the end of the
survey (with roughly 20 minutes of other questions in between), using a similar timed
binary classification task, respondents categorized words as to whether they had seen them
before in the LDT, with 50 old and 50 new items (both 50% genuine words) of similar
properties. New and old items were matched for word length.
This protocol gives us two summary measures: one for the rate of claiming old items
(hit rate, analogous to reals claiming for overclaiming), and one for the rate of claiming
new items (false alarm rate, analogous to foils claiming). These two simple measures can
be combined in various ways, e.g. as done to create the RExI. Signal detection theory (the
typical model used in memory research) provides a variety of other combinations, notably
accuracy (roughly the excess of old claims beyond new claims, known as d′) and bias,
representing overall claiming, whether old or new (which correspondingly increases with
recognition). Kantner and Lindsay (2014), in examining their cognitive trait, consider
several kinds of bias measures which will not be considered here. For simplicity, the
correlation table reports memory accuracy (d′) as an indicator of general memory ability,
and also memory exaggeration, as measured by the RExI. For regression models, simple
measures of memory hit rate and false alarm rate are included. These two measures
capture all the information (variance) from the memory test without requiring any
understanding of signal detection theory, or arguments about which composite measures to
use. The software used to summarize scores did not easily provide individual item data, so
Cronbach’s α was not calculated.
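The accuracy computation can be sketched as follows (a standard signal detection formula in Python; the log-linear correction for extreme rates is an assumption for illustration, as the exact computation used is not specified here):

```python
from statistics import NormalDist

def d_prime(hits, false_alarms, n_old=50, n_new=50):
    """Recognition accuracy d' = z(hit rate) - z(false-alarm rate),
    with the common log-linear correction keeping rates off 0 and 1.
    Defaults of 50 old / 50 new items match the battery described above."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    return z(hit_rate) - z(fa_rate)
```

When hits and false alarms are equal, d′ is zero (no discrimination); excess old-claiming beyond new-claiming drives d′ upward, which is the sense in which d′ is "roughly the excess of old claims beyond new claims."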
Careless Responding. To identify participants who were not answering sincerely,
two techniques were combined. For the LDT, the fastest decision time with at least average
discrimination was found (454ms) and then used as a cutoff: Participants with median
correct response time faster than this (1.5%) were considered too fast and labelled as
careless. This distinction correlated with LDT discrimination (Hits - False Alarms)
significantly, r(530) = -.69***, 95% CI [-.73, -.64], showing that such rushed responses were
not accurate.
As well, responses that showed an unreasonable consistency on BIDR measures (i.e.
having identical responses to more than half the items, because half are reverse-scored) were
also labeled as careless. In total that came to 12% of the sample, which is consistent with
Meade and Craig (2012) who reported that “approximately 10% – 12% of undergraduates
completing a lengthy survey for course credit were identified as careless responders” (p. 1).
Carelessness was thus a dichotomous variable indicating whether a respondent was
unreasonably fast on the LDT and / or gave identical responses to more than half of the
items on any of the three BIDR instruments. Note that this carelessness index cannot be meaningfully
compared to any of the BIDR scores because they are both derived from the same items.
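The combined flag can be sketched as follows (illustrative Python with hypothetical names; the mode-count test operationalizes "identical responses to more than half the items"):

```python
from statistics import mode

# Fastest median correct LDT response time with at least average
# discrimination, as reported in the text above.
RT_CUTOFF_MS = 454

def is_careless(median_correct_rt_ms, bidr_scales):
    """Flag a respondent as careless if their median correct LDT response
    time beats the cutoff, or any BIDR scale has the same response on more
    than half of its items (implausible when half are reverse-scored).
    bidr_scales: iterable of per-scale response lists."""
    too_fast = median_correct_rt_ms < RT_CUTOFF_MS
    too_uniform = any(
        scale.count(mode(scale)) > len(scale) / 2 for scale in bidr_scales
    )
    return too_fast or too_uniform
```

Either criterion alone suffices for the flag, matching the "and / or" wording of the index.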
Demographics. While participants were asked which gender (or neither) they most
closely identify with, for the purposes of analysis, this was collapsed to a binary variable,
with the larger value indicating male. Participants also reported if English was a first
language for them, and if they had spent 10 or more years living in Western countries (e.g.
Note: N = 530. OCQ: 150-item Overclaiming Questionnaire. RExI: Residualized Exaggeration Index. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Results
Predictive Validity
This section describes convergent and divergent validity of the RExI based on the
bivariate, zero-order correlations between study variables shown in Table 3. Note again
that, as in all correlation tables in this paper, mean and standard deviation are reported on
ranges standardized to be from 0 to 1, as described in Reporting Conventions.
Exaggeration. The main result of Study 1 is confirmed here: knowledge
exaggeration is related to impaired knowledge (academic) performance, even though the
knowledge being exaggerated has little relevance to that performance, as shown by the weak
relationship between reals claiming rate and GPA. Exaggeration also showed only moderate
relationships with carelessness, narcissism, lower memory accuracy, and lower cognitive
ability as measured by the Wonderlic test. Altogether, these suggest substantial
discriminant validity, given the higher reliability of these measures compared to Study 1:
Exaggeration does not appear to be merely a side-effect of these other possible
explanations. This distinction is also shown via the regression model presented later.
Despite being related to narcissism, exaggeration showed no significant relationship
with impression management or self-deceptive enhancement, and only a slight relationship
with lower self-deceptive denial. Surprisingly, having English as a first language or a
Western cultural background related to lower exaggeration, which may be an artifact of the
OCQ content being so culturally biased. Memory exaggeration showed similar patterns to
knowledge exaggeration, albeit weaker, without a very strong relationship between the two
measures.
Careless Responding. The ad hoc measure of careless responding showed sensible
relationships with Wonderlic, narcissism and memory measures, suggesting it captured
something relevant, although possibly only situational carelessness specific to this study,
given the lack of relationship with GPA. Note that, being partially derived from response
style on the BIDR, this careless responding measure cannot be meaningfully compared with
impression management, self-deceptive enhancement, or self-deceptive denial. The two
components of this carelessness measure, responding too quickly to the LDT and longstring
responding on the BIDR, correlated r(530) = .20***, 95% CI [.12, .28], suggesting only
slight convergence of the two techniques. We see that carelessness did relate to
exaggeration, although not in a dramatic way.
Recognition Memory. Recognition memory accuracy related similarly to both GPA
and Wonderlic, validating the measure used in this study. Notably, memory exaggeration
(RExI) related to both those cognitive ability measures at no less magnitude than did
memory accuracy. Because memory accuracy was calculated (as signal detection theory d′)
on the same data as used for memory exaggeration, here the exaggeration measure is not
completely orthogonal to the ability measure. This is because d′ collapses both ability and
exaggeration into one measure, essentially correcting for guessing. The similar magnitude
of predictive validities raises some questions: If d′ captures both memory ability and
(negative) exaggeration, yet shows effects similar to the RExI (which removes variance
related to ability), this suggests genuine ability may be less relevant than exaggeration in
this context. This is supported by the regression model (below) showing that hit rate
(suggesting correct recognition) does not contribute more than false-alarm rate (controlled
for hit rate) in predicting academic performance.
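To make the overlap concrete, here is a minimal sketch of how d′ folds hit rate and false-alarm rate into one number. The formula is the standard SDT one; the illustrative rates are hypothetical, not from the study data.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Signal detection sensitivity: z(hit rate) - z(false-alarm rate).
    It rises with correct recognition and falls with unwarranted claims,
    so ability and (negative) exaggeration are collapsed together."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Two hypothetical profiles with nearly identical d' but very different
# false-alarm (exaggeration-like) behavior:
cautious = d_prime(0.70, 0.10)
claimer = d_prime(0.95, 0.436)
```

Because both profiles yield essentially the same d′, a residualized index like the RExI is needed to separate the exaggeration component from the ability component.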
Self-Enhancement. While narcissism related to both forms of exaggeration,
impression management, self-deceptive enhancement and self-deceptive denial did not. This
may be because those constructs do not relate to exaggeration, or because those
instruments don’t capture them in ways that are relevant here. More detailed analysis
revealed that the Emmons 7-item Exploitiveness / Entitlement subscale of the NPI
(Emmons, 1987) showed similar relationships with GPA (r(528) = -.15***, 95% CI [-.23,
-.06]) and the RExI (r(531) = .23***, 95% CI [.15, .31]), suggesting that facet best
characterizes exaggeration. Also, the Ames et al. (2006) 16-item shortened version of the
NPI related to GPA (r(528) = -.17***, 95% CI [-.25, -.09]) and the RExI (r(531) = .26***,
95% CI [.18, .34]) about as well as the 40-item version, allowing for more economy in future
studies.
Incremental Validity
To disentangle the several relationships shown above, Table 4 presents a linear
regression model, with standardized β coefficients (and standard error) predicting
university GPA. Note, again, that the β for foils rate, now being controlled for reals rate, is
exactly the RExI. Note also that the partial correlations (βs) for foils and reals claiming are
larger than their zero-order correlations shown in Table 3. This indicates a mutual
suppressor effect: The “meaning” of one kind of claim depends on the amount of the other
kind of claim. This highlights the value of considering exaggeration in context and the RExI
analytic strategy to isolate it.
Table 4: Regression Model Predicting GPA from OCQ RExI in Study 2
Predictor β SE p value
OCQ Foils (RExI) -.19 .04 .001
OCQ Reals .17 .05 .003
Wonderlic .22 .04 <.001
Memory False Alarms (RExI) -.10 .04 .06
Memory Hits .07 .05 .20
Narcissism -.05 .04 .25
Impression Management .10 .04 .08
Self-Deceptive Enhancement -.04 .05 .49
Self-Deceptive Denial -.06 .06 .32
Sex (M+) -.15 .04 <.001
Native English .03 .05 .57
Western Culture -.00 .05 .99
Careless Responding .03 .04 .46
Note: Overall R2 = .16, p < .001. N = 530. OCQ: 150-item Overclaiming Questionnaire. RExI: Residualized Exaggeration Index.
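As a sketch of the analytic strategy (not the study's actual code), the RExI can be computed by residualizing foils claiming on reals claiming with ordinary least squares. The data below are synthetic; by construction, the resulting index is uncorrelated with reals claiming.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 530
reals = rng.uniform(0.3, 0.9, n)              # plausible claims (tracks ability)
foils = 0.4 * reals + rng.normal(0, 0.1, n)   # implausible claims, sharing variance

# Residualized Exaggeration Index: foils claiming with the variance
# explained by reals claiming removed.
slope, intercept = np.polyfit(reals, foils, 1)
rexi = foils - (slope * reals + intercept)

# OLS residuals are orthogonal to the predictor:
assert abs(np.corrcoef(rexi, reals)[0, 1]) < 1e-8
```

Equivalently, entering foils rate and reals rate together in one regression (as in Table 4) gives the foils coefficient the same unique variance as this residual.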
Exaggeration as Distinct Liability. This model helps clarify the relationship
between exaggeration and performance: language and culture are no longer distinct
predictors once Wonderlic general cognitive ability, memory performance, and careless
responding are controlled.
and careless responding. Neither are the self-enhancement measures (NPI or BIDR)
significant predictors in this model, given those controls. This suggests that the behavior of
exaggerating one’s knowledge, even of trivial information, predicts impairment in overall
knowledge (academic) performance that is not simply a side effect of lower cognitive ability,
weaker memory, self-enhancement or carelessness.
Note that the two memory measures, hit rate and false alarm rate, also include some
degree of exaggeration, i.e. unwarranted memory ability claims. Including them in this
model thus reduces the variance attributable to the OCQ RExI; without the memory
measures, the β for the RExI increases slightly. This model, then, is conservative, showing
the effect of knowledge exaggeration after controlling for memory exaggeration. A similar
model using memory exaggeration (without the OCQ but with the other measures), also
shows that memory exaggeration remains a uniquely significant (but weaker) predictor of
GPA. Both kinds of exaggeration predict lower academic performance beyond these
controls.
Discussion
Using more extensive and comprehensive measures, the finding in Study 1 that
knowledge exaggeration (as measured by the RExI) uniquely predicted lower academic
performance was replicated. Memory exaggeration showed a similar, albeit smaller, effect.
Narcissism showed a slight relationship with both kinds of exaggeration.
Recall that my historical review of foil claiming showed that it has been used to
Note: N = 151. RExI: Residualized Exaggeration Index. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
The two novel exaggeration measures showed a moderate correlation, suggesting some
overlap but also some distinction despite both involving similar content. This is supported
by the overstatement exaggeration not correlating as strongly with narcissism, memory
measures or age. Note that reals claiming in the overclaiming inventory related to number
correct in the overstatement test, confirming content similarity and that reals claiming
captured relevant knowledge.
Age correlated negatively with narcissism and both measures of exaggeration, while
being positively related to vocabulary score and recognition memory accuracy (Hits - False
Alarms; r(149) = .18*, 95% CI [.02, .33]), suggesting some benefits of maturity.
Discussion
These preliminary results suggest that the overclaiming items were performing as
expected and should provide a suitable resource for more efficient measures in future
studies.
From these 270 candidate items, reals were selected if claiming them correlated
r >= .40 with correct vocabulary answers and r <= −.19 with answered but incorrect
vocabulary answers, thus capturing genuine knowledge and low exaggeration. Similarly,
foils were selected if claiming them correlated r >= .40 with answered but incorrect
vocabulary answers and r <= −.20 with responding “Don’t Know” on vocabulary answers,
thus capturing exaggerated knowledge and unwillingness to admit ignorance. As expected
by the fluency heuristic, empirically selected reals had, on average, higher OLD20 than
selected foils (.75*** [.43, 1.08], t(17.43) = 4.85; d = 1.91), confirming that higher fluency
for foils is optimal.
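The selection procedure can be sketched as a correlational screen over candidate items. This is a simplified illustration with my own function name and synthetic inputs, using the thresholds quoted above.

```python
import numpy as np

def select_items(claim_matrix, correct_vocab, wrong_vocab, dont_know):
    """Screen candidate items: `claim_matrix` is participants x items
    (1 = claimed familiarity); the other arguments are per-participant
    counts from the vocabulary test."""
    reals, foils = [], []
    for j in range(claim_matrix.shape[1]):
        c = claim_matrix[:, j]
        if (np.corrcoef(c, correct_vocab)[0, 1] >= .40 and
                np.corrcoef(c, wrong_vocab)[0, 1] <= -.19):
            reals.append(j)   # tracks knowledge, shuns error
        elif (np.corrcoef(c, wrong_vocab)[0, 1] >= .40 and
                np.corrcoef(c, dont_know)[0, 1] <= -.20):
            foils.append(j)   # tracks exaggerated claims, shuns admitted ignorance
    return reals, foils
```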
This process yielded 20 reals and 10 foils for a new overclaiming inventory designated
the Vocabulary Knowledge Exaggeration (VoKE) inventory. The proportion of reals to foils
in an overclaiming test is largely a subjective choice, balancing the need for an adequate
number of foils for capturing exaggeration with the need to have an adequate number of
reals to keep the test reasonable: Too many unrecognizable items could raise suspicion or
doubt. While some researchers have used item sets of only foils (e.g. Phillips & Clancy,
1972), a common assumption in the overclaiming literature is to have at least 50% reals for
credibility. Williams et al. (2002) compared similar tests with 20% and 50% foils and found
no differences in overall claiming nor the differential between reals and foils claiming.
Study 4 will compare this new VoKE item set and the VST overstatement test against
OCQ items (as used in Study 2) for capturing knowledge exaggeration and predicting
academic performance.
Study 4: Robustness of the RExI
Study 2 showed knowledge exaggeration to be a distinct phenomenon, independently
Note: N = 710, except AEQ and PES where N = 536. RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. OCQ: 60-item Overclaiming Questionnaire. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. NPI: Narcissistic Personality Inventory. NFCS: Need For Cognition Scale. AEQ: Academic Entitlement Questionnaire. PES: Psychological Entitlement Scale. CIHS: Comprehensive Intellectual Humility Scale. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Data Merging Loss Effects. As mentioned above, sex data were intended to be
added to this survey data by linking with screening data collected by the department for all
researchers using this subject pool. Data were linked by having students generate a
(mostly) unique identification code, with similar instructions given during both data
collections. Inevitably, some codes don’t match because students don’t follow the
instructions the same way each time, resulting in some records not being linkable. For this
sample, that accounted for 12% of records gathered that had completed the study.
Comparing merged and unmerged records showed several significant relationships:
Students whose records did not merge had, on average, lower GPA (by -2.19* [-4.04, -.33],
t(109.88) = -2.33; d = -0.25, in percentage points), and higher overall exaggeration, by
.08*** [.04, .12], t(100.61) = 4.06; d = 0.51 (a unitless measure, but the effect size is
important). Non-merged records also showed lower memory and vocabulary performance,
more overconfidence and rushed responding, and had personalities with more narcissism
and academic entitlement, and less intellectual humility.
Consequently, none of that screening data is included in the analyses reported here,
including the measure of sex as control. Given that sex showed no relationship with RExI
measures in previous studies, it is unlikely that any relationship would have been found
here. This merging loss, however, suggests that the process of asking participants to
generate their own unique identification code (e.g. in order to preserve anonymity when
linking data sets) carries a cost: data are lost, and, given the systematic differences above,
data quality may be significantly compromised.
Predictive Validity
Table 6 presents zero-order correlations between study variables. Table 7, as a
redundant convenience, summarizes significant correlates with RExI measures, as well as
with all four RExIs combined with equal weighting to show aggregate effects.
Table 7: Study 4 RExI Correlates, Selected from Table 6
Note: N = 710, except Entitlement measures where N = 536. RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. OCQ: 60-item Overclaiming Questionnaire. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. Cohen’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Exaggeration. As summarized in Table 7, all four exaggeration measures behave
similarly, consistently opposing academic performance, yet with some distinction
from each other. The one exception is Need For Cognition, which does not correlate with
exaggeration on the VST. Note that memory exaggeration and VST overstatement both
correlate with foil delay measured on the overclaiming instruments, suggesting that the
cognitive allocation indicated by foil delay reflects exaggeration in general. While the
overstatement method shows weaker correlations with personality measures, it relates to
academic performance comparably with the other formats and almost not at all to the
language and culture controls (see Table 6), suggesting that an overstatement approach to
exaggeration may be “cleaner” in some ways. Recall that the VST RExI is (by design)
completely unrelated to the number correct on that test so that both measures from the
same test can be used as unique predictors. In this case, exaggeration is a stronger
predictor of academic performance than knowledge.
Personality. Study 2 showed that narcissism had a significantly negative impact on
academic performance, and we see that result replicated here, although diminished,
possibly because only 16 of the 40 items were used. Recall that Study 2 also found the
entitlement facet of the NPI to be most predictive, which is why entitlement measures were
included in this study, and we see that academic entitlement was more predictive of GPA
and exaggeration, while general entitlement was still significant.
All RExI measures also showed small but consistent relationships with personality
measures NPI, CIHS, AEQ, PES and overplacement, as well as with the rushed responding
behavioral measure. Note that all these measures also predict academic performance,
memory performance, and VST Correct (and mostly CRT Correct) in the opposite direction,
suggesting that exaggeration is a costly, maladaptive form of “self-enhancement”.
While the RExI measures vary somewhat in magnitude of association, it appears that
narcissism, entitlement, overconfidence, impatience, and lower intellectual humility confirm
a consistent personality profile of exaggeration.
Memory. Memory accuracy again relates positively to academic performance and
negatively to exaggeration. The stronger overlap for the overclaiming inventory items (OCQ
and VoKE) than for the VST overstatement test suggests more susceptibility to recognition
bias when using an overclaiming inventory.
Like the RExI derived from OCQ, VoKE and VST items, the RExI calculated on
recognition memory performance shows a similar relationship with GPA, VST Correct and
CRT Correct scores, suggesting the generalizability of the RExI: Memory exaggeration
shows similarities to knowledge exaggeration.
Careless Responding. The new measure of carelessness, rushed responding,
correlated significantly with all RExI measures, suggesting impatience is part of
exaggeration. Similar results were found when rushing was counted dichotomously as more
than 2 or 3 warnings; 55% of respondents never rushed a response, and 87% did so only
once or twice over hundreds of items.
Confirmation of Rushed Responding as Careless. Due to an error in
implementing the survey, the 18 items of the NFCS were duplicated in the spots where the
PES and AEQ items were to go for the first 174 respondents. This serendipitously created
the opportunity to use the correlation between answers of these identical item sets as a
check on careless responding, such that the higher the correlation, the more consistent, and
less careless, the response set. Item order was somewhat randomized, so the repetition was
likely not obvious. This measure of consistent responding correlated as expected with
memory accuracy (r(172) = .35***, 95% CI [.21, .47]), and with rushed responding, r(172)
= -.35***, 95% CI [-.48, -.21], similar to findings by Wood et al. (2017) that “found response
times and consistency to be routinely positively associated across inventories” (p. 458).
Despite the bias in the merged prescreen data (described above), an attention check
question in that data also validated the rushed responding measure as indicative of careless
responding, with rushing being related to failing the attention check, r(625) = .39***, 95%
CI [.33, .46].
An advantage of this time-based technique for capturing carelessness is that it does
not require adding extra “bogus” (foil) questions which might alienate or be misunderstood
by the participant, and it provides a continuous behavioral measure to use as a control for
all data rather than an arbitrary cutoff for discarding data. Here we see that rushed
responding shows significant relationships with academic performance, cognitive and
personality measures, indicating a generalized detrimental trait.
Another serendipitous error in survey preparation shone some light on the influence of
careless responding specific to the OCQ items (which show the strongest relationship to
rushed responding). Using Microsoft Excel to assemble items, the auto-increment feature
inadvertently changed the domain “20th Century Figures” into “21th”, “22th” through “35th
Century Figures”, so that these items were now effectively absurd. While this only affected
the first 35 responses, no difference in responding (reals rate, foils rate, their sum or
difference) was found.
Exaggeration Methodology. The 30 items of the VoKE did at least as well as the 60
items of the OCQ in predicting GPA, and showed slightly more internal consistency,
suggesting that VoKE items were more efficient and effective at capturing exaggeration,
with less culturally-biased content. As further validation of the item engineering approach,
VoKE reals claiming correlated well with VST correct score and reasonably with GPA (even
more than VST Correct), indicating genuine knowledge was being tapped by those items.
In contrast, reals claiming on the OCQ items did not relate to GPA, showing the academic
irrelevance of that content. We can also note that the mean of VoKE reals claiming was well
more than a standard deviation away from its boundaries and thus was not suffering from
ceiling or floor effects, i.e. item difficulty was neither too easy nor too hard.
This study also resurrected the overstatement approach by modifying a conventional
multiple-choice vocabulary test, the VST. Except for language and culture variables, where
it shows some independence, RExI from the VST shows similar patterns to RExI from the
overclaiming inventories, suggesting that, despite a very different methodology, the
exaggeration index is capturing a similar phenomenon. This contrasts with the
contradictory findings of previous literature, where overclaiming increases with ability while
overestimation (overconfidence) decreases. The RExI approach yields a consistent index.
Note, however, that the overlap between OCQ and VoKE RExI measures is twice what
it is for either with the VST RExI. This suggests some method variance worth further
exploration. As discussed above (and below), there are clearly different advantages and
disadvantages to either methodology, but it is encouraging to see that the RExI extracts
similar information from both, and all RExI measures show similar predictions of overall
GPA.
For the VST, it should be noted that, out of 60 questions, the mean (standard
deviation) of number correct was 39.86 (8.76) with a maximum of 59. It may be that
reasonable exaggeration measures are best had from overstatement tests that nobody
scores completely correct on; there needs to be some opportunity for every test taker to
either exaggerate or avoid claiming.
An important advantage of an overstatement approach is that it yields two distinct,
usable measures: ability and exaggeration. Because they are, by definition, orthogonal, their
zero-order correlations equal the standardized β coefficients when both measures are
combined to predict academic performance. While the contribution of VST knowledge to
GPA is meager, it is still significant, but more importantly, the VST RExI adds twice the
predictive power, increasing (regression model predicting GPA) R2 from .01* to .04***. If
adding RExI to everyday multiple-choice tests used in education only doubles predictive
power, this simple adjustment to standard testing procedure could have a profound effect
on academic assessment.
Additionally, adding VoKE RExI to that VST model increases R2 even further to
.07***, indicating distinct kinds of validity from the different methods, despite them both
being ostensibly based on the same domain of English vocabulary.
Finally, it is worth noting that the correlations between competence and
incompetence evidence are positive for overclaiming, but negative for overstatement (r(708)
= -.39***, 95% CI [-.45, -.33]), supporting the reasoning behind the RExI formula.
Foil Delay
Having response times for individual items allowed consideration of some cognitive
dynamics of exaggeration. Do people process foils differently than reals, and what might
that imply?
While Study 2 showed that the RExI predicts academic performance beyond cognitive
ability, it remained debatable whether this represents evidence of a process unlike typical
conceptions of cognitive ability. Ordinary problem solving takes time to get right. In this
data set, the time taken to answer CRT questions and the quality of answers given correlate
as expected: positively for correct answers (r(708) = .14***, 95% CI [.06, .21]) and
negatively for “intuitive” but incorrect answers, r(708) = -.11**, 95% CI [-.19, -.04]; i.e.
more time thinking gave better answers. It would follow, then, that claiming impossible
knowledge might be the result of shallow, hasty, inadequate processing of information. We
might expect that people who spend more time deliberating about what they do or don’t
know would exaggerate less. Does response time for overclaiming items predict amount of
exaggeration?
Overall, median response times for foils on the overclaiming inventories (OCQ and
VoKE) were slightly less than response times (in seconds) for reals, -.08* [-.16, -.003],
t(1395.38) = -2.04; d = -0.11, but overall median response times for overclaiming did not
relate to RExI on those instruments (r(708) = .05, 95% CI [-.03, .12]) nor to GPA (r(708) =
.00, 95% CI [-.07, .08]). Time spent thinking about overclaiming items in general did not
relate to exaggeration or academic performance.
Nonetheless, the amount of time spent on foils relative to reals (foil delay) tended to
increase with exaggeration amount, as did time spent on foils alone (r(708) = .15***, 95%
CI [.08, .22]), suggesting that exaggeration required more (or at least different) cognitive
effort rather than less. Foil delay also showed a slight detrimental impact on academic and
memory performance. Table 8 shows prediction of overclaiming exaggeration from the
median response time for overclaiming items in general (as control) and foil delay, showing
that higher exaggeration was characterized by (relatively) faster reals claiming and slower
foils claiming.
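Foil delay, as defined in the Table 8 note, can be computed per participant as in this minimal sketch (assuming item-level response times and a foil indicator; the names are mine):

```python
from statistics import median

def foil_delay(item_times, is_foil):
    """Median response time on foil items minus median response time on
    real items, for a single participant's overclaiming responses."""
    foil_t = [t for t, f in zip(item_times, is_foil) if f]
    real_t = [t for t, f in zip(item_times, is_foil) if not f]
    return median(foil_t) - median(real_t)
```

A positive value means relatively slower responses to foils, which Table 8 shows characterizes higher exaggeration.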
Table 8: Study 4 Overclaiming Response Times Predicting RExI.
Predictor β SE p value
Overclaiming Item Response Time -.01 .04 .72
Foil Delay .35 .04 <.001
Note: Overall R2 = .12, p < .001. N = 710. RExI: Residualized Exaggeration Index. Foil Delay = Median Foils Time - Median Reals Time.
This is consistent with a transcranial magnetic stimulation study (Amati et al., 2010)
that reported that inhibiting medial prefrontal cortex (MPFC) activity reduced both
response time and foil claiming. Page 269 of that study noted that “regions of the MPFC
are found to be particularly important for comparing the self to others.” Exaggeration
appears to be less about amount of mental processing than about allocation, less about
degree and more about kind of thinking.
Incremental Validity
Study 2 showed how exaggeration uniquely predicted academic performance beyond
cognitive ability, carelessness and demographic variables. Do these other exaggeration
measures do the same, and are they distinct from each other?
The shorter OCQ RExI used in this study no longer significantly relates to academic
performance once memory performance is controlled, which may be due to the
inappropriateness of OCQ content or to the fact that OCQ items were reused for the memory test.
However, the other three exaggeration measures, VoKE overclaiming, VST overstatement,
and recognition memory, all uniquely predict GPA after controlling for CRT cognitive
ability, rushing, and demographics. Keeping memory exaggeration (and performance in
general) as a control, Table 9 shows that the shorter overclaiming inventory of the VoKE
captures exaggeration as uniquely predicting academic performance, beyond other study
measures. Similarly, Table 10 shows that the overstatement test based on the
multiple-choice VST also predicts GPA uniquely. Despite the relatively low correlation
between these two measures of exaggeration, based on different methods, they are both
behaving similarly. As before, in every model, the β for incompetence evidence is exactly
the RExI, once controlling for evidence of competence.
Table 9: Regression Model Predicting GPA from VoKE RExI in Study 4
While the RExI can be applied to an overstatement test to assess both ability and
exaggeration as uncorrelated measures, there exists another approach that purports to do
the same thing. The Overclaiming Technique (OCT) is presented as “Measuring
Self-Enhancement Independent of Ability” (Paulhus et al., 2003, p. 809), framing
self-enhancement as synonymous with exaggeration: “The OCT was designed to measure
knowledge exaggeration and knowledge accuracy simultaneously and independently”
(Paulhus, 2012, p. 151). How is this different from the RExI?
The OCT begins with the same data collection used by Raubenheimer (1925), labeled
“overclaiming” by Phillips and Clancy (1972), i.e. soliciting claims of knowledge or
familiarity with a variety of items, some of which are reals, some foils. The unique
contribution of the OCT is in using Signal Detection Theory (SDT) for analysis (e.g.
Macmillan, 2002). The portion of reals claimed (reals rate, or hit rate in SDT terms) and
the portion of foils claimed (foils rate, or false-alarm rate) are combined to create two new
indices: accuracy (the excess of reals rate over foils rate, also called sensitivity in SDT) and
bias (the average of the two)19.
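The OCT composites can be written down directly. Here is a sketch of both the “common sense” and z-transformed SDT versions (the function name is mine; bias uses the average of the two rates, as in the text, which merely rescales the sum):

```python
from statistics import NormalDist

def oct_indices(reals_rate, foils_rate, sdt=False):
    """Accuracy and bias from reals rate R and foils rate F.
    Common sense: accuracy = R - F,     bias = (R + F) / 2.
    SDT:          d' = z(R) - z(F),     -c = (z(R) + z(F)) / 2."""
    z = NormalDist().inv_cdf
    r, f = (z(reals_rate), z(foils_rate)) if sdt else (reals_rate, foils_rate)
    return r - f, (r + f) / 2
```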
The paper that introduced the OCT (Paulhus et al., 2003, which used a collection of
general knowledge reals and foils called the OCQ) first references the definition of
overclaiming used by Phillips and Clancy (1972): “Over-claiming is the tendency to claim
knowledge about nonexistent items” (p. 891)20, in other words, foils rate. The next page,
however, equates the term overclaiming with the bias definition described above:
“over-claiming was operationalized with the OCQ bias index” (p. 892), i.e. averaged reals
rate and foils rate, giving the term a very different meaning: not just unwarranted claims,
19 This simple approach, using difference and average, is called the “common sense” approach (Paulhus, 2012, p. 154), which closely approximates the traditional SDT measures of d′ and −c, where reals rate and foils rate are z-transformed before combining. Both approaches produce very similar results.
20 The terms “over-claiming” and “overclaiming” are used interchangeably by that author and others, but the non-hyphenated version is a distinct keyword and search term, and so used here.
but any claims. For reporting results, however, “predictions with the OCQ bias measure
are always assessed after controlling for the OCQ accuracy score. Thus, discriminant
validity with respect to accurate knowledge is built into the calculation of the over-claiming
index.” (p. 899). That index (bias controlled for accuracy, or residualized bias) is meant to
capture exaggeration (self-enhancement) independent of knowledge (ability). Residualized
bias will necessarily be uncorrelated with the accuracy index, but does it capture
exaggeration independent of ability?
Let us examine the mathematics involved. Let R be reals rate, F foils rate, A
accuracy, and B bias: A = R − F and B = R + F. Plotting F against R and then B against A
will show that creating difference and sum composites simply rotates the variable space by
45 degrees. What does this mean conceptually? As discussed earlier, R represents plausible,
self-reported ability (possibly with some exaggeration) that approximates actual ability, as
shown by P. L. Ackerman and Ellingsen (2014). F represents implausible ability claims,
but is likely related to genuine ability, as shown by P. L. Ackerman and Ellingsen (2014)
and Atir et al. (2015). B represents the indiscriminate claiming of real or foil items, which
is simply response bias, making no distinction between plausible and implausible claims.
Interpreting this as exaggeration would be comparable to asking fishermen the size of their
catch, and assuming everything they say is exaggeration regardless of what was caught.
In contrast, A represents plausible claims compensated for implausible claims, which
could be interpreted as a corrected self-estimate of ability. This idea has some validation:
Paulhus and Dubois (2014) demonstrated that this measure on an overclaiming inventory
was comparable to multiple-choice and short-answer quiz formats in predicting
undergraduate course grades. This suggests that foil claiming negatively predicts ability,
but, like bias, accuracy obfuscates any distinction between ability and exaggeration, since it
combines both reals and (reversed) foils claiming with equal weight. In essence, this is a
simple form of correction for guessing, collapsing two dimensions into one.
What about bias controlled for accuracy (residualized bias) which the OCT
recommends as the index of exaggeration or self-enhancement? Let BRes be residualized
bias. The SDT model that is the basis for OCT assumes equal variance for R and F . This
would make difference and sum composites A and B uncorrelated21, i.e. Cor(A,B) = 0. If
SDT assumptions are met, Cor(A,B) = 0 so controlling B for A has no effect and
BRes = B, meaning the OCT exaggeration index is no different from bias, i.e. indiscriminate
claiming.
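The consequence of the equal-variance assumption is easy to demonstrate numerically. A minimal sketch with simulated data (illustrative only): when Var(R) = Var(F), the sample Cov(A, B) is near zero, so residualizing B on A leaves B essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Correlated reals and foils rates with EQUAL variance, per the
# SDT assumption underlying the OCT.
shared = rng.normal(size=n)
R = 0.6 * shared + 0.8 * rng.normal(size=n)  # Var(R) = 1
F = 0.6 * shared + 0.8 * rng.normal(size=n)  # Var(F) = 1

A, B = R - F, R + F

# Cov(A, B) = Var(R) - Var(F) = 0 under equal variance...
cov_ab = np.cov(A, B)[0, 1]
print(cov_ab)  # ~0

# ...so regressing B on A removes essentially nothing: the
# residualized bias is just B again (up to mean-centering).
slope = cov_ab / np.var(A)
B_res = B - slope * A
print(np.corrcoef(B, B_res)[0, 1])  # ~1
```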
In typical overclaiming research, however, the SDT assumption of equal variance may not hold. Even then, nothing in the OCT ensures that accuracy and bias will be substantially correlated, i.e., that BRes will differ meaningfully from B. There is thus no assurance that the OCT measures exaggeration (self-enhancement) distinct from knowledge (ability), contrary to the declared design goals.
This lack of distinction may explain the empirical failures of the OCT, as a number of researchers have reported that the OCT does not measure what it claims to:
Bensch et al. (2017) examined self-enhancement broadly as “positivity bias”, including
measures of narcissism, self-deceptive enhancement, impression management,
overconfidence, crystallized intelligence, a variety of measures of socially-desirable
responding, and the five-factor model of personality. A factor analysis of all these measures found that neither OCT bias nor residualized bias loaded on any of the six factors extracted, with the authors concluding that whatever the OCT measured was “fully independent of personality and crystallized intelligence” (p. 12).
Using the HEXACO measure of Openness (K. Lee & Ashton, 2004), Dunlop et al. (2016) found that OCT accuracy, bias, and residualized bias were all significantly related to
Openness (around r = .30), concluding that “overclaiming can be understood as a result of
knowledge accumulated through a general proclivity for cognitive and aesthetic exploration
(i.e., Openness)” (p. 1).
Less flattering still, Ludeke and Makransky (2016), using the OCQ as in the paper
21 Cov(X + Y, X − Y) = E[((X − µX) + (Y − µY))((X − µX) − (Y − µY))] = E[(X − µX)² − (Y − µY)²] = Var(X) − Var(Y), which is 0 when Var(X) = Var(Y)
introducing the OCT (and so both the same items and the same analytic technique), noted
“Using a sample of 704 adult community members, we found minimal support for the OCQ
as an assessment of misrepresentation. . . . OCQ bias measures were instead consistently
and sometimes even highly related to measures of careless responding.” (p. 1).
The OCT “exaggeration index” will therefore mostly represent indiscriminate claiming, that is, general response bias mixing ability and exaggeration, which could explain the findings above. The OCT does not capture exaggeration as defined in this paper.