THEORETICAL REVIEW
Underpowered samples, false negatives, and unconscious
learning
Miguel A. Vadillo1 & Emmanouil Konstantinidis2 & David
R. Shanks3
Published online: 30 June 2015 © The Author(s) 2015. This article
is published with open access at Springerlink.com
Abstract The scientific community has witnessed growing concern
about the high rate of false positives and unreliable results within
the psychological literature, but the harmful impact of false
negatives has been largely ignored. False negatives are particularly
concerning in research areas where demonstrating the absence of an
effect is crucial, such as studies of unconscious or implicit
processing. Research on implicit processes seeks evidence of
above-chance performance on some implicit behavioral measure at the
same time as chance-level performance (that is, a null result) on
an explicit measure of awareness. A systematic review of 73
studies of contextual cuing, a popular implicit learning paradigm,
involving 181 statistical analyses of awareness tests, reveals how
underpowered studies can lead to failure to reject a false null
hypothesis. Among the studies that reported sufficient information,
the meta-analytic effect size across awareness tests was dz = 0.31
(95 % CI 0.24–0.37), showing that participants’ learning in these
experiments was conscious. The unusually large number of positive
results in this literature cannot be explained by selective
publication. Instead, our analyses demonstrate that these tests are
typically insensitive and underpowered to detect medium to small,
but true, effects in awareness tests. These findings challenge
a widespread and theoretically important claim about the extent of
unconscious human cognition.
Keywords Contextual cuing . False negatives . Implicit learning .
Null hypothesis significance testing . Statistical power
Research practices in the behavioral sciences are under
scrutiny to an extent that would have been inconceivable 10
years ago. Much of the debate has concerned habits (such as
“p-hacking” and the file-drawer effect) which can boost the
prevalence of false positives in the published literature
(Ioannidis, Munafò, Fusar-Poli, Nosek, & David, 2014; Simmons,
Nelson, & Simonsohn, 2011). Much less attention has been
paid to the harmful consequences of false negatives, namely
reports which purport to present evidence supporting false null
hypotheses (Fiedler, Kutzner, & Krueger, 2012). Via
meta-analysis of a particular sub-literature within the field of
implicit learning, we demonstrate how the use of
underpowered experiments and Null Hypothesis Significance
Testing (NHST) can combine to encourage the reporting of false
negatives and consequent theoretical distortion.
When a researcher obtains a result that is significant at p
< .05 and consequently reports that the null hypothesis is
rejected, then of course we have learned something: that the
probability of obtaining data at least as extreme as those that
were observed, if the null hypothesis is true, is less than 5 %.
Many would argue that we have not learned very much; for example,
we have not learned that the null hypothesis is false or unlikely
(Dienes, 2011; Fidler & Loftus, 2009). In contrast, when the
researcher finds a result that is not significant (p > .05) and
consequently concludes that the null hypothesis cannot be
rejected, from the point of view of NHST we have
* Miguel A. Vadillo [email protected]
Emmanouil Konstantinidis [email protected]
David R. Shanks [email protected]
1 Primary Care and Public Health Sciences, King’s College
London, Capital House, 42 Weston St., London SE1 3QD, UK
2 Department of Social and Decision Sciences, Carnegie
Mellon University, 5000 Forbes Avenue, Pittsburgh, PA BP 208,
USA
3 Division of Psychology and Language Sciences, University
College London, 26 Bedford Way, London WC1H 0AH, UK
Psychon Bull Rev (2016) 23:87–102
DOI 10.3758/s13423-015-0892-6
learned literally nothing. We have not learned that the
experimental hypothesis is false (the experiment may be
underpowered) nor have we learned that the null hypothesis is
true. Thus there is a sense in which any conclusions drawn from
failures to reject the null hypothesis are intrinsically more
problematic than those drawn from rejections of the null.
Underpowered studies are a major contributing factor to the
reporting of both false positives and false negatives (Button et
al., 2013). The power of typical studies in psychology, combined
with typical effect sizes, indicates that the literature contains
far more significant results than it should, suggesting that it is
therefore biased in favor of significant findings (false positives)
rejecting true null hypotheses (Francis, 2012). But low power might
also contribute to the reporting of false negatives, when authors
wish to demonstrate the absence of some effect. For instance, the
absence of judgmental biases outside the laboratory (e.g., List,
2002), the absence of gender differences in math performance (e.g.,
Hyde, Lindberg, Linn, Ellis, & Williams, 2008), the absence of
differences between studies run in the laboratory versus online
(McGraw, Tew, & Williams, 2000), the absence of awareness in
studies of implicit processing, and many other such influential
claims depend on null effects which could potentially be false
negatives if based on low-powered studies. NHST provides further
impetus, in that its dichotomous nature
(significant/nonsignificant at the arbitrary p = .05 cliff-edge) and
focus on rejection of the null hypothesis encourage both researchers
and students to interpret failure to reject the null hypothesis as
implying that the null hypothesis is true (Hoekstra, Finch, Kiers,
& Johnson, 2006). As Fidler and Loftus (2009) note, “this kind
of almost irresistible logical slippage can, and often does, lead to
all manner of interpretational mischief later on” (p. 29).
Confidence intervals (CIs) have an important role to play in the
interpretation of null results (but see Hoekstra, Morey, Rouder,
& Wagenmakers, 2014). If such intervals include zero but are
narrow, then it can safely be concluded that the effect in question
is either small or negligible in magnitude (though of course it
cannot be concluded that the effect is non-existent). But if the
intervals are wide, then little confidence can be placed on the null
result and a motivation is provided for running larger sample sizes.
Equally important is the role that meta-analysis can play in
reaching valid conclusions across bodies of research featuring null
results. Even though individual underpowered studies may fail to
reject the null hypothesis, meta-analysis across a set of such
studies may permit modest but real effects to be detected.
In the present research we illustrate these issues via a
systematic review of a large body of studies within the field of
implicit learning. These studies depend crucially on null results in
awareness checks, because implicit learning by definition involves
mental processing in the absence of awareness. As we show, the
majority of these studies are underpowered to detect small but real
awareness effects. We illustrate how the computation of CIs (and
their graphic depiction) and meta-analysis can lead to radically
different conclusions from those reached in the individual studies
themselves. Our results challenge a theoretically crucial conclusion
drawn from this body of research.
Null results as a crucial feature of research on implicit
processing
Research on implicit processing provides an excellent example
to illustrate the consequences of overreliance on NHST to gather
support for the null hypothesis. In a typical experiment on implicit
processing, participants’ performance on some task is above a
baseline level, but this behavioral outcome is seemingly not
accompanied by any awareness of the environmental cues or
regularities that gave rise to the behavior. For instance, in
research on subliminal perception, some form of behavior is primed
by a briefly-flashed stimulus of which participants are unaware
(e.g., Dehaene et al., 1998); research in neuropsychology suggests
that perception, memory, and choices can be influenced by cues
unconsciously in various patient populations (Bechara et al.,
1995; Cohen & Squire, 1980; Goodale, Milner, Jakobson, &
Carey, 1991); in research on behavior priming, some behavioral
response such as voting intentions (Hassin, Ferguson, Shidlovski,
& Gross, 2007), walking speed (Bargh, Chen, & Burrows,
1996), or answering general knowledge questions (Dijksterhuis &
van Knippenberg, 1998) is influenced by a subtle cue without
participants being aware of this influence; research on implicit
moral judgments, emotions, and attitudes proposes that behaviors in
each of these domains can again be influenced by environmental cues
unconsciously (Bargh, 2006; Williams & Bargh, 2008), and so on.
Usually the absence of awareness is inferred from a null result in
an awareness test (Dienes, 2015). For example, participants might
fail to detect stimuli in a forced-choice test or they might perform
at chance when asked to exert some control over the cue’s
influence on their behavior.
However, as mentioned above, null results in NHST are inherently
ambiguous. They can mean either that the null hypothesis is true
or that there is insufficient evidence to reject it. In the context
of implicit processing experiments, this means that when an
awareness test yields a non-significant result, this can indicate
either that participants were really unconscious of the cue or that
the awareness test is inadequate to permit a firm conclusion about
whether participants were aware or not. Unfortunately, the
statistical analyses reported in many implicit processing
experiments are insufficient to test which of these two
interpretations is more plausible. A Bayesian approach to
statistical analysis might allow researchers to quantify to what
extent null results reflect a real absence of effects or a lack of
statistical sensitivity (Dienes, 2015; Rouder, Speckman, Sun,
Morey, & Iverson, 2009). However, these Bayesian analyses
are seldom conducted (or reported) on data from awareness
tests. Furthermore, researchers sometimes report so little
information in their statistical analyses that it is also
difficult for other researchers to compute these Bayesian analyses
on reported data.
This problem is clearly illustrated by current research in a
popular implicit learning paradigm known as contextual cuing (Chun
& Jiang, 1998; Chun & Turk-Browne, 2008), which is the focus
of the systematic review conducted here. In a typical contextual
cuing experiment, participants are shown search displays containing
a T-shaped target among a number of L-shaped distractors (see Fig.
1). The target is always rotated, so that the stem of the T points
either to the left or to the right. Participants are instructed to
find the T as fast as possible and report its orientation using two
different keys. The search displays presented in half of the trials
are repeated several times across training, while the remaining
search displays are randomly generated in each trial, although
participants are not informed about this manipulation. Across
training blocks, participants’ reaction times (RTs) decrease
systematically as they become familiar with the task. But, most
importantly, this decrease is larger for repeated than for random
search displays, indicating that across trials participants
eventually learn something specific about the repeating patterns.
That is to say, some mental representation is acquired of repeating
displays which allows attention to be more and more rapidly
deployed to the location where the target will be found (Chun
& Jiang, 1998). This learning effect on RTs is highly robust and
indeed is obtained in the vast majority of contextual cuing
experiments.
Usually, the implicitness of this learning is assessed by means
of a recognition test conducted at the end of the experiment.
Participants are shown all the repeating patterns intermixed with
new random patterns and are asked to report whether they have
already seen each of those patterns. The learning effect found
during the training phase is considered implicit if the number of
patterns correctly recognized as old in the recognition test
(hits) is no larger than the number of random patterns wrongly
classified as old (false alarms), or if participants’ performance is
at chance (50 % correct) overall. Another popular test used to
assess whether learning was implicit is to ask participants to guess
where the target would be in a search display where the target
has been replaced by an additional distractor. If they perform at
chance in this task, their learning about the repeating search
configurations is again considered implicit. In both procedures,
learning is assumed to be unconscious if a statistical comparison
yields a null result.
However, as explained above, the statistical analyses typically
conducted in these studies do not allow one to conclude that the
null effects observed in the awareness tests reflect truly random
performance. Meta-analysis across the whole body of experiments
published in this domain permits us to check whether these null
results reflect a real absence of awareness. Based on the relative
proportions of significant results or on the overall trends of mean
performance in awareness tests it is possible to measure to what
extent the prevalence of null results reveals a genuine absence of
awareness or merely insensitivity of statistical data in individual
studies.
Proportion and distribution of significant results
To assess to what extent the null results observed in these
analyses reflect a real absence of awareness or a mere lack of
statistical sensitivity, we conducted a systematic review of the
literature. As explained in Appendix 1, we included in our analyses
all the experiments that found spatial contextual cueing and that
included either of the two
Fig. 1 Panel A shows a sequence of search displays as used in
standard contextual cuing experiments. Participants are instructed
to search for a T-shaped target among a series of L-shaped
distractors. Some search displays are regularly repeated during
training, whilst others are new, unrepeated (random) displays. Panel
B shows the typical pattern of results: Participants become faster
at finding the target among the distractors in repeated displays
awareness tests explained above (i.e., a recognition test or a
target guessing test).
By definition, research on implicit processing assumes that
participants lack awareness of the relevant regularity, and
accordingly 78.5 % of the awareness tests yielded nonsignificant (p
> .05) differences. However, 21.5 % of the awareness tests did
yield a significant difference, well above (binomial p < .001)
the theoretical 5 % of false positives that should be observed if
the one-tailed null hypothesis is true with a standard α = .05.
This proportion of significant results becomes particularly striking
if we take into account that most of these statistical contrasts
actually relied on two-tailed t-tests, for which the theoretical
proportion of false positives would be just 2.5 %. The proportion of
significant (p < .05) or marginally significant (.05 < p
< .10) results was 27.6 %, again above the theoretical 10 % that
would be predicted on the null hypothesis given a one-tailed test,
binomial p < .001.
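The binomial comparisons above can be reproduced approximately with an exact upper-tail sum. The sketch below is illustrative only: the counts of 39 and 50 tests are assumptions obtained by applying the reported percentages (21.5 % and 27.6 %) to the 181 analyses mentioned in the abstract, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_TESTS = 181            # awareness analyses mentioned in the abstract
N_SIGNIFICANT = 39       # assumption: 21.5 % of 181, rounded
N_SIG_OR_MARGINAL = 50   # assumption: 27.6 % of 181, rounded

# Probability of seeing at least this many significant tests if only
# the nominal false-positive rate were operating.
p_significant = binomial_upper_tail(N_SIGNIFICANT, N_TESTS, 0.05)
p_marginal = binomial_upper_tail(N_SIG_OR_MARGINAL, N_TESTS, 0.10)

print(f"P(>= {N_SIGNIFICANT} of {N_TESTS} | rate .05) = {p_significant:.2e}")
print(f"P(>= {N_SIG_OR_MARGINAL} of {N_TESTS} | rate .10) = {p_marginal:.2e}")
```

Under these assumed counts both tail probabilities fall far below .001, in line with the binomial p-values reported in the text.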
Regardless of the results of the inferential analyses, we also
coded for each study whether participants performed numerically
above chance (+1), exactly at chance (0), or below chance (-1)
(see Appendix 1 for further details). The mean value of this
direction score across experiments was 0.53 (95 % CI 0.41–0.66),
far above the theoretical 0 that should be observed under the null
hypothesis, t(165) = 8.468, p < .001, dz = 0.66. The proportion of
experiments scoring 1 was 66.9 %, significantly above 50 % in a
binomial test, p < .001. Interestingly, within our database, the
vast majority of experiments that reported a significant result had
direction scores of 1. A logistic regression confirmed that there
was a relationship between the direction scores and the probability
of a significant result in the awareness tests, B = 1.37, SEB =
0.483, Wald = 8.114, Odds ratio = 3.95, Model χ2(1) = 16.11,
p < .001. In other words, significant results were far more
likely to be associated with numerically above- than
below-chance performance in the awareness test.
Overall, these results are not consistent with the idea that the
null hypothesis reflects the true distribution of results in the
awareness tests. On a true null hypothesis (hits = false alarms in
the awareness test, or performance equal to chance), only around 5 %
of studies should yield a significant result, and the number of
effects in the “explicit” direction should equal those in the wrong
direction. There should be no tendency for significant awareness
results to be more prevalent in one direction than the other.
Is there publication bias in the results of awareness tests?
However, it is still possible that the null hypothesis is true and
that the unusually large number of significant results reflects a
bias favoring the publication of significant results versus
non-significant results. Even if participants perform at chance in
the awareness test, occasionally the statistical analyses will
yield a significant result by mere chance. If researchers or
journals are biased towards publishing significant results, then
the proportion of these in the published literature will exceed
the theoretical proportion of false alarms that would be expected
under the null hypothesis. Although this hypothesis might
appear counterintuitive given that truly implicit learning
requires null awareness, it is important to evaluate this
possibility within the studies included in the meta-analysis.
Deviations from chance are more likely to occur in low quality
experiments where the measurement error is larger (e.g., smaller
samples or unreliable methods). That is to say, under the null
hypothesis, large and significant effect sizes are more likely to be
obtained in low- than in high-powered experiments. In
meta-analyses, this trend is usually represented by means of a
funnel plot representing the relationship between effect size and
the measurement error. Unfortunately, it is difficult to draw a
funnel plot with the information available in our dataset because
many experiments did not report sufficient statistical information
to compute effect sizes. For instance, standard errors and exact
t-values were reported only in roughly half of the analyses.
However, if publication bias were responsible for the unusually
large number of significant results, then one would expect to find
more significant results in low quality studies.
An important determinant of the quality of an experiment is the
number of trials on which its measurement is based. The impact of
random variance on the results can be minimized if a dependent
variable is based on a larger number of observations. In the case
of contextual cuing experiments, a large number of trials in the
awareness test should yield less variable results and, therefore,
a more precise measurement of awareness. Figure 2A shows the
relationship between the number of trials and statistical
significance. Dark bars represent significant (black) or
marginally significant (dark red) results. The height of each bar
represents the number of trials in the awareness test. As can be
seen, if anything, the pattern of results is the opposite of what
would be predicted on the basis of a publication bias: Null results
are more prevalent among experiments including a small number of
awareness trials. A logistic regression confirmed that the
probability of finding a significant result increases as the number
of trials increases, B = 0.024, SEB = 0.009, Wald = 7.238, Odds
ratio = 1.024, Model χ2(1) = 8.068, p = .005. Smyth and Shanks
(2008) observed the same pattern in a single experiment: An
awareness measure which was not significantly different from
chance when based on 24 trials became significant when
based on 96. The present results show that this pattern holds
in aggregate across published studies.
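To make the size of this logistic regression coefficient easier to read, the following sketch converts a coefficient B into an odds ratio, optionally for a change of several units in the predictor. It is an illustration only; the function names are hypothetical, and because the inputs are the rounded values reported in the text, recomputed Wald statistics will be close to, but not identical with, the published ones.

```python
from math import exp


def odds_ratio(b: float, units: float = 1.0) -> float:
    """Multiplicative change in the odds for a `units` increase in the predictor."""
    return exp(b * units)


def wald_statistic(b: float, se: float) -> float:
    """Wald chi-square for a single coefficient: (B / SE) squared."""
    return (b / se) ** 2


B_TRIALS = 0.024  # coefficient for number of awareness trials, as reported

# Odds ratio per additional trial, and for a hypothetical increase of 10 trials.
print(f"per trial:     {odds_ratio(B_TRIALS):.3f}")
print(f"per 10 trials: {odds_ratio(B_TRIALS, units=10):.3f}")
print(f"Wald from rounded B and SE: {wald_statistic(B_TRIALS, 0.009):.2f}")
```

Read this way, a coefficient of 0.024 per trial is small for a single trial but compounds across the tens of trials that separate short and long awareness tests.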
Sample size, defined as the number of participants, is another
important determinant of the methodological quality of an
experiment. Studies conducted on larger samples are more likely to
yield results that converge to the true effect size.
Figure 2B shows the relationship between sample size and
statistical significance in contextual cuing experiments. The
height of each bar represents the sample size of the study. As in
the case of the previous analysis, a logistic regression suggests
that the probability of finding a significant result grows with
sample size, B = 0.024, SEB = 0.013, Wald = 3.247, Odds ratio =
1.024, Model χ2(1) = 3.128, p = .077. Although only marginally
significant, this trend goes in the opposite direction from the one
predicted if the high number of positive results were due to a
publication bias favoring significant results over non-significant
ones.
A defender of the implicit nature of contextual cuing could argue
that awareness truly is absent in these studies, and that
publication bias explains the prevalence of significant results
in the meta-analysis. The results above show that this hypothesis
is implausible and that the prevalence is not attributable to
publication bias. However, they also show something else of
importance, namely that many of the reported null results are
likely to be false negatives arising from underpowered studies.
As the quality of the measurement improves in terms of sample
size and number of observations, it becomes appreciably more
likely that the study will yield evidence of awareness.
Effect sizes and statistical power
Overall, these analyses suggest that there is a true positive
effect in the awareness tests employed in the studies included
in the meta-analysis, and that failures to reach statistical
significance are largely due to the small number of observations
registered in most experiments, both in terms of sample size
and in the number of trials included in the awareness test.
Additional evidence for this interpretation can be obtained by
exploring the typical size of the effect found in the awareness
tests.
In many of the studies included in the present analyses, the
authors failed to report sufficient information to compute the
effect size of the results of the awareness test. Very frequently,
the only piece of information available was that p-values were
larger than .05, without additional details about t- or F-values.
However, we were able to compute effect sizes for 96 of the
statistical contrasts included in our data set. Based on sample
sizes, reported t-values or, alternatively, one-degree-of-freedom
F-statistics we were able to compute Cohen’s dz effect size scores.
We coded dz scores as positive if the outcome went in the
“explicit” direction (e.g., hit rate > false-alarm rate,
regardless of significance) and as negative if the pattern of
results was the opposite. Given the significant heterogeneity
of effect sizes, Q(95) = 160.78, p < .001, we conducted a
meta-analysis on dz scores using a random effects model.
The meta-analytic mean dz was 0.31 (95 % CI 0.24–0.37).
Interestingly, although small, the meta-analytic effect size
remains significantly greater than zero even if one actively
removes from the meta-analysis all the statistical contrasts that
turned out to be individually significant, dz = 0.16 (95 % CI
0.10–0.22). Thus aggregate awareness is evident even amongst
those studies that obtained no significant awareness and were on
that basis interpreted as showing implicit learning. This speaks
against the possibility that the studies in the meta-analysis
represent two quite distinct sub-groups, one in which learning is
truly conscious and one in which it is truly unconscious. Even when
the truly conscious studies are removed, the remainder yield
above-chance awareness.
It is important to acknowledge that the real effect size might be
smaller than our meta-analytic estimate of dz = 0.31. The t- and
F-values were less likely to be reported when awareness tests
failed to reach statistical significance, because in many of
those cases the authors simply noted that p-values were larger
than .05. Even so, assuming that 0.31 is approximately the
true dz of the typical awareness test, it is possible to compute
what would be the required sample size to achieve a specific
level of statistical power. Using G*Power 3 (Faul, Erdfelder,
Lang, & Buchner, 2007) we found that, assuming a dz of .31, a
sample size of at least 66 participants would be needed to
achieve statistical power of .80 in a one-tailed paired-samples
t-test. For the more frequent two-tailed t-test, the figure goes
up to 84. But recall that, as just mentioned, 0.31 might
overestimate the real effect size.
Most interestingly, the median N of all the contrasts included
in the meta-analysis (also including the ones for which dz could not
be calculated) was 16. The statistical power of a sample of 16
participants to obtain a significant two-tailed effect given a dz of
0.31 is around .21. Note that this level of statistical power is
virtually identical to the proportion of significant results (21.5
%) observed in our dataset. Given the
Fig. 2 Contextual cuing experiments sorted by the number of
trials of the awareness test (top panel) or by sample size (lower
panel). Black bars denote statistical contrasts with significant
results. Red bars denote statistical contrasts with marginally
significant results
small size of the effect found in the typical awareness test,
studies with the average sample sizes used in this literature are
seriously underpowered. At the same time, the distribution of
significant and nonsignificant results is close to what would be
expected if the awareness results in individual studies are sampled
from a distribution with a mean effect size of around .30.
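The power of a 16-participant study can be checked by brute force. The following Monte Carlo sketch is only illustrative: it assumes normally distributed difference scores with unit standard deviation, and it hardcodes the tabled two-tailed critical value for df = 15 (2.131), so it applies to N = 16 only. The function name and the number of simulated experiments are arbitrary choices.

```python
import random
from math import sqrt

T_CRIT_DF15 = 2.131  # tabled two-tailed critical t, alpha = .05, df = 15


def simulated_power(dz: float, n: int, t_crit: float,
                    n_sims: int = 20000, seed: int = 1) -> float:
    """Share of simulated paired t-tests with |t| above the critical value."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        # Assumption: difference scores ~ Normal(mean = dz, sd = 1).
        diffs = [rng.gauss(dz, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t = mean / sqrt(var / n)
        if abs(t) > t_crit:
            significant += 1
    return significant / n_sims


power_n16 = simulated_power(dz=0.31, n=16, t_crit=T_CRIT_DF15)
print(f"simulated power, N = 16, dz = 0.31: {power_n16:.2f}")
```

Under these assumptions roughly one simulated experiment in five reaches significance, which is the pattern the text describes for the published awareness tests.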
Effect size in implicit versus explicit measures
It might be countered that this effect size in the awareness test
is far too small to account for the usually large contextual
cueing effect found in these experiments, as the typical
contextual cueing experiment yields effect sizes well above dz = 1
on the implicit RT measure. If participants had conscious access
to the representations learned in contextual cuing, why
should this knowledge yield larger effects when assessed by
means of visual search than when measured by means of an
awareness test? This concern neglects the fact that contextual
cuing and awareness are measured with radically different
procedures. Even if they were measuring exactly the same
memory trace, the differences between the procedures are so
numerous that it would be naïve to expect the same effect size
in both of them. Just to mention a clear difference, contextual
cuing is traditionally assessed by gathering reaction times
from hundreds of trials (usually more than 500 across the
experiment). In contrast, awareness is assessed by means of
just a few discrete responses. As can be seen in Fig. 2A, the
number of trials rarely goes beyond 24 or 40. One cannot
expect to find the same precision in a dependent variable
based on a few observations of a discrete response as in one
based on hundreds of observations of a continuous measure,
even if those two dependent variables are measuring exactly
the same latent variable.
In fact, when other constraints are taken into account, a small
effect size is exactly what one would expect to find in any measure
of contextual cuing that is not based on a very large number of
observations. The available evidence shows that the faster reaction
times found in repeated patterns are usually attributable to a
small number of search displays (Schlagbauer, Müller, Zehetleitner,
& Geyer, 2012; Smyth & Shanks, 2008). In other words,
participants seem to learn very little or nothing about most of the
search displays. Furthermore, it is also known that even for the
search displays that elicit some learning, participants do not
seem to acquire detailed information about all the elements in the
search display. Instead, they seem to learn something only about the
two or three distractors that happen to be closest to the target
(Brady & Chun, 2007; Olson & Chun, 2002). Trying to detect
these fragmentary memory traces in a brief recognition test, where
each pattern is only presented once, is like finding a needle in a
haystack. It is hardly surprising that the resulting effects are
small.
To further explore how small these effects can be, we conducted
a simulation of the results which one could expect given these
constraints. In a typical contextual cuing experiment,
participants are exposed to 12 repeated patterns and 12 random
patterns. In our simulation we assumed that participants would
only be able to recognize one, two, or three of the 12 repeated
patterns (for which they would therefore have a hit rate of 1.0) and
that they would guess randomly when presented with any other
pattern (either the 9–11 remaining repeated patterns or the 12
random patterns). Figure 3 shows the results of a simulation based
on 1,000 simulated participants. As can be seen, the difference
between the aggregate hit and false alarm rates is quite small in
all cases. The tiny error bars shown in Fig. 3 refer to the standard
error of the means across the 1,000 simulated participants. Using
this small amount of sampling error as a yardstick, the Cohen’s d
for the difference between hit rate and false alarm rate is only
0.44 for the case in which participants learn only two patterns.
Even under the assumption that participants learn about three
patterns it does not reach the conventional level for a large
effect. It is not difficult to see that with just a small amount of
additional measurement error, the effect size of these differences
will be reduced to levels very similar to those found in our
meta-analysis. That is to say, the small meta-analytic effect size
is exactly what one would expect in a recognition test assuming that
participants can only recognize correctly a couple of repeated
patterns and that they guess whenever they are asked to identify a
pattern that they do not recognize. The assumption that learning is
based on only a small number of patterns is entirely consistent with
what is known about the implicit learning effect in contextual cuing
(Schlagbauer et al., 2012; Smyth & Shanks, 2008).
This simulation illustrates that the fact that the effect size of
awareness is small does not mean that it is insufficient to
explain or cannot be related to the (usually large) size of the
contextual cuing effect found in reaction times. Instead, the
small effect size found in awareness tests is exactly what one
would expect to find when a subtle effect is assessed with an
unreliable test. This problem does not apply to the usual
measure of contextual cuing, which typically relies on hundreds of
trials and consequently yields very precise estimations (and
therefore large effect sizes) for even very subtle effects. The
asymmetry between the small effects found in the awareness test
and the large effects found in visual search facilitation can be
attributed to differences in the sensitivity of the two measures
(we return to this issue later).
It is interesting to note that the superior sensitivity of
contextual cueing measures relative to awareness tests is also
evident in experimental protocols where a brief awareness test
is sufficiently powered to detect above-chance performance. For
instance, it is widely acknowledged that contextual cueing is
explicit when natural scenes are used as contexts. In these
experiments (not included in our meta-analysis), a short test is
usually enough to detect explicit awareness. But even so, this
effect is disproportionally smaller than the corresponding
contextual cuing effect found in reaction times. As an example,
Brockmole and Henderson (2006, Experiment 1) found that
participants performed above chance in a location-guessing test,
and this effect was so large (dz = 1.14) that it reached statistical
significance with a small sample of only eight participants. But
even this seemingly large effect is tiny compared to the huge size
of the contextual cueing effect (dz = 6.54). Thus, the reduced
sensitivity of awareness tests is obvious even in experiments where
learning is unambiguously considered explicit and tests are
adequately powered to detect above-chance awareness.
Confidence intervals as a partial solution to the false-negative problem
It is easy to understand how null results can be false negatives by
visually examining the CIs of the dependent variables. Figure 4 shows CIs
for studies that employed a recognition test and that reported the mean
hit and false alarm rates, and a t- or F-value. This figure does not aim
to summarize the full results of the previous meta-analysis. It is
offered only as a way of illustrating the misleading impression produced
by null results. For the sake of simplicity we only show the CIs of
studies with the typically small samples used in contextual cuing
experiments (Ns between 14 and 18) and experiments with relatively large
samples (Ns of 36 and above). All the experiments that meet these
criteria are shown in Fig. 4.
Recall that a positive difference indicates that the proportion of hits
was larger than the proportion of false alarms, in other words that
participants were able to discriminate repeated from random search
displays. As can be seen, for many of the studies with small sample sizes
(19/21), the CI includes zero. Those results are usually taken as proof
that participants were unaware of learning. However, in general, these
CIs are very wide. They include not just a small region around zero, but
also a wide range of positive values. Therefore, these studies do not
allow one to conclude that participants were unaware. They simply
demonstrate that these experiments do not permit the level of awareness
to be estimated with any precision.
In contrast, among the six experiments with the largest sample sizes the
CIs are narrower and only one of them includes zero. Interestingly, the
meta-analytic 95 % CI of all the experiments included in the figure
overlaps with the CI of every single study. In other words, although the
larger experiments yield significant results and the smaller experiments
tend to yield non-significant results, there is actually no contradiction
between them. Null results create the illusion that there is no
difference between hits and false alarms and that participants were,
therefore, unaware of learning. But the CIs do not allow this inference
to be made with any degree of certainty. The use of CIs and graphic
depiction is a powerful method for conveying the degree of precision in
the estimate and of avoiding the temptation to interpret a failure to
reject the null as evidence in favor of the null (Cumming, 2014; Fidler &
Loftus, 2009).
Bayes Factors as an alternative solution
CIs and meta-analysis provide a particularly clear and simple means to
illustrate the uncertainty associated with underpowered studies,
especially when the goal of the researchers is to draw conclusions on the
basis of null results. However, an important shortcoming of CIs is that
they fail to quantify the extent to which the results of an experiment
favor the null or the alternative hypothesis. If an experiment yields a
precise (i.e., narrow) CI around zero, it is legitimate to conclude that
the null hypothesis is probably supported by the data, or at least that
the effect is of little practical significance. But in the absence of a
means to quantify support for the null hypothesis precisely, this
judgment remains somewhat arbitrary and subjective.
In contrast, Bayes Factors provide such a means to quantify the extent to
which evidence favors the null or the alternative hypothesis and could
accordingly play an important role in future research on contextual cuing
and other implicit learning effects (Dienes, 2015). Specifically, a Bayes
Factor (BF10) represents the ratio between the likelihood of the data
given the alternative hypothesis (1) and the likelihood of the data given
the null hypothesis (0). A BF10 larger than 3 is usually considered to
reflect substantial support in favor of the alternative hypothesis and
values larger than 10 strong support. Conversely, values lower than 1/3
are considered substantial evidence and values lower than 1/10 strong
support for the null hypothesis (Wetzels, Matzke, Lee, Rouder, Iverson, &
Wagenmakers, 2011).
Fig. 3 Results of a simulation exploring the size difference between hit
rate and false alarm rate depending on the number of patterns learned by
the participant. See the main text for more details. Error bars denote
standard errors of the means across simulations
Do the results of the awareness tests reviewed in our meta-analysis
provide more support for the null hypothesis than for the alternative
hypothesis? To answer this question, we converted all the 96 effect sizes
entered in the meta-analysis back to t-values that we submitted to a
Bayes Factor analysis using a Cauchy distribution with a (default)
scaling factor r = 0.707 as the alternative hypothesis. To improve the
comparability of values supporting the null hypothesis (originally
bounded from 0 to 1) with values supporting the alternative hypothesis
(originally bounded from 1 to ∞), we took the logarithm of all BF10's,
which yields a symmetric distribution where all negative values support
the null hypothesis and all positive values support the alternative
hypothesis. On this logarithmic scale, values roughly larger than 1.1
provide substantial support for the alternative hypothesis (BF10 > 3) and
values roughly larger than 2.3 provide strong support (BF10 > 10).
Conversely, values lower than −1.1 or than −2.3 constitute substantial
and strong support for the null hypothesis.
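To make the procedure above concrete, the following is a minimal sketch of a default (JZS) Bayes Factor for a within-participants t-value, assuming the integral formulation of Rouder et al. (2009) with the Cauchy scale r applied to the effect size. It is not the code of the "BayesFactor" package; the function names `jzs_bf10` and `evidence_label`, the integration limits, and the number of integration steps are illustrative assumptions.

```python
import math

def jzs_bf10(t, n, r=0.707, steps=4000):
    """Approximate BF10 for a one-sample (paired) t-test with n participants.

    Assumption: JZS prior, i.e. a Cauchy(0, r) prior on the standardized
    effect, written as a normal prior whose variance g is inverse-gamma.
    """
    nu = n - 1  # degrees of freedom

    def integrand(x):
        g = math.exp(x)  # integrate over x = ln(g) for numerical stability
        a = 1.0 + n * g * r * r
        likelihood = a ** -0.5 * (1.0 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2 * g))
        return likelihood * prior * g  # trailing g is the Jacobian dg = g dx

    lo, hi = -8.0, 25.0  # assumed limits; the integrand is negligible outside
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    marginal_h1 = total * h  # trapezoidal rule
    marginal_h0 = (1.0 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0

def evidence_label(log_bf10):
    """Classify a natural-log BF10 with the 3 and 10 thresholds from the text."""
    if log_bf10 > math.log(10):
        return "strong, alternative"
    if log_bf10 > math.log(3):
        return "substantial, alternative"
    if log_bf10 < -math.log(10):
        return "strong, null"
    if log_bf10 < -math.log(3):
        return "substantial, null"
    return "inconclusive"

if __name__ == "__main__":
    for t_value in (0.0, 1.0, 2.5, 5.0):
        bf = jzs_bf10(t_value, n=16)  # n = 16 is an illustrative sample size
        print(t_value, round(bf, 3), evidence_label(math.log(bf)))
```

Evaluating the sketch at t = 0 shows how much support for the null a perfectly null result can provide at a given sample size, which is the quantity that limits the negative tail of the log(BF10) distribution.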
The resulting distribution of the log(BF10)'s is depicted in Fig. 5.
Interestingly, this distribution offers some encouragement for the view
that contextual cuing can be implicit. The majority of results provide
some support for the null hypothesis over the alternative hypothesis,
suggesting that learning was indeed unconscious in many of these studies.
However, a closer inspection of Fig. 5 also reveals an important
asymmetry between the positive and negative values. While positive values
span a wide range (providing not just substantial but even strong
evidence for the alternative hypothesis), negative values rarely go
beyond −1 or −1.50 and they never reach the −2.30 boundary. In other
words, many studies yield BF10's more consistent with the null hypothesis
(no awareness), but the weight of this evidence is never strong. For the
sake of clarity, Fig. 5 also includes a scatter plot depicting the
relationship between BF10's and effect sizes, together with the best
fitting quadratic function. Consistent with the assessment above, the
vertex of this quadratic function, which seems to capture well the
typical lower values of log(BF10), is equal to −1.28, corresponding to an
unconverted BF10 = 0.27.
Fig. 4 Ninety-five percent confidence intervals (CIs) of a subset of
experiments contrasting hit rate versus false alarm rate in recognition
tests. Given the heterogeneity of the studies included in the figure,
Q(26) = 43.73, p = .016, the meta-analytic mean and CI shown in the last
row were computed using a random effects model
Fig. 5 Histogram of the logarithmic Bayes Factors (BF10's) included in
the meta-analysis. Positive values indicate support for the alternative
hypothesis (awareness) and negative values indicate support for the null
hypothesis (unawareness). The inset depicts a scatterplot of effect sizes
(Cohen's d) and logarithmic BF10's with the best fitting quadratic
function
Therefore, this Bayesian analysis offers a somewhat tantalizing view of
the implicitness of contextual cuing that has important implications for
future research: On the one hand, there are a large number of studies
with results numerically more consistent with the null hypothesis (no
awareness) than with the alternative hypothesis (awareness). On the other
hand, there are more experiments strongly supporting the alternative
hypothesis than strongly supporting the null hypothesis. Fortunately,
Bayesian statistics also offer a way of resolving this apparent
contradiction regarding the inconclusiveness of existing evidence.
Although in NHST researchers are not free to continue testing
participants after reaching the sample size they specified a priori,
Bayesian statistics do allow researchers to continue gathering data
(e.g., in an awareness test) until a specific level of precision is
reached (Dienes, 2011, 2014), for instance, until the Bayes Factor
becomes larger than 10 or smaller than 1/10. This feature of Bayesian
statistics makes Bayes Factors a powerful means by which future research
could establish the implicitness of contextual cuing and other seemingly
unconscious learning effects (Rouder, Morey, Speckman, & Pratte, 2007).
Correlations and post hoc data selection
We should acknowledge that many of the studies included in the
meta-analysis based their conclusion, that the contextual cuing they
obtained was implicit, not only on a null result in an awareness test but
also on one of two additional pieces of evidence (or both): Correlations
and post hoc data selection. However, both of these are statistically
problematic.
The first of these refers to the finding that across participants, the
magnitude of contextual cuing tends not to be significantly correlated
with the measure of awareness. For instance, going back to the examples
depicted in Fig. 4, Zellin, Conci, von Mühlenen, and Müller (2013,
Experiment 3) found a marginally significant effect in the awareness
test. However, instead of concluding that learning was explicit, they
went on to estimate the correlation between the results of the awareness
test and the size of contextual cuing and found a correlation of r = .42,
p > .10. This lack of significant correlation seems on the face of it to
provide further and stronger support for the claim that learning is
implicit, but a moment's thought reveals that once again absence of
evidence is not the same as evidence of absence. Without knowing the CI
on the correlation coefficient, we cannot evaluate how much weight to
place on the null result, yet authors never report such CIs. We computed
the 95 % CI on the correlation coefficient obtained by Zellin, Conci et
al. (2013, Experiment 3) and found that it had lower and upper limits of
−.14 and .77. Thus the data in this study are compatible with a true
correlation as large as .77 or as low as −.14. Similarly, Conci and von
Mühlenen (2011, Experiment 2) and Preston and Gabrieli (2008) reported
non-significant correlations with 95 % CIs of [−.42 to .62] and [−.33 to
.49], respectively. Obviously, these estimations are too imprecise to
permit any strong conclusions to be drawn.
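As an illustration of how such an interval can be obtained, the sketch below computes a CI for a correlation with the Fisher z transformation. This is one standard method and is assumed here; the text does not state which method the reported CIs were based on. The sample size n = 14 in the example is hypothetical and was chosen only to show the width of the interval in a small sample.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """CI for a Pearson correlation via the Fisher z transformation.

    Assumes bivariate normality and n > 3; r is the observed correlation.
    """
    z = math.atanh(r)                    # Fisher z of the observed r
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

if __name__ == "__main__":
    # r = .42 is taken from the text; n = 14 is a hypothetical sample size.
    print(correlation_ci(0.42, 14))
    # The same correlation in a hypothetical sample of 100 participants.
    print(correlation_ci(0.42, 100))
```

Comparing the two calls shows that the same observed correlation is compatible with a null or negative relationship in a small sample but not in a large one.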
Furthermore, it is common practice to report the correlation between
explicit and implicit measures of learning only when the explicit
awareness measures yield significant results (e.g., Conci & von Mühlenen,
2011, Experiment 2; Geyer, Shi, & Müller, 2010; Peterson & Kramer, 2001;
Preston & Gabrieli, 2008). This is particularly problematic. In just the
same way that multiple testing increases the risk of type 1 errors, it
also increases the risk of type 2 errors. Put differently, if researchers
explore different awareness measures until they find one that yields a
null result, the chances that the null result will reflect a false
negative increase as the number of statistical tests grows. To prevent
type 1 errors when multiple comparisons are conducted it is usual to make
adjustments of α, like the Bonferroni correction. Similarly, in order to
prevent type 2 errors, it would be necessary to adjust β for multiple
comparisons, which is virtually identical to increasing statistical
power, defined as 1 − β.
We have argued here that studies which measure awareness alongside some
"implicit" behavioral measure can yield erroneous evidence if NHST leads
researchers to mistake weak awareness for null awareness. We have also
noted that this problem applies not only to the interpretation of the
awareness measure itself and whether it exceeds chance, but also extends
to interpretation of correlations between implicit and explicit measures
where absence of evidence can again be misinterpreted as evidence of
absence. One final method may at first sight appear to avoid these
problems by unequivocally ensuring null awareness: Selecting participants
post hoc who score at or below chance on the awareness measure. If such a
sample of participants (or a sample of configurations) shows significant
contextual cuing (which they do: e.g., Colagiuri, Livesey, & Harris,
2011; Geyer, Shi, & Müller, 2010; Geyer, Zehetleitner, & Müller, 2010;
Smyth & Shanks, 2008), then surely this is clear evidence of true
implicit learning? The answer to this question is an emphatic "no." The
method is statistically unsound (Shanks & Berry, 2012).
To see this, we demonstrate that the pattern can arise even when the
awareness and behavioral measures are based on the very same underlying
representation or latent variable. We assume that a contextual cuing
experiment gives rise to a participant acquiring knowledge of the
repeating (compared to novel) configurations that we can capture by the
memory strength variable s, which is normally distributed with mean and
standard deviation (SD) equal to 1, and with s = 0 representing the
baseline of no configuration knowledge. This common underlying knowledge
forms the basis of both the "implicit" behavioral RT measure and the
recognition awareness score, measured as effect size d computed from
recognition hits minus false alarms. Specifically:
$$\mathrm{RT} = 100s + 30e \tag{1}$$
$$d = 0.30s + e \tag{2}$$
A given participant is assumed to have knowledge of the repeating
patterns, s, which is first scaled by a factor of 100 in Eq. 1 and
combined with some normally distributed random error e which has a mean
of zero and SD of 1 to yield that participant's implicit contextual cuing
RT effect. This very same value of s is scaled by a factor of 0.3 in Eq.
2 and combined with independent error (it is important to emphasize that
while the same value of s features in the two equations, the noise e
added in each case is independent) to yield that participant's explicit
awareness score. Figure 6 shows data generated by this simple model for
1,000 simulated participants. Because of the chosen scaling factors,
participants generate a mean contextual cuing RT score of 100 msec, which
is roughly the level seen in contextual cuing experiments, and a mean
awareness score of 0.30, consistent with the meta-analytic effect. The
two measures are weakly correlated, r ≈ 0.3, again consistent with the
data.
We now select only those simulated participants who individually score at
or below chance (d = 0) on the awareness measure (illustrated by the open
circles in Fig. 6) and we ask what contextual cuing score we see in these
"unaware" participants. The score in these participants is ~70 msec.
Despite the fact that contextual cuing and awareness are based on the
same underlying knowledge representation in this model (and on nothing
else apart from noise), and that these participants are selected on the
basis of chance (or below chance) awareness, they nonetheless show a
highly reliable contextual cuing effect. There is no mystery to this: It
is simply a manifestation of regression to the mean. In noisy bivariate
data, a sample created by applying a cut-off on one dimension will have a
mean on the other dimension that is closer to the overall mean. Note that
although this demonstration concerns participants selected post hoc, the
same logic applies to configurations selected in the same way (e.g.,
Geyer, Shi, & Müller, 2010; for a similar approach, see Conci & von
Mühlenen, 2011, Experiment 2). It implies that the logic of interpreting
significant contextual cuing in participants (or configurations)
retrospectively chosen because their awareness is at or below chance as
evidence of implicit learning can be misleading.
Lastly, note that across all of the data generated by the model, the
effect size for contextual cuing is Cohen's d ≈ 1 while that for
awareness is d ≈ 0.3 (these can be calculated directly from Eqs. 1 and
2). Thus, confirming what we claimed earlier, the fact that real studies
might yield larger effect sizes for contextual cuing than for awareness
does not license the conclusion that the former is based on some special
form of unconscious knowledge. It arises simply because the model assumes
a greater relative contribution of random error to awareness measures
than to contextual cuing.
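For readers who want to verify these values, the calculation can be spelled out as follows. This derivation is a sketch that uses only the assumptions stated for the model (s has mean 1 and SD 1, e has mean 0 and SD 1, and the two error terms are independent); the subscripts RT and aw are labels introduced here for the two measures.

```latex
\[
d_{\mathrm{RT}} = \frac{E[\mathrm{RT}]}{SD[\mathrm{RT}]}
  = \frac{100}{\sqrt{100^{2} + 30^{2}}} \approx 0.96
\]
\[
d_{\mathrm{aw}} = \frac{E[d]}{SD[d]}
  = \frac{0.30}{\sqrt{0.30^{2} + 1^{2}}} \approx 0.29
\]
\[
r = \frac{\mathrm{Cov}(\mathrm{RT}, d)}{SD[\mathrm{RT}]\,SD[d]}
  = \frac{100 \times 0.30}{\sqrt{100^{2} + 30^{2}}\,\sqrt{0.30^{2} + 1^{2}}}
  \approx 0.28
\]
```

The error term accounts for a small share of the variance in Eq. 1 but for most of the variance in Eq. 2, which is why the two standardized effect sizes differ so much.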
Conclusions drawn by authors and impact on publication quality
The analyses conducted so far give us reasons to suspect that many, if
not most, of the null results obtained in this kind of awareness test can
be considered false negatives. This conclusion stands in stark contrast
with the certainty with which authors interpret these null results as
strong evidence in support for the null hypothesis. As an example, the
experiment with the widest CI in Fig. 4 is Experiment 4 from Conci and
von Mühlenen (2011). In spite of the uncertainty revealed by the CI of
the awareness test, the conclusion drawn by the authors was that "no
explicit awareness of the display repetitions could be formed" (Conci &
von Mühlenen, 2011, p. 219). Note also the results of the two conditions
analysed in Experiment 4 of Zellin, Conci et al. (2013). Although they
include zero, the CIs do not exclude a wide range of positive values.
Obviously, no conclusion can be drawn with any assurance from the results
of those awareness tests. However, the interpretation of the authors was
that "observers did not explicitly recognize the old context-displays"
(Zellin, Conci et al., 2013, p. 10).
Researchers can hardly be blamed for their tendency to over-interpret
these null results as reflecting a genuine absence of awareness. The
"implicit" status of contextual cuing is probably one of the features
that make it most attractive and salient to the scientific community. As
can be seen in the list of studies included in the meta-analysis, the
titles of most articles include some allusion to the implicit or
automatic nature of contextual cuing. In fact, 42 of the 73 articles
included in this analysis mention the concept of implicitness in their
titles. There are obvious reasons for the emphasis on the implicit
character of contextual cuing. Figure 7 depicts the impact factors of the
journals in which the articles analysed here were published, depending on
whether they mentioned implicitness ("implicit," "explicit," "awareness,"
"unconscious," or "recognition") or not in the title. Three of the 73
articles could not be included in Fig. 7 because they were published in
journals or books not included in Journal Citation Reports. Given that
the distribution of impact factors was highly skewed, we logarithmically
transformed them. As suggested by Fig. 7, the difference in mean impact
factor between articles mentioning and not mentioning implicitness was
statistically significant, t(68) = 1.98, one-tailed p = .026, d = 0.48,
suggesting that papers mentioning implicitness in their titles made their
way into higher impact-factor journals. Although this result is no more
than correlational at best, it does provide some hint about the
incentives that exist for interpreting contextual cuing as unconscious.
Fig. 6 Contextual cuing (msec) plotted against awareness (recognition
hits minus false alarms, expressed in terms of effect size d) in 1,000
simulated participants. Mean contextual cuing across the entire sample is
100 msec (rightmost vertical line), while that in the subset of simulated
participants scoring at or below d = 0 (open circles) is approximately 70
msec (dotted vertical line)
Conclusions
In recent years the scientific community has witnessed growing concern
about the high rate of false positives and unreliable results within
published studies (Francis, 2012; Ioannidis et al., 2014; Simonsohn,
Nelson, & Simmons, 2014). In contrast, the potential impact of false
negatives has remained largely ignored (Fiedler et al., 2012). This
asymmetry is natural, given that most experiments seek to observe
positive results. However, there are many areas of psychological research
where the evidential value given to null results is critical. In fact,
there are several reasons to suspect that the over-interpretation of null
results is even more dangerous than the prevalence of false positives in
some areas of research. First, null results are inherently ambiguous.
They indicate that there is not enough support for the alternative
hypothesis, but they are silent about the amount of support for the null
hypothesis. Second, unlike positive results, null results are
surprisingly easy to obtain by mere statistical artefacts. Simply using a
small sample or a noisy measure can suffice to produce a false negative.
The results of the present systematic review suggest that these problems
might be obscuring our view of implicit learning and memory in particular
and, perhaps, implicit processing in general. It is commonly claimed that
contextual cuing and other implicit learning effects take place without
participants becoming aware of the representations they learn (Chun &
Jiang, 2003). Contrary to this prevalent view, we found that the
seemingly chance-level performance of participants in awareness tests is
more likely to reflect a type 2 error. The overall proportion of positive
results is too large for the null hypothesis to be true. This proportion
cannot easily be explained in terms of publication bias favoring positive
results, but is perfectly consistent with the frequency of positive
results that one would expect to find, given a true but modest-sized
awareness effect, in underpowered experiments using unreliable dependent
measures. This result is also consistent with experimental evidence
suggesting that the quality of the awareness test is a key determinant of
whether contextual cuing experiments yield "explicit" or "implicit"
results (Smyth & Shanks, 2008).
We have offered some suggestions about how future studies could provide
firmer evidence for implicit learning in contextual cuing, including
increasing sample sizes to boost power, reporting CIs, and continuing to
collect awareness (e.g., recognition) data until the Bayes Factor crosses
a boundary of evidential support. We have also suggested that two data
analytic techniques should unequivocally be avoided in future studies:
The calculation of implicit-explicit correlations after finding that the
implicit score is significantly greater than chance, and post hoc data
selection.
Before ending, we would also like to emphasize that we do not believe
that researchers working in this area are following these practices
(e.g., using small numbers of testing trials or relying on NHST to claim
support for the null hypothesis) in a deliberate attempt to deceive their
readers. Most likely, researchers are simply routinely following a
research protocol that, with its pros and cons, has become standard. It
must be acknowledged that many of the experiments included in the present
meta-analysis (and especially those that made no mention of awareness in
their titles) were designed primarily to explore issues largely unrelated
to the question of whether contextual cuing is implicit or not, such as
the role of working memory in contextual cuing, how spatial associations
are formed, the neural underpinnings of contextual learning, and so on.
The fact that awareness was only a secondary concern might explain why
the vast majority of them did not include a sensitive (and lengthy)
awareness test and why they relied on simple NHST to analyse their
results. But this only serves to illustrate how easily a particular
conception can gain momentum in a substantial body of literature and
become part of the zeitgeist, despite weak evidence.
Fig. 7 Impact factor of journals that published papers mentioning or not
mentioning "implicitness" in the title
Although we restricted our analyses to experiments conducted within a
specific implicit learning paradigm, the same problem extends to other
phenomena where participants' awareness is discounted on the basis of
NHST, such as subliminal perception and other forms of unconscious
learning and implicit processing that we have not considered here (e.g.,
Dehaene et al., 1998; Pessiglione et al., 2008). False negatives also
pose important problems for current attempts to replicate controversial
findings.
These and other examples show that null results in underpowered studies
may give the false impression that an effect is genuinely absent when
actually it is not. They can also create the impression that there is a
deep inconsistency between studies showing significant results and those
yielding null results, even when the latter just reflect a lack of
statistical sensitivity. Fortunately, researchers can resort to
alternative statistical analyses when they need to assess the amount of
support for the null hypothesis, including CIs, Bayes factors, and
counternull values (Cumming, 2014; Dienes, 2015; Rosenthal & Rubin, 1994;
Rouder et al., 2009). The price we pay for our reluctance to use these
alternatives to NHST is that important aspects of what we believe about
cognition may be mistaken.
Acknowledgments The authors were supported by Grant ES/J007196/1 from the
Economic and Social Research Council. We are indebted to Marvin Chun,
Markus Conci, Thomas Geyer, Hermann Müller, and Eric-Jan Wagenmakers for
their valuable comments on earlier versions of this article.
Appendix 1
Literature search strategy
For the present systematic review, we accessed all the published reports
that used the standard procedure for contextual cuing experiments. On 26
November 2013 we searched in the Web of Science for all the papers citing
the original report from Chun and Jiang (1998). Based on the contents of
the abstracts available on the Web of Science, we removed from this list
theoretical reviews with no empirical work and also empirical papers
focused on topics different from contextual cuing. We also removed
contextual cuing papers that used natural scenes as contexts because the
cognitive processes involved in these experiments are widely recognized
to be explicit (Brockmole & Henderson, 2006; Brockmole & Vo, 2010).
Within the remaining list of studies, we were particularly interested in
experiments whose general procedure did not deviate radically from the
standard method described in the Introduction. Specifically, we selected
all the experiments in which the location of the distractors in repeated
displays predicted the location of the target within the same static
display. This criterion excluded a small set of experiments on identity
cuing, temporal cuing, contextual cuing with moving patterns, and also
experiments in which distractors predicted the location of a target
presented on a subsequent search display. Finally, by studying the
reference lists of the accessed reports, we identified a small group of
relevant papers that had not appeared in our original search in the Web
of Science (Geringswald, Herbik, Hoffmann, & Pollmann, 2013; Manns &
Squire, 2001; Pollmann & Manginelli, 2010; Zellin, von Mühlenen, Müller,
& Conci, 2013) and we included them among the final list of studies.
Following this procedure we identified 73 articles that contained at
least one experiment that qualified for the present meta-analysis. All
the papers included in our review are marked with asterisks in the
References section.
Selection of experiments, conditions, and statistical tests
Only experiments including an awareness test (either a recognition test
or a target guessing test) were considered in the present analysis. We
ignored data from experiments in which contextual cuing was not observed.
Similarly, if an experiment comprised several conditions and awareness
test results were reported separately for each condition, we only
analysed awareness tests from those conditions that yielded contextual
cuing.
If the authors conducted several awareness tests (e.g., a recognition
test and a target-location guessing task) we included all of them in our
analyses. Because we were interested in the role of the number of trials
included in the awareness test (see main text), we included several
analyses of the same condition only when these were based on blocks of
trials of different sizes. For instance, if one experiment included an
awareness test with two blocks of 24 trials each and the analyses were
conducted on block 1, block 2, and blocks 1 and 2 collapsed, we included
all three contrasts in our analyses, coding the number of trials of each
of them as 24, 24, and 48 respectively. In contrast, if the authors
reported multiple analyses of the results of a single awareness test, we
only included one of them, for instance, if the same data set was first
used to compare hit rate versus false alarm rate and then to compare
overall performance against chance. In cases like these, comparisons of
hits versus false alarms were favored over alternative analyses of the
same data (such as d' scores or comparisons of performance against
chance). The logic behind this selection strategy is to extract several
analyses from a single experiment when they conveyed independent
information (different number of trials or different awareness tests) but
not when they were mere re-analyses of exactly the same data set.
Following these criteria, we obtained data from 181 statistical
contrasts.
Coding of study characteristics and results
We were particularly interested in knowing whether the results of the
awareness tests were statistically significant or not. Therefore, we
coded non-significant results (p > .05) as 0 and significant results (p ≤
.05) as 1. If the reported data allowed the reader to infer that a result
was marginally significant (.05 < p < .10) we coded that result as 0.50.
Regardless of the significance of the statistical contrast, we also coded
whether the descriptive pattern of results went in the "explicit"
direction (e.g., hit rate > false alarm rate) or in the opposite
direction. We coded studies in the "explicit" direction as 1, studies in
the opposite direction as −1, and studies in which hits and false alarms
were equal (or in which participants performed exactly at chance level)
as 0.
For each contrast we were interested in knowing the number of
participants on which the contrast was based and the number of trials
included in the awareness test, as these determine the power of the
experiment to detect non-zero awareness. These two variables were also
coded in our datasets. The number of participants of one experiment
(Colagiuri et al., 2011) was an outlier (z > 12) and was recoded as the
number of participants of the next largest experiment included in the
data set. Similarly, the number of trials of the awareness test conducted
in another experiment (Geyer, Shi, & Müller, 2010, Experiment 3) was an
outlier (z = 11.97) and was recoded as the number of trials of the
experiment with the next largest number of trials in the data set. This
recoding strategy is common practice in meta-analytic reviews (e.g.,
Hofmann, Gawronski, Gschwendner, Le, & Schmitt, 2005). The qualitative
conclusions of our analyses are not altered by using the actual number of
participants.
Finally, although not all experiments included sufficient
information to compute an effect size estimate, when these data
were available, we did collect them. In the case of 96/181 (53 %)
contrasts, we were able to compute Cohen’s dz scores from t-values
or F-values with one degree of freedom. We computed dz scores by
dividing t-values by the square root of the relevant sample size
(note that the contrast between hits and false alarms or between
performance and chance is within-participants for all studies in
the meta-analysis). When only F-values with one degree of freedom
were reported, we converted them to t-values. In a few cases, we
detected a contradiction between the sample size reported in the
paper and the degrees of freedom of the t and F contrasts. When
this occurred, we computed the effect size taking the degrees of
freedom reported in the statistical test as the correct estimate of
the sample size. We ignored t-values from two experiments (Geyer,
Shi, & Müller, 2010, Experiment 3; Geyer, Zehetleitner, et al.,
2010, Experiment 1) because their degrees of freedom were reported
as “t(1, 11),” making it unclear whether they used a t distribution
with 11 degrees of freedom or an F distribution with 1 and 11
degrees of freedom. We also ignored four t-values where the
reported data did not allow us to conclude whether the effect size
was positive or negative. The random effects meta-analysis was
conducted with the “metafor” R package (Viechtbauer, 2010). Bayes
Factors were computed with the “BayesFactor” R package (Rouder
et al., 2009).
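The effect size conversions described in this paragraph can be written out as a short sketch. The helper names are hypothetical. The relation n = df + 1 is the standard one for a one-sample or paired t test and is assumed here to be how the degrees of freedom were mapped onto sample size. Because an F-value carries no sign, the direction of the effect has to be supplied separately, which is also why contrasts with an undeterminable direction could not be used.

```python
import math


def t_from_f(f_value: float, direction: int) -> float:
    """Convert an F-value with one numerator degree of freedom into a signed t-value.

    direction: +1 if the descriptive pattern favours awareness, -1 otherwise.
    """
    if f_value < 0:
        raise ValueError("F-values cannot be negative")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return direction * math.sqrt(f_value)


def n_from_df(df: int) -> int:
    """Sample size implied by the error df of a within-participants contrast (assumed n = df + 1)."""
    return df + 1


def dz_from_t(t_value: float, n: int) -> float:
    """Cohen's dz for a within-participants contrast: t divided by the square root of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return t_value / math.sqrt(n)
```

For example, a contrast reported as F(1, 15) = 4.0 in the “explicit” direction corresponds to t(15) = 2.0 and, with n = 16, to dz = 0.5.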
Coding of paper and journal characteristics
As a proxy to measure the relevance of the implicitness of
contextual cuing for each study, we coded whether the title of the
paper made allusion to the implicit character of this effect. Papers
were sorted depending on whether or not they mentioned the words
“implicit,” “explicit,” “awareness,” “unconscious,” or
“recognition.” We also coded the 2012 impact factor of the journals
that had published the studies included in the present
meta-analysis. Because of a change in the name of the journal,
impact factors for Attention, Perception, & Psychophysics were
also used for papers published in the journal with the previous
name, Perception & Psychophysics. All impact factors were
obtained from the 2012 edition of the Journal Citation Reports.
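A minimal sketch of the title coding follows. It assumes case-insensitive matching of whole words, which the text does not specify; under this assumption a title containing only “implicitly” would not count as a match. The function name is hypothetical.

```python
import re

# Keywords listed in the text; whole-word, case-insensitive matching is an assumption.
KEYWORDS = ("implicit", "explicit", "awareness", "unconscious", "recognition")
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def title_mentions_implicitness(title: str) -> bool:
    """Return True if the paper title contains any of the target keywords."""
    return _PATTERN.search(title) is not None
```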
Open Access This article is distributed under the terms of the
Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits
unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license, and
indicate if changes were made.
References
References marked with an asterisk indicate studies included in
the meta-analysis.
Bargh, J. A. (2006). What have we been priming all these years? On the development, mechanisms, and ecology of nonconscious social behavior. European Journal of Social Psychology, 36, 147–168.
Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230–244.
*Barnes, K. A., Howard, J. H., Jr., Howard, D. V., Gilotty, L., Kenworthy, L., Gaillard, W. D., & Vaidya, C. J. (2008). Intact implicit learning of spatial context and temporal sequences in childhood autism spectrum disorder. Neuropsychology, 22, 563–570.
*Barnes, K. A., Howard, J. H., Jr., Howard, D. V., Kenealy, L., & Vaidya, C. J. (2010). Two forms of implicit learning in childhood ADHD. Developmental Neuropsychology, 35, 494–505.
Bechara, A., Tranel, D., Damasio, H., Adolphs, R., Rockland, C., & Damasio, A. R. (1995). Double dissociation of conditioning and declarative knowledge relative to the amygdala and hippocampus in humans. Science, 269, 1115–1118.
*Bennett, I. J., Barnes, K. A., Howard, J. H., Jr., & Howard, D. V. (2009). An abbreviated implicit spatial context learning task that yields greater learning. Behavior Research Methods, 41, 391–395.
*Brady, T. F., & Chun, M. M. (2007). Spatial constraints on learning in visual search: Modeling contextual cuing. Journal of Experimental Psychology: Human Perception and Performance, 33, 798–815.
Brockmole, J. R., & Henderson, J. M. (2006). Using real-world scenes as contextual cues for search. Visual Cognition, 13, 99–108.
Brockmole, J. R., & Vo, M. L.-H. (2010). Semantic memory for contextual regularities within and across scene categories: Evidence from eye movements. Attention, Perception, & Psychophysics, 72, 1803–1813.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
*Chaumon, M., Drouet, V., & Tallon-Baudry, C. (2008). Unconscious associative memory affects visual processing before 100 ms. Journal of Vision, 8, 10.
*Chaumon, M., Schwartz, D., & Tallon-Baudry, C. (2008). Unconscious learning versus visual perception: Dissociable roles for gamma oscillations revealed in MEG. Journal of Cognitive Neuroscience, 21, 2287–2299.
*Chua, K. P., & Chun, M. M. (2003). Implicit scene learning is viewpoint dependent. Perception & Psychophysics, 65, 72–80.
*Chun, M. M., & Jiang, Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36, 28–71.
*Chun, M. M., & Jiang, Y. (2003). Implicit, long-term spatial contextual memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 224–234.
*Chun, M. M., & Phelps, E. A. (1999). Memory deficits for implicit contextual information in amnesic subjects with hippocampal damage. Nature Neuroscience, 2, 844–847.
Chun, M. M., & Turk-Browne, N. B. (2008). Associative learning mechanisms in vision. In S. J. Luck & A. Hollingworth (Eds.), Visual memory (pp. 209–245). New York: Oxford University Press.
Cohen, N. J., & Squire, L. R. (1980). Preserved learning and retention of pattern-analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science, 210, 207–210.
*Colagiuri, B., Livesey, E. J., & Harris, J. A. (2011). Can expectancies produce placebo effects for implicit learning? Psychonomic Bulletin & Review, 18, 399–405.
*Conci, M., & Müller, H. J. (2012). Contextual learning of multiple target locations in visual search. Visual Cognition, 20, 746–770.
*Conci, M., Sun, L., & Müller, H. J. (2011). Contextual remapping in visual search after predictable target-location changes. Psychological Research, 75, 279–289.
*Conci, M., & von Mühlenen, A. (2009). Region segmentation and contextual cuing in visual search. Attention, Perception, & Psychophysics, 71, 1514–1524.
*Conci, M., & von Mühlenen, A. (2011). Limitations of perceptual segmentation on contextual cueing in visual search. Visual Cognition, 19, 203–233.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.
Dehaene, S., Naccache, L., Le Clec'H, G., Koechlin, E., Mueller, M., Dehaene-Lambertz, G., ... Le Bihan, D. (1998). Imaging unconscious semantic priming. Nature, 395, 597–600.
Dienes, Z. (2011). Bayesian versus Orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6, 274–290.
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, 781.
Dienes, Z. (2015). How Bayesian statistics are needed to determine whether mental states are unconscious. In M. Overgaard (Ed.), Behavioural methods in consciousness research (pp. 199–220). Oxford: Oxford University Press.
Dijksterhuis, A., & van Knippenberg, A. (1998). The relation between perception and behavior, or how to win a game of trivial pursuit. Journal of Personality and Social Psychology, 74, 865–877.
*Dixon, M. L., Zelazo, P. D., & De Rosa, E. (2010). Evidence for intact memory-guided attention in school-aged children. Developmental Science, 13, 161–169.
*Endo, N., & Takeda, Y. (2005). Use of spatial context is restricted by relative position in implicit learning. Psychonomic Bulletin & Review, 12, 880–885.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Fidler, F., & Loftus, G. R. (2009). Why figures with error bars should replace p values: Some conceptual arguments and empirical demonstrations. Zeitschrift für Psychologie / Journal of Psychology, 217, 27–37.
Fiedler, K., Kutzner, F., & Krueger, J. I. (2012). The long way from α-error control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science, 7, 661–669.
Francis, G. (2012). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review, 19, 151–156.
*Geringswald, F., Baumgartner, F., & Pollmann, S. (2012). Simulated loss of foveal vision eliminates visual search advantage in repeated displays. Frontiers in Human Neuroscience, 6, 134.
*Geringswald, F., Herbik, A., Hoffmann, M. B., & Pollmann, S. (2013). Contextual cueing impairment in patients with age-related macular degeneration. Journal of Vision, 13, 28.
*Geyer, T., Baumgartner, F., Müller, H. J., & Pollmann, S. (2012). Medial temporal lobe-dependent repetition suppression and enhancement due to implicit vs. explicit processing of individual repeated search displays. Frontiers in Human Neuroscience, 6, 272.
*Geyer, T., Shi, Z., & Müller, H. J. (2010). Contextual cueing in multiconjunction visual search is dependent on color- and configuration-based intertrial contingencies. Journal of Experimental Psychology: Human Perception and Performance, 36, 515–532.
*Geyer, T., Zehetleitner, M., & Müller, H. J. (2010). Contextual cueing of pop-out visual search: When context guides the deployment of attention. Journal of Vision, 10, 20.
*Giesbrecht, B., Sy, J. L., & Guerin, S. A. (2013). Both memory and attention systems contribute to visual search for targets cued by implicitly learned context. Vision Research, 85, 80–89.
Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.
*Greene, A. J., Gross, W. L., Elsinger, C. L., & Rao, S. M. (2007). Hippocampal differentiation without recognition: An fMRI analysis of the contextual cueing task. Learning & Memory, 14, 548–553.
Hassin, R. R., Ferguson, M. J., Shidlovski, D., & Gross, T. (2007). Subliminal exposure to national flags affects political thought and behavior. Proceedings of the National Academy of Sciences, 104, 19757–19761.
Hoekstra, R., Finch, S., Kiers, H. A. L., & Johnson, A. (2006). Probability as certainty: Dichotomous thinking and the misuse of p-values. Psychonomic Bulletin & Review, 13, 1033–1037.
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.
Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., & Schmitt, M. (2005). A meta-analysis on the correlation between the implicit association test and explicit self-report measures. Personality and Social Psychology Bulletin, 31, 1369–1385.
*Howard, J. H., Jr., Howard, D. V., Dennis, N. A., Yankovich, H., & Vaidya, C. J. (2004). Implicit spatial contextual learning in healthy aging. Neuropsychology, 18, 124–134.
Hyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A. B., & Williams, C. C. (2008). Gender similarities characterize math performance. Science, 321, 494–495.
Ioannidis, J. P. A., Munafò, M. R., Fusar-Poli, P., Nosek, B. A., & David, S. P. (2014). Publication and other reporting biases in cognitive sciences: Detection, prevalence, and prevention. Trends in Cognitive Sciences, 18, 235–241.
*Jiménez, L., & Vázquez, G. A. (2011). Implicit sequence learning and contextual cueing do not compete for central cognitive resources. Journal of Experimental Psychology: Human Perception and Performance, 37, 222–235.
*Jiménez-Fernández, G., Vaquera, J. M. M., Jiménez, L., & Defior, S. (2011). Dyslexic children show deficits in implicit sequence learning, but not in explicit sequence learning or contextual cueing. Annals of Dyslexia, 61, 85–110.
*Johnson, J. S., Woodman, G. F., Braun, E., & Luck, S. J. (2007). Implicit memory influences the allocation of attention in visual cortex. Psychonomic Bulletin & Review, 14, 834–839.
*Kawahara, J. (2003). Contextual cueing in 3D layouts defined by binocular disparity. Visual Cognition, 10, 837–852.
*Kourkoulou, A., Kuhn, G., Findlay, J. M., & Leekam, S. R. (2013). Eye movement difficulties in autism spectrum disorder: Implications for implicit contextual cueing. Autism Research, 6, 177–189.
*Kourkoulou, A., Leekam, S. R., & Findlay, J. M. (2012). Implicit learning of local context in autism spectrum disorder. Journal of Autism and Developmental Disorders, 42, 244–256.
*Le Dantec, C. C., Melton, E. E., & Seitz, A. R. (2012). A triple dissociation between learning of target, distractors, and spatial contexts. Journal of Vision, 12, 5.
List, J. A. (2002). Preference reversals of a different kind: The “more is less” phenomenon. The American Economic Review, 92, 1636–1643.
*Luethi, M., Meier, B., & Sandi, C. (2009). Stress effect on working memory, explicit memory, and implicit memory for neutral and emotional stimuli in healthy men. Frontiers in Behavioral Neuroscience, 2, 5.
*Makovski, T., & Jiang, Y. V. (2011). Investigating the role of response in spatial context learning. Quarterly Journal of Experimental Psychology, 64, 1563–1579.
*Manginelli, A. A., Baumgartner, F., & Pollmann, S. (2013). Dorsal and ventral working memory-related brain areas support distinct processes in contextual cueing. NeuroImage, 67, 363–374.
*Manginelli, A. A., Geringswald, F., & Pollmann, S. (2012). Visual search facilitation in repeated displays depends on visuospatial working memory. Experimental Psychology, 59, 47–54.
*Manginelli, A. A., Langer, N., Klose, D., & Pollmann, S. (2013). Contextual cueing under working memory load: Selective interference of visuospatial load with expression of learning. Attention, Perception, & Psychophysics, 75, 1103–1117.
*Manginelli, A. A., & Pollmann, S. (2009). Misleading contextual cues: How do they affect visual search? Psychological Research, 73, 212–221.
*Manns, J. R., & Squire, L. R. (2001). Perceptual learning, awareness, and the hippocampus. Hippocampus, 11, 776–782.
McGraw, K. O., Tew, M. D., & Williams, J. E. (2000). The integrity of web-delivered experiments: Can you trust the data? Psychological Science, 11, 502–506.
*Mednick, S. C., Makovski, T., Cai, D. J., & Jiang, Y. V. (2009). Sleep and rest facilitate implicit memory in a visual search task. Vision Research, 49, 2557–2565.
*Nabeta, T., Ono, F., & Kawahara, J.-I. (2003). Transfer of spatial context from visual to haptic search. Perception, 32, 1351–1358.
*Negash, S., Petersen, L. E., Geda, Y. E., Knopman, D. S., Boeve, B. F., et al. (2007). Effects of ApoE genotype and mild cognitive impairment on implicit learning. Neurobiology of Aging, 28, 885–893.
*Ogawa, H., & Watanabe, K. (2010). Time to learn: Evidence for two types of attentional guidance in contextual cueing. Perception, 39, 72–80.
*Ogawa, H., & Watanabe, K. (2011). Implicit learning increases preference for predictive visual display. Attention, Perception, & Psychophysics, 73, 1815–1822.
*Olson, I. R., & Chun, M. M. (2002). Perceptual constraints on implicit learning of spatial context. Visual Cognition, 9, 273–302.
*Oudman, E., Van der Stigchel, S., Wester, A. J., Kessels, R. P. C., & Postma, A. (2011). Intact memory for implicit contextual information in Korsakoff’s amnesia. Neuropsychologia, 49, 2848–2855.
*Park, H., Quinlan, J., Thornton, E., & Reder, L. M. (2004). The effects of midazolam on visual search: Implications for understanding amnesia. Proceedings of the National Academy of Sciences, 101, 17879–17883.
Pessiglione, M., Petrovic, P., Daunizeau, J., Palminteri, S., Dolan, R. J., & Frith, C. D. (2008). Subliminal instrumental conditioning demonstrated in the human brain. Neuron, 59, 561–567.
*Peterson, M. S., & Kramer, A. F. (2001). Attentional guidance of the eyes by contextual information and abrupt onsets. Perception & Psychophysics, 63, 1239–1249.
*Pollmann, S., & Manginelli, A. A. (2009). Anterior prefrontal involvement in implicit contextual change detection. Frontiers in Human Neuroscience, 3, 28.
*Pollmann, S., & Manginelli, A. A. (2010). Repeated contextual search cues lead to reduced BOLD-onset times in early visual and left inferior frontal cortex. The Open Neuroimaging Journal, 4, 9–15.
*Preston, A. R., & Gabrieli, J. D. E. (2008). Dissociation between explicit memory and configural memory in the human medial temporal lobe. Cerebral Cortex, 18, 2192–2207.
*Rausei, V., Makovski, T., & Jiang, Y. (2007). Attention dependency in implicit learning of repeated search context. Quarterly Journal of Experimental Psychology, 60, 1321–1328.
Rosenthal, R., & Rubin, D. B. (1994). The counternull value of an effect size: A new statistic. Psychological Science, 5, 329–334.
Rouder, J. N., Morey, R. D., Speckman, P. L., & Pratte, M. S. (2007). Detecting chance: A solution to the null sensitivity problem in subliminal priming. Psychonomic Bulletin & Review, 14, 597–605.
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.
*Schankin, A., & Schubö, A. (2009a). Cognitive processes facilitated by contextual cueing: Evidence from event-related brain potentials. Psychophysiology, 46, 668–679.
*Schankin, A., & Schubö, A. (2009b). The time course of attentional guidance in contextual cueing. In L. Paletta & J. K. Tsotsos (Eds.), Attention in cognitive systems: Lecture notes in computer sciences (pp. 69–84). Berlin: Springer.
*Schankin, A., Stursberg, O., & Schubö, A. (2008). The role of implicit context information in guiding visual-spatial attention. In B. Caputo & M. Vincze (Eds.), Cognitive vision (pp. 93–106). Berlin: Springer.
*Schlagbauer, B., Müller, H. J., Zehetleitner, M., & Geyer, T. (2012). Awareness in contextual cueing of visual search as measured with concurrent access- and phenomenal-consciousness tasks. Journal of Vision, 12, 25.
Shanks, D. R., & Berry, C. J. (2012). Are there multiple memory systems? Tests of models of implicit and explicit memory. Quarterly Journal of Experimental Psychology, 65, 1449–1474.
*Shi, Z., Zang, X., Jia, L., Geyer, T., & Müller, H. J. (2013). Transfer of contextual cueing in full-icon display remapping. Journal of Vision, 13, 2.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file drawer. Journal of Experimental Psychology: General, 143, 534–547.
*Smyth, A. C., & Shanks, D. R. (2008). Awareness in contextual cuing with extended and concurrent explicit tests. Memory & Cognition, 36, 403–415.
*Smyth, A. C., & Shanks, D. R. (2011). Aging and implicit learning: Explorations in contextual cueing. Psychology and Aging, 26, 127–132.
*Song, J.-H., & Jiang, Y. (2005). Connecting the past with the present: How do humans match an incoming visual display with visual memory? Journal of Vision, 5, 322–330.
*Travers, B. G., Powell, P. S., Mussey, J. L., Klinger, L. G., Crisler, M. E., & Klinger, M. R. (2013). Spatial and identity cues differentially affect implicit contextual cueing in adolescents and adults with autism spectrum disorder. Journal of Autism and Developmental Disorders, 43, 2393–2404.
*Travis, S. L., Mattingley, J. B., & Dux, P. E. (2013). On the role of working memory in spatial contextual cueing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 208–219.
*Tseng, P., Hsu, T.-Z., Tzeng, O. J. L., Hung, D. L., & Juan, C.-H. (2011). Probabilities in implicit learning. Perception, 40, 822–829.
*Tseng, Y.-C., & Li, C.-S. R. (2004). Oculomotor correlates of context-guided learning in visual search. Perception & Psychophysics, 66, 1363–1378.
*Tseng, Y.-C., & Lleras, A. (2013). Rewarding context accelerates implicit guidance in visual search. Attention, Perception, & Psychophysics, 75, 287–298.
*Vaidya, C. J., Huger, M., Howard, D. V., & Howard, J. H. (2007). Developmental differences in implicit learning of spatial context. Neuropsychology, 21, 497–506.
*van Asselen, M., Almeida, I., Andre, R., Januário, C., Gonçalves, A. F., & Castelo-Branco, M. (2009). The role of the basal ganglia in implicit contextual learning: A study of Parkinson's disease. Neuropsychologia, 47, 1269–1273.
*van Asselen, M., Almeida, I., Julio, F., Januario, C., Campos, E. B., Simoes, M., & Castelo-Branco, M. (2012). Implicit contextual learning in prodromal and early stage Huntington’s disease patients. Journal of the International Neuropsychological Society, 18, 689–696.
*van Asselen, M., & Castelo-Branco, M. (2009). The role of peripheral vision in implicit contextual cuing. Attention, Perception, & Psychophysics, 71, 76–81.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36, 1–48.
Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6, 291–298.
Williams, L. E., & Bargh, J. A. (2008). Experiencing physical warmth promotes interpersonal warmth. Science, 322, 606–607.
*Zellin, M., Conci, M., von Mühlenen, A., & Müller, H. J. (2011). Two (or three) is one too many: Testing the flexibility of contextual cueing with multiple target locations. Attention, Perception, & Psychophysics, 73, 2065–2076.
*Zellin, M., Conci, M., von Mühlenen, A., & Müller, H. J. (2013). Here today, gone tomorrow: Adaptation to change in memory-guided visual search. PLoS ONE, 8, e59466.
*Zellin, M., von Mühlenen, A., Müller, H. J., & Conci, M. (2013). Statistical learning in the past modulates contextual cueing in the future. Journal of Vision, 13, 19.
*Zhao, G., Liu, Q., Jiao, J., Zhou, P., Li, H., & Sun, H.-J. (2012). Dual-state modulation of the contextual cueing effect: Evidence from eye movement recordings. Journal of Vision, 12, 11.