www.elsevier.com/locate/ynimg
NeuroImage 28 (2005) 122 – 131
The reliability of fMRI activations in the medial temporal lobes in a
verbal episodic memory task
Kathrin Wagner (a,*), Lars Frings (a), Ansgar Quiske (a), Josef Unterrainer (b), Ralf Schwarzwald (c),
Joachim Spreer (c), Ulrike Halsband (b), and Andreas Schulze-Bonhage (a)
(a) Epilepsy Center, University Hospital of Freiburg, Breisacher Str. 64, 79106 Freiburg, Germany
(b) Neuropsychology, Department of Psychology, University of Freiburg, Germany
(c) Department of Neuroradiology, University Hospital of Freiburg, Germany
Received 9 December 2004; revised 24 May 2005; accepted 1 June 2005
Available online 26 July 2005
The test–retest reliability of activation patterns elicited by encoding
and recognition of word-pair associates within the whole brain and a
predefined medial temporal region of interest (ROI) was investigated.
Twenty healthy right-handed subjects were studied within two sessions,
either on the same day or 210–308 days later. Three quantitative
measures of reliability were calculated for the contrasts encoding and
recognition versus a control condition within the ROI and also for the
whole brain: A group correlational analysis between the lateralization
indices of the first and second session, correlations of the individual
SPM(t) maps of the first and the second run, and overlap ratios
between both sessions. For the ROI, correlational analysis of
lateralization indices during both encoding trials was significant.
Eighty percent of the individual positive correlation coefficients of
SPM(t) maps during encoding, and 75% during recognition reached
significance. The mean percentage of overlapping voxels was 18%
during encoding and 19% during recognition. The reproducibility
measures evaluated for the whole brain demonstrated significantly
higher values compared to the ROI. For the group that stayed inside
the scanner, better whole brain test–retest reliability was observed,
and no influence of the memory process (encoding or recognition) on
reproducibility was found.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Reliability; fMRI; Medial temporal lobe; Hippocampus; Verbal
episodic memory; Word-pair associates; Encoding; Recognition
1053-8119/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2005.06.005
* Corresponding author. Fax: +49 7612705003.
E-mail address: [email protected] (K. Wagner).
Introduction
It is well known that the medial temporal lobe (MTL) is a
crucial area involved in episodic memory processes (Alvarez and
Squire, 1994). Lesions in this area have been shown to produce
impairments of declarative memory processes (Scoville and
Milner, 1957). Furthermore, numerous functional imaging studies
have demonstrated that the hippocampus and its adjacent cortices
are involved in encoding and retrieving new information (Schacter
and Wagner, 1999). Since functional magnetic resonance imaging
(fMRI) has become a useful tool to gain insights into mnemonic
processes, it has been used to answer questions of basic research,
but also of clinical interest. Most of these studies performed group
analyses, but did not evaluate the reproducibility of a given
individual BOLD signal. In order to interpret individual activation
patterns, e.g., for diagnostic purposes, it is necessary to examine
the validity and reliability of individual results. Some studies have
evaluated the reproducibility of individual activations associated
with motor (Maitra et al., 2002; Tegeler et al., 1999; Yetkin et al.,
1996), visual (Miki et al., 2000, 2001a,b; Rombouts et al., 1997;
Rombouts et al., 1998; Swallow et al., 2003), language (Brannen et
al., 2001; Fernandez et al., 2003; Maldjian et al., 2002; Rutten et
al., 2002), and different higher cognitive tasks (McGonigle et al.,
2000; Neumann et al., 2003). Most of these studies used different
approaches to determine reproducibility, which complicates comparison
between them. Several studies qualitatively assessed the consistency of
suprathreshold activations in predefined brain areas and showed
largely analogous results over repeated measurements. For quantitative
analyses, many different measures were evaluated in order to
determine the reliability: e.g., number of activated voxels, overlap
ratio, correlations of activation values or lateralizations, intraclass
correlation coefficient (ICC), intersect maps, and conjunction
analysis. The results depend on the chosen measure, the data
preprocessing, and the selected threshold. It has been shown
that functional MR activations in the visual (e.g., Miki et al., 2000)
and motor cortex (e.g., Yetkin et al., 1996), as well as in frontal
language areas (e.g., Brannen et al., 2001) are satisfactorily
reproducible.
So far, reliability of memory-related BOLD-signal changes has
predominantly been examined for verbal (Manoach et al., 2001;
Noll et al., 1997; Wei et al., 2004) and spatial (Casey et al., 1998)
working memory tasks, predominantly within the frontal lobes. To
our knowledge, only two studies investigated the reproducibility of
fMRI activations associated with episodic memory processes
(Machielsen et al., 2000; Miller et al., 2002) of which only the
former involved medial temporal lobe activations: Miller et al.
(2002) investigated individual differences in activation during
verbal episodic recognition by correlating the volumes of raw
signal intensity values of six healthy subjects and found significant
variations between subjects, but reliable individual activations over
time in the frontal and parietal cortices. Machielsen et al. (2000)
analyzed within- and between-subject reproducibility of activations
during encoding of complex pictures for the whole brain and for
three different regions of interest [ROIs: anterior, posterior, and
middle brain areas, including the (para)hippocampal region]. They
showed that reliability, as measured by number and location of
overlapping suprathreshold voxels, was higher for the ROIs than
for the whole brain, and within the ROIs, the activations in
posterior regions were most reliable. Neither of these studies
specifically examined the reproducibility of medial temporal lobe
activations elicited by a verbal episodic memory paradigm.
The goal of the present study is to investigate the test–retest
reliability of BOLD-signal changes predominantly in the medial
temporal lobes using a verbal episodic memory paradigm. It has
been shown that the hippocampus forms associations to build
memories (Buckner et al., 2000; Henke et al., 1999; Otten et al.,
2001) and that word-pair associates elicit activations predom-
inantly in left medial temporal areas (Dolan and Fletcher, 1997;
Halsband et al., 2002; Kelley et al., 1998). In order to examine the
reproducibility of medial temporal lobe activations, we inves-
tigated encoding and recognition of word-pair associates with two
parallel test versions. They were applied either within one or across
two measurements, and the activations in the whole brain and the
region of interest including the hippocampus, the parahippocampal
and the fusiform gyrus bilaterally were investigated.
Materials and methods
Subjects
Twenty right-handed (mean handedness quotient = 0.84; Oldfield, 1971)
subjects (16 female, 4 male; mean age = 26 years, SD = 6.1 years) with
no history of neurologic, psychiatric, or vascular disease and with
normal or corrected-to-normal vision participated. Informed
written consent was obtained from each subject after the procedure
had been fully explained. The study was approved by the Ethics
Committee of the University of Freiburg according to the guidelines of
the Declaration of Helsinki.
The subjects performed an explicit verbal memory task twice,
using parallel versions. In eleven subjects, the two versions were
administered within one measurement and the subjects remained
inside the scanner (consecutive or CON group). Nine subjects
performed the two versions within two separate measurements
(separate or SEP group) 210 to 308 days apart (mean intersession interval =
228 days, SD = 32.1 days). Order effects were controlled for by
pseudo-randomly assigning the subjects to test versions. This
resulted in 13 subjects performing the first test version during the
first session and 7 subjects working on the parallel version during
the first run.
After each session, a 4-point rating scale was applied outside
the scanner in order to evaluate the subjects’ mnemonic strategy. In
this questionnaire, the participants were asked to state whether they
had used a purely verbal (1), a rather verbal (2), a rather pictorial
(3), or a purely pictorial (4) strategy to memorize the word pairs.
Stimuli and task design
Subjects were explicitly instructed to encode and later
recognize 24 concrete and highly imageable word pairs per
session. The two nouns were neither semantically nor phonemati-
cally related and consisted of a maximum of three syllables.
Concrete nouns were selected from the German version 2.5 of the
CELEX Lexical Database Release 2 (http://www.kun.nl/celex)
which gives information about the written and spoken frequency of
about 6 million words. The overall frequency of the used 144
words ranged from 1 to 4395, and there was no significant
difference between the frequency of the words used in version 1
and those in version 2.
The sequence of blocks is displayed in Fig. 1. During encoding,
subjects viewed each word pair [e.g., ‘‘Pelz + Kreis’’ (‘‘fur +
circle’’)] for 7 s (plus 1 s black screen) and were told to memorize
it. Four word pairs constituted an encoding block (32 s) which
alternated with a block of the control condition (24 s). As a control
condition, the subjects were presented names of two weekdays
[e.g., ‘‘Dienstag + Sonntag’’ (‘‘Tuesday + Sunday’’)] for 5 s (plus
1 s black screen) and they had to indicate by button press whether
they were identical or not. In the recognition condition (32 s), the
subjects were given three words for 7 s (plus 1 s black screen), and
they had to indicate by button press which two words constituted a
pair beforehand (24 decisions). Fifty percent of the distractors were
phonematically, and 50% semantically related to the target (see
Fig. 1). The control condition was identical for encoding and
recognition. The number of left and right button presses was
balanced throughout the whole experiment, which lasted 672 s.
The stimuli were visually projected on a translucent screen at
the end of the scanner table using a data projector outside the
magnet. Subjects saw the word pairs via a mirror that was
positioned above the head coil. A laptop outside the scanner room
running the software ‘‘Presentation 0.5’’ (www.neurobehavioralsystems.com)
was connected to the data projector. Responses were
recorded by use of a button box.
Data acquisition
Magnetic Resonance Imaging (MRI) was performed with a
Magnetom Siemens Vision 1.5-T scanner (Siemens AG, Erlangen,
Germany). For high anatomical resolution a sagittal T1-weighted
3D-MPRAGE sequence was obtained (TR/TE = 9.7/4 ms, flip
angle = 12°, field of view = 256 mm, matrix = 256 × 256, 160
slices, voxel size = 1 × 1 × 1 mm³).
Functional MR images were acquired using Gradient-Echo
Echo-Planar imaging sequences (GE-EPI) sensitive to BOLD
contrast (TR/TE = 4000/64 ms, flip angle = 90°, field of view =
256 mm, matrix = 64 × 64, 30 interleaved slices, voxel size = 4 ×
4 × 3.3 mm³, gap = 0.3 mm). The block design included 173
acquisitions, of which the first 5 images were discarded in order to
eliminate magnetization instability.
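The timing figures reported above can be cross-checked with a few lines of arithmetic. The following Python sketch is only an illustration: it assumes that each encoding and each recognition block is followed by exactly one control block (the text does not state the block count explicitly), and all constant and function names are hypothetical.

```python
# Durations in seconds, taken from the task and acquisition descriptions.
ENCODING_TRIAL_S = 7 + 1      # word pair shown for 7 s plus 1 s black screen
CONTROL_TRIAL_S = 5 + 1       # weekday pair shown for 5 s plus 1 s black screen
RECOGNITION_TRIAL_S = 7 + 1   # three words shown for 7 s plus 1 s black screen
TRIALS_PER_BLOCK = 4
WORD_PAIRS = 24
TR_S = 4                      # repetition time of the EPI sequence (4000 ms)
ACQUISITIONS = 173
DISCARDED = 5


def block_seconds(trial_s, trials=TRIALS_PER_BLOCK):
    """Duration of one block made of `trials` trials of `trial_s` seconds."""
    return trial_s * trials


def paradigm_seconds():
    """Total task duration.

    Assumption: every encoding and recognition block is followed by one
    control block; 24 word pairs in blocks of 4 give 6 blocks per task.
    """
    task_blocks = WORD_PAIRS // TRIALS_PER_BLOCK
    control = block_seconds(CONTROL_TRIAL_S)
    encoding = task_blocks * (block_seconds(ENCODING_TRIAL_S) + control)
    recognition = task_blocks * (block_seconds(RECOGNITION_TRIAL_S) + control)
    return encoding + recognition


def analysed_scan_seconds():
    """Scan time covered by the volumes kept after discarding the first five."""
    return (ACQUISITIONS - DISCARDED) * TR_S


print(block_seconds(ENCODING_TRIAL_S), block_seconds(CONTROL_TRIAL_S))
print(paradigm_seconds(), analysed_scan_seconds())
```

Under this assumption, the block lengths (32 s and 24 s), the stated experiment duration, and the number of retained volumes multiplied by the TR all agree.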
Fig. 1. One of two experimental cycles showing examples for encoding, recognition, and the control condition. Examples: ‘‘Pelz + Kreis’’ (‘‘fur + circle’’),
‘‘Dienstag’’ and ‘‘Sonntag’’ (‘‘Tuesday’’ and ‘‘Sunday’’), ‘‘Kugel’’ (‘‘ball’’).
Image processing and data analysis
Data were analyzed in MATLAB 6.1 (http://www.mathworks.com)
using the statistical parametric mapping software SPM2
(http://www.fil.ion.ucl.ac.uk/spm/). Additional calculations were
accomplished with SPSS 11.0 (http://www.spss.com). Functional
images were converted into Analyze format and unwarped. They
were realigned and normalized onto the Montreal Neurologic
Institute Atlas (MNI; Mazziotta et al., 1995) using the EPI template
(sinc interpolation), and smoothed with a 9-mm isotropic Gaussian
kernel. The anatomical images were normalized as well onto the
MNI atlas using the T1-weighted template. The time series were
low-pass filtered with the hemodynamic response function (hrf) and
high-pass filtered with a cutoff of 112 s. First, single-subject
analyses were carried out (items modeled as blocks and convolved
with the hemodynamic response function) in order to evaluate the
individual contrasts for encoding vs. control condition (encoding or
ENC) and recognition vs. control condition (recognition or REC).
An individual t threshold was calculated for every subject and
contrast in order to adjust for the intersubject variability in general
activation levels: for this purpose, the mean of the upper 5% of t
values of the whole brain was computed and the threshold was set
at 50% of this value (Fernandez et al., 2003).
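The individual threshold rule described above (50% of the mean of the upper 5% of whole-brain t values) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the function name, its parameters, and the handling of non-finite voxels are assumptions, and the analysis in the paper was carried out in MATLAB/SPM2.

```python
import numpy as np


def individual_threshold(t_values, top_fraction=0.05, scale=0.5):
    """Return `scale` times the mean of the largest `top_fraction` of t values.

    With the defaults this is 50% of the mean of the upper 5% of t values.
    """
    t = np.asarray(t_values, dtype=float).ravel()
    # Assumption: voxels outside the brain mask are stored as NaN and ignored.
    t = t[np.isfinite(t)]
    n_top = max(1, int(round(top_fraction * t.size)))
    top = np.sort(t)[-n_top:]          # the n_top largest t values
    return scale * top.mean()


# Toy example: t values 1..100, so the upper 5% are 96..100 (mean 98).
print(individual_threshold(np.arange(1, 101)))
```

Because the threshold scales with each subject's strongest activations, a subject with generally low t values is not penalized relative to one with generally high t values, which is the stated purpose of the adjustment.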
In order to investigate activations within the medial temporal
lobe, regions of interest (ROI) were defined bilaterally. A mask was
drawn manually on a normalized T1-weighted image that included
the hippocampus proper, the parahippocampal gyrus, the entorhinal
cortex, the subiculum, and the fusiform gyrus. The medial temporal
mask was mirrored onto the other hemisphere. For the whole brain
reliability analyses, only the supratentorial brain was used.
Laterality of activation elicited by the verbal memory paradigm
was assessed by calculating lateralization indices of the SPM(t)
maps in the ROI for every subject. The number of suprathreshold
voxels in the right (R) and the left (L) medial temporal ROI was
computed and weighted with their t values in order to take the
intensity of activation into account. Hemispheric dominance has
been quantified using a laterality index (LI) defined by the formula
(Fernandez et al., 2003):
$$\mathrm{LI} = \frac{\sum_{V} X_R - \sum_{V} X_L}{\sum_{V} X_R + \sum_{V} X_L}$$
where V = set of activated voxels within the medial temporal
ROI, XL = t value of left hemispheric voxels, and XR = t value
of right hemispheric voxels. A negative LI indicates left
lateralized activations and a positive LI shows right lateralized
activations.
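As an illustration of the t-weighted laterality index defined above, the following Python sketch sums the suprathreshold t values in each hemisphere's ROI and forms the normalized difference. The function and argument names are hypothetical, and a positive individual threshold is assumed.

```python
import numpy as np


def laterality_index(t_left, t_right, threshold):
    """LI = (sum X_R - sum X_L) / (sum X_R + sum X_L) over suprathreshold voxels.

    t_left / t_right: t values of the voxels in the left / right ROI.
    threshold: the (positive) individual t threshold.
    """
    left = np.asarray(t_left, dtype=float)
    right = np.asarray(t_right, dtype=float)
    sum_left = left[left > threshold].sum()
    sum_right = right[right > threshold].sum()
    total = sum_left + sum_right
    if total == 0:
        return float("nan")   # no suprathreshold voxel in either ROI: LI undefined
    return (sum_right - sum_left) / total


# Toy example: stronger left-sided activation gives a negative LI.
print(laterality_index([3.0, 2.0, 0.5], [1.5, 0.2], threshold=1.0))
```

Weighting by t values means that a few strongly activated voxels on one side can outweigh a larger number of weakly activated voxels on the other, which is why the index reflects intensity as well as extent.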
Test–retest reliability measures
Test–retest reliability was evaluated by calculating three
quantitative variables for the whole brain and the ROI:
I In order to assess the reproducibility of laterality, lateraliza-
tion indices were determined for the contrasts encoding
(ENC) and recognition (REC) using all activated voxels
above their respective individual thresholds. Linear correla-
tions were calculated between the lateralization indices of
the first and the second session.
II A spatially more precise measure of within-subject reli-
ability is the voxel-wise correlational analysis between t
values of the first and the second investigation of ENC and
REC (Strother et al., 1997). Pearson’s correlation coeffi-
cients were calculated only for voxels that either exceeded
the positive individual threshold or those that fell below the
negative individual threshold to reduce the noise of non-
significant voxels with t values around zero.
III In order to determine the relative amount of overlapping
volume between two activation maps, the overlap ratio
R^{ij}_{overlap} introduced by Rombouts et al. (1997) was calculated:
$$R^{ij}_{\mathrm{overlap}} = \frac{2 \times V_{\mathrm{overlap}}}{V_i + V_j}$$
where Vi = number of suprathreshold voxels within SPM(t)
maps in session i, Vj = number of suprathreshold voxels
within SPM(t) maps in session j, and Voverlap = the number
of suprathreshold voxels in both maps. Unlike the correla-
tional analysis of t values (see II) the overlap ratio is based
on the location of significantly activated voxels and does not
include the actual t values of these voxels in the calculation.
The overlap ratio can range from 0 to 100% of overlapping
volume.
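Measures II and III can be sketched for two flattened t maps of the same subject as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, and for measure II it is assumed that a voxel enters the correlation if it exceeds the positive or falls below the negative individual threshold in at least one session, a detail the text leaves open. The overlap ratio is the standard form of two times the shared suprathreshold volume divided by the sum of both volumes, expressed in percent.

```python
import numpy as np


def map_correlation(t1, t2, thr1, thr2):
    """Measure II: Pearson r between two t maps over voxels beyond +/- threshold."""
    t1 = np.asarray(t1, dtype=float).ravel()
    t2 = np.asarray(t2, dtype=float).ravel()
    # Assumption: keep voxels that pass the criterion in either session.
    keep = (np.abs(t1) > thr1) | (np.abs(t2) > thr2)
    return np.corrcoef(t1[keep], t2[keep])[0, 1]


def overlap_ratio(t1, t2, thr1, thr2):
    """Measure III: 2 * V_overlap / (V_i + V_j), returned in percent."""
    active1 = np.asarray(t1, dtype=float).ravel() > thr1
    active2 = np.asarray(t2, dtype=float).ravel() > thr2
    v_i, v_j = active1.sum(), active2.sum()
    if v_i + v_j == 0:
        return float("nan")   # no suprathreshold voxel in either session
    v_overlap = np.logical_and(active1, active2).sum()
    return 100.0 * 2.0 * v_overlap / (v_i + v_j)


# Toy example: 3 active voxels per session, 2 of them shared.
session1 = [3.0, 3.0, 3.0, 0.0, 0.0]
session2 = [3.0, 3.0, 0.0, 3.0, 0.0]
print(overlap_ratio(session1, session2, 1.0, 1.0))
```

The two functions make the difference between the measures concrete: the overlap ratio only uses the binary suprathreshold masks, whereas the correlation uses the t values themselves.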
In order to visualize the area of overlapping volume for the
group, a Random Effects (RFX) Analysis was carried out by
calculating a one-way ANOVA with 4 groups (encoding session 1,
recognition session 1, encoding session 2, recognition session 2),
with non-sphericity correction, replications over subjects, and with
correlated repeated measures. After the contrasts for encoding and
recognition for each session were defined, an inclusive masking of
the first and the second measurement of encoding as well as of
recognition was accomplished. Results are displayed on a
normalized T1-weighted image of one of the subjects as well as
on a glass brain.
Additionally, a multivariate analysis of variance for repeated
measures (SPSS 11.0) was accomplished for the individual
reliability measures II and III to test for main effects of the TASK
(encoding vs. recognition), brain REGION (MTL vs. whole brain),
and GROUP (CON vs. SEP group). In order to get more
information about significant interactions between factors, addi-
tional post hoc t tests (for Paired or Independent Samples) were
performed. To identify differences in laterality, a multivariate
analysis of variance for repeated measures was calculated with all
lateralization indices for within-subject factors TASK (encoding
vs. recognition), REGION (MTL vs. whole brain), and TIME
(session 1 vs. session 2).
Fig. 2. (a and b) Individual lateralization indices of encoding (a, above) and
of recognition (b, below) during both sessions. Plain columns show cases of
the CON group, striated columns characterize separate measurements (SEP
group).
Results
Behavioral data
The mean percentage of correctly recognized word pairs was
97.50% (SD = 5.13) in the first session and 98.12% (SD = 3.44) in
the second session. There was no significant difference between
performances in both runs (Wilcoxon Test). Altogether, 32.5% of
the subjects tried to memorize the word pairs using a purely verbal
strategy (1), 22.5% stated that their strategy was rather verbal than
pictorial (2), 25% rated their mnemonic strategy as rather pictorial
than verbal (3), and the remaining 20% determined they had used a
purely pictorial strategy (4). Two subjects indicated that they had
changed their strategy between both sessions: One changed from a
rather pictorial than verbal (3) to a rather verbal than pictorial (2)
strategy. The other subject shifted from a rather pictorial than
verbal (3) to a purely verbal strategy (1). There was no significant
difference between the chosen strategies in session 1 and session 2
(Wilcoxon Test). No significant linear correlations were found
between lateralization indices and strategies (Pearson).
Lateralization of activations
For the MTL, the mean lateralization index of suprathreshold
voxels associated with encoding was −0.06 (SD = 0.48) during the
first session and −0.28 (SD = 0.49) during the second session.
Recognition showed a mean lateralization index of −0.11
(SD = 0.36) in the first session and −0.22 (SD = 0.28)
during the second session. Laterality of activations remained
unchanged between both sessions in 13 subjects (65%) for
encoding and in 14 subjects (70%) for recognition. The individual
lateralization indices for encoding and recognition within the ROI
for both measurements are displayed in Figs. 2a and b (average
individual threshold: t = 1.4, SD = 0.3).
The mean lateralization for the whole brain was −0.33 (SD =
0.33) and −0.37 (SD = 0.30) for encoding. For recognition, the
mean lateralization index was −0.06 (SD = 0.18) in the first
session and −0.23 (SD = 0.15) in the second session, respectively.
Analysis of variance showed a main effect of the factor TIME
(session 1 vs. session 2; P < 0.05). Additionally, an interaction
between REGION and TASK (P < 0.05) was found demonstrating
that during encoding activation patterns within the whole brain
were more left lateralized than during recognition as well as in
comparison to the ROI analysis of encoding processes.
Test–retest reliability
I The linear correlations (Pearson) of lateralization indices of
MTL activations reached significance for the test–retest
comparison of encoding (r = 0.41, P = 0.05, see Fig. 3a), but
not for recognition (r = −0.24, n.s., see Fig. 3b). Separate
evaluations of subjects who remained inside the scanner
throughout both sessions (consecutive or CON group) and
those who were measured in two separate sessions (separate
or SEP group) showed a tendency toward a significant
relationship between the lateralization indices for encoding
within the MTL, and no significant correlation for recognition
(correlation coefficients and corresponding significance
levels are displayed in Table 1). The correlational
whole brain analyses of the lateralization indices showed
significant positive relationships between both sessions of
encoding as well as recognition for all subjects (ENC: r =
0.82, P < 0.01; REC: r = 0.59, P < 0.01) as well as
when calculated for the CON and SEP groups separately (see
Table 1).
Fig. 3. (a and b) Scatterplots of individual lateralization indices from both
sessions of encoding (a, above, r = 0.41, P < 0.05) and recognition (b,
below, r = −0.24, n.s.).
Table 1
Reliability measures I, II, and III for the medial temporal lobe and the whole brain during encoding and recognition for all subjects as well as for CON and SEP
subgroups
Medial temporal lobe
       ENC                                      REC
       ALL          CON          SEP            ALL          CON          SEP
I      0.41         0.26         0.52           −0.24        −0.33        −0.04
       (P < 0.05)   (n.s.)       (P = 0.07)     (n.s.)       (n.s.)       (n.s.)
II     0.22         0.27         0.17           0.20         0.21         0.18
       (0.19)       (0.20)       (0.15)         (0.27)       (0.31)       (0.24)
III    17.60        17.55        17.67          19.15        18.0         20.56
       (15.64)      (16.78)      (15.13)        (15.13)      (16.08)      (14.72)
Whole brain
       ENC                                      REC
       ALL          CON          SEP            ALL          CON          SEP
I      0.82         0.80         0.85           0.59         0.72         0.74
       (P < 0.01)   (P < 0.01)   (P < 0.01)     (P < 0.01)   (P < 0.01)   (P < 0.05)
II     0.50         0.59         0.39           0.51         0.58         0.44
       (0.19)       (0.18)       (0.16)         (0.19)       (0.18)       (0.18)
III    36.15        43.91        26.67          41.95        47.82        34.78
       (15.74)      (16.18)      (8.79)         (13.26)      (12.35)      (11.02)
I = correlation coefficients of lateralization indices (significance level).
II = mean correlation coefficients of SPM(t) maps (standard deviation).
III = mean overlap ratios (standard deviation).
II Correlational analyses within the ROI between t values
of the first and the second investigation in single
subjects showed that 16 (80%; ENC) and 15 (75%;
REC) positive correlations reached significance (P <
0.01). The mean correlation coefficient of encoding was
0.22 (SD = 0.19) ranging from −0.09 to 0.65. For
recognition, the mean correlation coefficient was 0.20
(SD = 0.27) ranging from −0.44 to 0.54. Within the
CON group, 10 subjects (91%) showed significant
positive correlations for the SPM(t) maps of encoding,
and 8 (73%) of recognition in the MTL. In the SEP
group, 6 (67%; ENC) and 7 (78%; REC) subjects
exhibited significant positive correlations. Mean correlation
coefficients and standard deviations for the CON
and SEP group evaluated for the MTL and the whole
brain activation patterns are displayed in Table 1. In the
whole brain analyses of all subjects, the mean correlation
coefficient was 0.50 (SD = 0.19) for encoding and 0.51
(SD = 0.19) for recognition. Each calculated correlation
was positive and reached significance (P < 0.01).
III The mean percentage of medial temporal overlapping
volume considering the individual threshold was 17.60%
(SD = 15.64) for encoding and 19.15% (SD = 15.13) for
recognition of the word pairs. Mean overlap ratios (and
standard deviations) for all subjects as well as for the
CON and the SEP subgroup are presented in Table 1.
The mean percentage of overlapping whole brain volume
was 36.15% (SD = 15.74; encoding) and 41.95% (SD =
13.26; recognition).
Evaluation of medial temporal intersect maps in the
group RFX analysis (ANOVA: P < 0.05, uncorrected)
showed that encoding of word pairs elicited activations in
the left hippocampus and the left fusiform gyrus during both
measurements (see Fig. 4a).
During both recognition phases, the parahippocampal
gyrus and the fusiform gyrus bilaterally showed a larger
BOLD response than during the control condition (see Fig.
4b). Activated clusters within the medial temporal lobe
during both sessions are listed in Table 2 for encoding and
Table 3 for recognition including cluster size, peak voxel
coordinates, and the corresponding t values. After applying
a correction for multiple comparisons (P < 0.05), no
suprathreshold voxels remained. Intersecting the whole
brain encoding-related activation patterns primarily resulted
in activations of the left inferior frontal gyrus (including
Broca’s area, especially BA45). Bilateral frontal (left >
right), parietal, and occipital areas were activated during
both sessions of recognition (FWE-corrected, P < 0.05).
Fig. 4. (a and b) Suprathreshold voxels within the MTL displayed on a coregistered T1-weighted image and on a glass brain for encoding
(a, above) and recognition (b, below). (Random Effects Analysis, one-way ANOVA, inclusive masking of session 1 and 2, uncorrected threshold, P < 0.05,
left = left.)
A multivariate analysis of variance for repeated
measures with the between-subject factor GROUP (CON vs.
SEP) and the within-subject factors TASK (ENC vs. REC) and
REGION (MTL vs. whole brain) for the individual reliability
measures II and III revealed a main effect of REGION (whole
brain > MTL ROI, P < 0.001), a main effect of GROUP
(univariate test for measure II: CON > SEP, P < 0.05), and a
significant interaction between REGION and GROUP (univariate
test for measure III: P < 0.05). In order to get more information
about the significant interaction between the factors REGION and
GROUP, additional post hoc t tests were performed. These
showed that the CON group demonstrated significantly more
overlapping volume within the whole brain analysis than the SEP
group (Independent-Samples t test, P < 0.05). Significantly
higher overlap ratios for the whole brain compared to the MTL
were found (Paired-Samples t test, P < 0.05).
Table 2
Activated clusters (>5 voxels) within the MTL during session 1 and 2 of
encoding with cluster size, peak voxel coordinates, and corresponding t
values (Random Effects Analysis, ANOVA, uncorrected threshold, P <
0.05)
            Region                  Cluster   t value   x      y      z
Session 1   Left fusiform gyrus     149       3.16      −39    −9     −36
            Right fusiform gyrus    24        2.07      36     −12    −30
            Right hippocampus       9         2.14      30     −33    −12
Session 2   Left hippocampus        20        2.38      −30    −15    −15
            Left fusiform gyrus     7         2.66      −36    −30    −27
Table 3
Activated clusters (>5 voxels) within the MTL during session 1 and 2 of
recognition with cluster size, peak voxel coordinates and corresponding t
values (Random Effects Analysis, ANOVA, uncorrected threshold, P <
0.05)
            Region                                     Cluster   t value   x      y      z
Session 1   Left parahippocampal gyrus/hippocampus     287       3.62      −15    −33    −3
            Right parahippocampal gyrus/hippocampus    134       3.62      9      −21    −12
            Left fusiform gyrus                        12        3.67      −39    −45    −24
Session 2   Left hippocampus                           276       4.81      −21    −36    0
            Right fusiform gyrus                       110       4.28      39     −39    −24
Discussion
In our study, we evaluated the test–retest reliability of medial
temporal lobe and whole brain activations during verbal episodic
encoding and recognition in subjects who remained inside the
scanner between sessions (CON group) and subjects who were
measured again 210 to 308 days later (SEP group). The study revealed three
major findings: (1) The whole brain analysis
produced higher reliability measures than the separate analysis
of the MTL. (2) Within the whole brain analysis, higher reliability
measures were observed for the CON group compared to the SEP
group. (3) The task (encoding or recognition) had no effect on
reproducibility. Qualitative group analysis of the ROI
demonstrated intersecting activations in the left hippocampus and
the left fusiform gyrus during encoding and in the parahippocam-
pal gyrus and the fusiform gyrus bilaterally during recognition.
These activation sites are in line with previous functional imaging
studies of episodic memory processes (Daselaar et al., 2001;
Davachi and Wagner, 2002; Fernandez et al., 1998; Golby et al.,
2001; Halsband et al., 2002; Jackson and Schacter, 2004; Nyberg
et al., 1996). Additionally, encoding as well as recognition
processes reliably activated left frontal language areas, which is
due to the verbal nature of the task.
(1) Within quantitative analyses, the evaluation of reproduci-
bility revealed higher values in every single reliability
measure for whole brain (predominantly left frontal) than
medial temporal activation patterns. These robust left frontal
activations might be attributed to language processing, like
rehearsing or inner speech (e.g., Shergill et al., 2001). These
activation sites are commonly seen in verbal memory tasks
(for a review, see Cabeza and Nyberg, 2000; Wagner et al.,
1998). In particular, the individual quantitative reliability measures
for the MTL that relied on exactly corresponding voxels
[correlations of SPM(t) maps, overlap ratios] demonstrated
lower agreement between the two sessions compared to
studies of other anatomical areas (Fernandez et al., 2003;
Machielsen et al., 2000). The study by Fernandez et al.
(2003) addressed the test–retest reliability of language-related
activations by administering a semantic decision task
using similar reproducibility measures which revealed
comparable results. They showed mean overlap ratios of
46.03% at the chosen threshold (uncorrected, P < 0.05) for
the whole brain. For our data, an analogous evaluation of
reliability measures for activations of the whole brain
showed similar overlap ratios of 36.2% for encoding and
42.0% for recognition considering the individual t threshold.
Linear correlations between SPM(t) maps of both measure-
ments yielded a mean correlation coefficient of 0.7 in the
study by Fernandez et al. (2003) and 0.5 (for ENC as well as
REC) for the whole brain analysis of our data. Higher
reliability measures of frontal lobe areas indicate that
reproducibility of activations depends on the brain area.
Furthermore, the study by Machielsen et al. (2000) supports
these findings using a mnemonic task. They examined the
reproducibility of episodic memory-related fMRI activa-
tions, investigating encoding of complex visual stimuli
(photographs of outdoor scenes) across 3 sessions (session 1
and 2 during one measurement and session 3 after 3–24
days). As described earlier, they divided the whole brain
into 3 different ROIs: an anterior region, a posterior region,
and a middle region covering the rest of the brain in between
these 2 areas including the (para)hippocampal region.
Overlap ratios were also calculated as reproducibility
measures. Mean overlap ratios between session 1 and 2
constituted 42.8% (SD = 30.8) for the anterior region,
62.0% (SD = 11.5) for the posterior area, and 50.4% (SD =
27.0) for the middle ROI. Mean overlap ratios between
session 1 and 3 showed 21.1% (SD = 22.8) overlapping
voxels for the anterior region, 51.4% (SD = 18.1) for the
posterior area, and 39.6% (SD = 25.0) for the middle ROI.
This also implies an influence of the brain region on the
reproducibility of activation patterns associated with picture
encoding, in favor of posterior areas. Because the middle
ROI was large and included the medial temporal area, it remains
unclear how much the (para)hippocampal area contributed
to this overlapping volume.
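The two voxel-wise measures compared above can be illustrated with a minimal Python sketch. It assumes the overlap definition common in the fMRI reproducibility literature, R = 2 · V_overlap / (V1 + V2), expressed as a percentage, and a Pearson correlation across voxels for the SPM(t) maps. The function names, thresholds, and toy arrays are hypothetical and are not taken from the analysis pipeline of this study.

```python
import numpy as np

def overlap_ratio(t_map_1, t_map_2, threshold_1, threshold_2):
    """Percentage overlap of suprathreshold voxels: 100 * 2 * V_overlap / (V1 + V2).

    Each session may use its own t threshold (assumption: voxels with
    t > threshold count as active).
    """
    active_1 = np.asarray(t_map_1) > threshold_1
    active_2 = np.asarray(t_map_2) > threshold_2
    v1, v2 = int(active_1.sum()), int(active_2.sum())
    if v1 + v2 == 0:
        return float("nan")  # no active voxels in either session
    v_overlap = int(np.logical_and(active_1, active_2).sum())
    return 100.0 * 2.0 * v_overlap / (v1 + v2)

def map_correlation(t_map_1, t_map_2, mask=None):
    """Pearson correlation between two t maps over the voxels inside a mask."""
    a = np.asarray(t_map_1, dtype=float)
    b = np.asarray(t_map_2, dtype=float)
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    return float(np.corrcoef(a[mask], b[mask])[0, 1])

# Hypothetical toy "t maps" for two sessions (2 x 3 voxels).
session_1 = np.array([[3.0, 2.5, 0.1], [0.2, 4.0, 0.3]])
session_2 = np.array([[2.8, 0.4, 0.2], [0.1, 3.5, 2.9]])

print(overlap_ratio(session_1, session_2, 2.0, 2.0))
print(map_correlation(session_1, session_2))
```

Passing a boolean ROI mask to `map_correlation` restricts the comparison to a region such as the MTL, whereas the default uses every voxel in the arrays.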
One reason for lower reproducibility within the MTL is its
anatomy, which is known to pose difficulties for
functional MRI: because of its
proximity to bones and air sinuses, it is susceptible to
magnetic artifacts, which often result in image distortions or
deletions. The evaluation of reliability on a voxel-by-voxel
basis (correlation of t maps and overlap ratio) might be
affected. Additionally, the voxel size of 4 × 4 × 3.3 mm³
might have led to partial volume effects, especially in the
hippocampus. Decreasing the voxel size might increase the
voxel-by-voxel reproducibility. Voxel sizes of
2 × 2 × 1 mm³ have been shown to reduce susceptibility
artifacts within the hippocampal formation (Fransson et al.,
2001). Furthermore, a shorter echo time (TE) has been shown to
produce an increased signal-to-noise ratio and less signal
loss, even though a shorter TE is not sufficient to recover the
BOLD signal from regions affected by susceptibility
artifacts (Gorno-Tempini et al., 2002). Additionally, enhanc-
ing the power of the study by performing more scans per
condition should increase the signal-to-noise ratio, and
therefore probably improve reproducibility within the MTL.
(2) It is known that various additional factors that are
independent of the subject can influence location and size
of activations, and therefore increase variability within and
between sessions: variations in the shim procedure, spatial
filter size, or the stability of the scanner over time
(Howseman et al., 1998; Rombouts et al., 1998). It is also
possible that spatial preprocessing has an effect on
intersession variance. Additionally, repositioning errors
and miscoregistration of EPI images may account for
variations in activation sites.
Several psychophysiological effects within the subject like
changes in arousal, attention, fatigue, task acquaintance, and
habituation might also have contributed to an increased
variability and therefore a lower reproducibility. Good
performances throughout all sessions indicate that all
subjects attended to the task. The contribution of other
factors cannot be determined exactly, but it seems probable
that they vary more between separate measurements than within a
single measurement. As expected, the comparison of reliability
variables of subjects who remained inside the scanner for
both test versions (CON group) and of those who repeated
the procedure after an average of 228 days (SEP group)
displayed higher values for the CON group. This underlines
the influence of numerous factors, like physiological aspects
within the subjects and also technical artifacts, on the
consistency of activations throughout the brain.
(3) It was shown that the task had no significant influence on
test–retest reliability. The only difference between ENC and
REC was seen for lateralization indices of the whole brain:
Encoding processes produced more left lateralized activation
patterns than recognition. This might indicate that during
recognition phases, subjects retrieved verbal as well as
visuospatial associations, which is expressed by less left-
lateralized activation patterns.
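As an illustration of how a lateralization index can be computed, the following Python sketch uses the widely used definition LI = (L − R) / (L + R), where L and R are counts of suprathreshold voxels in the left and right hemisphere. Whether this exact definition and sign convention match the ones used here is an assumption, and the function name and voxel counts are hypothetical.

```python
def lateralization_index(left_count, right_count):
    """LI = (L - R) / (L + R); assumed convention: +1 fully left, -1 fully right."""
    total = left_count + right_count
    if total == 0:
        return float("nan")  # no active voxels in either hemisphere
    return (left_count - right_count) / total

# Hypothetical suprathreshold voxel counts for one subject.
li_encoding = lateralization_index(left_count=30, right_count=10)     # strongly left
li_recognition = lateralization_index(left_count=22, right_count=18)  # weakly left

print(li_encoding, li_recognition)
```

Under this convention, a smaller positive index for recognition than for encoding would correspond to the less left-lateralized pattern described above.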
It should be pointed out that the use of correlational analyses for
evaluating reliability has some restrictions. In order to identify
significant correlations, it is necessary to have larger intersubject
than intrasubject variance. A further restriction of Pearson’s product-
moment correlation is that it detects only linear
relationships. Moreover, if there were neither inter- nor intrasubject variance,
meaning perfect within-subject reliability, the correlational analysis
would fail to detect it. We assumed that this is a rare case in
empirical data, and we would rather expect greater variance
between than within subjects. Nevertheless, the scatterplot should
always be inspected to explore inter- and intrasubject variance. In
our study, analysis of lateralization indices revealed no significant
correlation during recognition, although laterality of activations
showed less inter- and intrasubject variance than for encoding (see
Figs. 3a and b). Therefore, the reproducibility of lateralization for
recognition is underestimated by linear correlational analysis.
Additionally, descriptive data (standard deviation of lateralization
indices, percentage of laterality changes between sessions) should
be considered to corroborate the interpretation of correlational
analyses.
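The dependence of the correlation coefficient on intersubject variance can be made concrete with a small Python sketch. The lateralization indices below are invented for illustration and are not data from this study. The second group changes less between sessions than the first, yet yields a much lower correlation because the subjects barely differ from one another.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two sessions."""
    return float(np.corrcoef(x, y)[0, 1])

def mean_abs_change(x, y):
    """Mean absolute within-subject change between sessions."""
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(y))))

# Hypothetical lateralization indices, five subjects, two sessions each.
# Group "spread": large intersubject variance, moderate session-to-session change.
spread_s1 = np.array([-0.8, -0.4, 0.0, 0.4, 0.8])
spread_s2 = np.array([-0.7, -0.5, 0.1, 0.3, 0.9])
# Group "narrow": all subjects similar, very small session-to-session change.
narrow_s1 = np.array([0.50, 0.52, 0.48, 0.51, 0.49])
narrow_s2 = np.array([0.51, 0.49, 0.50, 0.48, 0.52])

for label, s1, s2 in [("spread", spread_s1, spread_s2),
                      ("narrow", narrow_s1, narrow_s2)]:
    print(f"{label}: r = {pearson_r(s1, s2):.2f}, "
          f"mean |change| = {mean_abs_change(s1, s2):.3f}")
```

Reporting a descriptive measure such as the mean absolute change alongside r is one way to follow the recommendation above.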
In this study, the reproducibility of medial temporal lobe
activations was studied with quantitative measurements for the first
time. The reliability measures indicated larger intrasubject varia-
bility of activations within the MTL than within the whole brain.
Measures of subjects who left the scanner between sessions were
less consistent than those of subjects who remained inside the scanner. More
information is needed about factors contributing to intrasubject
variability of medial temporal lobe activations. This can be
achieved by varying and optimizing scanning parameters (voxel
size, TE, TR, and number of scans), controlling for technical
variability (scanner instability, shim and preprocessing proce-
dures), and psychophysiological effects (respiration, skin conduc-
tance, EEG). Furthermore, the relationship between cognitive
strategies as well as material and laterality of activations within the
MTL should be further investigated.
Conclusion
The evaluation of test–retest reliability of verbal episodic
memory activation patterns within the medial temporal lobes
compared to the whole brain revealed three major findings: (1) It
could be shown that whole brain analysis resulted in higher
reliability measures compared to separate analysis of the MTL. (2)
Within the whole brain analysis, higher reliability measures were
observed for the subjects who remained inside the scanner
compared to those who performed the two versions within two
separate measurements. (3) The task (encoding or recognition) had
no effect on reproducibility. The voxel-based quantitative measure-
ments showed an increased intrasubject variability within the
medial temporal lobes as compared to the whole brain, which
might be due to susceptibility artifacts. Single-subject voxel-based
evaluations of MTL activations should therefore be interpreted
with caution. Further insight is needed into the factors contributing to
variability in medial temporal lobe activations.
References
Alvarez, P., Squire, L.R., 1994. Memory consolidation and the medial
temporal lobe: a simple network model. Proc. Natl. Acad. Sci. U. S. A.
91, 7041–7045.
Brannen, J.H., Badie, B., Moritz, C.H., Quigley, M., Meyerand, M.E.,
Haughton, V.M., 2001. Reliability of functional MR imaging with
word-generation tasks for mapping Broca’s area. AJNR Am. J.
Neuroradiol. 22, 1711–1718.
Buckner, R.L., Logan, J., Donaldson, D.I., Wheeler, M.E., 2000. Cognitive
neuroscience of episodic memory encoding. Acta Psychol. (Amst) 105,
127–139.
Cabeza, R., Nyberg, L., 2000. Imaging cognition II: an empirical review of
275 PET and fMRI studies. J. Cogn. Neurosci. 12, 1–47.
Casey, B.J., Cohen, J.D., O’Craven, K., Davidson, R.J., Irwin, W., Nelson,
C.A., Noll, D.C., Hu, X., Lowe, M.J., Rosen, B.R., Truwitt, C.L.,
Turski, P.A., 1998. Reproducibility of fMRI results across four
institutions using a spatial working memory task. NeuroImage 8,
249–261.
Daselaar, S.M., Rombouts, S.A., Veltman, D.J., Raaijmakers, J.G., Lazeron,
R.H., Jonker, C., 2001. Parahippocampal activation during successful
recognition of words: a self-paced event-related fMRI study. Neuro-
Image 13, 1113–1120.
Davachi, L., Wagner, A.D., 2002. Hippocampal contributions to episodic
encoding: insights from relational and item-based learning. J. Neuro-
physiol. 88, 982–990.
Dolan, R.J., Fletcher, P.C., 1997. Dissociating prefrontal and hippocampal
function in episodic memory encoding. Nature 388, 582–585.
Fernandez, G., Weyerts, H., Schrader-Bolsche, M., Tendolkar, I., Smid,
H.G., Tempelmann, C., Hinrichs, H., Scheich, H., Elger, C.E.,
Mangun, G.R., Heinze, H.J., 1998. Successful verbal encoding into
episodic memory engages the posterior hippocampus: a parametrically
analyzed functional magnetic resonance imaging study. J. Neurosci.
18, 1841–1847.
Fernandez, G., Specht, K., Weis, S., Tendolkar, I., Reuber, M., Fell, J.,
Klaver, P., Ruhlmann, J., Reul, J., Elger, C.E., 2003. Intrasubject
reproducibility of presurgical language lateralization and mapping using
fMRI. Neurology 60, 969–975.
Fransson, P., Merboldt, K.D., Ingvar, M., Petersson, K.M., Frahm, J.,
2001. Functional MRI with reduced susceptibility artifact: high-
resolution mapping of episodic memory encoding. NeuroReport 12,
1415–1420.
Golby, A.J., Poldrack, R.A., Brewer, J.B., Spencer, D., Desmond, J.E.,
Aron, A.P., Gabrieli, J.D., 2001. Material-specific lateralization in the
medial temporal lobe and prefrontal cortex during memory encoding.
Brain 124, 1841–1854.
Gorno-Tempini, M.L., Hutton, C., Josephs, O., Deichmann, R., Price, C.,
Turner, R., 2002. Echo time dependence of BOLD contrast and
susceptibility artifacts. NeuroImage 15, 136–142.
Halsband, U., Krause, B.J., Sipila, H., Teras, M., Laihinen, A., 2002. PET
studies on the memory processing of word pairs in bilingual Finnish–
English subjects. Behav. Brain Res. 132, 47–57.
Henke, K., Weber, B., Kneifel, S., Wieser, H.G., Buck, A., 1999. Human
hippocampus associates information in memory. Proc. Natl. Acad. Sci.
U. S. A. 96, 5884–5889.
Howseman, A.M., McGonigle, D.J., Grootonk, S., Ramdeen, J., Athwal,
B.S., Turner, R., 1998. Assessment of the variability in fMRI data sets
due to subject positioning and calibration of the MRI scanner.
NeuroImage 7, 599.
Jackson III, O., Schacter, D.L., 2004. Encoding activity in anterior medial
temporal lobe supports subsequent associative recognition. NeuroImage
21, 456–462.
Kelley, W.M., Miezin, F.M., McDermott, K.B., Buckner, R.L., Raichle,
M.E., Cohen, N.J., Ollinger, J.M., Akbudak, E., Conturo, T.E., Snyder,
A.Z., Petersen, S.E., 1998. Hemispheric specialization in human dorsal
frontal cortex and medial temporal lobe for verbal and nonverbal
memory encoding. Neuron 20, 927–936.
Machielsen, W.C., Rombouts, S.A., Barkhof, F., Scheltens, P., Witter, M.P.,
2000. FMRI of visual encoding: reproducibility of activation. Hum.
Brain Mapp. 9, 156–164.
Maitra, R., Roys, S.R., Gullapalli, R.P., 2002. Test–retest reliability
estimation of functional MRI data. Magn. Reson. Med. 48, 62–70.
Maldjian, J.A., Laurienti, P.J., Driskill, L., Burdette, J.H., 2002. Multiple
reproducibility indices for evaluation of cognitive functional MR
imaging paradigms. AJNR Am. J. Neuroradiol. 23, 1030–1037.
Manoach, D.S., Halpern, E.F., Kramer, T.S., Chang, Y., Goff, D.C., Rauch,
S.L., Kennedy, D.N., Gollub, R.L., 2001. Test–retest reliability of a
functional MRI working memory paradigm in normal and schizo-
phrenic subjects. Am. J. Psychiatry 158, 955–958.
Mazziotta, J.C., Toga, A.W., Evans, A., Fox, P., Lancaster, J., 1995. A
probabilistic atlas of the human brain: theory and rationale for its
development. The International Consortium for Brain Mapping
(ICBM). NeuroImage 2, 89–101.
McGonigle, D.J., Howseman, A.M., Athwal, B.S., Friston, K.J., Frack-
owiak, R.S., Holmes, A.P., 2000. Variability in fMRI: an examination of
intersession differences. NeuroImage 11, 708–734.
Miki, A., Raz, J., van Erp, T.G., Liu, C.S., Haselgrove, J.C., Liu,
G.T., 2000. Reproducibility of visual activation in functional MR
imaging and effects of postprocessing. AJNR Am. J. Neuroradiol.
21, 910–915.
Miki, A., Liu, G.T., Englander, S.A., Raz, J., van Erp, T.G.,
Modestino, E.J., Liu, C.J., Haselgrove, J.C., 2001a. Reproducibility
of visual activation during checkerboard stimulation in functional
magnetic resonance imaging at 4 Tesla. Jpn. J. Ophthalmol. 45,
151–155.
Miki, A., Raz, J., Englander, S.A., Butler, N.S., van Erp, T.G., Haselgrove,
J.C., Liu, G.T., 2001b. Reproducibility of visual activation in functional
magnetic resonance imaging at very high field strength (4 Tesla). Jpn. J.
Ophthalmol. 45, 1–4.
Miller, M.B., Van Horn, J.D., Wolford, G.L., Handy, T.C., Valsangkar-
Smyth, M., Inati, S., Grafton, S., Gazzaniga, M.S., 2002.
Extensive individual differences in brain activations associated
with episodic retrieval are reliable over time. J. Cogn. Neurosci.
14, 1200–1214.
Neumann, J., Lohmann, G., Zysset, S., von Cramon, D.Y., 2003. Within-
subject variability of BOLD response dynamics. NeuroImage 19,
784–796.
Noll, D.C., Genovese, C.R., Nystrom, L.E., Vazquez, A.L., Forman, S.D.,
Eddy, W.F., Cohen, J.D., 1997. Estimating test–retest reliability in
functional MR imaging. II: application to motor and cognitive
activation studies. Magn. Reson. Med. 38, 508–517.
Nyberg, L., Cabeza, R., Tulving, E., 1996. PET studies of encoding and
retrieval: the HERA model. Psychonom. Bull. Rev. 3, 135–148.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the
Edinburgh inventory. Neuropsychologia 9, 97–113.
Otten, L.J., Henson, R.N., Rugg, M.D., 2001. Depth of processing effects
on neural correlates of memory encoding: relationship between findings
from across- and within-task comparisons. Brain 124, 399–412.
Rombouts, S.A., Barkhof, F., Hoogenraad, F.G., Sprenger, M., Valk, J.,
Scheltens, P., 1997. Test–retest analysis with functional MR of the
activated area in the human visual cortex. AJNR Am. J. Neuroradiol.
18, 1317–1322.
Rombouts, S.A., Barkhof, F., Hoogenraad, F.G., Sprenger, M., Scheltens,
P., 1998. Within-subject reproducibility of visual activation patterns
with functional magnetic resonance imaging using multislice echo
planar imaging. Magn. Reson. Imaging 16, 105–113.
Rutten, G.J., Ramsey, N.F., van Rijen, P.C., van Veelen, C.W., 2002.
Reproducibility of fMRI-determined language lateralization in individ-
ual subjects. Brain Lang. 80, 421–437.
Schacter, D.L., Wagner, A.D., 1999. Medial temporal lobe activations in
fMRI and PET studies of episodic encoding and retrieval. Hippocampus
9, 7–24.
Scoville, W.B., Milner, B., 1957. Loss of recent memory after
bilateral hippocampal lesions. J. Neurol. Neurosurg. Psychiatry
20, 11–21.
Shergill, S.S., Bullmore, E.T., Brammer, M.J., Williams, S.C., Murray,
R.M., McGuire, P.K., 2001. A functional study of auditory verbal
imagery. Psychol. Med. 31, 241–253.
Strother, S.C., Lange, N., Anderson, J.R., Schaper, K.A., Rehm, K.,
Hansen, L.K., Rottenberg, D.A., 1997. Activation pattern reproduci-
bility: measuring the effects of group size and data analysis models.
Hum. Brain Mapp. 5, 312–316.
Swallow, K.M., Braver, T.S., Snyder, A.Z., Speer, N.K., Zacks, J.M., 2003.
Reliability of functional localization using fMRI. NeuroImage 20,
1561–1577.
Tegeler, C., Strother, S.C., Anderson, J.R., Kim, S.G., 1999. Reproduci-
bility of BOLD-based functional MRI obtained at 4 T. Hum. Brain
Mapp. 7, 267–283.
Wagner, A.D., Poldrack, R.A., Eldridge, L.L., Desmond, J.E., Glover, G.H.,
Gabrieli, J.D., 1998. Material-specific lateralization of prefrontal
activation during episodic encoding and retrieval. NeuroReport 9,
3711–3717.
Wei, X., Yoo, S.S., Dickey, C.C., Zou, K.H., Guttmann, C.R., Panych, L.P.,
2004. Functional MRI of auditory verbal working memory: long-term
reproducibility analysis. NeuroImage 21, 1000–1008.
Yetkin, F.Z., McAuliffe, T.L., Cox, R., Haughton, V.M., 1996. Test–retest
precision of functional MR in sensory and motor task activation. AJNR
Am. J. Neuroradiol. 17, 95–98.