See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/7693328 The reliability of fMRI activations in the medial temporal lobes in a verbal episodic memory task ARTICLE in NEUROIMAGE · NOVEMBER 2005 Impact Factor: 6.36 · DOI: 10.1016/j.neuroimage.2005.06.005 · Source: PubMed CITATIONS 43 DOWNLOADS 128 VIEWS 146 8 AUTHORS, INCLUDING: Kathrin Wagner Universitätsklinikum Freiburg 48 PUBLICATIONS 629 CITATIONS SEE PROFILE Lars Frings Universitätsklinikum Freiburg 44 PUBLICATIONS 469 CITATIONS SEE PROFILE Ralf Schwarzwald Universitätsklinikum Freiburg 30 PUBLICATIONS 412 CITATIONS SEE PROFILE Ulrike Halsband University of Freiburg 71 PUBLICATIONS 2,856 CITATIONS SEE PROFILE Available from: Andreas Schulze-Bonhage Retrieved on: 03 August 2015

The reliability of fMRI activations in the medial temporal lobes in a verbal episodic memory task

May 11, 2023



www.elsevier.com/locate/ynimg

NeuroImage 28 (2005) 122 – 131

The reliability of fMRI activations in the medial temporal lobes in a

verbal episodic memory task

Kathrin Wagner (a,*), Lars Frings (a), Ansgar Quiske (a), Josef Unterrainer (b), Ralf Schwarzwald (c),

Joachim Spreer (c), Ulrike Halsband (b), and Andreas Schulze-Bonhage (a)

(a) Epilepsy Center, University Hospital of Freiburg, Breisacher Str. 64, 79106 Freiburg, Germany

(b) Neuropsychology, Department of Psychology, University of Freiburg, Germany

(c) Department of Neuroradiology, University Hospital of Freiburg, Germany

Received 9 December 2004; revised 24 May 2005; accepted 1 June 2005

Available online 26 July 2005

The test–retest reliability of activation patterns elicited by encoding

and recognition of word-pair associates within the whole brain and a

predefined medial temporal region of interest (ROI) was investigated.

Twenty healthy right-handed subjects were studied within two sessions,

either on the same day or 210–308 days later. Three quantitative

measures of reliability were calculated for the contrasts encoding and

recognition versus a control condition within the ROI and also for the

whole brain: A group correlational analysis between the lateralization

indices of the first and second session, correlations of the individual

SPM(t) maps of the first and the second run, and overlap ratios

between both sessions. For the ROI, correlational analysis of

lateralization indices during both encoding trials was significant.

Eighty percent of the individual positive correlation coefficients of

SPM(t) maps during encoding, and 75% during recognition reached

significance. The mean percentage of overlapping voxels was 18%

during encoding and 19% during recognition. The reproducibility

measures evaluated for the whole brain demonstrated significantly

higher values compared to the ROI. For the group that stayed inside

the scanner, better whole brain test–retest reliability was observed,

and no influence of the memory process (encoding or recognition) on

reproducibility was found.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Reliability; fMRI; Medial temporal lobe; Hippocampus; Verbal

episodic memory; Word-pair associates; Encoding; Recognition

1053-8119/$ - see front matter © 2005 Elsevier Inc. All rights reserved.

doi:10.1016/j.neuroimage.2005.06.005

* Corresponding author. Fax: +49 7612705003.

E-mail address: [email protected] (K. Wagner).

Available online on ScienceDirect (www.sciencedirect.com).

Introduction

It is well known that the medial temporal lobe (MTL) is a

crucial area involved in episodic memory processes (Alvarez and

Squire, 1994). Lesions in this area have been shown to produce

impairments of declarative memory processes (Scoville and

Milner, 1957). Furthermore, numerous functional imaging studies

have demonstrated that the hippocampus and its adjacent cortices

are involved in encoding and retrieving new information (Schacter

and Wagner, 1999). Since functional magnetic resonance imaging

(fMRI) has become a useful tool to gain insights into mnemonic

processes, it has been used to answer questions of basic research,

but also of clinical interest. Most of these studies performed group

analyses, but did not evaluate the reproducibility of a given

individual BOLD signal. In order to interpret individual activation

patterns, e.g., for diagnostic purposes, it is necessary to examine

the validity and reliability of individual results. Some studies have

evaluated the reproducibility of individual activations associated

with motor (Maitra et al., 2002; Tegeler et al., 1999; Yetkin et al.,

1996), visual (Miki et al., 2000, 2001a,b; Rombouts et al., 1997;

Rombouts et al., 1998; Swallow et al., 2003), language (Brannen et

al., 2001; Fernandez et al., 2003; Maldjian et al., 2002; Rutten et

al., 2002), and different higher cognitive tasks (McGonigle et al.,

2000; Neumann et al., 2003). These studies mostly used different

approaches to determine reproducibility, which complicates comparison

between them. Several studies qualitatively assessed the consistency of

suprathreshold activations in predefined brain areas and showed

largely similar results over repeated measurements. For quantitative

analyses, many different measures were evaluated in order to

determine the reliability: e.g., number of activated voxels, overlap

ratio, correlations of activation values or lateralizations, intraclass

correlation coefficient (ICC), intersect maps, and conjunction

analysis. The results depend on the choice of parameter, the way

of data preprocessing, and the selected threshold. It could be shown

that functional MR activations in the visual (e.g., Miki et al., 2000)

and motor cortex (e.g., Yetkin et al., 1996), as well as in frontal

language areas (e.g., Brannen et al., 2001) are satisfactorily

reproducible.

So far, reliability of memory-related BOLD-signal changes has

predominantly been examined for verbal (Manoach et al., 2001;

Noll et al., 1997; Wei et al., 2004) and spatial (Casey et al., 1998)

working memory tasks, predominantly within the frontal lobes. To


our knowledge, only two studies investigated the reproducibility of

fMRI activations associated with episodic memory processes

(Machielsen et al., 2000; Miller et al., 2002) of which only the

former involved medial temporal lobe activations: Miller et al.

(2002) investigated individual differences in activation during

verbal episodic recognition by correlating the volumes of raw

signal intensity values of six healthy subjects and found significant

variations between subjects, but reliable individual activations over

time in the frontal and parietal cortices. Machielsen et al. (2000)

analyzed within- and between-subject reproducibility of activations

during encoding of complex pictures for the whole brain and for

three different regions of interest [ROIs: anterior, posterior, and

middle brain areas, including the (para)hippocampal region]. They

showed that reliability, as measured by number and location of

overlapping suprathreshold voxels, was higher for the ROIs than

for the whole brain, and within the ROIs, the activations in

posterior regions were most reliable. Neither of those studies

specifically examined the reproducibility of medial temporal lobe

activations elicited by a verbal episodic memory paradigm.

The goal of the present study is to investigate the test–retest

reliability of BOLD-signal changes predominantly in the medial

temporal lobes using a verbal episodic memory paradigm. It has

been shown that the hippocampus forms associations to build

memories (Buckner et al., 2000; Henke et al., 1999; Otten et al.,

2001) and that word-pair associates elicit activations predominantly

in left medial temporal areas (Dolan and Fletcher, 1997;

Halsband et al., 2002; Kelley et al., 1998). In order to examine the

reproducibility of medial temporal lobe activations, we investigated

encoding and recognition of word-pair associates with two

parallel test versions. They were applied either within one or across

two measurements, and the activations in the whole brain and the

region of interest including the hippocampus, the parahippocampal

and the fusiform gyrus bilaterally were investigated.

Materials and methods

Subjects

Twenty right-handed (mean handedness quotient = 0.84; Oldfield, 1971)

subjects (16 female, 4 male; mean age = 26 years, SD = 6.1 years) with

no history of neurologic, psychiatric, or vascular disease and with

normal or corrected-to-normal vision participated. Informed

written consent was obtained from each subject after the procedure

had been fully explained. The study was approved by the Ethics

Committee of the University of Freiburg according to the guidelines of

the Declaration of Helsinki.

The subjects performed an explicit verbal memory task twice,

using parallel versions. In eleven subjects, the two versions were

administered within one measurement and the subjects remained

inside the scanner (consecutive or CON group). Nine subjects

performed the two versions within two separate measurements

(separate or SEP group) 210 to 308 days apart (mean intersession interval =

228 days, SD = 32.1 days). Order effects were cancelled out by

pseudo-randomly assigning the subjects to test versions. This

resulted in 13 subjects performing the first test version during the

first session and 7 subjects working on the parallel version during

the first run.

After each session, a 4-point rating scale was applied outside

the scanner in order to evaluate the subjects’ mnemonic strategy. In

this questionnaire, the participants were asked to state whether they

had used a purely verbal (1), a rather verbal (2), a rather pictorial

(3), or a purely pictorial (4) strategy to memorize the word pairs.

Stimuli and task design

Subjects were explicitly instructed to encode and later

recognize 24 concrete and highly imageable word pairs per

session. The two nouns of each pair were neither semantically nor

phonematically related and consisted of a maximum of three syllables.

Concrete nouns were selected from the German version 2.5 of the

CELEX Lexical Database Release 2 (http://www.kun.nl/celex)

which gives information about the written and spoken frequency of

about 6 million words. The overall frequency of the 144 words

used ranged from 1 to 4395, and there was no significant

difference between the frequency of the words used in version 1

and those in version 2.

The sequence of blocks is displayed in Fig. 1. During encoding,

subjects viewed each word pair [e.g., "Pelz + Kreis" ("fur +

circle")] for 7 s (plus 1 s black screen) and were told to memorize

it. Four word pairs constituted an encoding block (32 s), which

alternated with a block of the control condition (24 s). As a control

condition, the subjects were presented with the names of two weekdays

[e.g., "Dienstag + Sonntag" ("Tuesday + Sunday")] for 5 s (plus

1 s black screen) and they had to indicate by button press whether

they were identical or not. In the recognition condition (32 s), the

subjects were given three words for 7 s (plus 1 s black screen), and

they had to indicate by button press which two words had constituted a

pair during encoding (24 decisions). Fifty percent of the distractors were

phonematically, and 50% semantically related to the target (see

Fig. 1). The control condition was identical for encoding and

recognition. The number of left and right button presses was

balanced throughout the whole experiment, which lasted 672 s.

The stimuli were visually projected on a translucent screen at

the end of the scanner table using a data projector outside the

magnet. Subjects saw the word pairs via a mirror that was

positioned above the head coil. A laptop outside the scanner room

running the software "Presentation 0.5" (www.neurobehavioralsystems.com)

was connected to the data projector. Responses were

recorded by use of a button box.

Data acquisition

Magnetic Resonance Imaging (MRI) was performed with a

Magnetom Siemens Vision 1.5-T scanner (Siemens AG, Erlangen,

Germany). For high anatomical resolution a sagittal T1-weighted

3D-MPRAGE sequence was obtained (TR/TE = 9.7/4 ms, flip

angle = 12°, field of view = 256 mm, matrix = 256 × 256, 160

slices, voxel size = 1 × 1 × 1 mm³).

Functional MR images were acquired using Gradient-Echo

Echo-Planar imaging sequences (GE-EPI) sensitive to BOLD

contrast (TR/TE = 4000/64 ms, flip angle = 90°, field of view =

256 mm, matrix = 64 × 64, 30 interleaved slices, voxel size = 4 ×

4 × 3.3 mm³, gap = 0.3 mm). The block design included 173

acquisitions, of which the first 5 images were discarded in order to

eliminate magnetization instability.
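The timing figures given above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the original analysis. The block durations, TR, and acquisition counts are taken from the text; the number of blocks (six encoding and six recognition blocks, each followed by one control block) is an assumption inferred from 24 word pairs and 24 decisions at four items per block.

```python
# Timing figures quoted in the text.
TR_S = 4.0                 # repetition time (TR = 4000 ms)
N_ACQUIRED = 173           # acquisitions per run
N_DISCARDED = 5            # initial images discarded

ENC_BLOCK_S = 4 * (7 + 1)  # four word pairs, 7 s each plus 1 s black screen
REC_BLOCK_S = 32           # recognition block duration as stated
CTRL_BLOCK_S = 24          # control block duration as stated

# Assumption (not stated explicitly in the text): 24 pairs / 4 per block
# gives 6 encoding blocks, 24 decisions / 4 per block gives 6 recognition
# blocks, and every task block is followed by one control block.
N_ENC_BLOCKS = 24 // 4
N_REC_BLOCKS = 24 // 4


def scan_duration_s():
    """Duration covered by the retained functional images."""
    return (N_ACQUIRED - N_DISCARDED) * TR_S


def design_duration_s():
    """Duration of the block design under the assumed block counts."""
    enc = N_ENC_BLOCKS * (ENC_BLOCK_S + CTRL_BLOCK_S)
    rec = N_REC_BLOCKS * (REC_BLOCK_S + CTRL_BLOCK_S)
    return enc + rec


print(scan_duration_s(), design_duration_s())
```

Under these assumptions both values agree with the stated experiment length of 672 s.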

Fig. 1. One of two experimental cycles showing examples for encoding, recognition, and the control condition. Examples: "Pelz + Kreis" ("fur + circle"),

"Dienstag" and "Sonntag" ("Tuesday" and "Sunday"), "Kugel" ("ball").

Image processing and data analysis

Data were analyzed in MATLAB 6.1 (http://www.mathworks.com)

using the statistical parametric mapping software SPM2

(http://www.fil.ion.ucl.ac.uk/spm/). Additional calculations were

accomplished with SPSS 11.0 (http://www.spss.com). Functional

images were converted into Analyze format and unwarped. They

were realigned, normalized to the Montreal Neurological

Institute (MNI) atlas (Mazziotta et al., 1995) using the EPI template

(sinc interpolation), and smoothed with a 9-mm isotropic Gaussian

kernel. The anatomical images were likewise normalized to the

MNI atlas using the T1-weighted template. The time series were

low-pass filtered with the hemodynamic response function (HRF)

and high-pass filtered with a cutoff of 112 s. First, single-subject

analyses were carried out (items modeled as blocks and convolved

with hemodynamic response function) in order to evaluate the

individual contrasts for encoding vs. control condition (encoding or

ENC) and recognition vs. control condition (recognition or REC).

An individual t threshold was calculated for every subject and

contrast in order to adjust for the intersubject variability in general

activation levels: for this purpose, the mean of the upper 5% of t

values of the whole brain was computed and the threshold was set

at 50% of this value (Fernandez et al., 2003).
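The individual threshold described above can be illustrated with a short NumPy sketch. It is not the authors' MATLAB/SPM2 code. The function name `individual_threshold` and its parameters are hypothetical, and it assumes the whole-brain t map is available as a plain array of voxel values.

```python
import numpy as np


def individual_threshold(t_map, top_fraction=0.05, scale=0.5):
    """Return scale * mean of the highest top_fraction of t values.

    t_map is assumed to hold the whole-brain t values of one subject
    and one contrast; non-finite entries (e.g. NaN outside the brain)
    are ignored.
    """
    t = np.asarray(t_map, dtype=float).ravel()
    t = t[np.isfinite(t)]
    n_top = max(1, round(top_fraction * t.size))
    top_values = np.sort(t)[-n_top:]   # the upper 5% of t values
    return scale * top_values.mean()   # 50% of their mean


# Toy example: t values 1..100, upper 5% are 96..100 (mean 98).
print(individual_threshold(np.arange(1, 101)))
```

Because the threshold scales with each subject's strongest activations, subjects with generally low t values are not penalized by a fixed cutoff.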

In order to investigate activations within the medial temporal

lobe, regions of interest (ROI) were defined bilaterally. A mask was

drawn manually on a normalized T1-weighted image that included

the hippocampus proper, the parahippocampal gyrus, the entorhinal

cortex, the subiculum, and the fusiform gyrus. The medial temporal

mask was mirrored onto the other hemisphere. For the whole brain

reliability analyses, only the supratentorial brain was used.

Laterality of activation elicited by the verbal memory paradigm

was assessed by calculating lateralization indices of the SPM(t)

maps in the ROI for every subject. The number of suprathreshold

voxels in the right (R) and the left (L) medial temporal ROI was

computed and weighted with their t values in order to take the

intensity of activation into account. Hemispheric dominance has

been quantified using a laterality index (LI) defined by the formula

(Fernandez et al., 2003):

$$\mathrm{LI} = \frac{\sum_{V} X_R - \sum_{V} X_L}{\sum_{V} X_R + \sum_{V} X_L}$$

where $V$ = set of activated voxels within the medial temporal

ROI, $X_L$ = t value of left hemispheric voxels, and $X_R$ = t value

of right hemispheric voxels. A negative LI indicates left

lateralized activations and a positive LI shows right lateralized

activations.
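As an illustration of the formula, the following sketch computes a t-weighted laterality index from the t values of the left and right ROI. The function name `laterality_index` and the array-based inputs are assumptions made for this example; the original computation was carried out on SPM(t) maps.

```python
import numpy as np


def laterality_index(t_left, t_right, threshold):
    """t-weighted LI: (sum X_R - sum X_L) / (sum X_R + sum X_L).

    Only voxels above the individual threshold contribute. Returns NaN
    when no voxel in either hemisphere exceeds the threshold.
    """
    t_left = np.asarray(t_left, dtype=float)
    t_right = np.asarray(t_right, dtype=float)
    sum_left = t_left[t_left > threshold].sum()
    sum_right = t_right[t_right > threshold].sum()
    total = sum_right + sum_left
    if total == 0:
        return float("nan")
    return (sum_right - sum_left) / total


# Stronger suprathreshold activation on the left gives a negative LI.
print(laterality_index([2.0, 3.0, 0.5], [1.5, 0.2], threshold=1.0))
```

The index is bounded between -1 (only left hemispheric suprathreshold voxels) and +1 (only right hemispheric ones).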

Test–retest reliability measures

Test–retest reliability was evaluated by calculating three

quantitative variables for the whole brain and the ROI:

I In order to assess the reproducibility of laterality, lateralization

indices were determined for the contrasts encoding

(ENC) and recognition (REC) using all activated voxels

above their respective individual thresholds. Linear correlations

were calculated between the lateralization indices of

the first and the second session.

II A spatially more precise measure of within-subject reliability

is the voxel-wise correlational analysis between t

values of the first and the second investigation of ENC and

REC (Strother et al., 1997). Pearson's correlation coefficients

were calculated only for voxels that either exceeded

the positive individual threshold or fell below the

negative individual threshold, to reduce the noise of nonsignificant

voxels with t values around zero.

III In order to determine the relative amount of overlapping

volume between two activation maps, the overlap ratio

$R^{ij}_{\mathrm{overlap}}$ introduced by Rombouts et al. (1997) was calculated:

$$R^{ij}_{\mathrm{overlap}} = \frac{2 \times V_{\mathrm{overlap}}}{V_i + V_j}$$

where $V_i$ = number of suprathreshold voxels within SPM(t)

maps in session i, $V_j$ = number of suprathreshold voxels

within SPM(t) maps in session j, and $V_{\mathrm{overlap}}$ = the number

of suprathreshold voxels in both maps. Unlike the correlational

analysis of t values (see II), the overlap ratio is based

on the location of significantly activated voxels and does not

include the actual t values of these voxels in the calculation.

The overlap ratio can range from 0 to 100% of overlapping

volume.
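Measures II and III can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions, not the original code: the function names are hypothetical, both sessions are assumed to be available as voxel-aligned arrays of t values, and for measure II a voxel is included if its absolute t value exceeds the individual threshold in either session (the text does not specify how the two sessions were combined for voxel selection).

```python
import numpy as np


def thresholded_correlation(t1, t2, thr1, thr2):
    """Measure II: Pearson r over voxels beyond +/- threshold.

    Assumption: a voxel is kept if |t| exceeds the individual threshold
    in at least one of the two sessions.
    """
    t1 = np.asarray(t1, dtype=float).ravel()
    t2 = np.asarray(t2, dtype=float).ravel()
    mask = (np.abs(t1) > thr1) | (np.abs(t2) > thr2)
    if mask.sum() < 2:
        return float("nan")
    return float(np.corrcoef(t1[mask], t2[mask])[0, 1])


def overlap_ratio_percent(t1, t2, thr1, thr2):
    """Measure III: 2 * V_overlap / (V_i + V_j), expressed in percent."""
    active1 = np.asarray(t1, dtype=float).ravel() > thr1
    active2 = np.asarray(t2, dtype=float).ravel() > thr2
    v_i, v_j = active1.sum(), active2.sum()
    if v_i + v_j == 0:
        return float("nan")
    v_overlap = np.logical_and(active1, active2).sum()
    return 100.0 * 2.0 * v_overlap / (v_i + v_j)


# Toy maps: two suprathreshold voxels per session, one of them shared.
session1 = np.array([3.0, 3.0, 0.0, 0.0])
session2 = np.array([3.0, 0.0, 3.0, 0.0])
print(overlap_ratio_percent(session1, session2, 1.0, 1.0))
```

The overlap ratio only asks whether the same voxels are suprathreshold in both sessions, whereas the correlation also reflects how similar their t values are.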


In order to visualize the area of overlapping volume for the

group, a Random Effects (RFX) Analysis was carried out by

calculating a one-way ANOVA with 4 groups (encoding session 1,

recognition session 1, encoding session 2, recognition session 2),

with non-sphericity correction, replications over subjects, and with

correlated repeated measures. After the contrasts for encoding and

recognition for each session were defined, an inclusive masking of

the first and the second measurement of encoding as well as of

recognition was accomplished. Results are displayed on a

normalized T1-weighted image of one of the subjects as well as

on a glass brain.

Additionally, a multivariate analysis of variance for repeated

measures (SPSS 11.0) was accomplished for the individual

reliability measures II and III to test for main effects of the TASK

(encoding vs. recognition), brain REGION (MTL vs. whole brain),

and GROUP (CON vs. SEP group). In order to get more

information about significant interactions between factors, additional

post hoc t tests (for Paired or Independent Samples) were

performed. To identify differences in laterality, a multivariate

analysis of variance for repeated measures was calculated with all

lateralization indices for within-subject factors TASK (encoding

vs. recognition), REGION (MTL vs. whole brain), and TIME

(session 1 vs. session 2).

Fig. 2. (a and b) Individual lateralization indices of encoding (a, above) and

of recognition (b, below) during both sessions. Plain columns show cases of

the CON group, striated columns characterize separate measurements (SEP

group).

Results

Behavioral data

The mean percentage of correctly recognized word pairs was

97.50% (SD = 5.13) in the first session and 98.12% (SD = 3.44) in

the second session. There was no significant difference between

performances in both runs (Wilcoxon Test). Altogether, 32.5% of

the subjects tried to memorize the word pairs using a purely verbal

strategy (1), 22.5% stated that their strategy was rather verbal than

pictorial (2), 25% rated their mnemonic strategy as rather pictorial

than verbal (3), and the remaining 20% reported that they had used a

purely pictorial strategy (4). Two subjects indicated that they had

changed their strategy between both sessions: One changed from a

rather pictorial than verbal (3) to a rather verbal than pictorial (2)

strategy. The other subject shifted from a rather pictorial than

verbal (3) to a purely verbal strategy (1). There was no significant

difference between the chosen strategies in session 1 and session 2

(Wilcoxon Test). No significant linear correlations were found

between lateralization indices and strategies (Pearson).

Lateralization of activations

For the MTL, the mean lateralization index of suprathreshold

voxels associated with encoding was −0.06 (SD = 0.48) during the

first session and −0.28 (SD = 0.49) during the second session.

Recognition showed a mean lateralization index of −0.11

(SD = 0.36) in the first session and −0.22 (SD = 0.28)

during the second session. Laterality of activations remained

unchanged between both sessions in 13 subjects (65%) for

encoding and in 14 subjects (70%) for recognition. The individual

lateralization indices for encoding and recognition within the ROI

for both measurements are displayed in Figs. 2a and b (average

individual threshold: t = 1.4, SD = 0.3).

The mean lateralization for the whole brain was −0.33 (SD =

0.33) and −0.37 (SD = 0.30) for encoding. For recognition, the

mean lateralization index was −0.06 (SD = 0.18) in the first

session and −0.23 (SD = 0.15) in the second session, respectively.

Analysis of variance showed a main effect of the factor TIME

(session 1 vs. session 2; P < 0.05). Additionally, an interaction

between REGION and TASK (P < 0.05) was found demonstrating

that during encoding activation patterns within the whole brain

were more left lateralized than during recognition as well as in

comparison to the ROI analysis of encoding processes.

Test–retest reliability

I The linear correlations (Pearson) of lateralization indices of MTL activations reached significance for the test–retest comparison of encoding (r = 0.41, P = 0.05, see Fig. 3a), but not for recognition (r = −0.24, n.s., see Fig. 3b). Separate evaluations of subjects who remained inside the scanner throughout both sessions (consecutive or CON group) and those who were measured in two separate sessions (separate or SEP group) showed a trend toward a significant relationship between the lateralization indices for encoding within the MTL, and no significant correlation for recognition (correlation coefficients and corresponding significance levels are displayed in Table 1). The correlational whole brain analyses of the lateralization indices showed significant positive relationships between both sessions of encoding as well as recognition for all subjects (ENC: r = 0.82, P < 0.01; REC: r = 0.59, P < 0.01) and for the CON and SEP groups calculated separately (see Table 1).

Fig. 3. (a and b) Scatterplots of individual lateralization indices from both sessions of encoding (a, above, r = 0.41, P < 0.05) and recognition (b, below, r = −0.24, n.s.).

Table 1
Reliability measures I, II, and III for the medial temporal lobe and the whole brain during encoding and recognition for all subjects as well as for CON and SEP subgroups

Region                Task  Group  I: r (significance)  II: mean r (SD)  III: mean overlap, % (SD)
Medial temporal lobe  ENC   ALL     0.41 (P < 0.05)     0.22 (0.19)      17.60 (15.64)
Medial temporal lobe  ENC   CON     0.26 (n.s.)         0.27 (0.20)      17.55 (16.78)
Medial temporal lobe  ENC   SEP     0.52 (P = 0.07)     0.17 (0.15)      17.67 (15.13)
Medial temporal lobe  REC   ALL    −0.24 (n.s.)         0.20 (0.27)      19.15 (15.13)
Medial temporal lobe  REC   CON    −0.33 (n.s.)         0.21 (0.31)      18.0 (16.08)
Medial temporal lobe  REC   SEP    −0.04 (n.s.)         0.18 (0.24)      20.56 (14.72)
Whole brain           ENC   ALL     0.82 (P < 0.01)     0.50 (0.19)      36.15 (15.74)
Whole brain           ENC   CON     0.80 (P < 0.01)     0.59 (0.18)      43.91 (16.18)
Whole brain           ENC   SEP     0.85 (P < 0.01)     0.39 (0.16)      26.67 (8.79)
Whole brain           REC   ALL     0.59 (P < 0.01)     0.51 (0.19)      41.95 (13.26)
Whole brain           REC   CON     0.72 (P < 0.01)     0.58 (0.18)      47.82 (12.35)
Whole brain           REC   SEP     0.74 (P < 0.05)     0.44 (0.18)      34.78 (11.02)

I = correlation coefficients of lateralization indices (significance level).
II = mean correlation coefficients of SPM(t) maps (standard deviation).
III = mean overlap ratios (standard deviation).

II Correlational analyses within the ROI between t values of the first and the second investigation in single subjects showed that 16 (80%; ENC) and 15 (75%; REC) positive correlations reached significance (P < 0.01). The mean correlation coefficient of encoding was 0.22 (SD = 0.19) ranging from −0.09 to 0.65. For recognition, the mean correlation coefficient was 0.20 (SD = 0.27) ranging from −0.44 to 0.54. Within the CON group, 10 subjects (91%) showed significant positive correlations for the SPM(t) maps of encoding, and 8 (73%) of recognition in the MTL. In the SEP group, 6 (67%; ENC) and 7 (78%; REC) subjects exhibited significant positive correlations. Mean correlation coefficients and standard deviations for the CON and SEP group evaluated for the MTL and the whole brain activation patterns are displayed in Table 1. In the whole brain analyses of all subjects, the mean correlation coefficient was 0.50 (SD = 0.19) for encoding and 0.51 (SD = 0.19) for recognition. Each calculated correlation was positive and reached significance (P < 0.01).

III The mean percentage of medial temporal overlapping volume considering the individual threshold was 17.60% (SD = 15.64) for encoding and 19.15% (SD = 15.13) for recognition of the word pairs. Mean overlap ratios (and standard deviations) for all subjects as well as for the CON and the SEP subgroup are presented in Table 1. The mean percentage of overlapping whole brain volume was 36.15% (SD = 15.74; encoding) and 41.95% (SD = 13.26; recognition).

Evaluation of medial temporal intersect maps in the group RFX analysis (ANOVA: P < 0.05, uncorrected) showed that encoding of word pairs elicited activations in the left hippocampus and the left fusiform gyrus during both measurements (see Fig. 4a).

During both recognition phases, the parahippocampal gyrus and the fusiform gyrus bilaterally showed a larger BOLD response than during the control condition (see Fig. 4b). Activated clusters within the medial temporal lobe during both sessions are listed in Table 2 for encoding and Table 3 for recognition including cluster size, peak voxel coordinates, and the corresponding t values. After applying a correction for multiple comparisons (P < 0.05), no suprathreshold voxels remained. Intersecting the whole brain encoding-related activation patterns primarily resulted in activations of the left inferior frontal gyrus (including Broca's area, especially BA45). Bilateral frontal (left > right), parietal, and occipital areas were activated during both sessions of recognition (FWE-corrected, P < 0.05).

Fig. 4. (a and b) Suprathreshold voxels within the MTL displayed on a coregistered T1-weighted image and on a glass brain for encoding (a, above) and recognition (b, below). (Random Effects Analysis, one-way ANOVA, inclusive masking of session 1 and 2, uncorrected threshold, P < 0.05, left = left.)

A multivariate analysis of variance for repeated measures with the between-subject factor GROUP (CON vs. SEP) and the within-subject factors TASK (ENC vs. REC) and REGION (MTL vs. whole brain) for the individual reliability measures II and III revealed a main effect of REGION (whole brain > MTL ROI, P < 0.001), a main effect of GROUP (univariate test for measure II: CON > SEP, P < 0.05), and a significant interaction between REGION and GROUP (univariate test for measure III: P < 0.05). In order to get more information about the significant interaction between the factors REGION and GROUP, additional post hoc t tests were performed. These showed that the CON group demonstrated significantly more overlapping volume within the whole brain analysis than the SEP group (Independent-Samples t test, P < 0.05). Significantly higher overlap ratios for the whole brain compared to the MTL were found (Paired-Samples t test, P < 0.05).

Table 2
Activated clusters (>5 voxels) within the MTL during session 1 and 2 of encoding with cluster size, peak voxel coordinates, and corresponding t values (Random Effects Analysis, ANOVA, uncorrected threshold, P < 0.05)

Session    Region                Cluster  t value    x    y    z
Session 1  Left fusiform gyrus       149     3.16  −39   −9  −36
Session 1  Right fusiform gyrus       24     2.07   36  −12  −30
Session 1  Right hippocampus           9     2.14   30  −33  −12
Session 2  Left hippocampus           20     2.38  −30  −15  −15
Session 2  Left fusiform gyrus         7     2.66  −36  −30  −27

Table 3
Activated clusters (>5 voxels) within the MTL during session 1 and 2 of recognition with cluster size, peak voxel coordinates, and corresponding t values (Random Effects Analysis, ANOVA, uncorrected threshold, P < 0.05)

Session    Region                                   Cluster  t value    x    y    z
Session 1  Left parahippocampal gyrus/hippocampus       287     3.62  −15  −33   −3
Session 1  Right parahippocampal gyrus/hippocampus      134     3.62    9  −21  −12
Session 1  Left fusiform gyrus                           12     3.67  −39  −45  −24
Session 2  Left hippocampus                             276     4.81  −21  −36    0
Session 2  Right fusiform gyrus                         110     4.28   39  −39  −24

Discussion

In our study, we evaluated the test–retest reliability of medial

temporal lobe and whole brain activations using verbal episodic

encoding and recognition of subjects who remained inside the

scanner between sessions (CON group) and subjects who were

measured again several months later (SEP group). The study revealed three

major findings: (1) It could be shown that the whole brain analysis

produced higher reliability measures compared to separate analysis

of the MTL. (2) Within the whole brain analysis, higher reliability

measures were observed for the CON group compared to the SEP

group. (3) The task (encoding or recognition) had no effect on

reproducibility. Qualitative group analysis of the ROI could

demonstrate intersecting activations in the left hippocampus and

the left fusiform gyrus during encoding and in the parahippocam-

pal gyrus and the fusiform gyrus bilaterally during recognition.

These activation sites are in line with previous functional imaging

studies of episodic memory processes (Daselaar et al., 2001;

Davachi and Wagner, 2002; Fernandez et al., 1998; Golby et al.,

2001; Halsband et al., 2002; Jackson and Schacter, 2004; Nyberg

et al., 1996). Additionally, encoding as well as recognition

processes reliably activated left frontal language areas, which is

due to the verbal nature of the task.

(1) Within quantitative analyses, the evaluation of reproducibility

revealed higher values in every single reliability

measure for whole brain (predominantly left frontal) than

medial temporal activation patterns. These robust left frontal

activations might be attributed to language processing, such as

rehearsal or inner speech (e.g., Shergill et al., 2001). These

activation sites are commonly seen in verbal memory tasks

(for a review, see Cabeza and Nyberg, 2000; Wagner et al.,

1998). In particular, individual quantitative reliability measures

for the MTL that relied on exactly corresponding voxels

[correlations of SPM(t) maps, overlap ratios] showed lower

agreement between the two sessions than was reported in

studies of other anatomical areas (Fernandez et al., 2003;

Machielsen et al., 2000). The study by Fernandez et al.

(2003) addressed the test–retest reliability of language-related

activations by administering a semantic decision task and

used similar reproducibility measures, which yielded

comparable results. They reported mean overlap ratios of

46.03% at the chosen threshold (uncorrected, P < 0.05) for

the whole brain. For our data, an analogous evaluation of

reliability measures for whole brain activations

showed similar overlap ratios of 36.2% for encoding and

42.0% for recognition at the individual t threshold.

Linear correlations between SPM(t) maps of both measure-

ments yielded a mean correlation coefficient of 0.7 in the

study by Fernandez et al. (2003) and 0.5 (for ENC as well as

REC) for the whole brain analysis of our data. Higher

reliability measures of frontal lobe areas indicate that

reproducibility of activations depends on the brain area.

Furthermore, the study by Machielsen et al. (2000) supports

these findings using a mnemonic task. They examined the

reproducibility of episodic memory-related fMRI activa-

tions, investigating encoding of complex visual stimuli

(photographs of outdoor scenes) across 3 sessions (session 1

and 2 during one measurement and session 3 after 3–24

days). As described earlier, they divided the whole brain

into 3 different ROIs: an anterior region, a posterior region,

and a middle region covering the rest of the brain in between

these 2 areas including the (para)hippocampal region.

Overlap ratios were also calculated as reproducibility

measures. Mean overlap ratios between session 1 and 2

constituted 42.8% (SD = 30.8) for the anterior region,

62.0% (SD = 11.5) for the posterior area, and 50.4% (SD =

27.0) for the middle ROI. Mean overlap ratios between

session 1 and 3 showed 21.1% (SD = 22.8) overlapping

voxels for the anterior region, 51.4% (SD = 18.1) for the

posterior area, and 39.6% (SD = 25.0) for the middle ROI.

This also implies an influence of the brain region on

reproducibility of activation patterns associated with picture

encoding in favor of posterior areas. Because the middle ROI

was large and included the medial temporal area, it remains

unclear how much the (para)hippocampal area contributed

to this overlapping volume.

One reason for lower reproducibility within the MTL is the

anatomy. The anatomy of the medial temporal lobe is known

to bear difficulties in functional MRI: because of its

proximity to bones and air sinuses, it is susceptible to


magnetic artifacts, which often result in image distortions or

deletions. The evaluation of reliability on a voxel-by-voxel

basis (correlation of t maps and overlap ratio) might be

affected. Additionally, the voxel size of 4 × 4 × 3.3 mm³

might have led to partial volume effects, especially in the

hippocampus. Decreasing the voxel size might increase the

voxel-by-voxel reproducibility. It has been demonstrated

that voxel sizes of 2 × 2 × 1 mm³ reduce susceptibility

artifacts within the hippocampal formation (Fransson et al.,

2001). Furthermore, a shorter echo time (TE) has been shown to

produce an increased signal-to-noise ratio and less signal

loss, even though a shorter TE is not sufficient to recover the

BOLD signal from regions affected by susceptibility

artifacts (Gorno-Tempini et al., 2002). Additionally, enhanc-

ing the power of the study by performing more scans per

condition should increase the signal-to-noise ratio, and

therefore probably improve reproducibility within the MTL.

(2) It is known that various additional factors that are

independent of the subject can influence location and size

of activations, and therefore increase variability within and

between sessions: variations in the shim procedure, spatial

filter size, or the stability of the scanner over time

(Howseman et al., 1998; Rombouts et al., 1998). It is also

possible that spatial preprocessing has an effect on

intersession variance. Additionally, repositioning errors

and miscoregistration of EPI images may account for

variations in activation sites.

Several psychophysiological effects within the subject, such as

changes in arousal, attention, fatigue, task acquaintance, and

habituation, might also have contributed to an increased

variability and therefore a lower reproducibility. Good

performance throughout all sessions indicates that all

subjects attended to the task. The contribution of other

factors cannot be determined exactly, but it seems probable

that they vary more between separate measurements than within

a single measurement. As expected, the comparison of reliability

variables of subjects who remained inside the scanner for

both test versions (CON group) and of those who repeated

the procedure after an average of 228 days (SEP group)

displayed higher values for the CON group. This underlines

the influence of numerous factors, like physiological aspects

within the subjects and also technical artifacts, on the

consistency of activations throughout the brain.

(3) It was shown that the task had no significant influence on

test–retest reliability. The only difference between ENC and

REC was seen for lateralization indices of the whole brain:

Encoding processes produced more left-lateralized activation

patterns than recognition. This might indicate that during

recognition phases, subjects retrieved verbal as well as

visuospatial associations, which is expressed by less left-

lateralized activation patterns.
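
The voxel-based measures compared in points (1) to (3) can be illustrated with a minimal Python sketch. It assumes the Dice-type overlap ratio that is common in fMRI reproducibility studies, 2 × V_overlap / (V1 + V2), and a lateralization index of the form (L − R) / (L + R). The definitions given in the Methods take precedence if they differ. The function names, the toy t values, and the threshold are hypothetical and serve only as an example.

```python
import numpy as np


def overlap_ratio(t_map_1, t_map_2, t_threshold):
    """Assumed definition: 2 * V_overlap / (V1 + V2) on thresholded t maps."""
    active_1 = np.asarray(t_map_1) > t_threshold
    active_2 = np.asarray(t_map_2) > t_threshold
    v1 = active_1.sum()
    v2 = active_2.sum()
    if v1 + v2 == 0:
        # No suprathreshold voxel in either session: the ratio is undefined.
        return float("nan")
    v_overlap = np.logical_and(active_1, active_2).sum()
    return 2.0 * v_overlap / (v1 + v2)


def lateralization_index(n_left, n_right):
    """Assumed definition: (L - R) / (L + R); positive values mean left-lateralized."""
    total = n_left + n_right
    if total == 0:
        return float("nan")
    return (n_left - n_right) / total


# Toy example: four voxels, two sessions, hypothetical threshold t > 2.0.
session_1 = np.array([3.0, 3.0, 0.0, 0.0])
session_2 = np.array([3.0, 0.0, 3.0, 0.0])
print(overlap_ratio(session_1, session_2, t_threshold=2.0))
print(lateralization_index(n_left=30, n_right=10))
```

Because the overlap ratio counts only voxels that exceed the threshold at exactly the same position in both sessions, small spatial shifts of an otherwise similar activation reduce it directly.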

It should be pointed out that the use of correlational analyses for

evaluating reliability has some restrictions. In order to identify

significant correlations, it is necessary to have larger intersubject

than intrasubject variance. A further restriction of Pearson’s

product-moment correlation is that it can only detect linear

relationships and requires variance in the data: if there were neither

inter- nor intrasubject variance, meaning perfect within-subject

reliability, the correlational analysis would fail to detect it. We

assumed that this is a rare case in

empirical data, and we would rather expect greater variance

between than within subjects. Nevertheless, the scatterplot should

always be inspected to explore inter- and intrasubject variance. In

our study, analysis of lateralization indices revealed no significant

correlation during recognition, although laterality of activations

showed less inter- and intrasubject variance than for encoding (see

Figs. 3a and b). Therefore, the reproducibility of lateralization for

recognition is underestimated by linear correlational analysis.

Additionally, descriptive data (standard deviation of lateralization

indices, percentage of laterality changes between sessions) should

be considered to corroborate the interpretation of correlational

analyses.
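
The restriction of correlational analysis described above can be made concrete with a small numerical sketch. The lateralization indices below are invented for illustration and are not data from this study, and `pearson_r` is a hypothetical helper written out by hand so that the zero-variance case is explicit. With the same session-to-session change of 0.05 per subject, a sample with wide intersubject spread yields a high correlation, a sample with narrow spread yields a low one, and a sample without any variance yields an undefined coefficient.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product-moment correlation; NaN if either variable has no variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    if denom == 0.0:
        # Zero variance in x or y: the coefficient is undefined.
        return float("nan")
    return float((dx * dy).sum() / denom)


# Hypothetical lateralization indices of four subjects in two sessions.
# In the first two scenarios every subject changes by 0.05 between sessions.
wide_s1 = np.array([-0.80, -0.20, 0.30, 0.90])    # large intersubject variance
wide_s2 = np.array([-0.75, -0.25, 0.35, 0.85])
narrow_s1 = np.array([0.58, 0.60, 0.62, 0.60])    # small intersubject variance
narrow_s2 = np.array([0.63, 0.55, 0.67, 0.55])
const_s1 = np.array([0.5, 0.5, 0.5, 0.5])         # no variance at all
const_s2 = np.array([0.5, 0.5, 0.5, 0.5])

r_wide = pearson_r(wide_s1, wide_s2)
r_narrow = pearson_r(narrow_s1, narrow_s2)
r_const = pearson_r(const_s1, const_s2)
print(r_wide, r_narrow, r_const)
```

In the narrow and constant scenarios the subjects are at least as consistent as in the wide scenario, yet the coefficient is low or undefined, which is why descriptive measures should accompany the correlation.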

In this study, the reproducibility of medial temporal lobe

activations was studied with quantitative measurements for the first

time. The reliability measures indicated larger intrasubject varia-

bility of activations within the MTL than within the whole brain.

Measures of subjects who left the scanner between sessions were

less consistent than those of subjects who remained inside the scanner. More

information is needed about factors contributing to intrasubject

variability of medial temporal lobe activations. This can be

achieved by varying and optimizing scanning parameters (voxel

size, TE, TR, and number of scans), controlling for technical

variability (scanner instability, shim and preprocessing proce-

dures), and psychophysiological effects (respiration, skin conduc-

tance, EEG). Furthermore, the relationship between cognitive

strategies as well as material and laterality of activations within the

MTL should be further investigated.

Conclusion

The evaluation of test–retest reliability of verbal episodic

memory activation patterns within the medial temporal lobes

compared to the whole brain revealed three major findings: (1) It

could be shown that whole brain analysis resulted in higher

reliability measures compared to separate analysis of the MTL. (2)

Within the whole brain analysis, higher reliability measures were

observed for the subjects who remained inside the scanner

compared to those who performed the two versions within two

separate measurements. (3) The task (encoding or recognition) had

no effect on reproducibility. The voxel-based quantitative measure-

ments showed an increased intrasubject variability within the

medial temporal lobes as compared to the whole brain, which

might be due to susceptibility artifacts. Single-subject voxel-based

evaluations of MTL activations should therefore be interpreted

carefully. Further insight is needed into the factors contributing to

variability in medial temporal lobe activations.

References

Alvarez, P., Squire, L.R., 1994. Memory consolidation and the medial

temporal lobe: a simple network model. Proc. Natl. Acad. Sci. U. S. A.

91, 7041–7045.

Brannen, J.H., Badie, B., Moritz, C.H., Quigley, M., Meyerand, M.E.,

Haughton, V.M., 2001. Reliability of functional MR imaging with

word-generation tasks for mapping Broca’s area. AJNR Am. J.

Neuroradiol. 22, 1711–1718.

Buckner, R.L., Logan, J., Donaldson, D.I., Wheeler, M.E., 2000. Cognitive

neuroscience of episodic memory encoding. Acta Psychol. (Amst) 105,

127–139.

Cabeza, R., Nyberg, L., 2000. Imaging cognition II: an empirical review of

275 PET and fMRI studies. J. Cogn. Neurosci. 12, 1–47.


Casey, B.J., Cohen, J.D., O’Craven, K., Davidson, R.J., Irwin, W., Nelson,

C.A., Noll, D.C., Hu, X., Lowe, M.J., Rosen, B.R., Truwitt, C.L.,

Turski, P.A., 1998. Reproducibility of fMRI results across four

institutions using a spatial working memory task. NeuroImage 8,

249–261.

Daselaar, S.M., Rombouts, S.A., Veltman, D.J., Raaijmakers, J.G., Lazeron,

R.H., Jonker, C., 2001. Parahippocampal activation during successful

recognition of words: a self-paced event-related fMRI study. Neuro-

Image 13, 1113–1120.

Davachi, L., Wagner, A.D., 2002. Hippocampal contributions to episodic

encoding: insights from relational and item-based learning. J. Neuro-

physiol. 88, 982–990.

Dolan, R.J., Fletcher, P.C., 1997. Dissociating prefrontal and hippocampal

function in episodic memory encoding. Nature 388, 582–585.

Fernandez, G., Weyerts, H., Schrader-Bolsche, M., Tendolkar, I., Smid,

H.G., Tempelmann, C., Hinrichs, H., Scheich, H., Elger, C.E.,

Mangun, G.R., Heinze, H.J., 1998. Successful verbal encoding into

episodic memory engages the posterior hippocampus: a parametrically

analyzed functional magnetic resonance imaging study. J. Neurosci.

18, 1841–1847.

Fernandez, G., Specht, K., Weis, S., Tendolkar, I., Reuber, M., Fell, J.,

Klaver, P., Ruhlmann, J., Reul, J., Elger, C.E., 2003. Intrasubject

reproducibility of presurgical language lateralization and mapping using

fMRI. Neurology 60, 969–975.

Fransson, P., Merboldt, K.D., Ingvar, M., Petersson, K.M., Frahm, J.,

2001. Functional MRI with reduced susceptibility artifact: high-

resolution mapping of episodic memory encoding. NeuroReport 12,

1415–1420.

Golby, A.J., Poldrack, R.A., Brewer, J.B., Spencer, D., Desmond, J.E.,

Aron, A.P., Gabrieli, J.D., 2001. Material-specific lateralization in the

medial temporal lobe and prefrontal cortex during memory encoding.

Brain 124, 1841–1854.

Gorno-Tempini, M.L., Hutton, C., Josephs, O., Deichmann, R., Price, C.,

Turner, R., 2002. Echo time dependence of BOLD contrast and

susceptibility artifacts. NeuroImage 15, 136–142.

Halsband, U., Krause, B.J., Sipila, H., Teras, M., Laihinen, A., 2002. PET

studies on the memory processing of word pairs in bilingual Finnish–

English subjects. Behav. Brain Res. 132, 47–57.

Henke, K., Weber, B., Kneifel, S., Wieser, H.G., Buck, A., 1999. Human

hippocampus associates information in memory. Proc. Natl. Acad. Sci.

U. S. A. 96, 5884–5889.

Howseman, A.M., McGonigle, D.J., Grootoonk, S., Ramdeen, J., Athwal,

B.S., Turner, R., 1998. Assessment of the variability in fMRI data sets

due to subject positioning and calibration of the MRI scanner.

NeuroImage 7, 599.

Jackson III, O., Schacter, D.L., 2004. Encoding activity in anterior medial

temporal lobe supports subsequent associative recognition. NeuroImage

21, 456–462.

Kelley, W.M., Miezin, F.M., McDermott, K.B., Buckner, R.L., Raichle,

M.E., Cohen, N.J., Ollinger, J.M., Akbudak, E., Conturo, T.E., Snyder,

A.Z., Petersen, S.E., 1998. Hemispheric specialization in human dorsal

frontal cortex and medial temporal lobe for verbal and nonverbal

memory encoding. Neuron 20, 927–936.

Machielsen, W.C., Rombouts, S.A., Barkhof, F., Scheltens, P., Witter, M.P.,

2000. FMRI of visual encoding: reproducibility of activation. Hum.

Brain Mapp. 9, 156–164.

Maitra, R., Roys, S.R., Gullapalli, R.P., 2002. Test–retest reliability

estimation of functional MRI data. Magn. Reson. Med. 48, 62–70.

Maldjian, J.A., Laurienti, P.J., Driskill, L., Burdette, J.H., 2002. Multiple

reproducibility indices for evaluation of cognitive functional MR

imaging paradigms. AJNR Am. J. Neuroradiol. 23, 1030–1037.

Manoach, D.S., Halpern, E.F., Kramer, T.S., Chang, Y., Goff, D.C., Rauch,

S.L., Kennedy, D.N., Gollub, R.L., 2001. Test–retest reliability of a

functional MRI working memory paradigm in normal and schizo-

phrenic subjects. Am. J. Psychiatry 158, 955–958.

Mazziotta, J.C., Toga, A.W., Evans, A., Fox, P., Lancaster, J., 1995. A

probabilistic atlas of the human brain: theory and rationale for its

development. The International Consortium for Brain Mapping

(ICBM). NeuroImage 2, 89–101.

McGonigle, D.J., Howseman, A.M., Athwal, B.S., Friston, K.J., Frack-

owiak, R.S., Holmes, A.P., 2000. Variability in fMRI: an examination of

intersession differences. NeuroImage 11, 708–734.

Miki, A., Raz, J., van Erp, T.G., Liu, C.S., Haselgrove, J.C., Liu,

G.T., 2000. Reproducibility of visual activation in functional MR

imaging and effects of postprocessing. AJNR Am. J. Neuroradiol.

21, 910–915.

Miki, A., Liu, G.T., Englander, S.A., Raz, J., van Erp, T.G.,

Modestino, E.J., Liu, C.J., Haselgrove, J.C., 2001a. Reproducibility

of visual activation during checkerboard stimulation in functional

magnetic resonance imaging at 4 Tesla. Jpn. J. Ophthalmol. 45,

151–155.

Miki, A., Raz, J., Englander, S.A., Butler, N.S., van Erp, T.G., Haselgrove,

J.C., Liu, G.T., 2001b. Reproducibility of visual activation in functional

magnetic resonance imaging at very high field strength (4 Tesla). Jpn. J.

Ophthalmol. 45, 1–4.

Miller, M.B., Van Horn, J.D., Wolford, G.L., Handy, T.C., Valsangkar-

Smyth, M., Inati, S., Grafton, S., Gazzaniga, M.S., 2002.

Extensive individual differences in brain activations associated

with episodic retrieval are reliable over time. J. Cogn. Neurosci.

14, 1200–1214.

Neumann, J., Lohmann, G., Zysset, S., von Cramon, D.Y., 2003. Within-

subject variability of BOLD response dynamics. NeuroImage 19,

784–796.

Noll, D.C., Genovese, C.R., Nystrom, L.E., Vazquez, A.L., Forman, S.D.,

Eddy, W.F., Cohen, J.D., 1997. Estimating test–retest reliability in

functional MR imaging. II: application to motor and cognitive

activation studies. Magn. Reson. Med. 38, 508–517.

Nyberg, L., Cabeza, R., Tulving, E., 1996. PET studies of encoding and

retrieval: the HERA model. Psychonom. Bull. Rev. 3, 135–148.

Oldfield, R.C., 1971. The assessment and analysis of handedness: the

Edinburgh inventory. Neuropsychologia 9, 97–113.

Otten, L.J., Henson, R.N., Rugg, M.D., 2001. Depth of processing effects

on neural correlates of memory encoding: relationship between findings

from across- and within-task comparisons. Brain 124, 399–412.

Rombouts, S.A., Barkhof, F., Hoogenraad, F.G., Sprenger, M., Valk, J.,

Scheltens, P., 1997. Test–retest analysis with functional MR of the

activated area in the human visual cortex. AJNR Am. J. Neuroradiol.

18, 1317–1322.

Rombouts, S.A., Barkhof, F., Hoogenraad, F.G., Sprenger, M., Scheltens,

P., 1998. Within-subject reproducibility of visual activation patterns

with functional magnetic resonance imaging using multislice echo

planar imaging. Magn. Reson. Imaging 16, 105–113.

Rutten, G.J., Ramsey, N.F., van Rijen, P.C., van Veelen, C.W., 2002.

Reproducibility of fMRI-determined language lateralization in individ-

ual subjects. Brain Lang. 80, 421–437.

Schacter, D.L., Wagner, A.D., 1999. Medial temporal lobe activations in

fMRI and PET studies of episodic encoding and retrieval. Hippocampus

9, 7–24.

Scoville, W.B., Milner, B., 1957. Loss of recent memory after

bilateral hippocampal lesions. J. Neurol. Neurosurg. Psychiatry

20, 11–21.

Shergill, S.S., Bullmore, E.T., Brammer, M.J., Williams, S.C., Murray,

R.M., McGuire, P.K., 2001. A functional study of auditory verbal

imagery. Psychol. Med. 31, 241–253.

Strother, S.C., Lange, N., Anderson, J.R., Schaper, K.A., Rehm, K.,

Hansen, L.K., Rottenberg, D.A., 1997. Activation pattern reproduci-

bility: measuring the effects of group size and data analysis models.

Hum. Brain Mapp. 5, 312–316.

Swallow, K.M., Braver, T.S., Snyder, A.Z., Speer, N.K., Zacks, J.M., 2003.

Reliability of functional localization using fMRI. NeuroImage 20,

1561–1577.

Tegeler, C., Strother, S.C., Anderson, J.R., Kim, S.G., 1999. Reproduci-

bility of BOLD-based functional MRI obtained at 4 T. Hum. Brain

Mapp. 7, 267–283.


Wagner, A.D., Poldrack, R.A., Eldridge, L.L., Desmond, J.E., Glover, G.H.,

Gabrieli, J.D., 1998. Material-specific lateralization of prefrontal

activation during episodic encoding and retrieval. NeuroReport 9,

3711–3717.

Wei, X., Yoo, S.S., Dickey, C.C., Zou, K.H., Guttmann, C.R., Panych, L.P.,

2004. Functional MRI of auditory verbal working memory: long-term

reproducibility analysis. NeuroImage 21, 1000–1008.

Yetkin, F.Z., McAuliffe, T.L., Cox, R., Haughton, V.M., 1996. Test–retest

precision of functional MR in sensory and motor task activation. AJNR

Am. J. Neuroradiol. 17, 95–98.