CREATION OF A MORE ACCURATE AND PREDICTIVE TRAIL MAKING TEST
Brian T. Smith
A Thesis Submitted to the
University of North Carolina Wilmington in Partial Fulfillment
of the Requirements for the Degree of
Master of Arts
Department of Psychology
University of North Carolina Wilmington
2011
Approved by
Advisory Committee
Jeffrey Toth Alissa Dark-Freudeman
Karen Daniels
Accepted by
Dean, Graduate School
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
INTRODUCTION
    About the Trail Making Test
    TMT as a Predictor of MCI and AD
    Limitations of the TMT
    Rationale for the Study
    Summary of Hypotheses
METHOD
    Participants
    Materials
    Procedure
RESULTS
    Performance on Individual Tasks
    Task Reliability
    Correlations Among Trails Tasks
    Correlations of Trails to Criterion Measures
DISCUSSION
CONCLUSIONS
REFERENCES
ABSTRACT
The goal of this research was to create and evaluate a computerized touch-screen
version of the popular Trail Making Test (TMT). The TMT is a pen-and-paper test that has been
used for decades to measure individual differences in executive functioning and to help identify
cognitive deficits associated with dementia. Our computerized variant, called eTrails, was aimed
at addressing some of the limitations of the original TMT and improving both its reliability and
predictive accuracy. Two additional eTrails variants were also created that manipulated aspects of
the task thought to drive its predictive power; namely, the ability to block out distraction (eTrails
Flash) and visual search ability (eTrails Scramble). All variants of eTrails demonstrated
increased reliability relative to the TMT and most of the eTrails variants showed strong inter-task
correlations; however, relationships between eTrails and well-known measures of executive
functioning were generally nonsignificant. Potential explanations for the failure to find increased
predictive power for this more reliable TMT variant are discussed.
ACKNOWLEDGEMENTS
I would like to thank Dr. Karen Daniels for her mentorship and constant guidance
throughout my graduate career. You have been an excellent role model as a scientist and a
person. To my committee members, Dr. Jeffrey Toth and Dr. Alissa Dark-Freudeman, thank you
for your invaluable assistance and suggestions. I would like to especially thank Dr. Toth for his
contributions to designing and coding eTrails. I could not have accomplished what I have
without the three of you.
Finally, I would like to thank my parents, Thomas and Katherine Smith, for
instilling the love of science and learning in me. Your love and support through the years have
made me the man I am today.
LIST OF TABLES
1. Trail Making Test Scores
2a. Form "A" Statistics for eTrails
2b. Form "B" Statistics for eTrails
3. Divided Attention Task Results
4. Ospan Results
5. Stroop Results
6a. Reliability for Trails
6b. Reliability for Criterion Measures
7. Correlations Between Trails Tasks
8. Correlations Between Trails Tasks and Criterion Measures
9. Correlations Between Trails Tasks Using Subtraction Scores
10. Correlations Between Trails Tasks and Criterion Measures Using Subtraction Scores
INTRODUCTION
Old age can be a time of fulfillment and enjoyment; retirement frees up time to
engage in leisure activities made impossible by careers and more time can be spent with
family and friends. Unfortunately, many older adults never have the opportunity to enjoy
the benefits of late life because it is also a time of increased vulnerability to a number of
severe disorders. One of the most prevalent and debilitating age-related disorders is
Alzheimer's Disease.
Alzheimer's Disease (AD) is a "degenerative brain disorder in which neurons, the
specialized cells of the brain that process information, stop functioning properly" (Caroli
& Frisoni, 2009, p. 570). The 2011 Facts and Figures report from the Alzheimer's
Association stated that approximately 5.4 million people in the United States are
currently living with AD, making it the sixth leading cause of death for Americans. The
incidence of AD rises sharply with age: only 2% to 5% of 65-year-olds show signs
of AD, but 25% to 50% of those 85 and older show symptoms. More alarming, the
report revealed that, while most major causes of death (e.g., heart disease, many cancers,
stroke, and HIV/AIDS) are on the decline, deaths from AD have increased 66% in recent
years, in part because there is no known cure and no clear method of preventing the disease. As a result,
intervention for AD has tended to focus on early detection. Diagnosing AD early has
many benefits including enhanced medical care, preparation for the lifestyle changes that
must accompany eventual cognitive decline, and allowing for interventions that slow
cognitive decline at the earliest possible stage (Caroli & Frisoni, 2009). Unfortunately,
detecting AD and providing a clear diagnosis can be very difficult in the early stages of
the disorder.
One approach to early detection of AD is identifying cognitive precursors of the
disease in pre-clinical individuals (Balota, Tse, Hutchison, Spieler, & Morris, 2010). Mild
cognitive impairment (MCI) is defined broadly as cognitive decline that is greater than
would normally be expected for an individual of a given age or education level, without
affecting their daily activities in notable ways (Gauthier, Reisberg, Zaudig, Petersen,
et al., 2006). MCI typically presents as minor memory lapses (amnestic MCI) with
normal thinking and reasoning skills. Evidence suggests that the brains of individuals
who suffer from MCI are neurobiologically different from individuals without such
cognitive impairment, and that the changes in the brains of MCI patients are similar to
those who suffer from AD, but on a less severe scale (Haroutunian, Hoffman, & Beeri,
2009). These findings suggest that MCI is likely associated with the early stages of AD
and that being able to effectively identify individuals with MCI might serve as an ―early
warning system‖ for dementia.
While there are a number of neuropsychological measures known to be sensitive
to MCI, a simple paper-and-pencil measure known as the Trail Making Test (TMT) has
proven to be one of the most widely used tests and among the most sensitive to the onset of the disease
(Blacker, Lee, Muzikansky, Martin, Tanzi, & McArdle, 2007; Chen, Ratcliff, Belle,
Cauley, DeKosky, & Ganguli, 2001; Johnson, Lui, & Yaffe, 2007; Storandt, 2008).
About the Trail Making Test
The Trail Making Test (TMT) is an, "efficient and sensitive instrument that is
easily administered, and which reliably discriminates between normal individuals and
those with brain impairment" (Arbuthnott & Frank, 2000, p. 312). It is considered to be
one of the best measures of general brain functioning (Reitan & Wolfson, 1985;
Mitrushina, Boone, & D'Elia, 1999). The TMT was created in 1938, was originally called
"Partington's Pathways," and was included in the Army Individual Test Battery as well as
the Halstead-Reitan Neuropsychological Test Battery. The TMT is a two-part pen-and-
paper test that is believed to measure visual-motor functioning, symbol recognition, the
ability to scan a page, the flexible integration of numerical and alphabetical information
under time pressure, as well as executive functions such as sequencing and mental
flexibility (Reitan & Wolfson, 1985).
The original TMT included two versions, A and B. In Trails A, individuals are
given a sheet of paper containing circles with the numbers 1 to 25 organized randomly on
the page and are asked to rapidly connect the numbers in sequential order. Trails B,
shown in Appendix A, also requires individuals to connect 25 target circles, but this time
alternating between numbers and letters in ascending order (i.e., connecting 1, then A, then
2, then B, etc.). Trails B is thought to measure a more complex set of cognitive abilities
that include planning, sequencing, updating working memory, and shifting between two
stimulus domains (Arbuthnott & Frank, 2000; Lezak, Howieson, & Loring, 2004;
Strauss, Sherman, & Spreen, 2006). It is important to note that these are all executive
abilities that are often found to decline in older adults (Gaudino, 1995) and they are also
among the first to show decrements as a function of MCI (Reitan, 1985). Trails B is one
of the few neuropsychological measures that is able to differentiate between dementia
patients and control subjects (Cahn, Salmon, Butters, Wiederholt, Corey-Bloom,
Edelstein, & Barrett-Connor, 1995).
Trails A is generally treated as a baseline condition where response latency is
believed to reflect simple reaction time (RT). Unlike Trails B, successful performance on
Trails A has been shown to rely very little on executive abilities (Arbuthnott & Frank,
2000). By comparing an individual's performance on the A (non-executive) and B
(executive) versions, one can generate two critical measures of the individual's
cognitive/executive capacity: the difference between Trails A and Trails B (how much
longer it took them to complete Trails B) and the B/A ratio. Slowed performance on
Trails B relative to Trails A is used as an indication of cognitive impairment or a general
frontal lobe dysfunction. There have also been attempts to create normative reaction
times on Trails B: a time of less than 72 seconds is considered normal performance, 73 to
105 seconds is considered mild impairment, and 106 seconds or more is considered
serious impairment. Still, the most commonly used measure remains the
difference score between Trails A and Trails B because it is the most general and
conservative, and therefore difference scores will comprise the primary dependent
variable for the current study.
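The derived scores just described reduce to simple arithmetic. As an illustrative sketch (the function name is mine, not from the thesis), the following computes the B − A difference, the B/A ratio, and a coarse classification using the normative cutoffs quoted above; a boundary time of exactly 72 s is treated here as mild impairment, an assumption since the quoted ranges leave that value ambiguous:

```python
def trails_scores(trails_a_sec, trails_b_sec):
    """Derive the two TMT summary measures described in the text.

    trails_a_sec, trails_b_sec: completion times in seconds.
    Returns the B - A difference score, the B/A ratio, and a coarse
    classification of the Trails B time using the normative cutoffs
    quoted above (<72 s normal, 73-105 s mild, >=106 s serious).
    """
    difference = trails_b_sec - trails_a_sec
    ratio = trails_b_sec / trails_a_sec
    if trails_b_sec < 72:
        category = "normal"
    elif trails_b_sec <= 105:
        category = "mild impairment"
    else:
        category = "serious impairment"
    return difference, ratio, category

# Example: 30 s on Trails A and 90 s on Trails B
diff, ratio, category = trails_scores(30.0, 90.0)
# diff -> 60.0, ratio -> 3.0, category -> "mild impairment"
```

The difference score is the primary dependent variable in this study precisely because, as the text notes, it is the most general and conservative of the three.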
TMT as a Predictor of MCI and AD
Prior research has also illustrated the validity of the TMT as a predictor of
the cognitive deficits associated with MCI; this includes the ability to maintain focus on a
goal despite distractions as well as the ability to alternate attention between two different
goals (Arbuthnott & Frank, 2000). More recent studies have further demonstrated the
predictive power of the TMT in diagnosing MCI by extending these findings to the early
stages of AD where afflicted individuals show both longer reaction times and increased
error rates compared to healthy individuals (Ashendorf, Jefferson, O'Connor, Chaisson,
Green, & Stern, 2008). A final source of evidence for the utility of the TMT comes from
the neuroscience literature. A recent fMRI study, for example, found that TMT
performance was associated with significant increases in blood flow to the prefrontal
cortex, the brain region known to underlie many executive abilities found to decline with
MCI and AD (Kubo, Shoshi, Kitawaki, Takemoto, Kinugasa, Yoshida, Honda, &
Okamoto, 2008).
However, while a general relationship between the TMT and these
frontal/executive regions is evident, it is still not clear exactly which executive abilities
are being taxed. Given that the prefrontal cortex has been linked to a broad array of
higher-order abilities, it is not clear whether the deficits in TMT performance and the
corresponding increase in blood flow to frontal regions are due to visual search, blocking
out distractions, planning, etc. Prior TMT research provides conflicting views regarding
which factors drive performance (Cubillo, 2009; Arbuthnott & Frank, 2000) and this
controversy provides one of the motivations for our creation of a computerized version of
the TMT.
Limitations of the TMT
Although the Trail Making Test is one of the most commonly used tests for
diagnosing MCI, its utility is limited by several factors having to do with test design and
administration. A number of these limitations are created by its reliance on a pen-and-
paper format. First, repeated use of the test—critical for detecting the types of within-
individual changes associated with the early stages of AD—is highly limited with current
pen-and-paper versions (Salthouse, Toth, Daniels, et al., 2000). Most notably, there are
only two alternate forms of Trails A and B (all testing is done with the same two
arrangements of circles). The lack of alternate forms makes it difficult to experimentally
investigate the relevant cognitive processes underlying task performance. This can be
attributed to the fact that these various factors that may be driving performance (e.g.,
circle arrangement, distance between circles, etc.) cannot be systematically manipulated.
Secondly, research has shown that there are significant practice effects with repeated
administration of existing forms (e.g., Buck, Atkinson, & Ryan, 2008); these practice
effects are evident after only two exposures in the same day (Franzen, 1996) and can be
detectable up to one year after initial administration (Basso, Bornstein, & Lang, 1999).
Such practice effects represent one of the most pervasive problems when utilizing within-
subject research designs. Any improvements observed during the second administration
of the test may simply be due to prior exposure to it, making it both difficult to
interpret performance gains as well as to establish reliability. Measures of executive
function often show a high level of practice effects because they typically present the
subjects with novel situations in which they must solve a problem or recognize an
abstract concept. After the first administration of the test, they know all of the "tricks";
the novelty wears off quickly and they are able to refine their strategies thereby
improving test scores. Trails B is even more affected by practice effects than Trails A
because of the increased novelty associated with doing this particular task (Franzen,
1996). These findings substantially undermine the diagnostic sensitivity of the TMT,
especially in cases of AD where intra-individual changes in performance are thought to
be one of the most sensitive indicators of abnormal cognitive decline.
A final notable limitation of the TMT is linked to its requirement for individuals
to manually connect the circles with a pencil line or "trail". This requirement adds time
and variability to performance due to a number of factors unrelated to the cognitive
processes of interest (i.e., those affected in early AD)—factors such as handedness,
arthritis, and general dexterity. In addition, reliability scores can vary greatly based on
administrative errors. One such error is the failure of the examiner to correctly return the
subject's pencil to the place from which they began drawing the incorrect trail. It is also
common for the subject to not fully understand the directions of the test before beginning
(Arbuthnott & Frank, 2000). Individuals who are instructed thoroughly and who are
given sufficient practice prior to beginning the actual task demonstrate a significant
time advantage over those who are not. These limitations complicate interpretation of
TMT performance. When impaired performance is observed on the TMT, it is not clear
whether such impairments reflect a true cognitive deficit or more superficial problems
related to task administration or format. The current research aims to standardize the
procedures involved in Trails administration with the goal of increasing its diagnostic
power.
Rationale for the Study
With the above issues in mind, Dr. Jeffrey Toth and I created a computerized
version of the TMT called "eTrails" that uses touch-screen technology. This computerized
task embodies the same general methods and principles as the original pen-and-paper
version with the key exception that, rather than connecting circles with a pen on paper,
participants touch targets arranged on a computerized display in the specified order. One
goal of eTrails is to try to address some of the limitations in the existing version of the
TMT described above. First, it will allow the researchers to change the location of the
targets on the screen (the letters and numbers), thereby substantially reducing the problem
of practice effects and opening up the possibility of multiple testing sessions for the same
individual. As stated earlier, the TMT only has two different forms; eTrails currently has
over 30. As noted by Buck, Atkinson, and Ryan (2008), the most effective way to
determine whether an individual's change in performance from one testing session to the
next is meaningful is by conducting "test-retest score difference using alternate and
theoretically equivalent forms" (p. 312). A computerized version of the TMT that
provides us with a number of different, but equivalent, forms would allow us to examine
such test-retest reliability. Finally, comparing the original pen-and-paper version of the
TMT with eTrails in the current study will provide a direct test of the effects of test
format (paper vs. computer) on TMT performance and may introduce computerized
(touch screen) testing as an easier and more reliable method of responding that
overcomes limitations related to the dexterity of the subject. This ease of testing also
potentially allows for the inclusion of more varied research and clinical populations.
It should be noted that computerized versions of the TMT have been previously
attempted (Drapeau, Bastien-Toniazzo, Rous, & Carlier, 2007; Kubo, 2009). However, the
current study differs from this earlier research in two important ways: First, the earlier
studies were simply direct replications of the TMT and did not fully take advantage of the
change in format. The current study is designed to go beyond these direct replications and
to use computerization to address some of the aforementioned limitations of the pen-and-
paper format. A second, exciting difference regarding the computerized variants is that
they will allow us to make systematic changes to the task with the ultimate goal of further
improving its predictive power. These changes will be discussed in the following
sections. eTrails' computerized format (1) is millisecond-accurate and thus can detect
much smaller differences in performance across individuals; (2) will introduce the
possibility of getting additional measures of performance that go beyond those derived
from the TMT and which may be related to early AD (e.g., time to first touch, average
touch time, time between touches as a function of stimulus class, etc.); and (3) will permit
various aspects of the task to be experimentally manipulated with the goal of increasing
the cognitive, or executive, control needed to perform the task and thus will allow the
effects of these changes to be directly assessed. The specific manipulations performed are
discussed in the next sections.
eTrails-Standard
The first computerized TMT variant, eTrails-Standard, is similar in
structure and procedure to the original TMT with the exception that different
arrangements of numbers and letters are afforded by the computerized format. The goal
of eTrails-Standard is to keep the critical aspects of the task as close to the original TMT
as possible such that any observed performance differences between the two are
attributable to the format change. If eTrails-Standard is found to not correlate with the
original TMT then it would suggest that the pen-and-paper format and/or the fixed
configuration of targets may have contributed to the validity of the original task. This
version of eTrails is expected to show a moderate correlation with the paper-and-pencil
TMT indicating that they are tapping into the same executive abilities. It should also have
higher correlations with other executive measures compared to the TMT given that it is
expected to be less hindered by the structural and procedural limitations mentioned
above.
eTrails-Flash: Random and Next
The second and third eTrails variants involve attentional capture. Capture occurs
when an aspect of one‘s environment (e.g., a horn, a flashing light, etc.) automatically
draws one‘s attention, sometimes despite the intention to ignore it. There is increasing
evidence showing that loss of attentional control occurs as a function of healthy aging
(Jacoby, Bishara, Hessels, & Toth, 2005) and that these deficits are particularly
pronounced even in the early stages of AD (Castel, 2009). Older adults with dementia
seem particularly vulnerable to what Daniels, Toth, and Jacoby (2006) call "goal neglect"
in which a distractor can derail their attention from the task at hand. In the current study,
capture was increased relative to the original TMT by making one of the square targets on
the computer screen flash briefly (quickly change from red to white and back again). The
first version of this task, referred to as "FlashRandom", involves having a random
incorrect square flash white briefly as participants are trying to respond to each target
(i.e., the flashed number is not predictably related to the target response). The expectation
is that, when an individual is searching for the next letter/number in the sequence, the
flashing of an incorrect letter/number will draw attention away from the task goal. Thus,
to avoid clicking on the flashing incorrect target, the individual must exert executive
control; this should further distinguish those who may be more prone to the increased
capture typical of dementia.
The second capture variant, "FlashNext", is similar to the "FlashRandom"
variant described above in that a square flashes while the participant searches; in this
version, however, it is the next target square in the sequence that flashes. For example,
as a participant selected the "2" target square in the Trails A task, the "3" square would
flash. In this case, the flash provides the participant with the correct answer and, unlike
the "FlashRandom" variant, may actually facilitate Trails performance. This change may
be diagnostic as well: participants who struggle on this "FlashNext" variant may
demonstrate the most significant deficits on other tasks. If they react slowly even in a
facilitating, predictive environment, they are very likely to be slow under conditions of
distraction.
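The two flash variants described above differ only in which square is chosen to flash. A minimal Python sketch of that choice (the task itself was written in Visual Basic 6; the function and variable names here are illustrative assumptions, not the thesis's code):

```python
import random

def square_to_flash(sequence, just_pressed_index, variant):
    """After a correct press of sequence[just_pressed_index], pick the
    square that briefly flashes (red -> white -> red).

    "FlashNext": the next target in the sequence flashes, cueing the
    correct answer. "FlashRandom": a random square other than the next
    target flashes, drawing attention away from the task goal.
    """
    next_target = sequence[just_pressed_index + 1]
    if variant == "FlashNext":
        return next_target
    distractors = [s for s in sequence if s != next_target]
    return random.choice(distractors)

# Trails B-style sequence: after pressing "2", "B" is the next target
sequence = ["1", "A", "2", "B", "3", "C"]
cue = square_to_flash(sequence, 2, "FlashNext")
# cue -> "B"
```

Under "FlashRandom" the same call returns any square except "B", which is exactly the property that forces the participant to exert control to avoid capture.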
eTrails-Scramble
In the final eTrails variant, a correct button press on any trial will result in all of
the other, non-target labels switching positions. The buttons themselves stay in the same
physical locations on the screen, but their labels (the numbers 2 through 16) randomly
switch positions with one another (the "2" may move to where the "4" once was, the
"4" may move to where the "16" previously was, etc.). This scrambling
is intended to remove the ability of participants to plan their next selection in advance by
visually scanning the fixed arrangement of squares. Visual search is considered by many
to fall under the umbrella of executive functioning (e.g., Kubo et al., 2007) and this
ability to select targets from a visual display is one that appears to decline in AD (Castel,
Balota, & McCabe, 2009). By preventing "look ahead" with this scramble variant, it is
expected that participants will require more cognitive effort to search for the next target,
placing pressure on those suffering from executive declines.
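The scramble mechanic described above amounts to a random permutation of the remaining labels over a fixed set of button positions. A minimal sketch (again illustrative Python rather than the original VB6; the dictionary layout is an assumption):

```python
import random

def scramble_labels(positions, just_pressed):
    """After a correct press, permute the labels of all remaining
    (non-target) buttons among their fixed on-screen positions.

    positions: dict mapping label -> (x, y) button position.
    just_pressed: the label correctly pressed; it leaves the display
    and takes no part in the shuffle.
    Returns a new dict in which the remaining labels occupy the same
    set of positions, randomly reassigned.
    """
    remaining = {lab: pos for lab, pos in positions.items()
                 if lab != just_pressed}
    labels = list(remaining.keys())
    coords = list(remaining.values())
    random.shuffle(coords)  # same physical slots, new label order
    return dict(zip(labels, coords))

layout = {str(n): (n * 10, n * 7) for n in range(1, 17)}
new_layout = scramble_labels(layout, "1")
# "1" is gone; labels 2-16 now occupy the same 15 positions, shuffled.
```

Because only the label-to-position mapping changes, any plan the participant formed about where the next few targets sit is invalidated on every press, which is what forces a fresh visual search.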
Dividing Attention as a Proxy for Executive Deficits
As stated above, one of the goals of this study is to improve upon the predictive
strength of TMT, especially as it relates to early diagnosis of the kinds of cognitive
deficits found in MCI and Alzheimer's disease. Unfortunately, patient populations are
difficult to access and are impractical for the purposes of establishing the validity and
reliability of our eTrails variants. Thus, it became necessary to find a way to mimic
cognitive deficits in the laboratory as a proxy for testing patients. We accomplished this
by using a divided attention (DA) paradigm, also known as the secondary task technique
and the dual-task technique (Posner & Boies, 1971). Divided attention manipulations
require an individual to allocate some of their attentional resources to a simultaneous
secondary task, preventing them from fully attending to the primary task. Research
consistently shows that dividing attention (DA) results in significantly poorer
performance on the primary task relative to full attention (FA), consistent with the idea
that DA uses up attentional resources (Anderson, Craik, & Naveh-Benjamin, 1998).
There is evidence to suggest that older adults have fewer resources to devote to a
task when compared to younger adults and thus these older adults often show greater
costs of dividing attention (for a review, see Verhaeghen, Steitz, Sliwinski, & Cerella,
2003). Indeed, requiring young adults to perform under conditions of divided attention
tends to result in memory performance very similar to older adults who are performing
with full attention (Skinner & Fernandes, 2009). Research also suggests that there is a
marked impairment in the ability of Alzheimer's patients to coordinate the performance
of two simultaneous tasks and that AD may result in a severe dual-processing deficit not
observed to the same degree in normal aging (Baddeley, Baddeley, Bucks, & Wilcock,
2001). The current study employs dual-task costs, or the negative change in performance
for younger adults under divided attention compared with full attention, as a way to
mimic cognitive decline. Those younger adults who suffer greatly under divided attention
provide a suitable analog for older adults with MCI.
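The dual-task cost described above is simply the full-attention score minus the divided-attention score. As a sketch (the proportional variant below is an added illustration, not a measure taken from the thesis):

```python
def dual_task_cost(fa, da):
    """Return (absolute, proportional) dual-task cost for one person.

    fa: primary-task score under full attention (e.g., percent correct).
    da: the same score under divided attention.
    Positive values indicate worse performance under divided attention,
    so larger costs mark individuals who suffer more from the
    secondary-task load.
    """
    absolute = fa - da
    proportional = absolute / fa if fa else 0.0
    return absolute, proportional

# Example: 90% correct under full attention, 72% under divided attention
abs_cost, prop_cost = dual_task_cost(90.0, 72.0)
# abs_cost -> 18.0 points; prop_cost -> 0.2 (a 20% relative decline)
```

In the present design, young adults with large costs stand in for the cognitively declining older adults that the eTrails variants are ultimately intended to identify.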
Summary of Hypotheses
Participants completed the A and B versions of the original TMT, along with four
computerized variants (eTrails-Standard, eTrails-FlashNext, eTrails-FlashRandom, and
eTrails-Scramble). Participants were also given a brief battery of memory, attention, and
cognitive speed measures. For the divided attention task, the difference between the two
conditions (full attention performance minus divided attention performance) was used as
a proxy for the kinds of cognitive decline observed in AD. The main hypotheses under
investigation were that (1) eTrails-Standard would demonstrate stronger correlations with
the various executive criterion measures relative to the pen-and-paper TMT which is
hindered by the above limitations; and (2) the eTrails-Scramble and eTrails-FlashRandom
variants would show greater predictive power than eTrails-Standard because they are intended to directly
tax those executive processes believed to drive TMT performance; any differences
between them would inform the relative importance of capture and visual search to the
TMT. Moreover, the use of the divided attention manipulation would provide evidence of
the predictive accuracy of these various measures along a spectrum of cognitive ability.
METHOD
Participants
The sample for the current study consisted of 43 young adults (22 females, 21
males). All were UNCW undergraduates who voluntarily signed up through the
Psychology Department's research system. Two subjects' results were excluded due to
large error rates on the TMT, which made their data impossible to score. The data from the
forty-one remaining subjects were statistically analyzed.
Materials
The main experimental tool for this study was "eTrails 1.0". The program was
built using Microsoft Visual Basic version 6.0. It consists of a 576 × 792-pixel
window on which sixteen 50 × 50-pixel buttons are presented. Unlike the pen-and-paper TMT,
which has only 2 different forms, eTrails utilizes 32 different forms in the current study (16
practice forms and 16 full forms). The practice forms each contain 6 squares, while the
full-length forms contain 16. Each form was created by dividing the computer screen into
4 sections, and each section again into 4 subsections (Appendix B). Dice-rolling software
was used, and the placement of each square was assigned based on two dice rolls (in the
appendix, the green boxes mark the first subsection, the blue the second, the black the
third, and the orange the fourth). For example, if a '1' and then a '4' were rolled, the
square would be placed somewhere in the 4th box of the 1st subsection. Slight manual
adjustments were made to the positioning assigned by the program only under the
following conditions: (a) squares were overlapping or too near to one another (closer than
50 pixels); (b) two sequential numbers were placed directly next to one another; or (c) the
resultant patterns of placement were too easy (all odd numbers placed on the top and all
even numbers on the bottom). This process was repeated for every configuration such
that no two layouts were the same.
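The placement procedure can be sketched as follows. This is an illustrative reconstruction rather than the original VB6 generator: the mapping of the two dice rolls onto a 4 × 4 grid and the box-distance overlap test are assumptions, and conditions (b) and (c) above were handled manually in the original rather than by code.

```python
import random

WIDTH, HEIGHT, BUTTON = 576, 792, 50  # window and button sizes in pixels

def roll_cell():
    """Two 'dice rolls' choose a section (1-4) and a subsection (1-4),
    i.e., one cell of an assumed 4 x 4 grid over the window, then a
    random button position inside that cell."""
    section, subsection = random.randint(1, 4), random.randint(1, 4)
    cell_w, cell_h = WIDTH // 4, HEIGHT // 4
    x = (section - 1) * cell_w + random.randint(0, cell_w - BUTTON)
    y = (subsection - 1) * cell_h + random.randint(0, cell_h - BUTTON)
    return x, y

def too_close(p, q, min_dist=50):
    """Condition (a): reject squares closer than 50 pixels on both axes
    (a box test, which is an assumption about 'closer than')."""
    return abs(p[0] - q[0]) < min_dist and abs(p[1] - q[1]) < min_dist

def generate_form(n_targets=16):
    """Roll positions for n_targets squares, re-rolling any candidate
    that overlaps an already-placed square."""
    placed = []
    while len(placed) < n_targets:
        candidate = roll_cell()
        if not any(too_close(candidate, p) for p in placed):
            placed.append(candidate)
    return placed

form = generate_form()
# 16 positions, all inside the window and pairwise non-overlapping
```

Rejection sampling keeps the sketch simple; the thesis's remaining constraints (no adjacent sequential numbers, no trivially easy patterns) would require checks against the label assignment as well.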
When the eTrails program is initiated, there is a prompt to enter an identification
number, the participant's age, and the participant's gender. Once all the values are
entered, the participant presses a "begin" button and the test begins. Before each full
Page 20
15
round, the program includes a 6-button practice session. The program records the
response time both between buttons and for the entire trial as well as the total number of
errors. eTrails was designed for use on touch screen computers and all variants were
administered using ASUS ―EeeTop PC ET1602‖. In addition to eTrails and the original
pen-and-paper TMT described above, several cognitive tests were used as outcome
measures to assess the degree to which these Trails variants can successfully predict
various forms of higher-order cognition.
Operation Span (Ospan)
Ospan is a common measure of working memory capacity (Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2005; Turner & Engle, 1989). Working memory is a limited-capacity system involved in the storage and manipulation of information in the service of complex goals (Baddeley & Hitch, 1974; Engle, 2002). Working memory tasks have been shown to be highly predictive of performance on a variety of laboratory and real-world tasks (Conway et al., 2005). Importantly, performance on span tasks shows marked impairment even in the early stages of Alzheimer's Disease (Kensinger, Shearer, Locascio, Growdon, & Corkin, 2003; Rosen, Bergeson, Putnam, Harwell, & Sunderland, 2002). While performing Ospan, participants were asked to recall a list of letters while simultaneously solving simple math equations. On each equation-letter trial, participants were given a math problem followed by a letter (e.g., Is (6 x 2) - 5 = 7? Q). The participants were instructed to read the math equation aloud, to respond "yes" or "no" to its correctness, and finally to read the letter aloud and try to remember it for a later test. After some number of these equation-letter trials, a recall cue ("???") appeared on the screen and participants attempted to recall as many letters as possible, in the order in which they were presented, by writing their responses on an answer sheet. Set sizes ranged from two to five letters across a total of 12 sets. An individual's span score was determined by the number of letters they correctly recalled in order.
Color-Word Binary Stroop
The Stroop task is considered the "gold standard" measure of attentional control (MacLeod, 1992). In this task, participants were presented with a color-word (either the word "red" or the word "blue") presented in a colored font (e.g., the word RED presented in blue font; the word BLUE presented in blue font) and were instructed to name the color of the font ("blue" in both cases) as quickly as possible. Participants were told to respond only to the color of the font, and not to the word itself. A Labtec AM-22 microphone was used to record reaction time. The experimenter recorded the accuracy of responses by pressing the "1" key when the participant responded "red", the "2" key for "blue", and the "3" key for discarded trials. Discarded trials included partial responses ("bl-red"), stutters ("r...r...red"), and extraneous noises and movements that inadvertently triggered the microphone (e.g., coughing, exhaling).
Participants completed 155 total trials: 95 congruent (where the color and word matched), 30 incongruent (where the color and word differed), and 30 neutral (strings of ampersands presented in different colors). Stroop is a good measure given the goals of the current study because increases in interference scores in the Stroop task are regularly observed for Alzheimer's patients (Bondi, Serody, Chan, Eberson-Shumate, Delis, Hansen, & Salmon, 2002; Spieler, Balota, & Faust, 1996), and Stroop variants with a large proportion of congruent trials have been shown to be particularly taxing to executive processes like working memory (Kane & Engle, 2003).
Recognition Task with Full vs. Divided Attention
Recognition is a measure of long-term episodic memory. For this study we used a recognition procedure similar to that of Jacoby, Toth, and Yonelinas (1993). Participants were shown five-letter nouns, one at a time, on the computer screen and were asked to read them aloud and to remember them for a later test. During the divided-attention portion of the task, participants were further instructed that they would also be performing a listening task (Craik, 1982). The listening task required participants to monitor a computer-recorded audio file in which a list of numbers was read aloud at a 2-second pace. The participants were instructed to respond "now" every time they heard the target sequence: three odd digits in a row. The list of numbers conformed to two rules: (1) each target sequence had at least two even numbers between it and the next target sequence, and (2) the interval between target sequences varied in length (from two to six numbers) so as to be unpredictable. During the test phase, participants were again shown five-letter words. Some of these words had appeared on the study list; others were new words not encountered previously in the task. The participant needed to discriminate between these old and new items by pressing the "1" key if an item was on the previous list and the "2" key if it was new.
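To make the monitoring rule concrete, the following Python sketch (hypothetical; the original materials were pre-recorded audio, not code) marks the positions in a digit list at which a listener should respond "now", i.e., upon hearing the third odd digit in a row.

```python
def target_positions(digits):
    """Return the indices that complete a run of three consecutive odd
    digits (a sketch of the listening-task rule described above)."""
    hits = []
    run = 0
    for i, d in enumerate(digits):
        run = run + 1 if d % 2 == 1 else 0
        if run == 3:
            hits.append(i)
            run = 0  # targets are separated by even digits, so reset the run
    return hits
```

Because rule (1) guarantees at least two even numbers between targets, runs never overlap, and resetting the counter after each hit is safe.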
Other Tests
Two final tests were used to ensure that our young sample was relatively representative of its age group. One was the Shipley Test, a common pen-and-paper test of vocabulary in which participants are shown 40 increasingly challenging words (e.g., "talk" is an early word and "querulous" a later one) and are asked to choose from among four alternatives the word most similar in meaning to the target. The other was Letter Comparison, a computerized measure of speed of cognitive processing in which participants are asked to quickly compare two strings of letters and to indicate with a key-press whether they are the same (by pressing the "S" key) or different (the "D" key).
Procedure
When participants arrived in the laboratory, they were directed to a separate
testing room. They completed consent and demographics forms, and were given a general
introduction to the design and goals of the study. They were then given the Trails variants
in the following order: paper-and-pencil TMT, eTrails-Standard, eTrails-FlashNext,
eTrails-Scramble, and eTrails-FlashRandom. For each, they received the A version
directly followed by the B version. Following completion of all the Trails tasks (which
took approximately 30 minutes), participants completed the other cognitive measures in
the following order: Recognition Full Attention, Recognition Divided Attention, Ospan,
[5-10 minute break], and Stroop. For the purposes of establishing reliability, participants
then completed a second administration of each version of Trails. Note that different
layouts of letters and numbers were used for the second administration of each task to
reduce practice effects. Lastly, participants completed Letter Comparison and Shipley.
The entire session took approximately one and a half hours. Participants were given
specific instructions prior to each task and were given the opportunity to ask clarification
questions. Upon completion of all of the tasks, the participants were debriefed, thanked
for their participation, and escorted out of the laboratory.
RESULTS
Performance on Individual Tasks
The Trail Making Test
The reaction time (RT) data for the paper-and-pencil TMT are summarized in Table 1. A 2 x 2 repeated-measures ANOVA was conducted with Test Type (Trails A vs. B) and Time of Administration (Time 1 vs. 2) as the within-subjects factors. The effects of Test Type (F(1, 160) = 12.56, p < .005) and Time of Administration (F(1, 160) = 161, p < .001) were both significant. The interaction approached, but did not achieve, significance (F(1, 160) = 3.44, p = .065). The significant effect of Test Type shows that the "B" versions of the task took significantly longer than the "A" versions. The significant effect of Time of Administration reflects practice: RTs were consistently slower on the first administration of each task. Note also that there was considerable variability in performance on the B forms relative to the A forms, as indexed by the larger range of scores across participants (SD = 17,046ms for the first administration).
Focusing on the error rates for the TMT, only four of the 41 subjects (9.75%) made no errors. Participants made an average of .89 errors per condition (approximately 3.56 per person). A 2 x 2 ANOVA with Test Type and Time of Administration as the factors was conducted on the error data; there were no significant main effects or interactions. The most common error was not fully connecting the line to the target circle: the subject's intended target was usually clear, but the pen line did not cross the boundary of the circle. In addition to reducing accuracy scores, this error likely also affected participants' reaction times, because it makes the overall path each subject traces shorter than it should be and therefore potentially shortens their RT.
eTrails
The eTrails reaction time data for the "A" forms can be found in Table 2a (top panel) and the corresponding data for the "B" forms in Table 2b (bottom panel); A1 indicates the first administration of each task and A2 the second. A one-way ANOVA comparing the "A" forms revealed significantly different RTs across the eTrails variants (F(7, 320) = 59.15, p < .05). The Scramble variant produced the slowest average RT (19,508ms) and the FlashNext variant the fastest (11,789ms).
Like the TMT, the "B" versions of eTrails produced slower reaction times than their "A" form counterparts. Again, Scramble took the longest to complete and FlashNext was the fastest (24,668ms and 12,613ms, respectively). A one-way ANOVA comparing the "B" forms found that RTs differed significantly across the variants (F(7, 305) = 74.84, p < .05). Practice effects were also evident in this data set: as can be seen in Table 2b, the second administration took less time to complete than the first.
Turning to error rates for eTrails, six of the 41 participants (14.63%) made no errors across the eTrails variants, and participants made an average of .23 errors per condition. These low error rates (compared with the corresponding TMT values of 9.75% and .89) are particularly impressive when one considers that each subject completed only four TMT forms but sixteen eTrails forms, affording much more opportunity for error on eTrails. The improvement becomes even more evident when one directly compares the TMT and eTrails-Standard: twenty of the forty-one subjects (48.78%) never made an error on eTrails-Standard.
Divided Attention
As shown in Table 3, average corrected accuracy (hits minus false alarms) was .73 (SD = .16) in the full attention condition and .36 (SD = .18) in the divided attention condition. Divided attention performance was significantly impaired relative to full attention performance, t(40) = -14.32, p < .001. As an index of the cost of dividing attention for each individual, participants' DA score was subtracted from their FA score. The average divided attention cost was .38, with individual costs ranging from .13 to .68. These results indicate both that the odd-numbers task was successful in dividing the attention of our subjects and that it produced a good range of performance, which is critical for the correlational analyses in the current study.
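The two indices above reduce to simple subtractions; a minimal sketch follows (function names are my own). Note that the reported mean cost (.38) comes from averaging each individual's cost, not from subtracting the rounded group means (.73 - .36 = .37).

```python
def corrected_accuracy(hit_rate, false_alarm_rate):
    """Corrected recognition accuracy: hit rate minus false-alarm rate."""
    return hit_rate - false_alarm_rate

def divided_attention_cost(fa_score, da_score):
    """Cost of dividing attention: full-attention accuracy minus
    divided-attention accuracy, computed per participant."""
    return fa_score - da_score
```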
Ospan
Operation Span was analyzed in two ways (see Table 4). The first, referred to as the relative score, involved adding up the total number of letters the participant correctly recalled. The average relative score was 26.1 (SD = 5.2) and ranged from 16 to 36. The second measure, the absolute score, counts correctly recalled letters only for trials on which the participant recalled the entire set correctly. The average absolute score was 13.4 (SD = 6.19), with scores ranging from 4 to 26. A common rule of thumb for interpreting Operation Span performance is to examine the general distribution of scores: individuals with an absolute score of 9 or lower are considered "low spans", those with a score between 10 and 18 "mid spans", and those scoring 19 or over "high spans". By this criterion, the current study included 13 low spans, 18 mid spans, and 10 high spans, again demonstrating a relatively normal and variable distribution of performance on this task.
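The two Ospan scoring schemes can be sketched as follows. This is a hypothetical Python rendering; the thesis does not state whether partial credit required correct serial position, which is assumed here.

```python
def ospan_scores(trials):
    """Compute (relative, absolute) Ospan scores.

    trials: list of (presented, recalled) letter sequences.
    Relative score: total letters recalled in the correct serial position.
    Absolute score: letters counted only from perfectly recalled trials.
    """
    relative = absolute = 0
    for presented, recalled in trials:
        # Credit a letter only when it appears in its original position.
        correct = sum(p == r for p, r in zip(presented, recalled))
        relative += correct
        if correct == len(presented) and len(recalled) == len(presented):
            absolute += len(presented)
    return relative, absolute
```

For example, a perfectly recalled two-letter set and a three-letter set with one error contribute 4 to the relative score but only 2 to the absolute score.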
Stroop
The results for congruent, incongruent, and neutral trials are displayed in Table 5. The average reaction time was 564ms (SD = 84ms) for congruent trials, 704ms (SD = 130ms) for incongruent trials, and 600ms (SD = 99ms) for neutral trials. The omnibus ANOVA was significant (F(2, 80) = 100.41, p < .001), and all post-hoc comparisons showed the three trial types to differ significantly from one another. Facilitation scores were calculated by subtracting congruent RTs from neutral RTs (M = 37ms, SD = 47.21ms), and interference scores by subtracting neutral RTs from incongruent RTs (M = 103ms, SD = 73.90ms). The Stroop Effect, an index of the degree to which an individual was influenced by the to-be-ignored word, was calculated by subtracting each person's congruent score from their incongruent score. The average Stroop effect was 140ms (SD = 73ms), consistent with previous findings for young adults (Spieler, Balota, & Faust, 1996).
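These difference scores follow directly from the condition means. A minimal sketch (the function name is my own; applied to the group means above it reproduces the reported per-subject means within rounding):

```python
def stroop_indices(congruent_rt, incongruent_rt, neutral_rt):
    """Derive the three Stroop indices (ms) from mean reaction times."""
    facilitation = neutral_rt - congruent_rt       # benefit of a matching word
    interference = incongruent_rt - neutral_rt     # cost of a mismatching word
    stroop_effect = incongruent_rt - congruent_rt  # overall effect of the word
    return facilitation, interference, stroop_effect
```

With the group means reported above, `stroop_indices(564, 704, 600)` yields facilitation of 36ms, interference of 104ms, and a Stroop effect of 140ms, close to the reported per-subject means of 37ms, 103ms, and 140ms.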
Letter Comparison and Shipley
The average reaction time for the Letter Comparison task was 2,218ms (SD = 472ms). Because this is a recently computerized version of a traditionally pen-and-paper task, there is no clear standard of comparison for performance on this measure. It will, however, help to elucidate age differences in cognitive speed in follow-up studies of eTrails that include older participants. The average number of correct responses on the Shipley vocabulary measure was 28 out of 40 (SD = 3.77). These scores are comparable to, though slightly lower than, those typically observed for young adults on this task, which are often in the low- to mid-thirties (e.g., Kemper & Sumner, 2001; Spieler & Balota, 2000).
Task Reliability
Data Trimming
For all tasks, scores that were more than two standard deviations above or below an individual's mean were removed from the data set and not included in statistical analyses. This resulted in 19 individual data points (fewer than one per participant) being deleted across the entire study.
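A per-participant version of this trimming rule might look like the following (a hypothetical sketch; the thesis does not state which software performed the trimming, and a sample standard deviation is assumed):

```python
from statistics import mean, stdev

def trim_outliers(scores, k=2.0):
    """Drop scores more than k standard deviations from an individual's
    own mean (the trimming rule described above, for one participant)."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) <= k * s]
```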
Test-Retest Reliability
For the Trails tasks, test-retest reliability was calculated by correlating scores on the first and second administrations of each task. As seen in Table 6a, the Trail Making Test achieved low but significant test-retest reliability, r = .37 (n = 37). Each version of eTrails had a considerably higher reliability estimate than its paper-and-pencil counterpart: eTrails-Standard had the highest reliability, r = .62 (n = 37), followed closely by eTrails-FlashNext (r = .56; n = 38), eTrails-Scramble (r = .55; n = 36), and eTrails-FlashRandom (r = .58; n = 38). All of the individual correlations reached significance (p < .05); however, only the difference in reliability between the TMT and eTrails-Standard approached significance (z = -1.51, p = .07).
Split-Half Reliability
The criterion measures used in this study (FA/DA recognition, Ospan, and Stroop) are commonly used in research because they are recognized as highly reliable. Nevertheless, the reliability of each criterion measure was assessed in the current study using a split-half procedure (because each measure was administered only once) together with the Spearman-Brown prophecy formula; the results appear in Table 6b. Our divided attention paradigm showed strong reliability: r = .92 for the divided attention task and r = .93 for the full attention task. Ospan's reliability was somewhat lower than is traditionally reported in the literature, r = .68 for absolute scoring and r = .71 for relative scoring (e.g., Conway et al., 2005, report values generally around .80). Stroop produced surprisingly poor reliability, only r = .65. This is likely due to some combination of the following: (1) motivational changes over the course of the task (the Stroop task was the last critical criterion measure to be administered, so cognitive fatigue and subject apathy were likely at their highest); (2) the binary format of the current Stroop task; and (3), most critically, the fact that the Stroop effect is a subtraction of congruent from incongruent performance, and subtraction scores are notorious for producing lower reliability in the Stroop task compared with correlations using the response latencies from one or more conditions (Strauss, Allen, Jorgensen, & Cramer, 2005). In sum, with the exception of the divided attention task, these frequently used criterion measures achieved lower levels of reliability in the current study than expected.
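The split-half procedure correlates two halves of each task across participants and then applies the Spearman-Brown correction to estimate full-length reliability. A sketch (assuming an odd-even split of trials, which the thesis does not specify):

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def split_half_reliability(trial_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    trial_scores: one list of trial-level scores per participant.
    """
    odd = [sum(t[::2]) for t in trial_scores]
    even = [sum(t[1::2]) for t in trial_scores]
    r_half = pearson_r(odd, even)
    # Spearman-Brown: step the half-test correlation up to full test length.
    return 2 * r_half / (1 + r_half)
```

The correction matters because each half contains only half the trials; correlating raw halves systematically underestimates the reliability of the full-length measure.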
Correlations Among Trails Tasks
The Pearson product-moment correlations between the Trails tasks can be found in Table 7. Correlations were conducted in two ways: the first set, described below, examines relationships between the "B" forms only, while the second examines subtraction scores (B minus A). The most evident pattern in these correlations is that the majority of the Trails tasks, including the TMT, correlated significantly and consistently with one another. There is, however, one clear exception: both the TMT and eTrails-FlashNext lost much of their predictability on the second administration. The potential weaknesses of these two Trails variants specifically, and the potential problems with multiple administrations of an executive task more generally, are discussed in more detail in the General Discussion. As a general rule, though, this table points to fairly good construct validity for eTrails; eTrails generally correlates with the TMT and with itself, which suggests that each variant is measuring the same basic ability.
Correlations of Trails to Criterion Measures
Few correlations between the B forms of the Trails variants and the executive tasks achieved significance (Table 8). The exception to this pattern was the divided attention task, which correlated with the first administrations of the TMT (r = .363), eTrails-Standard (r = .354), and eTrails-FlashRandom (r = .354). Neither Ospan nor Stroop correlated with any of the TMT or eTrails tests. The only other notable correlation involving the criterion measures was between Ospan Absolute and the divided attention task (r = -.370).
Correlations Using Subtraction Scores
Our main hypotheses focused on the use of subtraction scores as the key measure for the Trails tasks, given their role as the principal index of executive ability in the original Trail Making Test. Unfortunately, none of the correlations involving the Trails subtraction scores achieved significance, either among the Trails tasks themselves (Table 9) or with the criterion measures (Table 10). A few of the strongest correlations, largely involving the second administration of eTrails-Standard and the divided attention task, remained, but the general pattern is clearly one of nonsignificance. Also notable, though not easily interpretable, the Stroop task, which had not shown a significant correlation in any of the other analyses, correlated significantly with the first administration of eTrails-FlashRandom.
DISCUSSION
The main goal of the current study was to standardize the procedures associated
with the pen-and-paper TMT using computerized, touch-screen technology. By simply
streamlining the administration of the TMT, it was expected that eTrails-Standard would
produce fewer errors, show increased reliability, and produce stronger correlations with
the other executive measures compared with the original TMT. Moreover, the eTrails
variants designed to strategically tax specific executive abilities (inhibition and visual
search) were expected to show even stronger correlations with other executive measures.
By achieving these goals, this study would provide both a clearer understanding of the factors that drive the predictive power of the Trail Making Test and a potentially better diagnostic tool for executive deficits.
Computerizing the Trail Making Test
The first goal of the current study, improving the administration of the TMT through computerization, appears to have been achieved to some degree. Both the TMT and eTrails-Standard produced data consistent with the prior literature (Arbuthnott & Frank, 2000): in each case, subjects took longer to complete the "B" form of the task than the "A" form, and there was more variability in performance on the "B" forms. Both of these outcomes are consistent with the more executive nature of Trails B and point to sufficient individual differences in the current study to conduct correlational analyses.
More importantly, a number of limitations observed for the TMT in the current study appear to have been mitigated by computer administration. The first is the higher-than-expected error rate on the TMT. Most of the errors observed were due to rushed and careless performance (not connecting the pen line to a circle). Such errors were eliminated in eTrails by its strict criteria for what counts as a correct response. These restrictions make the error term a more informative measure and the overall RTs a more valuable indicator of performance.
A second, potentially related finding is the reduced practice effect for eTrails compared with the TMT: there was a 20% reduction in average reaction time from the first to the second administration of the TMT ("B" forms) but only a 10.33% reduction for eTrails-Standard ("B" forms). The reduced practice effects may be due, in part, to the shallower learning curve of the computer version. Many of the confusing procedural aspects of the TMT are removed in eTrails (e.g., learning the mechanics involved in connecting the circles quickly, holding one's arm in a position that avoids blocking the target numbers/letters, and understanding what to do after producing an incorrect trail). Therefore, while participants may use the first administration of the TMT to get accustomed to the task, and thus show substantial improvement the second time they encounter it, performance on eTrails would not show the same degree of change because participants likely start off performing the eTrails task at a more optimal level. A second factor likely contributing to the reduced practice effects on the eTrails variants is that, while the TMT involved repeated administration of the identical form, eTrails never repeated the same layout of targets. Thus, knowledge acquired in the first round of the TMT would have facilitated performance on the second round more directly than was the case for eTrails.
A final benefit of eTrails was evident in the correlations among the various
eTrails variants and the TMT. The majority of the eTrails variants correlated with one
another which is indicative of solid construct validity in these new, computerized Trails
tasks. One of the exceptions to this pattern of significance is also noteworthy. The second
administration of the TMT failed to demonstrate consistent correlations with the other
Trails tasks. Given the importance of repeated testing in diagnosing dementia and other
cognitive disorders, it is of some concern that a widely used test changes in its
predictability from its first to second administration. Computerization of the Trails task
appears to have successfully increased its consistency.
It also bears mentioning that one of the eTrails variants, eTrails-FlashNext, also did not fare well on the second administration. eTrails is intended as a measure of executive control, and FlashNext is ostensibly the least executive of all the eTrails variants. It differs from the other eTrails tasks in that participants can arrive at the correct answer by simply responding automatically to task cues (i.e., the flashing of the next item in the series). The automaticity of FlashNext is supported by its considerably faster reaction times compared with the other eTrails variants. These automatic processes are likely at their strongest during the second administration of FlashNext, as participants become increasingly reliant on the automatic signals. As performance on FlashNext becomes more automatic, it would be expected to correlate less with other eTrails tasks measuring executive abilities.
Task Reliability
The issue of task reliability in the current study was mixed. On one hand, an
exciting outcome of this study was the noticeable improvement in reliability observed for
the eTrails variants compared with the TMT. Every version of eTrails produced higher
reliability coefficients (ranging from .55 to .62) than that produced by the TMT (.37).
Moreover, the difference in reliability between the TMT and eTrails-Standard almost
reached statistical significance. These improvements were likely due, at least in part, to
the reduction of administration errors discussed above; in other words, unlike the TMT,
eTrails performance during the two different administrations was likely due to a common
set of executive processes rather than idiosyncratic factors having to do with task
administration.
Conversely, the reliability estimates for many of our established criterion measures were low. This was unexpected given that individual performance on each of our executive measures was consistent with expectations. Ospan is a generally reliable measure (with reliabilities of approximately .80; Conway et al., 2005); this level of reliability was not replicated in the current study, where split-half reliabilities of .68 and .71 were obtained for absolute and relative scoring, respectively. There is no clear explanation for this decreased reliability, although anecdotal evidence suggests that participants found Ospan to be a very frustrating task and may have "given up" halfway through.
Like Ospan, Stroop performance was consistent with prior research (Spieler,
Balota, & Faust, 1996); mean reaction times for congruent trials were significantly faster
than neutral trials (i.e., there was significant facilitation) and incongruent trials were
significantly slower than the neutral trials (i.e., there was significant interference). This
also produced ample variability in the resultant Stroop effects in this task. Nevertheless,
the Stroop task produced the lowest reliability estimate of all of the criterion measures (r
= .65). As discussed earlier, this low reliability was likely due primarily to the use of
difference scores, or Stroop effects, as the main predictor for this task, but might have
also been due to the binary nature of the current task or to subject fatigue.
The one criterion task that showed both high levels of split-half reliability and
significant correlations with more than one of the Trails measures was the divided
attention task. This finding, while clearly not as strong as anticipated, is noteworthy given
the secondary goal of the current research—to explore the use of dual-task costs as a
proxy for cognitive decline. When the participants had their attention divided, their accuracy fell (to .36 from .73). This substantial drop suggests that our divided attention task succeeded in reducing participants' available cognitive resources, thereby making their data resemble an older adult's expected performance and giving our data set a proxy for diverse levels of cognitive functioning. The significant correlations between eTrails and the divided attention task point to the potential for eTrails to be a sensitive and predictive measure across a range of cognitive ability.
CONCLUSIONS
Of the two main goals of this research—to produce a viable computerized Trails
measure and to demonstrate its ability to predict a variety of other executive tasks—only
the first was met with any degree of success. Even so, there are a number of take-home lessons from this research that help inform how to build a good Trails measure.
A Trails task must not be so complicated as to produce large numbers of
procedural errors. Such errors may compromise reliability. Conversely, a Trails task
cannot be so easy that it can be performed automatically. Executive functioning is a
difficult construct to measure, because it is required only in novel situations that cannot
be performed using routines or habits. Tasks, especially when they are repeatedly
administered, run the risk of becoming quickly automated such that one often only has
one or two attempts to get an accurate measure of executive control. Note that the two best-performing eTrails variants in the current study, Scramble and FlashRandom, were the ones arguably least amenable to the build-up of automaticity. Moreover, given that our criterion tasks were performed so late in the experimental sequence, participants may have exhausted their executive control and been running more on automatic, rather than executive, abilities. In addition to maximizing executive functioning by administering key tasks early, future Trails studies should also use multiple forms of each test. The current results demonstrated the key role of multiple forms in mitigating practice effects and in achieving the kind of reliability necessary for predicting the differences in cognition seen in Alzheimer's Disease.
Alzheimer's Disease is a destructive and expensive burden on society. It is a disease that is emotionally painful for both the individual and their caretakers. Although this study does not provide any definitive conclusions regarding the ability of our computerized Trails measure to predict the kinds of executive declines associated with Alzheimer's Disease, it does provide a starting point for creating a better Trails task. The TMT has proven itself a successful tool; the small increases in accuracy and reliability demonstrated here, however, may provide the first steps toward increases in predictive power that could have far-reaching clinical and research benefits in the future.
REFERENCES
Arbuthnott, K., & Frank, J. (2000). Trail Making Test, Part B as a measure of executive
control: Validation using a set-switching paradigm. Journal of Clinical and
Experimental Neuropsychology, 22, 518-528.
Ashendorf, L., Jefferson, A. L., O'Connor, M. K., Chaisson, C., Green, R. C., & Stern, R.
A. (2008) Archives of Clinical Neuropsychology, 23(2), 129-137.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In K.W. Spence and J. T. Spence
(eds.) The Psychology of Learning and Motivation, vol 8. (pp. 67-89). New York:
Academic Press.
Basso, M. R., Bornstein, R. A., & Lang, J. M. (1999). Practice effects on commonly used
measures of executive function across twelve months. The Clinical
Neuropsychologist, 13, 283-292.
Blacker, D., Lee, H., Muzikansky, A., Martin, E. C., Tanzi, R., McArdle, J. J., et al. (2007). Neuropsychological measures in normal individuals that predict cognitive decline. Archives of Neurology, 64, 862-871.
Bondi, M. W., Serody, A. B., Chan, A. S., Eberson-Shumate, S. C., Delis, D. C., Hansen,
L. A., & Salmon, D. P. (2002). Cognitive and neuropathologic correlates of Stroop
Color-Word Test performance in Alzheimer's disease. Neuropsychology, 16(3), 335-
343.
Buck, K. K., Atkinson, T. M., & Ryan, J. P. (2008). Evidence of practice effects in
variants of the Trail Making Test during serial assessment. Journal of Clinical and
Experimental Neuropsychology, 30, 312-318.
Cahn, D. A., Salmon, D. P., Butters, N., Wiederholt, W.C., Corey-Bloom, J., Edelstein,
S.L., & Barrett-Connor, E. (1995). Detection of dementia of the Alzheimer type in a
population-based sample: Neuropsychological test performance. Journal of the
International Neuropsychological Society, 1, 252–260.
Caroli, A., & Frisoni, G. B. (2009). Quantitative evaluation of Alzheimer's disease.
Expert Review of Medical Devices, 6(5), 569-588.
Chen, P., Ratcliff, G., Belle, S. H., Cauley, J. A., DeKosky, S. T., & Ganguli, M. (2001).
Patterns of cognitive decline in presymptomatic Alzheimer's disease. Archives of
General Psychiatry, 58, 853-858.
Conway, A. R. A., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle,
R. W. (2005). Working memory span tasks: A methodological review and user's
guide. Psychonomic Bulletin & Review, 12, 769-786.
Craik, F. I. M. (1982). Selective changes in encoding as a function of reduced processing
capacity. In Cognitive research in psychology (pp. 152–161). Berlin: Deutscher
Verlag der Wissenschaften.
Daniels, K. A., Toth, J. P., & Jacoby, L. L. (2006). The aging of executive functions. In
F. I. M. Craik & E. Bialystok (Eds.), Lifespan cognition: Mechanisms of change. New
York, NY: Oxford University Press.
Drapeau, C. E., Bastien-Toniazzo, M., Rous, C., & Carlier, M. (2007). Nonequivalence of
computerized and paper-and-pencil versions of the Trail Making Test. Perceptual &
Motor Skills, 104(3), 785-791.
Engle, R. W. (2002). Working memory capacity as executive attention. Current
Directions in Psychological Science, 11(1), 19-23.
Jacoby, L. L., Bishara, A. J., Hessels, S., & Toth, J. P. (2005). Aging, subjective
experience, and cognitive control: Dramatic false remembering by older adults.
Journal of Experimental Psychology: General, 134, 131-148.
Jacoby, L. L., Toth, J. P., & Yonelinas, A. P. (1993). Separating conscious and
unconscious influences of memory: Measuring recollection. Journal of Experimental
Psychology: General, 122(2), 139-154.
Johnson, J. K., Lui, L., & Yaffe, K. (2007). Executive function, more than global
cognition, predicts functional decline and mortality in elderly women. The Journals
of Gerontology Series A: Biological Sciences and Medical Sciences, 62, 1134-1141.
Kensinger, E. A., Shearer, D. K., Locascio, J. J., Growdon, J. H., & Corkin, S. (2003).
Working memory in mild Alzheimer's disease and early Parkinson's disease.
Neuropsychology, 17(2), 230-239.
Kinner, E. I., & Fernandes, M. A. (2009). Illusory recollection in older adults and
younger adults under divided attention. Psychology and Aging, 24(1), 211-216.
Kubo, M., Shoshi, C., Kitawaki, T., Takemoto, R., Kinugasa, K., Yoshida, H., Honda, C.,
& Okamoto, M. (2008). Increase in prefrontal cortex blood flow during the computer
version of the Trail Making Test. Neuropsychobiology, 58, 200-210.
Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment
(4th ed.). New York: Oxford University Press.
MacLeod, C. M. (1992). The Stroop task: The "gold standard" of attention measures.
Journal of Experimental Psychology: General, 121(1), 12-14.
Mitrushina, M. N., Boone, K. B., & D'Elia, L. F. (1999). Handbook of normative data for
neuropsychological assessment. New York: Oxford University Press.
Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test
Battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology Press.
Rosen, V. M., Bergeson, J. L., Putnam, K., Harwell, A., & Sunderland, T. (2002).
Working memory and apolipoprotein E: What's the connection? Neuropsychologia,
40(13), 2226-2233.
Salthouse, T. A., Toth, J. P., Daniels, K., Parks, C., Pak, R., Wolbrette, M., & Hocking,
K. J. (2000). Effects of aging on efficiency of task switching in a variant of the Trail
Making Test. Neuropsychology, 14, 102-111.
Sanchez-Cubillo, L., Perianez, J. A., Adrover-Roig, D., Rodriguez-Sanchez, J. M., Rios-
Lago, M., Tirapu, J., & Barcelo, F. (2009). Construct validity of the Trail Making
Test: Role of task-switching, working memory, inhibition/interference control, and
visuomotor abilities. Journal of the International Neuropsychological Society, 15,
438-450.
Spieler, D. H., & Balota, D. A. (2000). Factors influencing word naming in younger and
older adults. Psychology and Aging, 16(2), 312-322.
Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy
younger and older adults and in individuals with dementia of the Alzheimer's type.
Journal of Experimental Psychology: Human Perception and Performance, 22,
461-479.
Storandt, M. (2008). Cognitive deficits in the early stages of Alzheimer's disease.
Current Directions in Psychological Science, 17, 198-202.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of
neuropsychological tests: Administration, norms, and commentary (3rd ed.). New
York: Oxford University Press.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent?
Journal of Memory and Language, 28, 127-154.
Verhaeghen, P., Steitz, D., Sliwinski, M., & Cerella, J. (2003). Aging and dual-task
performance: A meta-analysis. Psychology and Aging, 18, 443-460.
Table 1

Trail Making Test Scores

          TMTA 1st Admin.   TMTA 2nd Admin.   TMTB 1st Admin.   TMTB 2nd Admin.
Average   21518             18586             46789             37408
SD        7035              4374              17046             11658

*scores in milliseconds
Table 2a

Form "A" Statistics for eTrails

Condition                Average   SD
eTrails Standard A1      14124     3114
eTrails Standard A2      13915     3335
eTrails FlashNext A1     12412     2517
eTrails FlashNext A2     11789     2429
eTrails Scramble A1      19508     4063
eTrails Scramble A2      18495     3883
eTrails FlashRandom A1   15839     3508
eTrails FlashRandom A2   14155     3076

*scores in milliseconds

Table 2b

Form "B" Statistics for eTrails

Condition                Average   SD
eTrails Standard B1      18257     5080
eTrails Standard B2      16370     3872
eTrails FlashNext B1     12834     2948
eTrails FlashNext B2     12613     2660
eTrails Scramble B1      24669     5840
eTrails Scramble B2      23380     5304
eTrails FlashRandom B1   19450     5746
eTrails FlashRandom B2   16284     4441

*scores in milliseconds
Table 3

Divided Attention Task Results

          DA ACC (corrected)   FA ACC (corrected)   FA-DA
Average   0.37                 0.73                 0.37
SD        0.18                 0.16                 0.17
Table 4

OSPAN Results

          Ospan Relative   Ospan Absolute
Average   26.12            13.44
SD        5.21             6.19
Table 5

Stroop Results

          Congruent   Incongruent   Neutral   Facilitation   Interference   Stroop Effect
Average   564         704           601       37             103            140
SD        84.17       130.02        99.00     47.21          73.90          72.57

*scores in milliseconds
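The derived scores in Table 5 follow the conventional Stroop contrasts: facilitation is the neutral minus congruent RT, interference is incongruent minus neutral, and the overall Stroop effect is incongruent minus congruent. As a minimal illustration (the `stroop_scores` helper is hypothetical, and the thesis computed these scores per participant before averaging), applying the contrasts to the group means reproduces the table's derived values:

```python
def stroop_scores(congruent, incongruent, neutral):
    """Return (facilitation, interference, stroop_effect) in ms.

    Conventional contrast definitions; illustrative only.
    """
    facilitation = neutral - congruent       # benefit of a congruent color-word pairing
    interference = incongruent - neutral     # cost of a conflicting color-word pairing
    stroop_effect = incongruent - congruent  # overall congruency effect
    return facilitation, interference, stroop_effect

# Group mean RTs from Table 5 (ms)
print(stroop_scores(564, 704, 601))  # (37, 103, 140), matching Table 5
```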
Table 6a

Reliability for Trails Tasks

          TMT      eTrails Standard   eTrails FlashNext   eTrails Scramble   eTrails FlashRandom
Time 1    46,789   18,415             12,880              24,429             19,744
Time 2    37,408   16,644             12,815              23,380             16,301
r         0.37     0.62               0.56                0.55               0.58

*Time in milliseconds

Table 6b

Reliability for Criterion Measures

                         Divided Attention   Full Attention   Ospan Relative   Ospan Absolute   Stroop
Split-Half Reliability   0.92                0.93             0.71             0.68             0.65
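The split-half coefficients in Table 6b can be obtained in several ways; the text here does not detail the exact procedure, so the sketch below assumes a common approach rather than the study's actual code: correlate odd- and even-trial scores across participants, then apply the Spearman-Brown correction, r_full = 2r / (1 + r), to estimate full-length reliability. The function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(trial_scores):
    """Odd-even split-half reliability with Spearman-Brown correction.

    trial_scores: one list of per-trial scores for each participant.
    """
    odd = [sum(s[0::2]) for s in trial_scores]   # trials 1, 3, 5, ...
    even = [sum(s[1::2]) for s in trial_scores]  # trials 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)             # Spearman-Brown step-up
```

Perfectly consistent trial data yields a corrected coefficient of 1.0, and noisier data pulls it toward 0; the Table 6b values (0.65-0.93) fall on this scale.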
Table 7

Correlation Between Trails Tasks

                             1      2      3      4      5      6      7      8      9
1.  TMT B1                   --
2.  TMT B2                   .366   --
3.  eTrails Standard B1      .668   .315   --
4.  eTrails Standard B2      .593   .296   .620   --
5.  eTrails FlashNext B1     .547   .494   .488   .655   --
6.  eTrails FlashNext B2     .340   .016   .370   .378   .558   --
7.  eTrails Scramble B1      .425   .168   .484   .542   .486   .431   --
8.  eTrails Scramble B2      .537   .500   .385   .626   .588   .224   .547   --
9.  eTrails FlashRandom B1   .568   .505   .461   .469   .381   .100   .297   .537   --
10. eTrails FlashRandom B2   .416   .595   .436   .479   .527   .089   .375   .719   .575

*green = significant
Table 8

Correlations Between Trails Tasks and Criterion Measures

                 TMT B1   TMT B2   Std B1   Std B2   FN B1   FN B2   Scr B1   Scr B2   FR B1   FR B2
Stroop           -.165    .006     -.081    .021     .181    .176    .310     .140     -.124   .061
Ospan Relative   .111     -.012    -.183    -.146    -.089   -.390   -.105    .033     -.063   -.124
Ospan Absolute   .062     -.033    -.129    -.027    .047    .047    -.004    .067     -.056   -.224
FA-DA            .362     .303     .354     .187     .196    .196    -.091    .166     .354    .273

Note. Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green = significant
Table 9

Correlation Between Trails Tasks Using Subtraction Scores

                            1      2      3      4      5      6      7      8      9
1.  TMT 1                   --
2.  TMT 2                   .116   --
3.  eTrails Standard 1      .161   -.034  --
4.  eTrails Standard 2      .218   .083   .241   --
5.  eTrails FlashNext 1     .124   .375   .004   .421   --
6.  eTrails FlashNext 2     .317   .074   .045   -.065  .028   --
7.  eTrails Scramble 1      .317   .122   .102   .151   .221   .136   --
8.  eTrails Scramble 2      -.164  .262   .055   .059   .283   .049   .200   --
9.  eTrails FlashRandom 1   .162   .383   .046   -.700  .126   .334   -.095  -.016  --
10. eTrails FlashRandom 2   .375   .338   .247   .085   .052   .121   .300   .156   -.036

*green = significant
Table 10

Correlation Between Trails Tasks and Criterion Measures Using Subtraction Scores

                 TMT 1   TMT 2   Std 1   Std 2   FN 1    FN 2    Scr 1   Scr 2   FR 1    FR 2
Stroop           -.184   .072    .056    .133    -.084   .057    .109    .313    -.353   -.057
Ospan Relative   .089    .075    -.034   -.370   .194    .000    .087    .076    .106    .049
Ospan Absolute   .004    .056    .970    -.336   .241    .019    .036    .220    .060    -.027
FA-DA            .312    .175    .115    .410    .263    .186    -.176   -.010   .310    -.053

Note. Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green = significant
Appendix A
Figure 1
Trails B Example
Appendix B
Figure 2
Example of Button Layout Procedure
1 2 1 2
3 4 3 4
1 2 1 2
3 4 3 4