CREATION OF A MORE ACCURATE AND PREDICTIVE TRAIL MAKING TEST
Brian T. Smith
A Thesis Submitted to the
University of North Carolina Wilmington in Partial Fulfillment
of the Requirements for the Degree of
Master of Arts
Department of Psychology
University of North Carolina Wilmington
2011
Approved by
Advisory Committee
Jeffrey Toth Alissa Dark-Freudeman
Karen Daniels
Accepted by
Dean, Graduate School
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
INTRODUCTION
    About the Trail Making Test
    TMT as a Predictor of MCI and AD
    Limitations of the TMT
    Rationale for the Study
    Summary of Hypotheses
METHOD
    Participants
    Materials
    Procedure
RESULTS
    Performance on Individual Tasks
    Task Reliability
    Correlations Among Trails Tasks
    Correlations of Trails to Criterion Measures
DISCUSSION
CONCLUSIONS
REFERENCES
ABSTRACT
The goal of this research was to create and evaluate a computerized touch-screen
version of the popular Trail Making Test (TMT). The TMT is a pen-and-paper test that has been
used for decades to measure individual differences in executive functioning and to help identify
cognitive deficits associated with dementia. Our computerized variant, called eTrails, was aimed
at addressing some of the limitations of the original TMT and improving both its reliability and
predictive accuracy. Two additional eTrails variants were also created that manipulated aspects of
the task thought to drive its predictive power; namely, the ability to block out distraction (eTrails
Flash) and visual search ability (eTrails Scramble). All variants of eTrails demonstrated
increased reliability relative to the TMT and most of the eTrails variants showed strong inter-task
correlations; however, relationships between eTrails and well-known measures of executive
functioning were generally nonsignificant. Potential explanations for the failure to find increased
predictive power for this more reliable TMT variant are discussed.
ACKNOWLEDGEMENTS
I would like to thank Dr. Karen Daniels for her mentorship and constant guidance
throughout my graduate career. You have been an excellent role model as a scientist and a
person. To my committee members, Dr. Jeffrey Toth and Dr. Alissa Dark-Freudeman, thank you
for your invaluable assistance and suggestions. I would like to especially thank Dr. Toth for his
contributions to designing and coding eTrails. I could not have accomplished what I have
without the three of you.
Finally, I would like to thank my parents, Thomas and Katherine Smith, for
instilling the love of science and learning in me. Your love and support through the years have
made me the man I am today.
LIST OF TABLES
1. Trail Making Test Scores
2a. Form "A" Statistics for eTrails
2b. Form "B" Statistics for eTrails
3. Divided Attention Task Results
4. Ospan Results
5. Stroop Results
6a. Reliability for Trails
6b. Reliability for Criterion Measures
7. Correlations Between Trails Tasks
8. Correlations Between Trails Tasks and Criterion Measures
9. Correlations Between Trails Tasks Using Subtraction Scores
10. Correlations Between Trails Tasks and Criterion Measures Using Subtraction Scores
INTRODUCTION
Old age can be a time of fulfillment and enjoyment; retirement frees up time to
engage in leisure activities made impossible by careers and more time can be spent with
family and friends. Unfortunately, many older adults never have the opportunity to enjoy
the benefits of late life because it is also a time of increased vulnerability to a number of
severe disorders. One of the most prevalent and debilitating age-related disorders is
Alzheimer's Disease.
Alzheimer's Disease (AD) is a "degenerative brain disorder in which neurons, the
specialized cells of the brain that process information, stop functioning properly" (Caroli
& Frisoni, 2009, p. 570). The 2011 Facts and Figures report from the Alzheimer's
Association stated that approximately 5.4 million people in the United States are
currently living with AD, making it the sixth leading cause of death for Americans. The
incidence of AD rises sharply with age: only 2% to 5% of 65-year-olds show signs
of AD, but 25% to 50% of those 85 and older show symptoms. More alarming, the
report revealed that, while most major causes of death (e.g., heart disease, many cancers,
stroke, and HIV/AIDS) are on the decline, deaths from AD have increased 66% in recent
years, in part because there is no known cure and no clear method of preventing the disease. As a result,
intervention for AD has tended to focus on early detection. Diagnosing AD early has
many benefits including enhanced medical care, preparation for the lifestyle changes that
must accompany eventual cognitive decline, and allowing for interventions that slow
cognitive decline at the earliest possible stage (Caroli & Frisoni, 2009). Unfortunately,
detecting AD and providing a clear diagnosis can be very difficult in the early stages of
the disorder.
One approach to early detection of AD is identifying cognitive precursors of the
disease in pre-clinical individuals (Balota, Tse, Hutchison, Spieler, & Morris, 2010). Mild
cognitive impairment (MCI) is defined broadly as cognitive decline that is greater than
would normally be expected for an individual of a given age or education level, without
affecting their daily activities in notable ways (Gauthier, Reisberg, Zaudig, Petersen,
et al., 2006). MCI typically presents as minor memory lapses (amnestic MCI) with
normal thinking and reasoning skills. Evidence suggests that the brains of individuals
who suffer from MCI are neurobiologically different from individuals without such
cognitive impairment, and that the changes in the brains of MCI patients are similar to
those who suffer from AD, but on a less severe scale (Haroutunian, Hoffman, & Beeri,
2009). These findings suggest that MCI is likely associated with the early stages of AD
and that being able to effectively identify individuals with MCI might serve as an ―early
warning system‖ for dementia.
While there are a number of neuropsychological measures known to be sensitive
to MCI, a simple paper-and-pencil measure known as the Trail Making Test (TMT) has
proven to be one of the most widely used tests and among the most sensitive to the onset of the disease
(Blacker, Lee, Muzikansky, Martin, Tanzi, & McArdle, 2007; Chen, Ratcliff, Belle,
Cauley, DeKosky, & Ganguli, 2001; Johnson, Lui, & Yaffe, 2007; Storandt, 2008).
About the Trail Making Test
The Trail Making Test (TMT) is an, "efficient and sensitive instrument that is
easily administered, and which reliably discriminates between normal individuals and
those with brain impairment" (Arbuthnott & Frank, 2000, p. 312). It is considered to be
one of the best measures of general brain functioning (Reitan & Wolfson, 1985;
Mitrushina, Boone, & D'Elia, 1999). The TMT was created in 1938, was originally called
"Partington's Pathways," and was included in the Army Individual Test Battery as well as
the Halstead-Reitan Neuropsychological Test Battery. The TMT is a two-part pen-and-
paper test that is believed to measure visual-motor functioning, symbol recognition, the
ability to scan a page, the flexible integration of numerical and alphabetical information
under time pressure, as well as executive functions such as sequencing and mental
flexibility (Reitan & Wolfson, 1985).
The original TMT included two versions, A and B. In Trails A, individuals are
given a sheet of paper containing circles with the numbers 1 to 25 organized randomly on
the page and are asked to rapidly connect the numbers in sequential order. Trails B,
shown in Appendix A, also requires individuals to connect 25 target circles, but this time
alternating between numbers and letters in ascending order (i.e., connecting 1, then A, then
2, then B, etc.). Trails B is thought to measure a more complex set of cognitive abilities
that include planning, sequencing, updating working memory, and shifting between two
stimulus domains (Arbuthnott & Frank, 2000; Lezak, Howieson, & Loring, 2004;
Strauss, Sherman, & Spreen, 2006). It is important to note that these are all executive
abilities that are often found to decline in older adults (Gaudino, 1995) and they are also
among the first to show decrements as a function of MCI (Reitan, 1985). Trails B is one
of the few neuropsychological measures that is able to differentiate between dementia
patients and control subjects (Cahn, Salmon, Butters, Wiederholt, Corey-Bloom,
Edelstein, & Barrett-Connor, 1995).
Trails A is generally treated as a baseline condition where response latency is
believed to reflect simple reaction time (RT). Unlike Trails B, successful performance on
Trails A has been shown to rely very little on executive abilities (Arbuthnott & Frank,
2000). By comparing an individual's performance on the A (non-executive) and B
(executive) versions, one can generate two critical measures of the individual's
cognitive/executive capacity: the difference between Trails A and Trails B (how much
longer it took them to complete Trails B) and the B/A ratio. Slowed performance on
Trails B relative to Trails A is used as an indication of cognitive impairment or a general
frontal lobe dysfunction. There have also been attempts to create normative reaction
times on Trails B: a time of less than 72 seconds is considered normal performance, 73 to
105 seconds is considered mild impairment, and 106 seconds or more is considered
serious impairment. Still, the most commonly used measure remains the
difference score between Trails A and Trails B because it is the most general and
conservative, and therefore difference scores will comprise the primary dependent
variable for the current study.
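The derived scores just described reduce to simple arithmetic. As an illustrative sketch (the function name is mine, not from the thesis), the following computes the B − A difference, the B/A ratio, and a coarse classification using the normative cutoffs quoted above; a boundary time of exactly 72 s is treated here as mild impairment, an assumption since the quoted ranges leave that value ambiguous:

```python
def trails_scores(trails_a_sec, trails_b_sec):
    """Derive the two TMT summary measures described in the text.

    trails_a_sec, trails_b_sec: completion times in seconds.
    Returns the B - A difference score, the B/A ratio, and a coarse
    classification of the Trails B time using the normative cutoffs
    quoted above (<72 s normal, 73-105 s mild, >=106 s serious).
    """
    difference = trails_b_sec - trails_a_sec
    ratio = trails_b_sec / trails_a_sec
    if trails_b_sec < 72:
        category = "normal"
    elif trails_b_sec <= 105:
        category = "mild impairment"
    else:
        category = "serious impairment"
    return difference, ratio, category

# Example: 30 s on Trails A and 90 s on Trails B
diff, ratio, category = trails_scores(30.0, 90.0)
# diff -> 60.0, ratio -> 3.0, category -> "mild impairment"
```

The difference score is the primary dependent variable in this study precisely because, as the text notes, it is the most general and conservative of the three.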
TMT as a Predictor of MCI and AD
Prior research has also illustrated the validity of the TMT as a predictor of
the cognitive deficits associated with MCI; this includes the ability to maintain focus on a
goal despite distractions as well as the ability to alternate attention between two different
goals (Arbuthnott & Frank, 2000). More recent studies have further demonstrated the
predictive power of the TMT in diagnosing MCI by extending these findings to the early
stages of AD where afflicted individuals show both longer reaction times and increased
error rates compared to healthy individuals (Ashendorf, Jefferson, O'Connor, Chaisson,
Green, & Stern, 2008). A final source of evidence for the utility of the TMT comes from
the neuroscience literature. A recent fMRI study, for example, found that TMT
performance was associated with significant increases in blood flow to the prefrontal
cortex, the brain region known to underlie many executive abilities found to decline with
MCI and AD (Kubo, Shoshi, Kitawaki, Takemoto, Kinugasa, Yoshida, Honda, &
Okamoto, 2008).
However, while a general relationship between the TMT and these
frontal/executive regions is evident, it is still not clear exactly which executive abilities
are being taxed. Given that the prefrontal cortex has been linked to a broad array of
higher-order abilities, it is not clear whether the deficits in TMT performance and the
corresponding increase in blood flow to frontal regions are due to visual search, blocking
out distractions, planning, etc. Prior TMT research provides conflicting views regarding
which factors drive performance (Cubillo, 2009; Arbuthnott & Frank, 2000) and this
controversy provides one of the motivations for our creation of a computerized version of
the TMT.
Limitations of the TMT
Although the Trail Making Test is one of the most commonly used tests for
diagnosing MCI, its utility is limited by several factors having to do with test design and
administration. A number of these limitations are created by its reliance on a pen-and-
paper format. First, repeated use of the test—critical for detecting the types of within-
individual changes associated with the early stages of AD—is highly limited with current
pen-and-paper versions (Salthouse, Toth, Daniels, et al., 2000). Most notably, there are
only two alternate forms of Trails A and B (all testing is done with the same two
arrangements of circles). The lack of alternate forms makes it difficult to experimentally
investigate the relevant cognitive processes underlying task performance. This can be
attributed to the fact that these various factors that may be driving performance (e.g.,
circle arrangement, distance between circles, etc.) cannot be systematically manipulated.
Secondly, research has shown that there are significant practice effects with repeated
administration of existing forms (e.g., Buck, Atkinson, & Ryan, 2008); these practice
effects are evident after only two exposures in the same day (Franzen, 1996) and can be
detectable up to one year after initial administration (Basso, Bornstein, & Lang, 1999).
Such practice effects represent one of the most pervasive problems when utilizing within-
subject research designs. Any improvements observed during the second administration
of the test may simply be due to prior exposure to it, making it both difficult to
interpret performance gains as well as to establish reliability. Measures of executive
function often show a high level of practice effects because they typically present the
subjects with novel situations in which they must solve a problem or recognize an
abstract concept. After the first administration of the test, they know all of the "tricks";
the novelty wears off quickly and they are able to refine their strategies thereby
improving test scores. Trails B is even more affected by practice effects than Trails A
because of the increased novelty associated with doing this particular task (Franzen,
1996). These findings substantially undermine the diagnostic sensitivity of the TMT,
especially in cases of AD where intra-individual changes in performance are thought to
be one of the most sensitive indicators of abnormal cognitive decline.
A final notable limitation of the TMT is linked to its requirement for individuals
to manually connect the circles with a pencil line or "trail". This requirement adds time
and variability to performance due to a number of factors unrelated to the cognitive
processes of interest (i.e., those affected in early AD)—factors such as handedness,
arthritis, and general dexterity. In addition, reliability scores can vary greatly based on
administrative errors. One such error is the failure of the examiner to correctly return the
subject's pencil to the place from which they began drawing the incorrect trail. It is also
common for the subject to not fully understand the directions of the test before beginning
(Arbuthnott & Frank, 2000). Individuals who are instructed thoroughly and who are
given sufficient practice prior to beginning the actual task demonstrate a significant
time advantage over those who are not. These limitations complicate interpretation of
TMT performance. When impaired performance is observed on the TMT, it is not clear
whether such impairments reflect a true cognitive deficit or more superficial problems
related to task administration or format. The current research aims to standardize the
procedures involved in Trails administration with the goal of increasing its diagnostic
power.
Rationale for the Study
With the above issues in mind, Dr. Jeffrey Toth and I created a computerized
version of the TMT called "eTrails" that uses touch-screen technology. This computerized
task embodies the same general methods and principles as the original pen-and-paper
version with the key exception that, rather than connecting circles with a pen on paper,
participants touch targets arranged on a computerized display in the specified order. One
goal of eTrails is to try to address some of the limitations in the existing version of the
TMT described above. First, it will allow the researchers to change the location of the
targets on the screen (the letters and numbers), thereby substantially reducing the problem
of practice effects and opening up the possibility of multiple testing sessions for the same
individual. As stated earlier, the TMT only has two different forms; eTrails currently has
over 30. As noted by Buck, Atkinson, and Ryan (2008), the most effective way to
determine whether an individual's change in performance from one testing session to the
next is meaningful is by conducting "test-retest score difference using alternate and
theoretically equivalent forms" (p. 312). A computerized version of the TMT that
provides us with a number of different, but equivalent, forms would allow us to examine
such test-retest reliability. Finally, comparing the original pen-and-paper version of the
TMT with eTrails in the current study will provide a direct test of the effects of test
format (paper vs. computer) on TMT performance and may introduce computerized
(touch screen) testing as an easier and more reliable method of responding that
overcomes limitations related to the dexterity of the subject. This ease of testing also
potentially allows for the inclusion of more varied research and clinical populations.
It should be noted that computerized versions of the TMT have been previously
attempted (Drapeau, Bastien-Toniazzo, Rous, & Carlier, 2007; Kubo, 2009). However, the
current study differs from this earlier research in two important ways: First, the earlier
studies were simply direct replications of the TMT and did not fully take advantage of the
change in format. The current study is designed to go beyond these direct replications and
to use computerization to address some of the aforementioned limitations of the pen-and-
paper format. A second, exciting difference regarding the computerized variants is that
they will allow us to make systematic changes to the task with the ultimate goal of further
improving its predictive power. These changes will be discussed in the following
sections. eTrails' computerized format (1) is millisecond-accurate and thus can detect
much smaller differences in performance across individuals; (2) will introduce the
possibility of getting additional measures of performance that go beyond those derived
from the TMT and which may be related to early AD (e.g., time to first touch, average
touch time, time between touches as a function of stimulus class, etc.); and (3) will permit
various aspects of the task to be experimentally manipulated with the goal of increasing
the cognitive, or executive, control needed to perform the task and thus will allow the
effects of these changes to be directly assessed. The specific manipulations performed are
discussed in the next sections.
eTrails-Standard
The first computerized TMT variant, eTrails-Standard, is similar in
structure and procedure to the original TMT with the exception that different
arrangements of numbers and letters are afforded by the computerized format. The goal
of eTrails-Standard is to keep the critical aspects of the task as close to the original TMT
as possible such that any observed performance differences between the two are
attributable to the format change. If eTrails-Standard is found to not correlate with the
original TMT then it would suggest that the pen-and-paper format and/or the fixed
configuration of targets may have contributed to the validity of the original task. This
version of eTrails is expected to show a moderate correlation with the paper-and-pencil
TMT indicating that they are tapping into the same executive abilities. It should also have
higher correlations with other executive measures compared to the TMT given that it is
expected to be less hindered by the structural and procedural limitations mentioned
above.
eTrails-Flash: Random and Next
The second and third eTrails variants involve attentional capture. Capture occurs
when an aspect of one‘s environment (e.g., a horn, a flashing light, etc.) automatically
draws one‘s attention, sometimes despite the intention to ignore it. There is increasing
evidence showing that loss of attentional control occurs as a function of healthy aging
(Jacoby, Bishara, Hessels, & Toth, 2005) and that these deficits are particularly
pronounced even in the early stages of AD (Castel, 2009). Older adults with dementia
seem particularly vulnerable to what Daniels, Toth, and Jacoby (2006) call "goal neglect"
in which a distractor can derail their attention from the task at hand. In the current study,
capture was increased relative to the original TMT by making one of the square targets on
the computer screen flash briefly (quickly change from red to white and back again). The
first version of this task, referred to as "FlashRandom", involves having a random
incorrect square flash white briefly as participants are trying to respond to each target
(i.e., the flashed number is not predictably related to the target response). The expectation
is that, when an individual is searching for the next letter/number in the sequence, the
flashing of an incorrect letter/number will draw attention away from the task goal. Thus,
to avoid clicking on the flashing incorrect target, the individual must exert executive
control; this should further distinguish those who may be more prone to the increased
capture typical of dementia.
The second capture variant, "FlashNext", is similar to the "FlashRandom"
variant described above in that a square flashes while the participant searches; in this
version, however, it is the next target square in the sequence that flashes. For example,
as a participant selected the "2" target square in the Trails A task, the "3" square would
flash. In this case, the flash provides the participant with the correct answer and, unlike
the "FlashRandom" variant, may actually facilitate Trails performance. This change may
be diagnostic as well: participants who struggle on this "FlashNext" variant may
demonstrate the most significant deficits on other tasks. If they react slowly even in a
facilitating, predictive environment, they are very likely to be slow under conditions of
distraction.
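The two flash variants described above differ only in which square is chosen to flash. A minimal Python sketch of that choice (the task itself was written in Visual Basic 6; the function and variable names here are illustrative assumptions, not the thesis's code):

```python
import random

def square_to_flash(sequence, just_pressed_index, variant):
    """After a correct press of sequence[just_pressed_index], pick the
    square that briefly flashes (red -> white -> red).

    "FlashNext": the next target in the sequence flashes, cueing the
    correct answer. "FlashRandom": a random square other than the next
    target flashes, drawing attention away from the task goal.
    """
    next_target = sequence[just_pressed_index + 1]
    if variant == "FlashNext":
        return next_target
    distractors = [s for s in sequence if s != next_target]
    return random.choice(distractors)

# Trails B-style sequence: after pressing "2", "B" is the next target
sequence = ["1", "A", "2", "B", "3", "C"]
cue = square_to_flash(sequence, 2, "FlashNext")
# cue -> "B"
```

Under "FlashRandom" the same call returns any square except "B", which is exactly the property that forces the participant to exert control to avoid capture.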
eTrails-Scramble
In the final eTrails variant, a correct button press on any trial will result in all of
the other, non-target labels switching positions. The buttons themselves stay in the same
physical locations on the screen, but their labels (the numbers 2 through 16) randomly
switch positions with one another (the "2" may move to where the "4" once was, the
"4" may move to where the "16" previously was, etc.). This scrambling
is intended to remove the ability of participants to plan their next selection in advance by
visually scanning the fixed arrangement of squares. Visual search is considered by many
to fall under the umbrella of executive functioning (e.g., Kubo et al., 2007) and this
ability to select targets from a visual display is one that appears to decline in AD (Castel,
Balota, & McCabe, 2009). By preventing "look ahead" with this scramble variant, it is
expected that participants will require more cognitive effort to search for the next target,
placing pressure on those suffering from executive declines.
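The scramble mechanic described above amounts to a random permutation of the remaining labels over a fixed set of button positions. A minimal sketch (again illustrative Python rather than the original VB6; the dictionary layout is an assumption):

```python
import random

def scramble_labels(positions, just_pressed):
    """After a correct press, permute the labels of all remaining
    (non-target) buttons among their fixed on-screen positions.

    positions: dict mapping label -> (x, y) button position.
    just_pressed: the label correctly pressed; it leaves the display
    and takes no part in the shuffle.
    Returns a new dict in which the remaining labels occupy the same
    set of positions, randomly reassigned.
    """
    remaining = {lab: pos for lab, pos in positions.items()
                 if lab != just_pressed}
    labels = list(remaining.keys())
    coords = list(remaining.values())
    random.shuffle(coords)  # same physical slots, new label order
    return dict(zip(labels, coords))

layout = {str(n): (n * 10, n * 7) for n in range(1, 17)}
new_layout = scramble_labels(layout, "1")
# "1" is gone; labels 2-16 now occupy the same 15 positions, shuffled.
```

Because only the label-to-position mapping changes, any plan the participant formed about where the next few targets sit is invalidated on every press, which is what forces a fresh visual search.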
Dividing Attention as a Proxy for Executive Deficits
As stated above, one of the goals of this study is to improve upon the predictive
strength of TMT, especially as it relates to early diagnosis of the kinds of cognitive
deficits found in MCI and Alzheimer's disease. Unfortunately, patient populations are
difficult to access and are impractical for the purposes of establishing the validity and
reliability of our eTrails variants. Thus, it became necessary to find a way to mimic
cognitive deficits in the laboratory as a proxy for testing patients. We accomplished this
by using a divided attention (DA) paradigm, also known as the secondary task technique
and the dual-task technique (Posner & Boies, 1971). Divided attention manipulations
require an individual to allocate some of their attentional resources to a simultaneous
secondary task, preventing them from fully attending to the primary task. Research
consistently shows that dividing attention (DA) results in significantly poorer
performance on the primary task relative to full attention (FA), consistent with the idea
that DA uses up attentional resources (Anderson, Craik, & Naveh-Benjamin, 1998).
There is evidence to suggest that older adults have fewer resources to devote to a
task when compared to younger adults and thus these older adults often show greater
costs of dividing attention (for a review, see Verhaeghen, Steitz, Sliwinski, & Cerella,
2003). Indeed, requiring young adults to perform under conditions of divided attention
tends to result in memory performance very similar to older adults who are performing
with full attention (Skinner & Fernandes, 2009). Research also suggests that there is a
marked impairment in the ability of Alzheimer's patients to coordinate the performance
of two simultaneous tasks and that AD may result in a severe dual-processing deficit not
observed to the same degree in normal aging (Baddeley, Baddeley, Bucks, & Wilcock,
2001). The current study employs dual-task costs, or the negative change in performance
for younger adults under divided attention compared with full attention, as a way to
mimic cognitive decline. Those younger adults who suffer greatly under divided attention
provide a suitable analog for older adults with MCI.
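The dual-task cost described above is simply the full-attention score minus the divided-attention score. As a sketch (the proportional variant below is an added illustration, not a measure taken from the thesis):

```python
def dual_task_cost(fa, da):
    """Return (absolute, proportional) dual-task cost for one person.

    fa: primary-task score under full attention (e.g., percent correct).
    da: the same score under divided attention.
    Positive values indicate worse performance under divided attention,
    so larger costs mark individuals who suffer more from the
    secondary-task load.
    """
    absolute = fa - da
    proportional = absolute / fa if fa else 0.0
    return absolute, proportional

# Example: 90% correct under full attention, 72% under divided attention
abs_cost, prop_cost = dual_task_cost(90.0, 72.0)
# abs_cost -> 18.0 points; prop_cost -> 0.2 (a 20% relative decline)
```

In the present design, young adults with large costs stand in for the cognitively declining older adults that the eTrails variants are ultimately intended to identify.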
Summary of Hypotheses
Participants completed the A and B versions of the original TMT, along with four
computerized variants (eTrails-Standard, eTrails-FlashNext, eTrails-FlashRandom, and
eTrails-Scramble). Participants were also given a brief battery of memory, attention, and
cognitive speed measures. For the divided attention task, the difference between the two
conditions (full attention performance minus divided attention performance) was used as
a proxy for the kinds of cognitive decline observed in AD. The main hypotheses under
investigation were that (1) eTrails-Standard would demonstrate stronger correlations with
the various executive criterion measures relative to the pen-and-paper TMT which is
hindered by the above limitations; and (2) the eTrails-Scramble and eTrails-FlashRandom
variants would show greater predictive power than eTrails-Standard because they are intended to directly
tax those executive processes believed to drive TMT performance; any differences
between them would inform the relative importance of capture and visual search to the
TMT. Moreover, the use of the divided attention manipulation would provide evidence of
the predictive accuracy of these various measures along a spectrum of cognitive ability.
METHOD
Participants
The sample for the current study consisted of 43 young adults (22 females, 21
males). All were UNCW undergraduates who voluntarily signed up through the
Psychology Department's research system. Two subjects' results were excluded due to
large error rates on the TMT, which made their data impossible to score. The data from the
forty-one remaining subjects were statistically analyzed.
Materials
The main experimental tool for this study was "eTrails 1.0". The program was
built using Microsoft Visual Basic version 6.0. It consists of a 576 × 792-pixel
window on which sixteen 50 × 50-pixel buttons are presented. Unlike the pen-and-paper TMT,
which has only 2 different forms, eTrails utilizes 32 different forms in the current study (16
practice forms and 16 full forms). The practice forms each contain 6 squares, while the
full-length forms contain 16. Each form was created by dividing the computer screen into
4 sections, and each section again into 4 subsections (Appendix B). Dice-rolling software
was used, and the placement of each square was assigned based on two dice rolls (in the
appendix, the green boxes mark the first subsection, the blue the second, the black the
third, and the orange the fourth). For example, if a '1' and then a '4' were rolled, the
square would be placed somewhere in the 4th box of the 1st subsection. Slight manual
adjustments were made to the positioning assigned by the program only under the
following conditions: (a) squares were overlapping or too near to one another (closer than
50 pixels); (b) two sequential numbers were placed directly next to one another; or (c) the
resultant patterns of placement were too easy (all odd numbers placed on the top and all
even numbers on the bottom). This process was repeated for every configuration such
that no two layouts were the same.
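The placement procedure can be sketched as follows. This is an illustrative reconstruction rather than the original VB6 generator: the mapping of the two dice rolls onto a 4 × 4 grid and the box-distance overlap test are assumptions, and conditions (b) and (c) above were handled manually in the original rather than by code.

```python
import random

WIDTH, HEIGHT, BUTTON = 576, 792, 50  # window and button sizes in pixels

def roll_cell():
    """Two 'dice rolls' choose a section (1-4) and a subsection (1-4),
    i.e., one cell of an assumed 4 x 4 grid over the window, then a
    random button position inside that cell."""
    section, subsection = random.randint(1, 4), random.randint(1, 4)
    cell_w, cell_h = WIDTH // 4, HEIGHT // 4
    x = (section - 1) * cell_w + random.randint(0, cell_w - BUTTON)
    y = (subsection - 1) * cell_h + random.randint(0, cell_h - BUTTON)
    return x, y

def too_close(p, q, min_dist=50):
    """Condition (a): reject squares closer than 50 pixels on both axes
    (a box test, which is an assumption about 'closer than')."""
    return abs(p[0] - q[0]) < min_dist and abs(p[1] - q[1]) < min_dist

def generate_form(n_targets=16):
    """Roll positions for n_targets squares, re-rolling any candidate
    that overlaps an already-placed square."""
    placed = []
    while len(placed) < n_targets:
        candidate = roll_cell()
        if not any(too_close(candidate, p) for p in placed):
            placed.append(candidate)
    return placed

form = generate_form()
# 16 positions, all inside the window and pairwise non-overlapping
```

Rejection sampling keeps the sketch simple; the thesis's remaining constraints (no adjacent sequential numbers, no trivially easy patterns) would require checks against the label assignment as well.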
When the eTrails program is initiated, there is a prompt to enter an identification
number, the participant's age, and the participant's gender. Once all the values are
entered, the participant presses a "begin" button and the test begins. Before each full
Page 20
15
round, the program includes a 6-button practice session. The program records the
response time both between buttons and for the entire trial as well as the total number of
errors. eTrails was designed for use on touch screen computers and all variants were
administered using ASUS ―EeeTop PC ET1602‖. In addition to eTrails and the original
pen-and-paper TMT described above, several cognitive tests were used as outcome
measures to assess the degree to which these Trails variants can successfully predict
various forms of higher-order cognition.
Operation Span (Ospan)
Ospan is a common measure of working memory capacity (Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2005; Turner & Engle, 1989). Working memory is a limited-capacity system involved in the storage and manipulation of information in the service of complex goals (Baddeley & Hitch, 1974; Engle, 2002). Working memory tasks have been shown to be highly predictive of performance on a variety of laboratory and real-world tasks (Conway et al., 2005). Importantly, performance on span tasks shows marked impairment even in the early stages of Alzheimer's Disease (Kensinger, Shearer, Locascio, Growdon, & Corkin, 2003; Rosen, Bergeson, Putnam, Harwell, & Sunderland, 2002). While performing Ospan, participants were asked to recall a list of letters while simultaneously solving simple math equations. On each equation-letter trial, participants were given a math problem followed by a letter (e.g., Is (6 x 2) - 5 = 7? Q). The participants were instructed to read the math equation aloud, to respond "yes" or "no" to its correctness, and finally to read the letter aloud and try to remember it for a later test. After some number of these equation-letter trials, a recall cue ("???") appeared on the screen and participants attempted to recall as many letters as possible, in the order in which they were presented, by writing their responses on an answer sheet. Set sizes ranged from two to five letters across a total of 12 sets. An individual's span score was determined by the number of letters they correctly recalled in order.
Color-Word Binary Stroop
The Stroop task is considered the "gold standard" measure of attentional control (MacLeod, 1992). In this task, participants were presented with a color-word (either the word "red" or the word "blue") presented in a colored font (e.g., the word RED presented in blue font; the word BLUE presented in blue font) and were instructed to name the color of the font ("blue" in both cases) as quickly as possible. Participants were told to respond only to the color of the font, and not to the word itself. A Labtec AM-22 microphone was used to record reaction time. The experimenter recorded the accuracy of responses by pressing the "1" key when the participant responded "red", the "2" key for "blue", and the "3" key for discarded trials. Discarded trials included partial responses ("bl-red"), stutters ("r...r...red"), and extraneous noises and movements that inadvertently triggered the microphone (e.g., coughing, exhaling).
Participants completed 155 total trials: 95 congruent (where the color and word matched), 30 incongruent (where the color and word differed), and 30 neutral (strings of ampersands presented in different colors). Stroop is a good measure given the goals of the current study because increases in interference scores in the Stroop task are regularly observed for Alzheimer's patients (Bondi, Serody, Chan, Eberson-Shumate, Delis, Hansen, & Salmon, 2002; Spieler, Balota, & Faust, 1996), and Stroop variants with a large proportion of congruent trials have been shown to be particularly taxing to executive processes like working memory (Kane & Engle, 2003).
Recognition Task with Full vs. Divided Attention
Recognition is a measure of long-term episodic memory. For this study we used a recognition procedure similar to that of Jacoby, Toth, and Yonelinas (1993). Participants were shown five-letter nouns, one at a time, on the computer screen and were asked to read them aloud and to remember them for a later test. During the divided-attention portion of the task, participants were further instructed that they would also be performing a listening task (Craik, 1982). The listening task required participants to monitor a computer-recorded audio file in which a list of numbers was read aloud at a 2-second pace. The participants were instructed to respond "now" every time they heard the target sequence: three odd digits in a row. The list of numbers conformed to two rules: (1) each target sequence had at least two even numbers between it and the next target sequence, and (2) the interval between target sequences varied in length (from two to six numbers) so as to be unpredictable. During the test phase, participants were again shown five-letter words. Some of these words had appeared on the study list; others were new words not encountered previously in the task. The participant needed to discriminate between these old and new items by pressing the "1" key if an item was on the previous list and the "2" key if it was new.
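To make the monitoring rule concrete, the following Python sketch (hypothetical; the original materials were pre-recorded audio, not code) marks the positions in a digit list at which a listener should respond "now", i.e., upon hearing the third odd digit in a row.

```python
def target_positions(digits):
    """Return the indices that complete a run of three consecutive odd
    digits (a sketch of the listening-task rule described above)."""
    hits = []
    run = 0
    for i, d in enumerate(digits):
        run = run + 1 if d % 2 == 1 else 0
        if run == 3:
            hits.append(i)
            run = 0  # targets are separated by even digits, so reset the run
    return hits
```

Because rule (1) guarantees at least two even numbers between targets, runs never overlap, and resetting the counter after each hit is safe.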
Other Tests
Two final tests were used to ensure that our young sample was relatively representative of its age group. One was the Shipley Test, a common pen-and-paper test of vocabulary in which participants are shown 40 increasingly challenging words (e.g., "talk" is an early word and "querulous" a later one) and are asked to choose from among four alternatives the word most similar in meaning to the target. The other was Letter Comparison, a computerized measure of speed of cognitive processing in which participants are asked to quickly compare two strings of letters and to indicate with a key-press whether they are the same (by pressing the "S" key) or different (the "D" key).
Procedure
When participants arrived in the laboratory, they were directed to a separate
testing room. They completed consent and demographics forms, and were given a general
introduction to the design and goals of the study. They were then given the Trails variants
in the following order: paper-and-pencil TMT, eTrails-Standard, eTrails-FlashNext,
eTrails-Scramble, and eTrails-FlashRandom. For each, they received the A version
directly followed by the B version. Following completion of all the Trails tasks (which
took approximately 30 minutes), participants completed the other cognitive measures in
the following order: Recognition Full Attention, Recognition Divided Attention, Ospan,
[5-10 minute break], and Stroop. For the purposes of establishing reliability, participants
then completed a second administration of each version of Trails. Note that different
layouts of letters and numbers were used for the second administration of each task to
reduce practice effects. Lastly, participants completed Letter Comparison and Shipley.
The entire session took approximately one and a half hours. Participants were given
specific instructions prior to each task and were given the opportunity to ask clarification
questions. Upon completion of all of the tasks, the participants were debriefed, thanked
for their participation, and escorted out of the laboratory.
RESULTS
Performance on Individual Tasks
The Trail Making Test
The reaction time (RT) data for the paper-and-pencil TMT are summarized in Table 1. A 2 x 2 repeated-measures ANOVA was conducted with Test Type (Trails A vs. B) and Time of Administration (Time 1 vs. 2) as the within-subjects factors. The effects of Test Type (F(1, 160) = 12.56, p < .005) and Time of Administration (F(1, 160) = 161, p < .001) were both significant. The interaction approached, but did not achieve, significance (F(1, 160) = 3.44, p = .065). The significant effect of Test Type shows that the "B" versions of the task took significantly longer than the "A" versions. The significant effect of Time of Administration reflects practice: RTs were consistently slower on the first administration of each task. Note also that there was considerable variability in performance on the B forms relative to the A forms, as indexed by the larger range of scores across participants (SD = 17,046ms for the first administration).
Focusing on the error rates for the TMT, only four of the 41 subjects (9.75%) made no errors. Participants made an average of .89 errors per condition (approximately 3.56 per person). A 2 x 2 ANOVA with Test Type and Time of Administration as the factors was conducted on the error data; there were no significant main effects or interactions. The most common error was not fully connecting the line to the target circle: the subject's intended target was usually clear, but the pen line did not cross the boundary of the circle. In addition to reducing accuracy scores, this error likely also affected participants' reaction times, because it makes the overall path each subject traces shorter than it should be and therefore potentially shortens their RT.
eTrails
The eTrails reaction time data for the "A" forms can be found in Table 2a (top panel) and the corresponding data for the "B" forms in Table 2b (bottom panel); A1 indicates the first administration of each task and A2 the second. A one-way ANOVA comparing the "A" forms revealed significantly different RTs across the eTrails variants (F(7, 320) = 59.15, p < .05). The Scramble variant produced the slowest average RT (19,508ms) and the FlashNext variant the fastest (11,789ms).
Like the TMT, the "B" versions of eTrails produced slower reaction times than their "A" form counterparts. Again, Scramble took the longest to complete and FlashNext was the fastest (24,668ms and 12,613ms, respectively). A one-way ANOVA comparing the "B" forms found that RTs differed significantly across the variants (F(7, 305) = 74.84, p < .05). Practice effects were also evident in this data set: as can be seen in Table 2b, the second administration took less time to complete than the first.
Turning to error rates for eTrails, six of the 41 participants (14.63%) made no errors across the eTrails variants, and participants made an average of .23 errors per condition. These low error rates (compared with the corresponding TMT values of 9.75% and .89) are particularly impressive when one considers that each subject completed only four TMT forms but sixteen eTrails forms, affording much more opportunity for error on eTrails. The improvement becomes even more evident when one directly compares the TMT and eTrails-Standard: twenty of the forty-one subjects (48.78%) never made an error on eTrails-Standard.
Divided Attention
As shown in Table 3, average corrected accuracy (hits minus false alarms) was .73 (SD = .16) in the full attention condition and .36 (SD = .18) in the divided attention condition. Divided attention performance was significantly impaired relative to full attention performance, t(40) = -14.32, p < .001. As an index of the cost of dividing attention for each individual, participants' DA score was subtracted from their FA score. The average divided attention cost was .38, with individual costs ranging from .13 to .68. These results indicate both that the odd-numbers task was successful in dividing the attention of our subjects and that it produced a good range of performance, which is critical for the correlational analyses in the current study.
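The two indices above reduce to simple subtractions; a minimal sketch follows (function names are my own). Note that the reported mean cost (.38) comes from averaging each individual's cost, not from subtracting the rounded group means (.73 - .36 = .37).

```python
def corrected_accuracy(hit_rate, false_alarm_rate):
    """Corrected recognition accuracy: hit rate minus false-alarm rate."""
    return hit_rate - false_alarm_rate

def divided_attention_cost(fa_score, da_score):
    """Cost of dividing attention: full-attention accuracy minus
    divided-attention accuracy, computed per participant."""
    return fa_score - da_score
```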
Ospan
Operation Span was analyzed in two ways (see Table 4). The first, referred to as the relative score, involved adding up the total number of letters the participant correctly recalled. The average relative score was 26.1 (SD = 5.2) and ranged from 16 to 36. The second measure, the absolute score, counts correctly recalled letters only for trials on which the participant recalled the entire set correctly. The average absolute score was 13.4 (SD = 6.19), with scores ranging from 4 to 26. A common rule of thumb for interpreting Operation Span performance is to examine the general distribution of scores: individuals with an absolute score of 9 or lower are considered "low spans", those with a score between 10 and 18 "mid spans", and those scoring 19 or over "high spans". By this criterion, the current study included 13 low spans, 18 mid spans, and 10 high spans, again demonstrating a relatively normal and variable distribution of performance on this task.
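The two Ospan scoring schemes can be sketched as follows. This is a hypothetical Python rendering; the thesis does not state whether partial credit required correct serial position, which is assumed here.

```python
def ospan_scores(trials):
    """Compute (relative, absolute) Ospan scores.

    trials: list of (presented, recalled) letter sequences.
    Relative score: total letters recalled in the correct serial position.
    Absolute score: letters counted only from perfectly recalled trials.
    """
    relative = absolute = 0
    for presented, recalled in trials:
        # Credit a letter only when it appears in its original position.
        correct = sum(p == r for p, r in zip(presented, recalled))
        relative += correct
        if correct == len(presented) and len(recalled) == len(presented):
            absolute += len(presented)
    return relative, absolute
```

For example, a perfectly recalled two-letter set and a three-letter set with one error contribute 4 to the relative score but only 2 to the absolute score.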
Stroop
The results for congruent, incongruent, and neutral trials are displayed in Table 5. The average reaction time was 564ms (SD = 84ms) for congruent trials, 704ms (SD = 130ms) for incongruent trials, and 600ms (SD = 99ms) for neutral trials. The omnibus ANOVA was significant (F(2, 80) = 100.41, p < .001), and all post-hoc comparisons showed the three trial types to differ significantly from one another. Facilitation scores were calculated by subtracting congruent RTs from neutral RTs (M = 37ms, SD = 47.21ms), and interference scores by subtracting neutral RTs from incongruent RTs (M = 103ms, SD = 73.90ms). The Stroop Effect, an index of the degree to which an individual was influenced by the to-be-ignored word, was calculated by subtracting each person's congruent score from their incongruent score. The average Stroop effect was 140ms (SD = 73ms), consistent with previous findings for young adults (Spieler, Balota, & Faust, 1996).
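These difference scores follow directly from the condition means. A minimal sketch (the function name is my own; applied to the group means above it reproduces the reported per-subject means within rounding):

```python
def stroop_indices(congruent_rt, incongruent_rt, neutral_rt):
    """Derive the three Stroop indices (ms) from mean reaction times."""
    facilitation = neutral_rt - congruent_rt       # benefit of a matching word
    interference = incongruent_rt - neutral_rt     # cost of a mismatching word
    stroop_effect = incongruent_rt - congruent_rt  # overall effect of the word
    return facilitation, interference, stroop_effect
```

With the group means reported above, `stroop_indices(564, 704, 600)` yields facilitation of 36ms, interference of 104ms, and a Stroop effect of 140ms, close to the reported per-subject means of 37ms, 103ms, and 140ms.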
Letter Comparison and Shipley
The average reaction time for the Letter Comparison task was 2,218ms (SD = 472ms). Because this is a recently computerized version of a traditionally pen-and-paper task, there is no clear standard of comparison for performance on this measure. It will, however, help to elucidate age differences in cognitive speed in follow-up studies of eTrails that include older participants. The average number of correct responses on the Shipley vocabulary measure was 28 out of 40 (SD = 3.77). These scores are comparable to, though slightly lower than, those typically observed for young adults on this task, which are often in the low- to mid-thirties (e.g., Kemper & Sumner, 2001; Spieler & Balota, 2000).
Task Reliability
Data Trimming
For all tasks, scores that were more than two standard deviations above or below an individual's mean were removed from the data set and not included in statistical analyses. This resulted in 19 individual data points (fewer than one per participant) being deleted across the entire study.
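A per-participant version of this trimming rule might look like the following (a hypothetical sketch; the thesis does not state which software performed the trimming, and a sample standard deviation is assumed):

```python
from statistics import mean, stdev

def trim_outliers(scores, k=2.0):
    """Drop scores more than k standard deviations from an individual's
    own mean (the trimming rule described above, for one participant)."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) <= k * s]
```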
Test-Retest Reliability
For the Trails tasks, test-retest reliability was calculated by correlating scores on the first and second administrations of each task. As seen in Table 6a, the Trail Making Test achieved low but significant test-retest reliability, r = .37 (n = 37). Each version of eTrails had a considerably higher reliability estimate than its paper-and-pencil counterpart: eTrails-Standard had the highest reliability, r = .62 (n = 37), followed closely by eTrails-FlashNext (r = .56; n = 38), eTrails-Scramble (r = .55; n = 36), and eTrails-FlashRandom (r = .58; n = 38). All of the individual correlations reached significance (p < .05); however, only the difference in reliability between the TMT and eTrails-Standard approached significance (z = -1.51, p = .07).
Split-Half Reliability
The criterion measures used in this study (FA/DA recognition, Ospan, and Stroop) are commonly used in research because they are recognized as highly reliable. Nevertheless, the reliability of each criterion measure was assessed in the current study using a split-half procedure (because each measure was administered only once) together with the Spearman-Brown prophecy formula; the results appear in Table 6b. Our divided attention paradigm showed strong reliability: r = .92 for the divided attention task and r = .93 for the full attention task. Ospan's reliability was somewhat lower than is traditionally reported in the literature, r = .68 for absolute scoring and r = .71 for relative scoring (e.g., Conway et al., 2005, report values generally around .80). Stroop produced surprisingly poor reliability, only r = .65. This is likely due to some combination of the following: (1) motivational changes over the course of the task (the Stroop task was the last critical criterion measure to be administered, so cognitive fatigue and subject apathy were likely at their highest); (2) the binary format of the current Stroop task; and (3), most critically, the fact that the Stroop effect is a subtraction of congruent from incongruent performance, and subtraction scores are notorious for producing lower reliability in the Stroop task compared with correlations using the response latencies from one or more conditions (Strauss, Allen, Jorgensen, & Cramer, 2005). In sum, with the exception of the divided attention task, these frequently used criterion measures achieved lower levels of reliability in the current study than expected.
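The split-half procedure correlates two halves of each task across participants and then applies the Spearman-Brown correction to estimate full-length reliability. A sketch (assuming an odd-even split of trials, which the thesis does not specify):

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def split_half_reliability(trial_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    trial_scores: one list of trial-level scores per participant.
    """
    odd = [sum(t[::2]) for t in trial_scores]
    even = [sum(t[1::2]) for t in trial_scores]
    r_half = pearson_r(odd, even)
    # Spearman-Brown: step the half-test correlation up to full test length.
    return 2 * r_half / (1 + r_half)
```

The correction matters because each half contains only half the trials; correlating raw halves systematically underestimates the reliability of the full-length measure.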
Correlations Among Trails Tasks
The Pearson product-moment correlations between the Trails tasks can be found in Table 7. Correlations were conducted in two ways: the first set, described below, examines relationships between the "B" forms only, while the second examines subtraction scores (B minus A). The most evident pattern in these correlations is that the majority of the Trails tasks, including the TMT, correlated significantly and consistently with one another. There is, however, one clear exception: both the TMT and eTrails-FlashNext lost much of their predictability on the second administration. The potential weaknesses of these two Trails variants specifically, and the potential problems with multiple administrations of an executive task more generally, are discussed in more detail in the General Discussion. As a general rule, though, this table points to fairly good construct validity for eTrails; eTrails generally correlates with the TMT and with itself, which suggests that each variant is measuring the same basic ability.
Correlations of Trails to Criterion Measures
Few correlations between the B forms of the Trails variants and the executive tasks achieved significance (Table 8). The exception to this pattern was the divided attention task, which correlated with the first administrations of the TMT (r = .363), eTrails-Standard (r = .354), and eTrails-FlashRandom (r = .354). Neither Ospan nor Stroop correlated with any of the TMT or eTrails tests. The only other notable correlation involving the criterion measures was between Ospan Absolute and the divided attention task (r = -.370).
Correlations Using Subtraction Scores
Our main hypotheses focused on the use of subtraction scores as the key measure for the Trails tasks, given their role as the principal index of executive ability in the original Trail Making Test. Unfortunately, none of the correlations involving the Trails subtraction scores achieved significance, either among the Trails tasks themselves (Table 9) or with the criterion measures (Table 10). A few of the strongest correlations, largely involving the second administration of eTrails-Standard and the divided attention task, remained, but the general pattern is clearly one of nonsignificance. Also notable, though not easily interpretable, the Stroop task, which had not shown a significant correlation in any of the other analyses, correlated significantly with the first administration of eTrails-FlashRandom.
DISCUSSION
The main goal of the current study was to standardize the procedures associated
with the pen-and-paper TMT using computerized, touch-screen technology. By simply
streamlining the administration of the TMT, it was expected that eTrails-Standard would
produce fewer errors, show increased reliability, and produce stronger correlations with
the other executive measures compared with the original TMT. Moreover, the eTrails
variants designed to strategically tax specific executive abilities (inhibition and visual
search) were expected to show even stronger correlations with other executive measures.
By achieving these goals, this study would provide both a clearer understanding of the factors that drive the predictive power of the Trail Making Test and a potentially better diagnostic tool for executive deficits.
Computerizing the Trail Making Test
The first goal of the current study, improving the administration of the TMT through computerization, appears to have been achieved to some degree. Both the TMT and eTrails-Standard produced data consistent with the prior literature (Arbuthnott & Frank, 2000): in each case, subjects took longer to complete the "B" form of the task than the "A" form, and there was more variability in performance on the "B" forms. Both of these outcomes are consistent with the more executive nature of Trails B and point to sufficient individual differences in the current study to conduct correlational analyses.
More importantly, a number of limitations observed for the TMT in the current study appear to have been mitigated by computer administration. The first is the higher-than-expected error rate on the TMT. Most of the errors observed were due to rushed and careless performance (not connecting the pen line to a circle). Such errors were eliminated in eTrails by its strict criteria for what counts as a correct response. These restrictions make the error term a more informative measure and the overall RTs a more valuable indicator of performance.
A second, potentially related finding is the reduced practice effect for eTrails compared with the TMT: there was a 20% reduction in average reaction time from the first to the second administration of the TMT ("B" forms) but only a 10.33% reduction for eTrails-Standard ("B" forms). The reduced practice effects may be due, in part, to the shallower learning curve of the computer version. Many of the confusing procedural aspects of the TMT are removed in eTrails (e.g., learning the mechanics involved in connecting the circles quickly, holding one's arm in a position that avoids blocking the target numbers/letters, and understanding what to do after producing an incorrect trail). Therefore, while participants may use the first administration of the TMT to get accustomed to the task, and thus show substantial improvement the second time they encounter it, performance on eTrails would not show the same degree of change because participants likely start off performing the eTrails task at a more optimal level. A second factor likely contributing to the reduced practice effects on the eTrails variants is that, while the TMT involved repeated administration of the identical form, eTrails never repeated the same layout of targets. Thus, knowledge acquired in the first round of the TMT would have facilitated performance on the second round more directly than was the case for eTrails.
A final benefit of eTrails was evident in the correlations among the various
eTrails variants and the TMT. The majority of the eTrails variants correlated with one
another which is indicative of solid construct validity in these new, computerized Trails
tasks. One of the exceptions to this pattern of significance is also noteworthy. The second
administration of the TMT failed to demonstrate consistent correlations with the other
Trails tasks. Given the importance of repeated testing in diagnosing dementia and other
cognitive disorders, it is of some concern that a widely used test changes in its
predictability from its first to second administration. Computerization of the Trails task
appears to have successfully increased its consistency.
It also bears mentioning that one of the eTrails variants, eTrails-FlashNext, also did not fare well on the second administration. eTrails is intended as a measure of executive control, and FlashNext is ostensibly the least executive of all the eTrails variants. It differs from the other eTrails tasks in that participants can arrive at the correct answer by simply responding automatically to task cues (i.e., the flashing of the next item in the series). The automaticity of FlashNext is supported by its considerably faster reaction times compared with the other eTrails variants. These automatic processes are likely at their strongest during the second administration of FlashNext, as participants become increasingly reliant on the automatic signals. As performance on FlashNext becomes more automatic, it would be expected to correlate less with other eTrails tasks measuring executive abilities.
Task Reliability
The issue of task reliability in the current study was mixed. On one hand, an
exciting outcome of this study was the noticeable improvement in reliability observed for
the eTrails variants compared with the TMT. Every version of eTrails produced higher
reliability coefficients (ranging from .55 to .62) than that produced by the TMT (.37).
Moreover, the difference in reliability between the TMT and eTrails-Standard almost
reached statistical significance. These improvements were likely due, at least in part, to
the reduction of administration errors discussed above; in other words, unlike the TMT,
eTrails performance during the two different administrations was likely due to a common
set of executive processes rather than idiosyncratic factors having to do with task
administration.
Conversely, the reliability estimates for many of our established criterion measures were low. This was unexpected given that individual performance on each of our executive measures was consistent with expectations. Ospan is a generally reliable measure (with reliabilities of approximately .80; Conway et al., 2005); this level of reliability was not replicated in the current study, where split-half reliabilities of .68 and .71 were obtained for absolute and relative scoring, respectively. There is no clear explanation for this decreased reliability, although anecdotal evidence suggests that participants found Ospan to be a very frustrating task and may have "given up" halfway through.
Like Ospan, Stroop performance was consistent with prior research (Spieler,
Balota, & Faust, 1996); mean reaction times for congruent trials were significantly faster
than neutral trials (i.e., there was significant facilitation) and incongruent trials were
significantly slower than the neutral trials (i.e., there was significant interference). This
also produced ample variability in the resultant Stroop effects in this task. Nevertheless,
the Stroop task produced the lowest reliability estimate of all of the criterion measures (r
= .65). As discussed earlier, this low reliability was likely due primarily to the use of
difference scores, or Stroop effects, as the main predictor for this task, but might have
also been due to the binary nature of the current task or to subject fatigue.
The one criterion task that showed both high levels of split-half reliability and
significant correlations with more than one of the Trails measures was the divided
attention task. This finding, while clearly not as strong as anticipated, is noteworthy given
the secondary goal of the current research—to explore the use of dual-task costs as a
proxy for cognitive decline. When the participants had their attention divided, their accuracy fell (to .36 from .73). This substantial drop suggests that our divided attention task succeeded in reducing participants' available cognitive resources, thereby making their data resemble an older adult's expected performance and giving our data set a proxy for diverse levels of cognitive functioning. The significant correlations between eTrails and the divided attention task point to the potential for eTrails to be a sensitive and predictive measure across a range of cognitive ability.
CONCLUSIONS
Of the two main goals of this research—to produce a viable computerized Trails
measure and to demonstrate its ability to predict a variety of other executive tasks—only
the first was met with any degree of success. Even so, there are a number of take-home lessons from this research that help inform how to build a good Trails measure.
A Trails task must not be so complicated as to produce large numbers of
procedural errors. Such errors may compromise reliability. Conversely, a Trails task
cannot be so easy that it can be performed automatically. Executive functioning is a
difficult construct to measure, because it is required only in novel situations that cannot
be performed using routines or habits. Tasks, especially when they are repeatedly
administered, run the risk of becoming quickly automated such that one often only has
one or two attempts to get an accurate measure of executive control. Note that the two best-performing eTrails variants in the current study, Scramble and FlashRandom, were the ones arguably least amenable to the build-up of automaticity. Moreover, given that our criterion tasks were performed so late in the experimental sequence, participants may have exhausted their executive control and been running more on automatic, rather than executive, abilities. In addition to maximizing executive functioning by administering key tasks early, future Trails studies should also use multiple forms of each test. The current results demonstrated the key role of multiple forms in mitigating practice effects and in achieving the kind of reliability necessary for predicting the differences in cognition seen in Alzheimer's Disease.
Alzheimer's Disease is a destructive and expensive burden on society. It is a disease that is emotionally painful for both the individual and their caretakers. Although this study does not provide any definitive conclusions regarding the ability of our computerized Trails measure to predict the kinds of executive declines associated with Alzheimer's Disease, it does provide a starting point for creating a better Trails task. The TMT has proven itself a successful tool; the small increases in accuracy and reliability demonstrated here, however, may provide the first steps toward increases in predictive power that could have far-reaching clinical and research benefits in the future.
REFERENCES
Arbuthnott, K., & Frank, J. (2000). Trail Making Test, Part B as a measure of executive
control: Validation using a set-switching paradigm. Journal of Clinical and
Experimental Neuropsychology, 22, 518-528.
Ashendorf, L., Jefferson, A. L., O'Connor, M. K., Chaisson, C., Green, R. C., & Stern, R.
A. (2008) Archives of Clinical Neuropsychology, 23(2), 129-137.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In K.W. Spence and J. T. Spence
(eds.) The Psychology of Learning and Motivation, vol 8. (pp. 67-89). New York:
Academic Press.
Basso, M. R., Bornstein, R. A., & Lang, J. M. (1999). Practice effects on commonly used
measures of executive function across twelve months. The Clinical
Neuropsychologist, 13, 283-292.
Blacker, D., Lee, H., Muzikansky, A., Martin, E. C., Tanzi, R., McArdle, J. J., et al. (2007). Neuropsychological measures in normal individuals that predict cognitive decline. Archives of Neurology, 64, 862-871.
Bondi, M. W., Serody, A. B., Chan, A. S., Eberson-Shumate, S. C., Delis, D. C., Hansen,
L. A., & Salmon, D. P. (2002). Cognitive and neuropathologic correlates of Stroop
Color-Word Test performance in Alzheimer's disease. Neuropsychology, 16(3), 335-
343.
Buck, K. K., Atkinson, T. M., & Ryan, J. P. (2008). Evidence of practice effects in
variants of the Trail Making Test during serial assessment. Journal of Clinical and
Experimental Neuropsychology, 30, 312-318.
Cahn, D. A., Salmon, D. P., Butters, N., Wiederholt, W.C., Corey-Bloom, J., Edelstein,
S.L., & Barrett-Connor, E. (1995). Detection of dementia of the Alzheimer type in a
population-based sample: Neuropsychological test performance. Journal of the
International Neuropsychological Society, 1, 252–260.
Caroli, A., & Frisoni, G. B. (2009). Quantitative evaluation of Alzheimer's disease.
Expert Review of Medical Devices, 6(5), 569-588.
Chen, P., Ratcliff, G., Belle, S. H., Cauley, J. A., DeKosky, S. T., & Ganguli, M. (2001).
Patterns of cognitive decline in presymptomatic Alzheimer's disease. Archives of
General Psychiatry, 58, 853-858.
Conway, A. R. A., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle,
R. W. (2005). Working memory span tasks: A methodological review and user's
guide. Psychonomic Bulletin & Review, 12, 769-786.
Craik, F. I. M. (1982). Selective changes in encoding as a function of reduced processing
capacity. In Cognitive research in psychology (pp. 152–161). Berlin: Deutscher
Verlag der Wissenschaften.
Daniels, K. A., Toth, J. P., & Jacoby, L. L. (2006). The aging of executive functions. In
F. I. M. Craik & E. Bialystok (Eds.), Lifespan cognition: Mechanisms of change. New
York, NY: Oxford University Press.
Drapeau, C. E., Bastien-Toniazzo, M., Rous, C., & Carlier, M. (2007). Nonequivalence of
computerized and paper-and-pencil versions of the Trail Making Test. Perceptual &
Motor Skills, 104(3), 785-791.
Engle, R. W. (2002). Working memory capacity as executive attention. Current
Directions in Psychological Science, 11(1), 19-23.
Jacoby, L. L., Bishara, A. J., Hessels, S., & Toth, J. P. (2005). Aging, subjective
experience, and cognitive control: Dramatic false remembering by older adults.
Journal of Experimental Psychology: General, 134, 131-148.
Jacoby, L. L., Toth, J. P., & Yonelinas, A. P. (1993). Separating conscious and
unconscious influences of memory: Measuring recollection. Journal of Experimental
Psychology: General, 122(2), 139-154.
Johnson, J. K., Lui, L., & Yaffe, K. (2007). Executive function, more than global
cognition, predicts functional decline and mortality in elderly women. The Journals
of Gerontology Series A: Biological Sciences and Medical Sciences, 62, 1134-1141.
Kensinger, E. A., Shearer, D. K., Locascio, J. J., Growdon, J. H., & Corkin, S. (2003).
Working memory in mild Alzheimer's disease and early Parkinson's disease.
Neuropsychology, 17(2), 230-239.
Kinner, E. I., & Fernandes, M. A. (2009). Illusory recollection in older adults and
younger adults under divided attention. Psychology and Aging, 24(1), 211-216.
Kubo, M., Shoshi, C., Kitawaki, T., Takemoto, R., Kinugasa, K., Yoshida, H., Honda, C.,
& Okamoto, M. (2008). Increase in prefrontal cortex blood flow during the computer
version of the Trail Making Test. Neuropsychobiology, 58, 200-210.
Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment
(4th ed.). New York: Oxford University Press.
MacLeod, C. M. (1992). The Stroop task: The "gold standard" of attention measures.
Journal of Experimental Psychology: General, 121(1), 12-14.
Mitrushina, M. N., Boone, K. B., & D'Elia, L. F. (1999). Handbook of normative data for
neuropsychological assessment. New York: Oxford University Press.
Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test
Battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology Press.
Rosen, V. M., Bergeson, J. L., Putnam, K., Harwell, A., & Sunderland, T. (2002).
Working memory and apolipoprotein E: What's the connection? Neuropsychologia,
40(13), 2226-2233.
Salthouse, T. A., Toth, J. P., Daniels, K., Parks, C., Pak, R., Wolbrette, M., & Hocking,
K. J. (2000). Effects of aging on efficiency of task switching in a variant of the Trail
Making Test. Neuropsychology, 14, 102-111.
Sanchez-Cubillo, L., Perianez, J. A., Adrover-Roig, D., Rodriguez-Sanchez, J. M., Rios-
Lago, M., Tirapu, J., & Barcelo, F. (2009). Construct validity of the Trail Making
Test: Role of task-switching, working memory, inhibition/interference control, and
visuomotor abilities. Journal of the International Neuropsychological Society, 15,
438-450.
Spieler, D. H., & Balota, D. A. (2000). Factors influencing word naming in younger and
older adults. Psychology and Aging, 16(2), 312-322.
Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy
younger and older adults and in individuals with dementia of the Alzheimer's type.
Journal of Experimental Psychology: Human Perception and Performance, 22,
461-479.
Storandt, M. (2008). Cognitive deficits in the early stages of Alzheimer's disease.
Current Directions in Psychological Science, 17, 198-202.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of
neuropsychological tests: Administration, norms, and commentary (3rd ed.). New
York: Oxford University Press.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent?
Journal of Memory and Language, 28, 127-154.
Verhaeghen, P., Steitz, D., Sliwinski, M., & Cerella, J. (2003). Aging and dual-task
performance: A meta-analysis. Psychology and Aging, 18, 443-460.
Table 1

Trail Making Test Scores

          TMTA 1st Admin.   TMTA 2nd Admin.   TMTB 1st Admin.   TMTB 2nd Admin.
Average   21518             18586             46789             37408
SD        7035              4374              17046             11658

*scores in milliseconds
Table 2a

Form "A" Statistics for eTrails

Condition                Average   SD
eTrails Standard A1      14124     3114
eTrails Standard A2      13915     3335
eTrails FlashNext A1     12412     2517
eTrails FlashNext A2     11789     2429
eTrails Scramble A1      19508     4063
eTrails Scramble A2      18495     3883
eTrails FlashRandom A1   15839     3508
eTrails FlashRandom A2   14155     3076

*scores in milliseconds

Table 2b

Form "B" Statistics for eTrails

Condition                Average   SD
eTrails Standard B1      18257     5080
eTrails Standard B2      16370     3872
eTrails FlashNext B1     12834     2948
eTrails FlashNext B2     12613     2660
eTrails Scramble B1      24669     5840
eTrails Scramble B2      23380     5304
eTrails FlashRandom B1   19450     5746
eTrails FlashRandom B2   16284     4441

*scores in milliseconds
Table 3

Divided Attention Task Results

          DA ACC (corrected)   FA ACC (corrected)   FA-DA
Average   0.37                 0.73                 0.37
SD        0.18                 0.16                 0.17
Table 4

OSPAN Results

          Ospan Relative   Ospan Absolute
Average   26.12            13.44
SD        5.21             6.19
Table 5

Stroop Results

          Congruent   Incongruent   Neutral   Facilitation   Interference   Stroop Effect
Average   564         704           601       37             103            140
SD        84.17       130.02        99.00     47.21          73.90          72.57

*scores in milliseconds
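The derived scores in Table 5 follow the conventional Stroop contrasts: facilitation is the neutral minus congruent RT, interference is incongruent minus neutral, and the overall Stroop effect is incongruent minus congruent. As a minimal illustration (the `stroop_scores` helper is hypothetical, and the thesis computed these scores per participant before averaging), applying the contrasts to the group means reproduces the table's derived values:

```python
def stroop_scores(congruent, incongruent, neutral):
    """Return (facilitation, interference, stroop_effect) in ms.

    Conventional contrast definitions; illustrative only.
    """
    facilitation = neutral - congruent       # benefit of a congruent color-word pairing
    interference = incongruent - neutral     # cost of a conflicting color-word pairing
    stroop_effect = incongruent - congruent  # overall congruency effect
    return facilitation, interference, stroop_effect

# Group mean RTs from Table 5 (ms)
print(stroop_scores(564, 704, 601))  # (37, 103, 140), matching Table 5
```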
Table 6a

Reliability for Trails Tasks

          TMT      eTrails Standard   eTrails FlashNext   eTrails Scramble   eTrails FlashRandom
Time 1    46,789   18,415             12,880              24,429             19,744
Time 2    37,408   16,644             12,815              23,380             16,301
r         0.37     0.62               0.56                0.55               0.58

*Time in milliseconds

Table 6b

Reliability for Criterion Measures

                         Divided Attention   Full Attention   Ospan Relative   Ospan Absolute   Stroop
Split-Half Reliability   0.92                0.93             0.71             0.68             0.65
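The split-half coefficients in Table 6b can be obtained in several ways; the text here does not detail the exact procedure, so the sketch below assumes a common approach rather than the study's actual code: correlate odd- and even-trial scores across participants, then apply the Spearman-Brown correction, r_full = 2r / (1 + r), to estimate full-length reliability. The function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(trial_scores):
    """Odd-even split-half reliability with Spearman-Brown correction.

    trial_scores: one list of per-trial scores for each participant.
    """
    odd = [sum(s[0::2]) for s in trial_scores]   # trials 1, 3, 5, ...
    even = [sum(s[1::2]) for s in trial_scores]  # trials 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)             # Spearman-Brown step-up
```

Perfectly consistent trial data yields a corrected coefficient of 1.0, and noisier data pulls it toward 0; the Table 6b values (0.65-0.93) fall on this scale.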
Table 7

Correlation Between Trails Tasks

                             1      2      3      4      5      6      7      8      9
1.  TMT B1                   --
2.  TMT B2                   .366   --
3.  eTrails Standard B1      .668   .315   --
4.  eTrails Standard B2      .593   .296   .620   --
5.  eTrails FlashNext B1     .547   .494   .488   .655   --
6.  eTrails FlashNext B2     .340   .016   .370   .378   .558   --
7.  eTrails Scramble B1      .425   .168   .484   .542   .486   .431   --
8.  eTrails Scramble B2      .537   .500   .385   .626   .588   .224   .547   --
9.  eTrails FlashRandom B1   .568   .505   .461   .469   .381   .100   .297   .537   --
10. eTrails FlashRandom B2   .416   .595   .436   .479   .527   .089   .375   .719   .575

*green = significant
Table 8

Correlations Between Trails Tasks and Criterion Measures

                 TMT B1   TMT B2   Std B1   Std B2   FN B1   FN B2   Scr B1   Scr B2   FR B1   FR B2
Stroop           -.165    .006     -.081    .021     .181    .176    .310     .140     -.124   .061
Ospan Relative   .111     -.012    -.183    -.146    -.089   -.390   -.105    .033     -.063   -.124
Ospan Absolute   .062     -.033    -.129    -.027    .047    .047    -.004    .067     -.056   -.224
FA-DA            .362     .303     .354     .187     .196    .196    -.091    .166     .354    .273

Note. Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green = significant
Table 9

Correlation Between Trails Tasks Using Subtraction Scores

                            1      2      3      4      5      6      7      8      9
1.  TMT 1                   --
2.  TMT 2                   .116   --
3.  eTrails Standard 1      .161   -.034  --
4.  eTrails Standard 2      .218   .083   .241   --
5.  eTrails FlashNext 1     .124   .375   .004   .421   --
6.  eTrails FlashNext 2     .317   .074   .045   -.065  .028   --
7.  eTrails Scramble 1      .317   .122   .102   .151   .221   .136   --
8.  eTrails Scramble 2      -.164  .262   .055   .059   .283   .049   .200   --
9.  eTrails FlashRandom 1   .162   .383   .046   -.700  .126   .334   -.095  -.016  --
10. eTrails FlashRandom 2   .375   .338   .247   .085   .052   .121   .300   .156   -.036

*green = significant
Table 10

Correlation Between Trails Tasks and Criterion Measures Using Subtraction Scores

                 TMT 1   TMT 2   Std 1   Std 2   FN 1    FN 2    Scr 1   Scr 2   FR 1    FR 2
Stroop           -.184   .072    .056    .133    -.084   .057    .109    .313    -.353   -.057
Ospan Relative   .089    .075    -.034   -.370   .194    .000    .087    .076    .106    .049
Ospan Absolute   .004    .056    .970    -.336   .241    .019    .036    .220    .060    -.027
FA-DA            .312    .175    .115    .410    .263    .186    -.176   -.010   .310    -.053

Note. Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green = significant
Appendix A
Figure 1
Trails B Example
Appendix B
Figure 2
Example of Button Layout Procedure
1 2 1 2
3 4 3 4
1 2 1 2
3 4 3 4