Assessing the Utility of a Virtual Reality Test of Executive Dysfunction on Traumatic Brain Injury Patients - Matthew R. J. Vandermeer

Running Head: VIRTUAL REALITY TESTING IN TBI PATIENTS

Assessing the Utility of a Virtual Reality Test of Executive Dysfunction on Traumatic

Brain Injury Patients

Matthew R. J. Vandermeer

University of Toronto Scarborough

996234445


Abstract

The present study examines the ability of an ecologically valid “Virtual Reality (VR)

Office Task” test of executive functioning (alongside a battery of traditional tests of

executive functioning) to discriminate between a group of 30 non-injured control subjects

and 5 TBI patients. Statistical comparison of the means demonstrated that the two groups

performed significantly differently on the novel VR test but failed to reach significant

differences on the majority of traditional tests of executive functioning. Analysis of the

magnitude of the difference between the two groups’ performances using Cohen’s d

effect sizes showed that the greatest magnitude difference was found among the three

scores of the VR tests. This VR Office Task may prove to have clinical utility in

assessing ongoing cognitive dysfunction in TBI patients.


Introduction

Neuropsychological tests occupy an extremely important role in examining a patient’s

cognitive deficits incurred as a result of traumatic brain injury (TBI). Clinical diagnosis is

based on a patient’s level of performance on these tests (Lezak, Howieson, & Loring,

2004). The clinician’s diagnosis consequently has a wide range of implications for the

life of the patient, the most important of which is predicting the degree to which the

patient can expect to function with regard to tasks in their everyday life (Chaytor &

Schmitter-Edgecombe, 2003). Therefore, it is of utmost importance that these tests

accurately predict the degree to which an individual with a TBI can expect to suffer from

deficits in their daily functioning.

Mild traumatic brain injury

TBI can be classified as mild, complicated mild, moderate, and severe.

Classification is based on testing at the time of injury, Glasgow Coma Scale (GCS;

Teasdale & Jennett, 1974), post-injury characteristics, duration of Loss of Consciousness

(LOC; Lezak et al., 2004) and Post Traumatic Amnesia (PTA; Bond, 1990). Mild TBI

(mTBI), specifically, is defined by a head trauma resulting in a GCS score of 13 to 15,

PTA of less than 24 hours (commonly much less), and a very brief LOC (if any) of

seconds to minutes (Alexander, 1995). Increasingly severe TBIs are defined by decreasing

scores on the GCS, and increasingly longer periods of LOC and PTA.
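
The mTBI criteria above can be written as a simple check. The following is a minimal sketch rather than a clinical instrument: the function name is hypothetical, and the 30-minute ceiling on LOC is an assumption added for illustration, since the definition above specifies only "seconds to minutes".

```python
def meets_mtbi_criteria(gcs, pta_hours, loc_minutes, max_loc_minutes=30.0):
    """Return True if an injury fits the mTBI definition summarised above.

    gcs             -- Glasgow Coma Scale score (3 to 15)
    pta_hours       -- duration of post-traumatic amnesia, in hours
    loc_minutes     -- duration of loss of consciousness, in minutes (0 if none)
    max_loc_minutes -- ASSUMED ceiling for a "very brief" LOC; not from the text
    """
    if not 3 <= gcs <= 15:
        raise ValueError("GCS scores range from 3 to 15")
    return 13 <= gcs <= 15 and pta_hours < 24 and loc_minutes <= max_loc_minutes
```

Under these assumptions, a patient with a GCS of 14, 2 hours of PTA, and 1 minute of LOC would be classified as mild, whereas a GCS of 11 would not.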

Recovery from mTBI

Expected recovery. While there is consensus regarding ongoing cognitive

dysfunction among moderate and severe TBI patients, the outcome for mTBI patients is

much less clear (Dikmen et al., 2009; Ponsford et al., 2000). Neuropsychological


sequelae post-mTBI include: reduced visuo-motor speed (Levin et al., 1987), deficits in

attention (Chan, 2005; Ziino & Ponsford, 2006), reduced expressive fluency (Henry &

Crawford, 2004), and slowed information processing (Johansson, Berglund, & Ronnback,

2009; Mathias et al., 2004). In addition, there is considerable evidence that executive

functioning is impaired following TBI (Hartikainen et al., 2010; Nolin, 2006; Ord, Greve,

Bianchini, & Aguerrevere, 2010). Most research has demonstrated that

following mild cognitive impairment in the immediate post-traumatic period (Maddocks

& Saling, 1996), the vast majority of mTBI patients recover within 1-3 months,

showing no long-term neuropsychological deficits (Alexander, 1995; Ponsford et al.,

2000; Voller et al., 1999).

Persistent difficulties in mTBI patients. Despite evidence for mTBI recovery

within 1-3 months, many patients report prolonged difficulties in aspects of everyday

living related to executive dysfunction (Conboy, Barth, & Boll, 1986). Konrad et al.

(2011) demonstrated that even 6 years following an mTBI, patients showed persistent

cognitive dysfunction along with significantly higher self-ratings of impairment in daily

life. Alves, Macciocchi, and Barth (1993) found that a small but significant number of

mTBI patients continued to self-report impairment up to 1 year post-trauma. Additionally,

there have been several studies showing persistent difficulties following an mTBI after the

typical recovery phase, as demonstrated by a failure to return to work (Ruffolo, Friedland,

Dawson, Colantonio, & Lindsay, 1999; Hanlon, Demery, Martinovich, & Kelly, 1999).

These two findings appear contradictory: most mTBI patients recover to premorbid

levels within 3 months of injury, yet some mTBI patients persist in self-reporting

difficulties in everyday functioning. If neuropsychological testing

reflects a full recovery, why is it that some individuals claim that their difficulties persist?

Neuropsychological tests and ecological validity

Over the past several decades a number of tests have been developed to assist in

the assessment of executive dysfunction. These include the Wisconsin Card Sort Test

(WCST; Heaton, Chelune, Talley, Kay, & Curtiss, 1993), the Ruff Figural Fluency Test

(RFFT; Ruff, 1996), and the Tower of London Revised (TOL-DX; Culbertson & Zillmer,

2001). A frequently argued limitation to the use of traditional neuropsychological tests

such as these is their lack of ecological validity. Ecological validity refers to how well a

test predicts how one performs in the real world, outside of the testing environment. The

concept of ecological validity in testing can best be explained along two dimensions,

veridicality and verisimilitude (Chaytor & Schmitter-Edgecombe, 2003). Veridicality is

defined as the degree to which neuropsychological tests are statistically related to one’s

performance in the real world (Franzen & Wilhelm, 1996). Verisimilitude is defined as

the degree to which a neuropsychological test resembles a task found in everyday life

(Franzen & Wilhelm, 1996). The main idea behind the design of tests that are verisimilar

is that performance on these tests will reflect individual abilities in performing everyday

tasks, without taking into account the specific cause of deficits (Chaytor &

Schmitter-Edgecombe, 2003). That is to say, it is the prediction of future, non-testing

behaviours that is of interest, not necessarily an explanation of the behaviour itself or the

cause of the behaviour.

The concept of ecological validity in testing neuropsychological function is one of

vital importance. The assumption has often been made in the past that diminished


functioning in a patient’s everyday behaviour could be inferred from poor performance

on neuropsychological tests (Chaytor, Schmitter-Edgecombe, & Burr, 2006). This is an

important assumption to examine, and perhaps reevaluate, in light of the finding that

many neuropsychological tests do not predict the level of real-life functioning (Wilson,

1993).

Several explanations have been put forth as to why traditional neuropsychological

testing has lacked ecological validity and, more specifically, has shown limited verisimilitude.

First, traditional neuropsychological testing has taken place in a heavily controlled and

artificial testing environment. The test environment is often quiet, distraction free, with

strictly determined rules, and all behaviours prompted by the clinician (Manchester,

Priestley, & Jackson, 2004). This removes many of the environmental challenges that an

individual with executive dysfunction may experience in real life. It is

very possible that the testing environment is so different from the real world that the

cognitive factors being assessed in the testing environment are independent of those used

in everyday life (Burgess, Alderman, Evans, Emslie, & Wilson, 1998; Norris & Tate,

2000).

A second factor contributing to limited ecological validity in neuropsychological

testing is that many of the tests now being used in a clinical setting were originally

developed for use in research (Manchester et al., 2004). The requirements for assessment

tools in research and clinical settings are fundamentally different (Burgess et al., 2006).

Tasks derived and used in a research setting usually aim to explain some relationship

between the human brain and behaviour; this is not necessarily what the clinician is

interested in. Rather, it is the functional outcome, or what the patient is capable of in the


real world, that is of primary clinical interest (Burgess et al., 2006). In other words, tasks

that may be useful in delineating relationships between biology and behaviour in a

research setting do not necessarily describe functionality in the real world (the primary

concern of the clinician).

Virtual reality

The use of Virtual Reality (VR) in a clinical capacity is increasingly common,

including in psychiatric treatment (Rothbaum et al., 1995), neurocognitive rehabilitation

(Trepagnier, 1999; Wilson, Foreman, & Stanton, 1997), and surgical training (Seymour et

al., 2002). The application of VR to increase ecological validity in neuropsychological

testing is also becoming increasingly prevalent (Campbell et al., 2009; Kang et al., 2008;

Christiansen et al., 1998).

The use of VR offers several advantages over traditional neuropsychological tests.

One of the most obvious of these is immersion of the patient in a realistic Virtual

Environment (VE; Schultheis, Himelstein, & Rizzo, 2002). The use of VEs provides an

increased approximation of the environment and situations that a patient is likely to

encounter in the real world, and therefore an increase in testing’s verisimilitude and

ecological validity. In the most general sense, VR allows the clinician or researcher to

directly observe an individual’s functionality in a real world situation, while placing strict

controls on the environment (Schultheis et al., 2002).

Present study

While traditional neuropsychological testing routinely shows that mTBI patients

recover by 3 months post-injury (Alexander, 1995; Ponsford et al., 2000; Voller et al.,

1999), many patients report persistent challenges with cognitive dysfunction. There is a


disconnect between what the neuropsychological tests indicate and what many mTBI

patients report. Accordingly, the purpose of the present study was to examine an

ecologically valid VR measure of executive function (VR Office Task) in order to

determine if it can be useful in discerning executive dysfunction in TBI patients after the

expected recovery period.

For the purposes of this study several hypotheses have been developed.

Hypothesis 1 states that due to the low ecological validity of traditional

neuropsychological tests of executive functioning (i.e. the Wisconsin Card Sort Test,

Tower of London, and Ruff Figural Fluency Test), no significant differences will be

found between the performance of non-injured control subjects and TBI patients.

Hypothesis 2 states that due to the high level of ecological validity afforded by the virtual

environment in the VR Office Task, there will be a significant difference between the

performance of non-injured control subjects and TBI patients. Finally, hypothesis 3 states

that the greatest magnitude of difference in performance between the two groups will be

found in the VR Office Task (as measured by effect sizes).

Methods

Participants

Control subjects were healthy individuals, with no history of psychological

disturbance or head injury. They were primarily drawn from an introductory psychology

class at University of Toronto Scarborough, along with a number of non-student

members of the community. Student participation was compensated by the addition of

1% onto their final grade, while community members were not compensated. The final

control sample consisted of 30 healthy individuals. The control sample had a mean age of


20.1 (SD = 5.67, range = 18 – 49), was 63.3% female, and had a mean education of 12.9

years (SD = 1.41, range = 11 – 16) (see Table A1).

Clinical participants were drawn from a population of individuals diagnosed with

TBI. The final clinical sample consisted of 5 patients, with a mean age of 40.0 (SD =

26.5, range = 20 – 83), was 100% male, and had a mean education of 15.6 years (SD =

4.34, range = 12 – 22). Medical files indicated a mean Glasgow Coma Scale of 11.4 (SD

= 4.9, range = 3 – 15), and an average of 409 days (SD = 77.5, range = 278 – 477)

between date of loss and date of assessment.

Any subject with a history of psychological diagnosis, or who failed to score >45

on the Test of Memory Malingering (used here as an assessment of effort [O’Bryant,

Engel, Kleiner, Vasterling, & Black, 2007]), or whose scores on the Personality

Assessment Inventory (PAI) indicated either invalid results or some psychological

disorder was excluded from the study. This resulted in 6 control subjects being removed

from the final analysis.
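
The exclusion criteria above amount to a short screening rule. The sketch below is illustrative only: the `Screening` record and its field names are hypothetical, and the two PAI flags are assumed to have been determined beforehand, since the text does not specify how they were derived.

```python
from dataclasses import dataclass

TOMM_CUTOFF = 45  # participants had to score above this value (from the text)


@dataclass
class Screening:
    psych_history: bool   # any history of psychological diagnosis
    tomm_score: int       # Test of Memory Malingering score
    pai_valid: bool       # ASSUMED flag: PAI results judged valid
    pai_disorder: bool    # ASSUMED flag: PAI profile indicates a disorder


def is_excluded(s: Screening) -> bool:
    """Return True if any of the exclusion criteria described above applies."""
    return (
        s.psych_history
        or s.tomm_score <= TOMM_CUTOFF
        or not s.pai_valid
        or s.pai_disorder
    )
```

Note that a score of exactly 45 leads to exclusion here, because the criterion is stated as a score greater than 45.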

Materials

In order to assess the contribution of personality on executive functioning tasks,

subjects were administered the Personality Assessment Inventory (PAI). The Test of

Memory Malingering (TOMM) was administered to subjects as a measure of effort level.

The Wide Range Achievement Test 4th Edition (WRAT) Word Reading subtest was

administered in order to evaluate participant reading level. Finally, executive functioning

was assessed using the Tower of London-Drexel University Second Edition (TOL), the

64-card Wisconsin Card Sort Test (WCST-64), Ruff Figural Fluency Test (RFFT), and a

novel VR Office Task (VROT).


Personality Assessment Inventory. The PAI (Morey, 1991) is a self-report,

multiple-scale measure designed to assess both psychopathology and personality. The

344-item scale consists of 22 orthogonal scales: 4 validity scales, 11 clinical scales, 5

treatment scales, and 2 interpersonal scales. Each item is rated by participants on a 4-

point Likert scale from “false, not at all true” to “very true”. The PAI was primarily used

to screen for the presence of psychopathology in participants.

Wide Range Achievement Test – 4. The WRAT (Wilkinson & Robertson,

2006) is a brief assessment of academic skills, including word reading, spelling, and basic

math. The WRAT reading subtest was used to quickly assess reading level among

subjects.

Test of Memory Malingering. The TOMM was originally designed as a test to

detect symptom validity in neuropsychological testing (Tombaugh, 1996), and as such

has become the most frequently administered symptom validity test among clinical

neuropsychologists (Slick, Tan, Strauss, & Hultsch, 2004). The TOMM trial 1 was

administered to participants according to standard procedures as outlined in Tombaugh

(1996). Participants first completed a practice trial consisting of 2 stimulus pictures they

had to remember, and then identify, using a forced choice paradigm with two options.

Following the practice trial, participants completed TOMM trial 1, which is identical to

the practice trial, except that there are 50 stimuli instead of 2.

Following the O’Bryant et al. (2008) finding that little additional information

could be obtained by administering the full TOMM to subjects who score >45 on TOMM

trial 1, a discontinue rule for subjects who scored >45 on TOMM trial 1 was followed.

Additionally, TOMM trial 1 has emerged as an efficient tool to quickly screen for


sufficient effort in participants (O’Bryant et al., 2007). This study used TOMM to screen

against insufficient effort in participants. Participants who failed to score >45 on any

given trial of the TOMM were excluded from the study.

Tower of London-Drexel University Second Edition. The TOL was designed to

assess a subject’s executive functioning, specifically planning behaviour (Shallice, 1982).

TOL consists of 12 trials (two practice trials and ten test trials). Participants are required

to rearrange 3 coloured rings from their initial position on 2 of 3 upright pegs of varying

length, to a new predetermined position. As the participant rearranges the rings, they

must follow two rules: the participant cannot attempt to place more rings on a given peg

than it can support due to its length (Rule I), and the participant may only move one ring

at a time (Rule II). Participants are marked as correct for a given trial if they correctly

rearrange the rings within the minimum number of moves required (ranging from 2 to 7),

within the allotted time limit (2 minutes per trial). Scores were calculated based on total

move score (sum of total moves minus the minimum number of moves required), total

time (sum of times required for each trial), total initiation time (sum of time taken to

begin each trial), and the total correct score (number of trials completed within the

minimum move count).
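
The four TOL scores described above are simple sums over the test trials. The following minimal sketch shows one way to compute them; the `TolTrial` record and its field names are hypothetical, and the sketch assumes that every trial was eventually solved and that the practice trials have already been removed.

```python
from dataclasses import dataclass

TIME_LIMIT_S = 120.0  # 2 minutes per trial, as stated above


@dataclass
class TolTrial:
    moves: int           # moves the participant actually made
    min_moves: int       # minimum number of moves required (2 to 7)
    initiation_s: float  # seconds taken to begin the trial
    total_s: float       # seconds required to finish the trial


def score_tol(trials):
    """Summarise the four TOL scores for a list of test trials."""
    return {
        "total_move_score": sum(t.moves - t.min_moves for t in trials),
        "total_time_s": sum(t.total_s for t in trials),
        "total_initiation_s": sum(t.initiation_s for t in trials),
        # A trial counts as correct only if it was solved in the minimum
        # number of moves and within the time limit.
        "total_correct": sum(
            1 for t in trials
            if t.moves == t.min_moves and t.total_s <= TIME_LIMIT_S
        ),
    }
```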

Wisconsin Card Sort Test-64. The Wisconsin Card Sort Test (WCST) is

designed to test the subject’s executive functioning, specifically formation of abstract

thoughts, and cognitive flexibility (Heaton et al., 1993). Rabin, Barr, & Burton (2005)

reported that the WCST is the most commonly used tool to assess executive functioning;

it is therefore a necessary inclusion in any battery assessing the ecological validity of a

novel test of executive functioning. The WCST-64 is an abbreviated form of the full 128


card Wisconsin Card Sort Test (WCST), following the same administration instructions

as laid out in the full WCST manual (Heaton et al., 1993). Vayalakkara, Devaraju-

Backhaus, Bradley, Simco, and Golden (2000) demonstrated the high validity of the

WCST-64 in predicting scores on the full WCST, while simultaneously reducing

administration time. For these reasons, the WCST-64 was used.

Subjects were presented with four stimulus cards, varying on three patterned

dimensions: colour of items, form (shape) of items, and number of items. Subjects were

then instructed to sort response cards underneath each of the four stimulus cards, according

to their own judgment of the correct sorting pattern. Simple feedback as to whether they

were correctly sorting response cards was given to participants, according to a

predetermined set of rules that the subject was not aware of. After 10 consecutive correct responses, the

sorting rule was changed (sorting by colour, form, or number) without the subject’s

awareness. The test was complete when the subject had finished six sets of 10 correct

trials or the entire 64-card deck had been used. Scores were calculated based on number of trials

to reach first category, number of errors made, number of sets completed, and failure to

maintain set errors (incorrect card placement after 5 consecutive correct placements)

(Heaton et al., 1993).
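
The WCST-64 stopping rule and scores described above can be sketched as a single pass over the sequence of correct and incorrect placements. This is a simplified illustration, not the scoring procedure from the test manual: the function name and output keys are hypothetical, and it assumes that a set is completed by 10 consecutive correct placements.

```python
def score_wcst64(responses, deck_size=64, run_to_complete=10,
                 max_sets=6, fms_run=5):
    """Score a list of True/False (correct/incorrect) card placements."""
    run = 0                      # current streak of correct placements
    sets_completed = 0
    errors = 0
    fms_errors = 0               # failure-to-maintain-set errors
    trials_to_first_set = None
    cards_used = 0
    for correct in responses[:deck_size]:
        cards_used += 1
        if correct:
            run += 1
            if run == run_to_complete:
                sets_completed += 1
                run = 0          # the sorting rule changes here
                if trials_to_first_set is None:
                    trials_to_first_set = cards_used
                if sets_completed == max_sets:
                    break        # test ends after six completed sets
        else:
            errors += 1
            if run >= fms_run:   # error after 5+ consecutive correct
                fms_errors += 1
            run = 0
    return {
        "cards_used": cards_used,
        "sets_completed": sets_completed,
        "errors": errors,
        "fms_errors": fms_errors,
        "trials_to_first_set": trials_to_first_set,
    }
```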

Ruff Figural Fluency Test. The RFFT (Ruff, Light, & Evans, 1987) is a figural

test of a person’s ability to shift cognitive set, use planning strategies, and their overall

executive functioning in coordinating these activities (Psychological Assessment

Resources [PAR], 2012). The RFFT consists of 5 different trials, each consisting of a

piece of paper with 40 squares, each containing an identical arrangement of 5 dots. The

first three trials have the dots arranged identically, with trials 2 and 3 containing


interference patterns (lines or other shapes). Trials 4 and 5 each contain different

arrangements of 5 dots with no interference patterns. Subjects were instructed to create as

many different “patterns” or “figures” as they could by connecting 2 or more of the dots

in each of the squares. They were then allowed to complete a practice trial before each of

the test trials, consisting of 3 squares identical to the 40 squares in the testing trial.

Subjects were corrected following the sample trial if any errors were made (i.e. two or

more identical patterns). They then completed as many different patterns or figures as

possible in one minute for each of the trials. Scores were calculated based on the number

of different patterns provided, and the number of perseverations (or repeated patterns).

Virtual Reality Office Task. The VROT is a brief VR task based on the WCST.

Participants were instructed to imagine that they were working for a courier company and

it was their job to deliver packages to the correct rooms inside an office building.

Participants were given a game pad in order to navigate the virtual office building. They

were instructed to approach a shipping cart filled with packages to pick up a new

package, and then approach the door they believed the package should be delivered to,

pressing the appropriate button on the game pad to deliver the virtual package. For every

package delivered the computer prompted “CORRECT DOOR” or “INCORRECT

DOOR”. Regardless of the result, subjects were then to return to the shipping cart to

pick up a new package and try to determine how to correctly deliver each of the rest of

the packages.

There are 4 doors that packages can be delivered to, all labeled with different

signs identifying the business inside. From left to right the rooms are: 401-410 (The

Doctor’s Office), 411-420 (Riverdale Florists), 421-430 (Shutterbugs Photography), and


431-440 (A Touch of Class Catering). Packages are labeled with either: the appropriate

room number, printed paraphernalia associated with one of the 4 rooms, or the exact

sign/logo shown on the doors. There is no time limit for this task; participants must

continue until the simulation is complete (45-50 packages per simulation).

Scores were calculated based on the total number of packages delivered, the

number of correct deliveries, number of incorrect deliveries, perseveration errors (subject

making more than 2 consecutive incorrect deliveries), and failure to maintain set (subject

making an incorrect delivery following 3 consecutive correct deliveries).
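
The VROT scoring rules above can be sketched as follows. This is an illustrative reading of the definitions rather than the task's actual scoring code: it assumes that each run of more than 2 consecutive incorrect deliveries counts as one perseveration error, and that a failure to maintain set is an incorrect delivery immediately following at least 3 consecutive correct deliveries. The function name and output keys are hypothetical.

```python
def score_vrot(deliveries, perseveration_run=2, fms_run=3):
    """Score a list of True/False (correct/incorrect) package deliveries."""
    correct_run = 0
    incorrect_run = 0
    perseverations = 0
    fms = 0
    for correct in deliveries:
        if correct:
            correct_run += 1
            incorrect_run = 0
        else:
            if correct_run >= fms_run:
                fms += 1
            correct_run = 0
            incorrect_run += 1
            # Count each run once, when it first exceeds the threshold.
            if incorrect_run == perseveration_run + 1:
                perseverations += 1
    n_correct = sum(1 for d in deliveries if d)
    return {
        "total": len(deliveries),
        "correct": n_correct,
        "incorrect": len(deliveries) - n_correct,
        "perseverations": perseverations,
        "failure_to_maintain_set": fms,
    }
```

Other counting conventions (for example, scoring every incorrect delivery beyond the second in a run) would yield different perseveration totals, so the thresholds are left as parameters.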

It is believed that the more immersive VR environment in the VR Office Task

will provide increased verisimilitude (similarity to everyday tasks) and be more

ecologically valid than traditional neuropsychological tests. As such, it is expected that

this improved ecological validity will translate to better detection of the subtle executive

dysfunctions that many mTBI patients complain about following their expected recovery

phase.

Procedure

Demographic information was collected at the onset of testing for all participants.

Injury characteristics for TBI patients were derived from patient clinical reports. Subjects

were then administered a neuropsychological battery containing the tests discussed

above. Following completion of the traditional tests, subjects were familiarized with the

VR controller and instructed to complete the VR Office Task.

Statistical Analysis

All test scores were assessed for normality using the Shapiro-Wilk test, separately for injured

patients and non-injured controls. Those scores that were found to have normal

distributions in both injured and non-injured groups were compared using two-tailed

t-tests. Those scores that were found to have non-normal distributions were compared using

the Mann-Whitney U test. Effect sizes were computed using Cohen’s d for all test scores.

All analyses were performed using SPSS 20.0.0.
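
For readers unfamiliar with the procedure, the test-selection rule and the Mann-Whitney U statistic can be sketched in a few lines. This is not the SPSS implementation used in the study; the function names are hypothetical, the normality flags are assumed to come from a separate Shapiro-Wilk test, and p-values are not computed here.

```python
def choose_test(normal_in_patients, normal_in_controls):
    """Pick the comparison test from the two Shapiro-Wilk outcomes."""
    if normal_in_patients and normal_in_controls:
        return "t-test"
    return "Mann-Whitney U"


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    combined = [(v, 0) for v in x] + [(v, 1) for v in y]
    combined.sort(key=lambda pair: pair[0])
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        # Find the block of tied values starting at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(
            1 for k in range(i, j) if combined[k][1] == 0
        )
        i = j
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y
```

The two values always sum to the product of the sample sizes; the smaller of the two is often the one reported as U.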

Results & Discussion

Table A2 compares the TBI patients’ and normal controls’ normed and raw scores for all

tests of executive functioning, using two-tailed t-tests and Mann-Whitney U tests. The

scores indicate the majority of traditional neuropsychological tests were not able to

significantly differentiate TBI patients from normal controls. This is not surprising, given

that our patient group was assessed a mean of 409 days (SD = 77.5, range = 278 – 477) post-injury, well past

the 3 month period of typical recovery seen in mild TBI patients (Rohling et al., 2011).

At this point in recovery, mild TBI patients should be cognitively indistinguishable from

non-injured control subjects. Any further subtle executive dysfunctions would likely go

unnoticed by these traditional tests. However, many patients persist in their complaints of

diminished abilities in everyday life well past this recovery period (Konrad et al., 2011;

Alves et al., 1993; Ruffolo et al., 1999; Hanlon et al., 1999). The lack of ecological

validity in traditional neuropsychological tests likely accounts for this disparity between

the testing environment, where it appears the patient is fully recovered, and the patients’

everyday life, where they are faced with ongoing cognitive problems. These findings

agree with past research demonstrating that traditional tests of executive function poorly

distinguish normal control subjects from mild TBI patients (Cockburn, 1995; Levin,

Goldstein, Williams, & Eisenberg, as cited in Lezak et al., 2004, p. 618; Ord et al., 2010).


These findings (that performance on the majority of neuropsychological tests of

executive functioning showed no significant difference between control subjects and TBI

patients) provide partial support for hypothesis 1. Of the traditional tests of executive

functioning used, only the difference between the TOL execution time z scores of TBI patients

(M = -1.10, SD = 0.83) and control subjects (M = -0.02, SD = 1.27) reached

significance, U = 19.5, p = 0.030. The TOL is notable for its focus on assessing

prospection, the ability to plan ahead to reach a specified goal and all steps in between.

Unterrainer et al. (2004) found that increased execution time in the TOL was associated

with decreased planning ability. It may be that one of the main subtle deficits that TBI

patients face is decreased prospection, or preplanning, leading to a significantly longer

execution time during the TOL even after the expected recovery time.

In contrast to the majority of traditional executive function tests, the novel VR

Office Task implemented was able to significantly distinguish TBI patients from normal

control subjects (“incorrect deliveries”; U = 11.0, p = 0.022; “failure to maintain set”; U

= 7.50, p = 0.009; “perseverations”; U = 30.0, p = 0.002). It seems that the virtual

environment in the VR task approximates a hypothetical real life environment and

situation, thereby increasing the ecological validity of the test (verisimilitude).

Essentially, the delivery of differently labeled packages is a more likely and realistic

scenario of a patient’s real life than sorting various cards by some vague rule set (as in

the WCST). This improved ecological validity is elucidating the subtle cognitive

dysfunctions of the TBI patient group that were missed by a battery of traditional

neuropsychological tests. The finding that the virtual environment produced significant


differences between control subject and TBI patient performance on the VR Office Task

provides support for hypothesis 2.

One of the main limitations of this study is the extremely low sample size for the

TBI patient group (n=5). Looking more closely, one can see that the TBI patient sample

size is never greater than n=4 for any given test of executive dysfunction. As such, it is

pertinent to employ a statistic that is not as dependent on sample size as Student’s t and

the Mann-Whitney U tests used above – Cohen’s d effect size. Cohen’s d also provides

the added benefit of considering the magnitude of difference between patient and control

group performances, rather than simply investigating the significance of their differences

(Zakzanis, 2001).
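
As an illustration of the statistic, Cohen's d can be computed as the difference between two group means divided by a standard deviation. The sketch below uses the pooled standard deviation, which is an assumption: the text does not state which variant of the denominator was used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

The sign of d depends only on which group is passed first, so the magnitude is what matters when comparing tests.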

Table A3 presents effect sizes and percent overlap in scores (Zakzanis, 2001)

between the TBI group and the normal control group. Percent overlap refers to the

fraction of the sample that overlaps in terms of performance on any given test – the lower

the percent overlap, the greater the magnitude of difference between the two groups.

Using Cohen’s (1988) heuristic benchmarks for interpreting the magnitude of effect sizes

(i.e. effect sizes of 0.2 are taken as small magnitude, 0.5 as medium, and 0.8 and up as

large) we see that several tests can be considered to have large effect sizes. Of the

traditional neuropsychological tests employed these include: z score for TOL execution

time (d = 1.52, OL% = 28.8), and z score for TOL number of time violations (d = 0.89,

OL% = 48.8). All scores collected from the VR Office Task were found to have large

effect sizes, including: incorrect deliveries (d = -2.33, OL% = 13.9), failure to maintain

set (d = -2.54, OL% = 11.4), and perseverations (d = -2.25, OL% = 15.0). TBI patients


tended to underperform compared to normal control subjects on the VR Office Task. The

rest of the executive tests had effect sizes below Cohen’s benchmark of a “large effect”.
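
The relationship between effect size and percent overlap can be sketched as follows. The sketch assumes that OL% is the complement of Cohen's U1 non-overlap statistic for two normal distributions with equal variances; under that assumption, the formula gives values that agree to within rounding with those reported above for the VR Office Task (for example, about 13.9% for d = -2.33). The function names and the "negligible" label for effects below 0.2 are additions for illustration.

```python
from statistics import NormalDist


def percent_overlap(d):
    """Percent overlap implied by effect size d, via Cohen's U1 (assumed)."""
    phi = NormalDist().cdf(abs(d) / 2)
    u1 = (2 * phi - 1) / phi  # Cohen's U1: proportion of non-overlap
    return 100 * (1 - u1)


def magnitude(d):
    """Label |d| using Cohen's (1988) heuristic benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label below 0.2 is an assumption, not from the text
```

An effect size of zero corresponds to 100% overlap, and the overlap shrinks toward zero as the magnitude of d grows.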

As Zakzanis (2001) explained, lack of statistical significance does not necessarily

equate to lack of effect. The argument can be made that the reverse is also true –

statistical significance does not necessarily equate to the presence of an effect. These

findings are important as they demonstrate that not only are the differences between

control subject and TBI patient performances on the VR Office Task statistically

significant, but they are also clinically significant. With an overlap between the two

groups’ performance on the VR task ranging between 11.4% and 15.0%, the VR Office

Task offers a potential clinical tool for discriminating between TBI patients and control

subjects following the expected recovery time. These findings provide support for

hypothesis 3.

There are several limitations to the current study. As was already mentioned

above, the TBI sample size was extremely low (n=5), and not every patient was able to be

tested on all tests of executive dysfunction resulting in a range of sample sizes from n=3

to n=4. This extremely limited sample size throws the results of any comparisons of

means tests (i.e. t-test, Mann-Whitney U test) into question. Further studies should aim to

acquire a larger sample size. Furthermore, mainly due to our limited pool of TBI patients,

not all TBI patients studied were suffering from a mild TBI; we therefore pooled all TBI

patients under the umbrella “TBI” rather than accounting for severity. It would be

essential in the future to analyze a larger sample of TBI patients, stratifying analysis by

TBI severity. The current study also saw a TBI group who had a noticeably higher average

age, a greater proportion of males, and a greater average level of education than the


counterpart control subject group. In the future the disparity between the two groups’

demographics should be reduced. Finally, in an effort to better evaluate the ecological

validity of the VR Office Task, TBI patient performance on it should be compared to

their level of functioning in the real world (e.g., ability to return to work, cook, drive, etc.).

Conclusions

The VR Office Task is an ecologically valid tool that has the clinical potential to

assess the presence of ongoing executive dysfunction in TBI patients, otherwise

undetected by traditional neuropsychological tests. Further study is needed in order to

increase sample size, study the relationship between performance on the VR Office Task

and severity of TBI, and better assess its ecological validity in terms of patient ability to

function in the real world.


References

Alexander, M. P. (1995). Mild traumatic brain injury: Pathophysiology, natural history,

and clinical management. Neurology, 45, 1253-1260.

Alves, W., Macciocchi, S. N., & Barth, J. T. (1993). Postconcussive symptoms after

uncomplicated mild head injury. Journal of Head Trauma Rehabilitation, 8(3),

48-59.

Bond, M.R. (1990). Standardized methods of assessing and predicting outcome. In

Rosenthal, M., Bond, M.R., Griffith, E.R., & Miller, J.D. (Eds.), Rehabilitation

of the adult and child with traumatic brain injury (2nd ed.). Philadelphia: Davis.

Burgess, P. W., Alderman, N., Evans, J., Emslie, H., & Wilson, B. A. (1998). The

ecological validity of tests of executive function. Journal of the International

Neuropsychological Society, 4, 547-558.

Burgess, P. W., Alderman, N., Forbes, C., Costello, A., Coates, L. M., Dawson, D. R.,

…, & Channon, S. (2006). The case for the development and use of “ecologically

valid” measures of executive function in experimental and clinical

neuropsychology. Journal of the International Neuropsychological Society, 12,

194-209.

Campbell, Z., Zakzanis, K. K., Jovanovski, D., Joordens, S., Mraz, R., & Graham, S. J.

(2009). Utilizing virtual reality to improve the ecological validity of clinical

neuropsychology: An fMRI case study elucidating the neural basis of planning by

comparing the Tower of London with a three-dimensional navigation task.

Applied Neuropsychology, 4, 295-306.


Chan, R. C. K. (2005). Sustained attention in patients with traumatic brain injury.

Clinical Rehabilitation, 19, 188–193.

Chaytor, N., & Schmitter-Edgecombe, M. (2003). The ecological validity of

neuropsychological tests: A review of the literature on everyday cognitive skills.

Neuropsychology Review, 13(4), 181-197.

Chaytor, N., Schmitter-Edgecombe, M., & Burr, R. (2006). Improving the ecological

validity of executive functioning assessment. Archives of Clinical

Neuropsychology, 21, 217-227.

Christiansen, C., Abreu, B., Ottenbacher, K., Huffman, K., Masel, B., & Culpepper, R.

(1998). Task performance in virtual environments used for cognitive

rehabilitation after traumatic brain injury. Archives of Physical Medicine and

Rehabilitation, 79, 888-892.

Conboy, T.J., Barth, J., & Boll, T.J. (1986). Treatment and rehabilitation of mild and

moderate head trauma. Rehabilitation Psychology, 31(4), 203 – 215.

Cockburn, J. (1995). Performance on the Tower of London test after severe head injury.

Journal of the International Neuropsychological Society, 1, 537-544.

Culbertson, W.C., & Zillmer, E.A. (2001). Tower of London: Drexel University (TOL-DX):

Test manual. Toronto, Canada: Multi Health Systems.

Dikmen, S. S., Corrigan, J. D., Levin, H. S., Machamer, J., Stiers, W., & Weisskopf, M.

G. (2009). Cognitive outcome following traumatic brain injury. Journal of Head

Trauma Rehabilitation, 24(6), 430-438.


Franzen, M. D., & Wilhelm, K. L. (1996). Conceptual foundations of ecological validity

in neuropsychology. In: Sbordone, R. J., and Long, C. J. (eds.), Ecological

Validity of Neuropsychological Testing, GR Press/St. Lucie Press, Delray Beach,

FL, pp. 91–112.

Hanlon, R. E., Demery, J. A., Martinovich, Z., & Kelly, J. P. (1999). Effects of acute

injury characteristics on neuropsychological status and vocational outcome

following mild traumatic brain injury. Brain Injury, 13(11), 873-887.

Hartikainen, K. M., Waljas, M., Isoviita, T., Dastidar, P., Liimatainen, S., Solbakk, A. K.,

…, Ohman, J. (2010). Persistent symptoms in mild to moderate traumatic brain

injury associated with executive dysfunction. Journal of Clinical and

Experimental Neuropsychology, 32(7), 767–774.

Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., & Curtiss, G. (1993). The

Wisconsin Card Sorting Test Manual Revised and Expanded. Odessa, FL:

Psychological Assessment Resources, Inc.

Henry, J. D., & Crawford, J. R. (2004a). A meta-analytic review of verbal fluency

performance following focal cortical lesions. Neuropsychology, 18(2), 284–295.

Henry, J. D., & Crawford, J. R. (2004b). A meta-analytic review of verbal fluency

performance in patients with traumatic brain injury. Neuropsychology, 18(4),

621–628.

Johansson, B., Berglund, P., & Ronnback, L. (2009). Mental fatigue and impaired

information processing after mild and moderate traumatic brain injury. Brain

Injury, 23, 1027–1040.


Kang, Y. J., Ku, J., Han, K., Kim, S. I., Yu, T. W., Lee, J. H., & Park, C. I. (2008).

Development and clinical trial of virtual reality-based cognitive assessment in

people with stroke: Preliminary study. CyberPsychology & Behavior, 11(3), 329-339.

Konrad, C., Geburek, A. J., Rist, F., Blumenroth, H., Fischer, B., Husstedt, I., Arolt, V.,

…, & Lohmann, H. (2011). Long-term cognitive and emotional consequences of

mild traumatic brain injury. Psychological Medicine, 41, 1197-1211.

Levin, H. S., Goldstein, F. C., Williams, D. H., & Eisenberg, H. M. (1991). The

contribution of frontal lobe lesions to the neurobehavioral outcome of closed head

injury. In H.S. Levin et al. (Eds.), Frontal lobe function and dysfunction. New

York: Oxford University Press.

Levin, H. S., Mattis, S., Ruff, R. M., Eisenberg, H. M., Marshall, L. F., Tabaddor, K., et

al. (1987). Neurobehavioral outcome following minor head injury: A three-center

study. Journal of Neurosurgery, 66(2), 234–243.

Lezak, M. D., Howieson, D.B., & Loring, D.W. (2004). Neuropsychological assessment,

4th edn., Oxford: Oxford University Press.

Maddocks, D., & Saling, M. (1996). Neuropsychological deficits following concussion.

Brain Injury, 2, 99-103.

Manchester, D., Priestley, N., & Jackson, H. (2004). The assessment of executive

functions: Coming out of the office. Brain Injury, 18(11), 1067-1081.


Mathias, J. L., Bigler, E. D., Jones, N. R., Bowden, S. C., Barrett-Woodbridge, M., &

Brown, G. C. (2004). Neuropsychological and Informational processing

performance and its relationship to white matter changes following moderate and

severe traumatic brain injury: A preliminary study. Applied Neuropsychology, 11,

134–152.

Morey, L. C. (1991). The Personality Assessment Inventory. Odessa, FL: Psychological

Assessment Resources.

Nolin, P. (2006). Executive memory dysfunctions following mild traumatic brain injury.

Journal of Head Trauma Rehabilitation, 21(1), 68–75.

Norris, G., & Tate, R. L. (2000). The Behavioural Assessment of the Dysexecutive

Syndrome (BADS): Ecological, concurrent, and construct validity.

Neuropsychological Rehabilitation, 10(1), 33-45.

O’Bryant, S. E., Engel, L. R., Kleiner, J. S., Vasterling, J. J., & Black, F. W. (2007). Test

of Memory Malingering (TOMM) trial 1 as a screening measure for insufficient

effort. The Clinical Neuropsychologist, 21(3), 511-521.

O’Bryant, S. E., Gavett, B. E., McCaffrey, R. J., O’Jile, J. R., Huerkamp, J. K.,

Smitherman, T. A., & Humphreys, J. D. (2008). Clinical utility of trial 1 of the

Test of Memory Malingering (TOMM). Applied Neuropsychology: Adult, 15(2),

113-116.

Ord, J. S., Greve, K. W., Bianchini, K. J., & Aguerrevere, L. E. (2010). Executive

dysfunction in traumatic brain injury: The effects of injury severity and effort on

the Wisconsin Card Sorting Test. Journal of Clinical and Experimental

Neuropsychology, 32(2), 132–140.


Ponsford, J., Willmott, C., Rothwell, A., Cameron, P., Kelly, A., Nelms, R., …, Ng, K.

(2000). Factors influencing outcome following mild traumatic brain injury in

adults. Journal of the International Neuropsychological Society, 6, 568-579.

Psychological Assessment Resources, Inc. (2012). RFFT (Ruff Figural Fluency Test).

Retrieved from http://www4.parinc.com/Products/Product.aspx?ProductID=RFFT

Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical

neuropsychologists in the United States and Canada: A survey of INS, NAN, and

APA division 40 members. Archives of Clinical Neuropsychology, 20, 33-65.

Rohling, M. L., Binder, L. M., Demakis, G. J., Larrabee, G. J., Ploetz, D. M., &

Langhinrichsen-Rohling, J. (2011). A meta-analysis of neuropsychological

outcome after mild traumatic brain injury: Re-analysis and reconsiderations of

Binder et al. (1997), Frencham et al. (2005), and Pertab et al. (2009). The

Clinical Neuropsychologist, 25(4), 608-623.

Rothbaum, B., Hodges, L. F., Kooper, R., Opdyke, D., Williford, J. S., & North, M. (1995).

Effectiveness of computer-generated (virtual reality) graded exposure in the

treatment of acrophobia. American Journal of Psychiatry, 152, 626–28.

Ruff, R. M. (1996). Ruff Figural Fluency Test: Professional manual. Lutz: Psychological

Assessment Resources, Inc.

Ruff, R. M., Light, R. H., & Evans, R. W. (1987). The Ruff Figural Fluency Test: A

normative study with adults. Developmental Neuropsychology, 3(1), 37-51.

Ruffolo, C. F., Friedland, J. F., Dawson, D. R., Colantonio, A., & Lindsay, P. H. (1999). Mild

traumatic brain injury from motor vehicle accidents: Factors associated with

return to work. Archives of Physical Medicine and Rehabilitation, 80, 392-398.


Seymour, N. E., Gallagher, A. G., Roman, S. A., O’Brien, M. K., Bansal, V. K., Andersen,

D. K., & Satava, R. M. (2002). Virtual reality training improves operating room

performance: Results of a randomized, double-blinded study. Annals of Surgery,

236, 458–63.

Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the

Royal Society of London: Biological Sciences, 298, 199-209.

Schultheis, M. T., Himelstein, J., & Rizzo, A. A. (2002). Virtual reality and

neuropsychology: Upgrading the current tools. Journal of Head Trauma

Rehabilitation, 17(5), 378-394.

Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting malingering: a

survey of experts’ practices. Archives of Clinical Neuropsychology, 19, 465-473.

Teasdale, G., & Jennett, B. (1974). Assessment of coma and impaired consciousness: A

practical scale. The Lancet, ii, 81-84.

Tombaugh, T. N. (1996). Test of Memory Malingering (TOMM). North Tonawanda, NY:

Multi Health Systems.

Trepagnier, C. G. (1999). Virtual environments for the investigation and rehabilitation of

cognitive and perceptual impairments. Neurorehabilitation, 12, 63–72.

Unterrainer, J. M., Rahm, B., Kaller, C. P., Leonhart, R., Quiske, K., Hoppe-Seyler, K.,

. . . Halsband, U. (2004). Planning abilities and the Tower of London: Is this task

measuring a discrete cognitive function? Journal of Clinical and Experimental

Neuropsychology, 26(6), 846-856.


Vayalakkara, J., Devaraju-Backhaus, S., Bradley, J. D. D., Simco, E. D., & Golden, C. J.

(2000). Abbreviated form of the Wisconsin Card Sort Test. International Journal

of Neuroscience, 103, 131-137.

Voller, B., Benke, T., Benedetto, K., Schnider, P., Auff, E., & Aichner, F. (1999).

Neuropsychological, MRI and EEG findings after very mild traumatic brain

injury. Brain Injury, 13(10), 821-827.

Wilson, B. A. (1993). Ecological validity of neuropsychological assessment: Do

neuropsychological indexes predict performance in everyday activities? Applied

& Preventive Psychology, 2, 209-215.

Wilson, P. N., Foreman, N., & Stanton, D. (1997). Virtual reality, disability and

rehabilitation. Disability and Rehabilitation, 19, 213–20.

Zakzanis, K. K. (2001). Statistics to tell the truth, the whole truth, and nothing but the

truth: Formulae, illustrative numerical examples, and heuristic interpretation of

effect size analyses for neuropsychological researchers. Archives of Clinical

Neuropsychology, 16, 654-667.

Ziino, C., & Ponsford, J. (2006). Vigilance and fatigue following traumatic brain injury.

Journal of the International Neuropsychological Society, 12, 100–110.


Appendix A

Table A1. Demographic Information

Characteristic n Mean Standard Deviation Minimum Maximum

Age

Control 30 20.1 5.67 18 49

TBI 5 40.0 26.5 20 83

Education

Control 30 12.9 1.41 11 16

TBI 5 15.6 4.34 12 22

Gender

Control M/F 15/21 N/A N/A N/A N/A

TBI M/F 5/0 N/A N/A N/A N/A

Note. M, Male; F, Female


Table A2. Comparison of Mean Performance on Tests

Test                                    No TBI             TBI                p
                                        n    M(SD)         n    M(SD)
WCST
  Number of Errors Z-score*             29   0.37(0.98)    4    0.28(0.43)    N.S.
  Categories Completed ‡                29   3.69(1.27)    4    4.25(2.22)    N.S.
  Trials to First Category ‡            29   13.52(6.28)   4    14.00(4.08)   N.S.
  Failure to Maintain Set ‡             29   0.34(0.77)    4    0.50(0.58)    N.S.
RFFT
  Unique Designs Z-score ‡              30   -1.23(1.20)   4    -1.87(0.43)   N.S.
  Error Ratio Z-score ‡                 30   -0.26(0.91)   4    -0.48(1.04)   N.S.
TOL
  Total Move Z-score ‡                  30   -0.12(1.25)   4    -0.73(0.78)   N.S.
  Total Correct Z-score ‡               30   -0.02(1.27)   4    -0.58(0.73)   N.S.
  Total Initiation Time Z-score ‡       30   0.50(0.83)    4    -0.03(0.52)   N.S.
  Total Execution Time Z-score ‡        30   -0.02(0.69)   4    -1.10(0.83)   <0.05
  Total Time Z-score ‡                  30   -0.25(0.68)   4    -0.78(0.50)   N.S.
  Number of Time Violations Z-score ‡   30   0.00(0.74)    4    -0.63(0.19)   N.S.
  Type 2 Violations Z-score ‡           30   0.13(0.43)    4    0.25(0.50)    N.S.
VR Office Task
  Total Incorrect ‡                     30   0.73(0.94)    3    3.33(2.52)    <0.05
  Failure to Maintain Set ‡             30   0.50(0.68)    3    2.33(1.16)    <0.01
  Perseverations ‡                      30   0.00(0.00)    3    0.33(0.58)    <0.01

Note. N.S., not significant; *, two-tailed t-test; ‡, two-tailed Mann-Whitney U test.
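Most comparisons in Table A2 are marked as two-tailed Mann-Whitney U tests. The raw scores are not reproduced in this text, so the following is only a minimal sketch of how such a comparison could be computed for groups this small. The error counts are made up for illustration, the function names are hypothetical, and the exact permutation p-value is one possible choice; the text does not state whether exact or asymptotic p-values were used in the study.

```python
from itertools import combinations

def u_statistic(x, y):
    """Mann-Whitney U for x: number of (x, y) pairs where x is larger; ties count one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_two_sided_p(x, y):
    """Two-sided exact p-value: share of all regroupings with U at least as far from its null mean."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    centre = n1 * (n - n1) / 2.0          # expected U when the groups do not differ
    observed = abs(u_statistic(x, y) - centre)
    hits = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical error counts, NOT data from the study
control_errors = [0, 0, 1, 1, 2]
tbi_errors = [3, 5, 6]

u = u_statistic(control_errors, tbi_errors)
p = exact_two_sided_p(control_errors, tbi_errors)
print(u, p)
```

Enumerating every regrouping is only practical for small samples, but it avoids relying on a normal approximation when one group has three or four members and the scores contain ties.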


Table A3. Effect Sizes and Percent Overlap With Regards to Test Performance.

Test                                    Minimum   Maximum   d       Overlap %
WCST
  Number of Errors Z-score              -2.10     2.30      0.10    92.3
  Categories Completed                  1         6         0.41    72.0
  Trials to First Category              10        37        -0.08   93.8
  Failure to Maintain Set               0         3         -0.21   84.6
RFFT
  Unique Designs Z-score                -3.38     0.78      0.56    63.7
  Error Ratio Z-score                   -1.76     1.36      0.24    82.7
TOL
  Total Move Z-score                    -2.40     1.73      0.50    66.6
  Total Correct Z-score                 -1.87     2.13      0.45    69.6
  Total Initiation Time Z-score         -0.40     3.20      0.65    59.4
  Total Execution Time Z-score          -2.00     1.33      1.52    28.8
  Total Time Z-score                    -2.00     0.80      0.79    53.0
  Number of Time Violations Z-score     -2.13     0.53      0.89    48.8
  Type 2 Violations Z-score             0         2         -0.27   80.7
VR Office Task
  Incorrect Deliveries                  0         6         -2.33   13.9
  Failure to Maintain Set               0         3         -2.54   11.4
  Perseverations                        0         1         -2.25   15.0

Note. d, Cohen’s d effect size
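The effect sizes and overlap percentages in Table A3 can be approximately recovered from the summary statistics in Table A2. The sketch below assumes Cohen's d was computed with a pooled standard deviation and that percent overlap is 100 × (1 − U1), where U1 is Cohen's non-overlap measure under normality. Both formulas are assumptions, since the text does not spell them out, and the function names are illustrative.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d from summary statistics, using the pooled standard deviation (assumed)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def percent_overlap(d):
    """Percent overlap of two normal distributions, 100 * (1 - U1), for effect size d."""
    phi = 0.5 * (1.0 + math.erf(abs(d) / 2.0 / math.sqrt(2.0)))  # standard normal CDF at |d|/2
    u1 = (2.0 * phi - 1.0) / phi                                  # Cohen's U1 (non-overlap)
    return 100.0 * (1.0 - u1)

# Means, SDs and group sizes from Table A2 (control first, TBI second)
d_exec = cohens_d(-0.02, 0.69, 30, -1.10, 0.83, 4)   # TOL Total Execution Time Z-score
d_vr = cohens_d(0.73, 0.94, 30, 3.33, 2.52, 3)       # VR Office Task Total Incorrect

print(round(d_exec, 2), round(percent_overlap(d_exec), 1))
print(round(d_vr, 2), round(percent_overlap(d_vr), 1))
```

Under these assumptions the two examples come out close to the tabled values of 1.52 and -2.33; small differences are expected because the inputs are the rounded means and standard deviations rather than the raw scores.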
