Ghent University
Faculty of Psychology and Educational Sciences
Second year Master of Science in Psychology
Theoretical and Experimental Psychology
Second exam period
To Go or Not to Go: Differences in Cognitive Reinforcement Learning
Master thesis to obtain a degree as master of science in Psychology,
in the field of Theoretical and Experimental Psychology.
Michiel Van Boxelaere 00700889
Promoter: Prof. Dr. Tom Verguts
Supervisor: Dr. Filip Van Opstal
Department of Experimental Psychology
13-08-2013
Abstract
Background. Psychologists have long suggested that the procedural learning system and the
declarative learning system are engaged under different circumstances (Poldrack & Packard,
2003). Researchers have indicated an important involvement of dopamine and the striatum in
procedural trial and error learning (Yin & Knowlton, 2006). It remains largely unclear whether
learning from feedback occurs similarly in more declarative memory tasks, thought to rely on
the medial temporal lobe and the hippocampus (Squire, 1992).
Objective. In the current study we investigate whether individual differences in
learning from positive or negative feedback differ between tasks that rely on declarative
memory cortices and tasks that rely on cortices involved in habit formation.
Methodology. To address this research question we adopted two well-established procedural
learning tasks (Frank, Seeberger, & O'Reilly, 2004) and compared decision making performance
on these tasks with feedback learning performance on two versions of a newly developed
explicit declarative memory task.
Hypothesis. We hypothesized that participants who learn better from positive feedback in one
task will also learn better from positive feedback in another task.
Results. We observed a general bias to learn better from positive feedback in the declarative
learning tasks, but not in the procedural learning tasks. Participants who learned better from
negative feedback during procedural tasks were more likely to learn better from positive
feedback in the explicit declarative memory tasks. These results suggest a different functional
role for the declarative and procedural memory system in learning from negative or positive
feedback.
Acknowledgements
First of all my special thanks go out to my promoter, Dr. Tom Verguts, for
introducing me to the research field of reinforcement learning and providing me the
necessary input and guidance to complete this master thesis.
Secondly, I would also like to thank my supervisor, Dr. Filip Van Opstal, for
guiding me while programming the experiments.
Thirdly, special thanks go out to Jan Van Boxem and my partner Fien Van
Boxem for providing me the necessary feedback to improve various stylistic aspects of
this thesis and for general support and guidance during the process of writing.
Some thoughts go out to my stepbrother, Jochen Pichal, and my uncle, Geert
Van Boxelaere, who passed away while I was writing this thesis, teaching me to put the little
‘big’ problems in perspective.
General thanks go out to my family and friends for giving me the opportunity
to develop myself as a person, both personally and professionally, and giving me the
TABLE 1: Inter-individual variability across learning tasks
TABLE 2: Descriptive statistics of the probabilistic selection task
TABLE 3: Descriptive statistics of the transitive inference task
TABLE 4: Descriptive statistics of the transitive inference task, controlled for awareness
TABLE 5: Descriptive statistics of the one shot learning tasks (1 & 2)
TABLE 6: Rank correlations of test performance across tasks
TABLE 7: Rank correlations of bias rates across tasks
TABLE 8: Rank correlations of bias rates across tasks, controlled for awareness TI-task
1. Introduction
In order to increase the likelihood of survival and reproduction, organisms have to
flexibly adapt to a constantly changing environment. To flexibly interact with the environment
requires an adaptive learning system that dynamically distinguishes between good, bad, novel,
relevant and irrelevant stimuli in different environmental contexts (Sugrue, Corrado, &
Newsome, 2005). When confronted with an external stimulus, organisms not only have to
decide whether this stimulus is potentially harmful or beneficial for their preservation; they also
have to decide whether or not to act, based on the expected outcome of that action. The
brain processes relevant for decision making therefore not only have to encode signals related to the values
of alternative options, they also have to be able to recall past experiences and store new
experiences to guide future decision making. These processes involve adaptive trial
and error learning and flexible memory updating (Daw, Yael, & Dayan, 2005; Gläscher, et al.,
2010).
1.1. Adaptive learning and memory.
In general, learning is defined as “a relatively permanent change in behavior based on
an organism’s interactional experiences with the environment” (Robbins, 1998, p.41).
The change is only relatively permanent because memories tend to get lost or changed over time. Therefore,
adaptive learning critically involves memory processes to make predictions concerning the
positive or negative outcomes following decision making behavior based on previous
experiences.
Memory is generally referred to as “the processes that are used to acquire, retain and
later on retrieve learned information” (Baddeley, Eysenck, & Anderson, 2009, p.5). These
memory processes are traditionally categorized into different ‘memory systems’ (Squire, 1992)
according to how long information is retained (Baddeley, 2001) or whether information
concerning events or cognitive skills is recalled deliberately and consciously (explicitly) or
automatically and unconsciously (implicitly) (Graf & Schacter, 1985; Milner, et al., 1998). Despite
the extensive amount of research dedicated to explore the neural underpinnings of multiple
memory systems (reviewed in Squire, 2004), together with growing evidence from animal
(White & McDonald, 2002), fMRI (Poldrack, et al., 2001) and patient studies (Knowlton,
Mangels, & Squire, 1996) concerning the important role of certain brain regions¹ in specific
memory subtypes, it is still poorly understood how value-related information is integrated with
stored knowledge about past experiences across different memory systems. Research
questions concerning how value-related choice behaviors are tuned by past experiences are
typically addressed by reinforcement learning theories (Sutton & Barto, 1998).
¹ There is, for example, strong evidence for an important role of the striatum and connected basal ganglia (BG) structures in (implicit) procedural learning and habit formation (Yin & Knowlton, 2006).
1.2.4 Computational models of reinforcement learning: Instrumental Conditioning.
I. What about active decision making?
The learning principles of the Rescorla-Wagner model and the Temporal Difference
model as described above hold true whenever associations are made between environmental
states that are fixed in such a way that agents do not influence them by voluntary actions (i.e.,
classical conditioning). However, in order to adapt functionally to the environment, it is not only
important to predict rewarding outcomes of different environmental states, it is also essential
to decide whether or not to act based on the expected outcome of this action
(i.e., instrumental conditioning). Behavioral theories of optimal action-based decision
making have long suggested that organisms are more likely to perform specific actions when
the expected outcome is rewarding (Thorndike, 1911; Skinner, 1935). On the other hand,
if expected outcomes are punishing, organisms are less likely to perform these actions
(Thorndike, 1911; Skinner, 1935). Importantly, these early theories of decision making do
not address how we determine which particular action, from a series of previous sequential
actions, should get credit for a positive or negative outcome. This issue is formally known as
the credit-assignment problem (Sutton & Barto, 1998).
² In the case of immediate primary reinforcement, the associative weight between a CS and the US is near its maximum (and γ is thus near 1), whereas in the case of long-delayed primary reinforcement, the associative weight between CS and US is near its minimum (and γ is thus near 0).
II. The Actor-Critic framework.
Models of reinforcement learning efficiently solved the credit-assignment problem by
providing a two-process Actor-Critic learning system of instrumental conditioning (Barto,
Sutton, & Anderson, 1983; Barto, 1995). According to this framework one component, the
critic, uses a temporal difference prediction error signal to evaluate and update possible
actions and environmental states in terms of predictions of future rewards (Barto, 1995). The
other component, the actor, uses a similar prediction error signal to learn preferences for each
action in each environmental state and, based on the evaluations provided by the critic,
selects those actions that are associated with greater long-term reward (Barto, 1995). In
other words, the critic learns and stores values concerning the surrounding environmental
states (i.e., temporal-difference learning), which allows the actor to select and update
preferred actions (Sutton & Barto, 1998). In the actor, an action is strengthened (or weakened)
when immediately followed by a positive (or negative) prediction error (Barto, 1995).
Accordingly, the critic is involved in both classical and instrumental conditioning, whereas the
actor only applies to instrumental conditioning (O'Doherty, et al., 2004).
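The division of labor between critic and actor can be illustrated with a minimal sketch. It assumes a single environmental state with two actions, a softmax choice rule, and learning rates of 0.1; none of these choices come from the thesis, and the names `run_actor_critic`, `alpha_critic` and `alpha_actor` are hypothetical. With only one state there is no successor state, so the temporal difference prediction error reduces to reward minus the critic's current value.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    highest = max(prefs)
    exps = [math.exp(p - highest) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run_actor_critic(reward_probs, n_trials=500, alpha_critic=0.1,
                     alpha_actor=0.1, seed=0):
    """Single-state actor-critic (illustrative assumption, not the thesis model)."""
    rng = random.Random(seed)
    value = 0.0                         # critic: expected reward of the state
    prefs = [0.0] * len(reward_probs)   # actor: one preference per action
    for _ in range(n_trials):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        delta = reward - value          # prediction error
        value += alpha_critic * delta   # critic updates its state value
        prefs[action] += alpha_actor * delta  # actor strengthens or weakens the action
    return value, prefs

# Example: action 0 is rewarded on 80% of trials, action 1 on 20%.
value, prefs = run_actor_critic([0.8, 0.2])
```

The same prediction error drives both updates: the critic uses it to improve its value estimate, the actor uses it to adjust the preference for the action that was just taken.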
III. Model-free (habit) learning vs. Model-based (goal-directed) learning.
The actor-critic architecture of action selection in instrumental conditioning is closely
related to what psychology calls ‘habit’ (procedural) learning (Dickinson & Balleine, 2002) and what,
in computational terms, is called ‘model-free’ learning (Daw, Yael, & Dayan, 2005). In model-free (habit)
learning approaches, associations between an organism’s actions and outcomes are acquired
through trial and error and stored, by means of a prediction error signal, as a summary of their long-term value
without specifying the nature of the outcome. This learning approach has the advantage of
not being susceptible to outcome devaluation, at the cost of inflexibility (Daw, Yael, & Dayan,
2005). Model-free learning approaches closely interact with, but are distinguished from, more
Pessiglione, et al., 2006; Schultz, Dayan, & Montague, 1997); we expect that participants who
learn better from positive feedback in one task will also learn better from positive feedback in
another task. We hypothesize that this holds more strongly for tasks that rely on the same memory
cortices (implicit procedural learning tasks) than for tasks that rely on different
memory cortices (implicit vs. explicit learning tasks).
3. Method
3.1 Materials and Methods
3.1.1. Participants
Thirty healthy first-year bachelor students in psychology (5 male and 25 female)
participated in this study in two separate testing sessions (2 tasks per session). Participants
received partial credits for participation in the experiment after they completed both sessions.
Two participants (1 male and 1 female) were excluded from analysis because they did
not show up for the first and/or the second session and thus did not complete the full
experiment.
3.1.2. Stimuli and Apparatus
We made use of Dell computers (Windows XP) with 17-inch monitors. Participants
faced the monitor at an approximate distance of 50 cm. E-Prime 1.1 software was used for
programming the different tasks in the experiment and developing the stimuli (Schneider,
Eschman, & Zuccolotto, 2002). During all four tasks participants had to choose between stimuli
appearing in pairs (left and right) on the screen. During the two implicit procedural memory
tasks stimuli consisted of Japanese Hiragana characters (Frank, Seeberger, & O'Reilly, 2004),
whereas standardized pictures of well-known objects, tools and fruits were used during the
two explicit memory tasks (Brodeur, et al., 2010). Stimuli were randomized across subjects and
tasks. Responses were registered using the keyboard. Participants had to press key 1 or key 0
to select the left or right stimulus, respectively. Stimuli appeared in color (pictures) or in black
font (Hiragana) on a white background.
3.1.3. General Procedure
Each participant performed four learning tasks over two separate testing sessions (two
in the first session and two in the second session). There were at least 72 hours between
testing sessions to avoid potential learning effects across sessions. All four tasks had a two-
alternative forced choice procedure, where participants had to choose one of two stimuli on
the computer screen by pressing one of two keys on the keyboard. Some stimuli had a
negative reinforcement value, whereas others had a positive reinforcement value. There were
two implicit learning tasks (i.e., a probabilistic selection task and a transitive inference task)
and two explicit learning tasks (two versions of a one shot learning task). The order of the tasks
was randomized both within and between sessions. However, each session contained one implicit
learning task and one explicit learning task (Fig. 2).
Figure 2. Example of randomized task order for a single subject. Tasks were randomized within and between sessions across subjects. There was always one implicit task and one explicit task in each session.
3.2 Implicit procedural learning tasks
3.2.1. Probabilistic Selection Task
I. Stimuli
During the probabilistic selection task (adopted from Frank, Seeberger, & O'Reilly,
2004), pairs of visual stimuli that are not easily verbalized were used (Japanese Hiragana
characters, Fig. 3). Following a fixation cross (duration 1000 ms), Hiragana stimuli were shown
in black on a white background in 72 pt font. Responses were registered using key “1” (left on
the keyboard) to select the left stimulus or key “0” (right on the keyboard) to select the
stimulus on the right. Visual feedback appeared (duration 1500 ms) following a choice. Either
the word “Correct!” or the word “Incorrect!” appeared centrally on the screen in green or red,
respectively (Courier New, 48 pt). If no response was registered within six seconds, the words
“no response detected” appeared centrally in black (Courier New, 18 pt).
II. Procedures
The probabilistic selection task consisted of two phases, a learning phase and a test phase. The
learning phase followed a practice block of 10 trials. During the learning phase
three different pairs of stimuli (AB, CD and EF) appeared in random order on the screen. Feedback
was given after each trial in a probabilistic manner (Fig. 3A). Choosing stimulus A in the AB pair
led to positive feedback in 80% of AB trials, whereas choosing stimulus B led to negative
feedback in these trials. The CD and EF pairs were less reliable. Choosing stimulus C led to
positive feedback in 70% of CD trials and choosing stimulus E led to positive feedback in 60%
of EF trials. Over the course of the learning phase, participants learned to choose A, C and E
over B, D and F. To make sure participants learned the correct associations between stimuli
and feedback, performance criteria had to be met before advancing to the next phase.
Evaluation occurred following each learning block of 60 trials. Because of the different
probabilistic nature of each stimulus pair, different performance criteria were chosen. In the
AB pair, participants had to choose A over B in at least 65% of the trials. In the CD pair, C had
to be chosen over D in at least 60% of the trials. In the last pair, stimulus E had to be chosen in at least 50%
of the trials⁸. Participants advanced to the test phase if all these criteria were met or after six
learning blocks (360 trials). During the test phase (Fig. 3B), training pairs and new pairs were
shown in random order on the screen without feedback. New pairs contained all other possible
combinations of stimuli (AC, DF, BE, …). Participants were instructed to choose instinctively
when confronted with novel pairs. Each test pair was presented six times.
[Figure 2 content. Session 1: Probabilistic Selection Task, One Shot Learning Task (Version 2). Session 2: One Shot Learning Task (Version 1), Transitive Inference Task.]
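The feedback schedule and the per-block performance criteria of the learning phase can be sketched as follows. This is an illustration only, not the E-Prime implementation used in the study. It assumes that on each trial the better stimulus of a pair is designated correct with the stated probability and that the other stimulus receives the complementary feedback; the names `REWARD_PROB`, `CRITERIA`, `feedback` and `block_passed` are hypothetical.

```python
import random

# Probability that the better stimulus of each pair (A, C, E) is the correct one
REWARD_PROB = {"AB": 0.80, "CD": 0.70, "EF": 0.60}
# Minimum proportion of better-stimulus choices required in a 60-trial learning block
CRITERIA = {"AB": 0.65, "CD": 0.60, "EF": 0.50}

def feedback(pair, chose_better, rng):
    """Return True for "Correct!" and False for "Incorrect!" on a single trial."""
    better_is_correct = rng.random() < REWARD_PROB[pair]
    return better_is_correct if chose_better else not better_is_correct

def block_passed(choices):
    """choices: list of (pair, chose_better) tuples from one learning block."""
    for pair, threshold in CRITERIA.items():
        picks = [chose_better for p, chose_better in choices if p == pair]
        if not picks or sum(picks) / len(picks) < threshold:
            return False
    return True

# Example block: 20 trials per pair with 14, 13 and 10 better-stimulus choices.
block = ([("AB", True)] * 14 + [("AB", False)] * 6 +
         [("CD", True)] * 13 + [("CD", False)] * 7 +
         [("EF", True)] * 10 + [("EF", False)] * 10)
advance_to_test = block_passed(block)
```

In the task itself, a participant who failed the check repeated the learning block, up to a maximum of six blocks.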
3.2.2 Transitive Inference Task
I. Stimuli
During the (implicit) transitive inference task (Frank, Rudy, Levy, & O'Reilly, 2005), the
same type of visual stimuli (Japanese Hiragana characters) was used as in the probabilistic
selection task. To avoid confusion and confounding learning effects, different characters were
used across the probabilistic selection task and the transitive inference task. Both the order
and the content of the Hiragana characters were counterbalanced. Fixation, stimulus
presentation and feedback presentation were exactly the same as in the probabilistic selection
task.
⁸ Note that stimulus E is correct in 60% of EF trials, which is particularly difficult to learn. We implemented a 50% (chance level) performance criterion to ensure that participants who consistently chose the more often incorrect stimulus F over E could not proceed to the test phase.
Figure 3. (A) Example of the stimulus pairs (Hiragana) used during the probabilistic selection task. One pair was shown per trial. In actuality, stimuli were randomized across participants. (B) During test, all other combinations of pairs, together with all training pairs, appeared randomly. During test no feedback was given (not shown in this example). Performance was analyzed on all new pairs containing A (positive learning) or B (negative learning).
[Figure 3 content. Training: A (80%) vs. B (20%); C (70%) vs. D (30%); E (60%) vs. F (40%). Test: Choose A? AC, AD, AE, AF. Avoid B? BC, BD, BE, BF.]
II. Procedures
During the (implicit) transitive inference task, participants had to learn an underlying
ordinal sequence of stimuli (A>B>C>D>E) based on separate pairs of adjacent elements in the
sequence (AB, BC, …). During this task four pairs of stimuli (Fig. 4A) were presented (A+B-, B+C-,
C+D- and D+E-). The + and - characters represent positive and negative feedback, respectively.
As in the probabilistic selection task, participants had to get through a learning segment
before advancing to the testing segment. The learning segment consisted of three phases of
blocked trials. The first phase consisted of eight (random) blocks of four trials. Within each block, the
same stimulus pair was shown on all four trials. Thus, the first block could for example consist of four
A+B- trials, the second block could consist of four C+D- trials and so on. Phase two consisted of
16 (random) blocks of two trials. The third phase was the performance evaluation phase, in
which 32 trials of pairs were shown in random order on the screen, still with feedback.
Figure 4. (A) Example of the adjacent stimulus pairs (Hiragana) used during the transitive inference task. Stimuli were randomized across participants and differed from the probabilistic selection task. (B) Example of the associative strength hypothesis (Rudy, Frank, & O'Reilly, 2003). During training participants implicitly learn to strongly associate A with positive reinforcement. In contrast, E becomes associated with a lack of positive reinforcement, previously shown to induce dopamine dips (Schultz, 2002). These net associative values then “bleed over” to the other adjacent pairs, such that B in the BC pair has a stronger positive association, whereas D in the CD pair has a stronger negative association (Rudy, Frank, & O'Reilly, 2003). (C) During test, all training pairs and two novel ‘transitive pairs’ were randomly presented eight times each. Performance was analyzed on top pairs AB & BC (positive learning) and bottom pairs CD & DE (negative learning).
[Figure 4 content. Training: A+ B-, B+ C-, C+ D-, D+ E-. Associative strength hypothesis: positive associative value plotted for A, B, C, D, E. Test: top pairs AB, BC; bottom pairs CD, DE; novel pairs AE, BD.]
To make sure that participants learned the correct associations between stimuli and
feedback, we set a performance criterion of 75% accuracy that had to be reached before participants
advanced to the test segment. In this segment all training pairs and two new transitive pairs
(AE and BD) were shown eight times each in random order, without feedback (Fig. 4C). Following the
transitive inference task, participants were given a questionnaire to assess their explicit
awareness of the logical hierarchy of the stimuli, and to determine whether strategies were
used to respond to the novel test pairs.
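The three training phases and the 75% criterion described above can be sketched as a trial schedule. The thesis only states that blocks were random, so this sketch makes two assumptions: every pair receives the same number of blocks within a phase, and the criterion is evaluated on the 32 trials of the third phase. The helper names `training_schedule` and `passed_criterion` are hypothetical.

```python
import random

PAIRS = ["AB", "BC", "CD", "DE"]  # adjacent training pairs: A+B-, B+C-, C+D-, D+E-

def training_schedule(rng):
    """Return three phases, each a flat list of 32 pair labels."""
    def blocked(n_blocks, block_len):
        # Assumption: blocks are divided equally over the four pairs.
        blocks = PAIRS * (n_blocks // len(PAIRS))
        rng.shuffle(blocks)
        return [pair for pair in blocks for _ in range(block_len)]

    phase1 = blocked(8, 4)    # eight blocks of four trials
    phase2 = blocked(16, 2)   # sixteen blocks of two trials
    phase3 = PAIRS * 8        # 32 trials in fully random order
    rng.shuffle(phase3)
    return [phase1, phase2, phase3]

def passed_criterion(correct_flags, threshold=0.75):
    """correct_flags: list of booleans, one per evaluation trial."""
    return sum(correct_flags) / len(correct_flags) >= threshold

phases = training_schedule(random.Random(1))
```

Blocking gradually decreases from phase to phase, so participants first see each pair in isolation before the pairs are interleaved.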
3.3 Explicit episodic memory tasks
3.3.1 One Shot Learning Task (version 1 & 2)
I. Stimuli
In the one-shot learning tasks, a different set of stimuli was used. Instead of using
unknown Japanese Hiragana characters that are relatively difficult to verbalize, highly
recognizable standardized pictures of well-known objects, tools and fruits were used (Brodeur,
et al., 2010). Following fixation (duration 1000 ms), a cue⁹ (A) appeared (160x160 pixels) centrally
on the screen above the fixation cross. After 2000 ms a pair of target stimuli (BC) appeared
(160x160 pixels) left and right underneath the cue (A) on the screen. Responses were
registered using key “1” to select the left stimulus or key “0” to select the stimulus on the
right¹⁰. Because participants had only a single trial to learn the correct stimulus-stimulus
association, there was no time limit for making a choice. Visual feedback was provided
(duration 1500 ms) after a choice was made. Either the word “Correct!” or “Incorrect!” was
printed centrally on the screen in green or red, respectively (Courier New, 48 pt). There were
144 unique pictures of objects, tools or fruits used per task. Both stimulus content and order
were counterbalanced.
II. Procedures
During the one shot learning tasks there was a learning segment (Fig. 5A), which
consisted of two learning blocks of 24 trials, followed by a memory retrieval test phase that
also consisted of two blocks of 24 trials. During the learning segment, participants had only
one shot at learning to match the cue (A) with one of two target stimuli (B or C). Following
their choice, positive or negative feedback appeared randomly on the screen. After 24 learning
trials there was a break, before the next learning block of 24 trials started. We presented the
same two blocks of 24 trials to the participants in the memory retrieval phase, without
feedback (Fig. 5B). The order of trials, as well as the location of the target stimuli, was
randomized within each block. Both one shot learning tasks were exactly the same, though
different sets of stimuli were used across both tasks. The two tasks were never in the same
session (Fig. 2).
⁹ Cues were implemented in the one shot learning tasks to promote and facilitate (explicit) declarative associative learning and retrieval (Buckner, et al., 1995; Squire, Knowlton, & Musen, 1993).
¹⁰ Because we wanted to make sure that participants learned about the chosen stimuli (i.e., to approach or avoid them), no opportunity was given to look back at the correct stimuli when participants responded incorrectly.
Figure 5. (A) Example of a learning trial during training. Each trial was only presented once. When a choice was made, stimuli disappeared and feedback was given. (B) Example of a correctly solved test trial. The procedure was the same as during training with the exception that feedback was omitted.
[Figure 5 content. (A) Training panel labels: 1000 ms, 2000 ms, 1500 ms, “Incorrect!”, 2 blocks (24 trials). (B) Test panel labels: 1000 ms, 2000 ms, “?”, 2 blocks (24 trials).]
3.4 Data Analysis
3.4.1 Probabilistic Selection Task
I. Data filtering
Since we were mainly interested in the extent to which subjects learned from positive and
negative feedback following their choices, we first had to make sure participants learned the
basic task. Although the performance criteria were implemented to resolve this issue, some
participants performed worse on the training pairs during the testing segment in comparison
with the learning segment. To overcome this issue, we excluded participants who did not
perform better than chance during the test phase in the easiest training pair condition (AB
pair). We reasoned that if these participants could not reliably choose A over B in this pair,
results in novel pairs would be meaningless. By filtering the data in this manner, three participants
were excluded because they did not perform better than chance level (50%) in the easiest
training pair.
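The exclusion rule can be written down as a short filter. This sketch assumes that "better than chance" means a proportion of A choices strictly above 0.5 on the AB test trials, with no statistical test involved; the thesis does not specify this. The data layout and the function names `ab_test_accuracy` and `filter_participants` are hypothetical.

```python
def ab_test_accuracy(trials):
    """trials: list of (pair, chosen) tuples from one participant's test phase."""
    ab_choices = [chosen == "A" for pair, chosen in trials if pair == "AB"]
    return sum(ab_choices) / len(ab_choices)

def filter_participants(data, chance=0.5):
    """Keep participants whose AB test accuracy is strictly above chance."""
    return {pid: trials for pid, trials in data.items()
            if ab_test_accuracy(trials) > chance}

# Made-up example data: six AB test trials per participant.
data = {
    "p01": [("AB", "A")] * 5 + [("AB", "B")],      # 5/6 correct, kept
    "p02": [("AB", "A")] * 3 + [("AB", "B")] * 3,  # 3/6 correct, excluded
    "p03": [("AB", "A")] * 2 + [("AB", "B")] * 4,  # 2/6 correct, excluded
}
kept = filter_participants(data)
```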
II. Test Pair Analysis
We first wanted to test whether there were systematic differences across subjects in
learning from positive reinforcement (choose A) versus learning from negative reinforcement
(avoid B) in this task. To test for these differences, we
performed a paired-samples Student's t-test. The degree to which participants learned from
positive reinforcement (choosing A) was operationalized as the performance level on the novel
pairs involving A (AC, AD, AE and AF) (Fig. 3B). Likewise, the degree to which participants
learned from negative reinforcement (avoiding B) was operationalized as the performance
level on the novel pairs involving B (BC, BD, BE and BF) (Fig. 3B). We measured effect sizes using
Cohen’s d, where d ≥ 0.2 represents a small effect, d ≥ 0.5 represents a medium effect and d ≥
0.8 represents a large effect.
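A minimal sketch of this analysis follows: it scores choose-A and avoid-B performance per participant and computes the paired t statistic with an effect size. The thesis does not state which variant of Cohen's d was used; the sketch assumes the mean difference divided by the standard deviation of the differences. The function names and the example values are hypothetical, and the p-value is left out because it would require a t distribution that the Python standard library does not provide.

```python
import math
import statistics

CHOOSE_A = {"AC", "AD", "AE", "AF"}  # novel pairs containing A
AVOID_B = {"BC", "BD", "BE", "BF"}   # novel pairs containing B

def bias_scores(trials):
    """trials: list of (pair, chosen) tuples from one participant's test phase."""
    choose_a = [chosen == "A" for pair, chosen in trials if pair in CHOOSE_A]
    avoid_b = [chosen != "B" for pair, chosen in trials if pair in AVOID_B]
    return sum(choose_a) / len(choose_a), sum(avoid_b) / len(avoid_b)

def paired_t(x, y):
    """Return (t, degrees of freedom, Cohen's d) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff                    # assumed variant of Cohen's d
    return t, n - 1, d

# Made-up accuracies for four participants
choose_a_scores = [0.9, 0.8, 0.7, 0.6]
avoid_b_scores = [0.8, 0.6, 0.7, 0.5]
t, df, d = paired_t(choose_a_scores, avoid_b_scores)
```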
III. Training Analysis
In the learning phase of the probabilistic selection task, different performance criteria
were implemented to make sure participants learned the correct stimulus-reinforcement
associations. If participants failed to reach these performance criteria, they had to repeat the
learning block until they reached the criteria (or completed six learning blocks). Thus, some
participants performed more learning trials than others. We therefore checked, using general
linear model regression analysis with continuous measures, whether general test performance,
performance on the easiest training pair (AB) and performance on either choosing A or
avoiding B could be explained by the number of learning trials. For regression analysis we
measured effect sizes using eta squared (η²), where η² = 0.01 represents a small effect, η² =
0.06 represents a medium effect and η² = 0.14 or larger represents a large effect.
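For a regression with a single continuous predictor, eta squared is the regression sum of squares divided by the total sum of squares, which equals R². The sketch below shows this computation by ordinary least squares. It is an illustration under that single-predictor assumption, not the statistical software used in the thesis, and the function name and example values are hypothetical.

```python
def regression_eta_squared(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, eta squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_effect = slope * sxy          # sum of squares explained by the predictor
    return slope, intercept, ss_effect / ss_total

# Made-up example: number of learning trials vs. test accuracy
n_learning_trials = [60, 120, 180, 360]
test_accuracy = [0.85, 0.80, 0.70, 0.65]
slope, intercept, eta_sq = regression_eta_squared(n_learning_trials, test_accuracy)
```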
IV. Session Analysis
Some participants performed the probabilistic selection task in the first session and the
transitive inference task in the second session, whereas other participants did it the other way
around. To minimize learning effects between these two visually similar tasks, we made sure
there were at least 72 hours between sessions. Nevertheless, it is possible that there were some
non-specific transfer effects across sessions that are unrelated to the particular nature of each task.
We therefore checked, using general linear model regression analysis with the categorical
between-subjects variable session as predictor, whether general test performance,
performance on the easiest training pair (AB) and performance on either choosing A or
avoiding B could be explained by the session in which participants performed the task.
3.4.2 Transitive Inference Task
I. Data filtering
As in the probabilistic selection task (PS¹¹), we excluded participants who did not
perform better than chance on the easiest training pairs (AB & DE) during the testing segment.
As a result, we filtered out two participants who did not perform better than 50% on average
across anchor pairs AB & DE. The analyses described below apply to the remaining
participants (n = 26).
II. Test Pair Analysis
As in the PS-task, we investigated whether there were systematic differences
across subjects in learning from positive feedback versus learning from negative feedback.
Similar to previous studies using an implicit version of the transitive inference task and in
accordance with the associative strength hypothesis¹², we reasoned that stimuli at the top
of the hierarchy (A, B) have net positive associations, whereas stimuli at the bottom of the
hierarchy (D, E) have net negative associations (Frank, O'Reilly, & Curran, 2006; Frank,
Seeberger, & O'Reilly, 2004; Rudy, Frank, & O'Reilly, 2003). As a result, learning from positive
reinforcement improves performance on the AB and BC pairs, while learning from negative
reinforcement improves performance on the CD and DE pairs. Therefore, we
operationalized the degree to which participants learned from positive feedback as the
performance level on AB & BC pairs during test. Similarly, the degree to which participants
learned from negative feedback was operationalized as the performance level on CD & DE
pairs during test. We also checked whether subjects performed better on the ‘easier’ anchor
pairs (AB & DE) in comparison with inner pairs (BC & CD). Moreover, pairwise analyses between
separate training pairs during test were conducted to get a more detailed insight into the
pattern of previous results. To test differences between conditions, paired-samples Student's
t-tests were conducted. Novel test pairs AE and BD were analyzed separately since these pairs
could be solved by either learning to choose A and B or by avoiding D and E.
¹¹ For clarity we will use the following abbreviations to refer to the different tasks: PS-task for the probabilistic selection task, TI-task for the transitive inference task and OSL1/OSL2-task for the two versions of the one shot learning task.
¹² According to the associative strength hypothesis, the top and bottom pairs, respectively AB and DE, “anchor” the development of associative values: during training agents implicitly learn to strongly associate A with positive reinforcement (since choosing A always induces positive feedback), while E becomes associated with strong negative reinforcement (since E always induces negative feedback). These net associative values then “bleed over” to the other adjacent pairs, such that B in the BC pair has a stronger positive association, whereas D in the CD pair has a stronger negative association, though B and D are positively (negatively) reinforced during half of the trials (see Rudy, Frank, & O'Reilly, 2003 for a detailed description of how this occurs).
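The grouping of test pairs used in this analysis can be sketched as a scoring function. The pair groups follow the text (top, bottom, anchor, inner and novel pairs); the data layout and the names `HIERARCHY`, `GROUPS`, `is_correct` and `group_accuracy` are hypothetical.

```python
HIERARCHY = "ABCDE"  # A > B > C > D > E

GROUPS = {
    "top": {"AB", "BC"},      # positive learning
    "bottom": {"CD", "DE"},   # negative learning
    "anchor": {"AB", "DE"},
    "inner": {"BC", "CD"},
    "novel": {"AE", "BD"},
}

def is_correct(pair, chosen):
    """The correct choice is the pair member that ranks higher in the hierarchy."""
    return chosen == min(pair, key=HIERARCHY.index)

def group_accuracy(trials):
    """trials: list of (pair, chosen) tuples from one participant's test phase."""
    result = {}
    for name, pairs in GROUPS.items():
        hits = [is_correct(pair, chosen) for pair, chosen in trials if pair in pairs]
        result[name] = sum(hits) / len(hits) if hits else None
    return result

# Made-up example: one trial per test pair
trials = [("AB", "A"), ("BC", "B"), ("CD", "D"),
          ("DE", "E"), ("AE", "A"), ("BD", "D")]
accuracy = group_accuracy(trials)
```

Each participant's top and bottom accuracies can then enter a paired comparison, in the same way as the choose-A and avoid-B scores of the PS-task.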
III. Training Analysis
As in the PS task, a performance criterion was implemented in the task. As a
consequence, some participants performed more learning trials than others. We
checked, using general linear model regression analysis with continuous measures, whether
general test performance and performance on top (AB & BC) or bottom (CD & DE) pairs could be
explained by the number of learning trials.
IV. Session Analysis
As for the PS task, we checked possible confounding effects of session using
general linear model regression analysis with categorical between-subjects variables. General
test performance and performance on top and bottom pairs were tested using the factor
session as predictor.
V. Awareness Questionnaire
After completing the transitive inference task, participants were asked to fill in a
questionnaire (translated from Frank, Seeberger, & O'Reilly, 2004) asking about their familiarity
with Japanese Hiragana characters and, importantly, the degree to which they explicitly
became aware of the underlying hierarchy in the task. None of the participants indicated
having any experience with the Hiragana characters. Surprisingly, when asked whether
they “had the impression that there was some kind of logical rule, order or hierarchy
between the symbols” (Frank, Seeberger, & O'Reilly, 2004), 11 out of 28 participants indicated
becoming aware of the underlying order or hierarchy between the symbols. The remaining 17
participants did not notice the underlying hierarchy in the task. Because we are generally
interested in differences between learning from reinforcement across more implicit and more
explicit learning tasks, it would be interesting to see whether there are any differences in
learning performance between implicit and more explicit learning within one task. We
therefore reanalyzed the data checking whether the degree of awareness (implicit or explicit
learning) predicts differences in performance level in the transitive inference task. We tested,
using one-way between subjects ANOVAs, whether implicit learners differed significantly from
explicit learners on general test performance, performance on top pairs, bottom pairs and
novel pairs. Furthermore we carried out 2x2 ANOVAs for the between subjects factor group
(implicit, explicit) and the within subjects factor hierarchy (Top, Bottom) to check for
interaction effects between explicit/ implicit learners and positive (top) or negative (bottom)
learning. We also performed pairwise Student’s t-tests to check whether the previously
observed differences between anchor pairs and inner pairs remained significant for the implicit
and the explicit subgroup.
3.4.3 One Shot Learning Task (version 1 & 2)
I. Data filtering
Participants who did not perform better than chance (50%) during the
testing phase were to be excluded. No participant performed worse than chance level
in either the first or the second OSL task. Participants were instructed to associate
a cue-object and one of the target-objects based on feedback and to do this “as accurately as
possible”. As a consequence there were no specific time constraints during the learning or
testing blocks of the one shot learning tasks. To make sure participants did not take
advantage of this lack of time constraints to use all sorts of strategies during learning, we
checked reaction times during the learning segment. We excluded participants whose
average reaction time during the learning segment was outlying (above M + 2SD). This was the case for
three participants in the first one shot learning task and for two participants in the second one
shot learning task (OSL task). The analyses described below apply to the remaining
participants (n = 25 for the first OSL task and n = 26 for the second OSL task).
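The reaction time filter can be sketched as follows. This is a minimal illustration that assumes the cut-off is the group mean plus two sample standard deviations of the per-participant mean reaction times; whether the sample or population standard deviation was used is not stated above. The participant labels and values are hypothetical.

```python
from statistics import mean, stdev


def rt_outliers(mean_rts, n_sd=2.0):
    """Return ids of participants whose mean RT exceeds M + n_sd * SD.

    mean_rts maps a participant id to that participant's average
    reaction time (in ms) during the learning segment.
    """
    values = list(mean_rts.values())
    # Assumption: sample SD (n - 1 denominator) across participants.
    cutoff = mean(values) + n_sd * stdev(values)
    return sorted(pid for pid, rt in mean_rts.items() if rt > cutoff)


if __name__ == "__main__":
    # Hypothetical average learning RTs in milliseconds.
    example = {
        "p01": 1000, "p02": 1100, "p03": 900, "p04": 1050, "p05": 950,
        "p06": 1000, "p07": 1020, "p08": 980, "p09": 1010, "p10": 3000,
    }
    print(rt_outliers(example))
```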
II. Test pair Analysis
We used the same methodology in the one shot learning tasks as in the probabilistic
selection task and the transitive inference task. Again, we examined whether there were
systematic differences across subjects in learning from positive reinforcement versus learning
from negative reinforcement. To do so we tested, using paired sample Student t-tests,
whether there were differences across subjects between recognition accuracy following
positive feedback and recognition accuracy following negative feedback.
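A paired-sample t statistic of the kind described above can be sketched as follows. The sketch assumes one accuracy score per participant and per feedback condition; the function name and the toy scores are hypothetical, and the p-value (which requires the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pos, neg):
    """Return (t, df) for a paired-sample t-test on two matched lists.

    pos[i] and neg[i] must belong to the same participant.
    """
    if len(pos) != len(neg) or len(pos) < 2:
        raise ValueError("need two matched lists with at least 2 pairs")
    diffs = [p - q for p, q in zip(pos, neg)]
    n = len(diffs)
    sd = stdev(diffs)  # SD of the within-subject differences
    if sd == 0:
        raise ValueError("all differences are identical: t is undefined")
    t_stat = mean(diffs) / (sd / sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    # Hypothetical recognition accuracy (%) after positive / negative feedback.
    after_positive = [90, 80, 85, 95]
    after_negative = [80, 75, 70, 90]
    print(paired_t(after_positive, after_negative))
```

A positive t indicates higher accuracy following positive feedback; the degrees of freedom equal the number of participants minus one, as in t24 for n = 25.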
III. Session Analysis
Similar to previous tasks, we tested for possible confounding effects of session using
general linear model regression analysis with categorical between-subjects variables. General
test performance, recognition accuracy after positive feedback and negative feedback were
analyzed using the factor session as predictor.
3.4.4 Cross-task Comparisons
In general, we wanted to investigate how participants performed on learning from
positive and negative feedback across tasks. Previous studies have suggested that there is an
important genetic factor that determines the inter-individual variability in learning better from
either positive or negative feedback (Frank, D'Lauro, & Curran, 2007; Klein, et al., 2007).
Therefore, we hypothesized that participants who learn better from positive feedback in one
task should also learn better from positive feedback in another task. We expected this to be more
the case for tasks that rely on the same memory cortices (implicit procedural learning tasks)
than for tasks that rely on different memory cortices (for example the PS-task
compared to the OSL1 task). To test this assumption we divided participants into two
subgroups (positive and negative learners, see Table 1) for each separate task, comparable to
Frank, D'Lauro, & Curran (2007). By doing so, we could check whether subgroup membership (positive
or negative learner) in one task could predict value-related differences in other
tasks. We performed analyses across tasks to answer two main questions: (1) Do participants
who perform well on one task also perform well on another task? (2) Is the inter-individual
variability in biased feedback learning robust across tasks? To do so we examined how
between-task performance was related within sessions, between sessions and both within and
between implicit and explicit tasks.
1. Do participants who perform well on one task also perform well on another task?
To investigate whether the individual performance rank of participants was consistent
across tasks we conducted Spearman’s Rank Correlations on general test performance
between tasks in session 1 and 2, between tasks across sessions 1 and 2 and across implicit
and explicit tasks, regardless of session. To keep analysis across tasks as comparable as
possible, participants who were excluded from analysis in the separate task analysis were also
excluded from cross task-analysis. Results described below apply to the remainder of the
participants13 (n = 21).
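Spearman's rank correlation, as used in the cross-task comparisons, can be sketched as a Pearson correlation computed on ranks, with tied scores receiving their average rank. This is a generic illustration; the function names and the toy scores are hypothetical and no thesis data are reproduced.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two matched lists with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical test performance (%) of five participants on two tasks.
    task_a = [70, 75, 80, 85, 90]
    task_b = [60, 65, 72, 80, 72]
    print(round(spearman(task_a, task_b), 3))
```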
2. Is the inter-individual variability in learning better from either positive or negative feedback
robust across tasks?
To investigate whether the positive or negative learning bias within subjects was
robust across tasks, we had to transform performance rates on positive and negative learning
conditions to a single score. We did so by simply subtracting performance rates of the negative
learning condition from performance rates of the positive learning condition. The resulting
bias rate ranges from -1 to 1. Negative bias rates represent subjects that learned better from
13
Three participants were excluded from the PS-task, two participants were excluded from the TI task. Three and two participants were excluded from the first and the second OSL task, respectively. Three participants were excluded in more than one task, which results in a total of seven participants that were excluded from cross-task comparisons on general test performance.
Subgroup    Criterion    Sample size
Probabilistic Selection Task
  Positive learners    Choose A performance > Avoid B performance    n = 12
  Negative learners    Avoid B performance > Choose A performance    n = 12
Transitive Inference Task
  Positive learners    Top pair performance (AB & BC) > Bottom pair performance (CD & DE)    n = 10
  Negative learners    Bottom pair performance (CD & DE) > Top pair performance (AB & BC)    n = 11
One Shot Learning Task (1)
  Positive learners    Performance after positive feedback > Performance after negative feedback    n = 16
  Negative learners    Performance after negative feedback > Performance after positive feedback    n = 8
One Shot Learning Task (2)
  Positive learners    Performance after positive feedback > Performance after negative feedback    n = 18
  Negative learners    Performance after negative feedback > Performance after positive feedback    n = 4
Table 1. Inter-individual variability across learning tasks: participants who learned better from positive feedback than from negative feedback (positive learners) and participants who learned better from negative feedback than from positive feedback (negative learners). Participants who scored equally well on positive and negative feedback were excluded since they do not add relevant information to the analyses derived from this classification.
negative feedback, whereas positive bias rates represent subjects that learned better from
positive feedback. Again we conducted Spearman’s Rank Correlations on the bias rate
between tasks in session 1 & 2, between tasks across sessions 1 & 2 and across implicit and
explicit tasks, regardless of session. Furthermore, we wanted to check whether biased learners
in one task could predict biased learning in the other tasks. To do so we examined whether the
factor group (Positive/Negative Learners), derived from the implicit PS task, could predict the
degree of bias rate in the other tasks, using general linear model regression analysis.
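The bias rate and the positive/negative learner classification of Table 1 can be sketched as follows. The sketch assumes performance rates expressed as proportions between 0 and 1; the function names are hypothetical.

```python
def bias_rate(positive_rate, negative_rate):
    """Performance on the positive learning condition minus performance
    on the negative learning condition; lies between -1 and 1 when both
    rates are proportions in [0, 1]."""
    return positive_rate - negative_rate


def classify_learner(positive_rate, negative_rate):
    """Return 'positive', 'negative', or None for equal performance.

    Participants with equal performance on both conditions are not
    assigned to a subgroup, mirroring the exclusion noted under Table 1.
    """
    bias = bias_rate(positive_rate, negative_rate)
    if bias > 0:
        return "positive"
    if bias < 0:
        return "negative"
    return None


if __name__ == "__main__":
    # Hypothetical participant: 90% after positive, 70% after negative feedback.
    print(bias_rate(0.9, 0.7), classify_learner(0.9, 0.7))
```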
To avoid losing valuable information on participants’ learning bias in the implicit and explicit
cross-task analysis, we were less conservative in excluding participants. In contrast to the
previous cross-task comparisons, we only excluded participants whose exclusion was relevant for the
specific between-task comparison14. This gave us more power in the separate between-task
comparisons, although it makes comparisons across all tasks more difficult. Nevertheless, since
we are mainly interested in the relatedness of the bias rate between specific tasks, we argue
that it is beneficial to have more power.
4. Results
4.1 Separate Task Analysis
4.1.1 Probabilistic Selection Task
Analysis across all subjects (Fig. 6A) on choose A performance compared to avoid B
performance (Table 2) did not show significant differences [t24 = -0.104, p = 0.918, two-tailed, d
= 0.03]. Training analysis did not show significant effects of the number of learning trials on
general test performance [F(1,23) = 0.058, p = 0.812, η2 < 0.001 ], AB pair performance [F(1,23)
= 0.238, p = 0.630, η2 = 0.01], general choose A performance [F(1,23) = 0.171, p = 0.683, η2 =
0.07] (Fig. 6B) or general avoid B performance [F(1,23) = 0.017, p = 0.898, η2 < 0.001](Fig. 6C).
Session analysis did not show significant effects of session on general test performance
[F(1,23) = 1.010, p = 0.325, η2 = 0.04], AB pair performance [F(1,23) = 0.446, p = 0.511, η2 =
0.01], general choose A performance [F(1,23) = 1.314, p = 0.263, η2 = 0.05] or general avoid B
performance [F(1,23) = 2.278, p = 0.145, η2 = 0.09].
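For a model with a single effect, η2 can be recovered from the reported F value and its degrees of freedom as η2 = F·df_effect / (F·df_effect + df_error). The sketch below assumes that the effect sizes reported here are η2 values from such single-predictor models, which is not stated explicitly; the function name is hypothetical.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared implied by an F statistic with (df_effect, df_error).

    Follows from F = (SS_effect / df_effect) / (SS_error / df_error)
    and eta squared = SS_effect / (SS_effect + SS_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and dfs positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Session effect on avoid B performance: F(1,23) = 2.278.
    print(round(eta_squared_from_f(2.278, 1, 23), 2))
```

For example, F(1,23) = 2.278 corresponds to η2 ≈ 0.09 and F(1,23) = 1.010 to η2 ≈ 0.04, in line with the session effects reported above.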
14
For example, when correlating the PS task with the TI task, we did not exclude participants who had previously been excluded due to possible confounding strategy use in one of the OSL tasks.
n Mean SD
AB pairs 25 97.3% 5.63
Choose A 25 72% 23.79
Avoid B 25 72.6% 16.2
# Training trials 25 160 93.06
Table 2. Descriptive statistics of test performance during the probabilistic selection task on AB pairs, choose A, avoid B and training trial frequency.
[Figure 6: (A) Probabilistic Selection Task, test performance (%) for Choose A and Avoid B (n.s.); (B) PS Regression Analysis - Choose A, test performance (%) against training trials frequency, F(1,23) = 0.171, p = 0.683; (C) PS Regression Analysis - Avoid B, test performance (%) against training trials frequency, F(1,23) = 0.017, p = 0.898]
Figure 6. (A) No significant differences were observed across subjects between choose A and avoid B test performance. Data are presented as Mean ± SEM (n.s.= non significant). (B) The number of learning trials did not significantly predict choose A performance. (C) Learning trial frequency did not significantly predict avoid B performance.
4.1.2 Transitive Inference Task
Across all subjects, comparisons between top pairs AB & BC and bottom pairs CD & DE
(see Table 3 for descriptive statistics) did not show significant differences [t25 = 0.191, p =
0.850, two-tailed, d = 0.05] (Fig. 7A). Separate pair analysis on the performance level between
“anchor” pairs AB and DE showed no significant differences [t25 = 0.611, p = 0.547, two-tailed,
d = 0.17]. Likewise, no significant difference was observed between “inner” pairs BC and CD
[t25 = -0.113, p = 0.811, two-tailed, d < 0.01]. Pairwise comparisons
between anchor pairs and inner pairs (Fig. 7B) did demonstrate a significant difference [t25
= 3.757, p = 0.001, two-tailed, d = 0.7]. These results suggest that, on average, differences in
solving pairs correctly are mainly due to whether a pair is an inner or an anchor pair rather than to its
place (higher or lower) in the hierarchy.
Cross-subjects comparisons between novel pairs AE and BD (Fig. 7C) did not show
significant differences [t25 < 0.001, p > 0.05 , two-tailed, d < 0.01]. Furthermore, training
analysis did not demonstrate significant effects of the number of learning trials on general test
performance [F(1,24) = 3.573, p = 0.122, η2 = 0.096], general performance on top pairs AB &
BC [F(1,24) = 0.942, p = 0.341, η2 = 0.04] or general performance on bottom pairs CD & DE
[F(1,24) = 0.173, p = 0.681, η2 < 0.001]. Session analysis did not show significant effects across
subjects of the factor session on general test performance [F(1,24) = 0.887, p = 0.356, η2 =
0.036], performance on top pairs AB & BC [F(1,24) = 0.381, p = 0.543, η2 = 0.015] or on
performance on bottom pairs CD & DE [F(1,24) = 1.012, p = 0.325, η2 = 0.04].
n Mean SD
Top pairs AB & BC 26 87.9% 15.69
Bottom pairs CD & DE 26 87% 17.61
Anchor pairs 26 95.91% 14.13
Inner pairs 26 78.85% 21.32
Novel pair AE 26 93.75% 20.39
Novel pair BD 26 93.75% 19.76
# Training trials 26 118.15 19.77
Table 3. Descriptive statistics of test performance during the transitive inference task on top pairs, bottom pairs, anchor pairs, inner pairs, novel pairs and training trial frequency.
[Figure 7: (A) Top-Bottom Pairs TI, test performance (%) for AB & BC and CD & DE (n.s.); (B) Anchor-Inner Pairs TI, test performance (%) for anchor and inner pairs (***); (C) Novel Pairs TI, test performance (%) for AE and BD (n.s.)]
Figure 7. (A) No significant differences were observed across subjects between top and bottom pair test performance. (B) Participants scored significantly better (p = 0.001) on anchor pairs when compared to inner pairs. (C) No significant differences were observed across subjects between novel pairs AE and BD. Data are presented as mean ± SEM ( n.s.= non significant, ***= p<0.001).
When we divided participants15 into an implicit and explicit subclass based on the
awareness questionnaire, results indicated a significant difference between implicit and
explicit learners on general test performance [F(1,26) = 7.379, p = 0.012, η2 = 0.22].
Participants who explicitly learned the hierarchy between symbols generally performed better
during test relative to participants who learned implicitly (see Table 4 for descriptive
statistics). This effect was still significant without the two previously excluded weak
performers16 [F(1,24) = 5.722, p = 0.025, η2 = 0.19]. Further analysis (Fig. 8) suggests that the
above-mentioned differences are driven by positive learning, since explicit learners performed
significantly better than implicit learners on top pairs AB & BC [F(1,24) = 8.937, p
< 0.01, η2 = 0.27], but not on bottom pairs CD & DE [F(1,24) = 1.635, p = 0.213, η2 = 0.06] or
novel pairs AE & BD [F(1,24) = 0.973, p = 0.334, η2 = 0.04]. However, neither the explicit learning
group nor the implicit learning group performed significantly better on top pairs than
on bottom pairs [t10 = 1.341, p = 0.209, d = 0.48 and t14 = -0.246, p = 0.809, d = 0.11,
respectively]. Furthermore, no significant interaction effect for group x hierarchy was observed
[F(1,24) = 0.553, p = 0.464, η2 = 0.02].
15
Note that we first included all participants in the awareness analysis to check whether there was a general difference in performance between implicit or explicit learners. Both participants who were initially excluded from analysis indicated that they were not aware (implicit group) of the underlying order in the task. Since we do not know whether the low performance of these participants is related to either implicit learning or to a general confusion during the testing phase, we tested differences between implicit and explicit learners both with and without them.
16
These participants were again excluded for further analysis, to make sure that effects are driven by differences between implicit and explicit learning, not by outlying values (possibly due to test confusion) in the implicit group.
Group comparisons between implicit and explicit learners (Fig. 8) did not show a
significant difference on anchor pairs [F(1,24) = 1.045, p = 0.317, η2 = 0.043]. However, we did
observe a significant difference between implicit and explicit learners on inner pairs [F(1,24) =
8.555, p < 0.01, η2 = 0.26]. As in the previous analysis, implicit learners performed significantly
better on anchor pairs when compared to inner pairs [t14 = 3.757, p = 0.002, two-tailed, d =
1.47], but no significant difference was observed between performance on anchor pairs and
inner pairs for the explicit learning group [t10 = 1.627, p = 0.135, two-tailed, d = 0.65].
n Mean SD
General performance
  Explicit learners (aware of the hierarchy) 11 95.8% 7.34
  Implicit learners (not aware of the hierarchy) 17 80.98% 16.90
Performance on top pairs AB & BC
  Explicit learners 11 97.16% 4.30
  Implicit learners 15 81.22% 17.18
Performance on inner pairs
  Explicit learners 11 90.91% 15.65
  Implicit learners 15 70.83% 18.25
Table 4. Descriptive statistics of the transitive inference task, controlled for underlying hierarchy awareness.
[Figure 8: Underlying Hierarchy Awareness TI-task, test performance (%) on training pairs AB, BC, CD and DE for explicit learners and implicit learners]
Figure 8. Test performance on training pairs for participants who were aware (explicit learners) or unaware (implicit learners) of the underlying hierarchy in the transitive inference task. Data are presented as mean ± SEM.
Taken together, these results clearly indicate that participants who explicitly learned
the underlying hierarchy between symbols generally perform better than participants who
implicitly learned the hierarchical relationship between symbols. This effect seems driven by
learning performance following positive feedback (top pairs), rather than learning following
negative feedback. The main difference between implicit and explicit learners in this task is a
difference in performance on the inner pairs (BC & CD): explicit learners perform well not only
on anchor pairs, as implicit learners do, but also on inner pairs, where implicit learners
perform considerably worse.
Although these results seem very clear, they should be interpreted with caution since
observed effects are mostly due to a ceiling effect on test performance in the relatively small
explicit group. This high performance rate of explicit learners on all testing pairs could also
explain the lack of an interaction effect between group and top/bottom pair performance.
4.1.3. One shot learning Task (1)
Test pair analysis across all subjects demonstrated a significant difference between
recognition accuracy following positive feedback and recognition accuracy following negative
feedback [t24 = 2.152, p = 0.042, two-tailed, d = 0.47], with a bias to learn better from positive
feedback compared to learning following negative feedback (Table 5, Fig. 9A). Session analysis
did not show significant effects of session on general recognition accuracy [F(1,23) = 0.374, p =
0.547, η2 = 0.016], recognition accuracy following positive feedback [F(1,23) = 0.018, p = 0.896,
η2 < 0.001] or recognition accuracy following negative feedback [F(1,23) = 0.487, p = 0.492, η2
= 0.02].
One Shot Learning Task (1) n Mean SD
Performance after positive feedback 25 86.5% 11.17
Performance after negative feedback 25 80.4% 14.41
One Shot Learning Task (2)
Performance after positive feedback 26 92.3% 8.77
Performance after negative feedback 26 78.2% 15.42
Table 5. Descriptive statistics for the one shot learning tasks (1 and 2).
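The effect sizes reported for the one shot learning tasks can be related to the descriptive statistics in Table 5. The thesis does not state which formula was used for Cohen's d; the sketch below assumes the mean difference divided by the root mean square of the two condition standard deviations, a convention that reproduces the reported values (d = 0.47 and d = 1.12) from the rounded table entries. The function name is hypothetical.

```python
from math import sqrt


def cohens_d_from_summary(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d from two condition means and standard deviations.

    Assumption: the standardizer is sqrt((sd_1**2 + sd_2**2) / 2),
    i.e. the root mean square of the two condition SDs.
    """
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("both standard deviations are zero")
    return (mean_1 - mean_2) / pooled_sd


if __name__ == "__main__":
    # Table 5: performance after positive vs. negative feedback (%).
    print(round(cohens_d_from_summary(86.5, 11.17, 80.4, 14.41), 2))  # OSL (1)
    print(round(cohens_d_from_summary(92.3, 8.77, 78.2, 15.42), 2))   # OSL (2)
```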
[Figure 9: (A) One Shot Learning Task (1), test performance (%) following positive and negative feedback (*); (B) One Shot Learning Task (2), test performance (%) following positive and negative feedback (***)]
Figure 9. (A) In the first version of the one shot learning task, participants performed, on average, significantly better following positive feedback when compared to performance following negative feedback. (B) This bias effect across subjects towards learning better following positive feedback was also observed in the second version of the one shot learning task, using different stimuli. Data are presented as Mean ± SEM (* = p< 0.05, *** = p < 0.001).
4.1.4. One shot learning Task (2)
Across-subjects analysis of test pairs confirmed the previously observed results (Fig.
9B). In the second OSL task participants again showed higher recall accuracy following positive
feedback relative to recall accuracy following negative feedback (Table 5) during test [t25 =
4.886, p < 0.001, two-tailed, d = 1.12]. Session analysis did not show significant effects of
session on general recognition accuracy [F(1,24) = 2.560, p = 0.123, η2 = 0.095] and recognition
accuracy following negative feedback [F(1,24) = 0.165, p = 0.689, η2 < 0.001]. We did observe a
significant effect of session on performance following positive feedback [F(1,24) = 5.661, p =
0.026, η2 = 0.19], where performance following positive feedback was better for participants
who did the second OSL task in the second session (M = 88.23%, SD = 10.55) compared to
participants who did the second OSL task in the first session (M = 95.76%, SD = 5.05).
4.2 Cross-Task Analysis
4.2.1. Relationships between session and tasks on general test performance?
I. Tasks Within and Between Sessions
General test performance between the implicit and explicit task conducted in the first
session showed no significant correlation (n = 21, rs = 0.19, p = 0.410). Also, no significant
correlation was observed between general test performance on the implicit and explicit task in
the second session (n = 21, rs = -0.03, p = 0.883). There were no significant correlations between
the implicit task performance rate in the first session and the explicit task performance rate in
the second session (n = 21, rs = -0.01, p = 0.964), which is also the case for the relationship
between the implicit task in the second session and the explicit task in the first session (n = 21, rs =
-0.03, p = 0.436). There was a marginally non-significant negative correlation between implicit task
performance rates across sessions (n = 21, rs = -0.41, p = 0.062). We did observe a significant
positive correlation between explicit task performance rates across sessions (n = 21, rs = 0.52, p =
0.015). These results suggest that participants who perform better than the other
subjects on one task within a session do not consistently perform better on the other task
within that session. However, participants who perform better on an implicit task in one session
tend to perform worse, relative to the other participants, on the implicit task in the
other session, although this trend did not reach significance. On the other hand, better
performers on one explicit task in a given session are also better performers on the explicit
task in the other session.
II. Implicit and Explicit tasks
Correlation coefficients between test performance rates across tasks are shown in
Table 6. There was a significant correlation between the OSL tasks (n = 21, rs = 0.51, p = 0.018). No
significant correlations across tasks were observed between PS-OSL1 (p = 0.586), PS-OSL2 (p =
0.175), TI-OSL1 (p = 0.943), TI-OSL2 (p = 0.489) and TI-PS (p = 0.620). When we controlled
for awareness in the TI task, the correlation between both implicit tasks became smaller for
participants who were unaware of the underlying hierarchy in the TI-task (n = 11, rs = -0.05, p =
0.989). Correlation coefficients between the TI-task and the OSL (1 & 2) tasks became larger,
but remained non-significant, when we restricted analysis to participants who were explicitly aware of the
underlying hierarchy in the TI task (n = 10, rs = 0.22, p = 0.540 and rs = 0.23, p = 0.520,
respectively). These results suggest that participants who perform better, compared to the
other subjects, on one OSL task will also perform better on the other OSL task. No such
relationship was observed between the other tasks.
PS TI OSL (1) OSL (2)
PS 1
TI .12 1
OSL (1) .13 .02 1
OSL (2) -.31 .16 .51* 1
Table 6. Spearman rank correlation coefficients between test performance rate across tasks. A significant positive correlation was observed between OSL tasks, indicating that participants who performed well on one OSL-task are likely to score well on the other OSL task (* = p < 0.05).
4.2.2. Inter-individual bias towards positive or negative learning across tasks?
To examine whether positive or negative learners are consistently biased towards
learning better from positive or negative feedback, we first tested whether positive (or
negative) learners do in fact learn better from positive (or negative) feedback when compared
to negative (or positive) learners, using one-way between subjects ANOVAs. Indeed, positive
learners did learn better from positive feedback, when compared to negative learners during
the PS-task [F(1,22) = 22.510, p < 0.001, η2 = 0.51], the TI-task [F(1,19) = 7.788, p = 0.012, η2 =
0.29] and the OSL2 task [F(1,20) = 13.124, p = 0.002, η2 = 0.39], but not during the OSL1
task [F(1,22) = 3.786, p = 0.065, η2 = 0.15].
Similarly, negative learners learned better from negative feedback, when compared to
positive learners during the PS-task [F(1,22) = 11.467, p = 0.003, η2 = 0.34] and the TI-task
[F(1,19) = 6.44, p = 0.02, η2 = 0.25], but not during the OSL1 [F(1,22) = 4.104, p = 0.055, η2
= 0.16] and the OSL2 task [F(1,20) = 2.747, p = 0.131, η2 = 0.11]. However, the between-group tests
in the OSL tasks do show a trend towards significance, and the lack of significance is
most likely due to the smaller sample size of the negative learning group within the explicit
memory tasks (see Table 1). We should therefore be cautious in drawing any broad conclusions
from these analyses. Nevertheless we used the implicit procedural learning task subgroup
classifications for further analysis to check whether they could predict value-related
differences in other tasks.
I. Tasks Within and Between Sessions
Correlation coefficients on learning-bias rates between implicit and explicit tasks were
not significant for the first session (n = 21, rs = -0.12, p = 0.601) or the second session (n = 21, rs =
-0.10, p = 0.656). We did observe a significant negative correlation between the implicit task
bias rates in the first session and the explicit task bias rates in the second session (n = 21, rs =
-0.56, p = 0.009). This was not the case for the relationship between the implicit task in the
second session and the explicit task in the first session (n = 21, rs = -0.03, p = 0.895).
Furthermore, no significant correlations were observed between explicit task bias rates
across sessions (n = 21, rs = -0.17, p = 0.463) or between implicit task bias rates across sessions
(n = 21, rs = 0.08, p = 0.722). These results suggest that participants who learned better from
negative feedback on the implicit task in the first session were more likely to learn better from
positive feedback on the explicit task in the second session. No such relationships were
observed within and between sessions for the other tasks.
II. Implicit and Explicit tasks
Correlation coefficients between bias rates across tasks are shown in Table 7. No
significant correlations were observed across tasks. However, we did observe trends
towards a significant negative correlation between the TI-task and both the first (n = 23, rs =
-0.36, p = 0.09) and second (n = 24, rs = -0.38, p = 0.068) OSL-task. Results of the
correlation analysis between the PS-task and both the first (n = 23, rs = -0.02, p = 0.973) and
second (n = 23, rs = -0.31, p = 0.148) OSL-tasks were less clear. The correlation between the implicit
tasks was small, positive and non-significant (n = 23, rs = 0.10, p = 0.644), whereas
the correlation between the explicit tasks was small, negative and non-significant (n = 25, rs =
-0.13, p = 0.539).
PS TI OSL (1) OSL (2)
PS 1
TI .10 1
OSL (1) -.02 -.37 1
OSL (2) -.31 -.38 -.13 1
Table 7. Spearman rank correlation coefficients between bias rates across tasks. No significant correlations were observed between tasks. A positive correlation was observed between implicit procedural tasks and a relatively high negative correlation was observed between implicit procedural tasks and the second one shot learning task.
Again we controlled for awareness in the TI task. Correlation coefficients are
presented in Table 8. However, the large inter-individual variation between bias rates and the
low sample sizes across tasks make it very hard to validly interpret these results.
         PS      OSL (1)   OSL (2)
TI-Aw
  rs =   .59     -.37      -.37
  p =    0.072   0.260     0.263
  n =    10      11        11
TI-nAw
  rs =   -.25    -.41      -.52
  p =    0.417   0.191     0.068
  n =    13      12        13
Table 8. Spearman rank correlation coefficients between bias rates across tasks, controlled for awareness in the TI-task. No significant correlations were observed between tasks. The relatively low sample sizes make it hard to draw valid conclusions following this analysis.
General linear model regression analysis using the predictor groupPS (Positive and
Negative learners), derived from the implicit PS-task (see Table 1), showed no significant
effects on bias rates in the TI-task (Fig. 10A) [F(1,23) = 0.006, p = 0.939, η2 < 0.001] or the OSL1
task [F(1,22) = 0.036, p = 0.852, η2 < 0.001]. Interestingly, we did observe a significant effect of
the predictor groupPS on bias rates in the OSL2 task [F(1,23) = 5.908, p = 0.023, η2 = 0.2], where
positive PS-learners showed lower positive bias rates than negative PS-learners (Fig. 10C).
Taken together, these results suggest that participants who perform better following
positive feedback than following negative feedback in one
implicit learning task are more likely to show the same pattern of results in the other procedural
learning task than in the more explicit memory tasks. Participants are more likely to
show the opposite pattern of results in the explicit memory tasks: biased positive learners in
one of the implicit procedural tasks show a lower positive bias rate in the more explicit memory
learning tasks, when compared to biased negative learners. However, most of our cross-task
comparisons show no significant effects or only very small ones, most likely due to our relatively small
sample sizes across tasks. Further investigations using bigger sample sizes and perhaps
different implicit tasks (i.e., how implicit is the transitive inference task?) are necessary to
confirm this pattern of results.
[51]
OSL (2) Bias rate
Pos PS-L
earn
ers
Neg
PS-L
earn
ers
-0.2
-0.1
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Bia
s R
ate
OSL (2) Bias rate
Pos TI-L
earn
ers
Neg
TI-L
earn
ers
-0.2
-0.1
0.0
0.1
0.2
0.3
0.4
0.5
0.6B
ias R
ate
TI Bias rate
Pos
PS-L
earn
ers
Neg
PS-L
earn
ers
-0.5
-0.4
-0.3
-0.2
-0.1
-0.0
0.1
0.2
0.3
0.4
0.5
0.6B
ias
Rate
PS Bias rate
Pos
TI-Lea
rner
s
Neg
TI-L
earn
ers-0.5
-0.4
-0.3
-0.2
-0.1
-0.0
0.1
0.2
0.3
0.4
0.5
0.6
Bia
s R
ate
Figure 10. Regression analysis on bias rates using subgroups (positive and negative learners) derived from the probabilistic selection task and the transitive inference task as predictor. (A) Positive and negative PS-learners did not predict differences on inter-individual bias rates in the transitive inference task. (B) Positive and negative TI-learners also did not predict differences on inter-individual bias rates in the probabilistic selection task. (C) Positive PS-learners had significantly lower positive bias rates when compared to negative PS-learners’ bias rates in the second OSL task. (D) Positive TI-learners had significantly lower positive bias rates when compared the bias rates of negative TI-learners in the second OSL task.
5. Discussion
In the current study we investigated whether individual differences in learning from
positive or negative feedback differ between tasks that rely on declarative memory cortices
and tasks that rely on cortices involved in habit formation. Recent research on the neural bases
of making choices following feedback has mainly focused on the role of dopamine and the
striatum. Collectively, these studies pointed out a crucial role of midbrain dopamine neurons
and their striatal targets for learning to predict reward (Daw, Yael, & Dayan, 2005; Delgado,
et al., 2000; Frank, et al., 2004; Holroyd & Coles, 2002; Pessiglione, et al., 2006; Schultz, et al.,
1997). These findings on the role of the dopaminergic-striatal circuitry in
reinforcement learning are formalized as a prediction-error signal that guides choices by
updating value representations following repeated experience of feedback (Schultz, et al.,
1997; Hollerman & Schultz, 1998; Holroyd & Coles, 2002; Sutton & Barto, 1998). This allows an
organism to use previous experiences to optimize choices when confronted with a similar
situation.
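The prediction-error account above can be illustrated with a minimal sketch of a delta-rule update in the spirit of Sutton and Barto (1998). The function name, the learning rate of 0.1, the initial value, and the feedback sequence are illustrative assumptions and are not taken from the present study.

```python
def update_value(value, reward, learning_rate=0.1):
    """Return the updated value estimate and the prediction error.

    Delta rule: the estimate moves toward the obtained reward by a
    fraction (learning_rate) of the prediction error.
    """
    prediction_error = reward - value
    return value + learning_rate * prediction_error, prediction_error


# Hypothetical feedback sequence: 1 = positive feedback, 0 = negative feedback.
feedback = [1, 1, 0, 1, 1, 1, 0, 1]

value = 0.0  # assumed initial value estimate for a single stimulus
errors = []
for reward in feedback:
    value, delta = update_value(value, reward)
    errors.append(delta)
```

In this sketch a positive prediction error (feedback better than expected) raises the value estimate and a negative one lowers it, so repeated feedback gradually shapes which option is preferred.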
However, since organisms are rarely confronted with the same environment, decisions
made in the past may not repeat themselves. Instead, novel choices mostly involve new
options and contexts, which require a flexible integration of knowledge from the past with
novel information. Previous investigations on flexibly generalizing knowledge from the past to
guide choices in novel situations have illuminated an important role of the declarative memory
system in the medial temporal lobe (Eichenbaum, 2000; Hassabis, et al., 2007; Shohamy &
Adcock, 2010; Squire, 1992). To investigate how individuals differently learn to make optimal
decisions following feedback, we adopted the probabilistic selection task designed by Frank et
al. (2004). We compared optimal decision making performance following positive and negative
feedback in this task with (1) performances on an implicit version of the transitive inference
task and (2) performances on two versions of a declarative memory task.
5.1 Learning within the procedural memory system.
5.1.1. The Probabilistic Selection Task
Besides its usage to investigate the underlying mechanisms of procedural learning
processes, the probabilistic selection task has frequently been used to research inter-individual
variability in learning more from good choices than from bad choices (Frank, et al., 2004;
Lighthall, et al., 2013; Simons, Howard, & Howard, 2010). Our results from the probabilistic
selection task showed no differences in learning performance following positive feedback
relative to learning performance following negative feedback, when averaged across subjects.
As expected, when distinguishing between a positive learner and a negative learner subgroup (see footnote 17),
similar to Frank and colleagues (2005, 2007), we did observe that positive learners learned
significantly better following positive feedback relative to negative learners. Accordingly,
negative learners learned significantly better following negative feedback when compared to
positive learners. These results are in line with previous investigations that used the
probabilistic selection task with young and healthy subjects (Frank, et al., 2005, 2007; Klein, et
al., 2007; Simons, Howard, & Howard, 2010).
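The subgroup split described above (participants who are more accurate at choosing A than at avoiding B count as positive learners, and vice versa) can be sketched as follows. Treating the bias rate as the simple difference between the two accuracies, the handling of ties, and all function names and data are assumptions made for illustration only.

```python
def bias_rate(choose_a_accuracy, avoid_b_accuracy):
    """Assumed bias measure: choose-A accuracy minus avoid-B accuracy."""
    return choose_a_accuracy - avoid_b_accuracy


def classify_learner(choose_a_accuracy, avoid_b_accuracy):
    """Label a participant as a positive or a negative learner."""
    difference = bias_rate(choose_a_accuracy, avoid_b_accuracy)
    if difference > 0:
        return "positive"
    if difference < 0:
        return "negative"
    return "unclassified"  # assumed handling of ties


# Hypothetical test-phase accuracies: (choose A, avoid B) as proportion correct.
participants = {
    "p01": (0.90, 0.70),
    "p02": (0.60, 0.85),
    "p03": (0.75, 0.75),
}
groups = {pid: classify_learner(a, b) for pid, (a, b) in participants.items()}
```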
A previous study used the probabilistic selection task and the implicit transitive
inference task with PD-patients to investigate individual differences in reinforcement learning
(Frank, Seeberger, & O'Reilly, 2004). In this experiment it was assumed that the probabilistic
selection task and the implicit transitive inference task rely on the same neural substrate,
namely the basal ganglia. Results suggested that inter-individual biases in feedback-learning
are a direct consequence of higher or lower levels of dopamine that differently affect striatal
synaptic changes (Frank, et al., 2004).
Phasic bursts of dopamine, following positive feedback, excite D1 receptors in the direct
pathway, which induces long-term potentiation (LTP; see footnote 18) in striatal Go cells (Holroyd &
bursts of dopamine inhibit the indirect pathway via D2 receptors, which induces
long-term depression (LTD; see footnote 19) in striatal No-Go cells (Calabresi, et al., 1997). Short
dopaminergic dips below baseline that follow negative feedback have the opposite
effect, i.e., they discourage LTP in striatal Go cells and LTD in striatal No-Go cells
(Calabresi, et al., 1997; Frank, 2005; Holroyd & Coles, 2002; Nishi, et al., 1997; Schultz,
2002). Consequently, high or low levels of dopamine bias the Go or No-Go pathway, respectively, to
be more active, with better learning from positive or negative feedback as the behavioral
output (Frank, et al., 2004).
17 We adopted this distinction from Frank, Woroch, and Curran (2005). Positive learners were operationalized as those participants who performed better on choosing A trials compared to performance on avoiding B trials, whereas negative learners were operationalized as those participants who performed better on avoiding B trials compared to performance on choosing A trials (Frank et al., 2005; 2007).
18 Long-term potentiation (LTP) is an activity-dependent change in the strength of synapses, mediated by NMDA receptors. As a result of pre- and postsynaptic co-activation, a wide range of local biochemical changes strengthen the synaptic connectivity. It is widely accepted that the LTP process can be interpreted as the cellular correlate of associative learning and memory formation in general (Fedulov, et al., 2007; Whitlock, et al., 2006).
19 Long-term depression (LTD) is the cellular mechanism of synaptic weakening. The difference between LTP and LTD lies, at least in part, in the magnitude of calcium signals in the postsynaptic cell. LTD can, to some extent, be seen as the cellular correlate of forgetting (Foy, 2001).
These assumptions of Frank’s Go-NoGo model, together with findings regarding a
genetic factor of biased feedback learning, led us to the hypothesis that the inter-individual
range of learning better from positive or negative feedback, observed in the probabilistic
selection task, would closely relate to the pattern of results in the implicit transitive inference
task (Frank, et al., 2004; Klein, et al., 2007). In our experiment, results showed only a very small,
non-significant correlation (rs = 0.10) between bias rates from the probabilistic selection task and
bias rates from the implicit transitive inference task, suggesting that learning from positive and
negative feedback across these tasks might not be modulated by the same underlying
mechanisms.
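For readers unfamiliar with the rank correlation reported here, the following is a minimal sketch of how a Spearman coefficient between two sets of bias rates can be computed: both variables are converted to ranks (ties receive the average rank) and a Pearson correlation is taken on the ranks. The function names and the bias rates below are hypothetical and do not reproduce the data of this study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical bias rates for the same five participants in two tasks.
ps_bias = [0.20, -0.10, 0.05, 0.30, -0.25]
ti_bias = [0.10, 0.15, -0.20, 0.05, 0.00]
rho = spearman(ps_bias, ti_bias)
```

A coefficient near zero, as in the reported result, means that a participant's rank on one task's bias rate says little about their rank on the other.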
5.1.2. The Transitive Inference task.
The transitive inference task has previously been used to study higher-order reasoning,
where organisms have to learn a hierarchical structure of stimuli based on the inferences that
are drawn from adjacent pairs in an ordinal sequence (Dusek & Eichenbaum, 1997; Van
Opstal, Verguts, Orban, & Fias, 2007). In everyday terms, this means that participants learn to
logically infer that Vincent is taller than Eden, based on the premises that Vincent is taller than
Kevin and Kevin is taller than Eden. Studies, using different modifications of the transitive
inference task with both animals and humans, have suggested that this task importantly
involves the hippocampus (Dusek & Eichenbaum, 1997; Greene, et al., 2006; Van Opstal, et al.,
2007). However, a recent study challenged the assumption of a necessary involvement of the
hippocampus in transitive inference tasks. This study demonstrated that participants with a
temporarily disrupted hippocampus, due to the benzodiazepine midazolam, showed enhanced
transitive inference performance by fully recruiting the dopamine-striatal learning system
(Frank, O'Reilly, & Curran, 2006). These results were in line with their previously proposed
associative strength hypothesis, which explains how organisms transitively infer associations.
According to the associative strength hypothesis, outer pairs (AB,DE) at the top
or bottom of an underlying hierarchy “anchor” the development of associative values.
Over consecutive trials, agents implicitly learn to associate A with positive
reinforcement, because choosing A always leads to positive feedback. In contrast,
choosing E becomes associated with negative reinforcement, because choosing E
always induces negative feedback. These net associative values then ‘transfer’
to the inner adjacent pairs (BC, CD). As a result, B in the BC pair has a
stronger positive association, whereas D in the CD pair has a stronger negative
association, even though B and D are each positively reinforced on half of their
trials and negatively reinforced on the other half (Rudy, Frank, & O'Reilly, 2003).
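The anchoring idea described above can be illustrated with a small simulation. This is not the model of Rudy, Frank, and O'Reilly (2003); it is a simplified, error-driven pairwise learning rule chosen purely for illustration, and the learning rate, number of epochs, and all names are assumptions. Each stimulus carries one associative value, the probability of a correct choice depends on the value difference within a pair, and values change only to the extent that the pair is not yet solved.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(n_epochs=200, learning_rate=0.1):
    """Train associative values on the four premise pairs AB, BC, CD, DE."""
    values = {stimulus: 0.0 for stimulus in "ABCDE"}
    # Premise pairs as (rewarded stimulus, non-rewarded stimulus).
    pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
    for _ in range(n_epochs):
        updates = {stimulus: 0.0 for stimulus in values}
        for winner, loser in pairs:
            p_correct = sigmoid(values[winner] - values[loser])
            error = 1.0 - p_correct  # shrinks as the pair is learned
            updates[winner] += learning_rate * error
            updates[loser] -= learning_rate * error
        # Apply all updates at once so the result does not depend on pair order.
        for stimulus in values:
            values[stimulus] += updates[stimulus]
    return values


values = train()
p_outer = sigmoid(values["A"] - values["B"])     # outer pair AB
p_inner = sigmoid(values["B"] - values["C"])     # inner pair BC
p_novel_bd = sigmoid(values["B"] - values["D"])  # untrained test pair BD
```

In this sketch A only ever gains value and E only ever loses value. Because the AB pair is solved faster than BC, B loses less in AB than it gains in BC and ends with a net positive value, while D ends with a net negative value. The sketch also predicts higher accuracy on the outer pair than on the inner pair, and a preference for B over D on the untrained BD pair.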
Although other researchers questioned some of the pharmacological assumptions of
Frank’s midazolam study (see Greene, 2007 and Frank, et al., 2008 for comment and reply), it is
generally agreed that the transitive inference task can be solved by both explicit ‘declarative’
strategies and implicit ‘procedural’ strategies (Greene, 2007; Van Elzakker, et al., 2003; Rudy,
Frank, & O'Reilly, 2003). This might explain the lack of a significant correlation in biased
feedback learning between the probabilistic selection task and the transitive inference task in
our study.
Indeed, when we controlled (see footnote 20) for explicit and implicit strategies in the TI-task, a
substantial part of the participants (more than 1/3) indicated using an explicit strategy to solve
the transitive inference task. When we took the results of our questionnaire into account,
results indicated that explicit learners performed better than implicit learners. Our data
suggest that this effect could largely be explained by better performances of explicit learners
on inner pairs. Furthermore, results from the implicit learner group suggest that implicitly
learning the underlying hierarchy is driven by positive and negative associative values in the
outer pairs, since implicit learners scored significantly better on ‘outer pairs’ (AB, DE) relative to
‘inner pairs’ (BC,CD). This pattern of results is in line with the associative strength hypothesis
(Rudy, Frank, & O'Reilly, 2003).
Regarding our research question, no significant differences were observed across
subjects between performance following positive feedback (top pairs, AB & BC) and
performance following negative feedback (bottom pairs, CD & DE), for either explicit or
implicit learners. However, one interesting finding was that explicit learners performed
significantly better on top pairs, but not on bottom pairs, when compared to implicit learners.
This finding suggests that the performance level of explicit learners is more likely driven by
learning following positive feedback rather than by learning following negative feedback.
5.2 Learning across the procedural and the declarative memory system.
In line with the results of Frank and colleagues’ midazolam study, results of animal
studies, using spatial navigation tasks, have indicated that inactivating one learning system
(e.g., hippocampus) improves performances on tasks related to the other learning system (e.g.
20 We adopted a translated version of the questionnaire used by Frank and colleagues (2004) to control for explicit awareness of the underlying hierarchy in the transitive inference task.
striatal processing). These results have provided strong evidence for a bidirectional
dissociation between the ‘declarative’ memory system and the ‘procedural’ memory system,
indicating that both memory systems, respectively supported by the hippocampus and the
striatum, competitively interact under some circumstances (Frank, O'Reilly, & Curran, 2006;
Lee, et al., 2008; Poldrack & Packard, 2003).
However, a very recent lesion study, using an associative reinforcement learning task,
rather than a spatial navigation task, suggested that striatal processing might be a prerequisite
for declarative associative learning following reinforcement. Results of this study showed that
rodents with striatal lesions had impaired procedural and declarative-like memories, whereas
rodents with hippocampal lesions had impaired declarative-like memories, but spared
procedural memories (Jacquet, et al., 2013). These results suggest that striatal processing
might be necessary for decision making following feedback in declarative memory tasks.
5.2.1. Comparing results with the ‘episodic-like’ one-shot learning tasks.
In our study we wanted to investigate whether decision making from feedback differed
between procedural learning tasks and declarative learning tasks within subjects. We
presumed that participants who learn better from positive feedback in a procedural task would
show the same learning bias in the declarative memory tasks.
This presumption rested on four arguments. First, dopamine modulates learning from positive or negative feedback (Holroyd &
Coles, 2002; Schultz, et al., 1997). Second, it has been shown that dopaminergic projections to
the striatum and the hippocampus modulate cellular learning in both regions (Frank, et al.,
2004; Frey, 1990; Huang & Kandel, 1995). Third, there is strong evidence that learning from
reinforcement is directly or indirectly modulated by striatal processing, shown to be a
prerequisite for learning declarative memory tasks (Frank, 2005; Pessiglione, et al., 2006;
Jacquet, et al., 2013). Fourth, previous investigations using performances following
probabilistic feedback to examine error-processing in a recognition memory task found results
consistent with our hypothesis (Frank, et al., 2007).
Surprisingly, when we directly compared implicit and explicit tasks, results indicated
that participants who learned better from negative feedback in the implicit procedural tasks
were more likely to learn better from positive feedback in the explicit declarative memory
tasks, when compared to participants who learned better from positive feedback in the
implicit procedural task. This opposite learning bias across implicit and explicit learning tasks
was especially true for the second version of the explicit memory task and was seemingly
driven by a general bias towards learning from positive feedback during explicit memory tasks.
In both one-shot learning tasks we observed that subjects had significantly better recognition
accuracy on trials previously followed by positive feedback when compared to trials previously
followed by negative feedback. It could be argued that these effects are task-specific.
However, a similar learning bias for explicit learners in the transitive inference task suggests
otherwise.
One possible explanation for these results could be that dopamine plays a functionally
different modulatory role in hippocampal-based associative learning, compared to its role in