Cite as: J. Neurosci 2020; 10.1523/JNEUROSCI.2586-19.2020
Received: 21 October 2019
Revised: 10 March 2020
Accepted: 28 April 2020
This Early Release article has been peer-reviewed and accepted, but has not been through the composition and copyediting processes. The final version may differ slightly in style or formatting and will contain links to any extended data.
Alerts: Sign up at www.jneurosci.org/alerts to receive customized email alerts when the fully formatted version of this article is published.

Dopamine modulates dynamic decision-making during foraging
Abbreviated title: Dopamine modulates dynamics of foraging

Campbell Le Heron1,2,3, Nils Kolling4,5, Olivia Plant4, Annika Kienast4, Rebecca Janska4, Yuen-Siang Ang1,4, Sean Fallon4,6, Masud Husain1,4,5*, Matthew A J Apps4,5,7*

1 Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK
2 New Zealand Brain Research Institute, Christchurch 8011, New Zealand
3 Department of Medicine, University of Otago, Christchurch 8011, NZ
4 Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, UK
5 Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford OX3 9DU, UK
6 Bristol Medical School, University of Bristol, Bristol BS8 1UD, UK
7 Centre for Human Brain Health, School of Psychology, University of Birmingham, UK

*These authors contributed equally
Correspondence to: Dr Campbell Le Heron, New Zealand Brain Research Institute,
Number of pages: 24
Number of figures: 4
Number of tables: 2
Abstract word count: 247
Introduction word count: 650
Discussion word count: 1500

The authors declare no competing financial interests.

ACKNOWLEDGEMENTS
This research was supported by a University of Oxford Christopher Welch Scholarship in Biological Sciences, a University of Oxford Clarendon Scholarship and a Green Templeton College Partnership award (C.L.H.); a Wellcome Trust Principal research fellowship to MH; the NIHR Oxford BRC (Biomedical Research Centre; MH); the Velux Foundation (MH); a BBSRC David Phillips Fellowship (BB/R010668/1) to MAJA.

ABSTRACT
The mesolimbic dopaminergic system exerts a crucial influence on incentive processing. However, the contribution of dopamine in dynamic, ecological situations where reward rates vary, and decisions evolve over time, remains unclear. In such circumstances, current (foreground) reward accrual needs to be compared continuously with potential rewards that could be obtained by travelling elsewhere (background reward rate), in order to determine the opportunity cost of staying versus leaving. We hypothesised that dopamine specifically modulates the influence of background – but not foreground – reward information when making a dynamic comparison of these variables for optimal behaviour. On a novel foraging task based on an ecological account of animal behaviour (marginal value theorem), human participants of either sex decided when to leave locations in situations where foreground rewards depleted at different rates, either in rich or poor environments with high or low background rates. In line with theoretical accounts, people’s decisions to move from current locations were independently modulated by changes in both foreground and background reward rates. Pharmacological manipulation of dopamine D2 receptor activity using the agonist cabergoline significantly affected decisions to move on, specifically modulating the effect of background reward rates. In particular, when on cabergoline, people left patches in poor environments much earlier. These results demonstrate a role of dopamine in signalling the opportunity cost of rewards, not value per se. Using this ecologically derived framework we uncover a specific mechanism by which D2 dopamine receptor activity modulates decision-making when foreground and background reward rates are dynamically compared.

Significance statement
Many decisions, across economic, political and social spheres, involve choices to “leave”. Such decisions depend on a continuous comparison of a current location’s value, with that of other locations you could move on to. However, how the brain makes such decisions is poorly understood. Here, we developed a computerized task, based around theories of how animals make decisions to move on when foraging for food. Healthy human participants had to decide when to leave collecting financial rewards in a location, and travel to collect rewards elsewhere. Using a pharmacological manipulation, we show that the activity of dopamine in the brain modulates decisions to move on, with people valuing other locations differently depending on their dopaminergic state.

We hypothesised that modulating dopamine levels would not alter the effect of changing patch-types on patch leaving, if manipulating tonic levels predominantly affects the processing of average reward rates. In line with this hypothesis, there was no significant drug × patch interaction (F(1,187) = 1.29, p = 0.26): cabergoline did not lead to a significant change in the way participants used foreground reward rate information to guide leaving decisions (Figure 3B). There was also no statistically significant difference in leaving times overall on drug compared to placebo (mean difference = 0.73 s, F(1,29) = 1.86, p = 0.18), nor did the reward rate at leaving vary as a function of drug state (mean difference = 0.39, t(28) = 0.8, p = 0.41). Furthermore, these results were consistent regardless of whether leaving time, or patch reward rate at leaving time, was analysed as the dependent variable (Table 2B).

As would also be predicted within MVT, there was no interaction in leaving times between patch-type and background reward rate. Moreover, the observed drug × background reward rate interaction was present across all patch types, with no 3-way interaction (F(1,186) = 0.31, p = 0.58). All of these results remain significant after controlling for weight, height and BMI. Although the experiment was designed to minimise the effects of any learning, the dopaminergic manipulations could in theory lead to differential learning effects between states, so we analysed the data from experiment two for session or order effects. The inclusion of session (1st or 2nd) worsened model fit (ΔBIC = 7.6), and the parameter estimate for the session effect was not significant (PE = -0.16, F(1,29) = 0.36, p = 0.56). Similarly, including order (the session × drug interaction) also worsened model fit (ΔBIC = 14.2) and again this term was not significant (PE = 0.87, F(1,29) = 1.4, p = 0.25). Therefore, session and order effects were not included in the final model. The inclusion of these effects did not change the significance (or otherwise) of the other model terms. There was also no evidence of a systematic shift in patch leaving behaviour across the course of each session as a function of drug state. We calculated, for each subject, the mean leaving time in the first and second half of each session (ON and OFF) for the two conditions with the highest number of trials (high yield patch in rich environment, and low yield patch in poor environment). Using this metric, the mean difference in leaving time across the experiment was not significantly different between the cabergoline and placebo conditions (mean difference (PLAC − CAB) = -0.4 s, t(28) = 0.78, p = 0.44).
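
The split-half check described above can be illustrated with a short sketch. This is not the authors' analysis code: the function names and the toy drift scores are hypothetical, and the sketch assumes one within-session drift score per participant and drug state, compared with a paired t statistic.

```python
from math import sqrt
from statistics import mean, stdev


def half_session_drift(leaving_times):
    """Mean leaving time in the second half of a session minus the first half."""
    mid = len(leaving_times) // 2
    return mean(leaving_times[mid:]) - mean(leaving_times[:mid])


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical per-participant drift scores in seconds, one list per drug state.
drift_placebo = [0.5, -0.2, 0.1, 0.4]
drift_cabergoline = [0.3, 0.1, -0.1, 0.6]
t_stat, dof = paired_t(drift_placebo, drift_cabergoline)
print(f"t({dof}) = {t_stat:.2f}")
```

A t statistic near zero would indicate that any drift in leaving time across a session is similar in the two drug states.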

Finally, to test whether learning during the task was influencing behaviour as a function of drug state, we fitted a model for the cabergoline data that included the previous trial outcome (reward obtained on the previous trial) as a predictor of patch leaving time on the subsequent trial, and compared this to our primary model that did not include this metric (Garrett and Daw, 2019). Although the parameter estimate for the term “Reward on Previous Trial” was significant (F(1,2776) = 12, p = 0.001), inclusion of this term led to a worsening of model fit, as measured by either the Bayesian Information Criterion (Δ = 50) or the Akaike Information Criterion (Δ = 2.1). To further investigate the potential for dopaminergic state to be changing task performance by an interaction with learning, we also calculated, for each participant, the difference in the parameter estimate for “reward on previous trial” between the ON and OFF states. We found no evidence of a systematic change based on previous trial (mean change (ON vs OFF) = -0.1, t(28) = 1.19, p = 0.25). These results suggest that participants’ behaviour was not systematically changing from one patch to the next based on the rewards received in the last patch as a function of dopamine state.
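
The information-criterion comparison used above can be sketched as follows. The snippet uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L; the log-likelihoods, parameter counts and number of observations are hypothetical placeholders, since how these were computed for the fitted models is not specified in this section.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: penalises parameters by log(n_obs)."""
    return n_params * log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the extended model adds one predictor (for example,
# reward on the previous trial) and gains only a little log-likelihood.
n_obs = 1000
primary = {"loglik": -500.0, "k": 3}
extended = {"loglik": -499.5, "k": 4}

delta_aic = aic(extended["loglik"], extended["k"]) - aic(primary["loglik"], primary["k"])
delta_bic = (bic(extended["loglik"], extended["k"], n_obs)
             - bic(primary["loglik"], primary["k"], n_obs))

# Positive deltas mean the extended model fits worse once complexity is penalised.
print(f"delta AIC = {delta_aic:.2f}, delta BIC = {delta_bic:.2f}")
```

Because BIC penalises each extra parameter by log(n) rather than 2, it is the stricter criterion for all but very small samples.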

As observed in Study 1, people showed a significant bias to remain in all patch-types longer than expected (Figure 3E-H). Again, this observation was not explained by predicting optimal behaviour based on the actually obtained long-run background reward rate (rather than the MVT-predicted optimum): on average, people still left 6.8 s and 7.3 s later than predicted from each patch-type in the ON and OFF states respectively (p < 0.0001 for each comparison; Figure 4B and C).
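
To make the notion of a "predicted" leaving time concrete, here is a minimal sketch of the MVT rule: leave when the foreground reward rate falls to the background reward rate. The exponential depletion form, the function names and all parameter values are assumptions made for illustration; the task's actual reward functions are not given in this section.

```python
from math import log


def mvt_leaving_time(initial_rate, decay, background_rate):
    """Time at which initial_rate * exp(-decay * t) equals background_rate.

    Assumes (hypothetically) exponential depletion of the foreground rate.
    Returns 0.0 if the patch never beats the background rate.
    """
    if background_rate >= initial_rate:
        return 0.0
    return log(initial_rate / background_rate) / decay


def lateness(observed_time, initial_rate, decay, background_rate):
    """Observed leaving time minus the MVT-predicted leaving time."""
    return observed_time - mvt_leaving_time(initial_rate, decay, background_rate)


# Hypothetical patch and environments: same patch, two background rates.
rich_t = mvt_leaving_time(initial_rate=10.0, decay=0.1, background_rate=6.0)
poor_t = mvt_leaving_time(initial_rate=10.0, decay=0.1, background_rate=3.0)
print(f"predicted leaving time: rich {rich_t:.1f} s, poor {poor_t:.1f} s")
```

Under these assumptions the same patch should be abandoned earlier in a rich environment than in a poor one, and a positive `lateness` corresponds to the overstaying bias reported above.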

Could participants be paying less attention when off medication? We analysed leaving time variability to examine whether participants’ decisions were noisier as a function of drug state. There was no significant difference in the variance of each participant’s decisions between placebo and cabergoline conditions (mean difference (PLAC − CAB) = 0.31, t(28) = 1.34, p = 0.19). Therefore cabergoline had a specific rather than general effect on patch leaving behaviour, altering only the influence of background reward rate on leaving time.

DISCUSSION
When to move on and leave a rewarding activity or location is an essential decision problem for animals and humans alike. Here, we show that humans – both young and old – make dynamic foraging decisions that, although not optimal, broadly conform to ecological principles captured by Marginal Value Theorem (MVT) (Charnov, 1976; Stephens and Krebs, 1986). Furthermore, dopaminergic D2 receptor activity may play a crucial role in modulating such decisions. Specifically, the findings support the view that dopamine plays an important role in signalling the average value of alternative locations, influencing dynamic decisions of when to move on. Administration of cabergoline altered the effect of background – but not foreground – reward rate changes on patch leaving times. In particular, this interaction between cabergoline and background reward rate was driven mainly by people leaving all patches in poor environments earlier.

The results provide new evidence for the role of dopamine in decision-making. Manipulation of dopamine levels modulated the influence of background reward rate on dynamic decisions about when to switch behaviour. Specifically, ON cabergoline people tended to leave all patch-types in the poor environment earlier than when OFF drug. In contrast, in the rich environment, there was a much smaller change in leaving times between the ON and OFF drug states. The drug manipulations used here putatively alter tonic dopamine levels (Brooks et al., 1998), a component of the dopaminergic neuromodulatory system which has been ascribed, in the context of motor responses, a role in signalling background reward rates (Niv et al., 2007; Hamid et al., 2016). Of course, in this study we were not able to measure firing rates of dopamine neurons, and the relationship between firing rates, dopamine availability and dopamine receptor activity is far from clear (Mohebi et al., 2019). Nevertheless, some existing evidence suggests tonic dopamine levels encode information about background reward rate, and therefore the opportunity cost (alternatives that are foregone) of chosen actions (Niv et al., 2007; Guitart-Masip et al., 2014).

Much of the previous research in this area has used bandit-type designs to better understand dopaminergic functions, which, although useful, may not always reflect real-world problems. Furthermore, in such experiments foreground and background reward rates can become correlated, such that the value of exploring alternatives is itself predictive of obtaining an immediate (foreground) reward (Daw et al., 2006; Kayser et al., 2015; Westbrook and Frank, 2018). However, in ecological settings, choices to “leave” a patch and explore are not choices between two stimuli with a predictive value, but instead involve travelling to obtain rewards elsewhere. Thus, rewards available in a patch can be orthogonal to the environment one is in. Using an MVT-inspired paradigm, we showed that D2 manipulation alters the influence of background reward rates on leaving decisions. This parallels results from a recent study which used a different patch-leaving design, administered to people with Parkinson’s disease ON and OFF their normal dopaminergic medications, to test a similar hypothesis (Constantino et al., 2017). Patients left patches at lower reward rate thresholds (i.e. stayed in patches for longer) when OFF medications, consistent with a lower estimation of the background reward rate in a dopamine-depleted state. In the current study we show such effects are specifically linked to D2 receptors in healthy people, suggesting D2-mediated pathways may be of particular importance for signalling such contextual reward information (Beaulieu and Gainetdinov, 2011). When D2 receptors were stimulated (ON state), people left patches earlier (at a higher foreground reward rate) in the poor environment, consistent with an increase in perceived richness of the environment. This effect was not observed in the rich environment, possibly because of a ceiling effect (D2 stimulation having a reduced effect on behaviour when reward rates were already high). Future research using multiple drug doses or multiple environments could investigate this issue further. Overall, whilst dopaminergic stimulation may increase the vigour of movements or exploratory binary choices, in more abstract, ecological decision settings it serves to increase the perceived environmental richness, setting a higher threshold reward rate for leaving.
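
The link between perceived environmental richness and earlier leaving can be written out in a few lines. This is an illustrative derivation under an assumed exponential depletion of the foreground reward rate; the symbols $r_0$ (initial patch rate), $k$ (depletion rate) and $\rho$ (perceived background rate) are introduced here and are not taken from the task description.

```latex
\begin{align}
  r(t) &= r_0 e^{-kt}, \qquad r_0 > \rho > 0,\; k > 0
    && \text{(assumed foreground reward rate)} \\
  r(t^*) &= \rho \;\Longrightarrow\; t^* = \frac{1}{k}\ln\frac{r_0}{\rho}
    && \text{(MVT rule: leave when foreground equals background)} \\
  \frac{\partial t^*}{\partial \rho} &= -\frac{1}{k\rho} < 0
    && \text{(higher perceived background rate, earlier leaving)}
\end{align}
```

Under this assumption, any manipulation that raises the perceived background rate $\rho$ shortens the predicted patch residency time $t^*$ and raises the foreground rate at which the patch is abandoned.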

Importantly, these results appear to be driven by changes in sensitivity to the background reward rate, rather than alternative explanations. Firstly, patch rewards were accrued continuously, rather than in stepped changes as used in previous studies (Hutchinson et al., 2008; Constantino and Daw, 2015). This approach has the advantage of minimising the use of simple heuristics to guide decisions while leaving the dependent variable approximately normally distributed. It should be noted though that mathematically MVT principles hold for both discrete and continuous patch-leaving designs. Secondly, variance in patch leaving times did not change as a function of drug state. This makes it unlikely the results can be explained by a confounding factor such as reduced attention. Thirdly, as participants were explicitly informed of the environment they were currently in, had experienced the different background reward rates in a training phase, and did not systematically alter patch leaving behaviour across the experiment, it is unlikely that the results could be explained by differences in learning as a function of drug. Finally, although the observed results could theoretically be explained by dopaminergic stimulation reducing subjects’ estimation of current patch reward rate (instead of altering background reward rate appraisal), the lack of a main effect of dopamine on leaving time and the absence of change in behaviour in patches embedded within the rich environment make this unlikely.

One further possible interpretation is that differential learning of background reward rates within the training phase, ON vs OFF drug, could have influenced subsequent patch-leaving behaviour. Dopamine has a long history of being linked to learning through reward prediction errors (Frank et al., 2004; Daw and Doya, 2006; Schultz, 2007). In this study, we controlled for such effects by showing the absence of order effects in the behavioural data, by explicitly training participants on the environmental richness, and by informing them of the current environment at all times while in patches. However, it is plausible that participants could have been poorer at learning the average reward value in each environment when OFF drug in the training session, due to changes in how prediction errors were signalled. Such an effect seems unlikely given the absence of an effect of cabergoline on rich environment leaving times, suggesting participants were able to learn the rich environment reward rates ON or OFF the drug. However, even if driven by a failure to learn, our results show the consequences: poor environments are treated as richer, leading to reduced patch residency times when ON cabergoline. Furthermore, although not the focus of the current manuscript, the overlapping and dissociable effects of the dopaminergic system on both reward motivated behaviour and learning are an evolving research area (Cools et al., 2011; Berke, 2018), one that foraging-style tasks may be particularly well placed to advance (Constantino and Daw, 2015).

Our results highlight that human foraging behaviour broadly conforms to the principles of MVT, although it is sub-optimal (Charnov, 1976; Pearson et al., 2014). This accords with earlier field work in the behavioural ecology (Stephens and Krebs, 1986; Pearson et al., 2014) and anthropology (Smith et al., 1983; Metcalfe and Barlow, 1992) literatures, and with more recent work beginning to explore the neural basis of such decisions (Hayden et al., 2011; Kolling et al., 2012; Constantino and Daw, 2015). In the current study, the use of a foraging framework informed by MVT enabled us to dissociate the effects of reward rates on different time scales, which are often correlated in reinforcement-learning based manipulations of average reward rates (Niv et al., 2007; Mobbs et al., 2018). Specifically, it allowed us to examine whether dopaminergic modulations impacted one or both reward components, with our results showing an effect of cabergoline only on the background rate.

From a clinical perspective these findings may be significant when considering mechanisms underlying common disorders of motivated behaviour, such as apathy (Le Heron et al., 2019). Apathy is often associated with disruption of mesolimbic dopaminergic systems (Santangelo et al., 2015), and, at least in some cases, can be improved with D2/D3 receptor agonists (Adam et al., 2013; Thobois et al., 2013). Accumulating evidence demonstrates altered reward processing in patients with apathy (Strauss et al., 2014; Le Heron et al., 2018a), and it is plausible – although as yet untested – that chronic underestimation of background environment reward leads to a state where it is never “worth switching” from a current activity, even if this activity is very minimal. Future work could profitably explore this hypothesis.

Recent theoretical accounts of decision-making have called for a shift to more ecologically derived experiments to investigate the mechanisms of this fundamental neural process (Pearson et al., 2014; Mobbs et al., 2018). The current results highlight the utility of such an approach, demonstrating a role for D2 activity in signalling the average background reward rate during foraging. This work links basic ecological models of animal behaviour to a mechanistic understanding of human decision making, highlighting the specific influence of dopaminergic systems as people decide when to move on as they pursue rewards in their environment.

AUTHOR CONTRIBUTIONS
CLH, NK, MH and MAJA designed the study; CLH, NK and MAJA coded the experiment; CLH, OP, AK, RJ and YA collected data; CLH, NK, SF and MAJA analysed data; CLH, NK, SF, MH and MAJA wrote the paper.

DECLARATION OF INTERESTS
We declare no conflicts of interest.

REFERENCES

Adam R, Leff A, Sinha N, Turner C, Bays P, Draganski B, Husain M (2013) Dopamine reverses reward insensitivity in apathy following globus pallidus lesions. Cortex 49:1292–1303.
Barr DJ, Levy R, Scheepers C, Tily HJ (2013) Random effects structure for confirmatory hypothesis testing: Keep it maximal. J Mem Lang 68.
Bateson M, Kacelnik A (1996) Rate currencies and the foraging starling: the fallacy of the averages revisited. Behav Ecol 7.
Beaulieu J-M, Gainetdinov RR (2011) The Physiology, Signaling, and Pharmacology of Dopamine Receptors.
Beierholm U, Guitart-Masip M, Economides M, Chowdhury R, Düzel E, Dolan R, Dayan P