Social Reinforcement Learning and its Neural Modulation by Oxytocin …wgas-autismus.org/wp-content/uploads/2015/05/JanaKruppa_Poster8... · Social Reinforcement Learning and its


Social Reinforcement Learning and its Neural Modulation by Oxytocin in Autism Spectrum Disorder

Kruppa, JA (1,2,3); Gossen, A (1,2,3); Großheinrich, N (1,3); Schopf, H (2); Kohls, G (1); Fink, GR (2); Herpertz-Dahlmann, B (4); Konrad, K (1,2); Schulte-Rüther, M (1,2,3)

1 Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany
2 Institute of Neuroscience and Medicine (INM-3), Jülich Research Center, Germany
3 Translational Brain Research in Psychiatry and Neurology, Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany
4 Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany

Introduction BACKGROUND

OBJECTIVES

PARTICIPANTS

METHODS

BEHAVIORAL RESULTS (PRELIMINARY)

CONCLUSION/DISCUSSION

References

Contact: [email protected]

A modified version of a social reinforcement learning task⁵ was used, in which participants decided whether a stimulus belonged to category A or B. After a correct choice, the feedback was rewarding with a probability of 75% and neutral with a probability of 25%; after an incorrect choice, it was neutral with a probability of 75% and rewarding with a probability of 25%. Conditions: (1) NN – neutral cue & neutral feedback, (2) NS – neutral cue & social feedback, (3) SN – social cue & neutral feedback.
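The probabilistic feedback schedule described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the task code used in the study: feedback is encoded as 1 (rewarding) or 0 (neutral), matching the reward values r_pos = 1 and r_neut = 0 used in the model, and the function names are hypothetical.

```python
import random

# Feedback probabilities as stated in the task description.
P_REWARD_IF_CORRECT = 0.75    # otherwise neutral (25%)
P_REWARD_IF_INCORRECT = 0.25  # otherwise neutral (75%)


def feedback(choice: str, correct_category: str, rng: random.Random) -> int:
    """Return 1 for rewarding feedback and 0 for neutral feedback on a single trial."""
    p_reward = P_REWARD_IF_CORRECT if choice == correct_category else P_REWARD_IF_INCORRECT
    return 1 if rng.random() < p_reward else 0


def reward_rate(choice: str, correct_category: str, n_trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often a fixed choice is followed by rewarding feedback."""
    rng = random.Random(seed)
    return sum(feedback(choice, correct_category, rng) for _ in range(n_trials)) / n_trials


if __name__ == "__main__":
    print(f"always correct:   {reward_rate('A', 'A'):.3f}")  # close to 0.75
    print(f"always incorrect: {reward_rate('B', 'A'):.3f}")  # close to 0.25
```

A participant who always picks the correct category still receives neutral feedback on about a quarter of trials, so the correct mapping has to be inferred across trials rather than from single outcomes.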

Reward Learning Model (Q-learning)⁶: On each trial $t$, subjects make a choice $c_t$ (button A or B). An expected outcome value is defined for each cue-response combination, $Q_t(A)$ and $Q_t(B)$; all values are initialized to 0. On each trial, the value of the chosen option is updated as

$Q_{t+1}(c_t) = Q_t(c_t) + \alpha \, \delta_t$, with $\delta_t = r_t - Q_t(c_t)$,

where $0 \le \alpha \le 1$ is a free learning-rate parameter, $\delta_t$ is the prediction error, and $r_t$ is the value of the reward obtained on trial $t$.

Assuming that subjects choose probabilistically between A and B on the basis of the acquired Q-values according to a softmax distribution, the learning rate can be estimated for each subject from the behavioral choice data by maximum likelihood estimation. In addition to $\alpha$, the initial Q-values and the reward values for positive reinforcement and for neutral feedback could be estimated from the data, but were kept constant here ($Q_{init} = 0$, $r_{pos} = 1$, $r_{neut} = 0$).
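A minimal sketch of the Q-learning model and the maximum-likelihood fit of α follows. Several details are assumptions not stated on the poster: the softmax inverse temperature is fixed at a hypothetical value (BETA = 5.0), α is fitted by a simple grid search over [0, 1] rather than a numerical optimizer, and the `simulate` helper uses a hypothetical one-cue version of the task in which A is the correct category. All names are illustrative.

```python
import math
import random
from typing import Dict, List, Tuple

# One trial: (cue identifier, choice "A" or "B", obtained reward r_t).
Trial = Tuple[str, str, float]

BETA = 5.0  # hypothetical fixed softmax inverse temperature (not reported on the poster)
Q_INIT, R_POS, R_NEUT = 0.0, 1.0, 0.0  # constants as stated in the text


def p_choose(q_chosen: float, q_other: float, beta: float = BETA) -> float:
    """Two-option softmax: probability of the chosen option given both Q-values."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def neg_log_likelihood(alpha: float, trials: List[Trial], beta: float = BETA) -> float:
    """Negative log-likelihood of the observed choices under Q-learning + softmax."""
    q: Dict[Tuple[str, str], float] = {}  # Q-value per cue-response combination
    nll = 0.0
    for cue, choice, reward in trials:
        other = "B" if choice == "A" else "A"
        q_chosen = q.get((cue, choice), Q_INIT)
        q_other = q.get((cue, other), Q_INIT)
        nll -= math.log(p_choose(q_chosen, q_other, beta))
        # Prediction error delta_t = r_t - Q_t(c_t); only the chosen value is updated.
        delta = reward - q_chosen
        q[(cue, choice)] = q_chosen + alpha * delta
    return nll


def fit_alpha(trials: List[Trial], steps: int = 101) -> float:
    """Maximum-likelihood estimate of alpha by grid search over [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda a: neg_log_likelihood(a, trials))


def simulate(alpha: float, n_trials: int = 400, seed: int = 1) -> List[Trial]:
    """Simulate a Q-learner on a hypothetical one-cue task in which A is correct."""
    rng = random.Random(seed)
    q = {"A": Q_INIT, "B": Q_INIT}
    trials: List[Trial] = []
    for _ in range(n_trials):
        choice = "A" if rng.random() < p_choose(q["A"], q["B"]) else "B"
        # 75% rewarding feedback after a correct choice, 25% after an incorrect one.
        p_reward = 0.75 if choice == "A" else 0.25
        reward = R_POS if rng.random() < p_reward else R_NEUT
        q[choice] += alpha * (reward - q[choice])
        trials.append(("cue1", choice, reward))
    return trials


if __name__ == "__main__":
    data = simulate(alpha=0.3)
    print("estimated alpha:", fit_alpha(data))
```

Fitting data simulated with a known α (a parameter-recovery check) is a quick way to see whether α is identifiable under a given fixed temperature; with more trials, the estimate is expected to move closer to the generating value.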

Group   N    Mean age (years)   Age range (years)   Mean IQ
TDC     22   22.00              18–25               120.64
ASD     11   21.27              18–25               113.27

TDC = typically developing controls; ASD = autism spectrum disorder.

Currently, no pharmacological treatment is available for the core social symptoms of autism spectrum disorder (ASD). The treatments of choice are behavioral interventions, mostly based on operant reinforcement learning¹,². However, such training is very time-consuming, is usually not covered by the health insurance system, and yields only modest treatment effects. Recently, the neuropeptide oxytocin (OXT) has been shown to enhance motivation and attention to social stimuli by modulating the brain reward circuitry in social situations³. These effects could plausibly enhance social reinforcement learning, the core mechanism of behavioral interventions. Adding OXT to behavioral interventions may therefore prove a fruitful combination for treating the social symptoms of ASD⁴. To date, no functional imaging study has explicitly investigated the influence of OXT on socially reinforced learning.

1 Virues-Ortega J (2010). Applied behavior analytic intervention for autism in early childhood: meta-analysis, meta-regression and dose-response meta-analysis of multiple outcomes. Clinical Psychology Review, 30(4): 387–399.
2 Williams White S, Keonig K, & Scahill L (2007). Social skills development in children with autism spectrum disorders: a review of the intervention research. Journal of Autism and Developmental Disorders, 37(10): 1858–1868.
3 Gamer M (2010). Does the amygdala mediate oxytocin effects on socially reinforced learning? The Journal of Neuroscience, 30(28): 9347–9348.
4 Bartz JA, Zaki J, Bolger N, & Ochsner KN (2011). Social effects of oxytocin in humans: context and person matter. Trends in Cognitive Sciences, 15(7): 301–309.
5 Hurlemann R, Patin A, Onur O, et al. (2010). Oxytocin enhances amygdala-dependent, socially reinforced learning and emotional empathy in humans. The Journal of Neuroscience, 30(14): 4999–5007.
6 Daw N (2011). Trial-by-trial data analysis using computational models. In Delgado MR, Phelps EA, & Robbins TW (Eds.), Decision making, affect, and learning: Attention and Performance XXIII (pp. 3–38). Oxford, UK: Oxford University Press.
7 Haruno M & Kawato M (2006). Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Networks, 19(8): 1242–1254.

• What are the behavioral and neural correlates of OXT-induced enhancement of socially reinforced learning in TDC and ASD?
  – Difference between the percentage of correct responses at the beginning and at the end of the task (improvement in performance)
  – OXT-induced enhancement of socially reinforced learning, as reflected by brain activation in the ventral striatum (VS)
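The behavioral outcome named above can be computed per participant and condition roughly as follows. Splitting the trials into eight consecutive learning intervals follows the x-axis of the learning-curve plot in the behavioral results; the binning rule itself, like the function names, is an assumption, since the poster does not state how the intervals were formed.

```python
from typing import List, Sequence


def percent_correct_by_interval(correct: Sequence[bool], n_intervals: int = 8) -> List[float]:
    """Split trials into consecutive, near-equal intervals and return % correct in each."""
    n = len(correct)
    if n < n_intervals:
        raise ValueError("need at least one trial per interval")
    # Integer bin edges; every interval receives at least one trial.
    edges = [i * n // n_intervals for i in range(n_intervals + 1)]
    return [100.0 * sum(correct[lo:hi]) / (hi - lo) for lo, hi in zip(edges, edges[1:])]


def improvement(correct: Sequence[bool], n_intervals: int = 8) -> float:
    """Improvement in performance: % correct in the last interval minus the first."""
    pct = percent_correct_by_interval(correct, n_intervals)
    return pct[-1] - pct[0]


if __name__ == "__main__":
    # Hypothetical 16-trial sequence of correct (True) / incorrect (False) responses.
    responses = [False, True, False, False, True, True, False, True,
                 True, True, False, True, True, True, True, True]
    print(percent_correct_by_interval(responses))
    print(improvement(responses))
```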

 

                                             


             c.    Improvement  in  Performance  

         a.  TDC  Group  –  Placebo    

Behavioral level:
• TDC subjects perform better than ASD subjects across tasks and across the OXT and PLC conditions.
• The ASD group benefits from OXT in the SN task: participants perform better under OXT than under PLC and reach a learning level comparable to that of the TDC group.
  – Attention to social stimuli seems to be enhanced in ASD under OXT³.
• All subjects show an improvement in performance over the course of the task.
  – In the SN task, ASD subjects perform worse than TDC subjects during the early stages of learning, particularly under PLC, but show comparable performance under OXT.

Neural level:
• Preliminary analyses of the healthy control group showed a correlation between brain activity in the ventral striatum and the reward prediction error in both the PLC and the OXT condition, but no difference between the two conditions.
• Estimation of model parameters may need to be optimized to provide higher sensitivity (e.g., simultaneous estimation of further parameters).
• More extensive analyses including the data of the ASD group will follow.

[Figure: Behavioral results. Bar charts of AUC per task (NN, NS, SN) for ASD_PLC and ASD_OXT, and for TDC_PLC, TDC_OXT, ASD_PLC, and ASD_OXT; line plot of percentage correct hits (%) across learning intervals 1–8 in the SN task for TDC_PLC_SN, TDC_OXT_SN, ASD_OXT_SN, and ASD_PLC_SN.]

IMAGING RESULTS (PRELIMINARY)

Each participant took part in two measurements on two consecutive days. A double-blind, within-subjects cross-over design was used to detect OXT-induced differences relative to the placebo (PLC) condition. Each measurement consisted of (1) intranasal administration of OXT or PLC, (2) an fMRI scan, and (3) neuropsychological assessments. Blood samples were drawn twice at predetermined time points.

Overview of the study protocol. B1–B2: blood draws at predetermined time points. The dashed line illustrates the accumulation of OXT within the brain.

a. Main effect of group: F(1) = 5.25, p = .03, ηp² = 0.15
b. Condition × task interaction: F(2, 9) = 3.73, p = .07, ηp² = 0.45 (marginally significant)
c. Main effect of interval: F(7, 25) = 26.33, p < .001, ηp² = 0.88
   Main effect of group: F(1) = 3.23, p = .08, ηp² = 0.09 (marginally significant)
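As a consistency check, partial eta squared can be recovered from an F statistic and its degrees of freedom, $\eta_p^2 = \frac{F \cdot df_{effect}}{F \cdot df_{effect} + df_{error}}$. The sketch applies this to the two tests above that report both degrees of freedom; the group effects are reported as F(1) only, so their error degrees of freedom are not reproduced here. This checks the reported numbers and is not a reanalysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Tests above for which both degrees of freedom are reported:
# label -> (F, df_effect, df_error, reported partial eta squared)
REPORTED = {
    "condition x task, F(2, 9)": (3.73, 2, 9, 0.45),
    "interval, F(7, 25)": (26.33, 7, 25, 0.88),
}

if __name__ == "__main__":
    for label, (f_value, df1, df2, eta_reported) in REPORTED.items():
        eta = partial_eta_squared(f_value, df1, df2)
        print(f"{label}: computed {eta:.3f}, reported {eta_reported:.2f}")
```

Both computed values round to the reported ones (0.45 and 0.88).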

   

b. TDC Group – Oxytocin

a. ASD vs. TDC
b. ASD Group

Data analysis: For each participant, the learning rate was estimated from the behavioral data. The resulting individual learning models were used to calculate trial-wise reward prediction error values. Neural data were preprocessed and statistically analyzed with SPM8. For first-level analyses, cue and response events were modeled separately using stick functions convolved with the hemodynamic response function. Response events were parametrically modulated by the trial-wise individual reward prediction error values. Beta values representing this modulation were taken to the second level, with all conditions modeled separately in a flexible factorial ANOVA model (random-effects analysis). A region of interest (ROI) in the ventral striatum was defined functionally, using a 10 mm sphere around coordinates adopted from Haruno & Kawato (2006)⁷.
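The first-level design described above can be illustrated with NumPy. The sketch builds an unmodulated response regressor and a parametric regressor scaled by trial-wise reward prediction errors, both convolved with a double-gamma HRF (shape parameters 6 and 16, undershoot ratio 1/6) that approximates, but is not identical to, SPM8's canonical HRF. The modulator is mean-centered here as a simple stand-in for SPM's own orthogonalization. The `sphere_mask` helper shows a 10 mm spherical ROI in voxel space; its center and voxel size are hypothetical, and the Haruno & Kawato coordinates would first have to be mapped to voxel indices with the image affine. None of this reproduces SPM8 itself.

```python
import math

import numpy as np


def double_gamma_hrf(dt: float, length_s: float = 32.0) -> np.ndarray:
    """Double-gamma HRF sampled every dt seconds, normalized to unit sum.

    Approximates SPM's canonical HRF: gamma pdf with shape 6 (peak) minus
    1/6 of a gamma pdf with shape 16 (undershoot), both with scale 1 s.
    """
    t = np.arange(0.0, length_s, dt)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def event_regressors(onsets, modulator, n_scans: int, tr: float, dt: float = 0.1):
    """Return (main, parametric) regressors sampled once per scan.

    main:       stick functions at the event onsets, convolved with the HRF.
    parametric: the same sticks scaled by the mean-centered modulator
                (here: trial-wise reward prediction errors), convolved likewise.
    """
    n_fine = int(round(n_scans * tr / dt))
    sticks = np.zeros(n_fine)
    scaled = np.zeros(n_fine)
    values = np.asarray(modulator, dtype=float)
    centered = values - values.mean()
    for onset, value in zip(onsets, centered):
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
            scaled[idx] += value
    hrf = double_gamma_hrf(dt)
    main = np.convolve(sticks, hrf)[:n_fine]
    parametric = np.convolve(scaled, hrf)[:n_fine]
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return main[scan_idx], parametric[scan_idx]


def sphere_mask(shape, center_vox, radius_mm: float, voxel_size_mm) -> np.ndarray:
    """Boolean mask of voxels whose centers lie within radius_mm of center_vox."""
    idx = np.indices(shape).astype(float)
    dist_sq = sum(((idx[d] - center_vox[d]) * voxel_size_mm[d]) ** 2 for d in range(3))
    return dist_sq <= radius_mm ** 2


if __name__ == "__main__":
    # Hypothetical run: 30 scans, TR = 2 s, three response onsets with their RPEs.
    main, pmod = event_regressors([10.0, 30.0, 44.0], [0.8, -0.2, 0.5], n_scans=30, tr=2.0)
    roi = sphere_mask((21, 21, 21), (10, 10, 10), radius_mm=10.0, voxel_size_mm=(2.0, 2.0, 2.0))
    print(main.round(3), pmod.round(3), int(roi.sum()))
```

Cue events would get their own unmodulated regressor built the same way, so that cue-related and prediction-error-related variance are modeled separately.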

Significant correlation of brain activation in the ventral striatum with the reward prediction error signal from the individual Q-learning models (p < .001, uncorrected).