Social Reinforcement Learning and its Neural Modulation by Oxytocin in Autism Spectrum Disorder

Kruppa, JA 1,2,3; Gossen, A 1,2,3; Großheinrich, N 1,3; Schopf, H 2; Kohls, G 1; Fink, GR 2; Herpertz-Dahlmann, B 4; Konrad, K 1,2; Schulte-Rüther, M 1,2,3

1 Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany
2 Institute of Neuroscience and Medicine (INM-3), Jülich Research Center, Germany
3 Translational Brain Research in Psychiatry and Neurology, Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany
4 Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany

Contact: [email protected]

METHODS

A modified social reinforcement learning task [5] was used, in which participants decided whether each stimulus belonged to category A or B. After a correct choice, feedback was rewarding with a probability of 75% and neutral with a probability of 25%. After an incorrect choice, feedback was neutral with a probability of 75% and rewarding with a probability of 25%.

Conditions: (1) NN – neutral cue & neutral feedback, (2) NS – neutral cue & social feedback, (3) SN – social cue & neutral feedback

Reward Learning Model (Q-learning) [6]: On each trial $t$, subjects make a choice $c_t$ (button A or B). An expected outcome value is defined for each cue-response combination, $Q_t(A)$ and $Q_t(B)$, and is initialized to 0 for all possible cue-response combinations. On each trial, the Q-value of the chosen response is updated as follows:

$$Q_{t+1}(c_t) = Q_t(c_t) + \alpha \cdot \delta_t,$$

where $0 \le \alpha \le 1$ is a free learning rate parameter and $\delta_t = r_t - Q_t(c_t)$ is the prediction error. $r_t$ reflects the value of the reward obtained on trial $t$.
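The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name `q_update`, the three-trial sequence, and the learning rate of 0.5 are all hypothetical, and rewards are coded as 1 (rewarding feedback) and 0 (neutral feedback).

```python
def q_update(q, choice, reward, alpha):
    """Apply one Q-learning step to the chosen response.

    q      : dict mapping each response ("A", "B") to its current Q-value
    choice : the response made on this trial
    reward : value of the obtained feedback (1 = rewarding, 0 = neutral)
    alpha  : learning rate, 0 <= alpha <= 1
    Returns the updated Q-values and the prediction error delta.
    """
    delta = reward - q[choice]            # prediction error for this trial
    updated = dict(q)                     # copy, so the input is left untouched
    updated[choice] = q[choice] + alpha * delta
    return updated, delta


# Hypothetical three-trial sequence for a single cue (alpha = 0.5 is illustrative).
q = {"A": 0.0, "B": 0.0}
for choice, reward in [("A", 1), ("A", 1), ("B", 0)]:
    q, delta = q_update(q, choice, reward, alpha=0.5)
    print(choice, reward, delta, q)
```

Only the Q-value of the chosen response changes on a given trial; the value of the unchosen response is carried forward unchanged.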
Assuming that subjects choose probabilistically between A and B based on the acquired Q-values according to a softmax distribution, the learning parameter α can be estimated individually for each subject from the behavioral choice data using maximum likelihood estimation. In addition to α, the initial Q-values and the reward values for positive reinforcement and for neutral feedback could be estimated from the data, but were kept constant here (Q_init = 0, r_pos = 1, r_neut = 0).

PARTICIPANTS

Group   N    Mean Age   Age Range   IQ
TDC     22   22.00      18-25       120.64
ASD     11   21.27      18-25       113.27

BACKGROUND

Currently, no pharmacological treatment of the core social symptoms of autism spectrum disorder (ASD) is available. Treatments of choice are behavioral interventions, mostly based on operant reinforcement learning [1,2]. However, training is very time-consuming, is usually not covered by the health insurance system, and yields only modest treatment effects. Recently, the neuropeptide oxytocin (OXT) has been shown to enhance motivation and attention to social stimuli by modulating the brain reward circuitry in social situations [3]. These effects could plausibly enhance social reinforcement learning, the core mechanism of behavioral interventions. Adding OXT to behavioral interventions may therefore prove a fruitful combination for treating the social symptoms of ASD [4]. No functional imaging studies are available that explicitly investigated the influence of OXT on socially reinforced learning.

References

[1] Virues-Ortega J (2010). Applied behavior analytic intervention for autism in early childhood: meta-analysis, meta-regression and dose-response meta-analysis of multiple outcomes. Clinical Psychology Review, 30(4): 38768.
[2] Williams White S, Keonig K, & Scahill L (2007). Social skills development in children with autism spectrum disorders: a review of the intervention research. Journal of Autism and Developmental Disorders, 37(10): 1858-68.
[3] Gamer M (2010).
Does the amygdala mediate oxytocin effects on socially reinforced learning? The Journal of Neuroscience: the official journal of the Society for Neuroscience, 30(28): 9347-8.
[4] Bartz JA, Zaki J, Bolger N, & Ochsner KN (2011). Social effects of oxytocin in humans: context and person matter. Trends in cognitive neurosciences, 15(7): 201-309.
[5] Hurlemann R, Patin A, Onur O, et al. (2010). Oxytocin enhances amygdala-dependent, socially reinforced learning and emotional empathy in humans. The Journal of Neuroscience: the official journal of the Society for Neuroscience, 30(14): 4999-5007.
[6] Daw N (2011). Trial-by-trial data analysis using computational models. In Delgado MR, Phelps EA, & Robbins TW (Eds.), Decision making, affect, and learning: Attention and Performance XXIII (pp. 3-38). Oxford, UK: Oxford University Press.
[7] Haruno M & Kawato M (2006). Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Netw. 19(8): 1242-54.

OBJECTIVES

What are the behavioral and neural correlates of OXT-induced enhancement of socially reinforced learning in TDC and ASD?
• Difference between percentage correct responses at the beginning and end of the task (improvement in performance)
• OXT-induced enhancement of socially reinforced learning reflected by brain activation in the ventral striatum (VS)

c. Improvement in Performance
a.
TDC Group – Placebo

CONCLUSION/DISCUSSION

Behavioral level:
• TDC subjects perform better than ASD subjects across tasks and OXT/PLC conditions.
• The ASD group benefits from OXT in the SN task: participants perform better under OXT than under PLC and reach a learning level comparable to that of the TDC group.
  • Attention to social stimuli seems to be enhanced in ASD under OXT [3].
• All subjects improve in performance over the course of the task.
  • In the SN task, ASD subjects perform worse than TDC subjects during early stages of learning, particularly under PLC, but show comparable performance under OXT.

Neural level:
• Preliminary analyses of the healthy control group showed a correlation of brain activity in the ventral striatum with the reward prediction error in both the PLC and OXT conditions, but no difference between the two conditions.
• Estimation of model parameters may need to be optimized to provide higher sensitivity (e.g., simultaneous estimation of further parameters).
• More extensive analyses including data of the ASD group will follow.

METHODS

Each participant took part in two measurements on two consecutive days. A double-blind within-subjects crossover design was used to detect OXT-induced differences in comparison to the PLC condition. Each measurement consisted of (1) nasal administration of OXT/PLC, (2) fMRI scan, and (3) neuropsychological assessments. Blood samples were drawn twice at predetermined time points.

Overview of the study protocol. B1-B2: blood draws at predetermined time points. Dashed line illustrates the accumulation of OXT concentration within the brain.

BEHAVIORAL RESULTS (PRELIMINARY)

[Figure: AUC by task (NN, NS, SN) for ASD_PLC and ASD_OXT]
[Figure: percentage correct hits (%) by learning interval (1-8) for TDC_PLC_SN, TDC_OXT_SN, ASD_OXT_SN, ASD_PLC_SN]
[Figure: AUC by task (NN, NS, SN) for TDC_PLC, TDC_OXT, ASD_PLC, ASD_OXT]

a. Main effect of group: F(1) = 5.25, p = 0.03, ηp² = 0.15
b.
Condition*task interaction: F(2,9) = 3.73, p = .07, ηp² = 0.45 (marginally significant)
c. Main effect of Interval: F(7,25) = 26.33, p < .001, ηp² = 0.88
Main effect of group: F(1) = 3.23, p = 0.08, ηp² = 0.09 (marginally significant)

a. ASD vs. TDC
b. ASD Group

Data analysis: For each participant, the learning parameter was estimated from the behavioral data. The resulting individual learning models were used to calculate trial-wise reward prediction error values. Neural data were preprocessed and statistically analyzed with SPM8. For first-level analyses, cue and response events were modeled separately using stick functions convolved with the haemodynamic response function. Response events were parametrically modulated by the trial-wise individual reward prediction error values. Beta values representing this modulation were taken to the second level, with all conditions modeled separately in a flexible factorial ANOVA model (random-effects analysis). A ROI of the ventral striatum was defined functionally, using a 10 mm sphere around coordinates adopted from Haruno & Kawato (2006) [7].

IMAGING RESULTS (PRELIMINARY)

b. TDC Group – Oxytocin

Significant correlation of brain activation in the ventral striatum with the reward prediction error signal from individual Q-learning models (p < .001, uncorrected).
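The estimation step described under Data analysis (fitting the learning parameter per participant, then deriving trial-wise reward prediction errors) can be sketched as follows. This is a minimal sketch under several assumptions, not the study's implementation: a simple grid search stands in for whatever optimizer was actually used, the softmax inverse temperature `beta` is fixed at an arbitrary value because the poster does not report one, only a single cue is modeled, and the function names and example data are hypothetical.

```python
import math


def neg_log_likelihood(alpha, choices, rewards, beta=3.0):
    """Negative log-likelihood of a choice sequence under Q-learning + softmax.

    Also returns the trial-wise prediction errors. beta (inverse temperature)
    is an assumed fixed value; the source does not report one.
    """
    q = {"A": 0.0, "B": 0.0}              # Q_init = 0, as stated in the source
    nll, deltas = 0.0, []
    for c, r in zip(choices, rewards):
        other = "B" if c == "A" else "A"
        # Softmax probability of the observed choice (two-option case).
        p_c = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[other])))
        nll -= math.log(p_c)
        delta = r - q[c]                  # reward prediction error
        q[c] += alpha * delta             # Q-learning update of the chosen response
        deltas.append(delta)
    return nll, deltas


def fit_alpha(choices, rewards, steps=101):
    """Grid-search maximum likelihood estimate of alpha on [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda a: neg_log_likelihood(a, choices, rewards)[0])


# Hypothetical data for one cue: responses made and feedback received
# (1 = rewarding, 0 = neutral, matching r_pos = 1 and r_neut = 0).
choices = ["A", "B", "A", "A", "B", "A", "A", "A"]
rewards = [1, 0, 1, 0, 0, 1, 1, 1]

alpha_hat = fit_alpha(choices, rewards)
_, prediction_errors = neg_log_likelihood(alpha_hat, choices, rewards)
print(alpha_hat, [round(d, 2) for d in prediction_errors])
```

The resulting `prediction_errors` list has one value per trial, which is the form a parametric modulator for the response events would take. Estimating `beta` jointly with the learning parameter would be one instance of the "simultaneous estimation of further parameters" mentioned in the discussion.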