Dopamine, Uncertainty and TD Learning CoSyNe’04 Yael Niv Michael Duff Peter Dayan

Dec 21, 2015

Page 1: Dopamine, Uncertainty and TD Learning CoSyNe’04 Yael Niv Michael Duff Peter Dayan.

Dopamine, Uncertainty and TD Learning

CoSyNe’04

Yael Niv, Michael Duff, Peter Dayan

Page 2

What does Dopamine encode?

• Important neuromodulator
  - Neurological/psychiatric disorders
  - Drug addiction/self stimulation

• Fundamental role in RL
  - Classical/Pavlovian conditioning
  - Instrumental/operant conditioning

• DA neurons respond to:
  - Unexpected (appetitive) rewards
  - Stimuli predicting (appetitive) rewards
  - Withdrawal of expected rewards
  - Novel/salient stimuli

Page 3

What does Dopamine encode?

DA represents some aspect of reward, but not rewards as such.

Page 4

The TD Hypothesis of Dopamine

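$V(t) = E\left[\sum_{\tau > t} r(\tau)\right]$

$V(t) = E\left[r(t+1) + V(t+1)\right]$

$\delta(t) = r(t+1) + \hat{V}(t+1) - \hat{V}(t)$

$\Delta\hat{V}(t) \propto \delta(t)$

DA encodes the reward prediction error $\delta(t)$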

[Figure: three panels, each marking Stimulus and Reward times, comparing DA activity with δ(t)]

Precise theory for the generation of DA firing patterns

Compelling account for the role of DA in classical conditioning
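As a minimal sketch of the prediction error that the TD hypothesis assigns to DA, the snippet below evaluates an undiscounted error of the form δ(t) = r(t+1) + V(t+1) − V(t) on a toy five-step trial. The function name `td_errors`, the five-step layout, and the hand-set value vectors are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def td_errors(rewards, values):
    """Undiscounted TD error: delta(t) = r(t+1) + V(t+1) - V(t).

    rewards[t] is the reward delivered on leaving step t, i.e. r(t+1);
    the value after the last step is taken to be 0 (end of trial).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    next_values = np.append(values[1:], 0.0)  # V(t+1), zero past the end
    return rewards + next_values - values

# Assumed toy trial: step 0 is the pre-stimulus state, the stimulus appears
# at step 1, and the reward arrives on leaving the last step.
reward_given = [0, 0, 0, 0, 1]
reward_omitted = [0, 0, 0, 0, 0]
naive_values = [0, 0, 0, 0, 0]    # before learning: nothing is predicted
learned_values = [0, 1, 1, 1, 1]  # after learning: the stimulus predicts reward

print(td_errors(reward_given, naive_values))      # positive error at the reward
print(td_errors(reward_given, learned_values))    # error moves to stimulus onset
print(td_errors(reward_omitted, learned_values))  # negative error at the omission
```

The three calls correspond to an unexpected reward, a fully predicted reward, and the withdrawal of an expected reward.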

Page 5

But: Fiorillo, Tobler & Schultz 2003

• Introduce inherent uncertainty into the classical conditioning paradigm

• Five visual stimuli indicating different reward probabilities: P = 0, ¼, ½, ¾, 1

CS = 2 sec visual stimulus

US (probabilistic) = drops of juice

Page 6

Fiorillo, Tobler & Schultz 2003

• At stimulus time: DA represents mean expected reward

• Interesting: A ramp in activity up to reward (highest for p=½)

• Hypothesis: DA ramp encodes uncertainty in reward

Page 7

Dopamine: Uncertainty or TD error?

• No apparent reason for ramp

• The ramp is predictable from the stimulus

• TD predicts away predictable quantities

=> contradiction!

• Side issue: the ramp is like a constantly surprising reward -- it can’t influence action choice

Page 8

A closer look at FTS’s results:

At time of reward:

• Prediction errors result from uncertainty

• Crucially: Positive and negative errors cancel out

[Figure: panels for p = 0.5 and p = 0.75]

Page 9

A closer look at FTS’s results:

• TD error δ(t) can be positive or negative

• Neuronal firing rate is only positive (negative values are coded relative to base firing rate)

But:

• DA base firing rate is low -> asymmetric encoding of δ(t)

[Figure: DA response versus δ(t), annotated 55% and 270%]
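To make the asymmetry concrete, here is a small sketch, under stated assumptions, of what happens to the average error at the time of reward once negative errors are compressed. It assumes a learned value equal to the reward probability p, a reward of 0 or 1, and a hypothetical compression factor `d` (0.2 is only an illustrative value); the function names are made up for this example.

```python
import numpy as np

def coded_error(delta, d):
    """Scale negative prediction errors by d (0 < d <= 1); keep positive ones."""
    delta = np.asarray(delta, dtype=float)
    return np.where(delta < 0, d * delta, delta)

def mean_errors_at_reward(p, d):
    """Mean raw and coded error at reward time, assuming V = p and reward in {0, 1}."""
    deltas = np.array([1.0 - p, -p])  # rewarded trial, unrewarded trial
    probs = np.array([p, 1.0 - p])
    raw = float(np.sum(probs * deltas))
    coded = float(np.sum(probs * coded_error(deltas, d)))
    return raw, coded

# Illustrative values only: p = 0.5 and a hypothetical d = 0.2.
raw, coded = mean_errors_at_reward(p=0.5, d=0.2)
print(raw, coded)
```

With symmetric coding the positive and negative errors cancel on average; with compressed negative errors the average of the coded signal is positive even though the underlying TD errors still average to zero.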

Page 10

Modeling TD with asymmetric errors

[Diagram: tapped delay line inputs x(1), x(2), … with values V(1) … V(20), reward r(t) and error δ(t)]

• Tapped delay line

• Standard online TD learning

• Fixed learning rate

• Negative δ(t) scaled by d=1/6 prior to PSTH

Learning proceeds normally (without scaling)

− Necessary to produce the right predictions

− Can be biologically plausible
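The model described above can be sketched roughly as follows. This is not the authors' code: the learning rate, trial counts, burn-in, seed, and the convention that the reward (0 or 1) arrives on leaving the last of 20 delay-line steps are all assumptions made for illustration, and the response to stimulus onset is not modeled. Only the 20 values, the fixed learning rate, the unscaled learning, and d = 1/6 come from the slide.

```python
import numpy as np

def simulate_ramp(p, n_steps=20, lr=0.1, d=1/6,
                  n_trials=20000, burn_in=2000, seed=0):
    """Mean coded TD error at each delay-line step, averaged over trials."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_steps)            # V(1)..V(n_steps), one weight per step
    coded_sum = np.zeros(n_steps)
    for trial in range(burn_in + n_trials):
        r = np.zeros(n_steps)
        r[-1] = float(rng.random() < p)      # probabilistic reward at the end
        v_next = np.append(w[1:], 0.0)       # V(t+1); zero after the trial ends
        delta = r + v_next - w               # undiscounted TD error per step
        w += lr * delta                      # learning uses the unscaled error
        if trial >= burn_in:                 # skip the initial learning phase
            # Scaling is applied only to the recorded (PSTH-like) signal.
            coded_sum += np.where(delta < 0, d * delta, delta)
    return coded_sum / n_trials

ramp = simulate_ramp(p=0.5)
print(np.round(ramp, 3))
```

Each step of the delay line is visited once per trial, so computing all errors from the pre-trial weights is equivalent to updating online within the trial. In this sketch the averaged coded error grows toward the time of the reward, while the unscaled errors that drive learning still average to zero.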

Page 11

Modeling TD with asymmetric errors

TD learning with asymmetric prediction errors replicates the recorded data accurately.

Ramps result from asymmetrically coded prediction errors propagating back to stimulus

Artifact of summing PSTHs over nonstationary recent reward histories

Page 12

DA: Uncertainty or Temporal Difference?

Analytically deriving the maximum error at the time of the reward T we get:

$\langle \delta(T) \rangle = p(1-p)(1-d)$

=> the ramp is indeed highest for P=½

But:

• DA encodes nothing but temporal difference error!

• Experimental test: Ramp as within or between trial phenomenon?
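One way to arrive at an expression of the form p(1−p)(1−d) is sketched below. It assumes that the learned value just before the reward equals p, that the reward is 1 with probability p and 0 otherwise, and that only negative errors are scaled by d; the notation $\delta_{\text{coded}}$ is introduced here for illustration.

```latex
% Errors at the time of the reward T, with learned value \hat{V} = p:
%   rewarded trial   (probability p):     \delta = 1 - p
%   unrewarded trial (probability 1 - p): \delta = -p
\begin{align*}
\langle \delta(T) \rangle
  &= p\,(1-p) + (1-p)\,(-p) = 0
  && \text{(symmetric coding)} \\
\langle \delta_{\text{coded}}(T) \rangle
  &= p\,(1-p) + (1-p)\,(-d\,p) = p\,(1-p)\,(1-d)
  && \text{(negative errors scaled by } d\text{)} \\
\frac{\partial}{\partial p}\, p\,(1-p)\,(1-d)
  &= (1-d)\,(1-2p) = 0
  \;\Longrightarrow\; p = \tfrac{1}{2}
  && \text{(maximum for } d < 1\text{)}
\end{align*}
```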

Page 13

Trace conditioning: A puzzle and its resolution

• Same (if not more) uncertainty, but… no DA ramping! (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)

• Resolution: lower learning rate in trace conditioning eliminates ramp

CS = short visual stimulus

Trace period

US (probabilistic) = drops of juice
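The proposed resolution can be explored with a rough sketch that compares the pre-reward coded error at two learning rates. Everything numerical here is an assumption (the two learning rates, p = 0.5, 20 delay-line steps, trial counts, burn-in, seed), and the trace period itself is not modeled; the sketch only varies the learning rate in a tapped delay-line TD model with negative errors scaled by d = 1/6.

```python
import numpy as np

def pre_reward_coded_error(lr, p=0.5, n_steps=20, d=1/6,
                           n_trials=20000, burn_in=5000, seed=0):
    """Mean coded TD error summed over all steps before the reward step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_steps)                      # one value per delay-line step
    total = 0.0
    for trial in range(burn_in + n_trials):
        delta = np.append(w[1:], 0.0) - w      # V(t+1) - V(t), no reward yet
        delta[-1] += float(rng.random() < p)   # reward on leaving the last step
        w += lr * delta                        # unscaled error drives learning
        if trial >= burn_in:
            pre = delta[:-1]                   # exclude the reward step itself
            total += np.where(pre < 0, d * pre, pre).sum()
    return total / n_trials

high = pre_reward_coded_error(lr=0.1)    # assumed "delay conditioning" rate
low = pre_reward_coded_error(lr=0.01)    # assumed lower "trace" rate
print(high, low)
```

In this sketch the summed pre-reward signal shrinks as the learning rate is lowered, because the value estimates fluctuate less from trial to trial; the error at the reward step itself is left out of the comparison.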

Page 14

Conclusions

Preserve the TD hypothesis of Dopamine:

− No explicit coding of uncertainty

− Ramping explained by neural constraints

− Explains the disappearance of the ramp in trace conditioning

Important challenges to the TD hypothesis

− Conditioned inhibition

− Effects of timing