1 Decision making. 2 How does the brain learn the values?

Decision making

How does the brain learn the values?

The computational problem

The goal is to maximize the sum of rewards:

$V_t = E\left[\sum_{\tau=t}^{\text{end}} r_\tau\right]$
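As a rough illustration of this quantity, the value at time t can be estimated by averaging the summed future rewards over several observed episodes. This is only a sketch: the function name `empirical_value` and the example episodes are made up for illustration and do not come from the slides.

```python
def empirical_value(episodes, t=0):
    """Average, over episodes, of the summed rewards from time step t to the end.

    `episodes` is a list of reward sequences (hypothetical data); the average
    stands in for the expectation E[.] in the definition of V_t.
    """
    returns = [sum(rewards[t:]) for rewards in episodes]
    return sum(returns) / len(returns)


# Two made-up episodes of three time steps each.
episodes = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
value_at_start = empirical_value(episodes, t=0)
```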

The value of the state $S_1$ depends on the policy

If the animal chooses ‘right’ at $S_1$,

$V(S_1) = r_{\text{ice cream}} + V(S_2)$

How to find the optimal policy in a complicated world?

• If the values of the different states are known, then this task is easy

$V(S_t) = r_t + V(S_{t+1})$
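One way to see why known values make the task easy: at each state, the animal can simply pick the action whose immediate reward plus next-state value is largest. The sketch below assumes a hypothetical `options` table; the action names and numbers are invented for illustration.

```python
def greedy_choice(options):
    """Pick the action with the largest immediate reward plus next-state value.

    `options` maps an action name to a pair (reward, value_of_next_state).
    """
    return max(options, key=lambda action: sum(options[action]))


# Made-up example: 'right' gives a reward now, 'left' leads to a better state.
options = {"left": (0.0, 0.5), "right": (1.0, 0.2)}
best_action = greedy_choice(options)
```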

How can the values of the different states be learned?

$V(S_t) = r_t + V(S_{t+1})$

$V(S_t)$ = the value of the state at time $t$

$r_t$ = the (average) reward delivered at time $t$

$V(S_{t+1})$ = the value of the state at time $t+1$
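If the rewards along a fixed sequence of states are known, this relation can be unrolled backwards from the end of the sequence. The sketch below assumes a deterministic chain of states and a value of 0 after the last state; both are assumptions made for illustration.

```python
def chain_values(rewards):
    """Values along a deterministic chain, using V(S_t) = r_t + V(S_{t+1}).

    The value after the final state is assumed to be 0.
    """
    values = [0.0] * (len(rewards) + 1)
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + values[t + 1]
    return values[:-1]  # drop the assumed terminal value


# Hypothetical chain of four states with a single reward in the third state.
values = chain_values([0.0, 0.0, 1.0, 0.0])
```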

The TD (temporal difference) learning algorithm

$V(S_t) \leftarrow V(S_t) + \eta\,\delta_t$

where

$\delta_t = r_t + V(S_{t+1}) - V(S_t)$

is the TD error and $\eta$ is the learning rate.
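A single TD update can be sketched in a few lines. The dictionary-based value table, the state names, and the parameter name `learning_rate` are assumptions chosen for illustration.

```python
def td_update(values, state, next_state, reward, learning_rate):
    """Apply one TD update to values[state] and return the TD error."""
    td_error = reward + values[next_state] - values[state]
    values[state] += learning_rate * td_error
    return td_error


# Made-up example with two states.
values = {"a": 0.0, "b": 0.5}
td_error = td_update(values, "a", "b", reward=1.0, learning_rate=0.1)
```

The value of the current state moves a fraction of the way towards the reward plus the value of the next state; when the two already agree, the TD error is zero and nothing changes.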

Dopamine

Dopamine is good

• Dopamine is released by rewarding experiences, e.g., sex, food

• Cocaine, nicotine and amphetamine directly or indirectly lead to an increase of dopamine release

• Neutral stimuli that are associated with rewarding experiences result in a release of dopamine

• Drugs that reduce dopamine activity reduce motivation and cause anhedonia (inability to experience pleasure)

• Long-term use of such drugs may result in dyskinesia (diminished voluntary movements and the presence of involuntary movements)

No dopamine is bad (Parkinson’s disease)

• Bradykinesia – slowness in voluntary movements such as standing up, walking, and sitting down. This may lead to difficulty initiating walking and, when more severe, can cause “freezing episodes” once walking has begun.

• Tremors – often occur in the hands, fingers, forearms, feet, mouth, or chin. Typically, tremors occur when the limbs are at rest rather than during movement.

• Rigidity – stiff muscles, which often produce muscle pain that increases during movement.

• Poor balance – results from the loss of reflexes that maintain posture. This causes unsteadiness, which often leads to falls.

Schultz, Dayan and Montague, Science, 1997

[Timeline of states 1 to 9, with the CS and the reward marked]

Before trial 1:

$V(S_1) = V(S_2) = \dots = V(S_9) = 0$

In trial 1:

• no reward in states 1-7

$\delta_t = r_t + V(S_{t+1}) - V(S_t) = 0$

$V(S_t) \leftarrow V(S_t) + \eta\,\delta_t = 0$

• reward of size 1 in state 8

$\delta_t = r_t + V(S_9) - V(S_8) = 1$

$V(S_8) \leftarrow V(S_8) + \eta\,\delta_t = \eta$ (with learning rate $\eta$)

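Before trial 2:

$V(S_1) = V(S_2) = \dots = V(S_7) = V(S_9) = 0$, $V(S_8) = \eta$

In trial 2, for states 1-6,

$V(S_t) \leftarrow V(S_t) + \eta\,\delta_t = 0$

For state 7,

$\delta_t = r_t + V(S_{t+1}) - V(S_t) = \eta$

$V(S_7) \leftarrow V(S_7) + \eta\,\delta_t = \eta^2$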

For state 8,

$V(S_8) \leftarrow V(S_8) + \eta\,\delta_t = \eta + \eta(1-\eta) = \eta(2-\eta)$

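Before trial 3:

$V(S_1) = V(S_2) = \dots = V(S_6) = V(S_9) = 0$, $V(S_7) = \eta^2$, $V(S_8) = \eta(2-\eta)$

In trial 3, for states 1-5,

$V(S_t) \leftarrow V(S_t) + \eta\,\delta_t = 0$

For state 6,

$\delta_t = r_t + V(S_{t+1}) - V(S_t) = \eta^2$

$V(S_6) \leftarrow V(S_6) + \eta\,\delta_t = \eta^3$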

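For state 7,

$\delta_t = r_t + V(S_{t+1}) - V(S_t) = \eta(2-\eta) - \eta^2 = 2\eta(1-\eta)$

$V(S_7) \leftarrow V(S_7) + \eta\,\delta_t = \eta^2 + \eta \cdot 2\eta(1-\eta) = 3\eta^2 - 2\eta^3$

For state 8,

$\delta_t = r_t + V(S_{t+1}) - V(S_t) = 1 - \eta(2-\eta)$

$V(S_8) \leftarrow V(S_8) + \eta\,\delta_t = \eta(2-\eta) + \eta\,(1 - \eta(2-\eta))$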
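The first three trials can be checked numerically. The sketch below assumes a learning rate of 0.5, visits the nine states in order within each trial, and treats the value after state 9 as 0; the names `td_trial` and `history` are invented for illustration.

```python
def td_trial(values, rewards, eta):
    """Apply the TD update once to every state, in temporal order."""
    for t in range(len(rewards)):
        # Assumption: the trial ends after the last state, whose successor has value 0.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + next_value - values[t]   # TD error
        values[t] += eta * delta                      # value update
    return values


eta = 0.5                               # assumed learning rate
rewards = [0.0] * 7 + [1.0] + [0.0]     # states 1-9, reward of size 1 in state 8
values = [0.0] * 9                      # before trial 1

history = []
for trial in range(3):
    td_trial(values, rewards, eta)
    history.append(list(values))        # snapshot after each trial

# Closed-form values after trial 3, for comparison with history[2].
expected_v6 = eta ** 3
expected_v7 = 3 * eta ** 2 - 2 * eta ** 3
expected_v8 = eta * (2 - eta) + eta * (1 - eta * (2 - eta))
```

Index 0 of each snapshot corresponds to state 1, so `history[2][5]`, `history[2][6]` and `history[2][7]` hold the values of states 6, 7 and 8 after trial 3. Each trial pushes the learned value one state further back from the reward.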

After many trials

$V(S_1) = V(S_2) = \dots = V(S_8) = 1$, $V(S_9) = 0$

With these values the TD error is zero in every state, except at the CS, whose time is unknown.
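The long-run behaviour can be sketched by repeating the same trial many times. The learning rate of 0.5, the number of trials, and the treatment of the CS (a preceding state with value 0 and no reward, because the CS cannot be predicted) are all assumptions made for illustration.

```python
def run_trial(values, rewards, eta):
    """Visit the states in order, apply the TD update, and return the TD errors."""
    deltas = []
    for t in range(len(rewards)):
        # Assumption: the value after the last state is 0.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + next_value - values[t]
        values[t] += eta * delta
        deltas.append(delta)
    return deltas


eta = 0.5                               # assumed learning rate
rewards = [0.0] * 7 + [1.0] + [0.0]     # states 1-9; reward of size 1 in state 8
values = [0.0] * 9                      # all values start at zero

for _ in range(200):
    deltas = run_trial(values, rewards, eta)

# Assumption: the CS arrives unpredictably, so the state before it has value 0
# and carries no reward; the TD error at CS onset is then V(S_1) - 0.
delta_at_cs = 0.0 + values[0] - 0.0
```

Under these assumptions the error at the time of the reward disappears, while an error of about the size of the reward remains at CS onset.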

Schultz, 1998

Bayer and Glimcher, 2005

“We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected”.
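The quantity described in the quote can be sketched as follows. The exponentially decaying weights, the `decay` parameter, and the rectification step are illustrative assumptions; the slides do not specify the form of the weights.

```python
def reward_prediction_error(rewards, decay=0.5):
    """Current reward minus a weighted average of the previous rewards.

    `rewards` lists rewards in temporal order, the last entry being the
    current one. Weights decay exponentially with age (an assumption).
    """
    *previous, current = rewards
    if not previous:
        return current
    recent_first = previous[::-1]
    weights = [decay ** k for k in range(1, len(previous) + 1)]
    weighted_average = sum(w * r for w, r in zip(weights, recent_first)) / sum(weights)
    return current - weighted_average


def rectified_error(error):
    """Keep only better-than-expected outcomes (positive errors)."""
    return max(error, 0.0)


better = reward_prediction_error([1.0, 1.0, 1.0, 2.0])   # reward above the average
worse = reward_prediction_error([1.0, 1.0, 1.0, 0.0])    # reward below the average
```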
