CDB Exploring Science and Society Seminar Thursday 19 November 2009 at 5.30pm Host: Prof Giorgio Gabella The Bayesian brain, surprise and free-energy.

CDB Exploring Science and Society Seminar, Thursday 19 November 2009 at 5.30pm

Host: Prof Giorgio Gabella

The Bayesian brain, surprise and free-energy

Abstract

Value learning and perceptual learning have been an important focus over the past decade, attracting the concerted attention of experimental psychologists, neurobiologists and the machine learning community. Despite some formal connections (e.g., the role of prediction error in optimizing some function of sensory states), both fields have developed their own rhetoric and postulates. In this work, we show that perceptual learning is, literally, an integral part of value learning, in the sense that perception is necessary to integrate out dependencies on the inferred causes of sensory information. This enables the value of sensory trajectories to be optimized through action. Furthermore, we show that acting to optimize value and perception are two aspects of exactly the same principle: namely, the minimization of a quantity (free energy) that bounds the probability of sensory input, given a particular agent or phenotype. This principle can be derived, in a straightforward way, from the very existence of agents, by considering the probabilistic behavior of an ensemble of agents belonging to the same class.

“Objects are always imagined as being present in the field of vision as would have to be there in order to produce the same impression on the nervous mechanism” - Hermann Ludwig Ferdinand von Helmholtz

Thomas Bayes

Geoffrey Hinton

Richard Feynman

From the Helmholtz machine and the Bayesian brain to action and self-organization

Hermann Haken

http://en.wikipedia.org/wiki/Image:Helmholtz.jpg

Overview

Ensemble dynamics: Entropy and equilibria; Free-energy and surprise

The free-energy principle: Action and perception; Generative models

Perception: Birdsong and categorization; Simulated lesions

Action: Active inference; Reaching

Policies: Control and attractors; The mountain-car problem

Particle density contours showing Kelvin-Helmholtz instability, forming beautiful breaking waves. In the self-sustained state of Kelvin-Helmholtz turbulence the particles are transported away from the mid-plane at the same rate as they fall, but the particle density is nevertheless very clumpy because of a clumping instability that is caused by the dependence of the particle velocity on the local solids-to-gas ratio (Johansen, Henning, & Klahr 2006)

[Figure: two panels (1, 2) showing ensemble densities $p(\vartheta \mid m)$. Labels: temperature, pH, falling, transport.]

Self-organization that minimises the entropy of an ensemble density, to ensure that a limited repertoire of states is occupied (i.e., ensuring states have a random attracting set):

$$H = -\int p(\vartheta \mid m)\,\ln p(\vartheta \mid m)\,d\vartheta, \qquad \vartheta(t) \in A$$

How can an active agent minimise its equilibrium entropy? This entropy is bounded by the entropy of sensory signals (under simplifying assumptions):

$$s = g(\vartheta) + z, \qquad H_s \geq H_\vartheta + \int p(\vartheta)\,\ln\left|\partial_\vartheta g\right|\,d\vartheta$$

Crucially, because the density on sensory signals is at equilibrium, it can be interpreted as the proportion of time each agent entertains them (the sojourn time). This ergodic argument means that entropy is the path integral of surprise experienced by a particular agent:

$$H_s = -\int p(s \mid m)\,\ln p(s \mid m)\,ds = \lim_{T \to \infty} -\frac{1}{T}\int_0^T \ln p(s(t) \mid m)\,dt$$

This means agents minimise surprise at all times. But there is one small problem: agents cannot access surprise. However, they can evaluate a free-energy bound on surprise, which is induced with a recognition density q:

$$F(s(t), q(\vartheta)) \geq -\ln p(s(t) \mid m)$$
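As a minimal numerical sketch of this bound (an illustration, not the speaker's implementation), the Python below evaluates surprise and free energy for a small discrete model. The joint density `p_joint`, its two-by-two layout and the function names are hypothetical choices made for this example; free energy is computed as the expectation under q of ln q(θ) − ln p(s, θ | m).

```python
import numpy as np

def surprise(p_joint, s):
    """Surprise -ln p(s | m), marginalising the joint p(s, theta | m) over causes."""
    return float(-np.log(p_joint[s].sum()))

def free_energy(p_joint, s, q):
    """Free energy F = <ln q(theta) - ln p(s, theta | m)>_q for recognition density q."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * (np.log(q) - np.log(p_joint[s]))))

# Hypothetical joint density p(s, theta | m): rows index s, columns index theta.
p_joint = np.array([[0.30, 0.10],
                    [0.05, 0.55]])

s = 0
posterior = p_joint[s] / p_joint[s].sum()   # p(theta | s, m)

print("surprise:            ", surprise(p_joint, s))
print("F with arbitrary q:  ", free_energy(p_joint, s, [0.5, 0.5]))
print("F with q = posterior:", free_energy(p_joint, s, posterior))
```

For any recognition density the free energy is at least the surprise, and the two coincide when q equals the posterior over causes.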


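The free-energy principle

[Figure: an agent exchanging sensations and action with its world.]

External states in the world: $\dot{x} = f(x, a, \vartheta) + w$

Sensations: $s = g(x, \vartheta) + z$

Internal states of the agent (m): $\mu = \arg\min_\mu F(s, \mu)$

Action: $a = \arg\min_a F(s, \mu)$

$$F = \text{Energy} - \text{Entropy} = -\langle \ln p(s, \vartheta \mid m) \rangle_q + \langle \ln q(\vartheta) \rangle_q$$

Action to minimise a bound on surprise:

$$F = \text{Complexity} - \text{Accuracy} = D(q \,\|\, p(\vartheta)) - \langle \ln p(s(a) \mid \vartheta, m) \rangle_q$$

$$a = \arg\max_a \text{Accuracy}$$

Perception to optimise the bound:

$$F = \text{Divergence} + \text{Surprise} = D(q(\vartheta \mid \mu) \,\|\, p(\vartheta \mid s)) - \ln p(s \mid m)$$

$$\mu = \arg\min_\mu \text{Divergence}$$

$$\vartheta = \{x(t), \theta, \gamma\}$$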
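The three ways of writing free energy on this slide (energy minus entropy, complexity minus accuracy, divergence plus surprise) can be checked numerically. The sketch below does this for a discrete model; the prior, likelihood, recognition density and function names are hypothetical values chosen only for illustration.

```python
import numpy as np

def kl_divergence(q, p):
    """Kullback-Leibler divergence D(q || p) between discrete densities."""
    return float(np.sum(q * np.log(q / p)))

def free_energy_three_ways(prior, likelihood, s, q):
    """Return F computed from each of the three decompositions.

    prior:      p(theta | m), shape (K,)
    likelihood: p(s | theta, m), shape (S, K), columns sum to one
    q:          recognition density over theta, shape (K,)
    """
    joint = likelihood[s] * prior            # p(s, theta | m)
    evidence = joint.sum()                   # p(s | m)
    posterior = joint / evidence             # p(theta | s, m)

    energy = -float(np.sum(q * np.log(joint)))
    entropy = -float(np.sum(q * np.log(q)))
    accuracy = float(np.sum(q * np.log(likelihood[s])))

    return (energy - entropy,
            kl_divergence(q, prior) - accuracy,
            kl_divergence(q, posterior) - float(np.log(evidence)))

# Hypothetical two-state model.
prior = np.array([0.7, 0.3])
likelihood = np.array([[0.9, 0.2],
                       [0.1, 0.8]])
q = np.array([0.6, 0.4])

values = free_energy_three_ways(prior, likelihood, s=1, q=q)
print(values)
```

All three numbers agree, because each decomposition is a rearrangement of the same expectation under q.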

The free-energy rests on expected Gibbs energy

$$F(s, q) = \text{Energy} - \text{Entropy} = \langle U \rangle_q + \langle \ln q \rangle_q$$

and can be evaluated, given a generative model comprising a likelihood and prior:

$$U(s, \vartheta) = -\ln p(s, \vartheta \mid m) = -\ln p(s \mid \vartheta, m) - \ln p(\vartheta \mid m)$$

So what models might the brain use?

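The generative model

$$\vartheta = \{x(t), \theta, \gamma\}$$

Processing hierarchy: backward (nonlinear), forward (linear) and lateral connections.

[Figure: a two-level hierarchy. Labels: $\tilde{x}^{(1)}$, $\tilde{v}^{(1)}$, $\tilde{x}^{(2)}$, $\tilde{v}^{(2)}$, $\tilde{x}$, $\tilde{v}$, $s$.]

$$s = g(x, v, \theta) + z, \qquad \tilde{s} = \tilde{g} + \tilde{z}$$

$$\dot{x} = f(x, v, \theta) + w, \qquad D\tilde{x} = \tilde{f} + \tilde{w}$$

$$p(\tilde{s} \mid \tilde{x}, \tilde{v}, \theta, \gamma) = N(\tilde{g}, \Sigma(\gamma)^{s})$$

$$p(D\tilde{x} \mid \tilde{x}, \tilde{v}, \theta, \gamma) = N(\tilde{f}, \Sigma(\gamma)^{x})$$

$$p(\tilde{v}) = N(\eta, \Sigma^{v})$$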

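Hierarchical (deep) dynamic models

$$v^{(i-1)} = g(x^{(i)}, v^{(i)}, \theta^{(i)}) + z^{(i)}, \qquad \tilde{v}^{(i-1)} = \tilde{g}^{(i)} + \tilde{z}^{(i)}$$

$$\dot{x}^{(i)} = f(x^{(i)}, v^{(i)}, \theta^{(i)}) + w^{(i)}, \qquad D\tilde{x}^{(i)} = \tilde{f}^{(i)} + \tilde{w}^{(i)}$$

$$\vartheta = \{x(t), v(t), \theta, \gamma\}$$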

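Hierarchical form

$$p(\tilde{s}, \tilde{x}, \tilde{v} \mid m) = p(\tilde{s} \mid \tilde{x}, \tilde{v})\,p(\tilde{x}, \tilde{v})$$

$$p(\tilde{x}, \tilde{v}) = p(\tilde{v}^{(m)}) \prod_{i=1}^{m-1} p(\tilde{x}^{(i)})\,p(D\tilde{x}^{(i)} \mid \tilde{v}^{(i)})\,p(\tilde{v}^{(i)} \mid \tilde{x}^{(i+1)}, \tilde{v}^{(i+1)})$$

$$p(D\tilde{x}^{(i)} \mid \tilde{v}^{(i)}) = N(\tilde{f}^{(i)}, \Sigma^{(i)x})$$

$$p(\tilde{v}^{(i)} \mid \tilde{x}^{(i+1)}, \tilde{v}^{(i+1)}) = N(\tilde{g}^{(i+1)}, \Sigma^{(i)v})$$

Labels: structural priors; dynamical priors; likelihood and empirical priors.

$$s = g(x^{(1)}, v^{(1)}) + z^{(1)}$$

$$\dot{x}^{(1)} = f(x^{(1)}, v^{(1)}) + w^{(1)}$$

$$\vdots$$

$$v^{(i-1)} = g(x^{(i)}, v^{(i)}) + z^{(i)}$$

$$\dot{x}^{(i)} = f(x^{(i)}, v^{(i)}) + w^{(i)}$$

Prediction errors, with $\tilde{v}^{(0)} = \tilde{s}$:

$$\tilde{\varepsilon}^{(i)v} = \tilde{v}^{(i-1)} - \tilde{g}^{(i)}, \qquad \tilde{\varepsilon}^{(i)x} = D\tilde{x}^{(i)} - \tilde{f}^{(i)}$$

Gibbs energy: a simple function of prediction error

$$U(s, \vartheta) = -\ln p(\tilde{s}, \tilde{x}, \tilde{v} \mid m) = \tfrac{1}{2}\tilde{\varepsilon}^{T}\Pi\,\tilde{\varepsilon} - \tfrac{1}{2}\ln|\Pi|$$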
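To make the link between Gibbs energy and prediction error concrete, the sketch below checks that the quadratic form ½ εᵀΠε − ½ ln|Π| equals the negative log density of a zero-mean Gaussian over the prediction error, up to the additive constant (n/2) ln 2π. The covariance, the error vector and the function names are hypothetical values used only for illustration.

```python
import numpy as np

def gibbs_energy(eps, precision):
    """U = 1/2 eps' Pi eps - 1/2 ln|Pi| for prediction error eps and precision Pi."""
    _, logdet = np.linalg.slogdet(precision)
    return float(0.5 * eps @ precision @ eps - 0.5 * logdet)

def negative_log_gaussian(eps, covariance):
    """-ln N(eps; 0, covariance), written in terms of the covariance."""
    n = eps.size
    _, logdet = np.linalg.slogdet(covariance)
    quadratic = eps @ np.linalg.solve(covariance, eps)
    return float(0.5 * quadratic + 0.5 * logdet + 0.5 * n * np.log(2.0 * np.pi))

# Hypothetical covariance of random fluctuations and a prediction error.
covariance = np.array([[2.0, 0.3],
                       [0.3, 0.5]])
precision = np.linalg.inv(covariance)
eps = np.array([0.4, -1.2])

constant = 0.5 * eps.size * np.log(2.0 * np.pi)
print(gibbs_energy(eps, precision) + constant)
print(negative_log_gaussian(eps, covariance))
```

The two printed values match, so under Gaussian assumptions minimising Gibbs energy amounts to minimising precision-weighted squared prediction error.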

The recognition density and its sufficient statistics

Mean-field approximation: $q(\vartheta) = \prod_i q(\vartheta_i) = q(\tilde{x}, \tilde{v})\,q(\theta)\,q(\gamma)$

Laplace approximation: $q(\vartheta_i) = N(\mu_i, \Sigma_i)$

[Figure: the sufficient statistics of the recognition density, each optimised with respect to free energy $F$. Labels: prediction errors; states $x, v$.]

Synaptic activity: perception and inference

Synaptic efficacy: activity-dependent plasticity and functional specialization; learning and memory

Synaptic gain: attentional gain and enabling of plasticity; attention and salience

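Perception and message-passing

[Figure: message passing between neighbouring levels of the hierarchy, driven by sensory input $s(t)$. Backward connections convey predictions; forward connections convey prediction error.]

$$\dot{\tilde{\mu}}^{(i)v} = D\tilde{\mu}^{(i)v} - \tilde{\varepsilon}_v^{(i)T}\xi^{(i)} - \xi^{(i+1)v}$$

$$\dot{\tilde{\mu}}^{(i)x} = D\tilde{\mu}^{(i)x} - \tilde{\varepsilon}_x^{(i)T}\xi^{(i)}$$

$$\xi^{(i)v} = \Pi^{(i)v}\tilde{\varepsilon}^{(i)v} = \Pi^{(i)v}\left(\tilde{\mu}^{(i-1)v} - \tilde{g}(\tilde{\mu}^{(i)})\right)$$

$$\xi^{(i)x} = \Pi^{(i)x}\tilde{\varepsilon}^{(i)x} = \Pi^{(i)x}\left(D\tilde{\mu}^{(i)x} - \tilde{f}(\tilde{\mu}^{(i)})\right)$$

Synaptic plasticity and synaptic gain

[Equations: updates for synaptic efficacy (indexed $ij$) and synaptic gain (indexed $i$), including a trace term of the form $\tfrac{1}{2}\operatorname{tr}(R_i(\cdots))$.]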
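The message-passing scheme can be illustrated in its simplest setting: a single static level with a linear mapping, where the conditional expectation descends the Gibbs energy using only precision-weighted prediction errors. This is a simplified sketch rather than the hierarchical dynamic scheme on the slide. The linear form g(v) = θv, the parameter values, the Euler step size and the function names are all assumptions made for this example.

```python
def recognise(s, eta, theta, pi_s, pi_v, dt=0.01, steps=2000):
    """Gradient descent on U = 1/2 pi_s (s - theta*mu)^2 + 1/2 pi_v (mu - eta)^2.

    s:     sensory input
    eta:   prior expectation of the cause
    theta: slope of the (assumed linear) mapping g(v) = theta * v
    pi_s:  sensory precision
    pi_v:  prior precision
    """
    mu = float(eta)
    for _ in range(steps):
        xi_s = pi_s * (s - theta * mu)       # forward (sensory) prediction error
        xi_v = pi_v * (mu - eta)             # prediction error on the prior
        mu += dt * (theta * xi_s - xi_v)     # update driven by both errors
    return mu

def exact_posterior_mean(s, eta, theta, pi_s, pi_v):
    """Closed-form posterior mean for the same linear Gaussian model."""
    return (pi_s * theta * s + pi_v * eta) / (pi_s * theta ** 2 + pi_v)

estimate = recognise(s=3.0, eta=0.0, theta=2.0, pi_s=4.0, pi_v=1.0)
print(estimate, exact_posterior_mean(3.0, 0.0, 2.0, 4.0, 1.0))
```

The iterative estimate settles on the exact posterior mean, which shows how suppressing prediction error implements Bayesian inference in this toy case.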


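Synthetic song-birds

[Figure: a synthetic bird whose vocal centre drives the syrinx through the causes $v_1$ and $v_2$, producing a sonogram (frequency against time in seconds).]

$$\dot{x} = f(x, v) = \begin{bmatrix} 18x_2 - 18x_1 \\ v_1 x_1 - 2x_3 x_1 - x_2 \\ 2x_1 x_2 - v_2 x_3 \end{bmatrix}$$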
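The equations of motion f(x, v) above can be integrated numerically to see the kind of trajectory that drives the synthetic syrinx. The sketch below uses a fourth-order Runge-Kutta scheme; the integrator, step size, initial state and time units are assumptions for illustration, and the values v1 = 32 and v2 = 8/3 are borrowed from the constants that appear in the second-level equations later in these slides.

```python
import numpy as np

def f(x, v):
    """Equations of motion for the hidden states x given causes v = (v1, v2)."""
    x1, x2, x3 = x
    v1, v2 = v
    return np.array([18.0 * x2 - 18.0 * x1,
                     v1 * x1 - 2.0 * x3 * x1 - x2,
                     2.0 * x1 * x2 - v2 * x3])

def simulate(x0, v, dt=0.005, steps=4000):
    """Integrate dx/dt = f(x, v) with fixed-step fourth-order Runge-Kutta."""
    x = np.asarray(x0, dtype=float)
    path = np.empty((steps + 1, 3))
    path[0] = x
    for k in range(steps):
        k1 = f(x, v)
        k2 = f(x + 0.5 * dt * k1, v)
        k3 = f(x + 0.5 * dt * k2, v)
        k4 = f(x + dt * k3, v)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        path[k + 1] = x
    return path

path = simulate([1.0, 1.0, 1.0], v=(32.0, 8.0 / 3.0))
print(path.shape, np.abs(path).max())
```

Changing `v` moves the attractor, which is how slowly varying causes can select between different songs in this kind of model.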

http://www.flickr.com/photos/jamuudsen/145813811/

Recognition and message passing

[Figure: recognition of a birdsong stimulus $s(t)$, with backward predictions and forward prediction error. Panels: prediction and error (against time); hidden states $x$ (against time); causal states $v$ (against time in bins); stimulus sonogram (frequency, 2000 to 5000 Hz, against time in seconds).]

$$f(x, v) = \begin{bmatrix} 18x_2 - 18x_1 \\ v_1 x_1 - 2x_3 x_1 - x_2 \\ 2x_1 x_2 - v_2 x_3 \end{bmatrix}$$

Perceptual categorization

[Figure: sonograms of Song A, Song B and Song C (frequency in Hz, 2000 to 5000, against time in seconds); the inferred causes $v_1$ and $v_2$ against time in seconds; and a plot of $v_1$ against $v_2$ with points labelled A, B and C.]

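Generative models of birdsong: sequences of sequences

[Figure: a neuronal hierarchy driving the syrinx, with the resulting sonogram (frequency in kHz against time in seconds).]

Second level, whose outputs $x_2^{(2)}$ and $x_3^{(2)}$ supply the first-level causes $v_1^{(1)}$ and $v_2^{(1)}$:

$$f^{(2)} = \begin{bmatrix} 18x_2^{(2)} - 18x_1^{(2)} \\ 32x_1^{(2)} - 2x_3^{(2)}x_1^{(2)} - x_2^{(2)} \\ 2x_1^{(2)}x_2^{(2)} - \tfrac{8}{3}x_3^{(2)} \end{bmatrix}, \qquad g^{(2)} = \begin{bmatrix} x_2^{(2)} \\ x_3^{(2)} \end{bmatrix}$$

First level, whose outputs $x_2^{(1)}$ and $x_3^{(1)}$ supply the sensory signals $s_1$ and $s_2$:

$$f^{(1)} = \begin{bmatrix} 18x_2^{(1)} - 18x_1^{(1)} \\ v_1^{(1)}x_1^{(1)} - 2x_3^{(1)}x_1^{(1)} - x_2^{(1)} \\ 2x_1^{(1)}x_2^{(1)} - v_2^{(1)}x_3^{(1)} \end{bmatrix}, \qquad g^{(1)} = \begin{bmatrix} x_2^{(1)} \\ x_3^{(1)} \end{bmatrix}$$

Kiebel et al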


Simulated lesion studies: a model for false inference in psychopathology?

[Figure: three conditions, each shown as a sonogram (frequency in Hz, 2000 to 5000, against time in seconds) alongside a simulated LFP (micro-volts against peristimulus time in ms, 0 to 2000): percept; no structural priors; no dynamical priors.]








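From reflexes to action

[Figure: a reflex arc. Labels: prediction $g(\mu)$; sensory input $s(a)$; action; dorsal root; ventral horn.]

$$\dot{a} = -\tilde{\varepsilon}_a^{T}\xi_s, \qquad \xi_s = \Pi_s\left(\tilde{s}(a) - g(\mu)\right)$$

True dynamics:

$$s = \mathbf{g}(x, v, \theta) + z, \qquad \dot{x} = \mathbf{f}(x, v, a, \theta) + w$$

Generative model:

$$s = g(x, v, \theta) + z, \qquad \dot{x} = f(x, v, \theta) + w$$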

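From reflexes to action

[Figure: a two-jointed arm anchored at (0,0), labelled "Jointed arm", with joint angles $x_1$, $x_2$ and joint positions $J_1$, $J_2$, together with its movement trajectory. Labels: $J = (J_1, J_2)$; $V = (v_1, v_2, v_3)$; visual input (in terms of $V$ and $J$, with noise $w$); proprioceptive input (in terms of $x$, with noise $w$); descending sensory prediction error; action $a$.]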


How do policies minimise entropy?

Energies

$-\langle \ln p(s(a) \mid \vartheta) \rangle_q$: sensory prediction error

$F(s, q(\vartheta \mid \mu))$: free-energy

$-\ln p(s \mid m)$: sensory surprise

$-\ln p(x \mid m)$: surprise

$D(q(\vartheta) \,\|\, p(\vartheta)) \geq 0$: complexity

$D(q(\vartheta) \,\|\, p(\vartheta \mid s)) \geq 0$: perceptual divergence

$D(p(s, z) \,\|\, p(s)p(z)) + \int p(x)\,\ln|g_x|\,dx$

Path integrals (under ergodic assumptions)

$A = \int dt\,F(s, q)$: free-action

$H_s = -\int dt\,\ln p(s \mid m)$: sensory entropy

$H_x = -\int dt\,\ln p(x \mid m)$: entropy

Optimisation

perception: $\mu = \arg\min_\mu F$

action: $a = \arg\min_a F$

policy (model): $m = \arg\min_m A$

Cost-functions, value and optimal control (policies that lead to sparse distal goals)

Richard Bellman

Using the Helmholtz decomposition, flow (i.e., policy) can be expressed in terms of scalar and vector potentials:

$$f = \nabla V + \nabla \times W$$

where value V is proportional to negative surprise and can be defined as expected (negative) cost:

$$V \propto \ln p$$

[Equations: value $V(x)$ as a time integral of expected (negative) cost $c$ under the density $p(\cdot, t \mid m)$; cost $c(x)$ expressed through the flow $f$ and the derivatives of $V(x)$.]

This means the cost-function is defined by the equilibrium density but not vice versa; this is the problem addressed by dynamic programming and reinforcement learning.
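A small planar example may help with the Helmholtz decomposition of flow into a gradient part and a rotational part. In the sketch below, the scalar potential V is the log of a unit Gaussian density (up to a constant) and W is a scalar stream function; both choices, the evaluation point and the finite-difference step are assumptions made for illustration, not taken from the talk. The check is that the rotational part is divergence-free, so the divergence of the flow is set by V alone.

```python
import numpy as np

def grad_V(x, y):
    """Gradient of V(x, y) = -(x**2 + y**2) / 2, a hypothetical value function."""
    return np.array([-x, -y])

def curl_W(x, y):
    """Planar curl (dW/dy, -dW/dx) of the stream function W = (x**2 + y**2) / 2."""
    return np.array([y, -x])

def flow(x, y):
    """Flow (policy) f as the sum of gradient and rotational components."""
    return grad_V(x, y) + curl_W(x, y)

def divergence(field, x, y, h=1e-5):
    """Central-difference estimate of the divergence of a planar vector field."""
    d_dx = (field(x + h, y)[0] - field(x - h, y)[0]) / (2.0 * h)
    d_dy = (field(x, y + h)[1] - field(x, y - h)[1]) / (2.0 * h)
    return d_dx + d_dy

point = (0.7, -1.3)
print("divergence of rotational part:", divergence(curl_W, *point))
print("divergence of full flow:      ", divergence(flow, *point))
```

Here the full flow has divergence equal to the Laplacian of V, which is -2 everywhere for this quadratic potential, while the rotational part contributes nothing.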

[Figure: four panels over the same two-dimensional state space (axis ticks from -30 to 30): Surprise (negative value); Cost-function; Equilibrium density; Flow (policy). Labels: $V \propto \ln p$; $c(x)$; $p := p(x \mid m)$.]

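Cost-functions and attracting sets (policies with attractors)

Adriaan Fokker, Max Planck

At equilibrium we have:

[Equation: the Fokker-Planck equation for the equilibrium density $p(x \mid m)$, relating the divergence of the flow $f$ to the gradient and curvature of $p$.]

This means maxima of the equilibrium density must have negative divergence. We can exploit this to ensure maxima lie in A, using a Langevin-based policy, where cost plays the role of dissipation.

[Equations: the Langevin-based policy (equations of motion) $f$, in which the cost $c$ scales the velocity term; the sign of $c(x \mid m)$ is specified separately for states inside and outside the attracting set $A$.]

Exploration and exploitation under Langevin dynamics

[Figure: the sign of the cost $c(x)$ sets the sign of the divergence of the flow $f$, giving exploitation in one regime and exploration in the other.]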

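The mountain car problem

[Figure: the environment, shown as the height of the landscape against position (position from -2 to 2, height from 0 to 0.7).]

True equations of motion

[Equation: the flow $f$ of position $x$ and velocity $x'$, driven by action $a$ and including a velocity term with coefficient $\tfrac{1}{8}$.]

Policy (expected equations of motion)

[Equation: the expected flow $f$ of position and velocity, in which the cost $c$ scales velocity.]

The cost-function

$c(x, h)$, a function of position $x$ and satiety $h$.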

Learning the environment

[Figure: one learning trial. Panels: prediction and error (against time); hidden states (against time); trajectory of one trial (velocity against position, both from -2 to 2); learned (after 16 trials) and true potential (height against position).]

With no cost (i.e., Hamiltonian dynamics)

[Equation: a condition on satiety $h$ (involving the value 2) under which the cost $c(x, h)$ is zero.]

Exploring & exploiting the environment

With cost (i.e., exploratory dynamics): $h = 0$

[Figure: panels: conditional expectations (against time); action $a(t)$ (against time); trajectories (velocity against position, both from -2 to 2); cost-function (priors) $c(x, 0)$ (force against position). Labels: $x(t)$, $c(t)$.]

Using just the free-energy principle and a simple gradient ascent scheme, we have solved a benchmark problem in optimal control theory using a handful of learning trials. Note that we did not use reinforcement learning or dynamic programming.

Adaptive policies and trajectories

[Figure: panels: action (against time, 200 to 1600); trajectories (position, velocity and satiety); prediction and error (against time); expected hidden states (against time).]

Self-organisation with (happiness) dynamics on cost

[Equations: the policy (expected flow) $f$ with cost $c(x, h(t))$, and the true flow $f$ driven by action $a$ with a velocity term of coefficient $\tfrac{1}{8}$; both include dynamics on satiety $h$ that depend on the cost $c$.]

[Figure labels: $c(t)$, $h(t)$, $a(t)$.]

Thank you

And thanks to collaborators:

Jean Daunizeau, Stefan Kiebel, James Kilner, Klaas Stephan

And colleagues:

Peter Dayan, Jörn Diedrichsen, Paul Verschure, Florentin Wörgötter
