CDB Exploring Science and Society Seminar
Thursday 19 November 2009 at 5.30pm
Host: Prof Giorgio Gabella
The Bayesian brain, surprise and free-energy
Abstract
Value-learning and perceptual learning have been an important focus over the past decade, attracting the concerted attention of experimental psychologists, neurobiologists and the machine learning community. Despite some formal connections (e.g., the role of prediction error in optimizing some function of sensory states), both fields have developed their own rhetoric and postulates. In this work, we show that perceptual learning is, literally, an integral part of value learning, in the sense that perception is necessary to integrate out dependencies on the inferred causes of sensory information. This enables the value of sensory trajectories to be optimized through action. Furthermore, we show that acting to optimize value and perception are two aspects of exactly the same principle, namely the minimization of a quantity (free energy) that bounds the probability of sensory input, given a particular agent or phenotype. This principle can be derived, in a straightforward way, from the very existence of agents, by considering the probabilistic behavior of an ensemble of agents belonging to the same class.
“Objects are always imagined as being present in the field of vision as would have to be there in order to produce the same impression on the nervous mechanism” - Hermann Ludwig Ferdinand von Helmholtz
Thomas Bayes
Geoffrey Hinton
Richard Feynman
From the Helmholtz machine and the Bayesian Brain to Action and self-organization
Hermann Haken
Overview
Ensemble dynamics: Entropy and equilibria; Free-energy and surprise
The free-energy principle: Action and perception; Generative models
Perception: Birdsong and categorization; Simulated lesions
Action: Active inference; Reaching
Policies: Control and attractors; The mountain-car problem
Particle density contours showing Kelvin-Helmholtz instability, forming beautiful breaking waves. In the self-sustained state of Kelvin-Helmholtz turbulence the particles are transported away from the mid-plane at the same rate as they fall, but the particle density is nevertheless very clumpy because of a clumping instability that is caused by the dependence of the particle velocity on the local solids-to-gas ratio (Johansen, Henning, & Klahr 2006)
[Figure: two ensemble densities $p(\vartheta \mid m)$ (panels 1 and 2) over states such as temperature and pH; further labels: falling, transport]
Self-organization that minimises the entropy of an ensemble density, to ensure that a limited repertoire of states is occupied (i.e., ensuring states have a random attracting set $A$):
$$H = -\int p(\vartheta \mid m)\,\ln p(\vartheta \mid m)\,d\vartheta$$
How can an active agent minimise its equilibrium entropy? This entropy is bounded by the entropy of sensory signals (under simplifying assumptions):
$$s = \mathbf{g}(\vartheta) + z, \qquad H(s \mid m) \ge H(\vartheta \mid m) + \int p(\vartheta \mid m)\,\ln\lvert\partial_\vartheta \mathbf{g}\rvert\,d\vartheta$$
Crucially, because the density on sensory signals is at equilibrium, it can be interpreted as the proportion of time each agent entertains them (the sojourn time). This ergodic argument means that entropy is the path integral of surprise experienced by a particular agent:
$$H(s \mid m) = -\int p(s \mid m)\,\ln p(s \mid m)\,ds = -\lim_{T \to \infty} \frac{1}{T}\int_0^T \ln p(s(t) \mid m)\,dt$$
This means agents minimise surprise at all times. But there is one small problem: agents cannot access surprise. However, they can evaluate a free-energy bound on surprise, which is induced with a recognition density $q$:
$$F(s(t), q(\vartheta)) \ge -\ln p(s(t) \mid m)$$
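The ergodic argument above can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical discrete equilibrium density over four sensory states and independent draws standing in for the sojourn of a single agent (neither is taken from the talk). It compares the ensemble entropy with the time average of surprise along one sampled sequence.

```python
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a discrete density p(s | m)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def time_averaged_surprise(p, n_steps, seed=0):
    """Average of -ln p(s(t) | m) along one sampled sequence of sensations."""
    rng = random.Random(seed)
    samples = rng.choices(range(len(p)), weights=p, k=n_steps)
    return sum(-math.log(p[s]) for s in samples) / n_steps


# Hypothetical equilibrium density over four sensory states (an assumption).
p_s = [0.5, 0.25, 0.125, 0.125]

if __name__ == "__main__":
    print("entropy H(s|m):        ", entropy(p_s))
    print("time-averaged surprise:", time_averaged_surprise(p_s, 100_000))
```

With a long enough sequence the two numbers agree closely, which is the sense in which minimising surprise at every moment minimises entropy in the long run.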
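The free-energy principle
External states in the world: $\dot{x} = \mathbf{f}(x, a, \vartheta) + w$
Sensations: $s = \mathbf{g}(x, \vartheta) + z$
Internal states of the agent (m): $\mu = \arg\min_\mu F(s, \mu)$
Action: $a = \arg\min_a F(s, \mu)$
Action to minimise a bound on surprise; perception to optimise the bound:
$$F = \text{Energy} - \text{Entropy} = -\langle \ln p(s, \vartheta \mid m) \rangle_q + \langle \ln q(\vartheta) \rangle_q$$
Action:
$$F = \text{Complexity} - \text{Accuracy} = D(q(\vartheta)\,\|\,p(\vartheta)) - \langle \ln p(s(a) \mid \vartheta, m) \rangle_q, \qquad a = \arg\max_a \text{Accuracy}$$
Perception:
$$F = \text{Divergence} + \text{Surprise} = D(q(\vartheta \mid \mu)\,\|\,p(\vartheta \mid s)) - \ln p(s \mid m), \qquad \mu = \arg\min_\mu \text{Divergence}$$
$\vartheta = \{x(t), \theta, \gamma\}$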
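The three ways of writing free energy on this slide (energy minus entropy, complexity minus accuracy, divergence plus surprise) are algebraically the same quantity. The sketch below checks this numerically, assuming a hypothetical two-valued cause with made-up prior and likelihood values for a single observed sensation; the function name and numbers are illustrative only.

```python
import math

# Hypothetical generative model with a binary cause (all values are assumptions).
prior = [0.7, 0.3]        # p(theta | m)
likelihood = [0.2, 0.9]   # p(s | theta, m) for the one sensation that was observed


def free_energy_decompositions(q):
    """Return F computed three ways, plus surprise and the exact posterior.

    q is a recognition density over the two values of the cause (entries > 0).
    """
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)                       # p(s | m)
    posterior = [j / evidence for j in joint]   # p(theta | s, m)

    energy = -sum(qi * math.log(ji) for qi, ji in zip(q, joint))
    entropy = -sum(qi * math.log(qi) for qi in q)
    complexity = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior))
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood))
    divergence = sum(qi * math.log(qi / po) for qi, po in zip(q, posterior))
    surprise = -math.log(evidence)

    return (energy - entropy, complexity - accuracy,
            divergence + surprise, surprise, posterior)


if __name__ == "__main__":
    f1, f2, f3, surprise, posterior = free_energy_decompositions([0.5, 0.5])
    print("F (three forms):", f1, f2, f3)
    print("surprise:       ", surprise)
    print("F at posterior: ", free_energy_decompositions(posterior)[0])
```

Because the divergence term is never negative, F is an upper bound on surprise, and the bound is tight when the recognition density equals the posterior.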
The free-energy rests on expected Gibbs energy
$$F(s, q) = \text{Energy} - \text{Entropy} = \langle U \rangle_q + \langle \ln q \rangle_q$$
and can be evaluated, given a generative model comprising a likelihood and prior:
$$U(s, \vartheta) = -\ln p(s, \vartheta \mid m) = -\ln p(s \mid \vartheta, m) - \ln p(\vartheta \mid m)$$
So what models might the brain use?
The generative model
$\vartheta = \{x(t), \theta, \gamma\}$
[Figure: processing hierarchy with backward (nonlinear), forward (linear) and lateral connections]
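[Figure: hierarchical model with nodes $\tilde{x}^{(1)}$, $\tilde{v}^{(1)}$, $\tilde{x}^{(2)}$, $\tilde{v}^{(2)}$ and sensory signals $s$]
$$s = g(x, v, \theta) + z, \qquad \dot{x} = f(x, v, \theta) + w$$
$$\tilde{s} = \tilde{g} + \tilde{z}, \qquad D\tilde{x} = \tilde{f} + \tilde{w}$$
$$p(\tilde{s} \mid \tilde{x}, \tilde{v}, \theta, \gamma) = N(\tilde{g}, \Sigma^{s}(\gamma))$$
$$p(D\tilde{x} \mid \tilde{x}, \tilde{v}, \theta, \gamma) = N(\tilde{f}, \Sigma^{x}(\gamma))$$
$$p(\tilde{v}) = N(\eta, \Sigma^{v})$$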
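Hierarchical (deep) dynamic models
$$v^{(i-1)} = g(x^{(i)}, v^{(i)}, \theta^{(i)}) + z^{(i)}, \qquad \dot{x}^{(i)} = f(x^{(i)}, v^{(i)}, \theta^{(i)}) + w^{(i)}$$
$$\tilde{v}^{(i-1)} = \tilde{g}^{(i)} + \tilde{z}^{(i)}, \qquad D\tilde{x}^{(i)} = \tilde{f}^{(i)} + \tilde{w}^{(i)}$$
$\vartheta = \{x(t), v(t), \theta, \gamma\}$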
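Likelihood and empirical priors
$$p(\tilde{s}, \tilde{x}, \tilde{v} \mid m) = p(\tilde{s} \mid \tilde{x}, \tilde{v})\,p(\tilde{x}, \tilde{v})$$
$$p(\tilde{x}, \tilde{v}) = p(\tilde{v}^{(m)}) \prod_{i=1}^{m-1} p(\tilde{x}^{(i)})\,p(D\tilde{x}^{(i)} \mid \tilde{v}^{(i)})\,p(\tilde{v}^{(i)} \mid \tilde{x}^{(i+1)}, \tilde{v}^{(i+1)})$$
Dynamical priors:
$$p(D\tilde{x}^{(i)} \mid \tilde{v}^{(i)}) = N(\tilde{f}^{(i)}, \Sigma^{(i)x})$$
Structural priors:
$$p(\tilde{v}^{(i)} \mid \tilde{x}^{(i+1)}, \tilde{v}^{(i+1)}) = N(\tilde{g}^{(i+1)}, \Sigma^{(i+1)v})$$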
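Hierarchical form
$$s = g(x^{(1)}, v^{(1)}) + z^{(1)}$$
$$\dot{x}^{(1)} = f(x^{(1)}, v^{(1)}) + w^{(1)}$$
$$\vdots$$
$$v^{(i-1)} = g(x^{(i)}, v^{(i)}) + z^{(i)}$$
$$\dot{x}^{(i)} = f(x^{(i)}, v^{(i)}) + w^{(i)}$$
Prediction errors at each level (with $\tilde{v}^{(0)} = \tilde{s}$):
$$\tilde{\varepsilon}^{(i)v} = \tilde{v}^{(i-1)} - \tilde{g}^{(i)}, \qquad \tilde{\varepsilon}^{(i)x} = D\tilde{x}^{(i)} - \tilde{f}^{(i)}$$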
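Gibbs energy: a simple function of prediction error
$$U(s, \vartheta) = -\ln p(\tilde{s}, \tilde{x}, \tilde{v} \mid m) = \tfrac{1}{2}\tilde{\varepsilon}^{T}\Pi\,\tilde{\varepsilon} - \tfrac{1}{2}\ln\lvert\Pi\rvert$$
Prediction errors: $\tilde{\varepsilon} = (\tilde{\varepsilon}^{x}, \tilde{\varepsilon}^{v})$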
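To make the link between Gibbs energy and prediction error concrete, here is a minimal sketch that evaluates the quadratic form for a vector of prediction errors. It assumes a diagonal precision matrix (an assumption made for brevity, not something stated in the talk) and compares the result with the negative log-density of the corresponding Gaussian, which differs only by a constant.

```python
import math


def gibbs_energy(errors, precisions):
    """U = 1/2 * e' * Pi * e - 1/2 * ln|Pi| for a diagonal precision Pi (assumed)."""
    quadratic = sum(pi * e * e for e, pi in zip(errors, precisions))
    log_det = sum(math.log(pi) for pi in precisions)
    return 0.5 * quadratic - 0.5 * log_det


def negative_log_gaussian(errors, precisions):
    """Exact -ln N(e; 0, Pi^-1) for independent errors, including the 2*pi constant."""
    return sum(0.5 * pi * e * e - 0.5 * math.log(pi) + 0.5 * math.log(2.0 * math.pi)
               for e, pi in zip(errors, precisions))


if __name__ == "__main__":
    errors = [1.0, -2.0]        # hypothetical prediction errors
    precisions = [2.0, 0.5]     # hypothetical precisions (inverse variances)
    print("Gibbs energy:         ", gibbs_energy(errors, precisions))
    print("negative log-density: ", negative_log_gaussian(errors, precisions))
```

The two values differ by (n/2) ln 2π, which does not depend on the prediction errors and so plays no role in optimisation.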
The recognition density and its sufficient statistics
Mean-field approximation: $q(\vartheta) = \prod_i q(\vartheta^{i}) = q(\tilde{x}, \tilde{v})\,q(\theta)\,q(\gamma)$
Laplace approximation: $q(\vartheta^{i}) = N(\mu^{i}, \Sigma^{i})$
[Figure: the free energy $F$ is minimised with respect to three sets of sufficient statistics]
Synaptic activity: perception and inference
Synaptic efficacy: learning and memory (activity-dependent plasticity; functional specialization)
Synaptic gain: attention and salience (attentional gain; enabling of plasticity)
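Perception and message-passing
[Figure: message passing between error units $\xi$ and state units $\tilde{\mu}$ at hierarchical levels $i-2$ to $i+1$, driven by sensory input $s(t)$; backward connections carry predictions and forward connections carry prediction error]
$$\dot{\tilde{\mu}}^{(i)v} = D\tilde{\mu}^{(i)v} - \partial_{\tilde{v}}\tilde{\varepsilon}^{(i)T}\xi^{(i)} - \xi^{(i+1)v}$$
$$\dot{\tilde{\mu}}^{(i)x} = D\tilde{\mu}^{(i)x} - \partial_{\tilde{x}}\tilde{\varepsilon}^{(i)T}\xi^{(i)}$$
$$\xi^{(i)v} = \Pi^{(i)v}\tilde{\varepsilon}^{(i)v} = \Pi^{(i)v}\big(\tilde{\mu}^{(i-1)v} - \tilde{g}(\tilde{\mu}^{(i)})\big)$$
$$\xi^{(i)x} = \Pi^{(i)x}\tilde{\varepsilon}^{(i)x} = \Pi^{(i)x}\big(D\tilde{\mu}^{(i)x} - \tilde{f}(\tilde{\mu}^{(i)})\big)$$
Synaptic plasticity: $\dot{\mu}^{\theta}_{ij} = -\partial_{\theta_{ij}}\tilde{\varepsilon}^{T}\xi$
Synaptic gain: $\dot{\mu}^{\gamma}_{i} = \tfrac{1}{2}\mathrm{tr}\big(R_i(\xi\xi^{T} - \Pi(\mu^{\gamma}))\big)$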
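The message-passing scheme can be reduced to a toy case to show how precision-weighted prediction errors drive the conditional expectation. The sketch below assumes a static, linear, scalar model (sensation s = θv + noise, with a Gaussian prior on the cause v), so the generalised-motion term D is dropped; the parameter values and function names are hypothetical. It performs a gradient descent on Gibbs energy and compares the result with the analytic posterior mean.

```python
def infer_cause(s, theta=2.0, eta=0.0, pi_s=4.0, pi_v=1.0, dt=0.01, n_steps=2000):
    """Gradient descent of the expected cause mu on Gibbs energy (static toy model)."""
    mu = eta  # start at the prior expectation
    for _ in range(n_steps):
        xi_s = pi_s * (s - theta * mu)    # forward: precision-weighted sensory error
        xi_v = pi_v * (mu - eta)          # precision-weighted error on the prior
        mu += dt * (theta * xi_s - xi_v)  # update driven by the two error terms
    return mu


def posterior_mean(s, theta=2.0, eta=0.0, pi_s=4.0, pi_v=1.0):
    """Analytic posterior expectation for the same linear Gaussian model."""
    return (theta * pi_s * s + pi_v * eta) / (theta ** 2 * pi_s + pi_v)


if __name__ == "__main__":
    print("predictive coding estimate:", infer_cause(3.0))
    print("analytic posterior mean:   ", posterior_mean(3.0))
```

Raising the sensory precision pulls the estimate towards the data, while raising the prior precision pulls it towards the prior expectation, which is the role the slides assign to synaptic gain.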
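Synthetic song-birds
[Figure: a vocal centre drives the syrinx through causal states $v_1$ and $v_2$, producing a sonogram (frequency against time in seconds) as sensory input $s(t)$]
$$\dot{x} = f(x, v) = \begin{bmatrix} 18x_2 - 18x_1 \\ v_1x_1 - 2x_3x_1 - x_2 \\ 2x_1x_2 - v_2x_3 \end{bmatrix}$$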
Recognition and message passing
[Figure panels: stimulus sonogram (frequency against time in seconds); prediction and error (against time); hidden states (against time); causal states (against time in bins). Backward connections carry predictions and forward connections carry prediction error.]
Perceptual categorization
[Figure panels: sonograms of Song A, Song B and Song C (frequency in Hz against time in seconds); conditional estimates of the causal states $v_1$ and $v_2$ against time in seconds; the three songs (A, B, C) plotted in the space of $v_1$ and $v_2$]
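Generative models of birdsong: sequences of sequences
Kiebel et al
[Figure: a neuronal hierarchy drives the syrinx, producing a sonogram (frequency in kHz against time in seconds)]
Second level:
$$f^{(2)} = \begin{bmatrix} 18x_2^{(2)} - 18x_1^{(2)} \\ 32x_1^{(2)} - 2x_3^{(2)}x_1^{(2)} - x_2^{(2)} \\ 2x_1^{(2)}x_2^{(2)} - \tfrac{8}{3}x_3^{(2)} \end{bmatrix}, \qquad g^{(2)} = \begin{bmatrix} x_2^{(2)} \\ x_3^{(2)} \end{bmatrix}$$
First level, with causal states $v_1^{(1)}$ and $v_2^{(1)}$ supplied by $g^{(2)}$:
$$f^{(1)} = \begin{bmatrix} 18x_2^{(1)} - 18x_1^{(1)} \\ v_1^{(1)}x_1^{(1)} - 2x_3^{(1)}x_1^{(1)} - x_2^{(1)} \\ 2x_1^{(1)}x_2^{(1)} - v_2^{(1)}x_3^{(1)} \end{bmatrix}, \qquad g^{(1)} = \begin{bmatrix} x_2^{(1)} \\ x_3^{(1)} \end{bmatrix}$$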
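The idea of "sequences of sequences" can be sketched by coupling two Lorenz-type attractors, with two states of the second level acting as the control parameters of the first. The code below is an illustration only: it assumes simple Euler integration, arbitrary initial conditions and step size, and a hypothetical rate constant `rate2` that slows the second level; none of these choices come from the talk.

```python
import math


def lorenz_flow(x, v1, v2):
    """Lorenz-type flow with control parameters v1 and v2."""
    x1, x2, x3 = x
    return (18.0 * x2 - 18.0 * x1,
            v1 * x1 - 2.0 * x3 * x1 - x2,
            2.0 * x1 * x2 - v2 * x3)


def euler_step(x, dx, dt):
    """One explicit Euler step (an assumed integration scheme)."""
    return tuple(xi + dt * dxi for xi, dxi in zip(x, dx))


def generate_song(n_steps=20000, dt=0.001, rate2=0.25):
    """Return a list of two-dimensional outputs (x2, x3) of the first level."""
    x_hi = (1.0, 1.0, 15.0)   # second-level hidden states (arbitrary start)
    x_lo = (1.0, 1.0, 15.0)   # first-level hidden states (arbitrary start)
    song = []
    for _ in range(n_steps):
        v1, v2 = x_hi[1], x_hi[2]           # second level supplies the causes
        song.append((x_lo[1], x_lo[2]))     # first-level output
        dx_hi = tuple(rate2 * d for d in lorenz_flow(x_hi, 32.0, 8.0 / 3.0))
        dx_lo = lorenz_flow(x_lo, v1, v2)
        x_hi = euler_step(x_hi, dx_hi, dt)
        x_lo = euler_step(x_lo, dx_lo, dt)
    return song


if __name__ == "__main__":
    song = generate_song()
    print("samples:", len(song))
    print("last output:", song[-1])
```

The slower second level changes the attractor that the first level moves on, so the output is a sequence whose character itself changes over time.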
Simulated lesion studies: a model for false inference in psychopathology?
[Figure panels: sonograms (frequency in Hz against time in seconds) of the percept, of the percept with no structural priors, and of the percept with no dynamical priors; beside each, the simulated LFP (micro-volts) against peristimulus time (ms)]
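From reflexes to action
[Figure: spinal reflex arc with labels: prediction $g(\mu)$, action, sensory input $s(a)$, dorsal root, ventral horn]
$$\dot{a} = -\partial_a F = -\partial_a \tilde{s}^{T}\xi_s, \qquad \xi_s = \Pi_s\big(\tilde{s}(a) - g(\mu)\big)$$
True dynamics:
$$s = \mathbf{g}(x, v, \theta) + z, \qquad \dot{x} = \mathbf{f}(x, v, a, \theta) + w$$
Generative model:
$$s = g(x, v, \theta) + z, \qquad \dot{x} = f(x, v, \theta) + w$$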
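The reflex-arc picture, in which action changes sensations until they match descending predictions, can be sketched in a few lines. This toy example assumes hypothetical leaky true dynamics (the hidden state relaxes towards the action), a noise-free sensation equal to the hidden state, and that the sensitivity of sensation to action is positive and can be taken as 1; the function name and these choices are illustrative and not from the talk.

```python
def simulate_reflex(predicted=1.0, precision=1.0, dt=0.01, n_steps=4000):
    """Action descends the gradient of precision-weighted sensory prediction error."""
    x = 0.0   # true hidden state (e.g., muscle length), hypothetical
    a = 0.0   # action
    sensations = []
    for _ in range(n_steps):
        s = x                              # sensation (noise omitted)
        xi = precision * (s - predicted)   # sensory prediction error
        a += dt * (-1.0 * xi)              # da/dt = -(ds/da) * xi, with ds/da taken as 1
        x += dt * (a - x)                  # assumed true dynamics: dx/dt = a - x
        sensations.append(s)
    return sensations, a


if __name__ == "__main__":
    sensations, action = simulate_reflex()
    print("final sensation:", sensations[-1])
    print("final action:   ", action)
```

The agent never needs to know the true dynamics; it only needs the sign of the effect of action on sensation, which is the sense in which the scheme resembles a reflex.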
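From reflexes to action
[Figure: jointed arm anchored at $(0, 0)$ with segments $J_1$ and $J_2$ and states $x_1$ and $x_2$, showing the movement trajectory; labels: descending sensory prediction error, visual input, proprioceptive input]
Visual input: $s = (V, J) + w$, with $V = (v_1, v_2, v_3)$ and $J = J_1 + J_2 = (j_1, j_2)$
Proprioceptive input: $s = (x_1, x_2) + w$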
How do policies minimise entropy?
$$H_s - H_x = D\big(p(s, z)\,\|\,p(s)p(z)\big) + \int p(x)\,\ln\lvert\partial_x \mathbf{g}\rvert\,dx$$
Energies:
$-\int q(\vartheta)\,\ln p(s(a) \mid \vartheta)\,d\vartheta$: sensory prediction error
$D(q(\vartheta)\,\|\,p(\vartheta)) \ge 0$: complexity
$F(s, q(\vartheta \mid \mu))$: free-energy
$D(q(\vartheta)\,\|\,p(\vartheta \mid s)) \ge 0$: perceptual divergence
$-\ln p(s \mid m)$: sensory surprise
$-\ln p(x \mid m)$: surprise
Path integrals (under ergodic assumptions):
$A = \int dt\,F(s, q)$: free-action
$H_s = -\int dt\,\ln p(s \mid m)$: sensory entropy
$H_x = -\int dt\,\ln p(x \mid m)$: entropy
perception: $\mu = \arg\min_\mu F$
action: $a = \arg\min_a F$
policy (model): $m = \arg\min_m A$
Cost-functions, value and optimal control (policies that lead to sparse distal goals)
Richard Bellman
Using the Helmholtz decomposition, flow (i.e., policy) can be expressed in terms of scalar and vector potentials:
$$f = \nabla V + \nabla \times W, \qquad \nabla V \cdot (\nabla \times W) = 0, \qquad V = \Gamma \ln p$$
where value $V$ is proportional to negative surprise and can be defined as expected (negative) cost:
$$V(x) = -\int_0^\infty \int c(\chi)\,p(\chi, t \mid x, m)\,d\chi\,dt \quad\Rightarrow\quad c(x) = f \cdot \nabla V(x) + \Gamma \nabla^2 V(x)$$
This means the cost-function is defined by the equilibrium density but not vice versa; this is the problem addressed by dynamic programming and reinforcement learning.
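The claim that value is proportional to negative surprise can be checked in the simplest possible setting. The sketch below assumes a one-dimensional state (so there is no vector potential), a standard normal equilibrium density, a value function equal to Γ times the log-density, and random fluctuations of amplitude Γ; the symbol Γ, the Euler-Maruyama integrator and all numbers are assumptions for illustration. If flow is the gradient of value, the simulated states should settle into the density that defines that value.

```python
import math
import random


def simulate_gradient_flow(gamma=1.0, dt=0.01, n_steps=200_000, seed=1):
    """Langevin dynamics dx = grad V dt + sqrt(2 * gamma * dt) * noise.

    Here V = gamma * ln p with p a standard normal density, so grad V = -gamma * x.
    """
    rng = random.Random(seed)
    x = 0.0
    states = []
    for _ in range(n_steps):
        flow = -gamma * x   # gradient of the (assumed) value function
        x += flow * dt + math.sqrt(2.0 * gamma * dt) * rng.gauss(0.0, 1.0)
        states.append(x)
    return states


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    states = simulate_gradient_flow()
    print("variance of visited states:", sample_variance(states))
```

The variance of the visited states should be close to 1, the variance of the density used to define value, which illustrates that the flow fixes the equilibrium density.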
[Figure panels: surprise (negative value, $V \propto \ln p$); cost-function $c(x)$; equilibrium density $p := p(x \mid m)$; flow (policy)]
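Cost-functions and attracting sets (policies with attractors)
Adriaan Fokker, Max Planck
At equilibrium we have:
$$\dot{p}(x \mid m) = \Gamma\nabla^2 p - \nabla \cdot (f p) = \Gamma\nabla^2 p - p\,\nabla \cdot f - f \cdot \nabla p = 0$$
This means maxima of the equilibrium density must have negative divergence. We can exploit this to ensure maxima lie in $A$, using a Langevin-based policy, where cost plays the role of dissipation:
$$f = \begin{bmatrix} x' \\ c(x)\,x' - \partial_x\varphi(x) \end{bmatrix} \quad\Rightarrow\quad \nabla \cdot f = c(x)$$
$$c(x \mid m) < 0 : x \in A, \qquad c(x \mid m) \ge 0 : x \notin A$$
Exploration and exploitation under Langevin dynamics
[Figure: equations of motion $f$ over a landscape $\varphi(x)$, with cost $c(x)$ crossing zero]
$c(x) < 0 \Rightarrow \nabla \cdot f < 0$: exploitation
$c(x) > 0 \Rightarrow \nabla \cdot f > 0$: exploration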
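The statement that cost plays the role of dissipation can be checked numerically. The sketch below assumes a Langevin-type policy over position and velocity in which acceleration is cost times velocity minus the gradient of a potential; the particular cost function (negative near a target at position 1, positive elsewhere) and the quadratic potential are hypothetical choices for illustration. A finite-difference estimate of the divergence of this flow should equal the cost at each position.

```python
import math


def cost(x):
    """Hypothetical cost: negative near a target at x = 1, positive elsewhere."""
    return 1.0 - 8.0 * math.exp(-4.0 * (x - 1.0) ** 2)


def potential_gradient(x):
    """Gradient of a hypothetical landscape phi(x) = x**2 / 2."""
    return x


def flow(x, v):
    """Assumed policy over position x and velocity v: (dx/dt, dv/dt)."""
    return (v, cost(x) * v - potential_gradient(x))


def divergence(x, v, h=1e-5):
    """Central finite-difference estimate of d(dx/dt)/dx + d(dv/dt)/dv."""
    d_dx = (flow(x + h, v)[0] - flow(x - h, v)[0]) / (2.0 * h)
    d_dv = (flow(x, v + h)[1] - flow(x, v - h)[1]) / (2.0 * h)
    return d_dx + d_dv


if __name__ == "__main__":
    for x, v in [(1.0, 0.3), (-1.0, 2.0)]:
        print("x =", x, "cost =", cost(x), "divergence =", divergence(x, v))
```

Where cost is negative the flow contracts (exploitation), and where it is positive the flow expands (exploration), so shaping the cost function shapes where trajectories end up.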
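The mountain car problem
The environment: true equations of motion
$$\mathbf{f} = \begin{bmatrix} \dot{x} \\ \dot{x}' \end{bmatrix} = \begin{bmatrix} x' \\ a - \partial_x\varphi(x) - \tfrac{1}{8}x' \end{bmatrix}$$
[Figure: height $\varphi(x)$ of the landscape against position]
The cost-function: $c(x, h)$
[Figure: cost as a function of position and satiety]
Policy (expected equations of motion):
$$f = \begin{bmatrix} \dot{x} \\ \dot{x}' \end{bmatrix} = \begin{bmatrix} x' \\ c\,x' - \partial_x\varphi(x) \end{bmatrix}$$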
Learning the environment
[Figure panels: prediction and error (against time); hidden states (against time); trajectory of one trial (velocity against position); learned (after 16 trials) and true potential (height against position)]
Exploring & exploiting the environment
With no cost (i.e., Hamiltonian dynamics): $h = 2 \Rightarrow c(x, h) = 0$
With cost (i.e., exploratory dynamics): $h = 0$, with cost-function $c(x, 0)$
[Figure panels: conditional expectations (against time); action $a(t)$ (against time); trajectories (velocity against position); cost-function (priors) $c(x, 0)$ (force against position)]
Using just the free-energy principle and a simple gradient ascent scheme, we have solved a benchmark problem in optimal control theory using a handful of learning trials. Note that we did not use reinforcement learning or dynamic programming.
Adaptive policies and trajectories
[Figure panels: prediction and error (against time); expected hidden states (against time); action (against time); trajectories over position, velocity and satiety]
Self-organisation with (happiness) dynamics on cost
[Equations: the policy (expected flow) $f$ and the true flow $\mathbf{f}$ over position $x$, velocity $x'$ and satiety $h$, with cost $c(x, h(t))$, action $a(t)$, and satiety dynamics $\dot{h}$ that depend on $c$ and $h$]
Thank you
And thanks to collaborators:
Jean Daunizeau
Stefan Kiebel
James Kilner
Klaas Stephan
And colleagues:
Peter Dayan
Jörn Diedrichsen
Paul Verschure
Florentin Wörgötter