Information Geometry and Its Applications
Shun-ichi Amari, RIKEN Brain Science Institute
1. Divergence Function and Dually Flat Riemannian Structure
2. Invariant Geometry on Manifold of Probability Distributions
3. Geometry and Statistical Inference (semi-parametrics)
4. Applications to Machine Learning and Signal Processing
Information Geometry
-- Manifolds of Probability Distributions
$M = \{p(x)\}$
Information Geometry
[Figure: Information Geometry (Riemannian manifold, dual affine connections, manifold of probability distributions) at the center, linked to the information sciences: systems theory, information theory, statistics, neural networks, combinatorics, physics, mathematics, AI, vision, optimization.]
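Information Geometry?
Gaussian distributions:
$S = \{p(x; \mu, \sigma)\}, \quad p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
$S = \{p(x; \boldsymbol{\theta})\}, \quad \boldsymbol{\theta} = (\mu, \sigma)$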
Manifold of Probability Distributions
$x = 1, 2, 3$; $S_n = \{p(x)\}$
$p = (p_1, p_2, p_3), \quad p_1 + p_2 + p_3 = 1$
$M = \{p(x; \theta)\}$
[Figure: the probability simplex with axes $p_1$, $p_2$, $p_3$ and a point $p$.]
Manifold and Coordinate System
coordinate transformation
Examples of Coordinate systems
Euclidean space
Gaussian distributions
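$S = \{p(x; \mu, \sigma)\}, \quad p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$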
Discrete Distributions
Positive measures
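Divergence: $D[z : y]$
$D[z : y] \ge 0$
$D[z : y] = 0$ iff $z = y$
$D[z : z + dz] = \sum g_{ij}\, dz_i\, dz_j$ (Taylor expansion), with $(g_{ij})$ positive-definite
Not necessarily symmetric: $D[z : y] \ne D[y : z]$
[Figure: two points $y$ and $z$ on the manifold $M$.]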
Various Divergences
Euclidean
f‐divergence
KL‐divergence
(α‐β)‐divergence
Kullback-Leibler Divergence (quasi-distance)
$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$
$D[p(x) : q(x)] \ge 0$, with $= 0$ iff $p(x) = q(x)$
$D[p : q] \ne D[q : p]$
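
The properties of the KL-divergence listed above can be checked numerically. The following is a minimal sketch for discrete distributions; the function name `kl_divergence` and the example vectors are illustrative assumptions, not part of the slides.

```python
import math

def kl_divergence(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) for discrete distributions.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions on three outcomes (assumed values).
p = [0.5, 0.3, 0.2]
r = [0.6, 0.3, 0.1]

d_pr = kl_divergence(p, r)  # D[p : r]
d_rp = kl_divergence(r, p)  # D[r : p], generally a different number
d_pp = kl_divergence(p, p)  # D[p : p] = 0
```

Here `d_pr` and `d_rp` are both positive but unequal, which is why the slide calls the KL-divergence a quasi-distance.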
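$(\alpha, \beta)$-divergence
$D_{\alpha,\beta}[p : q] = \sum_i \left\{ \frac{\alpha}{\alpha+\beta}\, p_i^{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\, q_i^{\alpha+\beta} - p_i^{\alpha} q_i^{\beta} \right\}$
$\alpha + \beta = 1$: $\alpha$-divergence; $\alpha = 1$: $\beta$-divergence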
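Manifold with Convex Function
$S$: coordinates $\theta = (\theta^1, \theta^2, \ldots, \theta^n)$
$\psi(\theta)$: convex function
negative entropy: $\psi(p) = \int p(x) \log p(x)\, dx$
energy: $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$
mathematical programming, control systems, physics, engineering, vision, economics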
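Riemannian metric and flatness (affine structure)
Bregman divergence: $D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - \operatorname{grad} \psi(\theta') \cdot (\theta - \theta')$
$D[\theta : \theta + d\theta] = \frac{1}{2} \sum_{i,j} g_{ij}\, d\theta^i\, d\theta^j$
$g_{ij} = \partial_i \partial_j \psi(\theta), \quad \partial_i = \partial / \partial\theta^i$
Flatness (affine): $\theta$: geodesic (not Levi-Civita)
$\{S, \psi(\theta), \theta\}$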
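
As an illustration of the Bregman divergence, the sketch below evaluates it for the two convex functions named earlier in the slides: the energy $\frac{1}{2}\sum_i (\theta^i)^2$, which gives half the squared Euclidean distance, and the negative entropy, which gives the KL-divergence on normalized distributions. The function names and numerical values are assumptions made for this example.

```python
import math

def bregman(psi, grad_psi, theta, theta_prime):
    """D[theta : theta'] = psi(theta) - psi(theta') - grad psi(theta') . (theta - theta')."""
    g = grad_psi(theta_prime)
    inner = sum(gi * (a - b) for gi, a, b in zip(g, theta, theta_prime))
    return psi(theta) - psi(theta_prime) - inner

# Energy: psi(theta) = 1/2 sum_i theta_i^2, with gradient theta.
def energy(t):
    return 0.5 * sum(x * x for x in t)

def grad_energy(t):
    return list(t)

# Negative entropy: psi(p) = sum_i p_i log p_i, with gradient log p_i + 1.
def neg_entropy(p):
    return sum(x * math.log(x) for x in p)

def grad_neg_entropy(p):
    return [math.log(x) + 1.0 for x in p]

a, b = [1.0, 2.0], [0.0, 0.5]
d_energy = bregman(energy, grad_energy, a, b)
half_sq_dist = 0.5 * sum((x - y) ** 2 for x, y in zip(a, b))

p, q = [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]
d_negent = bregman(neg_entropy, grad_neg_entropy, p, q)
kl_pq = sum(x * math.log(x / y) for x, y in zip(p, q))
```

In this sketch `d_energy` agrees with `half_sq_dist` and `d_negent` agrees with `kl_pq`, up to floating-point rounding.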
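Legendre Transformation
$\eta_i = \partial_i \psi(\theta), \quad \partial_i = \partial / \partial\theta^i$
$\theta \leftrightarrow \eta$: one-to-one
$\varphi(\eta) = \max_{\theta} \left\{ \sum_i \theta^i \eta_i - \psi(\theta) \right\}$
$\psi(\theta) + \varphi(\eta) - \sum_i \theta^i \eta_i = 0$
$\theta^i = \partial^i \varphi(\eta), \quad \partial^i = \partial / \partial\eta_i$
$D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \sum_i \theta^i \eta'_i$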
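
A one-dimensional example may make the Legendre transformation concrete. The sketch below assumes the convex function $\psi(\theta) = \log(1 + e^{\theta})$ (a choice made only for illustration), whose dual coordinate is $\eta = 1/(1 + e^{-\theta})$ and whose dual potential is $\varphi(\eta) = \eta \log \eta + (1-\eta)\log(1-\eta)$.

```python
import math

def psi(theta):
    """Convex potential psi(theta) = log(1 + exp(theta)) (illustrative choice)."""
    return math.log(1.0 + math.exp(theta))

def eta_of_theta(theta):
    """Dual coordinate eta = d psi / d theta."""
    return 1.0 / (1.0 + math.exp(-theta))

def phi(eta):
    """Dual potential: Legendre transform of psi."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def theta_of_eta(eta):
    """Inverse map theta = d phi / d eta."""
    return math.log(eta / (1.0 - eta))

theta = 0.7
eta = eta_of_theta(theta)

# psi(theta) + phi(eta) - theta * eta should vanish for corresponding points.
identity_gap = psi(theta) + phi(eta) - theta * eta

# The coordinate maps are mutually inverse (one-to-one correspondence).
round_trip = theta_of_eta(eta)

# phi(eta) = max_theta {theta * eta - psi(theta)}, checked on a coarse grid.
grid = [i / 100.0 for i in range(-1000, 1001)]
grid_max = max(t * eta - psi(t) for t in grid)
```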
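Two affine coordinate systems $\theta$, $\eta$
$\theta$: geodesic (e-geodesic)
$\eta$: dual geodesic (m-geodesic)
"dually orthogonal": $\langle \partial_i, \partial^j \rangle = \delta_i^j$, where $\partial_i = \partial / \partial\theta^i$, $\partial^i = \partial / \partial\eta_i$
$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$
Bi-orthogonality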
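Dually flat manifold
$\theta$-coordinates, $\eta$-coordinates
potential functions $\psi(\theta)$, $\varphi(\eta)$:
$\psi(\theta) + \varphi(\eta) - \sum_i \theta^i \eta_i = 0$
$\eta_i = \frac{\partial \psi}{\partial \theta^i}, \quad \theta^i = \frac{\partial \varphi}{\partial \eta_i}$
$g_{ij} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}, \quad g^{ij} = \frac{\partial^2 \varphi}{\partial \eta_i \partial \eta_j}$
exponential family: $p(x, \theta) = \exp\left\{ \sum_i \theta^i x_i - \psi(\theta) \right\}$
$\psi(\theta)$: cumulant generating function; $\varphi(\eta)$: negative entropy
canonical divergence: $D(P : P') = \psi(\theta) + \varphi(\eta') - \sum_i \theta^i \eta'_i$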
Exponential Family
$p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$
Gaussian:
$\theta$: natural parameter; $\eta$: expectation parameter
$\psi(\theta)$: convex function, free-energy
$\varphi(\eta)$: negative entropy
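$x$: discrete, $X = \{0, 1, \ldots, n\}$
$S_n = \{p(x) \mid x \in X\}$: exponential family
$p(x) = \sum_{i=0}^{n} p_i \delta_i(x) = \exp\left[ \sum_{i=1}^{n} \theta^i x_i - \psi(\theta) \right]$
$\theta^i = \log(p_i / p_0); \quad x_i = \delta_i(x); \quad \psi(\theta) = -\log p_0$
$\eta_i = E[x_i] = p_i; \quad \varphi(\eta) = \sum_i p_i \log p_i$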
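
The two coordinate systems of the discrete family can be computed directly from a probability vector. The sketch below follows the formulas above ($\theta^i = \log(p_i/p_0)$, $\eta_i = p_i$, $\psi = -\log p_0$, $\varphi = \sum_i p_i \log p_i$); the function names and the example vector are assumptions introduced here.

```python
import math

def theta_from_p(p):
    """Natural parameters theta^i = log(p_i / p_0), i = 1..n."""
    return [math.log(pi / p[0]) for pi in p[1:]]

def eta_from_p(p):
    """Expectation parameters eta_i = E[x_i] = p_i, i = 1..n."""
    return list(p[1:])

def psi(theta):
    """Cumulant generating function psi(theta) = log(1 + sum_i exp(theta^i))."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def phi(p):
    """Negative entropy sum_{i=0}^{n} p_i log p_i."""
    return sum(pi * math.log(pi) for pi in p)

def p_from_theta(theta):
    """Recover (p_0, ..., p_n) from the natural parameters."""
    z = 1.0 + sum(math.exp(t) for t in theta)
    return [1.0 / z] + [math.exp(t) / z for t in theta]

p = [0.5, 0.3, 0.2]  # illustrative distribution on X = {0, 1, 2}
theta = theta_from_p(p)
eta = eta_from_p(p)
p_back = p_from_theta(theta)

# Legendre identity: psi(theta) + phi(eta) - sum_i theta^i eta_i = 0.
legendre_gap = psi(theta) + phi(p) - sum(t * e for t, e in zip(theta, eta))
```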
Function space of probability distributions $\{p(x)\}$: topology
[Figure: an exponential family inside the function space, showing the two geodesics and their tangent directions; labels $\eta$ and $x$.]
Pythagorean Theorem (dually flat manifold)
$D[P : Q] + D[Q : R] = D[P : R]$
Euclidean space: self-dual, $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$
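
The Pythagorean relation can be checked numerically for the KL-divergence. The sketch below assumes a specific setting chosen for illustration: $P$ is a joint distribution of two binary variables, $Q$ is the product of its marginals, and $R$ is any other product distribution. Since $Q$ shares the marginals of $P$, the segment from $P$ to $Q$ is straight in the $\eta$-coordinates and meets the family of product distributions (which is flat in the $\theta$-coordinates) orthogonally, so the relation is expected to hold with $D$ taken as the KL-divergence.

```python
import math

def kl(p, q):
    """KL-divergence D[p : q] for discrete distributions with q > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def product(px, py):
    """Joint distribution of independent x and y, flattened in row-major order."""
    return [a * b for a in px for b in py]

# P: correlated joint distribution over (x, y), ordered (0,0), (0,1), (1,0), (1,1).
P = [0.4, 0.1, 0.2, 0.3]
px = [P[0] + P[1], P[2] + P[3]]  # marginal of x
py = [P[0] + P[2], P[1] + P[3]]  # marginal of y

Q = product(px, py)                  # product of the marginals of P
R = product([0.7, 0.3], [0.2, 0.8])  # another product distribution (assumed values)

lhs = kl(P, Q) + kl(Q, R)
rhs = kl(P, R)
```

In this construction `lhs` and `rhs` agree up to rounding; replacing `Q` by a product distribution with different marginals breaks the equality.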
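Projection Theorem
$q = \arg\min_{s \in M} D[p : s]$: m-geodesic projection
$q' = \arg\min_{s \in M} D[s : p]$: e-geodesic projection
[Figure: a point $p$ in $S$ projected onto the submanifold $M$ along an m-geodesic and along an e-geodesic; $s$ is a generic point of $M$.]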
Projection Theorem
$\min_{Q \in M} D[P : Q]$: $Q$ = m-geodesic projection of $P$ to $M$
$\min_{Q \in M} D[Q : P]$: $Q'$ = e-geodesic projection of $P$ to $M$
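
A small numerical experiment can illustrate the first statement. The sketch below assumes $D$ is the KL-divergence and takes $M$ to be the family of product distributions of two binary variables (both choices are made here for illustration). A brute-force grid search over $M$ for the minimizer of $D[P : Q]$ should land on the product of the marginals of $P$, which is the m-projection in this setting.

```python
import math

def kl(p, q):
    """KL-divergence D[p : q] for discrete distributions with q > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def product(px, py):
    """Joint distribution of independent x and y, flattened in row-major order."""
    return [a * b for a in px for b in py]

# P: joint distribution over (x, y), ordered (0,0), (0,1), (1,0), (1,1).
P = [0.4, 0.1, 0.2, 0.3]
marg_x0 = P[0] + P[1]  # P(x = 0)
marg_y0 = P[0] + P[2]  # P(y = 0)

# Grid search over product distributions Q(a, b) with Q(x=0) = a, Q(y=0) = b.
STEPS = 100
best = None
for i in range(1, STEPS):
    for j in range(1, STEPS):
        a, b = i / STEPS, j / STEPS
        d = kl(P, product([a, 1.0 - a], [b, 1.0 - b]))
        if best is None or d < best[0]:
            best = (d, a, b)

best_divergence, best_a, best_b = best
```

The grid minimizer `(best_a, best_b)` coincides with `(marg_x0, marg_y0)`. The e-projection, which minimizes $D[Q : P]$ instead, is in general a different point of $M$.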
Convex function – Bregman divergence – Dually flat Riemannian divergence
Dually flat R-manifold – convex function – canonical divergence (KL-divergence)
Exponential family – Bregman divergence (Banerjee et al.)
Invariance: $S = \{p(x, \theta)\}$
Invariant under different representation: $y = y(x)$, $\{\bar{p}(y, \theta)\}$
$\int |p(x, \theta_1) - p(x, \theta_2)|^2\, dx \ne \int |\bar{p}(y, \theta_1) - \bar{p}(y, \theta_2)|^2\, dy$
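
To see why invariance is a real restriction, one can compare the squared $L^2$ distance with the KL-divergence under a change of representation. The sketch below assumes two unit-variance Gaussians and the rescaling $y = 2x$; the constant `SCALE`, the integration range, and the midpoint-rule integrator are all illustrative choices.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density p(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

SCALE = 2.0  # assumed change of representation y = SCALE * x

def p1(x):
    return gauss(x, 0.0, 1.0)

def p2(x):
    return gauss(x, 1.0, 1.0)

# Densities of y = SCALE * x: p_bar(y) = p(y / SCALE) / SCALE.
def q1(y):
    return p1(y / SCALE) / SCALE

def q2(y):
    return p2(y / SCALE) / SCALE

LO, HI = -12.0, 13.0  # range wide enough to hold essentially all the mass
l2_x = integrate(lambda x: (p1(x) - p2(x)) ** 2, LO, HI)
l2_y = integrate(lambda y: (q1(y) - q2(y)) ** 2, SCALE * LO, SCALE * HI)
kl_x = integrate(lambda x: p1(x) * math.log(p1(x) / p2(x)), LO, HI)
kl_y = integrate(lambda y: q1(y) * math.log(q1(y) / q2(y)), SCALE * LO, SCALE * HI)
```

Under these assumptions the squared $L^2$ distance shrinks by the factor `SCALE` when passing from $x$ to $y$, while the KL-divergence is unchanged.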
Invariant divergence (manifold of probability distributions; $S = \{p(x, \theta)\}$)
$y = k(x)$: sufficient statistics
$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$
Chentsov; Amari-Nagaoka
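Invariance: characterization of f-divergence (Csiszár)
$p = (p_1, \ldots, p_n)$; partition of the indices into subsets $A_1, A_2, \ldots, A_m$
coarse-grained distribution: $\bar{p} = (\bar{p}_{A_1}, \ldots, \bar{p}_{A_m}), \quad \bar{p}_A = \sum_{i \in A} p_i$
$D[p : q] \ge D[\bar{p} : \bar{q}]$
$D[p : q] = D[\bar{p} : \bar{q}]$ when $p_i = c_A q_i$ for $i \in A$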
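Invariance ⇒ f-divergence
Csiszár f-divergence (Ali-Silvey, Morimoto):
$D_f[p : q] = \sum_i p_i f\left( \frac{q_i}{p_i} \right)$
$f(u)$: convex, $f(1) = 0$
$D_{cf}[p : q] = c\, D_f[p : q]$
$\tilde{f}(u) = f(u) - c(u - 1)$ gives the same divergence
Standard form: $f(1) = f'(1) = 0; \quad f''(1) = 1$
[Figure: graph of $f(u)$ against $u$, touching zero at $u = 1$.]
Theorem
An invariant separable divergence belongs to the class of f-divergences.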
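
The behavior of an f-divergence under coarse graining can be illustrated with a short computation. The sketch below uses the convention $D_f[p : q] = \sum_i p_i f(q_i / p_i)$ from the slide and assumes the standardized function $f(u) = -\log u + u - 1$, which reproduces the KL-divergence; the partition and the probability vectors are made-up values.

```python
import math

def f_divergence(f, p, q):
    """D_f[p : q] = sum_i p_i f(q_i / p_i), assuming all p_i > 0."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

def f_kl(u):
    """Standardized f with f(1) = f'(1) = 0 and f''(1) = 1; yields KL[p : q]."""
    return -math.log(u) + u - 1.0

def coarse_grain(p, partition):
    """Sum the probabilities inside each block A of the partition."""
    return [sum(p[i] for i in block) for block in partition]

partition = [[0, 1], [2, 3]]  # assumed partition of the four outcomes

# Generic case: coarse graining loses information, so the divergence decreases.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
d_fine = f_divergence(f_kl, p, q)
d_coarse = f_divergence(f_kl, coarse_grain(p, partition), coarse_grain(q, partition))
kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))

# Equality case: p_i = c_A * q_i inside each block A.
q2 = [0.1, 0.3, 0.2, 0.4]
c = [0.5, 4.0 / 3.0]  # one constant per block, chosen so that p2 sums to 1
p2 = [c[0] * q2[0], c[0] * q2[1], c[1] * q2[2], c[1] * q2[3]]
d_fine_eq = f_divergence(f_kl, p2, q2)
d_coarse_eq = f_divergence(f_kl, coarse_grain(p2, partition), coarse_grain(q2, partition))
```

In the generic case `d_fine` exceeds `d_coarse`, and in the proportional case the two values coincide, matching the two conditions stated for invariance.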