Information Geometry and Its Applications Shunichi Amari RIKEN Brain Science Institute 1.Divergence Function and Dually Flat Riemannian Structure 2.Invariant Geometry on Manifold of Probability Distributions 3.Geometry and Statistical Inference semiparametrics 4. Applications to Machine Learning and Signal Processing

Jun 16, 2020


dariahiddleston
Page 1: Information Geometry and Its Applications · Information Geometry and Its Applications Shun‐ichi Amari RIKEN Brain Science Institute 1.Divergence Function and Dually Flat Riemannian

Information Geometry and Its Applications

Shun‐ichi Amari, RIKEN Brain Science Institute

1. Divergence Function and Dually Flat Riemannian Structure

2. Invariant Geometry on Manifold of Probability Distributions

3. Geometry and Statistical Inference; semi‐parametrics

4. Applications to Machine Learning and Signal Processing

Page 2

Information Geometry

-- Manifolds of Probability Distributions

$M = \{p(x)\}$

Page 3

Information Geometry

[Diagram: Information Geometry (Riemannian Manifold, Dual Affine Connections, Manifold of Probability Distributions) at the center, linked to Systems Theory, Information Theory, Statistics, Neural Networks, Combinatorics, Physics, Information Sciences, Math., AI, Vision, Optimization]

Page 4

Information Geometry ?

$\{p(x)\}$

$S = \{p(x; \theta)\}$

Gaussian distributions

$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$

$\theta = (\mu, \sigma)$

Page 5

Manifold of Probability Distributions

$x = 1, 2, 3$; $S_n = \{p(x)\}$

$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$

[Figure: probability simplex with axes $p_1$, $p_2$, $p_3$ and a point $p$]

$M = \{p(x; \theta)\}$

Page 6

Manifold and Coordinate System

coordinate transformation

Page 7

Examples of Coordinate systems

Euclidean space

Page 8

Gaussian distributions

$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$

Page 9

Discrete Distributions

Positive measures

Page 10

Divergence: $D[z : y]$

$D[z : y] \ge 0$

$D[z : y] = 0$ iff $z = y$

$D[z : z + dz] = \sum_{i,j} g_{ij}\, dz_i\, dz_j$ (Taylor expansion), with $(g_{ij})$ positive‐definite

[Figure: points Z and Y in the manifold M]

Not necessarily symmetric: $D[z : y] \ne D[y : z]$

Page 11

Various Divergences

Euclidean

f‐divergence

KL‐divergence

(α‐β)‐divergence

Page 12

Kullback‐Leibler Divergence: quasi‐distance

$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$

$D[p(x) : q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$

$D[p : q] \ne D[q : p]$
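
A minimal sketch of the three properties listed on this slide (non-negativity, zero only at equality, asymmetry), using the standard discrete form $D[p : q] = \sum_x p(x) \log (p(x)/q(x))$. The function name `kl_divergence` and the two example distributions are illustrative assumptions, not taken from the slides.

```python
import math


def kl_divergence(p, q):
    """D[p : q] = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.

    Terms with p(x) == 0 contribute 0; q(x) must be positive wherever p(x) > 0.
    """
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)


# Illustrative distributions (assumed values, not from the slides).
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

d_pq = kl_divergence(p, q)  # D[p : q]
d_qp = kl_divergence(q, p)  # D[q : p], generally a different number
d_pp = kl_divergence(p, p)  # zero when both arguments coincide
print(d_pq, d_qp, d_pp)
```

Because the two directions give different values, the KL-divergence is only a quasi-distance.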

Page 13

$(\alpha, \beta)$‐divergence

[Equation: $D_{\alpha,\beta}[p : q]$ written as a sum over $i$ of terms in $p_i$ and $q_i$; exponents and coefficients are not recoverable from the transcript]

[Special cases listed on the slide: $\alpha$‐divergence and $\beta$‐divergence]

Page 14

Manifold with Convex Function

$S$ : coordinates $\theta = (\theta^1, \theta^2, \ldots, \theta^n)$

$\psi(\theta)$ : convex function

negative entropy: $\varphi(p) = \int p(x) \log p(x)\, dx$

energy: $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$

mathematical programming, control systems, physics, engineering, vision, economics

Page 15

Riemannian metric and flatness (affine structure)

Bregman divergence: $D(\theta, \theta') = \psi(\theta) - \psi(\theta') - \mathrm{grad}\, \psi(\theta') \cdot (\theta - \theta')$

$D(\theta, \theta + d\theta) = \frac{1}{2} \sum_{i,j} g_{ij}\, d\theta^i\, d\theta^j$

$g_{ij} = \partial_i \partial_j \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta^i}$

Flatness (affine): $\theta$ : geodesic (not Levi-Civita)

$\{S, \psi(\theta), \theta\}$
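
As a small illustration of how a convex function induces a divergence, the sketch below evaluates the standard Bregman form $D(\theta, \theta') = \psi(\theta) - \psi(\theta') - \mathrm{grad}\,\psi(\theta') \cdot (\theta - \theta')$ for the energy function $\psi(\theta) = \frac{1}{2}\sum_i (\theta^i)^2$, where it should reduce to half the squared Euclidean distance. The helper names (`bregman`, `energy`, `grad_energy`) and the test points are assumptions made for this example.

```python
def bregman(psi, grad_psi, theta, theta_p):
    """Bregman divergence D(theta, theta') induced by a convex function psi."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_psi(theta_p), theta, theta_p))
    return psi(theta) - psi(theta_p) - inner


def energy(theta):
    """psi(theta) = 1/2 * sum_i theta_i^2 (the 'energy' example)."""
    return 0.5 * sum(t * t for t in theta)


def grad_energy(theta):
    """Gradient of the energy function is theta itself."""
    return list(theta)


# Illustrative points (assumed values).
theta = [1.0, 2.0]
theta_p = [3.0, -1.0]

d = bregman(energy, grad_energy, theta, theta_p)
half_sq_dist = 0.5 * sum((a - b) ** 2 for a, b in zip(theta, theta_p))
print(d, half_sq_dist)
```

For this particular $\psi$ the divergence is symmetric; for a general convex $\psi$ it is not.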

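Page 16

Legendre Transformation

$\eta_i = \partial_i \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta^i}$; $\theta \leftrightarrow \eta$ one-to-one

$\varphi(\eta) + \psi(\theta) - \sum_i \theta^i \eta_i = 0$

$\theta^i = \partial^i \varphi(\eta)$, $\partial^i = \frac{\partial}{\partial \eta_i}$

$D(\theta, \theta') = \psi(\theta) + \varphi(\eta') - \sum_i \theta^i \eta'_i$

$\varphi(\eta) = \max_\theta \{\sum_i \theta^i \eta_i - \psi(\theta)\}$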
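
The Legendre relations can be checked numerically on a one-dimensional example. The sketch below assumes the Bernoulli family, with $\psi(\theta) = \log(1 + e^\theta)$ and dual potential $\varphi(\eta) = \eta \log \eta + (1 - \eta)\log(1 - \eta)$; this choice of family, the function names, and the value of $\theta$ are illustrative assumptions, not part of the slides.

```python
import math


def psi(theta):
    """Convex potential of the Bernoulli family: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))


def eta_of(theta):
    """Dual coordinate eta = d psi / d theta (the mean parameter)."""
    return 1.0 / (1.0 + math.exp(-theta))


def phi(eta):
    """Dual potential: negative entropy of a Bernoulli(eta) distribution."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)


def theta_of(eta):
    """Inverse map theta = d phi / d eta."""
    return math.log(eta / (1.0 - eta))


theta = 0.7  # illustrative value
eta = eta_of(theta)

# psi(theta) + phi(eta) - theta * eta should vanish for a dual pair.
residual = psi(theta) + phi(eta) - theta * eta

# phi(eta) should be the maximum over t of t * eta - psi(t).
candidates = [t * eta - psi(t) for t in (theta + 0.01 * k for k in range(-100, 101))]
print(residual, theta_of(eta), max(candidates), phi(eta))
```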

Page 17

Two affine coordinate systems $\theta$, $\eta$

$\theta$ : geodesic (e-geodesic)

$\eta$ : dual geodesic (m-geodesic)

"dually orthogonal": $\langle \partial_i, \partial^j \rangle = \delta_i^j$

$\partial_i = \frac{\partial}{\partial \theta^i}$, $\partial^i = \frac{\partial}{\partial \eta_i}$

$X\langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$

Page 18

Bi‐orthogonality

Page 19

Dually flat manifold

$\theta$‐coordinates, $\eta$‐coordinates (dually flat)

potential functions: $\psi(\theta)$, $\varphi(\eta)$

$g_{ij} = \frac{\partial^2 \psi(\theta)}{\partial \theta^i \partial \theta^j}$, $g^{ij} = \frac{\partial^2 \varphi(\eta)}{\partial \eta_i \partial \eta_j}$

$\psi(\theta) + \varphi(\eta) - \sum_i \theta^i \eta_i = 0$

exponential family: $p(x, \theta) = \exp\{\sum_i \theta^i x_i - \psi(\theta)\}$

$\psi$ : cumulant generating function; $\varphi$ : negative entropy

canonical divergence: $D(P : P') = \psi(\theta) + \varphi(\eta') - \sum_i \theta^i \eta'_i$

Page 20

Exponential Family

$p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$

Gaussian:

Negative entropy

natural parameter: $\theta$

expectation parameter: $\eta$

$\psi(\theta)$ : convex function, free-energy

Page 21

$x$ : discrete, $X = \{0, 1, \ldots, n\}$

$S_n = \{p(x) \mid x \in X\}$ : exponential family

$p(x) = \sum_{i=0}^{n} p_i \delta_i(x) = \exp\left[\sum_{i=1}^{n} \theta^i x_i - \psi(\theta)\right]$

$\theta^i = \log(p_i / p_0)$; $x_i = \delta_i(x)$; $\psi(\theta) = -\log p_0$

$\eta_i = E[x_i] = p_i$; $\varphi(\eta) = \sum_i p_i \log p_i$
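
The coordinate change for the discrete family can be sketched as follows, assuming the standard identifications $\theta^i = \log(p_i/p_0)$, $\psi(\theta) = -\log p_0$, $\eta_i = p_i$ and $\varphi(\eta) = \sum_i p_i \log p_i$. The function names and the example distribution are hypothetical; the code checks the round trip and the Legendre identity $\psi + \varphi - \sum_i \theta^i \eta_i = 0$.

```python
import math


def to_theta(p):
    """Natural parameters theta^i = log(p_i / p_0), i = 1..n, and psi = -log p_0."""
    theta = [math.log(pi / p[0]) for pi in p[1:]]
    psi = -math.log(p[0])
    return theta, psi


def from_theta(theta):
    """Recover (p_0, ..., p_n) from the natural parameters."""
    z = 1.0 + sum(math.exp(t) for t in theta)  # equals 1 / p_0
    p0 = 1.0 / z
    return [p0] + [math.exp(t) * p0 for t in theta]


def negative_entropy(p):
    """phi = sum_i p_i log p_i, summed over i = 0..n."""
    return sum(pi * math.log(pi) for pi in p if pi > 0.0)


p = [0.5, 0.3, 0.2]  # illustrative distribution on X = {0, 1, 2}
theta, psi = to_theta(p)
eta = p[1:]          # expectation parameters eta_i = p_i, i = 1..n
p_back = from_theta(theta)

legendre_residual = psi + negative_entropy(p) - sum(t * e for t, e in zip(theta, eta))
print(theta, psi, p_back, legendre_residual)
```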

Page 22

Two geodesics

Tangent directions

Page 23

Function space of probability distributions $\{p(x)\}$: topology

Page 24

Exponential Family

Page 25

Pythagorean Theorem (dually flat manifold)

$D[P : Q] + D[Q : R] = D[P : R]$

Euclidean space: self-dual, $\theta = \eta$, $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$

Page 26

Projection Theorem

[Figure: a point $p$ in $S$, a submanifold $M$ with points $s$ and $q$]

m-geodesic: $q = \arg\min_{s \in M} D[p : s]$

e-geodesic: $q' = \arg\min_{s \in M} D[s : p]$

Page 27

Projection Theorem

$\min_{Q \in M} D[P : Q]$: $Q$ = m-geodesic projection of $P$ to $M$

$\min_{Q \in M} D[Q : P]$: $Q'$ = e-geodesic projection of $P$ to $M$
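
A rough numerical check of the m-projection, under the assumption that $M$ is the set of independent distributions of two binary variables: minimizing $D[P : Q]$ over $Q \in M$ should return the product of the marginals of $P$. The joint distribution, the grid resolution, and the helper names are all illustrative choices; a grid search stands in for a proper optimizer.

```python
import math


def kl(p, q):
    """D[p : q] for distributions stored as dicts over the same outcomes."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0.0)


def independent(a, b):
    """Product distribution on {0,1}^2 with P(x1 = 1) = a and P(x2 = 1) = b."""
    return {(x1, x2): (a if x1 else 1.0 - a) * (b if x2 else 1.0 - b)
            for x1 in (0, 1) for x2 in (0, 1)}


# Illustrative joint distribution P (assumed values).
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
r1 = p[(1, 0)] + p[(1, 1)]  # marginal P(x1 = 1)
r2 = p[(0, 1)] + p[(1, 1)]  # marginal P(x2 = 1)

# Grid search for the Q in M that minimizes D[P : Q].
grid = [i / 20 for i in range(1, 20)]
best = min(((a, b) for a in grid for b in grid),
           key=lambda ab: kl(p, independent(*ab)))
print(best, (r1, r2))
```

The minimizer found on the grid coincides with the marginals of $P$, which is what the projection theorem predicts for this choice of $M$.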

Page 28

Convex function – Bregman divergence – Dually flat Riemannian divergence

Dually flat R‐manifold – convex function – canonical divergence

KL‐divergence

Page 29

Exponential family – Bregman divergence (Banerjee et al.)

Page 30

Invariance

$S = \{p(x, \theta)\}$

Invariant under different representation

$y = y(x)$, $\{\bar{p}(y, \theta)\}$

$\int |p(x, \theta_1) - p(x, \theta_2)|^2\, dx \ne \int |\bar{p}(y, \theta_1) - \bar{p}(y, \theta_2)|^2\, dy$

Page 31

Invariant divergence (manifold of probability distributions; $S = \{p(x, \theta)\}$)

$y = k(x)$ : sufficient statistics

$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$

Chentsov

Amari‐Nagaoka

Page 32

Invariance ‐‐‐ characterization of f‐divergence

[Figure: the atoms $1, \ldots, n$ with probabilities $p_i$, grouped into subsets $A_1, A_2, \ldots, A_m$]

$\bar{p}_A = \sum_{i \in A} p_i$, $\bar{p} = (\bar{p}_A)$

Csiszar

Page 33

$D[p : q] \ge D[\bar{p}_A : \bar{q}_A]$

$D[p : q] = D[\bar{p}_A : \bar{q}_A]$ when $p_i = c_A q_i$; $i \in A$

Invariance ⇒ f‐divergence

Page 34

Csiszar f‐divergence

$D_f[p : q] = \sum_i p_i f\left(\frac{q_i}{p_i}\right)$

$f(u)$ : convex, $f(1) = 0$

$D_{cf}[p : q] = c\, D_f[p : q]$

$\bar{f}(u) = f(u) - c(u - 1)$

$f(1) = f'(1) = 0$; $f''(1) = 1$

[Figure: graph of $f(u)$ against $u$, with $u = 1$ marked]

Ali‐Silvey

Morimoto
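
A short sketch of the f-divergence in the convention $D_f[p : q] = \sum_i p_i f(q_i/p_i)$, assuming $f(u) = -\log u$ as the example (which reproduces the KL-divergence). It also checks two properties numerically: scaling $f$ by $c$ scales the divergence, and adding a multiple of $(u - 1)$ to $f$ leaves the value unchanged for normalized distributions but not for unnormalized positive measures. Function names and numbers are illustrative assumptions.

```python
import math


def f_divergence(f, p, q):
    """D_f[p : q] = sum_i p_i * f(q_i / p_i), assuming every p_i > 0."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))


def f_kl(u):
    """Convex function with f(1) = 0 that yields the KL-divergence D[p : q]."""
    return -math.log(u)


def f_kl_shifted(u, c=3.0):
    """Same function plus c * (u - 1); c = 3.0 is an arbitrary illustrative constant."""
    return f_kl(u) + c * (u - 1.0)


# Normalized probability distributions (assumed values).
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

d = f_divergence(f_kl, p, q)
d_shifted = f_divergence(f_kl_shifted, p, q)
d_scaled = f_divergence(lambda u: 2.0 * f_kl(u), p, q)

# Unnormalized positive measures: the linear term no longer cancels.
p_tilde = [1.0, 2.0, 0.5]
q_tilde = [0.5, 2.5, 1.0]
gap = f_divergence(f_kl_shifted, p_tilde, q_tilde) - f_divergence(f_kl, p_tilde, q_tilde)
print(d, d_shifted, d_scaled, gap)
```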

Page 35

Theorem

An invariant separable divergence belongs to the class of f‐divergence.

Separable divergence: $D[p : q] = \sum_i k(p_i, q_i)$

$k(p, q) = p f\left(\frac{q}{p}\right)$

Page 36

$S = \{p\}$ : space of probability distributions

[Diagram: "dually flat space" leads through "convex functions" to "Bregman divergence" (flat divergence); "invariance" leads to "invariant divergence" (F‐divergence, Fisher inf metric, Alpha connection); the two classes meet at the KL‐divergence]

$D[p : q] = \int p(x) \log \left\{\frac{p(x)}{q(x)}\right\} dx$

Page 37

$\alpha$‐Divergence: why? flat & invariant in [symbol lost in extraction; a space $S$ with a subscript involving $n$ and 1]

$f_\alpha(u) = \frac{4}{1 - \alpha^2}\left\{1 - u^{\frac{1+\alpha}{2}}\right\} - \frac{2}{1 - \alpha}(1 - u)$, $\alpha \ne \pm 1$

KL-divergence: $f(u) = u \log u - (u - 1)$

$D[p : q] = \sum_i \left\{p_i \log \frac{p_i}{q_i} - p_i + q_i\right\}$

Page 38

$\tilde{S} = \{\tilde{p} : \tilde{p}_i > 0\}$ (the constraint $\sum_i \tilde{p}_i = 1$ need not hold)

Space of positive measures: vectors, matrices, arrays

f‐divergence

α‐divergence

Bregman divergence

Page 39

f‐divergence of $\tilde{S}$

$D_f[\tilde{p} : \tilde{q}] = \sum_i \tilde{p}_i f\left(\frac{\tilde{q}_i}{\tilde{p}_i}\right) \ge 0$

$D_f[\tilde{p} : \tilde{q}] = 0 \iff \tilde{p} = \tilde{q}$

not invariant under $\bar{f}(u) = f(u) - c(u - 1)$

Page 40

$\alpha$‐divergence

$D_\alpha[\tilde{p} : \tilde{q}] = \sum_i \left\{\frac{1 - \alpha}{2} \tilde{p}_i + \frac{1 + \alpha}{2} \tilde{q}_i - \tilde{p}_i^{\frac{1-\alpha}{2}} \tilde{q}_i^{\frac{1+\alpha}{2}}\right\}$

KL‐divergence

$D[\tilde{p} : \tilde{q}] = \sum_i \left\{\tilde{p}_i \log \frac{\tilde{p}_i}{\tilde{q}_i} - \tilde{p}_i + \tilde{q}_i\right\}$
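
A hedged sketch of the α-divergence on positive measures, built as an f-divergence $\sum_i p_i f_\alpha(q_i/p_i)$ with the standardized function $f_\alpha(u) = \frac{4}{1-\alpha^2}\{1 - u^{(1+\alpha)/2}\} - \frac{2}{1-\alpha}(1 - u)$ (so that $f_\alpha(1) = f_\alpha'(1) = 0$ and $f_\alpha''(1) = 1$). This normalization, the function names, and the example measures are assumptions of the sketch. Under this convention the code checks that $\alpha = 0$ gives twice the squared Hellinger-type distance, and that $\alpha$ close to $-1$ and $+1$ approach the extended KL-divergence in its two argument orders.

```python
import math


def f_alpha(u, alpha):
    """Standardized convex function of the alpha-divergence, alpha != +1, -1."""
    return (4.0 / (1.0 - alpha ** 2)) * (1.0 - u ** ((1.0 + alpha) / 2.0)) \
        - (2.0 / (1.0 - alpha)) * (1.0 - u)


def alpha_divergence(p, q, alpha):
    """D_alpha[p : q] = sum_i p_i * f_alpha(q_i / p_i) for positive measures."""
    return sum(pi * f_alpha(qi / pi, alpha) for pi, qi in zip(p, q))


def extended_kl(p, q):
    """KL-divergence extended to positive measures: sum p log(p/q) - p + q."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))


# Illustrative unnormalized positive measures (assumed values).
p = [1.0, 2.0, 0.5]
q = [0.5, 2.5, 1.0]

d0 = alpha_divergence(p, q, 0.0)
hellinger_like = 2.0 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

d_minus = alpha_divergence(p, q, -0.999)  # close to extended_kl(p, q)
d_plus = alpha_divergence(p, q, 0.999)    # close to extended_kl(q, p)
print(d0, hellinger_like, d_minus, extended_kl(p, q), d_plus, extended_kl(q, p))
```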

Page 41

$\tilde{S}$ : dually flat

$S$ : not dually flat (except $\alpha = \pm 1$)

[Equation residue in $p_i$ and $r_i$; not recoverable from the transcript]

Page 42

Metric and Connections Induced by Divergence (Eguchi)

Riemannian metric

$g_{ij}(z) = \partial_i \partial_j D[z : y]\,|_{y = z}$; $D[z : y] = \frac{1}{2} g_{ij} (z_i - y_i)(z_j - y_j)$

affine connections $\{\nabla, \nabla^*\}$

$\Gamma_{ijk}(z) = -\partial_i \partial_j \partial'_k D[z : y]\,|_{y = z}$

$\Gamma^*_{ijk}(z) = -\partial'_i \partial'_j \partial_k D[z : y]\,|_{y = z}$

$\partial_i = \frac{\partial}{\partial z_i}$, $\partial'_i = \frac{\partial}{\partial y_i}$
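
To illustrate how a divergence induces a metric, the sketch below takes the KL-divergence between two Bernoulli distributions with means $z$ and $y$, approximates its second derivative in $z$ at $y = z$ by a central difference, and compares the result with the Fisher information $1/(z(1-z))$. The Bernoulli family, the step size `h`, and all function names are assumptions chosen for this example.

```python
import math


def kl_bernoulli(z, y):
    """D[z : y] between Bernoulli distributions with means z and y."""
    return z * math.log(z / y) + (1.0 - z) * math.log((1.0 - z) / (1.0 - y))


def metric_from_divergence(z, h=1e-4):
    """Central second difference of D[z : y] in its first argument, at y = z."""
    y = z
    return (kl_bernoulli(z + h, y) - 2.0 * kl_bernoulli(z, y)
            + kl_bernoulli(z - h, y)) / h ** 2


def fisher_information(z):
    """E[(d/dz log p(x; z))^2] for x in {0, 1} with P(x = 1) = z."""
    score_one = 1.0 / z            # score when x = 1
    score_zero = -1.0 / (1.0 - z)  # score when x = 0
    return z * score_one ** 2 + (1.0 - z) * score_zero ** 2


z = 0.3  # illustrative parameter value
g_numeric = metric_from_divergence(z)
g_fisher = fisher_information(z)
print(g_numeric, g_fisher, 1.0 / (z * (1.0 - z)))
```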

Page 43

Invariant geometrical structure: alpha‐geometry (derived from invariant divergence)

$S = \{p(x, \theta)\}$

Fisher information: $g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$

$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$

$l = \log p(x, \theta)$; $\partial_i = \frac{\partial}{\partial \theta^i}$

$\alpha$‐connection: $\Gamma^{(\alpha)}_{ijk} = \{i, j; k\} - \frac{\alpha}{2} T_{ijk}$

$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$ : dually coupled

$X\langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$

Levi‐Civita: $\alpha = 0$

Page 44

Duality:

$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$

$T_{ijk} = \Gamma^*_{ijk} - \Gamma_{ijk}$

$\{M, g, T\}$

$X\langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$

Page 45

Riemannian Structure

$ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta^i\, d\theta^j = d\theta^T G(\theta)\, d\theta$

$G(\theta) = (g_{ij}(\theta))$

Euclidean: $G = E$

Fisher information

Page 46

Affine Connection

covariant derivative: $\nabla_X Y$

geodesic: $\nabla_X X = 0$, $X = X(t)$ (straight line; non‐metric)

minimal distance: $s = \int \sqrt{g_{ij}\, d\theta^i\, d\theta^j}$

Page 47

Duality

$\langle X, Y \rangle = g_{ij} X^i Y^j$

[Figure: vectors $X$ and $Y$ transported along a curve, with the inner product $\langle X, Y \rangle$ compared before and after transport]

Riemannian geometry: $\nabla = \nabla^*$

$X\langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$

Page 48

Dual Affine Connections $(\nabla, \nabla^*)$

e‐geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$

m‐geodesic: $r(x, t) = t\, p(x) + (1 - t)\, q(x)$

[Figure: the two geodesics joining $q(x)$ and $p(x)$]
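
The two kinds of geodesic between discrete distributions can be sketched directly: the m-geodesic mixes probabilities linearly, while the e-geodesic mixes log-probabilities linearly and renormalizes (the normalizer plays the role of $c(t)$). The function names and the endpoint distributions below are illustrative assumptions.

```python
import math


def m_geodesic(p, q, t):
    """Mixture geodesic: r(x, t) = t * p(x) + (1 - t) * q(x)."""
    return [t * px + (1.0 - t) * qx for px, qx in zip(p, q)]


def e_geodesic(p, q, t):
    """Exponential geodesic: log r(x, t) = t log p(x) + (1 - t) log q(x) + c(t)."""
    unnormalized = [math.exp(t * math.log(px) + (1.0 - t) * math.log(qx))
                    for px, qx in zip(p, q)]
    z = sum(unnormalized)  # c(t) = -log z keeps r(., t) normalized
    return [u / z for u in unnormalized]


# Illustrative endpoints (assumed values, all entries positive).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

mid_m = m_geodesic(p, q, 0.5)
mid_e = e_geodesic(p, q, 0.5)
print(mid_m, mid_e)
```

Both curves start at $q$ for $t = 0$ and end at $p$ for $t = 1$, but their midpoints generally differ.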

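Page 49

Mathematical structure of $S = \{p(x, \theta)\}$

$g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$

$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$

$l = \log p(x, \theta)$; $\partial_i = \frac{\partial}{\partial \theta^i}$

$\alpha$-connection: $\Gamma^{(\alpha)}_{ijk} = \{i, j; k\} - \frac{\alpha}{2} T_{ijk}$

$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$ : dually coupled

$X\langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$

$\{M, g, T\}$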

Page 50

α‐geometry

Page 51
Page 52

Dual Foliations

k‐cut

Page 53
Page 54

[Figure: spike trains 00110001011010100100110100 and 0101101001010 recorded from neurons $x_1$, $x_2$, $x_3$]

firing rates; correlation -- covariance?

Two neurons: $\{p_{00}, p_{01}, p_{10}, p_{11}\}$

$r_1, r_2; r_{12}$

Page 55

Correlations of Neural Firing

$p(x_1, x_2)$ : $\{p_{00}, p_{10}, p_{01}, p_{11}\}$

firing rates: $r_1 = p_{1\cdot} = p_{10} + p_{11}$, $r_2 = p_{\cdot 1} = p_{01} + p_{11}$

correlations: $\theta = \log \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}}$

orthogonal coordinates: $\{(r_1, r_2), \theta\}$

[Figure: neurons $x_1$ and $x_2$; coordinate axes for $(r_1, r_2)$ and $\theta$]
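
A minimal sketch of the coordinates used for two binary neurons, assuming the usual definitions: firing rates $r_1 = p_{10} + p_{11}$, $r_2 = p_{01} + p_{11}$ and the log odds ratio $\theta = \log \frac{p_{11} p_{00}}{p_{10} p_{01}}$ as the correlation coordinate. The function name `mixed_coordinates` and both example tables are hypothetical.

```python
import math


def mixed_coordinates(p):
    """Return (r1, r2, theta) for a joint distribution p over (x1, x2) in {0,1}^2."""
    r1 = p[(1, 0)] + p[(1, 1)]  # firing rate of neuron 1
    r2 = p[(0, 1)] + p[(1, 1)]  # firing rate of neuron 2
    theta = math.log(p[(1, 1)] * p[(0, 0)] / (p[(1, 0)] * p[(0, 1)]))
    return r1, r2, theta


# Correlated firing (assumed values): both neurons tend to agree.
p_corr = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Independent firing with rates a and b (assumed values).
a, b = 0.3, 0.6
p_ind = {(x1, x2): (a if x1 else 1.0 - a) * (b if x2 else 1.0 - b)
         for x1 in (0, 1) for x2 in (0, 1)}

print(mixed_coordinates(p_corr))
print(mixed_coordinates(p_ind))
```

For the independent table $\theta$ vanishes up to rounding, whatever the firing rates are, which is the sense in which $\theta$ measures correlation separately from $(r_1, r_2)$.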

Page 56

$S = \{p(x_1, x_2)\}$, $x_1, x_2 = 0, 1$

Independent Distributions: $M = \{q(x_1)\, q(x_2)\}$

Page 57

two neuron case

$(\theta_1, \theta_2, \theta_{12})$; $(r_1, r_2, r_{12})$

$\theta_{12} = \log \frac{p_{00}\, p_{11}}{p_{01}\, p_{10}} = \log \frac{r_{12}\,(1 + r_{12} - r_1 - r_2)}{(r_1 - r_{12})(r_2 - r_{12})}$

$r_{12} = f(r_1, r_2, \theta_{12})$

$r_{12}(t) = f(r_1(t), r_2(t), \theta_{12})$

Page 58

Decomposition of KL-divergence

$D[p : r] = D[p : q] + D[q : r]$

$p, q$ : same marginals $(r_1, r_2)$

$r, q$ : same correlations

[Figure: $p$, $q$, $r$ forming a right triangle; the independent distributions and the correlations give the two orthogonal directions]

$D[p : r] = \sum_x p(x) \log \frac{p(x)}{r(x)}$
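
The decomposition can be verified numerically in the simplest setting, assumed here: $p$ is a joint distribution of two binary neurons, $q$ is the independent distribution with the same marginals as $p$, and $r$ is another independent distribution, so that $q$ and $r$ share the same (zero) correlation coordinate. All numbers and helper names are illustrative.

```python
import math


def kl(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) over outcomes stored as dict keys."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0.0)


def independent(a, b):
    """Product distribution on {0,1}^2 with P(x1 = 1) = a and P(x2 = 1) = b."""
    return {(x1, x2): (a if x1 else 1.0 - a) * (b if x2 else 1.0 - b)
            for x1 in (0, 1) for x2 in (0, 1)}


# Correlated joint distribution p (assumed values).
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
r1 = p[(1, 0)] + p[(1, 1)]
r2 = p[(0, 1)] + p[(1, 1)]

q = independent(r1, r2)    # same marginals as p, no correlation
r = independent(0.3, 0.6)  # different marginals, same (zero) correlation as q

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(lhs, rhs)
```

The term $D[p : q]$ isolates the contribution of the correlation, and $D[q : r]$ the contribution of the firing rates.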

Page 59

pairwise correlations: $c_{ij} = r_{ij} - r_i r_j$

covariance: not orthogonal

independent distributions: $r_{ij} = r_i r_j$, $r_{ijk} = r_i r_j r_k$, ...

higher-order correlations

How to generate correlated spikes? (Niebur, Neural Computation [2007])

Page 60

Orthogonal higher‐order correlations

$\theta = (\theta_i; \theta_{ij}; \ldots; \theta_{1 \ldots n})$

$r = (r_i; r_{ij}; \ldots; r_{1 \ldots n})$

Page 61

Population and Synfire

Neurons $x_1, x_2, \ldots, x_n$

$x_i = 1(u_i)$

$u_i$ : Gaussian, with correlations $E[u_i u_j]$

Page 62

Synfiring

$p(x) = p(x_1, \ldots, x_n)$

$r = \frac{1}{n} \sum_i x_i$, with distribution $q(r)$

[Figure: $q(r)$ plotted against $r$]

Page 63

Input‐output Analysis

Gross product, consumption

Relations among industries (K. Tsuda and R. Morioka)

Page 64

Mathematical Problems

$M$ : submanifold of $S$ ? (Hong van Le)

$\{M, g\}$ → $\{M, g, T\}$ dually flat (J. Armstrong)

Affine differential geometry

Hessian manifold

Almost complex structure