Newton Method for the ICA Mixture Model Jason A. Palmer 1 Scott Makeig 1 Ken Kreutz-Delgado 2 Bhaskar D. Rao 2 1 Swartz Center for Computational Neuroscience 2 Dept of Electrical and Computer Engineering

Dec 20, 2015


Newton Method for the ICA Mixture Model

Jason A. Palmer¹, Scott Makeig¹, Ken Kreutz-Delgado², Bhaskar D. Rao²

¹ Swartz Center for Computational Neuroscience
² Dept of Electrical and Computer Engineering
University of California San Diego, La Jolla, CA


Introduction

• Want to model sensor array data with multiple independent sources: ICA
• Non-stationary source activity: mixture model
• Want the adaptation to be computationally efficient: Newton method


Outline

• ICA mixture model
• Basic Newton method
• Positive definiteness of Hessian when model source densities are true source densities
• Newton for ICA mixture model
• Example applications to analysis of EEG


[Figure: two panels, each with both axes running from -10 to 10]

ICA Mixture Model: toy example

• 3 models in two dimensions, 500 points per model
• Newton method converges in fewer than 200 iterations; natural gradient fails to converge and has difficulty on poorly conditioned models
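
A toy data set of the kind described on this slide could be generated along the following lines. This is a minimal sketch, not the authors' Matlab code: the Laplacian sources, the random mixing matrices, the range of the model centers, and the helper name `make_toy_mixture` are all assumptions made for illustration.

```python
import numpy as np

def make_toy_mixture(n_models=3, dim=2, n_per_model=500, seed=0):
    """Draw points from several linear ICA models (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data, labels, params = [], [], []
    for h in range(n_models):
        A = rng.normal(size=(dim, dim))            # assumed random mixing matrix A_h
        c = rng.uniform(-5.0, 5.0, size=dim)       # assumed model center c_h
        S = rng.laplace(size=(dim, n_per_model))   # assumed heavy-tailed independent sources
        data.append(A @ S + c[:, None])            # x = A_h s + c_h
        labels.append(np.full(n_per_model, h))     # which model generated each point
        params.append((A, c))
    return np.hstack(data), np.concatenate(labels), params

X, labels, params = make_toy_mixture()
```

Here `X` holds one observation per column, and `labels` records the generating model so that a fitted segmentation can be compared against it.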


ICA Mixture Model

• Want to model observations x(t), t = 1,…,N, with different models “active” at different times
• Bayesian linear mixture model, h = 1, …, M:
• Conditionally linear given the model:
• Samples are modeled as independent in time:
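
The equations for this slide did not survive extraction. As a hedged illustration of the structure the bullets describe, the sketch below evaluates a mixture of linear ICA models under the usual form p(x) = Σ_h γ_h |det W_h| Π_i q(w_hi·(x − c_h)), and sums log probabilities over time because samples are treated as independent. The symbols γ_h, W_h, c_h, the fixed source density q(s) = 1/(π cosh s), and the function names are assumptions, not taken from the slides.

```python
import numpy as np

def log_sech_density(s):
    """log q(s) for the assumed source density q(s) = 1 / (pi * cosh(s))."""
    a = np.abs(s)
    log_cosh = a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)   # stable log(cosh(s))
    return -log_cosh - np.log(np.pi)

def mixture_loglik(X, Ws, cs, gammas):
    """Total log likelihood and per-sample model posteriors.

    X is (n, N); Ws, cs, gammas are per-model unmixing matrices,
    centers, and prior probabilities (all names are assumptions).
    """
    log_joint = []
    for W, c, g in zip(Ws, cs, gammas):
        Y = W @ (X - c[:, None])                   # source estimates under model h
        _, logabsdet = np.linalg.slogdet(W)
        log_joint.append(np.log(g) + logabsdet + log_sech_density(Y).sum(axis=0))
    log_joint = np.array(log_joint)                # shape (M, N)
    top = log_joint.max(axis=0)
    log_px = top + np.log(np.exp(log_joint - top).sum(axis=0))   # log-sum-exp over models
    posteriors = np.exp(log_joint - log_px)        # p(model h | x(t))
    return log_px.sum(), posteriors                # independence in time: sum over t

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
Ws = [np.eye(2), np.array([[2.0, 0.5], [0.0, 1.0]])]
cs = [np.zeros(2), np.ones(2)]
total, post = mixture_loglik(X, Ws, cs, gammas=[0.5, 0.5])
```

Each column of `post` gives the posterior probability that each model was “active” at that sample, which is the quantity a segmentation would be read from.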


Source Density Mixture Model

• Each source density mixture component has unknown location, scale, and shape:
• Generalizes the Gaussian mixture model: components can be more peaked, with heavier tails
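
One family that fits this description is a mixture of generalized Gaussian components, each with its own location, scale, and shape. The sketch below assumes that family; the slide's own formula was lost, so the parameter names `alphas`, `mus`, `betas`, `rhos` and both function names are illustrative. A shape of 2 gives a Gaussian component, and shapes below 2 give more peaked, heavier-tailed components.

```python
import math
import numpy as np

def gen_gauss_pdf(u, rho):
    """Unit-scale generalized Gaussian: rho / (2 Gamma(1/rho)) * exp(-|u|^rho)."""
    return rho / (2.0 * math.gamma(1.0 / rho)) * np.exp(-np.abs(u) ** rho)

def source_mixture_pdf(s, alphas, mus, betas, rhos):
    """Mixture density with per-component weight, location, scale, and shape."""
    s = np.asarray(s, dtype=float)
    total = np.zeros_like(s)
    for alpha, mu, beta, rho in zip(alphas, mus, betas, rhos):
        scale = math.sqrt(beta)                    # beta acts as an inverse squared width
        total += alpha * scale * gen_gauss_pdf(scale * (s - mu), rho)
    return total

grid = np.linspace(-60.0, 60.0, 240001)
pdf = source_mixture_pdf(grid, alphas=[0.6, 0.4], mus=[-1.0, 2.0],
                         betas=[1.0, 0.25], rhos=[1.0, 2.0])
area = pdf.sum() * (grid[1] - grid[0])             # Riemann sum; should be close to 1
```

The example mixes one Laplacian component (shape 1) with one Gaussian component (shape 2), and the Riemann sum is a quick check that the normalization is right.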


ICA Mixture Model: Invariances

• The complete set of parameters to be estimated is:
  h = 1, …, M, i = 1, …, n, j = 1, …, m
• Invariances: W row norm / source density scale, and model centers / source density locations:
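
Both invariances can be checked numerically on a single model. The sketch below is an illustration under assumed notation, not the slide's lost equations: sources are Laplacian with location `mu` and scale parameter `beta`, and `loglik` is a hypothetical helper. Rescaling a row of W is absorbed by rescaling the matching source density, and shifting the model center is absorbed by shifting the source locations.

```python
import numpy as np

def loglik(X, W, c, mu, beta):
    """Single-model log likelihood with scaled, shifted Laplacian sources (assumed form)."""
    Y = W @ (X - c[:, None])                                   # source estimates
    U = np.sqrt(beta)[:, None] * (Y - mu[:, None])             # standardized sources
    log_q = 0.5 * np.log(beta)[:, None] + np.log(0.5) - np.abs(U)
    _, logabsdet = np.linalg.slogdet(W)
    return (logabsdet + log_q.sum(axis=0)).sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(2, 50))
W = np.eye(2) + 0.1 * rng.normal(size=(2, 2))
c, mu, beta = np.array([0.3, -0.2]), np.array([0.1, 0.4]), np.array([1.5, 0.7])
base = loglik(X, W, c, mu, beta)

# Scale invariance: multiply row i of W by a_i, then rescale source i's
# location by a_i and its scale parameter by 1 / a_i^2.
a = np.array([2.0, 0.5])
scaled = loglik(X, a[:, None] * W, c, a * mu, beta / a ** 2)

# Location invariance: shift the model center by d and the source locations by -W d.
d = np.array([0.7, -1.1])
shifted = loglik(X, W, c + d, mu - W @ d, beta)
```

Because `base`, `scaled`, and `shifted` agree, some normalization (for example fixing row norms or centers) is needed to make the parameters identifiable.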


Basic ICA Newton Method

• Transform gradient (1st derivative) of cost function using inverse Hessian (2nd derivative)
• Cost function is data log likelihood:
• Gradient:
• Natural gradient (positive definite transform):
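
The gradient and natural gradient on this slide can be sketched as follows. The slide's formulas were lost, so the sketch assumes the standard single-model form: a per-sample negative log likelihood (minimized rather than maximized), a score function f = −q′/q, and the fixed density q(y) = 1/(π cosh y), for which f = tanh. Function names are illustrative.

```python
import numpy as np

def score(y):
    """f(y) = -d/dy log q(y) for the assumed density q(y) = 1 / (pi * cosh(y))."""
    return np.tanh(y)

def gradient(W, X):
    """Gradient of the per-sample negative log likelihood with respect to W."""
    Y = W @ X
    N = X.shape[1]
    return -np.linalg.inv(W).T + score(Y) @ X.T / N

def natural_gradient(W, X):
    """The same gradient post-multiplied by the positive definite matrix W^T W."""
    Y = W @ X
    N = X.shape[1]
    return (-np.eye(W.shape[0]) + score(Y) @ Y.T / N) @ W

rng = np.random.default_rng(0)
X = rng.laplace(size=(3, 2000))
W = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
G = gradient(W, X)
G_nat = natural_gradient(W, X)
W_next = W - 0.1 * G_nat        # one descent step with an arbitrary step size
```

The natural gradient form avoids the matrix inverse, since multiplying by W^T W turns W^{-T} into W and the data x into the source estimates y.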


Newton Method – Hessian

• Take derivative of (i,j)th element of gradient with respect to (k,l)th element of W:
• This defines a linear transform:
• In matrix form, this is:
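
Differentiating the assumed single-model gradient −W^{-T} + E{f(y) x^T} along a direction B gives the linear transform B ↦ W^{-T} B^T W^{-T} + E{diag(f′(y)) B x x^T}. The slide's own expression was lost, so this form is a reconstruction under the same assumptions (negative log likelihood, f = tanh), and the sketch below checks it against a finite difference of the gradient. Function names are illustrative.

```python
import numpy as np

def gradient(W, X):
    """Gradient of the per-sample negative log likelihood (assumed f = tanh)."""
    Y = W @ X
    return -np.linalg.inv(W).T + np.tanh(Y) @ X.T / X.shape[1]

def hessian_apply(W, X, B):
    """Apply the Hessian linear transform to a direction B (assumed form)."""
    Y = W @ X
    fprime = 1.0 - np.tanh(Y) ** 2                 # f'(y) for f = tanh
    Winv_T = np.linalg.inv(W).T
    return Winv_T @ B.T @ Winv_T + (fprime * (B @ X)) @ X.T / X.shape[1]

rng = np.random.default_rng(0)
X = rng.laplace(size=(3, 1000))
W = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

# Central difference of the gradient along B should match the transform.
eps = 1e-5
finite_diff = (gradient(W + eps * B, X) - gradient(W - eps * B, X)) / (2 * eps)
max_err = np.abs(finite_diff - hessian_apply(W, X, B)).max()
```

Treating the Hessian as a map on matrices, rather than forming an n² × n² array, is what makes the later inversion tractable.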


Newton Method – Hessian

• To invert: rewrite the Hessian transformation in terms of the source estimates:
• Define , , :
• Want to solve linear equation:


Newton Method – Hessian

• The Hessian transformation can be simplified using source independence and zero mean:
• This leads to 2x2 block diagonal form:
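
The simplification can be written out as follows. This is a reconstruction under assumed notation, since the slide's equations were lost: y = Wx are the source estimates, f_i = −q_i′/q_i, the Hessian transform is taken as C = W^{-T} B^T W^{-T} + E{diag(f′(y)) B x x^T}, and the symbols κ_i, σ_i², λ_i are labels chosen here rather than recovered from the slides.

```latex
% Change of variables (assumed): \tilde{B} = B W^{-1}, \quad \tilde{C} = C W^{T}
\tilde{C} = \tilde{B}^{T} + E\{\operatorname{diag}(f'(y))\,\tilde{B}\,y y^{T}\}

% Independent, zero-mean sources: E\{f_i'(y_i)\,y_j y_k\} = 0 unless j = k, so with
\kappa_i = E\{f_i'(y_i)\}, \quad
\sigma_i^2 = E\{y_i^2\}, \quad
\lambda_i = E\{f_i'(y_i)\,y_i^2\}

% the diagonal entries decouple,
\tilde{c}_{ii} = (1 + \lambda_i)\,\tilde{b}_{ii}

% and each off-diagonal pair (i, j), i \neq j, forms a 2x2 block:
\begin{pmatrix} \tilde{c}_{ij} \\ \tilde{c}_{ji} \end{pmatrix}
=
\begin{pmatrix} \kappa_i \sigma_j^2 & 1 \\ 1 & \kappa_j \sigma_i^2 \end{pmatrix}
\begin{pmatrix} \tilde{b}_{ij} \\ \tilde{b}_{ji} \end{pmatrix}
```

Each block can then be inverted in closed form, so applying the inverse Hessian costs on the order of n² operations once the expectations are estimated.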


Newton Direction

• Invert Hessian transformation, evaluate at gradient:
• Leads to the following equations:
• Calculate the Newton direction:
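
A single-model Newton update along these lines might look like the sketch below. It is not the authors' implementation: it assumes a block-diagonal Hessian in which diagonal entries are scaled by 1 + λ_i and each off-diagonal pair is governed by the 2x2 matrix [[κ_i σ_j², 1], [1, κ_j σ_i²]], with κ_i = E{f_i′(y_i)}, σ_i² = E{y_i²}, λ_i = E{f_i′(y_i) y_i²}. It also assumes a fixed source density 1/(π cosh s), so f = tanh, and starts near the solution. All names are illustrative.

```python
import numpy as np

def sample_sech(rng, shape):
    """Draw from q(s) = 1 / (pi * cosh(s)) by inverting its CDF (2/pi) * arctan(exp(s))."""
    u = rng.uniform(1e-12, 1.0 - 1e-12, size=shape)
    return np.log(np.tan(0.5 * np.pi * u))

def newton_step(W, X):
    """One update W <- W - B W, with B solved from the assumed block-diagonal Hessian."""
    n, N = X.shape
    Y = W @ X
    f = np.tanh(Y)                                 # score function of the assumed density
    fp = 1.0 - f ** 2                              # its derivative
    G = -np.eye(n) + f @ Y.T / N                   # gradient in transformed coordinates
    kappa = fp.mean(axis=1)                        # E{f_i'(y_i)}
    sigma2 = (Y ** 2).mean(axis=1)                 # E{y_i^2}
    lam = (fp * Y ** 2).mean(axis=1)               # E{f_i'(y_i) y_i^2}
    K = kappa[:, None] * sigma2[None, :]           # K[i, j] = kappa_i * sigma_j^2
    denom = K * K.T - 1.0                          # determinant of each 2x2 block
    np.fill_diagonal(denom, 1.0)                   # placeholder; diagonal is overwritten below
    B = (K.T * G - G.T) / denom                    # closed-form solve of each 2x2 block
    B[np.diag_indices(n)] = np.diag(G) / (1.0 + lam)
    return W - B @ W, np.linalg.norm(G)            # step size 1

rng = np.random.default_rng(0)
n, N = 2, 20000
A = np.array([[1.0, 0.6], [-0.4, 1.0]])            # assumed mixing matrix
X = A @ sample_sech(rng, (n, N))
W = (np.eye(n) + 0.05 * rng.normal(size=(n, n))) @ np.linalg.inv(A)

grad_norms = []
for _ in range(30):
    W, g = newton_step(W, X)
    grad_norms.append(g)
```

`grad_norms` tracks the gradient norm before each update; with the model density matching the true density it should shrink quickly with a unit step size, which is the behavior the slides report.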


Positive Definiteness of Hessian

• Conditions for positive definiteness:
• Always true when model source densities match true densities:
  1)
  2)
  3)
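
The three conditions on this slide were lost in extraction. As a hedged illustration, the sketch below checks the conditions that follow from an assumed block structure (diagonal entries 1 + λ_i, and 2x2 blocks [[κ_i σ_j², 1], [1, κ_j σ_i²]]) for a case where the model density equals the true density, q(s) = 1/(π cosh s) with score f = tanh. The symbols κ, σ², λ are labels chosen here, and the expectations are computed by numerical integration.

```python
import numpy as np

s = np.linspace(-40.0, 40.0, 400001)
ds = s[1] - s[0]
q = 1.0 / (np.pi * np.cosh(s))          # assumed true (and model) source density
fp = 1.0 - np.tanh(s) ** 2              # f'(s) for the matching score f = tanh

kappa = (fp * q).sum() * ds             # E{f'(y)}
sigma2 = (s ** 2 * q).sum() * ds        # E{y^2}
lam = (fp * s ** 2 * q).sum() * ds      # E{f'(y) y^2}

# Diagonal entries of the assumed Hessian are positive when 1 + lambda > 0.
diag_ok = 1.0 + lam > 0.0
# Each 2x2 block is positive definite when its diagonal and determinant are positive.
block_ok = kappa * sigma2 > 0.0 and (kappa * sigma2) ** 2 - 1.0 > 0.0
```

For this density the integrals have closed forms, κ = 1/2 and σ² = π²/4, so κσ² is about 1.23 and the block determinant is positive.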


Newton for ICA Mixture Model

• Similar derivation applies to ICA mixture model:


Convergence Rates

• Convergence is much faster than with natural gradient. Works with step size 1!
• Need correct source density model

[Figure: log likelihood (about -2.03 to -1.97) versus iteration (20 to 180)]


Segmentation of EEG experiment trials

[Figure: two columns of panels (3 models, 4 models); axis labels: log likelihood, iteration, trial, time]


Applications to EEG: Epilepsy

[Figure: panels for 1 model and 5 models; axis labels: time, log likelihood, log likelihood difference from single model]


Conclusion

• We applied the method of Amari, Cardoso, and Laheld to formulate a Newton method for the ICA mixture model
• Arbitrary source densities are modeled with a non-Gaussian source mixture model
• Non-stationarity is modeled with the ICA mixture model (multiple mixing matrices learned)
• It works! The Newton method is substantially faster (superlinear), and it can converge when natural gradient fails


Code

• There is Matlab code available!!
  – Generate toy mixture model data for testing
  – Full method implemented: mixture sources, mixture ICA, Newton
• Extended version of paper in preparation, with derivation of mixture model Newton updates
• Download from: http://sccn.ucsd.edu/~jason


Acknowledgements

• Thanks to Scott Makeig, Howard Poizner, Julie Onton, Ruey-Song Hwang, Rey Ramirez, Diane Whitmer, and Allen Gruber for collecting and consulting on EEG data

• Thanks to Jerry Swartz for founding and providing ongoing support for the Swartz Center for Computational Neuroscience

• Thanks for your attention!
