Introduction to Independent Component Analysis Barnabás Póczos University of Alberta Nov 26, 2009

Introduction to Independent Component Analysisbapoczos/other_presentations/ICA_26_10... · 2 Contents • Independent Component Analysis – ICA model – ICA applications – ICA

Dec 20, 2018

Page 1:

Introduction to Independent Component Analysis

Barnabás Póczos

University of Alberta

Nov 26, 2009

Page 2:

Contents

• Independent Component Analysis
– ICA model
– ICA applications
– ICA generalizations
– ICA theory

• Independent Subspace Analysis
– ISA model
– ISA theory
– ISA results

Page 3:

Independent Component Analysis

Goal:

Page 4:

Independent Component Analysis

Observations (Mixtures)

original signals

Model

ICA estimated signals

Page 5:

Independent Component Analysis

We observe

Model

We want

Goal:

Page 6:

ICA vs PCA, Similarities

• Perform linear transformations

• Matrix factorization

PCA: X = US, low rank matrix factorization for compression (M<N)

ICA: X = AS, full rank matrix factorization to remove dependency between the rows

Page 7:

ICA vs PCA, Differences

• PCA: X=US, U^T U=I

• ICA: X=AS

• PCA does compression – M<N

• ICA does not do compression – same # of features (M=N)

• PCA just removes correlations, not higher order dependence

• ICA removes correlations, and higher order dependence

• PCA: some components are more important than others (based on eigenvalues)

• ICA: components are equally important

Page 8:

ICA vs PCA

Note

• PCA vectors are orthogonal

• ICA vectors are not orthogonal

Page 9:

PCA vs ICA

Page 10:

The Cocktail Party Problem: SOLVING WITH PCA

Sources: s(t)

Mixing (Observation): x(t) = As(t)

PCA Estimation: y(t)=Wx(t)

Page 11:

The Cocktail Party Problem: SOLVING WITH ICA

Sources: s(t)

Mixing (Observation): x(t) = As(t)

ICA Estimation: y(t)=Wx(t)
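The mixing and demixing equations on this slide can be sketched numerically. The following is a minimal illustration, assuming two made-up source signals and an arbitrary 2 by 2 mixing matrix `A` (neither comes from the slides). It uses the exact inverse of `A` as the demixing matrix, which ICA has to estimate in practice because `A` is unknown.

```python
import numpy as np

T = 1000
t = np.linspace(0.0, 8.0, T)

# Two hypothetical source signals s(t): a sinusoid and a square wave.
s = np.vstack([np.sin(2 * np.pi * t), np.sign(np.sin(3 * np.pi * t))])

# Assumed mixing matrix (illustrative values only).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

x = A @ s                 # observations: x(t) = A s(t)
W = np.linalg.inv(A)      # ideal demixing matrix; unknown in a real problem
y = W @ x                 # estimates: y(t) = W x(t)

recovered = np.allclose(y, s)
```

With the true inverse the sources are recovered exactly; ICA can only recover them up to permutation and scaling of the rows.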

Page 12:

Some ICA Applications

STATIC

• Image denoising

• Microarray data processing

• Decomposing the spectra of galaxies

• Face recognition

• Facial expression recognition

• Feature extraction

• Clustering

• Classification

TEMPORAL

• Medical signal processing – fMRI, ECG, EEG

• Brain Computer Interfaces

• Modeling of the hippocampus, place cells

• Modeling of the visual cortex

• Time series analysis

• Financial applications

• Blind deconvolution

Page 13:

ICA Application, Removing Artifacts from EEG

• EEG ~ Neural cocktail party

• Severe contamination of EEG activity by
– eye movements
– blinks
– muscle
– heart, ECG artifact
– vessel pulse
– electrode noise
– line noise, alternating current (60 Hz)

• ICA can improve signal
– effectively detect, separate and remove activity in EEG records from a wide variety of artifactual sources. (Jung, Makeig, Bell, and Sejnowski)

• ICA weights help find location of sources

Page 14:

Fig from Jung

Page 15:

Fig from Jung

Page 16:

Fig from Jung

Page 17:

PCA+ICA for Microarray data processing

X^T = AS, X^T ∈ R^(M x N)

M = number of experiments
N = number of genes

(Figure: a_k is the k-th column of A, s_k is the k-th row of S; a_k is compared with the class labels.)

Assumption:
• each experiment is a mixture of independent expression modes (s_1, ..., s_K).
• some of these modes (e.g. s_k) can be related to the difference between the classes.
• → a_k correlates with the class labels

PCA alone can estimate US only ⇒ doesn’t work

Page 18:

ICA for Microarray data processing (Schachtner et al, ICA07)

Breast Cancer Data set

M=14 patients
N=22283 genes
2 classes

9th column of A:

Class 1, weak metastasis

Class 2, strong metastasis

|Corr(a_9, d)|=0.89, where d is the vector of class labels

Page 19:

ICA for Microarray data processing (Schachtner et al, ICA07)

Leukemia Data set

M=38 patients
N=5000 genes
3 classes: ALL-B, ALL-T, AML

(Figure panels labelled ALL-T, ALL-B, AML)

Page 20:

ICA for Image Denoising (Hoyer, Hyvarinen)

(Figure panels: original, noisy, Wiener filtered, median filtered, ICA denoised)

Page 21:

ICA for Motion Style Components (Mori & Hoshino 2002, Shapiro et al 2006, Cao et al 2003)

• Method for analysis and synthesis of human motion from motion captured data

• Provides perceptually meaningful components

• 109 markers, 327 parameters ⇒ 6 independent components (emotion, content,…)

Page 22:

Page 23:

walk sneaky

walk with sneaky sneaky with walk

Page 24:

ICA basis vectors extracted from natural images

Gabor wavelets, edge detection, receptive fields of V1 cells...

Page 25:

PCA basis vectors extracted from natural images

Page 26:

Using ICA for classification

Activity distributions of

– within-category test images are much narrower

– off-category is closer to the Gaussian distribution

(Figure: train data and test data for the Happy and Disgust categories, with ICA basis [Happy] and ICA basis [Disgust])

Page 27:

ICA Generalizations

• Independent Subspace Analysis

• Multilinear ICA

• Blind Source Deconvolution

• Blind SubSpace Deconvolution

• Nonnegative ICA

• Sparse Component Analysis

• Slow Component Analysis

• Noisy ICA

• Undercomplete, Overcomplete ICA

• Varying mixing matrix

• Online ICA

• (Post) Nonlinear ICA

x=f(s)

The Holy Grail

Page 28:

ICA Theory

Page 29:

Basic terms, definitions

• uncorrelated and independent variables

• entropy, joint entropy, negentropy

• mutual information

• Kullback-Leibler divergence

Page 30:

Statistical (in)dependence

Proof: Homework

Definition:

Lemma:

Definition:

Page 31:

Correlation

Definition:

Lemma:

Proof: Homework

Lemma:

Proof: Homework

Lemma:

Proof: Homework
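The lemma statements on this slide did not survive in the transcript. As an illustration of the standard fact that uncorrelated variables need not be independent, here is a small sketch. The choice of a uniform `x` and `y = x**2` is an assumption made for the example, not something taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is symmetric around zero; y is a deterministic function of x,
# so the two variables are clearly dependent.
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x ** 2

# Linear correlation between x and y is (close to) zero because E[x^3] = 0.
corr_xy = np.corrcoef(x, y)[0, 1]

# A nonlinear transform of x exposes the dependence.
corr_x2_y = np.corrcoef(x ** 2, y)[0, 1]
```

Here `corr_xy` is near zero while `corr_x2_y` is essentially one, which is the kind of higher-order dependence that PCA leaves in place and ICA tries to remove.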

Page 32:

Mutual Information, Entropy

Definition (Mutual Information)

Definition (Shannon entropy)

Definition (KL divergence)
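The formulas for these three definitions are missing from the transcript. The sketch below uses the standard discrete versions (natural logarithm) and the standard identity that mutual information is the KL divergence between the joint distribution and the product of its marginals. The function names and the two example joint tables are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum p log p of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl_divergence(p, q):
    """KL divergence sum p log(p/q); assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def mutual_information(joint):
    """I(X;Y) = KL(joint || product of marginals) for a 2-D probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return kl_divergence(joint.ravel(), (px * py).ravel())

independent = np.outer([0.3, 0.7], [0.5, 0.5])   # joint = product of marginals
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])   # X determines Y

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Mutual information is zero exactly when the variables are independent, which is why it serves as an ICA cost function.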

Page 33:

Solving the ICA problem with i.i.d. sources

Page 34:

Solving the ICA problem with i.i.d. sources

Page 35:

Whitening

Theorem (Whitening)

Definitions

Note

Page 36:

Proof of the whitening theorem

We can use PCA for whitening!
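A minimal sketch of PCA-based whitening follows. The proof on the slide is not preserved, so this only shows one common construction: eigendecompose the sample covariance and rescale each principal direction to unit variance. The sources, the mixing matrix `A`, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical independent sources and an assumed mixing matrix.
s = rng.uniform(-1.0, 1.0, size=(2, 5000))
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])
x = A @ s

# Center the data, then eigendecompose the sample covariance (this is PCA).
x = x - x.mean(axis=1, keepdims=True)
C = x @ x.T / x.shape[1]
eigvals, E = np.linalg.eigh(C)

# Whitening matrix: rotate onto the principal axes and rescale to unit variance.
Q = np.diag(eigvals ** -0.5) @ E.T
z = Q @ x

cov_z = z @ z.T / z.shape[1]   # identity matrix up to rounding
```

After this step the remaining unknown is only an orthogonal matrix, which is what the following slides exploit.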

Page 37:

Whitening solves half of the ICA problem

Note: The number of free parameters of an N by N orthogonal matrix is N(N-1)/2, about half of the N^2 parameters of a general N by N matrix.

⇒ whitening solves half of the ICA problem

(Figure panels: original, mixed, whitened)

Page 38:

Solving ICA

ICA task: Given x,

• find y (the estimation of s),

• find W (the estimation of A^(-1))

ICA solution: y=Wx

• Remove mean, E[x]=0

• Whitening, E[xx^T]=I

• Find an orthogonal W optimizing an objective function

– Sequence of 2-d Jacobi (Givens) rotations

(Figure panels: original, mixed, whitened, rotated (demixed))
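The three steps can be sketched end to end for two sources. This is an illustrative toy, not the author's implementation: it assumes uniform sources, an arbitrary mixing matrix `A`, and a brute-force grid search over a single rotation angle with the sum of absolute kurtoses as the objective.

```python
import numpy as np

def kurtosis(y):
    """Fourth-order cumulant E[y^4] - 3 E[y^2]^2 of a centered sample."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

def rotation(theta):
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[c, -sn], [sn, c]])

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 20000))    # hypothetical sources
A = np.array([[2.0, 1.0], [1.0, 1.5]])         # assumed mixing matrix
x = A @ s

# Step 1: remove the mean.
x = x - x.mean(axis=1, keepdims=True)

# Step 2: whiten with PCA.
eigvals, E = np.linalg.eigh(x @ x.T / x.shape[1])
Q = np.diag(eigvals ** -0.5) @ E.T
z = Q @ x

# Step 3: pick the rotation that pushes the outputs furthest from Gaussian.
angles = np.linspace(0.0, np.pi / 2, 180, endpoint=False)
scores = [sum(abs(kurtosis(row)) for row in rotation(a) @ z) for a in angles]
R = rotation(angles[int(np.argmax(scores))])

W = R @ Q          # estimated demixing matrix
y = W @ x          # estimated sources

# W A should be close to a scaled permutation: one dominant entry per row.
B = np.abs(W @ A)
dominance = (B.max(axis=1) / B.sum(axis=1)).min()
```

In more than two dimensions the single angle is replaced by a sweep over pairs of coordinates, each handled by one 2-d rotation.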

Page 39:

Optimization Using Jacobi Rotation Matrices

(Figure: Jacobi rotation matrix acting on rows and columns p and q)
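For reference, a Jacobi (Givens) rotation is the identity matrix except for four entries in rows and columns p and q. The helper below is a sketch; the name `givens` and the sign convention are assumptions, since the slide's matrix is not preserved.

```python
import numpy as np

def givens(n, p, q, theta):
    """n x n rotation by angle theta in the (p, q) coordinate plane."""
    G = np.eye(n)
    c, sn = np.cos(theta), np.sin(theta)
    G[p, p] = c
    G[q, q] = c
    G[p, q] = -sn
    G[q, p] = sn
    return G

G = givens(4, 1, 3, 0.3)
is_orthogonal = np.allclose(G @ G.T, np.eye(4))
determinant = np.linalg.det(G)
```

Any N by N rotation can be written as a product of N(N-1)/2 such matrices, so optimizing one angle at a time covers the whole search space.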

Page 40:

Gaussian sources are problematic

The Gaussian distribution is spherically symmetric.

Mixing it with an orthogonal matrix… produces the same distribution...

However, this is the only ‘nice’ distribution that we cannot recover!

No hope for recovery...
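A quick numerical check of this point, as a sketch with assumed sample sizes and angles: rotating two independent unit-variance Gaussian sources leaves the covariance unchanged, and a kurtosis-based contrast stays near zero at every angle, so there is nothing for the optimizer to find.

```python
import numpy as np

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def rotation(theta):
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[c, -sn], [sn, c]])

rng = np.random.default_rng(0)
g = rng.normal(size=(2, 50_000))     # two independent Gaussian sources

# Exact statement: an orthogonal Q maps identity covariance to identity covariance.
Q = rotation(0.7)
cov_after = Q @ np.eye(2) @ Q.T

# Empirical statement: the contrast is flat (and near zero) over all rotations.
angles = np.linspace(0.0, np.pi / 2, 19)
contrast = [sum(abs(excess_kurtosis(row)) for row in rotation(a) @ g)
            for a in angles]
```

Compare this with uniform sources, where the same contrast has clear maxima at the demixing angles.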

Page 41:

ICA Cost Functions

⇒ go away from normal distribution

Page 42:

Central Limit Theorem

The (normalized) sum of independent variables converges to the normal distribution

⇒ For separation go far away from the normal distribution

⇒ Negentropy, |kurtosis| maximization

Figs borrowed from Ata Kaban
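The argument can be checked numerically. In this sketch (uniform variables and the sample size are assumptions for illustration) the excess kurtosis of a sum of k independent uniform variables shrinks toward zero, the Gaussian value, as k grows. Mixtures therefore look more Gaussian than the sources they are built from.

```python
import numpy as np

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=(8, 200_000))   # 8 independent uniform variables

# Excess kurtosis of the sum of the first k variables, for k = 1, 2, 8.
kurt_by_k = {k: excess_kurtosis(u[:k].sum(axis=0)) for k in (1, 2, 8)}
```

For a single uniform variable the value is about -1.2; in theory it scales as -1.2/k for the sum of k such variables.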

Page 43:

Algorithms

There are more than 100 different ICA algorithms…

• Mutual information (MI) estimation
– Kernel-ICA [Bach & Jordan, 2002]

• Entropy, negentropy estimation
– Infomax ICA [Bell & Sejnowski 1995]
– RADICAL [Learned-Miller & Fisher, 2003]
– FastICA [Hyvarinen, 1999]
– [Girolami & Fyfe 1997]

• ML estimation
– KDICA [Chen, 2006]
– EM-ICA [Welling]
– [MacKay 1996; Pearlmutter & Parra 1996; Cardoso 1997]

• Higher order moments, cumulants based methods
– JADE [Cardoso, 1993]

• Nonlinear correlation based methods
– [Jutten and Herault, 1991]

Page 44:

ICA ALGORITHMS

Page 45:

Maximum Likelihood ICA Algorithm

David J.C. MacKay (97)

rows of W

Page 46:

Kurtosis = 4th order cumulant

Measures
• the distance from normality
• the degree of peakedness

ICA algorithm based on Kurtosis maximization
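The slide's formula is not preserved in the transcript. For reference, a standard definition for a zero-mean variable y (an assumption about what the slide showed) is:

```latex
\operatorname{kurt}(y) = \mathbb{E}\left[y^4\right] - 3\left(\mathbb{E}\left[y^2\right]\right)^2
```

This quantity is zero for a Gaussian y, negative for flatter (sub-Gaussian) distributions such as the uniform, and positive for peaked (super-Gaussian) ones. On whitened data z, a kurtosis-based algorithm searches for a unit vector w that maximizes |kurt(w^T z)|.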

Page 47:

The Fast ICA algorithm (Hyvarinen)

Probably the most famous ICA algorithm
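The update rule itself is not preserved in the transcript. Below is a minimal sketch of the well-known one-unit fixed-point iteration with the cubic (kurtosis) nonlinearity, run on whitened data. The helper names `whiten` and `fastica_one_unit`, the uniform sources, the mixing matrix `A`, and the starting vector are all illustrative assumptions.

```python
import numpy as np

def whiten(x):
    """Center x and return (whitened data, whitening matrix)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, E = np.linalg.eigh(x @ x.T / x.shape[1])
    Q = np.diag(eigvals ** -0.5) @ E.T
    return Q @ x, Q

def fastica_one_unit(z, w0, n_iter=100, tol=1e-10):
    """Fixed-point update w <- E[z (w^T z)^3] - 3 w, followed by normalization."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(n_iter):
        u = w @ z
        w_new = (z * u ** 3).mean(axis=1) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        # Convergence is checked up to sign, since w and -w are equivalent.
        converged = abs(abs(w_new @ w) - 1.0) < tol
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 20000))   # hypothetical sources
A = np.array([[2.0, 1.0], [1.0, 1.5]])        # assumed mixing matrix
z, Q = whiten(A @ s)

w = fastica_one_unit(z, np.array([1.0, 0.3]))

# The recovered direction should pick out a single source.
b = np.abs(w @ Q @ A)
dominance = b.max() / b.sum()
```

Further components are usually obtained by repeating the iteration while keeping each new w orthogonal to the ones already found.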

Page 48:

Dependence Estimation Using Kernel Methods

The Kernel ICA Algorithm

Page 49:

Kernel covariance (KC) A. Gretton, R. Herbrich, A. Smola, F. Bach, M. Jordan

The calculation of the supremum over function sets is extremely difficult. Reproducing Kernel Hilbert Spaces make it easier.

Page 50:

RKHS construction for x, y stochastic variables.

Page 51:

The Representer Theorem

Theorem:

1st term, empirical loss 2nd term, regularization

Page 52:

Kernel covariance (KC)

Yay! We can use the representer theorem for our problem

Page 53:

Kernel covariance (KC)

Page 54:

Amari Error for Measuring the Performance

• Measures how close a square matrix is to a permutation matrix

B = WA

demixing mixing
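The exact formula on the slide is missing from the transcript. The sketch below implements one common normalized variant of the Amari error, scaled so that a scaled permutation matrix scores 0 and a matrix with all entries equal scores 1. The function name and the normalization are assumptions.

```python
import numpy as np

def amari_error(B):
    """Normalized Amari error of a square matrix B = W A, in [0, 1]."""
    B = np.abs(np.asarray(B, dtype=float))
    n = B.shape[0]
    # For each row (column): sum of entries relative to the largest one, minus 1.
    row_term = (B / B.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col_term = (B / B.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (row_term.sum() + col_term.sum()) / (2.0 * n * (n - 1))

scaled_permutation = np.array([[0.0, 2.0],
                               [-3.0, 0.0]])
err_perfect = amari_error(scaled_permutation)
err_worst = amari_error(np.ones((3, 3)))
```

Because only ratios within rows and columns enter, the measure ignores the permutation and scaling ambiguities that ICA cannot resolve.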

Page 55:

Independent Subspace Analysis

Page 56:

Independent Subspace Analysis (ISA, The Woodstock Problem)

Sources Observation Estimation

Find W, recover Wx

Page 57:

Independent Subspace Analysis

Original

Separated

Mixed

Hinton diagram

Page 58:

ISA Cost Functions

Shannon-entropy:

$$H(y) = -\int p(y) \log p(y)\, dy$$

Mutual Information:

$$I(y^1, \ldots, y^m) = \int p(y) \log \frac{p(y)}{p(y^1) \cdots p(y^m)}\, dy$$

Page 59:

ISA Cost Functions

Page 60:

Multidimensional Entropy Estimation

Page 61:

Multi-dimensional Entropy Estimations, Method of Kozachenko and Leonenko

Let {z(1), …, z(n)} be n samples of z ∈ R^d, and let N_{1,j} be the nearest neighbor of z(j).

Then the nearest neighbor entropy estimation:

$$H(z) \approx \frac{1}{n} \sum_{j=1}^{n} \log\left( n \, \| N_{1,j} - z(j) \| \right) + \ln 2 + C_E, \qquad C_E = -\int_0^\infty e^{-t} \ln t \, dt$$

This estimation is mean-square consistent, but not robust. Let us try to use more neighbors!
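As a sanity check of the nearest-neighbor idea, here is a sketch restricted to one dimension, where the Kozachenko-Leonenko estimate takes the form mean(log(n ρ_j)) + ln 2 + Euler's constant, with ρ_j the distance from sample j to its nearest neighbor. The function name and the Gaussian test data are assumptions; the multi-dimensional case needs a proper neighbor search and is not covered here.

```python
import numpy as np

def nn_entropy_1d(z):
    """Nearest-neighbor (Kozachenko-Leonenko type) entropy estimate in 1-D, in nats."""
    z = np.sort(np.asarray(z, dtype=float))
    n = z.size
    gaps = np.diff(z)
    # After sorting, the nearest neighbor is the closer of the two adjacent samples.
    left = np.concatenate(([np.inf], gaps))
    right = np.concatenate((gaps, [np.inf]))
    rho = np.minimum(left, right)
    return np.mean(np.log(n * rho)) + np.log(2.0) + np.euler_gamma

rng = np.random.default_rng(0)
h_hat = nn_entropy_1d(rng.normal(size=5000))

# Differential entropy of a standard normal, for comparison.
h_true = 0.5 * np.log(2.0 * np.pi * np.e)
```

The estimate relies on a single neighbor per point, which is why it is sensitive to noise and why the next slides move to k neighbors.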

Page 62:

Multi-dimensional Rényi’s Entropy Estimations

Let us apply Rényi’s entropy for estimating the Shannon-entropy:

$$H_\alpha = \frac{1}{1-\alpha} \log \int f^{\alpha}(z)\, dz \;\xrightarrow{\;\alpha \to 1\;}\; -\int f(z) \log f(z)\, dz$$

Let us use
- K-nearest neighbors
- minimum spanning trees
for estimating the multi-dimensional Rényi’s entropy. (It could be much more general…)

Page 63:

Beardwood - Halton - Hammersley Theorem for kNN graphs

Let {z(1), …, z(n)} be n samples of z ∈ R^d, let N_{k,j} be the set of the k nearest neighbors of z(j), and let γ = d − dα. Then

$$\frac{1}{1-\alpha} \log \left( \frac{1}{k\, n^{\alpha}} \sum_{j=1}^{n} \sum_{v \in N_{k,j}} \| v - z(j) \|^{\gamma} \right) \to H_\alpha(z) + c \quad \text{as } n \to \infty$$

Lots of other graphs, e.g. MST, TSP, minimal matching, Steiner graph…etc could be used as well.

Page 64:

Examples (J. A. Costa and A. O. Hero)

Page 65:

Independent Subspace Analysis

Results

Page 66:

Numerical Simulations

2D Letters (i.i.d.)

Sources, Observation, Estimated sources

Performance matrix

Page 67:

Numerical Simulations

3D Curves (i.i.d.)

Sources, Observation, Estimated sources

Performance matrix

Page 68:

Numerical Simulations

Facial images (i.i.d.)

Sources, Observation, Estimated sources

Performance matrix

Page 69:

ISA 2D

Page 70:

ISA 3D after ICA preprocessing

Page 71:

Thanks for the Attention!