Independent Component Analysis

CAP5610: Machine Learning

Instructor: Guo-Jun Qi

Transcript
Page 1:

Independent Component Analysis

CAP5610: Machine Learning

Instructor: Guo-Jun QI

Page 2:

Review: Principal Component Analysis

• PCA aims to find a set of principal components that span a subspace.

• Projecting data onto this subspace yields the minimum reconstruction error.

• Principal components should be orthogonal

• PCA projection: $\mathbf{y} = W\mathbf{x}$

• Each row of $W$ is a direction along which $\mathbf{x}$ is projected.

[Figure: data cloud with two projection directions $w_1$ and $w_2$]

Page 3:

PCA

• PCA removes the correlations between components, but that does not mean the components become independent.

• No correlation: $\mathrm{Cov}(y_1, y_2) = E[y_1 y_2] - E[y_1]\,E[y_2] = 0$

• Independence: $p(y_1, y_2) - p(y_1)\,p(y_2) = 0$

• Only for the Gaussian distribution does no correlation imply independence; the sketch below gives a non-Gaussian counterexample.

• Independent Component Analysis (ICA) aims at finding a set of independent components.
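As a concrete illustration of this gap (not from the slides), here is a minimal NumPy sketch: two coordinates on the unit circle are uncorrelated, yet each fully determines the magnitude of the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points on the unit circle: y1 = cos(theta), y2 = sin(theta).
theta = rng.uniform(0, 2 * np.pi, 100_000)
y1, y2 = np.cos(theta), np.sin(theta)

# Uncorrelated: Cov(y1, y2) = E[y1 y2] - E[y1] E[y2] is ~0.
print(np.cov(y1, y2)[0, 1])                  # ~0.0

# But dependent: y2**2 = 1 - y1**2, a deterministic relationship.
print(np.corrcoef(y1 ** 2, y2 ** 2)[0, 1])   # ~-1.0
```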

Page 4:

Source separation problem

• M independent sources $\{s_1, \dots, s_M\}$

• Mixture observations of the signals:

$x_i = \sum_{j=1}^{M} a_{ij} s_j$, or in matrix form, $\mathbf{x} = A\mathbf{s}$

• $A = [a_{ij}]$ is the mixing matrix

• Can we find the mixing matrix and recover the sources? This is the ICA problem; a toy setup is sketched below.
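For concreteness, here is a minimal NumPy sketch of this generative model; the two sources and the mixing matrix below are made-up illustrations, not data from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# M = 2 independent sources: a sine tone and uniform noise.
t = np.linspace(0, 8, 1000)
s = np.vstack([np.sin(2 * t),
               rng.uniform(-1, 1, t.size)])

# Hypothetical mixing matrix A = [a_ij]; observations x_i = sum_j a_ij s_j.
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])
x = A @ s          # x = A s, shape (2, 1000): two mixed observations
```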

Page 5:

Inverse problem

• Mixture of signals: $\mathbf{x} = A\mathbf{s}$

• ICA: find $W$ such that the components of $\mathbf{y} = W\mathbf{x}$ are as independent as possible.

• $\mathbf{y}$ is an estimate of $\mathbf{s}$

• $W$ is an estimate of $A^{-1}$
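As a sanity check on this inverse problem, scikit-learn's FastICA (one standard implementation of the algorithm derived on the following slides; the `whiten` argument name follows recent scikit-learn releases) can estimate both $\mathbf{y}$ and $W$ from the toy mixtures above:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Rebuild the toy mixtures x = A s from the previous sketch.
t = np.linspace(0, 8, 1000)
s = np.vstack([np.sin(2 * t), rng.uniform(-1, 1, t.size)])
x = np.array([[1.0, 0.5], [0.6, 1.0]]) @ s

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
y = ica.fit_transform(x.T).T   # estimate of s (up to sign/scale/permutation)
W = ica.components_            # estimate of A^{-1} (same ambiguities)
print(y.shape, W.shape)        # (2, 1000) (2, 2)
```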

Page 6:

PCA vs. ICA

• ICA finds the underlying independent components that generate the data.

Page 7:

ICA for Natural images

• ICA components correspond to natural image structures.

Page 8:

PCA for Natural images

• PCA components are constrained to be orthogonal, so they may not correspond to any independent structures in natural images.

Page 9:

Applications: denoising images

• Noise and image are independent.

[Figure: denoising comparison of Original, Noisy, Median filter, and ICA results]

Page 10:

Statistical independence

• Definition: components $y_1, \dots, y_M$ are statistically independent if $p(y_1, \dots, y_M) = \prod_{i=1}^{M} p(y_i)$

Page 11:

Source ambiguity

• Independent sources can be recovered only up to sign, scale and permutation.

• If $\mathbf{s}$ is changed by sign, scale, and permutation, there exists another mixing matrix such that the observed signals $\mathbf{x}$ stay unchanged.

• Proof sketch: let $P$ be a permutation matrix and $D$ a diagonal scaling matrix. Then

$\mathbf{x} = (A D^{-1} P^{-1})(P D\,\mathbf{s}) = A\mathbf{s}$
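A quick NumPy check of this proof with made-up matrices: permuting and rescaling the sources, while compensating inside the mixing matrix, leaves $\mathbf{x}$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(3, 3))        # original mixing matrix
s = rng.normal(size=(3, 5))        # original sources
P = np.eye(3)[[2, 0, 1]]           # permutation matrix
D = np.diag([2.0, -0.5, 3.0])      # diagonal scaling (with a sign flip)

s_new = P @ D @ s                                 # altered sources
A_new = A @ np.linalg.inv(D) @ np.linalg.inv(P)   # compensated mixing matrix
print(np.allclose(A_new @ s_new, A @ s))          # True: x is unchanged
```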

Page 12:

Preprocessing: subtracting mean

• Mean: $\mathbf{m} = E[\mathbf{x}]$

• Subtract it from the data: $\mathbf{x} \leftarrow \mathbf{x} - \mathbf{m}$

• In this case, the original sources $\mathbf{s}$ also have zero mean.

Page 13:

Preprocessing: whitening

• Covariance matrix of the observed signals: $C = E[\mathbf{x}\mathbf{x}^T]$

• Eigendecompose it (e.g., via SVD): $C = E D E^T$

• Let $\mathbf{z} = D^{-1/2} E^T \mathbf{x}$; then $\mathbf{z}$ is the whitened signal, because $E[\mathbf{z}\mathbf{z}^T] = D^{-1/2} E^T C\, E D^{-1/2} = I$

• Define a new mixing matrix $A^* = D^{-1/2} E^T A$; then $\mathbf{z} = A^* \mathbf{s}$

• We also have $A^* A^{*T} = E[\mathbf{z}\mathbf{z}^T] = I$ (for unit-variance sources), i.e., $A^*$ is orthogonal
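Here is a minimal NumPy sketch of this whitening step, assuming toy Laplace sources and a random mixing matrix (both made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixtures of non-Gaussian sources, mean-subtracted as on the last slide.
x = rng.normal(size=(3, 3)) @ rng.laplace(size=(3, 10_000))
x -= x.mean(axis=1, keepdims=True)

C = np.cov(x)                            # covariance of the observed signals
eigvals, E = np.linalg.eigh(C)           # C = E diag(eigvals) E^T
z = np.diag(eigvals ** -0.5) @ E.T @ x   # z = D^{-1/2} E^T x
print(np.round(np.cov(z), 2))            # ~ identity: z is whitened
```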

Page 14:

Preprocessing: Benefit

• Reducing the number of parameters• The orthogonal matrix 𝐴∗ of N by N only has (N-1)(N-2)/2 free parameters.

Page 15:

Solving ICA

• Problem: given whitened, zero-mean $\mathbf{x}$, find an orthogonal matrix $W$ so that the components of $\mathbf{y} = W\mathbf{x}$ are as independent as possible.

• Question: how do we measure the independence of the components?

• Central limit theorem: the sum of a set of i.i.d. random variables approaches a Gaussian distribution.

Page 16:

Non-Gaussianity and independence

• $y = \mathbf{w}^T\mathbf{x} = \mathbf{w}^T A\mathbf{s}$ is a weighted sum of the sources, where $\mathbf{w}^T$ is a row vector of $W$.

• If $y$ is a mixture of several sources, then $y$ is closer to Gaussian.

• Otherwise, if $y$ is not a mixture but a single component (up to scale and sign), then $y$ should be far from Gaussian.

• Non-Gaussianity therefore measures the independence of $y$.

Page 17:

Measure of non-Gaussianity

• Kurtosis, the fourth-order cumulant: $\mathrm{kurt}(y) = E[y^4] - 3\,(E[y^2])^2$, which is zero for a Gaussian and reduces to $E[y^4] - 3$ for unit-variance $y$
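A small NumPy check of this statistic (inputs standardized to unit variance so values are comparable across distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

def kurtosis(y):
    """kurt(y) = E[y^4] - 3 for standardized (zero-mean, unit-variance) y."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3

n = 100_000
print(kurtosis(rng.normal(size=n)))      # ~ 0.0 : Gaussian
print(kurtosis(rng.laplace(size=n)))     # ~ +3.0: super-Gaussian (peaked)
print(kurtosis(rng.uniform(-1, 1, n)))   # ~ -1.2: sub-Gaussian (flat)
```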

Page 18:

The Fast ICA algorithm (Hyvärinen)

• Given whitened zero-mean data $\mathbf{z}$, find $\mathbf{w}$ such that $y = \mathbf{w}^T\mathbf{z}$ is as far from Gaussian as possible, with $\mathbf{w}$ a unit vector, $\mathbf{w}^T\mathbf{w} = 1$.

• Maximize kurtosis: $f(\mathbf{w}) = \kappa_4(y) = E[(\mathbf{w}^T\mathbf{z})^4] - 3$, s.t. $\mathbf{w}^T\mathbf{w} = 1$

• Lagrangian function: $L(\mathbf{w}) = f(\mathbf{w}) + \lambda(\mathbf{w}^T\mathbf{w} - 1)$

• KKT condition for the constrained problem: $f'(\mathbf{w}) + 2\lambda\mathbf{w} = 0$, i.e.,

$4E[(\mathbf{w}^T\mathbf{z})^3\,\mathbf{z}] + 2\lambda\mathbf{w} = 0$

Page 19:

Algorithm

• Randomly initialize $\mathbf{w}(1)$

• Update: $\mathbf{w}(k+1) \leftarrow E[(\mathbf{w}(k)^T\mathbf{z})^3\,\mathbf{z}] - 3\,\mathbf{w}(k)$

• Normalize: $\mathbf{w}(k+1) \leftarrow \mathbf{w}(k+1) / \|\mathbf{w}(k+1)\|$

• Iterate until convergence; a runnable sketch follows.
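Here is a minimal, self-contained NumPy sketch of this one-unit iteration (the function and variable names are my own); it whitens a toy two-source mixture and then runs the update and normalization steps from this slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def fastica_one_unit(z, n_iter=200, tol=1e-8):
    """Kurtosis-based one-unit FastICA on whitened, zero-mean data z (N x T)."""
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        # Fixed-point update from the slide: w <- E[(w^T z)^3 z] - 3 w
        w_new = np.mean((w @ z) ** 3 * z, axis=1) - 3 * w
        w_new /= np.linalg.norm(w_new)       # renormalize to a unit vector
        if abs(abs(w_new @ w) - 1) < tol:    # converged (up to sign)
            return w_new
        w = w_new
    return w

# Toy usage: whiten two mixed uniform sources, then extract one direction.
s = rng.uniform(-1, 1, size=(2, 50_000))
x = np.array([[1.0, 0.7], [0.4, 1.0]]) @ s
x -= x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x
print(fastica_one_unit(z))   # one row of the estimated unmixing matrix W
```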

Page 20:

Estimate the other components

• Given an estimate of $\mathbf{w}_1$, find other directions to recover more sources.

• The 2nd $\mathbf{w}$ uses the same formulation, but with an additional constraint $\mathbf{w} \perp \mathbf{w}_1$.

• For the 3rd, 4th, and later components, one more orthogonality constraint is added each time; see the deflation sketch below.
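A sketch of that deflation step (the helper name `deflate` is my own): after each fixed-point update, project out the directions already found and renormalize.

```python
import numpy as np

def deflate(w, found):
    """Gram-Schmidt deflation: make w orthogonal to previously found vectors."""
    for w_prev in found:
        w = w - (w @ w_prev) * w_prev   # remove the component along w_prev
    return w / np.linalg.norm(w)
```

Calling this inside the fixed-point loop keeps the $k$-th estimate orthogonal to $\mathbf{w}_1, \dots, \mathbf{w}_{k-1}$.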

Page 21:

The other independence measure: negentropy

• Among all distributions with the same variance, the Gaussian has the maximal entropy.

• Maximize negentropy $J(\mathbf{y}) = H(\mathbf{y}_{\mathrm{gauss}}) - H(\mathbf{y})$,

where $\mathbf{y}_{\mathrm{gauss}}$ is the Gaussian with the same covariance as $\mathbf{y}$; negentropy is nonnegative and zero only for a Gaussian, so larger values mean more non-Gaussian.

• Because $y = \mathbf{w}^T\mathbf{z}$ with $\mathbf{w}$ a unit vector and $\mathrm{Cov}(\mathbf{z}) = I$, the component $y$ has unit variance.

Page 22:

Approximation to Negentropy

• Negentropy is difficult to compute

• Approximation using 3rd- and 4th-order cumulants: $J(y) \approx \frac{1}{12} E[y^3]^2 + \frac{1}{48}\,\mathrm{kurt}(y)^2$

• Approximation using non-quadratic functions $G$: $J(y) \propto \big(E[G(y)] - E[G(\nu)]\big)^2$, where $\nu$ is a standard Gaussian; a sketch follows.
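A minimal sketch of the non-quadratic approximation with the common choice $G(u) = \log\cosh u$ (the proportionality constant is dropped; the function name is my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def negentropy_logcosh(y, n_ref=100_000):
    """J(y) ~ (E[G(y)] - E[G(nu)])^2 with G(u) = log cosh(u), nu ~ N(0, 1)."""
    y = (y - y.mean()) / y.std()    # standardize, as negentropy assumes
    nu = rng.normal(size=n_ref)     # Gaussian reference sample
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(nu))) ** 2

print(negentropy_logcosh(rng.normal(size=100_000)))   # ~0 for a Gaussian
print(negentropy_logcosh(rng.laplace(size=100_000)))  # >0 for non-Gaussian
```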

Page 23:

Question

• In MP3, we will use PCA to project images into a subspace where the obtained components are supposed to be independent. Is this assumption valid?

Page 24:

Question

• In MP3, we will use PCA to project images into a subspace where the obtained components are supposed to be independent. Is this assumption valid?

• PCA yields uncorrelated components.

• Under a Gaussian distribution, uncorrelated components imply independence.

• So we need to verify whether the pixels are generated from a Gaussian, using kurtosis and negentropy; a sketch of this check follows.
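A minimal sketch of that check on toy data (all names and data here are illustrative): compute PCA components of a non-Gaussian mixture and inspect their kurtosis; values far from zero indicate the Gaussian assumption fails, so uncorrelated does not imply independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for image data: a mixture of non-Gaussian (Laplace) sources.
x = rng.normal(size=(2, 2)) @ rng.laplace(size=(2, 50_000))
x -= x.mean(axis=1, keepdims=True)

_, E = np.linalg.eigh(np.cov(x))
y = E.T @ x                          # PCA components: uncorrelated by design
for i, yi in enumerate(y):
    yi = (yi - yi.mean()) / yi.std()
    print(i, np.mean(yi ** 4) - 3)   # kurtosis far from 0 => not Gaussian
```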

Page 25:

Summary

• ICA recovers a set of independent components

• PCA finds a set of uncorrelated components

• By the central limit theorem, we use non-Gaussianity to find the independent components.

• Surrogate measures: kurtosis and negentropy

• Fast ICA algorithm: an iterative algorithm with no closed-form solution

• Applications: separating independent sources from mixture signals

• Image denoising

• Voice separation