Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models)
Yi Zhang
10-701, Machine Learning, Spring 2011
April 6th, 2011

Parts of the PCA slides are from previous 10-701 lectures.
Page 2: Outline

• Dimension reduction
• Principal Components Analysis
• Independent Component Analysis
• Canonical Correlation Analysis
• Fisher's Linear Discriminant
• Topic Models and Latent Dirichlet Allocation

Page 3: Dimension reduction

• Feature selection: select a subset of the original features
• More generally, feature extraction
  ◦ Not limited to the original features
  ◦ "Dimension reduction" usually refers to this case

Page 4: Dimension reduction

• Assumption: the data (approximately) lies on a lower-dimensional space
• Examples:

Page 5: Outline

• Dimension reduction
• Principal Components Analysis
• Independent Component Analysis
• Canonical Correlation Analysis
• Fisher's Linear Discriminant
• Topic Models and Latent Dirichlet Allocation

Page 6: Principal components analysis

Page 7: Principal components analysis

Page 8: Principal components analysis

Page 9: Principal components analysis

Page 10: Principal components analysis

• Assume the data is centered
• For a projection direction v
  ◦ Variance of the projected data

Page 11: Principal components analysis

• Assume the data is centered
• For a projection direction v
  ◦ Variance of the projected data
  ◦ Maximize the variance of the projected data

Page 12: Principal components analysis

• Assume the data is centered
• For a projection direction v
  ◦ Variance of the projected data
  ◦ Maximize the variance of the projected data
  ◦ How to solve this?

Page 13: Principal components analysis

• PCA formulation
• As a result …
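The formulation referenced on this slide is not captured in the transcript; the standard PCA objective for centered data X (n samples, sample covariance Σ = XᵀX/n) is:

```latex
% Variance of centered data X projected onto a unit vector v:
% \mathrm{Var}(Xv) = \tfrac{1}{n} v^\top X^\top X v = v^\top \Sigma v
\max_{v}\; v^\top \Sigma v
\quad \text{subject to} \quad v^\top v = 1
% Solution: v is the leading eigenvector of \Sigma; the top-k principal
% directions are the eigenvectors of the k largest eigenvalues of \Sigma.
```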

Page 14: Principal components analysis
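To make the eigendecomposition view concrete, here is a minimal numpy sketch; the `pca` helper and the toy data are illustrative, not from the lecture:

```python
import numpy as np

def pca(X, k):
    """Top-k principal components of data matrix X (n samples x d features)."""
    Xc = X - X.mean(axis=0)           # center the data, as the slides assume
    cov = Xc.T @ Xc / len(Xc)         # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)  # eigh: the covariance is symmetric
    order = np.argsort(vals)[::-1]    # sort eigenvalues in decreasing order
    return vals[order[:k]], vecs[:, order[:k]]

# Toy data with much more variance along the first axis than the second
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
vals, vecs = pca(X, 2)
```

The variance of the data projected onto each returned direction equals the corresponding eigenvalue, matching the "maximize projected variance" formulation on the previous slides.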

Page 15: Outline

• Dimension reduction
• Principal Components Analysis
• Independent Component Analysis
• Canonical Correlation Analysis
• Fisher's Linear Discriminant
• Topic Models and Latent Dirichlet Allocation

Page 16: Source separation

• The classical "cocktail party" problem
  ◦ Separate the mixed signal into its sources

Page 17: Source separation

• The classical "cocktail party" problem
  ◦ Separate the mixed signal into its sources
  ◦ Assumption: different sources are independent

Page 18: Independent component analysis

• Let v1, v2, v3, …, vd denote the projection directions of the independent components
• ICA: find these directions such that the data projected onto them has maximum statistical independence

Page 19: Independent component analysis

• Let v1, v2, v3, …, vd denote the projection directions of the independent components
• ICA: find these directions such that the data projected onto them has maximum statistical independence
• How to actually maximize independence?
  ◦ Minimize the mutual information
  ◦ Or maximize the non-Gaussianity
  ◦ The actual formulation is quite complicated!
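The slides leave the formulation abstract; one concrete instance of the non-Gaussianity route is a FastICA-style fixed point, sketched below in numpy. The `fastica` helper, the mixing matrix, and the toy sources are illustrative assumptions, not from the lecture:

```python
import numpy as np

def fastica(X, n_components, n_iter=200, seed=0):
    """FastICA-style deflation with a tanh nonlinearity (maximizes non-Gaussianity)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    # Whiten: rotate and rescale so the data has identity covariance
    vals, vecs = np.linalg.eigh(np.cov(X.T))
    Z = X @ vecs / np.sqrt(vals)
    W = np.zeros((n_components, Z.shape[1]))
    for i in range(n_components):
        w = rng.normal(size=Z.shape[1])
        for _ in range(n_iter):
            g = np.tanh(Z @ w)
            # One-unit fixed-point update: w+ = E[z g(w'z)] - E[g'(w'z)] w
            w_new = Z.T @ g / len(Z) - (1 - g**2).mean() * w
            w_new -= W[:i].T @ (W[:i] @ w_new)  # deflation: stay orthogonal
            w = w_new / np.linalg.norm(w_new)
        W[i] = w
    return Z @ W.T  # recovered sources (up to sign, scale, permutation)

# Demo: two mixed non-Gaussian sources (a sine wave and uniform noise)
t = np.linspace(0, 8 * np.pi, 2000)
S = np.column_stack([np.sin(t), np.random.default_rng(1).uniform(-1, 1, 2000)])
A = np.array([[1.0, 0.5], [0.4, 1.0]])  # hypothetical mixing matrix
X_mixed = S @ A.T
S_hat = fastica(X_mixed, 2)
```

Each recovered component should correlate strongly with one of the true sources, which is the sense in which the "cocktail party" mixture is separated.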

Page 20: Outline

• Dimension reduction
• Principal Components Analysis
• Independent Component Analysis
• Canonical Correlation Analysis
• Fisher's Linear Discriminant
• Topic Models and Latent Dirichlet Allocation

Page 21: Recall: PCA

• Principal component analysis
  ◦ Note:
  ◦ Find the projection direction v such that the variance of the projected data is maximized
  ◦ Intuitively, find the intrinsic subspace of the original feature space (in terms of retaining the data variability)

Page 22: Canonical correlation analysis

• Now consider two sets of variables, x and y
  ◦ x is a vector of p variables
  ◦ y is a vector of q variables
  ◦ Basically, two feature spaces
• How to find the connection between the two sets of variables (or the two feature spaces)?

Page 23: Canonical correlation analysis

• Now consider two sets of variables, x and y
  ◦ x is a vector of p variables
  ◦ y is a vector of q variables
  ◦ Basically, two feature spaces
• How to find the connection between the two sets of variables (or the two feature spaces)?
  ◦ CCA: find a projection direction u in the space of x and a projection direction v in the space of y, so that the data projected onto u and v has maximum correlation
  ◦ Note: CCA simultaneously finds a dimension reduction for the two feature spaces

Page 24: Canonical correlation analysis

• CCA formulation
  ◦ X is n by p: n samples in a p-dimensional space
  ◦ Y is n by q: n samples in a q-dimensional space
  ◦ The n samples are paired in X and Y
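The formulation on the original slide is not captured in the transcript; the standard CCA objective for centered X and Y is:

```latex
% Find paired directions u (p-dim) and v (q-dim) maximizing the correlation
% of the projected samples Xu and Yv:
\max_{u,\,v}\;
\frac{u^\top X^\top Y\, v}
     {\sqrt{\left(u^\top X^\top X\, u\right)\left(v^\top Y^\top Y\, v\right)}}
```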

Page 25: Canonical correlation analysis

• CCA formulation
  ◦ X is n by p: n samples in a p-dimensional space
  ◦ Y is n by q: n samples in a q-dimensional space
  ◦ The n samples are paired in X and Y
• How to solve? … kind of complicated …

Page 26: Canonical correlation analysis

• CCA formulation
  ◦ X is n by p: n samples in a p-dimensional space
  ◦ Y is n by q: n samples in a q-dimensional space
  ◦ The n samples are paired in X and Y
• How to solve? Generalized eigenproblems!
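One way to realize the eigenproblem solution mentioned above: the eigenvalues of Sxx⁻¹ Sxy Syy⁻¹ Syx are the squared canonical correlations, and its eigenvectors are the x-space directions. A minimal numpy sketch follows; the `cca_first_pair` helper, the ridge term, and the toy data are illustrative assumptions:

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    """First pair of canonical directions via the eigenproblem formulation."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sxx = X.T @ X + reg * np.eye(X.shape[1])  # small ridge for stability
    Syy = Y.T @ Y + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y
    # Eigenvalues of M are the squared canonical correlations
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    i = np.argmax(vals.real)
    u = vecs[:, i].real
    v = np.linalg.solve(Syy, Sxy.T @ u)  # matching direction in the y-space
    return u, v, np.sqrt(vals.real[i])

# Toy paired data: Y is (almost) a linear map of X, so the first
# canonical correlation should be close to 1
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = X @ rng.normal(size=(3, 2)) + 0.01 * rng.normal(size=(300, 2))
u, v, rho = cca_first_pair(X, Y)
```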

Page 27: Outline

• Dimension reduction
• Principal Components Analysis
• Independent Component Analysis
• Canonical Correlation Analysis
• Fisher's Linear Discriminant
• Topic Models and Latent Dirichlet Allocation

Page 28: Fisher's linear discriminant

• Now come back to a single feature space
• In addition to features, we also have labels
  ◦ Find the dimension reduction that helps separate different classes of examples!
  ◦ Let's consider the 2-class case

Page 29: Fisher's linear discriminant

• Idea: maximize the ratio of the "between-class variance" over the "within-class variance" for the projected data
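The criterion on the slide is not captured in the transcript; the standard two-class Fisher criterion, with class means m₁, m₂ and scatter matrices S_B and S_W, is:

```latex
% Between-class and within-class scatter for classes C_1, C_2:
% S_B = (m_1 - m_2)(m_1 - m_2)^\top, \quad
% S_W = \sum_{k=1,2} \sum_{x \in C_k} (x - m_k)(x - m_k)^\top
\max_{v}\; J(v) = \frac{v^\top S_B\, v}{v^\top S_W\, v}
% Closed-form solution in the 2-class case: v \propto S_W^{-1}(m_1 - m_2)
```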

Page 30: Fisher's linear discriminant

Page 31: Fisher's linear discriminant

• Generalize to multi-class cases
• Still maximizing the ratio of the "between-class variance" over the "within-class variance" of the projected data:
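For the 2-class case, the closed-form direction v ∝ S_W⁻¹(m₁ − m₀) can be sketched in a few lines of numpy; the `fisher_direction` helper and the toy Gaussian classes are illustrative, not from the lecture:

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher's 2-class discriminant direction: w proportional to Sw^-1 (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Toy classes: separated along the low-variance axis, noisy along the other
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], [1.0, 3.0], size=(200, 2))
X1 = rng.normal([4.0, 0.0], [1.0, 3.0], size=(200, 2))
w = fisher_direction(X0, X1)
```

Projecting onto w and thresholding at the midpoint of the projected class means separates the two classes well, which is exactly the between/within variance trade-off on this slide.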

Page 32: Outline

• Dimension reduction
• Principal Components Analysis
• Independent Component Analysis
• Canonical Correlation Analysis
• Fisher's Linear Discriminant
• Topic Models and Latent Dirichlet Allocation

Page 33: Topic models

• Topic models: a class of dimension reduction models on text (from words to topics)

Page 34: Topic models

• Topic models: a class of dimension reduction models on text (from words to topics)
• Bag-of-words representation of documents
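A minimal illustration of the bag-of-words representation, using only the Python standard library (the toy documents are illustrative):

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog chased the cat"]
# Vocabulary: sorted set of all words across the corpus
vocab = sorted({w for d in docs for w in d.split()})

def bow(doc):
    """Bag-of-words vector: per-word counts; word order is discarded."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

vectors = [bow(d) for d in docs]
```

Each document becomes a fixed-length count vector over the vocabulary; topic models then reduce these high-dimensional word counts to a small number of topics.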

Page 35: Topic models

• Topic models: a class of dimension reduction models on text (from words to topics)
• Bag-of-words representation of documents
• Topic models for representing documents

Page 36: Latent Dirichlet allocation

• A fully Bayesian specification of topic models

Page 37: Dimension Reduction (PCA, ICA, CCA, FLD, Topic …tom/10701_sp11/recitations/...Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011

◦ Data: words on each documents◦ Estimation: maximizing the data likelihood – difficult!

Latent Dirichlet allocation

37
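The graphical model on these slides is not captured in the transcript; the standard LDA generative process, with Dirichlet hyperparameters α and η, topic–word distributions β, and per-document topic proportions θ, is:

```latex
% For each topic k = 1..K:      \beta_k \sim \mathrm{Dirichlet}(\eta)
% For each document d = 1..D:   \theta_d \sim \mathrm{Dirichlet}(\alpha)
%   For each word position n in document d:
%     z_{d,n} \sim \mathrm{Multinomial}(\theta_d)      % topic assignment
%     w_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}}) % observed word
p(w, z, \theta, \beta \mid \alpha, \eta)
= \prod_k p(\beta_k \mid \eta)
  \prod_d p(\theta_d \mid \alpha)
  \prod_n p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}})
```

Only the words w are observed; maximizing the likelihood requires marginalizing over θ and z, which is why the slide calls estimation difficult (in practice one uses variational inference or Gibbs sampling).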