Page 1: Machine Learning

Unit 5, Introduction to Artificial Intelligence, Stanford online course

Made by: Maor Levy, Temple University 2012

Page 2: The unsupervised learning problem

Many data points, no labels.

Page 3: Unsupervised Learning?

[Image: Google Street View]

Page 4: K-Means

Many data points, no labels.

Page 5: K-Means

Choose a fixed number of clusters.

Choose cluster centers and point-cluster allocations to minimize the error

$$\sum_{i \in \text{clusters}} \;\; \sum_{j \in \text{elements of } i\text{'th cluster}} \left\lVert x_j - \mu_i \right\rVert^2$$

We can't do this by exhaustive search, because there are too many possible allocations.

Algorithm (see the sketch below):
◦ fix cluster centers; allocate points to the closest cluster
◦ fix allocation; compute the best cluster centers

x could be any set of features for which we can compute a distance (careful about scaling).

* From Marc Pollefeys, COMP 256, 2003
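To make the two alternating steps concrete, here is a minimal NumPy sketch of the algorithm above (the initialization scheme and names like `kmeans` are my assumptions, not part of the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means; X has shape (n_points, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialize cluster centers with k randomly chosen data points (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Fix cluster centers; allocate points to the closest cluster.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Fix allocation; compute the best cluster centers (the cluster means).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # nothing changed, so the error can no longer decrease
        centers = new_centers
    return centers, labels
```

Each of the two steps can only decrease the sum-of-squared-distances objective above, which is why the loop terminates, although only at a local optimum.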

Page 6: K-Means

Page 7: K-Means

* From Marc Pollefeys, COMP 256, 2003

Page 8: Results of K-Means Clustering

K-means clustering using intensity alone and color alone.

[Figure panels: image, clusters on intensity, clusters on color]

* From Marc Pollefeys, COMP 256, 2003

Page 9: K-Means

K-Means is an approximation to EM:
◦ Model (hypothesis space): mixture of N Gaussians
◦ Latent variables: correspondence of data points and Gaussians

We notice:
◦ Given the mixture model, it's easy to calculate the correspondence
◦ Given the correspondence, it's easy to estimate the mixture model

Page 10: Expectation Maximization: Idea

Data are generated from a mixture of Gaussians.

Latent variables: correspondence between data items and Gaussians.

Page 11: Generalized K-Means (EM)

Page 12: Gaussians

Page 13: Maximum-Likelihood (ML) Fitting of Gaussians

Page 14: Learning a Gaussian Mixture (with known covariance)

E-Step: compute the expected value of each latent variable, given the current means:

$$E[z_{ij}] = \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{k} p(x = x_i \mid \mu = \mu_n)} = \frac{e^{-\frac{1}{2\sigma^2}(x_i - \mu_j)^2}}{\sum_{n=1}^{k} e^{-\frac{1}{2\sigma^2}(x_i - \mu_n)^2}}$$

M-Step: re-estimate each mean as the $E[z_{ij}]$-weighted average of the data:

$$\mu_j \leftarrow \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$$
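A minimal sketch of these two updates for 1-D data with known, shared variance σ² (function and variable names are mine, not the slides'):

```python
import numpy as np

def em_gaussian_mixture(x, k, sigma2, n_iters=50, seed=0):
    """EM for a 1-D mixture of k Gaussians with known variance sigma2."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)  # initial guesses for the means
    for _ in range(n_iters):
        # E-step: E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma2)),
        # normalized over the k Gaussians.
        p = np.exp(-((x[:, None] - mu[None, :]) ** 2) / (2 * sigma2))
        E_z = p / p.sum(axis=1, keepdims=True)
        # M-step: each mean becomes the E[z_ij]-weighted average of the data.
        mu = (E_z * x[:, None]).sum(axis=0) / E_z.sum(axis=0)
    return mu, E_z
```

Note the resemblance to K-Means: the E-step is a soft version of "allocate each point to the closest center", and the M-step is again a (weighted) mean.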

Page 15: Expectation Maximization

Converges! Proof [Neal/Hinton, McLachlan/Krishnan]:
◦ Neither the E-step nor the M-step decreases the data likelihood
◦ Converges to a local maximum of the likelihood or a saddle point

But it is subject to local optima.

Page 16: EM Clustering: Results

http://www.ece.neu.edu/groups/rpl/kmeans/

Page 17: Practical EM

Number of clusters unknown. Suffers (badly) from local minima. Algorithm (a BIC-based sketch follows below):
◦ Start a new cluster center if many points are "unexplained"
◦ Kill a cluster center that doesn't contribute
◦ (Use an AIC/BIC criterion for all of this, if you want to be formal)
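The start/kill heuristic itself is not library code, but the "formal" AIC/BIC variant is easy to sketch with scikit-learn (my choice of tooling, not the slides'): fit a mixture for each candidate number of clusters and keep the one with the lowest BIC. Random restarts (`n_init`) address the local-minima problem mentioned above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def choose_k_by_bic(X, k_max=10):
    """Fit mixtures for k = 1..k_max and return the k with the lowest BIC."""
    bics = []
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, n_init=5).fit(X)
        bics.append(gmm.bic(X))  # BIC penalizes likelihood by model complexity
    return int(np.argmin(bics)) + 1  # +1 because k starts at 1
```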

Page 18: Spectral Clustering

Page 19: The Two Spiral Problem

Page 20: Spectral Clustering: Overview

Data → Similarities → Block-Detection

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University

Page 21: Eigenvectors and Blocks

Block matrices have block eigenvectors:

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; v_1 = \begin{pmatrix} .71 \\ .71 \\ 0 \\ 0 \end{pmatrix},\; v_2 = \begin{pmatrix} 0 \\ 0 \\ .71 \\ .71 \end{pmatrix}, \qquad \lambda_1 = 2,\; \lambda_2 = 2,\; \lambda_3 = 0,\; \lambda_4 = 0$$

Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]

$$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; v_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; v_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}, \qquad \lambda_1 = 2.02,\; \lambda_2 = 2.02,\; \lambda_3 = -0.02,\; \lambda_4 = -0.02$$

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
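These numbers are easy to reproduce (a small NumPy check of my own, not part of the original slides):

```python
import numpy as np

# Near-block affinity matrix from the slide.
A = np.array([[1.0, 1.0, 0.2, 0.0],
              [1.0, 1.0, 0.0, -0.2],
              [0.2, 0.0, 1.0, 1.0],
              [0.0, -0.2, 1.0, 1.0]])

# eigh applies because A is symmetric; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
print(eigvals[order].round(2))         # [ 2.02  2.02 -0.02 -0.02]
# A basis of the top eigenspace; the slide's v1, v2 are one such basis
# (the top eigenvalue is repeated, so signs/rotation may differ).
print(eigvecs[:, order[:2]].round(2))
```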

Page 22: Spectral Space

Can put items into blocks by eigenvectors: plot each item i at its coordinates (e1_i, e2_i) and the two blocks separate (a code sketch follows below).

$$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \;\Rightarrow\; e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}$$

Resulting clusters are independent of the row ordering; reordering the rows and columns just permutes the eigenvector entries in the same way:

$$\begin{pmatrix} 1 & .2 & 1 & 0 \\ .2 & 1 & 0 & 1 \\ 1 & 0 & 1 & -.2 \\ 0 & 1 & -.2 & 1 \end{pmatrix} \;\Rightarrow\; e_1 = \begin{pmatrix} .71 \\ .14 \\ .69 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ .69 \\ -.14 \\ .71 \end{pmatrix}$$

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
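Putting the last two slides together (a simplified sketch of the idea; the method of [Ng et al., NIPS 02] additionally normalizes the affinity matrix into a graph Laplacian, which is omitted here): embed each item at its eigenvector coordinates, then run ordinary k-means there, reusing the `kmeans` sketch from page 5.

```python
import numpy as np

def spectral_cluster(A, k):
    """Cluster items by k-means on the top-k eigenvectors of affinity matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)   # A is symmetric
    top = np.argsort(eigvals)[::-1][:k]    # indices of the largest eigenvalues
    coords = eigvecs[:, top]               # item i -> (e1_i, ..., ek_i)
    return kmeans(coords, k)               # the kmeans sketch from page 5
```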

Page 23: The Spectral Advantage

The key advantage of spectral clustering is the spectral space representation:

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University

Page 24: Measuring Affinity

Intensity: $\text{aff}(x, y) = \exp\!\left(-\frac{1}{2\sigma_i^2}\,\lVert I(x) - I(y)\rVert^2\right)$

Distance: $\text{aff}(x, y) = \exp\!\left(-\frac{1}{2\sigma_d^2}\,\lVert x - y\rVert^2\right)$

Texture: $\text{aff}(x, y) = \exp\!\left(-\frac{1}{2\sigma_t^2}\,\lVert c(x) - c(y)\rVert^2\right)$

* From Marc Pollefeys, COMP 256, 2003
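For instance, the distance-based affinity for a set of points can be computed like this (a small sketch; the names are mine, and `sigma_d` is the scale parameter discussed on the next slide):

```python
import numpy as np

def distance_affinity(X, sigma_d):
    """aff(x, y) = exp(-||x - y||^2 / (2 sigma_d^2)) for all pairs of rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2 * sigma_d ** 2))
```

The intensity and texture affinities have the same form, with the point difference replaced by a difference of intensities I(x) or texture descriptors c(x).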

Page 25: Scale affects affinity

* From Marc Pollefeys, COMP 256, 2003

Page 26:

* From Marc Pollefeys, COMP 256, 2003

Page 27: The Space of Digits (in 2D)

M. Brand, MERL

Page 28: Dimensionality Reduction with PCA

Page 29: Linear: Principal Components

Fit a multivariate Gaussian.

Compute the eigenvectors of the covariance matrix.

Project onto the eigenvectors with the largest eigenvalues.
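These three steps map directly onto a few lines of NumPy (a minimal sketch; the function name and `n_components` parameter are my own):

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    # Fit a multivariate Gaussian: mean and covariance of the data.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Compute the eigenvectors of the covariance (eigh, since cov is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Project onto the eigenvectors with the largest eigenvalues.
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return (X - mean) @ top
```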

Page 30: Other examples of unsupervised learning

Mean face (after alignment)

Slide credit: Santiago Serrano

Page 31: Eigenfaces

Slide credit: Santiago Serrano

Page 32: Non-Linear Techniques

Isomap, Locally Linear Embedding

Isomap: M. Balasubramanian and E. Schwartz, Science

Page 33: SCAPE (Dragomir Anguelov et al.)

Page 34: References

◦ Sebastian Thrun and Peter Norvig, Artificial Intelligence, Stanford University: http://www.stanford.edu/class/cs221/notes/cs221-lecture6-fall11.pdf