Page 1: (Semi-)Nonnegative Matrix Factorization and K-means Clustering

Chris Ding
Lawrence Berkeley National Laboratory

with Xiaofeng He (Lawrence Berkeley Nat'l Lab), Horst Simon (Lawrence Berkeley Nat'l Lab), Tao Li (Florida Int'l Univ.), Michael Jordan (UC Berkeley), Haesun Park (Georgia Tech)

Page 2

Nonnegative Matrix Factorization (NMF)

Data matrix (n points in p dimensions): X = (x_1, x_2, ..., x_n)

Each x_i is an image, document, webpage, etc.

Decomposition (low-rank approximation): X ≈ F G^T

Nonnegative matrices: X_ij ≥ 0, F_ij ≥ 0, G_ij ≥ 0

F = (f_1, f_2, ..., f_k),   G = (g_1, g_2, ..., g_k)

Page 3

Some historical notes

• Earlier work by statistics people (G. Golub)
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of whole (no cancellation)
  – A multiplicative update algorithm
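The multiplicative update algorithm mentioned above can be sketched in a few lines of numpy. This is a minimal sketch assuming random nonnegative data and the Frobenius-norm objective; the small `eps` guard is a numerical detail I have added, not part of the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 30))           # nonnegative data: 30 points in 20 dimensions
k = 4
F = rng.random((20, k)) + 0.1      # nonnegative initial factors
G = rng.random((30, k)) + 0.1
eps = 1e-10

err0 = np.linalg.norm(X - F @ G.T)
for _ in range(200):
    # Lee-Seung multiplicative updates for min ||X - F G^T||_F^2;
    # multiplying by nonnegative ratios keeps F and G nonnegative
    F *= (X @ G) / (F @ (G.T @ G) + eps)
    G *= (X.T @ F) / (G @ (F.T @ F) + eps)
err = np.linalg.norm(X - F @ G.T)  # reconstruction error is nonincreasing
```

Because the updates only rescale entries by nonnegative factors, no explicit projection onto the nonnegative orthant is needed.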

Page 4

Pixel vector

[Figure: a face image flattened into a single column vector of nonnegative pixel values]

Page 5

Lee and Seung (1999): Parts-based Perspective

X ≈ F G^T,   X = (x_1, x_2, ..., x_n),   F = (f_1, f_2, ..., f_k),   G = (g_1, g_2, ..., g_k)

[Figure: original face image and its parts-based NMF reconstruction]

Page 6

"Parts of Whole" Picture

X ≈ F G^T,   F = (f_1, f_2, ..., f_k)

Straightforward NMF does not give a parts-based picture.

Several groups explicitly sparsify F to get a parts-based picture (Li et al., 2001; Hoyer, 2003).

Donoho & Stodden (2003) study conditions for parts-of-whole.

Page 7

Meanwhile, a number of studies empirically show the usefulness of NMF for pattern discovery/clustering:

• Xu et al. (SIGIR'03)
• Brunet et al. (PNAS'04)
• Many others

We claim: NMF factors give holistic pictures of the data.

Page 8

Our Experiments: NMF gives holistic pictures

Page 9

Our Experiments: NMF gives holistic pictures

Page 10

Task: Prove NMF is doing "Data Clustering"

NMF => K-means Clustering

Page 11

NMF-Kmeans Theorem

min_{F ≥ 0, G ≥ 0, G^T G = I} ||X − F G^T||^2   ⇔   min_{G ≥ 0, G^T G = I} Tr(X^T X − G^T X^T X G)

G-orthogonal NMF is equivalent to relaxed K-means clustering.

Proof: (Ding, He, Simon, SDM 2005)

Page 12

K-means clustering

• Also called "isodata", "vector quantization"
• Developed in the 1960's (Lloyd, MacQueen, Hartigan, etc.)
• Computationally efficient (order mN)
• Most widely used in practice
  – Benchmark to evaluate other algorithms

Given n points in m dimensions: X = (x_1, x_2, ..., x_n)^T

K-means objective:

min J_K = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||x_i − c_k||^2
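The objective above is straightforward to state in code. A small sketch with synthetic data (`kmeans_objective` is an illustrative helper, not from the slides), showing that the true partition scores lower than an arbitrary one:

```python
import numpy as np

rng = np.random.default_rng(1)
# two well-separated 2-D clusters, 5 points each
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])

def kmeans_objective(X, labels, K):
    """J_K = sum_k sum_{i in C_k} ||x_i - c_k||^2, with c_k the mean of cluster k."""
    J = 0.0
    for k in range(K):
        pts = X[labels == k]
        J += ((pts - pts.mean(axis=0)) ** 2).sum()
    return J

J_good = kmeans_objective(X, np.array([0] * 5 + [1] * 5), 2)   # true partition
J_bad = kmeans_objective(X, np.arange(10) % 2, 2)              # alternating labels
```

K-means search amounts to minimizing this quantity over all assignments of points to K clusters.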

Page 13

Reformulate K-means Clustering

J_K = Σ_i ||x_i||^2 − Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} x_i^T x_j

Cluster membership indicators:  h_k = (0,...,0, 1,...,1, 0,...,0)^T / n_k^{1/2}

J_K = Σ_i ||x_i||^2 − Σ_{k=1}^{K} h_k^T X^T X h_k,   H = (h_1, ..., h_K)

Solving K-means  =>  max_{H ≥ 0, H^T H = I} Tr(H^T X^T X H)

(Zha, Ding, Gu, He, Simon, NIPS 2001) (Ding & He, ICML 2004)
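The identity above is easy to verify numerically. A sketch with random data (columns of X are the points) checking that J_K = Σ_i ||x_i||^2 − Tr(H^T X^T X H) for normalized indicators:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((3, 8))                     # columns are the 8 data points
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
K = 3

# normalized cluster indicators h_k = (0..0,1..1,0..0)^T / n_k^{1/2}
H = np.zeros((8, K))
for k in range(K):
    idx = labels == k
    H[idx, k] = 1.0 / np.sqrt(idx.sum())

# direct K-means objective: sum of within-cluster scatters
J_direct = sum(((X[:, labels == k].T - X[:, labels == k].mean(axis=1)) ** 2).sum()
               for k in range(K))
# trace form: J_K = sum_i ||x_i||^2 - Tr(H^T X^T X H)
J_trace = (X ** 2).sum() - np.trace(H.T @ X.T @ X @ H)
```

The normalization by n_k^{1/2} is exactly what makes H^T H = I, turning the discrete assignment into an orthonormality constraint that can then be relaxed.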

Page 14

Reformulate K-means Clustering

Cluster membership indicators (unnormalized), for 7 points in 3 clusters C1, C2, C3:

H = (h_1, h_2, h_3),  with rows of H^T:

  h_1 = ( 1 1 0 0 0 0 0 )   (C1)
  h_2 = ( 0 0 1 1 1 0 0 )   (C2)
  h_3 = ( 0 0 0 0 0 1 1 )   (C3)

Page 15

NMF-Kmeans Theorem

min_{F ≥ 0, G ≥ 0, G^T G = I} ||X − F G^T||^2   ⇔   min_{G ≥ 0, G^T G = I} Tr(X^T X − G^T X^T X G)

G-orthogonal NMF is equivalent to relaxed K-means clustering.

Proof: (Ding, He, Simon, SDM 2005)

Page 16

Kernel K-means Clustering

Map each feature vector to a higher-dimensional space:  x_i → φ(x_i)

Kernel K-means objective:

min J_K^φ = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||φ(x_i) − φ(c_k)||^2,   φ(c_k) ≡ (1/n_k) Σ_{i ∈ C_k} φ(x_i)

Kernel K-means optimization:

J_K^φ = Σ_i ||φ(x_i)||^2 − Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} φ(x_i)^T φ(x_j)

max J̃_K^φ = Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} ⟨φ(x_i), φ(x_j)⟩ = Tr(H^T W H)

where W is the matrix of pairwise similarities, W_ij = ⟨φ(x_i), φ(x_j)⟩.
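A quick numeric illustration of max Tr(H^T W H), using an RBF similarity matrix on synthetic 1-D data (`trace_score` is an illustrative helper, not from the slides): the true partition scores higher than an arbitrary one.

```python
import numpy as np

rng = np.random.default_rng(3)
# two tight clusters on a line; RBF kernel as the similarity matrix W
X = np.concatenate([rng.normal(0, 0.2, 10), rng.normal(4, 0.2, 10)])
W = np.exp(-(X[:, None] - X[None, :]) ** 2)   # W_ij = exp(-||x_i - x_j||^2)

def trace_score(labels, K=2):
    """Tr(H^T W H) for normalized indicators H built from labels."""
    H = np.zeros((len(labels), K))
    for k in range(K):
        idx = labels == k
        H[idx, k] = 1.0 / np.sqrt(idx.sum())
    return np.trace(H.T @ W @ H)

good = trace_score(np.array([0] * 10 + [1] * 10))   # true partition
bad = trace_score(np.arange(20) % 2)                # alternating labels
```

Only the kernel matrix W enters the score, which is what lets the same machinery cover both plain (W = X^T X) and kernel K-means.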

Page 17

Symmetric NMF

Symmetric NMF of a symmetric nonnegative matrix:  W ≈ H H^T

min_{H ≥ 0, H^T H = I} ||W − H H^T||^2   is equivalent to   max_{H ≥ 0, H^T H = I} Tr(H^T W H)

Orthogonal symmetric NMF is equivalent to Kernel K-means clustering.
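The equivalence rests on the identity ||W − H H^T||^2 = ||W||^2 − 2 Tr(H^T W H) + k whenever H^T H = I_k. A sketch verifying it numerically; the identity needs only orthonormality, so a QR-based H suffices here even though it is not nonnegative:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((6, 6))
W = A + A.T                        # symmetric similarity matrix
H, _ = np.linalg.qr(rng.random((6, 3)))   # any H with H^T H = I_3
k = 3

lhs = np.linalg.norm(W - H @ H.T) ** 2
# expand: ||W||^2 - 2 Tr(W H H^T) + Tr(H H^T H H^T); the last term is Tr(I_k) = k
rhs = np.linalg.norm(W) ** 2 - 2 * np.trace(H.T @ W @ H) + k
```

Since ||W||^2 and k are constants, minimizing the left side over orthonormal H is the same as maximizing Tr(H^T W H), which is the kernel K-means trace objective.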

Page 18

Orthogonality in NMF

X = (x_1, x_2, ..., x_n),   H = (h_1, h_2)

Strict orthogonal G: hard clustering

Non-orthogonal G: soft clustering (ambiguous/outlier points)

Page 19

K-means Clustering Theorem

min_{G ≥ 0, G^T G = I} ||X_± − F_± G_+^T||^2

G-orthogonal NMF is equivalent to relaxed K-means clustering. (Ding, Li, Jordan, 2006)

The proof requires only G-orthogonality and nonnegativity.

F = (f_1, f_2, ..., f_k)  =>  cluster centroids
G = (g_1, g_2, ..., g_k)  =>  cluster indicators

Page 20

NMF Generalizations

SVD:         X_± = F_± G_±^T = U Σ V^T
Semi-NMF:    X_± = F_± G_+^T
Tri-NMF:     X_± = F_+ S_± G_+^T
Convex-NMF:  X_± = X_± W_+ G_+^T
Kernel-NMF:  φ(X_±) = φ(X_±) W_+ G_+^T

(Subscript + means the factor is constrained nonnegative; ± means mixed sign.)

(Ding, Li, Jordan, 2006) (Ding, Li, Peng, Park, KDD 2006)

Page 21

Semi-NMF:  X_± = F_± G_+^T

• For any mixed-sign input data (e.g. centered data)
• Clustering and low-rank approximation

min ||X − F G^T||^2

Update F:  F = X G (G^T G)^{-1}

Update G:  G_ik ← G_ik √( [(X^T F)^+_ik + (G (F^T F)^-)_ik] / [(X^T F)^-_ik + (G (F^T F)^+)_ik] )

where A^+_ik = (|A_ik| + A_ik)/2 and A^-_ik = (|A_ik| − A_ik)/2.

(Ding, Li, Jordan, 2006)
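The two updates above can be sketched directly in numpy. A minimal sketch on random mixed-sign data; the `eps` guard and the small ridge in the inverse are numerical details I have added, not part of the published algorithm:

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2    # A^+ (elementwise positive part)
def neg(A): return (np.abs(A) - A) / 2    # A^- (elementwise negative part)

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 15))             # mixed-sign data, columns are points
k, eps = 3, 1e-10
F = rng.normal(size=(10, k))              # mixed-sign factor
G = rng.random((15, k)) + 0.1             # nonnegative factor

err0 = np.linalg.norm(X - F @ G.T)
for _ in range(100):
    # Update F: exact least-squares solution F = X G (G^T G)^{-1}
    F = X @ G @ np.linalg.inv(G.T @ G + eps * np.eye(k))
    # Update G: multiplicative rule that keeps G nonnegative
    XtF, FtF = X.T @ F, F.T @ F
    G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                 (neg(XtF) + G @ pos(FtF) + eps))
err = np.linalg.norm(X - F @ G.T)
```

Splitting matrices into positive and negative parts is what allows a multiplicative (hence sign-preserving) update even though X and F carry mixed signs.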

Page 22

Convex-NMF

In NMF: X_+ = F_+ G_+^T.   In Semi-NMF: X_± = F_± G_+^T.

In Semi-NMF, F = (f_1, f_2, ..., f_k) is in a large space.

For the factor f_k to capture the notion of a cluster centroid, require f_k to be a convex combination of the input data:

f_k = w_{1k} x_1 + ... + w_{nk} x_n,   i.e.  F = X W_+

Convex-NMF:  X_± = X_± W_+ G_+^T    (Ding, Li, Jordan, 2006)

For F interpretability:  F = X W_± (an affine combination).

Page 23

Convex-NMF:  X_± = X_± W_+ G_+^T

Computing algorithm:  min ||X − X W G^T||^2

Update W:  W_ik ← W_ik √( [((X^T X)^+ G)_ik + ((X^T X)^- W G^T G)_ik] / [((X^T X)^- G)_ik + ((X^T X)^+ W G^T G)_ik] )

Update G:  G_ik ← G_ik √( [((X^T X)^+ W)_ik + (G W^T (X^T X)^- W)_ik] / [((X^T X)^- W)_ik + (G W^T (X^T X)^+ W)_ik] )

(Ding, Li, Jordan, 2006)
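A numpy sketch of these updates on random mixed-sign data; (X^T X)^± denote the elementwise positive/negative parts, and the `eps` guard is a numerical detail I have added:

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2
def neg(A): return (np.abs(A) - A) / 2

rng = np.random.default_rng(6)
X = rng.normal(size=(8, 12))              # mixed-sign data, columns are points
k, eps = 3, 1e-10
W = rng.random((12, k)) + 0.1             # nonnegative combination weights
G = rng.random((12, k)) + 0.1             # nonnegative soft indicators
Y = X.T @ X                               # updates touch X only through X^T X
Yp, Yn = pos(Y), neg(Y)

err0 = np.linalg.norm(X - X @ W @ G.T)
for _ in range(300):
    GtG = G.T @ G
    W *= np.sqrt((Yp @ G + Yn @ W @ GtG) /
                 (Yn @ G + Yp @ W @ GtG + eps))
    WtYpW, WtYnW = W.T @ Yp @ W, W.T @ Yn @ W
    G *= np.sqrt((Yp @ W + G @ WtYnW) /
                 (Yn @ W + G @ WtYpW + eps))
err = np.linalg.norm(X - X @ W @ G.T)
```

Note that the loop never touches X directly, only Y = X^T X; this is exactly the property the Kernel-NMF slides exploit later.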

Page 24

[Figure: Semi-NMF factors vs. Convex-NMF factors]

Page 25

[Figure: Semi-NMF factors vs. Convex-NMF factors]

Page 26

Page 27

Sparsity of Convex-NMF

• Sparse factorization is a recent trend.
• Sparsity is usually explicitly enforced.
• Convex-NMF factors are naturally sparse.

||X − X W G^T||_F^2 = ||X (I − W G^T)||_F^2 = Σ_k σ_k^2 ||v_k^T (I − W G^T)||^2

Consider  Σ_k ||e_k^T (I − W G^T)||^2 = ||I − W G^T||_F^2.  Its solution is

G = W = (e_1, ..., e_k)

i.e. columns of the identity matrix, each with a single 1. From this we infer that Convex-NMF factors are naturally sparse.

Page 28

A Simple Example

[Figure: data points x in two groups, cluster 1 and cluster 2]

||F_semi − C_Kmeans|| = 0.53,   ||F_convex − C_Kmeans|| = 0.08

||X − F G^T||:  SVD 0.27940,  Semi 0.27944,  Convex 0.30877

Page 29

Experiments on 7 datasets

NMF variants always perform better than K-means

Page 30

Kernel NMF -- Generalized Convex NMF

Map each feature vector to a higher-dimensional space:  x_i → φ(x_i),   φ(X) = [φ(x_1), φ(x_2), ..., φ(x_n)]

NMF/semi-NMF:  φ(X) = F G^T  depends on the explicit mapping function φ(·)

Kernel NMF:  φ(X) = [φ(X) W] G^T

Its minimization objective depends on the kernel only:

||φ(X) − φ(X) W G^T||^2 = Tr( (I − G W^T) φ(X)^T φ(X) (I − W G^T) )

(Ding & He, ICML 2004)

Page 31

Kernel K-means Clustering

Map each feature vector to a higher-dimensional space:  x_i → φ(x_i)

Kernel K-means objective:

min J_K^φ = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||φ(x_i) − φ(c_k)||^2,   φ(c_k) ≡ (1/n_k) Σ_{i ∈ C_k} φ(x_i)

Kernel K-means optimization:

J_K^φ = Σ_i ||φ(x_i)||^2 − Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} φ(x_i)^T φ(x_j)

max J̃_K^φ = Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} ⟨φ(x_i), φ(x_j)⟩ = Tr(H^T W H)

where W is the matrix of pairwise similarities.

Page 32

NMF and PLSI: Equivalence

So far we have only used the Frobenius norm as the NMF objective function. Another objective is the KL divergence.

Page 33

Kernel-NMF Algorithm

The computing algorithm depends only on the kernel  K = φ(X)^T φ(X):

Update W:  W_ik ← W_ik √( [(K^+ G)_ik + (K^- W G^T G)_ik] / [(K^- G)_ik + (K^+ W G^T G)_ik] )

Update G:  G_ik ← G_ik √( [(K^+ W)_ik + (G W^T K^- W)_ik] / [(K^- W)_ik + (G W^T K^+ W)_ik] )

(Ding, Li, Jordan, 2006)

Page 34

Orthogonal Nonnegative Tri-Factorization

min_{F ≥ 0, F^T F = I, G ≥ 0, G^T G = I} ||X − F S G^T||^2

3-factor NMF with explicit orthogonality constraints: simultaneous K-means clustering of rows and columns.

F = (f_1, f_2, ..., f_k)  =>  row cluster indicators
G = (g_1, g_2, ..., g_k)  =>  column cluster indicators

1. The solution is unique.
2. It cannot be reduced to 2-factor NMF.

(Ding, Li, Peng, Park, KDD 2006)

Page 35

NMF-like algorithms are different ways to relax F, G!

K-means clustering objective function:

J_K = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||x_i − f_k||^2 = Σ_{k=1}^{K} Σ_{i=1}^{n} g_ik ||x_i − f_k||^2 = ||X − F G^T||^2

X = (x_1, x_2, ..., x_n) = input data
F = (f_1, f_2, ..., f_k) = cluster centroids
G = (g_1, g_2, ..., g_k) = cluster indicators

f_k = X g_k / n_k,   F = X G D_n^{-1},   D_n = diag(n_1, ..., n_k)

J_K = ||X − X G D_n^{-1} G^T||^2 = ||X − X G̃ G̃^T||^2,   G̃ = G D_n^{-1/2},   G̃^T G̃ = I
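The chain of equalities above can be checked numerically. A sketch with a 0/1 indicator matrix G and centroid matrix F = X G D_n^{-1}:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.random((4, 9))                    # columns are the 9 data points
labels = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
K = 3

G = np.zeros((9, K))
G[np.arange(9), labels] = 1.0             # unnormalized 0/1 cluster indicators
n_k = G.sum(axis=0)                       # cluster sizes: diagonal of D_n
F = (X @ G) / n_k                         # f_k = X g_k / n_k (cluster centroids)

# J_K as a sum of within-cluster scatters ...
J_direct = sum(((X[:, labels == k].T - F[:, k]) ** 2).sum() for k in range(K))
# ... equals the matrix form ||X - F G^T||^2
J_matrix = np.linalg.norm(X - F @ G.T) ** 2
```

Since column i of F G^T is exactly the centroid of the cluster containing x_i, the two expressions agree term by term.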

Page 36

NMF ⇔ PLSI

NMF objective functions:
• Frobenius norm
• KL-divergence:

J_NMF-KL = Σ_{i=1}^{m} Σ_{j=1}^{n} ( x_ij log( x_ij / (F G^T)_ij ) − x_ij + (F G^T)_ij )

Probabilistic LSI (Hofmann, 1999) is a latent variable model for clustering:

J_PLSI = Σ_{i=1}^{m} Σ_{j=1}^{n} x_ij log p(w_i, d_j),   p(w_i, d_j) = Σ_k p(w_i | z_k) p(z_k) p(d_j | z_k)

We can show:  J_PLSI = −J_NMF-KL + constant    (Ding, Li, Peng, AAAI 2006)

Page 37

Summary

• NMF is doing K-means clustering (or PLSI).
• Interpretability is key to motivating new NMF-like factorizations:
  – Semi-NMF, Convex-NMF, Kernel-NMF, Tri-NMF
• NMF-like algorithms always outperform K-means clustering.
• Advantage: hard/soft clustering.
• Convex-NMF enforces the notion of cluster centroids and is naturally sparse.

NMF: a new, rich paradigm for unsupervised learning

Page 38

References

• Chris Ding, Xiaofeng He, Horst Simon. On the Equivalence of Nonnegative Matrix Factorization and K-means/Spectral Clustering. SDM 2005.
• Chris Ding, Tao Li, Michael Jordan. Convex and Semi-Nonnegative Matrix Factorization. Submitted.
• Chris Ding, Tao Li, Wei Peng, Haesun Park. Orthogonal Non-negative Matrix Tri-Factorization for Clustering. KDD 2006.
• Chris Ding, Tao Li, Wei Peng. Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square and a Hybrid Algorithm. AAAI 2006.

Page 39

Data Clustering: NMF and PCA

min_{G ≥ 0, G^T G = I} ||X_± − F_± G_+^T||^2

G-orthogonality and nonnegativity:

F = (f_1, f_2, ..., f_k)  =>  cluster centroids
G = (g_1, g_2, ..., g_k)  =>  cluster indicators

NMF is useful due to nonnegativity. What happens if we ignore nonnegativity?

Page 40

K-means clustering ⇔ PCA

Ignoring nonnegativity allows an orthogonal transform R:

min ||X − (F R)(G R)^T||^2,   (G R)^T (G R) = I

Equivalent to:  max_{GR} Tr( (G R)^T X^T X (G R) )

The solution is given by the SVD:  X = U Σ V^T,   F R = U,   G R = V

Centroid subspace projection:  F F^T = (F R)(F R)^T = U U^T
Cluster indicator projection:  G G^T = (G R)(G R)^T = V V^T

PCA/SVD is automatically doing K-means clustering.

(Ding & He, ICML 2004)
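The relaxation can be illustrated numerically: over all orthonormal H, Tr(H^T X^T X H) is maximized by the top-K right singular vectors (a Ky Fan result), so the SVD value upper-bounds the score of any discrete partition. A sketch with synthetic centered data (columns are points):

```python
import numpy as np

rng = np.random.default_rng(8)
# two separated 3-D clusters of 6 points each; center the columns
X = np.hstack([rng.normal(-2, 0.3, (3, 6)), rng.normal(2, 0.3, (3, 6))])
X = X - X.mean(axis=1, keepdims=True)
K = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:K].T                                  # top-K right singular vectors
relaxed = np.trace(V.T @ X.T @ X @ V)         # = sigma_1^2 + ... + sigma_K^2

labels = np.array([0] * 6 + [1] * 6)
H = np.zeros((12, K))                         # normalized cluster indicators
for k in range(K):
    idx = labels == k
    H[idx, k] = 1.0 / np.sqrt(idx.sum())
discrete = np.trace(H.T @ X.T @ X @ H)        # never exceeds the relaxed value
```

In practice the discrete clustering is then recovered from the continuous singular-vector solution, e.g. by rounding or by running K-means in the projected subspace.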

Page 41

A Simple Example

[Figure: data points x in two groups, cluster 1 and cluster 2]

||F_semi − C_Kmeans|| = 0.53,   ||F_convex − C_Kmeans|| = 0.08

||X − F G^T||:  SVD 0.27940,  Semi 0.27944,  Convex 0.30877

Page 42

NMF = Spectral Clustering (Normalized Cut)

Cluster indicators:  h_k = (0,...,0, 1,...,1, 0,...,0)^T,   y_k = D^{1/2} h_k / ||D^{1/2} h_k||

Normalized Cut =>

J_Ncut(h_1, ..., h_k) = [h_1^T (D − W) h_1] / (h_1^T D h_1) + ... + [h_k^T (D − W) h_k] / (h_k^T D h_k)

Re-write with  W̃ = D^{-1/2} W D^{-1/2}:

J_Ncut(y_1, ..., y_k) = y_1^T (I − W̃) y_1 + ... + y_k^T (I − W̃) y_k = Tr( Y^T (I − W̃) Y )

Optimize:  max Tr(Y^T W̃ Y)  subject to  Y^T Y = I

=>  min_{H ≥ 0, H^T H = I} ||W̃ − H H^T||^2

(Gu et al., 2001)