Non-Negative Matrix Factorization
Hamdi Jenzri
Outline
- Introduction
- Non-Negative Matrix Factorization (NMF)
- Cost functions
- Algorithms
  - Multiplicative update algorithm
  - Gradient descent algorithm
  - Alternating least squares algorithm
- NMF vs. SVD
- Initialization Issue
- Experiments
  - Image Dataset
  - Landmine Dataset
- Conclusion & Potential Future Work
Introduction
- In many data-processing tasks, negative numbers are physically meaningless:
  - Pixel values in an image
  - Vector representation of words in a text document…
- Classical tools cannot guarantee to maintain non-negativity:
  - Principal Component Analysis
  - Singular Value Decomposition
  - Vector Quantization…
- Hence: Non-negative Matrix Factorization
Non-Negative Matrix Factorization
Given a non-negative matrix V, find non-negative matrix factors W and H such that:
    V ≈ W H
- V is an n×m matrix whose columns are n-dimensional data vectors, where m is the number of vectors in the data set.
- W is an n×r non-negative matrix.
- H is an r×m non-negative matrix.
- Usually, r is chosen to be smaller than n or m, so that W and H are smaller than the original matrix V.
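For illustration, the sizes involved look as follows in MATLAB/Octave (the dimensions and data here are arbitrary example values, not taken from the experiments):

    n = 100;  m = 50;  r = 10;        % data dimension, number of data vectors, chosen rank
    V = rand(n, m);                   % non-negative data matrix, one data vector per column
    W = rand(n, r);                   % non-negative basis matrix
    H = rand(r, m);                   % non-negative coefficient matrix
    err = norm(V - W*H, 'fro');       % approximation quality for this (purely random) pair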
Non-Negative Matrix Factorization
Significance of this approximation:
- It can be rewritten column by column as v ≈ W h, where v and h are the corresponding columns of V and H.
- Each data vector v is approximated by a linear combination of the columns of W, weighted by the components of h.
- Therefore, W can be regarded as containing a basis that is optimized for the linear approximation of the data in V.
- Since relatively few basis vectors are used to represent many data vectors, a good approximation can only be achieved if the basis vectors discover structure that is latent in the data.
Cost functions
To find an approximate factorization V ≈ W H, we first need to define cost functions that quantify the quality of the approximation. Such cost functions can be constructed using some measure of distance between two non-negative matrices A and B:
- Square of the Euclidean distance between A and B:
    ||A − B||² = ∑_ij (A_ij − B_ij)²
- Divergence of A from B:
    D(A||B) = ∑_ij (A_ij log(A_ij / B_ij) − A_ij + B_ij)
  It reduces to the Kullback–Leibler divergence, or relative entropy, when ∑_ij A_ij = ∑_ij B_ij = 1.
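In MATLAB/Octave notation the two measures read as follows (A and B are just small example matrices; rand keeps them strictly positive, so the log is well defined):

    A = rand(4, 5);  B = rand(4, 5);              % example non-negative matrices
    eucl = sum(sum((A - B).^2));                  % squared Euclidean distance ||A - B||^2
    div  = sum(sum(A .* log(A ./ B) - A + B));    % divergence D(A||B)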
Cost functions
The formulation of the NMF problem as an optimization problem can be stated as:
- Minimize f(W, H) = ||V − WH||² with respect to W and H, subject to the constraints W, H ≥ 0
- Minimize f(W, H) = D(V || WH) with respect to W and H, subject to the constraints W, H ≥ 0
These functions are convex in W alone or in H alone, but they are not convex in both variables together.
Multiplicative update algorithm
- Proposed by Lee and Seung (2001)
- Converges to a stationary point that may or may not be a local minimum
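For the squared Euclidean cost, each entry of W and H is multiplied by a ratio of non-negative terms, so non-negativity is preserved automatically. A minimal MATLAB/Octave sketch (the eps added to the denominators is a common safeguard against division by zero, not part of the original formulation; sizes are example values):

    V = rand(100, 50);  r = 10;  maxiter = 500;      % example data, rank, iteration count
    W = rand(size(V, 1), r);                         % random non-negative initialization
    H = rand(r, size(V, 2));
    for k = 1:maxiter
        H = H .* (W' * V) ./ (W' * W * H + eps);     % multiplicative update for H
        W = W .* (V * H') ./ (W * H * H' + eps);     % multiplicative update for W
    end
    err = norm(V - W*H, 'fro');                      % monitor the Euclidean cost

Lee and Seung show that the Euclidean cost is non-increasing under these update rules.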
Gradient descent algorithm
- W and H are updated by stepping along the negative gradient of the cost, each with its own step-size parameter.
- A projection step is commonly used after each update to set negative elements to zero (Chu et al., 2004; Lee and Seung, 2001).

    W = rand(n, r);    % initialize W
    H = rand(r, m);    % initialize H
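A minimal projected-gradient sketch for the squared Euclidean cost; the fixed step sizes below are illustrative placeholders (in practice they are tuned or chosen by a line search, e.g. Lin, 2007):

    V = rand(100, 50);  r = 10;  maxiter = 500;            % example data and settings
    W = rand(size(V, 1), r);  H = rand(r, size(V, 2));     % random initialization
    stepW = 1e-3;  stepH = 1e-3;                           % illustrative step sizes
    for k = 1:maxiter
        H = max(0, H - stepH * (W' * (W*H - V)));          % gradient step in H, project onto H >= 0
        W = max(0, W - stepW * ((W*H - V) * H'));          % gradient step in W, project onto W >= 0
    end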
Alternating least squares algorithm
- It aids sparsity
- More flexible: able to escape a poor path
- Paatero and Tapper, 1994

A basic ALS iteration (following the description in Berry et al., 2006):

    W = rand(n, r);                          % initialize W
    for k = 1:maxiter
        H = max(0, (W'*W) \ (W'*V));         % least-squares solve for H, then project onto H >= 0
        W = max(0, ((H*H') \ (H*V'))');      % least-squares solve for W, then project onto W >= 0
    end
Convergence
- There is no guarantee of convergence to a local minimum.
- No uniqueness: if (W, H) is a minimum, then (WD, D⁻¹H) is too, for any invertible matrix D for which WD and D⁻¹H remain non-negative (e.g., a positive diagonal scaling).
- Still, NMF is quite appealing for data mining applications since, in practice, even local minima can provide desirable properties such as data compression and feature extraction.
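A quick illustration of the scaling ambiguity (D here is an arbitrary positive diagonal matrix; any such rescaling leaves the product, and hence the cost, unchanged):

    W = rand(6, 3);  H = rand(3, 8);         % any non-negative factor pair
    D = diag([2, 0.5, 3]);                   % positive diagonal rescaling
    W2 = W * D;  H2 = D \ H;                 % rescaled pair, still non-negative
    gap = norm(W*H - W2*H2, 'fro');          % zero up to round-off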
NMF vs. SVD
Property                       | NMF                                       | SVD
Formulation                    | A ≈ WH                                    | A = UΣVᵀ
Optimality (squared distance)  | local minimum only                        | global optimum for any rank (Eckart–Young)
Speed & robustness             | iterative; depends on algorithm and start | fast, reliable, well-established algorithms
Uniqueness                     | not unique                                | essentially unique
Sensitivity to initialization  | sensitive                                 | none
Orthogonality                  | factors not orthogonal                    | orthogonal singular vectors
Sparsity                       | factors tend to be sparse                 | factors generally dense
Non-negativity                 | yes, by construction                      | not preserved
Interpretability               | parts-based, easy to interpret            | mixed signs, harder to interpret
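One way to see the optimality difference concretely in MATLAB/Octave (svds gives the best rank-r approximation in the Frobenius sense, while the multiplicative updates sketched earlier can only reach a local minimum; data and sizes are example values):

    V = rand(100, 50);  r = 10;  maxiter = 500;
    [U, S, Q] = svds(V, r);                            % rank-r truncated SVD
    err_svd = norm(V - U*S*Q', 'fro');                 % globally optimal rank-r error (Eckart-Young)
    W = rand(size(V, 1), r);  H = rand(r, size(V, 2));
    for k = 1:maxiter                                  % multiplicative updates, as sketched earlier
        H = H .* (W'*V) ./ (W'*W*H + eps);
        W = W .* (V*H') ./ (W*H*H' + eps);
    end
    err_nmf = norm(V - W*H, 'fro');                    % cannot fall below err_svd, since W*H also has rank <= r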
Initialization Issue
- NMF algorithms are iterative and require an initialization of W and/or H.
- A good initialization can improve:
  - Speed
  - Accuracy
  - Convergence
- Some initializations (the first and fourth are sketched below):
  - Random initialization
  - Centroid initialization (clustering)
  - SVD-centroid initialization
  - Random Vcol
  - Random C initialization (densest columns)
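A sketch of the random and Random Vcol schemes, assuming Random Vcol forms each column of W as the average of q randomly chosen columns of V (as described in Langville et al.); q is an illustrative choice:

    V = rand(100, 50);  [n, m] = size(V);  r = 10;   % example data and rank
    W0 = rand(n, r);  H0 = rand(r, m);               % plain random initialization
    q = 5;                                           % number of columns of V to average (illustrative)
    Wvcol = zeros(n, r);
    for j = 1:r
        Wvcol(:, j) = mean(V(:, randi(m, q, 1)), 2); % Random Vcol column
    end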
Image Dataset
[Figure: a sample factorization of the image data, with the corresponding coefficient values from H; ||V – WH||F = 156.7879]
Different initialization
[Figure: the same factorization with a different initialization; coefficient values from H; ||V – WH||F = 25.6828]
[Figure: a further run; coefficient values from H; ||V – WH||F = 101.8359]
Landmine Dataset
Data set used: BAE-LMED
Results: varying r for the multiplicative update algorithm, random initialization
Results: varying the initialization for the multiplicative update algorithm, r = 9
Results: comparing algorithms for the best found r = 9, random initialization
Results: comparing the best combination to the Basic EHD performance
Columns of H
[Figure: columns of H, grouped into FA and Mines]
Different Datasets
[Figures: corresponding results on additional datasets]
Conclusion & Potential Future work NMF presents a way to represent the data in a different
basis Although its convergence and initialization issues, it is
quite appealing in many data mining tasks Other formulations do exist for the NMF problem
Constrained NMF Incremental NMF Bayesian NMF
Future work will include Trying other Landmine Datasets Bayesian NMF
33
References
- Michael W. Berry et al., "Algorithms and Applications for Approximate Nonnegative Matrix Factorization", June 2006.
- Daniel D. Lee and H. Sebastian Seung, "Algorithms for Non-negative Matrix Factorization", Advances in Neural Information Processing Systems, 2001.
- Chih-Jen Lin, "Projected Gradient Methods for Non-negative Matrix Factorization", Neural Computation, 2007.
- Amy N. Langville et al., "Initializations for Nonnegative Matrix Factorization", KDD 2006.