Nonnegative Matrix Factorization: Algorithms and Applications

Haesun Park
[email protected]
School of Computational Science and Engineering
Georgia Institute of Technology
Atlanta, GA, USA

SIAM International Conference on Data Mining, April 2011
This work was supported in part by the National Science Foundation.
Co-authors
Jingu Kim CSE, Georgia Tech
Yunlong He Math, Georgia Tech
Da Kuang CSE, Georgia Tech
Outline

- Overview of NMF
- Fast algorithms with Frobenius norm
  - Theoretical results on convergence
  - Multiplicative updating
  - Alternating nonnegativity-constrained least squares: active-set-type methods, ...
  - Hierarchical alternating least squares
- Variations/extensions of NMF: sparse NMF, regularized NMF, nonnegative PARAFAC
- Efficient adaptive NMF algorithms
- Applications of NMF, NMF for clustering
- Extensive computational results
- Discussions
Nonnegative Matrix Factorization (NMF)

Given A ∈ R_+^{m×n} and an integer k, find W ∈ R_+^{m×k} and H ∈ R_+^{k×n} s.t. A ≈ WH:

  min_{W≥0, H≥0} ‖A − WH‖_F

NMF improves the approximation as k increases: if rank_+(A) > k,

  min_{W_{k+1}≥0, H_{k+1}≥0} ‖A − W_{k+1} H_{k+1}‖_F < min_{W_k≥0, H_k≥0} ‖A − W_k H_k‖_F,

where W_i ∈ R_+^{m×i} and H_i ∈ R_+^{i×n}.

But the SVD does better: if A = UΣV^T, then

  ‖A − U_k Σ_k V_k^T‖_F ≤ min ‖A − WH‖_F,  W ∈ R_+^{m×k}, H ∈ R_+^{k×n}.

So why NMF? Dimension reduction with better interpretation / lower-dimensional representation for nonnegative data.
Nonnegative Rank of A ∈ R_+^{m×n} (J. Cohen and U. Rothblum, LAA, 93)

rank_+(A) is the smallest integer k for which there exist V ∈ R_+^{m×k} and U ∈ R_+^{k×n} such that A = VU.

- Note: rank(A) ≤ rank_+(A) ≤ min(m, n)
- If rank(A) ≤ 2, then rank_+(A) = rank(A).
- If either m ∈ {1,2,3} or n ∈ {1,2,3}, then rank_+(A) = rank(A).
- (Perron-Frobenius) There are nonnegative left and right singular vectors u1 and v1 of A associated with the largest singular value σ1.
- Rank-1 SVD of A = best rank-one NMF of A.
Applications of NMF

- Text mining
  - Topic model: NMF as an alternative to PLSI (Gaussier et al., 05; Ding et al., 08)
  - Document clustering (Xu et al., 03; Shahnaz et al., 06)
  - Topic detection and trend tracking, email analysis (Berry et al., 05; Keila et al., 05; Cao et al., 07)
- Image analysis and computer vision
  - Feature representation, sparse coding (Lee et al., 99; Guillamet et al., 01; Hoyer et al., 02; Li et al., 01)
  - Video tracking (Bucak et al., 07)
- Social networks
  - Community structure and trend detection (Chi et al., 07; Wang et al., 08)
  - Recommender systems (Zhang et al., 06)
- Bioinformatics: microarray data analysis (Brunet et al., 04; H. Kim and Park, 07)
- Acoustic signal processing, blind source separation (Cichocki et al., 04)
- Financial data (Drakakis et al., 08)
- Chemometrics (Andersson and Bro, 00)
- and many more...
Algorithms for NMF

- Multiplicative update rules: Lee and Seung, 99
- Alternating least squares (ALS): Berry et al., 06
- Alternating nonnegative least squares (ANLS)
  - Lin, 07: projected gradient descent
  - D. Kim et al., 07: quasi-Newton
  - H. Kim and Park, 08: active set
  - J. Kim and Park, 08: block principal pivoting
- Other algorithms and variants
  - Cichocki et al., 07: hierarchical ALS (HALS)
  - Ho, 08: rank-one residue iteration (RRI)
  - Zdunek, Cichocki, Amari, 06: quasi-Newton
  - Chu and Lin, 07: low-dimensional polytope approximation
  - Other rank-1 downdating based algorithms (Vavasis, ...)
  - C. Ding, T. Li: tri-factor NMF, orthogonal NMF, ...
  - Cichocki, Zdunek, Phan, Amari: NMF and NTF: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley, 09
  - Andersson and Bro: nonnegative tensor factorization, 00
  - and many more...
Block Coordinate Descent (BCD) Method

A constrained nonlinear problem:

  min f(x)   (e.g., f(W, H) = ‖A − WH‖_F)
  subject to x ∈ X = X_1 × X_2 × · · · × X_p,

where x = (x_1, x_2, ..., x_p), x_i ∈ X_i ⊂ R^{n_i}, i = 1, ..., p.

The block coordinate descent method generates x^{(k+1)} = (x_1^{(k+1)}, ..., x_p^{(k+1)}) by

  x_i^{(k+1)} = arg min_{ξ ∈ X_i} f(x_1^{(k+1)}, ..., x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, ..., x_p^{(k)}).

Theorem (Bertsekas, 99): Suppose f is continuously differentiable over the Cartesian product of closed, convex sets X_1, X_2, ..., X_p, and suppose for each i and x ∈ X the minimum

  min_{ξ ∈ X_i} f(x_1^{(k+1)}, ..., x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, ..., x_p^{(k)})

is uniquely attained. Then every limit point of the sequence {x^{(k)}} generated by the BCD method is a stationary point.

NOTE: Uniqueness is not required when p = 2 (Grippo and Sciandrone, 00).
BCD with k(m + n) Scalar Blocks

[Figure: one entry of W or H is updated at a time while A, and the rest of W and H, stay fixed.]

Minimize functions of w_ij or h_ij while all other components in W and H are fixed:

  w_ij ← arg min_{w_ij ≥ 0} ‖(r_i^T − ∑_{k≠j} w_ik h_k^T) − w_ij h_j^T‖_2
  h_ij ← arg min_{h_ij ≥ 0} ‖(a_j − ∑_{k≠i} w_k h_kj) − w_i h_ij‖_2

where W = (w_1 · · · w_k), H = (h_1^T; ...; h_k^T), and A = (a_1 · · · a_n) = (r_1^T; ...; r_m^T).

Scalar quadratic function, closed-form solution.
BCD with k(m + n) Scalar Blocks (cont.)

Lee and Seung (01)'s multiplicative updating (MU) rule:

  w_ij ← w_ij (AH^T)_ij / (WHH^T)_ij,   h_ij ← h_ij (W^T A)_ij / (W^T WH)_ij

Derivation based on gradient-descent form:

  w_ij ← w_ij + w_ij / (WHH^T)_ij [(AH^T)_ij − (WHH^T)_ij]
  h_ij ← h_ij + h_ij / (W^T WH)_ij [(W^T A)_ij − (W^T WH)_ij]

Rewriting of the coordinate-descent solution:

  w_ij ← [w_ij + 1/(HH^T)_jj ((AH^T)_ij − (WHH^T)_ij)]_+
  h_ij ← [h_ij + 1/(W^T W)_ii ((W^T A)_ij − (W^T WH)_ij)]_+

In MU, conservative steps are taken to ensure nonnegativity. Bertsekas' theorem on convergence is not applicable to MU.
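As a quick sanity check of the MU rule above, here is a minimal NumPy sketch on random data. The small `eps` guard against division by zero is an implementation detail of this sketch, not part of the formulas on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((30, 20))            # nonnegative data matrix
k = 5
W = rng.random((30, k)) + 0.1       # strictly positive initial factors
H = rng.random((k, 20)) + 0.1
eps = 1e-12                         # guard against division by zero

residuals = [np.linalg.norm(A - W @ H, 'fro')]
for _ in range(50):
    H *= (W.T @ A) / (W.T @ W @ H + eps)   # h_ij <- h_ij (W^T A)_ij / (W^T W H)_ij
    W *= (A @ H.T) / (W @ H @ H.T + eps)   # w_ij <- w_ij (A H^T)_ij / (W H H^T)_ij
    residuals.append(np.linalg.norm(A - W @ H, 'fro'))
# The objective is nonincreasing under MU (Lee and Seung, 01).
```

The elementwise multiply-and-divide form makes the conservative step sizes explicit: no projection is ever needed because every factor stays nonnegative.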
BCD with 2k Vector Blocks

[Figure: one column of W or one row of H is updated at a time.]

Minimize functions of w_i or h_i while all other components in W and H are fixed:

  ‖A − ∑_{j=1}^k w_j h_j^T‖_F = ‖(A − ∑_{j≠i} w_j h_j^T) − w_i h_i^T‖_F = ‖R^{(i)} − w_i h_i^T‖_F

  w_i ← arg min_{w_i ≥ 0} ‖R^{(i)} − w_i h_i^T‖_F
  h_i ← arg min_{h_i ≥ 0} ‖R^{(i)} − w_i h_i^T‖_F

Each subproblem has the form min_{x≥0} ‖c x^T − G‖_F and has a closed-form solution x = [G^T c / (c^T c)]_+ !

Hierarchical Alternating Least Squares (HALS) (Cichocki et al., 07, 09), (actually HA-NLS)
Rank-one Residue Iteration (RRI) (Ho, 08)
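The closed-form rank-one update above translates directly into code. The following is a minimal sketch of a HALS-style sweep, assuming the straightforward (unoptimized) recomputation of the residual R^{(i)} for each factor; the `eps` guard is an implementation detail:

```python
import numpy as np

def hals(A, k, n_iter=100, seed=0):
    """HALS sketch: cycle over rank-one factors (w_i, h_i) with closed-form updates."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-12
    for _ in range(n_iter):
        for i in range(k):
            # Residual with the i-th rank-one term added back: R^(i)
            R = A - W @ H + np.outer(W[:, i], H[i])
            # Closed-form nonnegative solutions of min ||R - w h^T||_F
            H[i] = np.maximum(R.T @ W[:, i] / (W[:, i] @ W[:, i] + eps), 0)
            W[:, i] = np.maximum(R @ H[i] / (H[i] @ H[i] + eps), 0)
    return W, H
```

An efficient implementation would update R incrementally instead of recomputing A − WH inside the inner loop; the sketch favors clarity over speed.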
BCD with Scalar Blocks vs. 2k Vector Blocks

[Figure: scalar-block updates side by side with vector-block updates on W, H, A.]

In scalar BCD, w_1j, w_2j, ..., w_mj can be computed independently. Also, h_i1, h_i2, ..., h_in can be computed independently.
→ scalar BCD ⇔ 2k vector BCD in NMF
Successive Rank-1 Deflation in SVD and NMF

Successive rank-1 deflation works for SVD but not for NMF:

  A − σ1 u1 v1^T ≈ σ2 u2 v2^T ?   A − w1 h1^T ≈ w2 h2^T ?

Example:

  [4 6 0; 6 4 0; 0 0 1] = [1/√2 −1/√2 0; 1/√2 1/√2 0; 0 0 1] · diag(10, 2, 1) · [1/√2 1/√2 0; 1/√2 −1/√2 0; 0 0 1]

The sum of two successive best rank-1 nonnegative approximations is

  [4 6 0; 6 4 0; 0 0 1] ≈ [5 5 0; 5 5 0; 0 0 0] + [0 0 0; 0 0 0; 0 0 1]

The best rank-2 nonnegative approximation is

  WH = [4 6 0; 6 4 0; 0 0 0] = [4 6; 6 4; 0 0] · [1 0 0; 0 1 0]

NOTE: 2k vector BCD ≠ successive rank-1 deflation for NMF
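The gap between the two approximations in this example is easy to verify numerically. A short sketch comparing the residual of the deflation-based sum against the residual of the rank-2 nonnegative approximation from the slide:

```python
import numpy as np

A = np.array([[4., 6., 0.], [6., 4., 0.], [0., 0., 1.]])

# Sum of the two successive best rank-1 nonnegative approximations
deflation = np.array([[5., 5., 0.], [5., 5., 0.], [0., 0., 0.]]) \
          + np.array([[0., 0., 0.], [0., 0., 0.], [0., 0., 1.]])

# Best rank-2 nonnegative approximation WH from the slide
W = np.array([[4., 6.], [6., 4.], [0., 0.]])
H = np.array([[1., 0., 0.], [0., 1., 0.]])

r_deflation = np.linalg.norm(A - deflation, 'fro')   # residual 2
r_rank2 = np.linalg.norm(A - W @ H, 'fro')           # residual 1
```

The rank-2 factorization attains Frobenius residual 1, while stacking the two rank-1 pieces leaves residual 2: deflation is strictly suboptimal here.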
BCD with 2 Matrix Blocks

[Figure: all of W or all of H is updated at a time.]

Minimize functions of W or H while the other is fixed:

  W ← arg min_{W≥0} ‖H^T W^T − A^T‖_F
  H ← arg min_{H≥0} ‖WH − A‖_F

Alternating Nonnegativity-constrained Least Squares (ANLS). No closed-form solution.

- Projected gradient method (Lin, 07)
- Projected quasi-Newton method (D. Kim et al., 07)
- Active-set method (H. Kim and Park, 08)
- Block principal pivoting method (J. Kim and Park, 08)
- ALS (M. Berry et al., 06) ??
NLS: min_{X≥0} ‖CX − B‖_F^2 = ∑_i min_{x_i ≥ 0} ‖C x_i − b_i‖_2^2

Nonnegativity-constrained least squares (NLS) problem:

- Projected gradient method (Lin, 07): x^{(k+1)} ← P_+(x^{(k)} − α_k ∇f(x^{(k)}))
  - P_+(·): projection onto the nonnegative orthant
  - Back-tracking selection of step α_k
- Projected quasi-Newton method (D. Kim et al., 07):
  x^{(k+1)} ← [P_+(y^{(k)} − α D^{(k)} ∇f(y^{(k)})); 0]
  - Gradient scaling only for nonzero variables

These do not fully exploit the structure of the NLS problems in NMF.

- Active-set method (H. Kim and Park, 08; Lawson and Hanson, 74; Bro and De Jong, 97; Van Benthem and Keenan, 04)
- Block principal pivoting method (J. Kim and Park, 08; linear complementarity problems (LCP), Judice and Pires, 94)
Active-set-type Algorithms for min_{x≥0} ‖Cx − b‖_2, C: m × k

KKT conditions: y = C^T C x − C^T b, y ≥ 0, x ≥ 0, x_i y_i = 0, i = 1, ..., k.

If we knew P = {i | x_i > 0} at the solution in advance, then we would only need to solve min ‖C_P x_P − b‖_2, with the remaining x_i = 0, where C_P: columns of C with indices in P.

[Figure: sign pattern of x in Cx ≈ b, with positive and zero entries marked.]
Active-set-type Algorithms for min_{x≥0} ‖Cx − b‖_2, C: m × k

KKT conditions: y = C^T C x − C^T b, y ≥ 0, x ≥ 0, x_i y_i = 0, i = 1, ..., k.

Active-set method (Lawson and Hanson, 74):
- E = {1, ..., k} (i.e., x = 0 initially), P = ∅
- Repeat while E is not empty and y_i < 0 for some i:
  - Exchange indices between E and P while keeping feasibility and reducing the objective function value.

Block principal pivoting method (Portugal et al., 94, Math. Comp.): lacks any monotonicity or feasibility, but finds a correct active-passive set partitioning.
- Guess two index sets P and E that partition {1, ..., k}
- Repeat:
  - Let x_E = 0 and x_P = arg min_{x_P} ‖C_P x_P − b‖_2^2
  - Then y_E = C_E^T (C_P x_P − b) and y_P = 0
  - If x_P ≥ 0 and y_E ≥ 0, the optimal values are found. Otherwise, update P and E.
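The block exchange loop above, together with the backup rule discussed later (fall back to flipping a single variable when a block exchange fails to reduce the number of infeasible variables), can be sketched compactly in NumPy. This is a minimal single-right-hand-side sketch under the assumption that C has full column rank, not the paper's optimized multi-column implementation:

```python
import numpy as np

def nnls_bpp(C, b, tol=1e-12, max_iter=100):
    """Block principal pivoting sketch for min_{x>=0} ||Cx - b||_2."""
    k = C.shape[1]
    CtC, Ctb = C.T @ C, C.T @ b
    passive = np.zeros(k, dtype=bool)          # P; its complement is E
    x, y = np.zeros(k), -Ctb                   # x_E = 0, y = C^T(Cx - b)
    n_inf_best, block_budget = k + 1, 3
    for _ in range(max_iter):
        infeasible = (passive & (x < -tol)) | (~passive & (y < -tol))
        n_inf = int(infeasible.sum())
        if n_inf == 0:                         # KKT satisfied: done
            break
        if n_inf < n_inf_best:                 # progress: full block exchange
            n_inf_best, block_budget = n_inf, 3
            passive[infeasible] ^= True
        elif block_budget > 0:                 # allow a few non-improving block steps
            block_budget -= 1
            passive[infeasible] ^= True
        else:                                  # backup rule: flip one variable only
            passive[np.flatnonzero(infeasible).max()] ^= True
        x = np.zeros(k)
        if passive.any():
            x[passive] = np.linalg.solve(CtC[np.ix_(passive, passive)], Ctb[passive])
        y = CtC @ x - Ctb
        y[passive] = 0.0
    return x
```

Since the problem is convex, the returned x can be validated by checking the KKT conditions directly: x ≥ 0, y = C^T(Cx − b) ≥ 0, and x_i y_i = 0.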
How Block Principal Pivoting Works

Example with k = 10; initially P = {1, 2, 3, 4, 5}, E = {6, 7, 8, 9, 10}.

At each step, update x_P by C_P^T C_P x_P = C_P^T b (with x_E = 0) and y_E = C_E^T (C_P x_P − b) (with y_P = 0); then exchange all infeasible indices (x_i < 0 with i ∈ P, or y_i < 0 with i ∈ E):

- Step 1: P = {1, 2, 3, 4, 5}: signs x_P = (+, −, −, +, −), y_E = (−, +, −, +, +). Infeasible: {2, 3, 5} in P and {6, 8} in E → exchange → P = {1, 4, 6, 8}.
- Step 2: P = {1, 4, 6, 8}: x_6 < 0 and y_3 < 0 → exchange → P = {1, 3, 4, 8}.
- Step 3: P = {1, 3, 4, 8}: x_P ≥ 0 and y_E ≥ 0. Solved!

[Figure: sign patterns of x and y over the P/E partition at each step.]
Refined Exchange Rules

- The active-set algorithm is a special instance of a single principal pivoting algorithm (H. Kim and Park, SIMAX 08).
- The block exchange rule without modification does not always work:
  - The residual is not guaranteed to decrease monotonically.
  - The block exchange rule may cycle (although rarely).
- Modification: if the block exchange rule fails to decrease the number of infeasible variables, use a backup exchange rule.
- With this modification, the block principal pivoting algorithm finds the solution of NLS in a finite number of iterations.
Structure of NLS Problems in NMF (J. Kim and Park, 08)

The matrix is long and thin, the solution vectors are short, and there are many right-hand-side vectors:

  min_{H≥0} ‖WH − A‖_F^2,   min_{W≥0} ‖H^T W^T − A^T‖_F^2
Efficient Algorithm for min_{X≥0} ‖CX − B‖_F^2 (J. Kim and Park, 08)

- Precompute C^T C and C^T B. Update x_P and y_E by C_P^T C_P x_P = C_P^T b and y_E = C_E^T C_P x_P − C_E^T b.
  - All coefficients can be retrieved from C^T C and C^T B.
  - C^T C and C^T B are small; storage is not a problem.
- Exploit common P and E sets among the columns of B in each iteration.
  - X is flat and wide → more common cases of P and E sets.

Proposed algorithm for NMF (ANLS/BPP): ANLS framework + block principal pivoting algorithm for NLS with improvements for multiple right-hand sides.
Sparse NMF and Regularized NMF

Sparse NMF (for sparse H) (H. Kim and Park, Bioinformatics, 07):

  min_{W,H} ‖A − WH‖_F^2 + η ‖W‖_F^2 + β ∑_{j=1}^n ‖H(:, j)‖_1^2,   W_ij, H_ij ≥ 0 ∀ i, j

ANLS reformulation (H. Kim and Park, 07): alternate the following

  min_{H≥0} ‖ [W; √β e_{1×k}] H − [A; 0_{1×n}] ‖_F^2
  min_{W≥0} ‖ [H^T; √η I_k] W^T − [A^T; 0_{k×m}] ‖_F^2

Regularized NMF (Pauca et al., 06):

  min_{W,H} ‖A − WH‖_F^2 + η ‖W‖_F^2 + β ‖H‖_F^2,   W_ij, H_ij ≥ 0 ∀ i, j

ANLS reformulation: alternate the following

  min_{H≥0} ‖ [W; √β I_k] H − [A; 0_{k×n}] ‖_F^2
  min_{W≥0} ‖ [H^T; √η I_k] W^T − [A^T; 0_{k×m}] ‖_F^2
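The stacked-matrix trick above means a sparse-NMF H-update is just an ordinary NLS solve on an augmented system. A minimal sketch of one H-step, using SciPy's dense NNLS solver column by column (the function name `sparse_nmf_h_step` is this sketch's own, not from the paper):

```python
import numpy as np
from scipy.optimize import nnls

def sparse_nmf_h_step(W, A, beta):
    """One H-update of sparse NMF via the stacked ANLS reformulation.

    Solves min_{H>=0} || [W; sqrt(beta) * e_{1xk}] H - [A; 0_{1xn}] ||_F^2
    column by column with a dense NNLS solver.
    """
    k, n = W.shape[1], A.shape[1]
    C = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])   # stacked matrix
    B = np.vstack([A, np.zeros((1, n))])                  # stacked right-hand sides
    return np.column_stack([nnls(C, B[:, j])[0] for j in range(n)])
```

Because the penalty β ‖H(:, j)‖_1^2 is separable across columns, each column's l1 norm shrinks as β grows, which is the mechanism producing sparse H.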
Nonnegative PARAFAC

Consider a 3-way nonnegative tensor T ∈ R_+^{m×n×p} and its PARAFAC

  min_{A,B,C≥0} ‖T − [[A, B, C]]‖_F^2

where A ∈ R_+^{m×k}, B ∈ R_+^{n×k}, C ∈ R_+^{p×k}.

The loading matrices A, B, and C can be iteratively estimated by an NLS algorithm such as the block principal pivoting method.
Nonnegative PARAFAC (J. Kim and Park, in preparation)

Iterate until a stopping criterion is satisfied:

  min_{A≥0} ‖Y_BC A^T − T_(1)‖_F
  min_{B≥0} ‖Y_AC B^T − T_(2)‖_F
  min_{C≥0} ‖Y_AB C^T − T_(3)‖_F

where Y_BC = B ⊙ C ∈ R^{(np)×k}, T_(1) ∈ R^{(np)×m}; Y_AC = A ⊙ C ∈ R^{(mp)×k}, T_(2) ∈ R^{(mp)×n}; Y_AB = A ⊙ B ∈ R^{(mn)×k}, T_(3) ∈ R^{(mn)×p} are unfolded matrices, and F ⊙ G = [f_1 ⊗ g_1  f_2 ⊗ g_2  · · ·  f_k ⊗ g_k] ∈ R^{(mn)×k} is the Khatri-Rao product of F ∈ R^{m×k} and G ∈ R^{n×k}.

These matrices are even longer and thinner, ideal for ANLS/BPP. The approach can be similarly extended to higher-order tensors.
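The Khatri-Rao product defined above (column-wise Kronecker product) is available directly in SciPy, which makes it easy to check the shapes claimed for the unfolded subproblems:

```python
import numpy as np
from scipy.linalg import khatri_rao

F = np.arange(6.0).reshape(3, 2)    # F in R^{3x2}
G = np.arange(8.0).reshape(4, 2)    # G in R^{4x2}

Y = khatri_rao(F, G)                # shape (3*4, 2): one Kronecker product per column
# Each column of Y is the Kronecker product of the corresponding columns:
assert np.allclose(Y[:, 0], np.kron(F[:, 0], G[:, 0]))
```

For the PARAFAC subproblems, Y plays the role of the long-and-thin coefficient matrix (np, mp, or mn rows but only k columns), which is exactly the regime where ANLS/BPP is efficient.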
Experimental Results (NMF) (J. Kim and Park, 2011)

NMF algorithms compared:

  Name         Description                       Author
  ANLS-BPP     ANLS / block principal pivoting   J. Kim and HP 08
  ANLS-AS      ANLS / active set                 H. Kim and HP 08
  ANLS-PGRAD   ANLS / projected gradient         Lin 07
  ANLS-PQN     ANLS / projected quasi-Newton     D. Kim et al. 07
  HALS         Hierarchical ALS                  Cichocki et al. 07
  MU           Multiplicative updating           Lee and Seung 01
  ALS          Alternating least squares         Berry et al. 06
Active-set vs. Block Principal Pivoting (J. Kim and Park, 2011)

[Figure: elapsed time vs. iteration for ANLS-BPP, ANLS-AS-UPDATE, and ANLS-AS-GROUP. Top: time per iteration; bottom: cumulative time.]

ATNT image data: 10,304 × 400, k = 10; TDT2 text data: 19,009 × 3,087, k = 160.
Residual vs. Execution Time (J. Kim and Park, 2011)

[Figure: relative objective value vs. time for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP.]

TDT2 text data: 19,009 × 3,087, k = 10 and k = 160.
Residual vs. Execution Time (J. Kim and Park, 2011)

[Figure: relative objective value vs. time for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP.]

ATNT image data: 10,304 × 400, k = 10; 20 Newsgroups text data: 26,214 × 11,314, k = 160.
Residual vs. Execution Time (J. Kim and Park, 2011)

[Figure: relative objective value vs. time for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP.]

PIE 64 image data: 4,096 × 11,554, k = 80 and k = 160.
BPP vs. HALS: Influence of Sparsity (J. Kim and Park, 2011)

[Figure: relative objective value vs. time for HALS and ANLS-BPP, and proportion of elements vs. iteration (W sparsity, H sparsity, W change, H change).]

Synthetic data 10,000 × 2,000 created by factors with different sparsities. Left: 90% sparsity; right: 95% sparsity.
Adaptive NMF for Varying Reduced Rank k → k′ (He, Kim, Cichocki, and Park, in preparation)

Given (W, H) with rank k, how can (W′, H′) with rank k′ be computed fast? E.g., model selection for NMF clustering.

AdaNMF:
- Initialize W′ and H′ using W and H:
  - If k′ > k, compute NMF for A − WH ≈ ΔW ΔH. Set W′ = [W  ΔW] and H′ = [H; ΔH].
  - If k′ < k, initialize W′ and H′ with the k′ pairs (w_i, h_i) with largest ‖w_i h_i^T‖_F = ‖w_i‖_2 ‖h_i‖_2.
- Update W′ and H′ using the HALS algorithm.
Model Selection in NMF Clustering (He, Kim, Cichocki, and Park, in preparation)

Connectivity matrix based on A ≈ WH for run t = 1, ..., l:

  C^t_ij = 1 if columns i and j of H attain their maximum in the same row (same cluster), and C^t_ij = 0 otherwise.

Dispersion coefficient:

  ρ(k) = (1/n^2) ∑_{i=1}^n ∑_{j=1}^n 4 (C_ij − 1/2)^2,  where C = (1/l) ∑_t C^t.

[Figure: reordered consensus matrices for k = 3, 4, 5, 6; dispersion coefficient vs. approximation rank k; execution time (seconds) of AdaNMF, recompute, and warm-restart.]

Clustering results on MNIST digit images (784 × 2000) by AdaNMF with k = 3, 4, 5, and 6: averaged consensus matrices, dispersion coefficient, execution time.
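The consensus-based model selection above reduces to two small computations per candidate k. A minimal sketch, where `connectivity` and `dispersion` are this sketch's own helper names (a crisp, perfectly reproducible clustering yields ρ = 1; disagreement between runs pulls entries of C toward 1/2 and ρ below 1):

```python
import numpy as np

def connectivity(H):
    """C_ij = 1 if columns i and j of H attain their maximum in the same row."""
    labels = np.argmax(H, axis=0)                 # hard cluster assignment per column
    return (labels[:, None] == labels[None, :]).astype(float)

def dispersion(consensus):
    """rho = (1/n^2) * sum_ij 4 * (C_ij - 1/2)^2 for an averaged consensus matrix."""
    n = consensus.shape[0]
    return float(np.sum(4.0 * (consensus - 0.5) ** 2)) / n**2
```

In model selection, one averages the connectivity matrices of several NMF runs per k and picks the k whose dispersion coefficient is closest to 1.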
Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation)

[Figure: relative error vs. execution time of AdaNMF, warm-restart, and "recompute".]

Given an NMF of a 600 × 600 synthetic matrix with k = 60, compute NMF with k = 50 and k = 80.
Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation)

Theorem: For A ∈ R_+^{m×n}, if rank_+(A) > k, then

  min ‖A − W^{(k+1)} H^{(k+1)}‖_F < min ‖A − W^{(k)} H^{(k)}‖_F,

where W^{(i)} ∈ R_+^{m×i} and H^{(i)} ∈ R_+^{i×n}.

[Figure: rank path on a synthetic data set (relative objective function value of NMF vs. approximation rank k, for matrices of rank 20, 40, 60, 80); ORL face image (10,304 × 400) classification errors (by LMNN) on training and testing sets vs. reduced rank k.]

k-dimensional representation H_T of training data T computed by BPP: min_{H_T ≥ 0} ‖W H_T − T‖_F.
NMF for Dynamic Data (DynNMF) (He, Kim, Cichocki, and Park, in preparation)

Given an NMF (W, H) for A = [δA  Ā], how can an NMF (W′, H′) for A′ = [Ā  ΔA] be computed fast? (Updating and downdating)

DynNMF (sliding-window NMF):
- Initialize H′ as follows:
  - Let H̄ be the remaining columns of H.
  - Solve min_{ΔH≥0} ‖W ΔH − ΔA‖_F^2 using block principal pivoting.
  - Set H′ = [H̄  ΔH].
- Run HALS on A′ with initial factors W′ = W and H′.
DynNMF for Dynamic Data (He, Kim, Cichocki, and Park, in preparation)
PET2001 data with 3,064 images from a surveillance video. DynNMF is applied to a 110,592 × 400 data matrix at each step, with 100 new columns entering and 100 obsolete columns leaving. The residual images track the moving vehicle in the video.
NMF as a Clustering Method (Kuang and Park, in preparation)

Clustering and lower-rank approximation are related. NMF for clustering: documents (Xu et al., 03), images (Cai et al., 08), microarray (Kim and Park, 07), etc.

Equivalence of objective functions between k-means and NMF (Ding et al., 05; Kim and Park, 08):

  min ∑_{i=1}^n ‖a_i − w_{S_i}‖_2^2 = min ‖A − WH‖_F^2

where S_i = j when the i-th point is assigned to the j-th cluster (j ∈ {1, ..., k}).

- k-means: W: k cluster centroids; H ∈ E
- NMF: W: basis vectors for the rank-k approximation; H: representation of A in the W space

(E: matrices whose columns are columns of an identity matrix)

NOTE: The equivalence of objective functions holds when H ∈ E and A ≥ 0.
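In practice, the clustering read-out from an NMF is simply the index of the largest coefficient in each column of H. A hypothetical toy sketch, using a few multiplicative updates as a stand-in for any NMF solver (the block-structured data and all variable names here are this sketch's own):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two groups of columns with disjoint row supports
A = np.zeros((6, 40))
A[:3, :20] = rng.random((3, 20)) + 1.0    # cluster 1 lives on rows 0-2
A[3:, 20:] = rng.random((3, 20)) + 1.0    # cluster 2 lives on rows 3-5

k, eps = 2, 1e-12
W = rng.random((6, k)) + 0.1
H = rng.random((k, 40)) + 0.1
for _ in range(300):                       # MU iterations as a stand-in solver
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

labels = np.argmax(H, axis=0)              # hard clustering: largest coefficient per column
```

Because the two column groups have disjoint supports, the rank-2 factors align with the blocks and the argmax recovers the two clusters.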
NMF and K-means

  min ‖A − WH‖_F^2 s.t. H ∈ E

Paths to a solution:
- k-means: expectation-minimization
- NMF: relax the condition on H to H ≥ 0 with orthogonal rows, or H ≥ 0 with sparse columns (soft clustering)

TDT2 text data set (clustering accuracy averaged over 100 runs):

  # clusters   2       6       10      14      18
  K-means      0.8099  0.7295  0.7015  0.6675  0.6675
  NMF/ANLS     0.9990  0.8717  0.7436  0.7021  0.7160

Sparsity constraints improve clustering results (J. Kim and Park, 08):

  min_{W≥0, H≥0} ‖A − WH‖_F^2 + η ‖W‖_F^2 + β ∑_{j=1}^n ‖H(:, j)‖_1^2

Number of times the optimal assignment is achieved (a synthetic data set with a clear cluster structure):

  k      3    6    9    12   15
  NMF    69   65   74   68   44
  SNMF   100  100  100  100  97

NMF and SNMF are much better than k-means in general.
NMF, K-means, and Spectral Clustering (Kuang and Park, in preparation)

- Equivalence of objective functions is not enough to explain the clustering capability of NMF.
- NMF is more related to spherical k-means than to k-means → NMF has been shown to work well in text data clustering.
- Spectral clustering → eigenvectors (Ng et al., 01), A normalized if needed, Laplacian, ...
- Symmetric NMF (Ding et al.) → can handle nonlinear structure, and S ≥ 0 naturally captures a cluster structure in S.
Summary/Discussions

- Overview of NMF with Frobenius norm and algorithms
- Fast algorithms and convergence via the BCD framework
- Adaptive NMF algorithms
- Variations/extensions of NMF: nonnegative PARAFAC and sparse NMF
- NMF for clustering
- Extensive computational comparisons

- NMF for clustering and semi-supervised clustering
- NMF and probability-related methods
- NMF and geometric understanding
- NMF algorithms for large-scale problems; parallel implementation? GPU?
- Fast NMF with other divergences (Bregman and Csiszar divergences)
- NMF for blind source separation? Uniqueness?
- More theoretical study of NMF, especially foundations for computational methods

NMF Matlab codes and papers available at http://www.cc.gatech.edu/∼hpark and http://www.cc.gatech.edu/∼jingu
Collaborators

Today's talk:
- Jingu Kim, CSE, Georgia Tech
- Yunlong He, Math, Georgia Tech
- Da Kuang, CSE, Georgia Tech

- Krishnakumar Balasubramanian, CSE, Georgia Tech
- Prof. Michael Berry, EECS, Univ. of Tennessee
- Prof. Moody Chu, Math, North Carolina State Univ.
- Dr. Andrzej Cichocki, Brain Science Institute, RIKEN, Japan
- Prof. Chris Ding, CSE, UT Arlington
- Prof. Lars Elden, Math, Linkoping Univ., Sweden
- Dr. Mariya Ishteva, CSE, Georgia Tech
- Dr. Hyunsoo Kim, Wistar Inst.
- Anoop Korattikara, CS, UC Irvine
- Prof. Guy Lebanon, CSE, Georgia Tech
- Liangda Li, CSE, Georgia Tech
- Prof. Tao Li, CS, Florida International Univ.
- Prof. Robert Plemmons, CS, Wake Forest Univ.
- Andrey Puretskiy, EECS, Univ. of Tennessee
- Prof. Max Welling, CS, UC Irvine
- Dr. Stan Young, NISS

Thank you!
Related Papers by H. Park's Group

- H. Kim and H. Park. Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares for Microarray Data Analysis. Bioinformatics, 23(12):1495-1502, 2007.
- H. Kim, H. Park, and L. Eldén. Non-negative Tensor Factorization Based on Alternating Large-scale Non-negativity-constrained Least Squares. Proc. of the IEEE 7th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1147-1151, 2007.
- H. Kim and H. Park. Nonnegative Matrix Factorization Based on Alternating Non-negativity-constrained Least Squares and the Active Set Method. SIAM Journal on Matrix Analysis and Applications, 30(2):713-730, 2008.
- J. Kim and H. Park. Sparse Nonnegative Matrix Factorization for Clustering. Georgia Tech Technical Report GT-CSE-08-01, 2008.
- J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. Proc. of the 8th IEEE International Conference on Data Mining (ICDM), pp. 353-362, 2008.
- B. Drake, J. Kim, M. Mallick, and H. Park. Supervised Raman Spectra Estimation Based on Nonnegative Rank Deficient Least Squares. Proc. of the 13th International Conference on Information Fusion, Edinburgh, UK, 2010.
- A. Korattikara, L. Boyles, M. Welling, J. Kim, and H. Park. Statistical Optimization of Non-Negative Matrix Factorization. Proc. of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR: W&CP 15, 2011.
- J. Kim and H. Park. Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons. Submitted for review, 2011.
- J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, in preparation.
- Y. He, J. Kim, A. Cichocki, and H. Park. Fast Adaptive NMF Algorithms for Varying Reduced Rank and Dynamic Data. In preparation.
- L. Li, G. Lebanon, and H. Park. Fast Algorithm for Non-Negative Matrix Factorization with Bregman and Csiszar Divergences. In preparation.
- D. Kuang and H. Park. Nonnegative Matrix Factorization for Spherical and Spectral Clustering. In preparation.