Consistent estimation of Mixed Memberships with Successive Projections Maxim Panov joint work with E. Marshakov, R. Ushakov and N. Mokrov Skoltech and IITP 15.05.2018


Jan 03, 2020





Community detection: Problem statement

Graph $G(E, V)$: nodes $v_j$; edges $A_{ij}$.

Problem: we want to partition the graph so that there are few edges between groups.

Community detection: Overlapping communities

Non-overlapping vs. overlapping communities

Graph models: Erdos-Renyi graph

Simplest possible random graph model

$$A_{ij} \sim \mathrm{Bernoulli}(p),$$

where the $A_{ij}$ are independent and $p \in [0, 1]$.

Figure: Erdos-Renyi graph with p = 0.5.

Graph models: Generalized Erdos-Renyi graph

A simple generalization of the Erdos-Renyi model:

$$A_{ij} \sim \mathrm{Bernoulli}(p_{ij}),$$

where $p_{ij} \in [0, 1]$.

In matrix form we can write

$$A \sim \mathrm{Bernoulli}(P),$$

where $P = \{p_{ij}\}_{i,j=1}^{n}$.
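A minimal NumPy sketch of sampling from this model, assuming an undirected graph without self-loops (the slides do not say how the diagonal is treated): only the strict upper triangle is drawn and then mirrored. The name `sample_adjacency` is hypothetical; with a constant P it reduces to the Erdos-Renyi graph above.

```python
import numpy as np

def sample_adjacency(P: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw a symmetric 0/1 matrix with A_ij ~ Bernoulli(P_ij) independently for i < j.

    Assumption: undirected graph without self-loops (the diagonal is left at 0).
    """
    n = P.shape[0]
    # Independent uniform draws compared with P; keep only the strict upper triangle.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    # Mirror the upper triangle to obtain a symmetric adjacency matrix.
    return (upper | upper.T).astype(np.int8)

# Erdos-Renyi graph as the special case of a constant probability matrix.
rng = np.random.default_rng(0)
A = sample_adjacency(np.full((200, 200), 0.5), rng)
```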

Question: what types of matrices P allow for community structure?

Graph models: Stochastic block model (SBM)

Figure: Example of stochastic block model and corresponding graph.
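In an SBM every node belongs to exactly one community, so $p_{ij}$ depends only on the communities of nodes i and j and P has a block structure. A small sketch with illustrative labels and B (the helper name `sbm_probabilities` is hypothetical):

```python
import numpy as np

def sbm_probabilities(labels: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return P = Z B Z^T, where Z is the n x K one-hot matrix of community labels."""
    K = B.shape[0]
    Z = np.eye(K)[labels]   # row i is the indicator vector of node i's community
    return Z @ B @ Z.T      # hence P[i, j] = B[labels[i], labels[j]]

labels = np.array([0, 0, 1, 1, 2])
B = np.array([[0.8, 0.1, 0.05],
              [0.1, 0.7, 0.1],
              [0.05, 0.1, 0.9]])
P = sbm_probabilities(labels, B)
```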

Graph models: Mixed membership stochastic block model (MMSB)

Graph edges are generated according to the generalized Erdos-Renyi model:

$$A \sim \mathrm{Bernoulli}(P).$$

The probability matrix P can be factorized as

$$P = \Theta B \Theta^{\mathsf T},$$

where

$B \in [0, 1]^{K \times K}$ is a symmetric matrix of community-community probabilities;

$\Theta \in [0, 1]^{n \times K}$ is a community membership matrix.

Condition. We assume that

1. Every row of the matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$;

2. (optional) All the community membership vectors are independent draws from a Dirichlet distribution, i.e. $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ for some $\alpha \in \mathbb{R}_+^K$, $i = 1, \dots, n$.
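A sketch of generating P under these conditions. Placing one pure node per community in the first K rows is an illustrative assumption (it anticipates the identifiability condition discussed later), as are the symmetric Dirichlet parameter and the helper name `sample_mmsb_probabilities`; the remaining rows are i.i.d. Dirichlet draws.

```python
import numpy as np

def sample_mmsb_probabilities(n, B, alpha, rng):
    """Return (theta, P) with P = theta @ B @ theta.T.

    Assumptions for illustration: nodes 0..K-1 are pure (one per community),
    the other rows are i.i.d. Dirichlet(alpha, ..., alpha) draws.
    """
    K = B.shape[0]
    theta = np.empty((n, K))
    theta[:K] = np.eye(K)                                     # pure nodes
    theta[K:] = rng.dirichlet(np.full(K, alpha), size=n - K)  # mixed memberships
    P = theta @ B @ theta.T
    return theta, P

# Parameters borrowed from the default experimental settings, with smaller n.
rng = np.random.default_rng(0)
theta, P = sample_mmsb_probabilities(500, np.diag([0.3, 0.5, 0.7]), 1 / 3, rng)
```

Each row of $\Theta$ lies in the probability simplex, so $p_{ij} = \theta_i^{\mathsf T} B \theta_j$ never exceeds the largest entry of B.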

Graph models: MMSB examples

As discussed, in the MMSB model the probability matrix is

$$P = \Theta B \Theta^{\mathsf T}.$$

It means that

$$p_{ij} = \sum_{k,l=1}^{K} \theta_{ik}\theta_{jl} b_{kl}.$$

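SBM is a particular case of MMSB with the property that for any $i \in \{1, \dots, n\}$ there exists $k \in \{1, \dots, K\}$ such that

$$\theta_{ik} = 1 \quad\text{and}\quad \theta_{il} = 0, \quad l \ne k,$$

leading to

$$p_{ij} = b_{kl},$$

where k and l are the communities of nodes i and j, respectively.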

Graph models: Identifiability of MMSB

Problem: if our goal is to estimate the parameters $\Theta$ and B, are the true values unique?

Answer: of course not. For example,

$$P^{(1)} = M_1 I_3 M_1^{\mathsf T} = I_3 M_2 I_3 = P^{(2)},$$

where $I_3$ is the identity matrix of size 3.
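The explicit matrices $M_1$ and $M_2$ of this example did not survive extraction. The displayed identity forces $M_2 = M_1 M_1^{\mathsf T}$, so the phenomenon can be reproduced with any row-stochastic $M_1$; the one below is a hypothetical choice with no pure rows. The pairs $(\Theta, B) = (M_1, I_3)$ and $(I_3, M_1 M_1^{\mathsf T})$ give the same P:

```python
import numpy as np

# Hypothetical mixing matrix: rows sum to 1, but no row is "pure".
M1 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])
I3 = np.eye(3)

theta1, B1 = M1, I3            # first parameterization
theta2, B2 = I3, M1 @ M1.T     # second parameterization, M2 = M1 M1^T

P1 = theta1 @ B1 @ theta1.T
P2 = theta2 @ B2 @ theta2.T
print(np.allclose(P1, P2))     # True: same P, different (Theta, B)
```

Both pairs have rows of $\Theta$ summing to 1 and a full-rank B; only $(I_3, M_1 M_1^{\mathsf T})$ has pure nodes, which is the extra requirement in the identifiability condition.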

Graph models: Identifiability of MMSB

Condition (Identifiability)

1. There is at least one "pure" node in each community, i.e. for each $k = 1, \dots, K$ there exists i such that $\theta_{ik} = \sum_{l=1}^{K} \theta_{il} = 1$.

2. The matrix $B \in [0, 1]^{K \times K}$ is full rank.

3. Every row of the matrix $\Theta$ sums to 1: $\sum_{k=1}^{K} \theta_{ik} = 1$, $i = 1, \dots, n$.

If the Condition (Identifiability) is satisfied, then the MMSB is identifiable, i.e. for every $P = \Theta B \Theta^{\mathsf T}$ the matrices $\Theta$ and B are uniquely defined up to a permutation of communities (columns of $\Theta$ and rows and columns of B).
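The three conditions are easy to verify numerically for a given pair $(\Theta, B)$. A sketch with a hypothetical helper `check_identifiability` and a tolerance `tol` for floating-point comparisons; it assumes the entries of $\Theta$ lie in $[0, 1]$.

```python
import numpy as np

def check_identifiability(theta: np.ndarray, B: np.ndarray, tol: float = 1e-9) -> bool:
    """Check the pure-node, full-rank and row-sum conditions for (theta, B)."""
    K = B.shape[0]
    # Condition 3: every row of theta sums to 1.
    rows_sum_to_one = np.allclose(theta.sum(axis=1), 1.0, atol=tol)
    # Condition 1: for each community k some node has theta[i, k] = 1.
    has_pure_nodes = bool(np.all(np.any(theta >= 1.0 - tol, axis=0)))
    # Condition 2: B has full rank K.
    full_rank = np.linalg.matrix_rank(B) == K
    return bool(rows_sum_to_one and has_pure_nodes and full_rank)

theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.2, 0.3, 0.5]])
print(check_identifiability(theta, np.diag([0.3, 0.5, 0.7])))  # True
print(check_identifiability(theta, np.ones((3, 3))))           # False: B has rank 1
```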

Algorithms for parameter estimation in MMSB

There exist several algorithms for parameter estimation in MMSB:

stochastic variational inference (Airoldi et al., 2009; SVI);

tensor spectral method (Anandkumar et al., 2013; Tensor);

geometrical nonnegative matrix factorization (Mao et al., 2013; GeoNMF).

Problems of these methods:

absence of provable guarantees (SVI);

high computational complexity (SVI, Tensor);

applicability only to a limited subclass of MMSB (GeoNMF).

Recently, a couple of algorithms were proposed (SPACL by Mao et al. and Mixed-SCORE by Jin et al.) that are based on ideas very similar to ours.

Successive projection overlapping clustering (SPOC): Spectral properties of the probability matrix

To account for sparsity, we write

$$P = \rho\, \Theta B \Theta^{\mathsf T},$$

where $\rho > 0$ is a sparsity parameter and we restrict $\max_{k,l} B_{kl} = 1$.

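Spectral decomposition of the probability matrix (exact):

$$P = U L U^{\mathsf T},$$

where $L \in \mathbb{R}^{K \times K}$ is the diagonal matrix of the nonzero eigenvalues of P and $U \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors.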
We can conclude that

$$U = \Theta F,$$

where $F \in \mathbb{R}^{K \times K}$ is some full-rank matrix.
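The step from the exact decomposition to $U = \Theta F$ can be spelled out, using $U^{\mathsf T} U = I_K$ and the invertibility of L (P has rank K, e.g. under the identifiability condition):

$$U = U L U^{\mathsf T} U L^{-1} = P U L^{-1} = \Theta \left(\rho\, B \Theta^{\mathsf T} U L^{-1}\right) = \Theta F, \qquad F = \rho\, B\, \Theta^{\mathsf T} U L^{-1} \in \mathbb{R}^{K \times K}.$$

Since U has rank K, the $K \times K$ matrix F must have rank K as well.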

Successive projection overlapping clustering (SPOC): Spectral properties of the probability matrix

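We can proceed with the decomposition

$$U = \Theta F.$$

Importantly, the rows $u_i$ of the matrix U lie in a simplex: since $u_i = \sum_{k=1}^{K} \theta_{ik} f_k$ with $\theta_{ik} \ge 0$ and $\sum_{k} \theta_{ik} = 1$, each $u_i$ is a convex combination of the rows $f_k$ of F, which are the vertices of the simplex.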

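A quick numerical check of this geometry in the noiseless case, under illustrative assumptions (the values of n, $\rho$, B and the seed are arbitrary, and the first K nodes are pure). "Top-K" is read here as the K eigenvalues of largest absolute value. The rows of U belonging to the pure nodes are exactly the rows of F, and every other row is the corresponding convex combination.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, rho = 300, 3, 0.5
B = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.6, 0.1],
              [0.1, 0.1, 0.4]])             # max entry 1, as required
# Illustrative Theta: first K nodes pure, the rest Dirichlet(1, 1, 1) mixtures.
theta = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
P = rho * theta @ B @ theta.T

# Top-K eigenpairs of P (ordered by absolute eigenvalue).
w, V = np.linalg.eigh(P)
idx = np.argsort(np.abs(w))[::-1][:K]
U = V[:, idx]

# Rows of U for the pure nodes are the simplex vertices, i.e. the rows of F.
F = U[:K]
print(np.allclose(U, theta @ F))   # True: every u_i is a convex combination of rows of F
```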
Successive projection overlapping clustering (SPOC): Successive projection algorithm

Question: how to detect the simplex?

Answer: the successive projection algorithm (Araujo et al., 2001; Gillis and Vavasis, 2014):

1. Find the point with the maximal norm: $j^* = \arg\max_j \|u_j\|$.

2. $f_j = u_{j^*}$.

3. $U = U\left(I - \frac{f_j^{\mathsf T} f_j}{\|f_j\|^2}\right)$.

4. Iterate.

The final output is the matrix $F = (f_j)_{j=1}^{K}$.
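A NumPy sketch of the steps above. Following the SPOC algorithm later in the deck, it returns the selected row indices J rather than the projected vectors $f_j$; the function name `spa` is our own, and step 3 is implemented as $R \leftarrow R\,(I - f f^{\mathsf T})$ with f the normalized selected row of the residual R.

```python
import numpy as np

def spa(U: np.ndarray, K: int) -> list:
    """Successive projection algorithm: indices of K rows of U near the simplex vertices."""
    R = U.astype(float).copy()                          # residual, projected after each pick
    indices = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))   # row with the maximal norm
        indices.append(j)
        f = R[j] / np.linalg.norm(R[j])                 # unit vector along that row
        R = R - np.outer(R @ f, f)                      # R (I - f^T f / ||f||^2)
    return indices

rng = np.random.default_rng(0)
F = rng.normal(size=(3, 3))                             # random (almost surely full-rank) vertices
theta = np.vstack([rng.dirichlet(np.ones(3), size=50), np.eye(3)])  # pure nodes: rows 50-52
U = theta @ F
print(sorted(spa(U, 3)))                                # [50, 51, 52]
```

In the noiseless case the maximal norm over a simplex is attained at a vertex (the squared norm is strictly convex), and after projecting out that vertex the same argument applies to the remaining ones, which is why the pure nodes are recovered.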


Successive projection overlapping clustering (SPOC): Spectral properties of the adjacency matrix

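Spectral decomposition of the adjacency matrix (approximate):

$$A \approx \hat U \hat L \hat U^{\mathsf T},$$

where $\hat L \in \mathbb{R}^{K \times K}$ is the diagonal matrix of the top-K eigenvalues and $\hat U \in \mathbb{R}^{n \times K}$ is the matrix of corresponding eigenvectors. Then

$$\hat U = \Theta F + N,$$

where $F \in \mathbb{R}^{K \times K}$ is some full-rank matrix and N is a noise term.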

Successive projection overlapping clustering (SPOC): Spectral properties of the adjacency matrix

Importantly, the rows $\hat u_i$ of the matrix $\hat U$ approximately lie in a simplex.

So, we can compute an estimate $\hat F$ of the matrix F with the SPA algorithm.

Successive projection overlapping clustering: Resulting estimates

Estimate of the community-community matrix:

$$\widehat{\rho B} = \hat F \hat L \hat F^{\mathsf T}.$$

Estimate of the community membership matrix:

$$\hat\Theta = \hat U \hat F^{-1}.$$
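Why these formulas: in the noiseless case $U = \Theta F$ and $P = U L U^{\mathsf T}$ give

$$\rho\, \Theta B \Theta^{\mathsf T} = P = U L U^{\mathsf T} = \Theta \left(F L F^{\mathsf T}\right) \Theta^{\mathsf T}.$$

Pure nodes make $\Theta$ of full column rank, so multiplying by the pseudoinverse $\Theta^{+}$ on the left and by $(\Theta^{+})^{\mathsf T}$ on the right yields $\rho B = F L F^{\mathsf T}$, while $\Theta = U F^{-1}$ follows directly from $U = \Theta F$. The estimates plug $\hat U$, $\hat L$ and $\hat F$ into these identities.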

Question: what about the efficiency of these estimates?

Successive projection overlapping clustering (SPOC)

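Algorithm 1 SPOC

Require: adjacency matrix A and number of communities K.

Ensure: estimates $\hat\rho$, $\hat\Theta$, $\hat B$.

1: Get the rank-K eigenvalue decomposition $A \simeq \hat U \hat L \hat U^{\mathsf T}$.

2: Run the SPA algorithm with input $\hat U$, which outputs a set of indices J of cardinality K.

3: $\hat F = \hat U[J, :]$.

4: $\hat B = \hat F \hat L \hat F^{\mathsf T}$.

5: $\hat\rho = \max_{ij} \hat B_{ij}$.

6: $\hat B = \frac{1}{\hat\rho} \hat B$.

7: $\hat\Theta = \hat U \hat F^{-1}$.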
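A compact NumPy sketch of Algorithm 1, under two assumptions not fixed by the slides: "top-K" means the K eigenvalues of largest absolute value, and no post-processing (for example clipping $\hat\Theta$ to $[0, 1]$) is applied. The function names `spa` and `spoc` are our own; `spoc` returns $(\hat\rho, \hat\Theta, \hat B)$.

```python
import numpy as np

def spa(U: np.ndarray, K: int) -> list:
    """Successive projection algorithm: indices of K rows of U near the simplex vertices."""
    R = U.copy()                                         # residual, projected after each pick
    J = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))    # row with the maximal norm
        J.append(j)
        f = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ f, f)                       # project out the chosen direction
    return J

def spoc(A: np.ndarray, K: int):
    """Sketch of Algorithm 1 (SPOC); returns (rho_hat, theta_hat, B_hat)."""
    # Step 1: rank-K eigendecomposition A ~ U L U^T (largest |eigenvalues|, an assumption).
    w, V = np.linalg.eigh(A)
    idx = np.argsort(np.abs(w))[::-1][:K]
    U, L = V[:, idx], np.diag(w[idx])
    # Steps 2-3: SPA selects K (nearly) pure nodes; their rows of U form F.
    J = spa(U, K)
    F = U[J, :]
    # Step 4: unnormalized community-community matrix (estimates rho * B).
    B = F @ L @ F.T
    # Steps 5-6: sparsity parameter and normalized B.
    rho = B.max()
    B = B / rho
    # Step 7: membership matrix; entries are not clipped to [0, 1] here.
    theta = U @ np.linalg.inv(F)
    return rho, theta, B

# Noiseless usage example (A = P) with illustrative parameters.
rng = np.random.default_rng(2)
n, K, rho_true = 300, 3, 0.5
B_true = np.array([[1.0, 0.2, 0.1],
                   [0.2, 0.6, 0.1],
                   [0.1, 0.1, 0.4]])                     # max entry 1, as required
theta_true = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
P = rho_true * theta_true @ B_true @ theta_true.T
rho_hat, theta_hat, B_hat = spoc(P, K)
print(round(float(rho_hat), 6))                          # 0.5
```

On noiseless input with one pure node per community, the sketch recovers $\rho$ exactly and $\Theta$ and B up to a permutation of communities; on a sampled adjacency matrix SPA selects nearly pure nodes instead and the estimates are only approximate.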

Provable efficiency: Davis-Kahan theorem

Lemma (Variant of Davis-Kahan)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank-K symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.

Let A be any symmetric matrix and $\hat U, U \in \mathbb{R}^{n \times K}$ be the K leading eigenvectors of A and P, respectively.

Then there exists a $K \times K$ orthogonal matrix $O_P$ such that

$$\|\hat U - U O_P\|_F \le \frac{2\sqrt{2K}\, \|A - P\|}{\lambda_K(P)}.$$
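A numerical illustration of the lemma on a sampled MMSB graph. The orthogonal matrix $O_P$ is computed here by orthogonal Procrustes (the minimizer of $\|\hat U - U O\|_F$ over orthogonal O), which is one valid choice since the lemma only asserts existence. The parameters, the seed, the absence of self-loops and the "largest absolute eigenvalue" reading of "leading" are assumptions.

```python
import numpy as np

def leading_eigvecs(M, K):
    """Eigenvalues/eigenvectors of symmetric M with the K largest |eigenvalues|."""
    w, V = np.linalg.eigh(M)
    idx = np.argsort(np.abs(w))[::-1][:K]
    return w[idx], V[:, idx]

def procrustes_rotation(U, U_hat):
    """Orthogonal O minimizing ||U_hat - U O||_F, via the SVD of U^T U_hat."""
    W, _, Vt = np.linalg.svd(U.T @ U_hat)
    return W @ Vt

rng = np.random.default_rng(3)
n, K = 400, 3
theta = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.6, 0.1],
              [0.1, 0.1, 0.7]])
P = theta @ B @ theta.T
upper = np.triu(rng.random((n, n)) < P, k=1)           # independent edges, no self-loops
A = (upper | upper.T).astype(float)

lam_P, U = leading_eigvecs(P, K)
_, U_hat = leading_eigvecs(A, K)
O = procrustes_rotation(U, U_hat)
lhs = np.linalg.norm(U_hat - U @ O)                      # Frobenius norm
rhs = 2 * np.sqrt(2 * K) * np.linalg.norm(A - P, 2) / np.min(np.abs(lam_P))
print(lhs <= rhs)                                        # True
```

Davis-Kahan-type bounds are usually loose at moderate n, so the check mainly illustrates how the quantities in the lemma are computed.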


Provable efficiency: Concentration in spectral norm

Lemma (Lei and Rinaldo, 2015)

Let A be the adjacency matrix of a random graph on n nodes in which edges occur independently.

Set $\mathbb{E}[A] = P = (p_{ij})_{i,j=1,\dots,n}$ and assume that $n \max_{ij} p_{ij} \le d$ for $d \ge c_0 \log n$ and $c_0 > 0$.

Then, for any $r > 0$ there exists a constant $C = C(r, c_0)$ such that

$$\|A - P\| \le C\sqrt{d}$$

with probability at least $1 - n^{-r}$.
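A quick simulation for an illustrative homogeneous graph ($p_{ij} = 0.3$, so $d = 0.3n$; self-loops excluded, an assumption): the ratio $\|A - P\| / \sqrt{d}$ stays roughly constant as n grows, in line with the lemma.

```python
import numpy as np

rng = np.random.default_rng(4)
ratios = []
for n in (200, 400, 800):
    P = np.full((n, n), 0.3)                          # constant edge probability
    upper = np.triu(rng.random((n, n)) < P, k=1)      # independent edges, no self-loops
    A = (upper | upper.T).astype(float)
    d = n * P.max()                                   # n * max_ij p_ij
    ratios.append(np.linalg.norm(A - P, 2) / np.sqrt(d))   # spectral norm / sqrt(d)
print([round(r, 2) for r in ratios])
```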

Provable efficiency: Quality of SPA

Theorem (Gillis and Vavasis, 2014)

Let $G = FW$ and $\hat G = G + N$. Suppose that $K \ge 2$ and Condition 2 is satisfied. If each column $n_i$ of the matrix N satisfies $\|n_i\|_F \le \varepsilon$ with

𝜀 ≤ 𝜆min(F)


then the SPA algorithm with input $(\hat G, r)$ returns a set of indices J such that there exists a permutation $\pi$ with

$$\|\hat g_{J(j)} - f_{\pi(j)}\|_2 \le (432\,\kappa(F) + 4)\,\varepsilon$$

for all $j = 1, \dots, r$, where $\hat g_k$ and $f_k$ are the columns of the matrices $\hat G$ and F, respectively. Here $\kappa(F) = \frac{\lambda_{\max}(F)}{\lambda_{\min}(F)}$ is the condition number of the matrix F.

Provable efficiency: Beyond Davis-Kahan

Lemma (Panov et al., 2017)

Assume that $P \in \mathbb{R}^{n \times n}$ is a rank-K symmetric matrix with smallest nonzero singular value $\lambda_K(P)$.

Let A be any symmetric matrix such that $\|A - P\| \le \frac{1}{2}\lambda_K(P)$, and let $\hat U, U$ be the $n \times K$ matrices of eigenvectors of A and P corresponding to the top-K eigenvalues. Then

$$\|e_i^{\mathsf T}(\hat U - U O_P)\|_F \le 23\, K^{1/2} \kappa(P) \frac{\|e_i^{\mathsf T} A\|_F \cdot \|A - P\|}{\lambda_K^2(P)} + \frac{\|e_i^{\mathsf T}(A - P) U\|_F}{\lambda_K(P)},$$

where $e_i$ is the vector of length n with 1 in the i-th position and $O_P$ is some orthogonal matrix.

Provable efficiency: Final theorem

Theorem (Panov et al., 2017)

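There exist constants c and C depending only on the condition numbers of the matrices B and $\Theta$ and the parameter r such that for $\rho \ge c\,\frac{\log n}{n}$ it holds with probability at least $1 - n^{-r}$ that

$$\frac{\|\hat\rho \hat B - \rho\, \Pi B \Pi^{\mathsf T}\|_F}{\|\rho B\|_F} \le CK\sqrt{\frac{\log n}{\rho n}} \quad\text{and}\quad \frac{\|\hat\Theta - \Theta \Pi^{\mathsf T}\|_F}{\|\Theta\|_F} \le CK\sqrt{\frac{\log n}{\rho n}},$$

where $\Pi$ is some permutation matrix and $\hat\rho$ is the maximal value of the matrix $\hat B$ before normalization.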
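The left-hand side of the $\Theta$ bound can be evaluated by brute force over the $K!$ permutations, which is practical only for small K (for larger K a linear-assignment solver could replace the loop). A sketch with the hypothetical name `relative_error_up_to_permutation`; the column reindexing `theta[:, perm]` equals $\Theta \Pi^{\mathsf T}$ for the permutation matrix with $\Pi_{k,\mathrm{perm}(k)} = 1$.

```python
import itertools
import numpy as np

def relative_error_up_to_permutation(theta_hat, theta):
    """min over permutation matrices Pi of ||theta_hat - theta Pi^T||_F / ||theta||_F."""
    K = theta.shape[1]
    best = np.inf
    for perm in itertools.permutations(range(K)):
        # theta[:, perm] equals theta @ Pi.T for Pi with Pi[k, perm[k]] = 1
        err = np.linalg.norm(theta_hat - theta[:, list(perm)])
        best = min(best, err)
    return best / np.linalg.norm(theta)

rng = np.random.default_rng(5)
theta = rng.dirichlet(np.ones(3), size=100)
theta_hat = theta[:, [2, 0, 1]]          # identical memberships, communities relabeled
err = relative_error_up_to_permutation(theta_hat, theta)
print(err)                               # 0.0
```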

Provable efficiency: Lower bound


Consider the MMSB model. Then there exists a constant $c > 0$ such that for $\rho \ge c\,\frac{\log n}{n}$ the following lower bounds for the matrices $\Theta$ and B hold:





‖Θ‖F≥ CΘ


)> 0.1,




(‖𝜌B− 𝜌B‖F

‖𝜌B‖F≥ CB



)> 0.1,

where $C_\Theta, C_B > 0$ are some constants.

Provable efficiency: Open question

We currently have a gap between the lower and upper bounds for the matrix B:


𝜌n≤ inf


‖𝜌B− 𝜌B‖F‖𝜌B‖F

≤ C1



The idea for an improved algorithm:

Experiments: Model data

Default parameter settings:

number of nodes n = 5000;

number of communities K = 3;

number of pure nodes: 3;

Dirichlet parameter 𝛼 = 1/3;

Community-community matrix B = diag(0.3, 0.5, 0.7).

We consider several experiments. Each experiment was repeated 20 times and the results were averaged over runs.

Experiments: Model data

Figure: Experiment with varying number of nodes n.

Experiments: Model data

Figure: Experiment with noisy off-diagonal elements of B.

Experiments: Model data

Figure: Experiment with skewed B matrix.

Experiments: Real data

Figure: Experiments on DBLP co-authorship networks.

Conclusions and outlook


We proposed SPOC, a computationally efficient algorithm for parameter estimation in MMSB.

Theoretical guarantees on performance are provided.

Neither the algorithm nor its analysis is perfect yet.

Outlook: it would be interesting to extend the results to the cases of

dynamical networks;

multiplex networks.

Maxim Panov (Skoltech) Overlapping community detection 15.05.2018 31 / 31