The Stochastic Topic Block Model for the Clustering of Vertices in Networks with Textual edges

Presenter: Wei JIANG
Authors: Bouveyron Charles, Latouche Pierre, Zreik Rawya
Machine Learning Journal Club, CMAP
September 28th 2017

Outline

1 Introduction

2 STBM Model

3 Inference

4 Experiments
- Simulation study
- Real-world data


Motivation

Communication between individuals via
- Social media: Facebook, Twitter, LinkedIn
- Electronic formats: email, web, e-publication

FIGURE – Social network diagram displaying friendship ties among a set of Facebook users

⇒ Network analysis


Network analysis for clustering users in email system

Directed graph
- Node: users
- Edge: sender -> recipient
- Task: clustering users based on person-to-person links only

How to improve ⇒ also take into account the email content.


Structure of paper

Problem: discovering clusters of vertices from both the network interactions and the text content.
Model: stochastic topic block model (STBM), a probabilistic model for networks with textual edges.
Inference: classification variational expectation-maximization (C-VEM).
Experiments:
- Simulated data to assess the approach and highlight its features.
- Real-world data sets to demonstrate its effectiveness.


Context and notations

Directed network with $M$ vertices.
$A$: the $M \times M$ adjacency matrix,

$$A_{ij} = \begin{cases} 1, & \text{if there is an edge from } i \text{ to } j \\ 0, & \text{otherwise} \end{cases}$$

If $A_{ij} = 1$, then this edge is characterized by a set of $D_{ij}$ documents: $W_{ij} = (W_{ij}^{d})_{d}$.

Each document is made of a collection of $N_{ij}^{d}$ words: $W_{ij}^{d} = (W_{ij}^{dn})_{n}$.

$W = (W_{ij})_{ij}$: the set of all documents exchanged over all the edges.


Goal

Cluster the vertices into $Q$ latent groups sharing the same connection profiles:
- Presence of edges
- Documents between pairs of vertices

⇔ estimate $Y = (Y_1, \dots, Y_M)$, where each latent variable $Y_i$ is such that

$$Y_{iq} = \begin{cases} 1, & \text{if vertex } i \text{ belongs to cluster } q \\ 0, & \text{otherwise} \end{cases}$$


Assumptions

- Any kind of relationship between two vertices can be explained by their latent clusters only.
- Words in documents are drawn from a mixture distribution over topics, each document $d$ having its own vector of topic proportions $\theta_d$.


Overview of STBM Model

FIGURE – Graphical representation of the stochastic topic block model


Modeling the presence of edges

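Stochastic block model (Wang & Wong 1987; Nowicki & Snijders 2001)

$$Y_i \sim \mathcal{M}(1, \rho = (\rho_1, \dots, \rho_Q))$$

$$A_{ij} \mid Y_{iq} Y_{jr} = 1 \sim \mathcal{B}(\pi_{qr})$$

where $\pi$ is the $Q \times Q$ matrix of connection probabilities.

⇒ $p(A, Y \mid \rho, \pi) = p(A \mid Y, \pi)\, p(Y \mid \rho)$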
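The two sampling steps of the SBM part can be illustrated with a short simulation. This is only a sketch in plain Python: the function name `sample_sbm`, the integer encoding of clusters (instead of one-hot vectors $Y_i$) and the exclusion of self-loops are assumptions made for the example, not details taken from the slides.

```python
import random

def sample_sbm(M, rho, pi, seed=0):
    """Sample cluster labels and a directed adjacency matrix from an SBM.

    rho: list of Q cluster proportions; pi: Q x Q connection probabilities.
    Clusters are encoded as integers 0..Q-1 (assumption for readability).
    """
    rng = random.Random(seed)
    Q = len(rho)
    # Y_i ~ Multinomial(1, rho): one cluster per vertex
    labels = rng.choices(range(Q), weights=rho, k=M)
    A = [[0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            # A_ij ~ Bernoulli(pi_qr) with q, r the clusters of i and j;
            # self-loops are skipped (assumption)
            if i != j and rng.random() < pi[labels[i]][labels[j]]:
                A[i][j] = 1
    return labels, A

# Two assortative communities: dense inside, sparse across
labels, A = sample_sbm(M=6, rho=[0.5, 0.5], pi=[[0.9, 0.1], [0.1, 0.9]])
```

With $\pi$ set to the identity matrix, every within-cluster edge is present and every between-cluster edge is absent, which gives a quick sanity check of the sampler.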


Modeling the construction of documents

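Latent Dirichlet Allocation (Blei et al. 2003)

Pair of clusters $(q, r)$ of vertices → vector of topic proportions

$$\theta_{qr} = (\theta_{qrk})_{k} \sim \mathrm{Dir}(\alpha = (\alpha_1, \dots, \alpha_K))$$

Here all components of $\alpha$ are fixed to 1.

The $n$th word $W_{ij}^{dn}$ of the $d$th document between vertices $i$ and $j$ → latent topic vector $Z_{ij}^{dn}$:

$$Z_{ij}^{dn} \mid \{Y_{iq} Y_{jr} A_{ij} = 1, \theta\} \sim \mathcal{M}(1, \theta_{qr} = (\theta_{qr1}, \dots, \theta_{qrK}))$$

$$W_{ij}^{dn} \mid Z_{ij}^{dnk} = 1 \sim \mathcal{M}(1, \beta_k = (\beta_{k1}, \dots, \beta_{kV}))$$

⇒ Mixture model for words over topics:

$$W_{ij}^{dn} \mid \{Y_{iq} Y_{jr} A_{ij} = 1, \theta\} \sim \sum_{k=1}^{K} \theta_{qrk}\, \mathcal{M}(1, \beta_k)$$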
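The generative steps above can be sketched as follows. The helper names `sample_dirichlet` and `sample_document`, the integer encoding of topics and words, and the use of normalized Gamma draws to obtain a Dirichlet sample are choices made for this illustration only.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_document(theta_qr, beta, n_words, rng):
    """Sample one document exchanged between a vertex of cluster q and one of cluster r.

    theta_qr: K topic proportions for the pair (q, r).
    beta: K x V matrix, beta[k][v] = probability of word v under topic k.
    Returns the latent topics Z and the words W, both as integer indices.
    """
    K, V = len(theta_qr), len(beta[0])
    # Z^{dn} ~ Multinomial(1, theta_qr)
    topics = rng.choices(range(K), weights=theta_qr, k=n_words)
    # W^{dn} | Z^{dnk} = 1 ~ Multinomial(1, beta_k)
    words = [rng.choices(range(V), weights=beta[k], k=1)[0] for k in topics]
    return topics, words

rng = random.Random(0)
theta = sample_dirichlet([1.0, 1.0, 1.0], rng)  # all components of alpha fixed to 1
beta = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]
topics, words = sample_document(theta, beta, n_words=10, rng=rng)
```

Marginalizing the topic draw out of `sample_document` gives exactly the mixture $\sum_k \theta_{qrk}\, \mathcal{M}(1, \beta_k)$ stated on the slide.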


Modeling the construction of documents

Assume
- All the latent variables $Z_{ij}^{dn}$ are sampled independently.
- Given the latent variables, the words $W_{ij}^{dn}$ are independent.

Denote $Z = (Z_{ij}^{dn})_{ijdn}$ ⇒ joint distribution

$$p(W, Z, \theta \mid A, Y, \beta) = p(W \mid A, Z, \beta)\, p(Z \mid A, Y, \theta)\, p(\theta)$$


STBM Model

The full joint distribution of the STBM model:

$$p(A, W, Y, Z, \theta \mid \rho, \pi, \beta) = p(W, Z, \theta \mid A, Y, \beta)\, p(A, Y \mid \rho, \pi)$$

FIGURE – Graphical representation of the stochastic topic block model


Examples

The simulated messages (150 words) are sampled from four texts from BBC news:

1 The birth of Princess Charlotte
2 Black holes in astrophysics
3 UK politics
4 Cancer diseases in medicine


Key Property of STBM Model

Assume that $Y$ is available.
- Reorganize the documents in $W$ such that $W = (\tilde{W}_{qr})_{qr}$, where $\tilde{W}_{qr}$ gathers the words of all documents on edges from cluster $q$ to cluster $r$.
  ⇒ All words in $\tilde{W}_{qr}$ share the same mixture distribution over topics.
  ⇒ Words in $W$ are drawn from an LDA model with $D = Q^2$ independent documents $\tilde{W}_{qr}$.
- $p(A, Y \mid \rho, \pi)$ involves the sampling of the clusters and the construction of the binary variables describing the presence of edges ⇒ it corresponds to the likelihood of the SBM model.

For a given $Y$, the full joint distribution factorizes into an LDA-like term and an SBM-like term.
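The reorganization behind this key property can be sketched in a few lines: once cluster labels are known, all words exchanged from cluster q to cluster r are pooled into one meta-document, giving at most Q² documents for a standard LDA. The data layout (a dict from edges to lists of tokenized documents) and the function name are assumptions made for the example.

```python
from collections import defaultdict

def aggregate_by_cluster_pair(documents, labels):
    """Pool the words of all edge documents by (cluster of sender, cluster of receiver).

    documents: dict mapping an edge (i, j) to a list of documents,
               each document being a list of words.
    labels: labels[i] is the integer cluster of vertex i.
    Returns a dict mapping (q, r) to the pooled list of words.
    """
    meta = defaultdict(list)
    for (i, j), docs in documents.items():
        for doc in docs:
            meta[(labels[i], labels[j])].extend(doc)
    return dict(meta)

documents = {
    (0, 1): [["gas", "price"]],
    (2, 1): [["trading"]],
    (1, 0): [["meeting"], ["room"]],
}
labels = [0, 1, 0]  # vertices 0 and 2 in cluster 0, vertex 1 in cluster 1
meta_docs = aggregate_by_cluster_pair(documents, labels)
# meta_docs == {(0, 1): ["gas", "price", "trading"], (1, 0): ["meeting", "room"]}
```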


Aim : maximize log-likelihood

First fix the number of groups $Q$ and the number of topics $K$.

$$\log p(A, W, Y \mid \rho, \pi, \beta) = \log \sum_{Z} \int_{\theta} p(A, W, Y, Z, \theta \mid \rho, \pi, \beta)\, d\theta$$

- Model parameters: $(\rho, \pi, \beta)$.
- $Z$ and $\theta$ are latent variables.
- $Y = (Y_1, \dots, Y_M)$ is seen as a set of binary vectors for which we aim at providing estimates (motivated by the key property of STBM).


Variational decomposition of log-likelihood

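$$\log p(A, W, Y \mid \rho, \pi, \beta) = \mathcal{L}(R(\cdot); Y, \rho, \pi, \beta) + \mathrm{KL}(R(\cdot) \,\|\, p(\cdot \mid A, W, Y, \rho, \pi, \beta))$$

KL: the Kullback-Leibler divergence between the approximate posterior distribution $R(\cdot)$ of $(Z, \theta)$ and the true posterior, given the data and model parameters.

$$\mathrm{KL}(R(\cdot) \,\|\, p(\cdot \mid A, W, Y, \rho, \pi, \beta)) = -\sum_{Z} \int_{\theta} R(Z, \theta) \log \frac{p(Z, \theta \mid A, W, Y, \rho, \pi, \beta)}{R(Z, \theta)}\, d\theta$$

⇒ The left-hand side does not depend on $R$, so maximizing the lower bound $\mathcal{L}$ w.r.t. $R(Z, \theta)$ amounts to minimizing the KL divergence.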
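The slide does not write out the lower bound itself. Under the usual variational construction it would take the form below; this is a reconstruction from the decomposition and the KL expression above, not a formula copied from the slides. Adding it to the KL term cancels $R(Z, \theta)$ inside the logarithm and leaves $\log p(A, W, Y \mid \rho, \pi, \beta)$, since $R$ integrates to one.

```latex
\mathcal{L}(R(\cdot); Y, \rho, \pi, \beta)
  = \sum_{Z} \int_{\theta} R(Z, \theta)
    \log \frac{p(A, W, Y, Z, \theta \mid \rho, \pi, \beta)}{R(Z, \theta)} \, d\theta
```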


Model decomposition of lower bound L

Recall the STBM property: given the set of latent variables $Y$, the full joint distribution decomposes into the sampling of $Y$ and $A$, plus the construction of the documents given $A$ and $Y$.

$$\mathcal{L}(R(\cdot); Y, \rho, \pi, \beta) = \tilde{\mathcal{L}}(R(\cdot); Y, \beta) + \log p(A, Y \mid \rho, \pi)$$

where

$$\tilde{\mathcal{L}}(R(\cdot); Y, \beta) = \sum_{Z} \int_{\theta} R(Z, \theta) \log \frac{p(W, Z, \theta \mid A, Y, \beta)}{R(Z, \theta)}\, d\theta$$

⇒ For a given $Y$, the two terms can be maximized independently.


C-VEM algorithm

Aim: maximize the lower bound $\mathcal{L}$.
The C-VEM algorithm alternates between the optimization of $R(Z, \theta)$, $Y$ and $(\rho, \pi, \beta)$.

1 Estimation of $R(Z, \theta)$
  Update $R(Z_{ij}^{dn})$ and $R(\theta)$ as in the E-step of VEM.

2 Estimation of the model parameters $(\rho, \pi, \beta)$ (M-step)
  Maximize the lower bound $\mathcal{L}$: $\beta$ appears only in $\tilde{\mathcal{L}}$; $\rho$ and $\pi$ appear only in the SBM log-likelihood.

3 Estimation of $Y$ (classification step)
  Fix $(\rho, \pi, \beta)$ and $R(Z, \theta)$ ⇒ find the $Y$ maximizing $\mathcal{L}$.
  Testing all $Q^M$ possible cluster assignments is not feasible ⇒ online clustering methods.
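Part of step 2 has a simple closed form. Because ρ and π only enter through the SBM log-likelihood and Y is a hard assignment, their maximum-likelihood estimates are empirical frequencies: cluster proportions, and the fraction of realized edges between each pair of clusters. The sketch below uses these standard estimates for an SBM with known labels; the function name, the integer label encoding and the exclusion of self-loops are assumptions, and the update of β is not covered.

```python
def sbm_m_step(A, labels, Q):
    """Closed-form estimates of (rho, pi) for an SBM with known hard labels.

    A: M x M 0/1 adjacency matrix (directed, self-loops ignored).
    labels: labels[i] in 0..Q-1 is the cluster of vertex i.
    """
    M = len(labels)
    counts = [0] * Q
    for q in labels:
        counts[q] += 1
    # rho_q: proportion of vertices assigned to cluster q
    rho = [c / M for c in counts]
    # Number of observed edges from cluster q to cluster r
    edges = [[0] * Q for _ in range(Q)]
    for i in range(M):
        for j in range(M):
            if i != j and A[i][j]:
                edges[labels[i]][labels[j]] += 1
    # pi_qr: observed edges divided by the number of ordered pairs (i, j), i != j
    pi = [[0.0] * Q for _ in range(Q)]
    for q in range(Q):
        for r in range(Q):
            pairs = counts[q] * counts[r] - (counts[q] if q == r else 0)
            pi[q][r] = edges[q][r] / pairs if pairs else 0.0
    return rho, pi

A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]
rho, pi = sbm_m_step(A, labels=[0, 0, 1, 1], Q=2)
# rho == [0.5, 0.5]; pi == [[1.0, 0.0], [0.25, 0.5]]
```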


Initialization strategy : Multiple initializations

(Biernacki et al. 2003)
For several initializations of a k-means-like algorithm on a distance matrix between vertices:

1 VEM for LDA is applied to all documents $i \to j$ ⇒ $X_{ij} = k$ if $k$ is the majority topic.

2 Distance matrix

$$\Delta(i, j) = \sum_{h=1}^{M} \delta(X_{ih} \neq X_{jh})\, A_{ih} A_{jh} + \sum_{h=1}^{M} \delta(X_{hi} \neq X_{hj})\, A_{hi} A_{hj}$$

For each third vertex $h$, look at the edges from $i$ and $j$ towards $h$ (first sum) and from $h$ towards $i$ and $j$ (second sum) ⇒ compare the edge types.

The distance counts the number of discordances in the way both $i$ and $j$ connect to other vertices, or other vertices connect to them.
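The distance Δ(i, j) translates directly into code. In this sketch, `A` is the adjacency matrix as nested lists and `X[i][h]` holds the majority topic of the documents on edge i → h (any value when the edge is absent); the function name and the data layout are assumptions for the example.

```python
def discordance_distance(A, X, i, j):
    """Number of third vertices h towards (or from) which i and j use different topics."""
    M = len(A)
    # First sum: i and j both send to h, but with different majority topics
    out_term = sum(
        1 for h in range(M) if A[i][h] and A[j][h] and X[i][h] != X[j][h]
    )
    # Second sum: h sends to both i and j, but with different majority topics
    in_term = sum(
        1 for h in range(M) if A[h][i] and A[h][j] and X[h][i] != X[h][j]
    )
    return out_term + in_term

A = [[0, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]
X = [[None, None, 0],
     [None, None, 1],
     [2, 2, None]]
d01 = discordance_distance(A, X, 0, 1)
# d01 == 1: vertices 0 and 1 write to vertex 2 on different topics (0 vs 1),
# while vertex 2 writes to both of them on the same topic (2).
```

The full matrix, obtained by calling the function for every pair of vertices, would then be passed to the k-means-like algorithm mentioned on the slide.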


Model selection

Model selection problem: estimating the number of groups $Q$ and the number of topics $K$.
Criterion: ICL (Biernacki et al. 2000)


Simulation study

Simulation setup

The simulated messages (150 words) are sampled from four texts from BBC news:

1 The birth of Princess Charlotte
2 Black holes in astrophysics
3 UK politics
4 Cancer diseases in medicine


Simulation study

Introductory example on scenario C

Run C-VEM for STBM on a network of scenario C with the actual number of groups and topics ⇒ both the network structure and the topic information should be correctly recovered.

FIGURE – Clustering result for the introductory example (scenario C)


Simulation study

Introductory example on scenario C

- Evolution of the lower bound $\mathcal{L}$ along the iterations (top left)
- The most frequent words in the 3 found topics (bottom left)
- The estimated model parameters $(\rho, \pi)$ (right)


Simulation study

Introductory example on scenario C

FIGURE – Summary of connection probabilities between groups ($\pi$, edge widths), group proportions ($\rho$, node sizes) and most probable topics for group interactions (edge colors).


Simulation study

Experiment on model selection

- Percentage of selections by ICL for each STBM model $(Q, K)$ on 50 simulated networks for each of the three scenarios.
- Highlighted rows and columns correspond to the actual values of $Q$ and $K$.


Simulation study

Benchmark study

Run SBM, LDA and STBM on 20 networks simulated according to the three scenarios. Average ARI values (Rand, 1971) are reported with standard deviations for both node and edge clustering.

Easy: same setup as the previous simulations of the three scenarios.

Hard 1: the communities are poorly differentiated ($\pi_{qq} = 0.25$ and $\pi_{q \neq r} = 0.2$).


Simulation study

Benchmark study

Hard 2: 40% of the message words are sampled from topics other than the actual topic.

The joint model of network structure and topics makes it possible to recover the complex hidden structure of a network with textual edges.
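The ARI scores reported in the benchmark compare an estimated partition with the simulated one, and are invariant to how clusters are numbered. A minimal pure-Python sketch of the usual pair-counting formula is given below; the function name is an assumption, and this is not the code used to produce the reported results.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # convention for degenerate input (assumption)
    # Pairs of items grouped together in both partitions
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each partition separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (index - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # 1.0, labels only renamed
worse = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # -0.5, worse than chance
```

A value of 1 means the two partitions agree exactly, and values near 0 correspond to chance-level agreement.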


Real-world data

Enron data set: https://www.cs.cmu.edu/~./enron/
- Email communications between 149 employees from 1999 to 2002.
- All messages sent between two individuals were merged into a single meta-message ⇒ 1234 directed edges.
- Run C-VEM for STBM with the number of groups $Q$ from 1 to 14 and the number of topics $K$ from 2 to 20 ⇒ model selection: $(Q, K) = (10, 5)$.

FIGURE – Clustering result with STBM on the Enron data set (Sept.-Dec. 2001)
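The preprocessing step that merges messages into meta-messages can be sketched as a simple grouping. The sketch assumes that each email is available as a (sender, recipient, text) triple and that messages are grouped per ordered pair, consistent with the directed edges mentioned above; the function name and the input format are hypothetical.

```python
from collections import defaultdict

def build_meta_messages(emails):
    """Merge all emails sharing the same (sender, recipient) pair into one text.

    emails: iterable of (sender, recipient, text) triples.
    Returns a dict mapping each directed edge to its meta-message.
    """
    grouped = defaultdict(list)
    for sender, recipient, text in emails:
        grouped[(sender, recipient)].append(text)
    return {edge: " ".join(texts) for edge, texts in grouped.items()}

emails = [
    ("alice", "bob", "gas delivery schedule"),
    ("alice", "bob", "updated schedule attached"),
    ("bob", "alice", "thanks"),
]
meta = build_meta_messages(emails)
# Two directed edges: ("alice", "bob") and ("bob", "alice")
```

Each key of the result is one directed edge of the network, and its value is the single document attached to that edge.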


Real-world data

Enron email network

FIGURE – Most specific words for the 5 found topics with STBM on the Enron data set.

1 Financial and trading activity
2 Enron activities in Afghanistan
3 California electricity crisis
4 Usual logistic issues (building equipment, computers, ...)
5 Technical discussions on gas deliveries


Real-world data

Enron email network

Group 10 contains a single individual who
- has a central place in the network
- frequently discusses logistic issues (topic 4) with groups 4, 5, 6 and 7.


Real-world data

Enron email network

- Group 8 contains 6 individuals who mainly communicate about Enron activities in Afghanistan (topic 2), among themselves and with other groups.
- Groups 4 and 6 are more focused on trading activities (topic 1).
- Groups 1, 3 and 9 deal with technical issues on gas deliveries (topic 5).


Real-world data

Enron email network

FIGURE – Clustering results with SBM (left, Q = 8) and STBM (right) on the Enron data set.

- Some clusters found by SBM (e.g. red) have been split by STBM, since some nodes use different topics than the rest.
- SBM isolates two "hubs" (light green), whereas STBM identifies a unique "hub"; the second one is grouped with other nodes using similar topics.


STBM allows a better and deeper understanding of the Enron network.


Real-world data

NIPS co-authorship network

- 1988-2003 editions (NIPS 1-17, http://robotics.stanford.edu/~gal/data.html)
- Contains the abstracts of 2484 accepted papers from 2740 contributing authors.
  ⇒ Undirected network between 2740 authors with 22640 textual edges.
- Model selection by ICL: $(Q, K) = (13, 7)$


Real-world data

NIPS co-authorship network

FIGURE – Clustering result with STBM on the Nips co-authorship network


Real-world data

NIPS co-authorship network

FIGURE – Most specific words for the 7 found topics with STBM on the NIPS co-authorship network.


Real-world data

NIPS co-authorship network

STBM has proved its ability to bring out concise and relevant analyses on the structure of a large and dense network.


Conclusion

- STBM: modeling and clustering vertices in networks with textual edges
  - directed or undirected networks
  - application to various types of networks
- C-VEM: model inference
- ICL: model selection
- Numerical experiments on simulated data
- Two real-world networks
  - large co-authorship network → scalability


Authors

Bouveyron Charles: http://w3.mi.parisdescartes.fr/~cbouveyr/

Pierre Latouche: http://samm.univ-paris1.fr/Pierre-Latouche

Zreik Rawya: http://samm.univ-paris1.fr/Rawya-ZREIK


"Linkage"

https://linkage.fr/
