Conjecture
DPE for Graph Classification consistently finds $J$.
Proof. In progress...

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subset \{0,1\}^{n_v \times n_v}$
Latent In-Vectors: $X : \Omega \to \mathcal{X} \subseteq \mathbb{R}_+^{d \times n_v}$
Latent Out-Vectors: $Y : \Omega \to \mathcal{Y} \subseteq \mathbb{R}_+^{d \times n_v}$

Parameter $\theta = (\rho, \tau)$
In- and Out-Vector Likelihoods: $\rho_X, \rho_Y \in \Delta_3$
Block Membership Function: $\tau : [n_v] \to [3]$

Sampling Distribution
$(A, Y), \; D_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \sim F_{A,Y}$
$F_{A,Y} = \prod_{(u,v) \in E} \mathrm{Bern}(a_{uv}; \langle X_u, Y_v \rangle)\, \rho_X(\tau_u)\, \rho_Y(\tau_v) \in \mathcal{F}_{A,Y}$

Input: $A$, $D_{n_s}$, $d$
Output: $\hat{y}$, $\hat{\tau}$ (and nuisance parameters $\hat{\rho}_X$ and $\hat{\rho}_Y$)
1: Let $\bar{A}_y = \frac{1}{n_y} \sum_{i : y_i = y} A_i$ be the average adjacency matrix for class $y$.
2: Let $[\tilde{U}_y, \tilde{D}_y, \tilde{V}_y] = \mathrm{SVD}(\bar{A}_y)$, keeping only the $d$ triplets with the largest singular values.
3: Cluster $\tilde{U}$ and $\tilde{V}$ using a perfect $K$-means clustering algorithm, forcing one cluster to contain vertices from all classes and one cluster per class.
4: Let $\hat{\tau}$ be the cluster assignment of each vertex.
5: Do a DPE for $A$, and cluster each vertex accordingly.
Let $J$ be the cluster of vertices that are informative with regard to the classification task.
6: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{u,v \in J} \mathrm{Bern}(a_{uv}; \langle \hat{X}_u, \hat{Y}_v \rangle)\, \hat{\rho}_X(\hat{\tau}_u)\, \hat{\rho}_Y(\hat{\tau}_v)$

Definitions
Adjacency Matrices: $A, B \in \mathbb{R}^{n \times n}$
Permutation Matrices: $\mathcal{Q} = \{Q : Q\mathbf{1} = \mathbf{1},\; Q^{\mathsf{T}}\mathbf{1} = \mathbf{1},\; Q \in \{0,1\}^{n \times n}\}$
Doubly Stochastic Matrices: $\mathcal{D} = \{D : D\mathbf{1} = \mathbf{1},\; D^{\mathsf{T}}\mathbf{1} = \mathbf{1},\; d_{uv} \geq 0\}$

Objective Function
(QAP) $\hat{Q} = \operatorname{argmin}_{Q \in \mathcal{Q}} \|A - QBQ^{\mathsf{T}}\|_F = \operatorname{argmax}_{Q \in \mathcal{Q}} \langle A, QBQ^{\mathsf{T}} \rangle$

Input: $A$, $B$
Output: $\hat{Q}$
1: for $i = 1, \ldots, i_{\max}$ do
2: Let $\hat{Q}_i^{(1)}$ be either $\mathbf{1}\mathbf{1}^{\mathsf{T}}/n$, $I$, or something near $I$.
3: Use the Frank-Wolfe algorithm, initialized at $\hat{Q}_i^{(1)}$, to find a local optimum of the relaxed quadratic assignment problem: (rQAP) $\hat{D}_i = \operatorname{argmax}_{D \in \mathcal{D}} \langle A, DBD^{\mathsf{T}} \rangle$.
4: Project $\hat{D}_i$ onto $\mathcal{Q}$ using the Hungarian algorithm to obtain $\hat{Q}_i$.
5: end for
6: Let $\hat{Q} = \operatorname{argmax}_{i \in [i_{\max}]} \langle A, \hat{Q}_i B \hat{Q}_i^{\mathsf{T}} \rangle$

Theorem
rQAP has the same optimum as QAP whenever $A$ and $B$ are the adjacency matrices of simple graphs isomorphic to one another.
Proof. The set of doubly stochastic matrices is the convex hull of the set of permutation matrices, so if a permutation matrix optimizes rQAP then it also solves QAP. Moreover, $\langle A, A \rangle = 2m$ (where $m = \frac{1}{2}\sum_{u,v} a_{uv}$ is the number of edges). It therefore suffices to show that $\langle A, DBD^{\mathsf{T}} \rangle \leq \langle A, A \rangle = 2m$ for every $D \in \mathcal{D}$, which follows because $(DBD^{\mathsf{T}})_{uv} \leq 1$.
NB: This parallels rLAP being equivalent to LAP.

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Permutation Matrix: $Q : \Omega \to \mathcal{Q} = \{Q : q_{uv} \in \{0,1\},\; Q\mathbf{1} = \mathbf{1},\; Q^{\mathsf{T}}\mathbf{1} = \mathbf{1}\}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$

Sampling Distribution
$F_{Q,A,Y}(a, y; \theta) = F_Q\, F_{A|Y}\, F_Y = F_{A|Y}\, F_Y\, \mathrm{Uni}(\mathcal{Q})$
$(Q, A, Y), \; D_{n_s} = \{(Q_i, A_i, Y_i)\}_{i \in [n_s]} \overset{iid}{\sim} F_{Q,A,Y} \in \mathcal{F}_{Q,A,Y}$

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$
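The isomorphism theorem above can be checked directly on small graphs: when $A$ and $B$ are adjacency matrices of isomorphic simple graphs, the QAP objective $\min_{Q \in \mathcal{Q}} \|A - QBQ^{\mathsf{T}}\|_F^2$ attains zero. A minimal stdlib-only sketch (not part of the poster; `qap_distance` is an illustrative name, and exhaustive search stands in for the Frank-Wolfe/Hungarian pipeline, so it is only feasible for very small $n$):

```python
# Brute-force evaluation of the QAP objective for small graphs.
from itertools import permutations

def qap_distance(A, B):
    """min over permutation matrices Q of ||A - Q B Q^T||_F^2, by exhaustion."""
    n = len(A)
    best = float("inf")
    for perm in permutations(range(n)):
        # For the permutation matrix Q with Q[u][perm[u]] = 1,
        # (Q B Q^T)[u][v] = B[perm[u]][perm[v]].
        d = sum((A[u][v] - B[perm[u]][perm[v]]) ** 2
                for u in range(n) for v in range(n))
        best = min(best, d)
    return best

# A 4-cycle and a relabeled 4-cycle are isomorphic, so the distance is 0.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
B = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
```

For non-isomorphic graphs the minimum is strictly positive, which is what the graph-matched Frobenius distance $\tilde{\delta}_i$ exploits for classification.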
Parameter $\theta = (P, \pi, S)$
Edge Probabilities: $P = (p_{uv|y}) \in (0,1)^{n_v \times n_v \times n_y}$
Class Priors: $\pi = \{\pi_1, \ldots, \pi_{n_y}\} \in \Delta_{n_y}$
Signal Subgraph: $S = \{(u,v) : p_{uv|y_i} \neq p_{uv|y_j} \;\; \forall\, y_i \neq y_j\} \subseteq E$

Sampling Distribution
$F_{A,Y}(a, y; \theta) = \pi_y \prod_{(u,v) \in S} \mathrm{Bern}(a_{uv}; p_{uv|y}) \times \prod_{(u,v) \in E \setminus S} \mathrm{Bern}(a_{uv}; p_{uv})$
$(A, Y), \; D_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \overset{iid}{\sim} F_{A,Y} \in \mathcal{F}_{A,Y}$

Let $\hat{L}_{\tilde{\delta}_s}$ be the misclassification rate of the $k_s$-nearest-neighbor algorithm below, and let $\tilde{L}^*$ be the Bayes optimal misclassification rate for shuffled graphs.

Theorem
$\hat{L}_{\tilde{\delta}_s} \to \tilde{L}^*$ as $s \to \infty$.
Proof. Because the joint space of adjacency matrices, permutation matrices, and graph classes has finite cardinality, the law of large numbers ensures that, as $s \to \infty$, the plurality of nearest neighbors to a test graph will eventually be identical to the test graph.

Theorem
$\hat{S} \to S$ as $n_s \to \infty$.
Proof. $\mathcal{A}$ and $\mathcal{Y}$ are finite, so by the law of large numbers, $T_{(i)} \to \epsilon > 0$ for all $i \in S$ and $T_{(i)} \to 0$ for all $i \notin S$.

Graph-Matched Frobenius Norm $k_s$-Nearest-Neighbor Algorithm
Input: $A$; a rule for $k_s$ such that $k_s/s \to 0$ and $k_s \to \infty$ as $s \to \infty$; $D_{n_s}$
Output: $\hat{y}$
1: Compute the graph-matched Frobenius norm distance between $A$ and each training graph: $\tilde{\delta}_i = \min_{Q \in \mathcal{Q}} \|A - Q A_i Q^{\mathsf{T}}\|_F^2$.
2: Rank the distances in increasing order: $\tilde{\delta}_{(1)} \leq \tilde{\delta}_{(2)} \leq \cdots \leq \tilde{\delta}_{(n_s)}$.
3: Let $\hat{y}$ be the plurality class among the $k_s$ nearest neighbors: $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \sum_{i : y_i = y} \mathbb{I}\{\tilde{\delta}_i \leq \tilde{\delta}_{(k_s)}\}$.

Input: $A$, $D_{n_s}$, number of signal edges $s$ and number of signal vertices $m$
Output: $\hat{y}$, $\hat{S}$ (and nuisance parameters $\hat{P}$, $\hat{\pi}$)
1: Compute the significance of each edge using Fisher's exact test on $D_{n_s}$, yielding $T_{(1)} \geq T_{(2)} \geq \cdots \geq T_{(n_E)}$.
2: Rank the edges incident to each vertex by significance: $E_{k,(1)} \geq E_{k,(2)} \geq \cdots \geq E_{k,(n-1)}$ for all $k \in V$.
3: while not converged do
4: Increase the critical value $c$ from $T_{(i)}$ to $T_{(i+1)}$.
5: Compute the vertex score $w_{v;c} = \sum_{u \in [V]} \mathbb{I}\{T_{v,u} > c\}$ for each vertex.
6: Converge if $\sum_{v \in [m]} w_{v;c} \geq s$, summing over the $m$ best-scoring vertices.
7: end while
8: Let $\hat{S}$ be the set of $s$ most significant edges incident to the $m$ best-scoring vertices.
9: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{(u,v) \in \hat{S}} \mathrm{Bern}(a_{uv}; \hat{p}_{uv|y})\, \hat{\pi}_y$

DPE for Graph Classification
Setting: We observe a collection of graphs and their associated classes. The vertices may be labeled or unlabeled. We assume that only a subset of vertices is informative with regard to the classification task.
Goal: For a novel graph, find its most likely class and the vertices that encode the class-conditional signal.
Statistical Connectomics Application: Classify arbitrarily large graphs, including those with and without vertex labels, without requiring graph matching or estimating $O(n^2)$ parameters.

Fast Inexact Graph Matching
Setting: We observe a pair of unlabeled graphs.
Goal: Find the isomorphism that optimally matches the graphs.
Statistical Connectomics Application: A subroutine of our shuffled graph classifier.

Shuffled Graph Classification
Setting: We observe a collection of graphs without labeled vertices, together with their associated graph classes.
Goal: For a novel graph, find its most likely class.
Statistical Connectomics Application: Classify brain-graphs whose vertices lack labels. This includes collections of brain-graphs across species, or whenever vertices represent vertebrate neurons.

Labeled Graph Classification
Setting: We observe a collection of graphs with labeled vertices and associated graph classes. We assume that only a subset of edges/vertices is informative with regard to the classification task.
Goal: For a novel graph, find its most likely class and the edges/vertices that encode the class-conditional signal.
Statistical Connectomics Application: Classify brain-graphs whose vertices are labeled (for example, invertebrate brain-graphs where vertices represent neurons, or vertebrate brain-graphs where vertices represent brain regions), and find which edges/vertices encode various cognitive/behavioral properties.
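Step 1 of the signal subgraph estimator scores each edge by the significance of a Fisher exact test on its class-conditional occurrence counts across the training graphs. A stdlib-only sketch of that per-edge test (not from the poster; `fisher_exact_p` is an illustrative name, computing the standard two-sided hypergeometric p-value for one edge's 2x2 table):

```python
# Two-sided Fisher exact test p-value for one edge's 2x2 contingency table:
# rows = graph class (y = 0, y = 1), columns = edge present / edge absent.
from math import comb

def fisher_exact_p(a, b, c, d):
    """p-value for the table [[a, b], [c, d]] under the hypergeometric null."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def hyper(k):  # P(top-left cell = k) with all margins fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    p_obs = hyper(a)
    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    # two-sided: sum the probabilities of all tables at most as likely as observed
    return sum(hyper(k) for k in range(lo, hi + 1) if hyper(k) <= p_obs + 1e-12)

# An edge present in 9/10 class-0 graphs but only 1/10 class-1 graphs gets a
# small p-value, so it ranks as highly significant and is a candidate for S-hat.
p = fisher_exact_p(9, 1, 1, 9)
```

Ranking the edges by these p-values (smallest first) yields the significance ordering $T_{(1)} \geq T_{(2)} \geq \cdots$ used by both the incoherent and coherent estimators.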
[Figure: (left) missed-edge rate and misclassification rate vs. number of training samples for the coherent (coh), incoherent (inc), and naive Bayes (nb) estimators, with $\hat{L}_{coh} = 0.16$, $\hat{L}_{inc} = 0.27$, $\hat{L}_{nb} = 0.41$, $\hat{L}_{\hat{\pi}} = 0.5$; (right) misclassification rate vs. assumed number of signal edges and signal vertices, the coherent signal subgraph estimate, and the threshold coherogram.]
Figure: (Left) Simulation demonstrating that the coherent classifier outperforms the incoherent classifiers as a function of sample size. (Right) MR connectome sex signal subgraph estimation and analysis. By cross-validating over hyperparameters and models, we estimate that the "best" coherent signal subgraph (for this inference task on these data) has $\hat{m}_{coh} = 12$ and $\hat{s}_{coh} = 360$, achieving $\hat{L}_{coh} = 0.16$.

[Figure: Approximate QAP performance on the QAP benchmark library (chr12c, chr15a, chr15c, chr20b, chr22b, esc16b, rou12, rou15, rou20, tai10a, tai15a, tai17a, tai20a, tai30a, tai35a, tai40a), comparing QAP with 100, 3, and 1 restarts against PSOA, with results on the chemical and electrical connectomes:]

                 chemical   electrical   unit
Accuracy         100 (0)    59 (0.30)    %
Restarts         3 (0)      25 (6.7)     #
Solution Time    42 (0.42)  79 (20)      sec.

[Figure: Connectome classifier comparison; misclassification rate vs. number of training samples for $\delta$, $\tilde{\delta}$, $\hat{\pi}$, and $\delta_0$.]
Figure: Connectome misclassification rates for various classifiers. For each $s$, 2000 Monte Carlo sub-samples of the data were drawn, so that error bars are negligibly small.
Five classifiers were compared: $\delta$ is the $k_s$NN classifier on labeled graphs; $\tilde{\delta}$ is the graph-matched $k_s$NN on shuffled graphs; another is the $k_s$NN on a collection of graph invariants; $\hat{\pi}$ is chance; and $\delta_0$ is the $k_s$NN on shuffled graphs (without graph matching).

Large Graph Classification: Theory and Statistical Connectomics Applications
Joshua T. Vogelstein, Donniell E. Fishkind, Daniel L. Sussman & Carey E. Priebe | JHU, Dept of Applied Math & Statistics