Conjecture
DPE for Graph Classification consistently finds $J$.
Proof. In progress...

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subset \{0,1\}^{n_v \times n_v}$
Latent In-Vectors: $X : \Omega \to \mathcal{X} \subseteq \mathbb{R}_+^{d \times n_v}$
Latent Out-Vectors: $Y : \Omega \to \mathcal{Y} \subseteq \mathbb{R}_+^{d \times n_v}$

Parameter $\theta = (\rho, \tau)$
In- and Out-Vector Likelihoods: $\rho_X, \rho_Y \in \Delta_3$
Block Membership Function: $\tau : [n_v] \to [3]$

Sampling Distribution
$(A, Y), \; D_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \sim F_{A,Y}$
$F_{A,Y} = \prod_{(u,v) \in E} \mathrm{Bern}(a_{uv}; \langle X_u, Y_v \rangle)\, \rho_X(\tau_u)\, \rho_Y(\tau_v) \in \mathcal{F}_{A,Y}$

Input: $A$, $D_{n_s}$, $d$
Output: $\hat{y}$, $\hat{\tau}$ (and nuisance parameters $\hat{\rho}_X$ and $\hat{\rho}_Y$)
1: Let $\bar{A}_y = \frac{1}{n_y} \sum_{i : y_i = y} A_i$ be the average adjacency matrix for class $y$.
2: Let $[\tilde{U}_y, \tilde{D}_y, \tilde{V}_y] = \mathrm{SVD}(\bar{A}_y)$, keeping only the $d$ triplets with the largest singular values.
3: Cluster $\tilde{U}$ and $\tilde{V}$ using a perfect $K$-means clustering algorithm, forcing one cluster to contain vertices from all classes and one cluster per class.
4: Let $\hat{\tau}$ be the cluster assignment of each vertex.
5: Do a DPE for $A$, and cluster each vertex accordingly.
Let $J$ be the cluster of vertices that are informative with regard to the classification task.
6: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{u,v \in J} \mathrm{Bern}(a_{uv}; \langle \hat{X}_u, \hat{Y}_v \rangle)\, \hat{\rho}_X(\hat{\tau}_u)\, \hat{\rho}_Y(\hat{\tau}_v)$

Definitions
Adjacency Matrices: $A, B \in \mathbb{R}^{n \times n}$
Permutation Matrices: $\mathcal{Q} = \{Q : Q\mathbf{1} = \mathbf{1},\; Q^{\mathsf{T}}\mathbf{1} = \mathbf{1},\; Q \in \{0,1\}^{n \times n}\}$
Doubly Stochastic Matrices: $\mathcal{D} = \{D : D\mathbf{1} = \mathbf{1},\; D^{\mathsf{T}}\mathbf{1} = \mathbf{1},\; d_{uv} \geq 0\}$

Objective Function
(QAP) $\hat{Q} = \operatorname{argmin}_{Q \in \mathcal{Q}} \|A - QBQ^{\mathsf{T}}\|_F = \operatorname{argmax}_{Q \in \mathcal{Q}} \langle A, QBQ^{\mathsf{T}} \rangle$

Input: $A$, $B$
Output: $\hat{Q}$
1: for $i = 1, \ldots, i_{\max}$ do
2: Let $\hat{Q}_i^{(1)}$ be either $\mathbf{1}\mathbf{1}^{\mathsf{T}}/n$, $I$, or something near $I$.
3: Use the Frank-Wolfe algorithm, initialized at $\hat{Q}_i^{(1)}$, to find a local optimum of the relaxed quadratic assignment problem: (rQAP) $\hat{D}_i = \operatorname{argmax}_{D \in \mathcal{D}} \langle A, DBD^{\mathsf{T}} \rangle$.
4: Project $\hat{D}_i$ onto $\mathcal{Q}$ using the Hungarian algorithm to obtain $\hat{Q}_i$.
5: end for
6: Let $\hat{Q} = \operatorname{argmax}_{i \in [i_{\max}]} \langle A, \hat{Q}_i B \hat{Q}_i^{\mathsf{T}} \rangle$

Theorem
rQAP has the same optimum as QAP whenever $A$ and $B$ are the adjacency matrices of simple graphs isomorphic to one another.
Proof. The set of doubly stochastic matrices is the convex hull of the set of permutation matrices, so if a permutation matrix optimizes rQAP then it also solves QAP. Moreover, $\langle A, A \rangle = 2m$ (where $m = \frac{1}{2}\sum_{u,v} a_{uv}$ is the number of edges). It therefore suffices to show that $\langle A, DBD^{\mathsf{T}} \rangle \leq \langle A, A \rangle = 2m$ for every $D \in \mathcal{D}$, which follows because $(DBD^{\mathsf{T}})_{uv} \leq 1$.
NB: This parallels rLAP being equivalent to LAP.

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Permutation Matrix: $Q : \Omega \to \mathcal{Q} = \{Q : q_{uv} \in \{0,1\},\; Q\mathbf{1} = \mathbf{1},\; Q^{\mathsf{T}}\mathbf{1} = \mathbf{1}\}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$

Sampling Distribution
$F_{Q,A,Y}(a, y; \theta) = F_Q\, F_{A|Y}\, F_Y = F_{A|Y}\, F_Y\, \mathrm{Uni}(\mathcal{Q})$
$(Q, A, Y), \; D_{n_s} = \{(Q_i, A_i, Y_i)\}_{i \in [n_s]} \overset{iid}{\sim} F_{Q,A,Y} \in \mathcal{F}_{Q,A,Y}$

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$
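The isomorphism theorem above can be checked directly on small graphs: when $A$ and $B$ are adjacency matrices of isomorphic simple graphs, the QAP objective $\min_{Q \in \mathcal{Q}} \|A - QBQ^{\mathsf{T}}\|_F^2$ attains zero. A minimal stdlib-only sketch (not part of the poster; `qap_distance` is an illustrative name, and exhaustive search stands in for the Frank-Wolfe/Hungarian pipeline, so it is only feasible for very small $n$):

```python
# Brute-force evaluation of the QAP objective for small graphs.
from itertools import permutations

def qap_distance(A, B):
    """min over permutation matrices Q of ||A - Q B Q^T||_F^2, by exhaustion."""
    n = len(A)
    best = float("inf")
    for perm in permutations(range(n)):
        # For the permutation matrix Q with Q[u][perm[u]] = 1,
        # (Q B Q^T)[u][v] = B[perm[u]][perm[v]].
        d = sum((A[u][v] - B[perm[u]][perm[v]]) ** 2
                for u in range(n) for v in range(n))
        best = min(best, d)
    return best

# A 4-cycle and a relabeled 4-cycle are isomorphic, so the distance is 0.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
B = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
```

For non-isomorphic graphs the minimum is strictly positive, which is what the graph-matched Frobenius distance $\tilde{\delta}_i$ exploits for classification.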
Parameter $\theta = (P, \pi, S)$
Edge Probabilities: $P = (p_{uv|y}) \in (0,1)^{n_v \times n_v \times n_y}$
Class Priors: $\pi = \{\pi_1, \ldots, \pi_{n_y}\} \in \Delta_{n_y}$
Signal Subgraph: $S = \{(u,v) : p_{uv|y_i} \neq p_{uv|y_j} \;\; \forall\, y_i \neq y_j\} \subseteq E$

Sampling Distribution
$F_{A,Y}(a, y; \theta) = \pi_y \prod_{(u,v) \in S} \mathrm{Bern}(a_{uv}; p_{uv|y}) \times \prod_{(u,v) \in E \setminus S} \mathrm{Bern}(a_{uv}; p_{uv})$
$(A, Y), \; D_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \overset{iid}{\sim} F_{A,Y} \in \mathcal{F}_{A,Y}$

Let $\hat{L}_{\tilde{\delta}_s}$ be the misclassification rate of the $k_s$-nearest-neighbor algorithm below, and let $\tilde{L}^*$ be the Bayes optimal misclassification rate for shuffled graphs.

Theorem
$\hat{L}_{\tilde{\delta}_s} \to \tilde{L}^*$ as $s \to \infty$.
Proof. Because the joint space of adjacency matrices, permutation matrices, and graph classes has finite cardinality, the law of large numbers ensures that, as $s \to \infty$, the plurality of nearest neighbors to a test graph will eventually be identical to the test graph.

Theorem
$\hat{S} \to S$ as $n_s \to \infty$.
Proof. $\mathcal{A}$ and $\mathcal{Y}$ are finite, so by the law of large numbers, $T_{(i)} \to \epsilon > 0$ for all $i \in S$ and $T_{(i)} \to 0$ for all $i \notin S$.

Graph-Matched Frobenius Norm $k_s$-Nearest-Neighbor Algorithm
Input: $A$; a rule for $k_s$ such that $k_s/s \to 0$ and $k_s \to \infty$ as $s \to \infty$; $D_{n_s}$
Output: $\hat{y}$
1: Compute the graph-matched Frobenius norm distance between $A$ and each training graph: $\tilde{\delta}_i = \min_{Q \in \mathcal{Q}} \|A - Q A_i Q^{\mathsf{T}}\|_F^2$.
2: Rank the distances in increasing order: $\tilde{\delta}_{(1)} \leq \tilde{\delta}_{(2)} \leq \cdots \leq \tilde{\delta}_{(n_s)}$.
3: Let $\hat{y}$ be the plurality class among the $k_s$ nearest neighbors: $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \sum_{i : y_i = y} \mathbb{I}\{\tilde{\delta}_i \leq \tilde{\delta}_{(k_s)}\}$.

Input: $A$, $D_{n_s}$, number of signal edges $s$ and number of signal vertices $m$
Output: $\hat{y}$, $\hat{S}$ (and nuisance parameters $\hat{P}$, $\hat{\pi}$)
1: Compute the significance of each edge using Fisher's exact test on $D_{n_s}$, yielding $T_{(1)} \geq T_{(2)} \geq \cdots \geq T_{(n_E)}$.
2: Rank the edges incident to each vertex by significance: $E_{k,(1)} \geq E_{k,(2)} \geq \cdots \geq E_{k,(n-1)}$ for all $k \in V$.
3: while not converged do
4: Increase the critical value $c$ from $T_{(i)}$ to $T_{(i+1)}$.
5: Compute the vertex score $w_{v;c} = \sum_{u \in [V]} \mathbb{I}\{T_{v,u} > c\}$ for each vertex.
6: Converge if $\sum_{v \in [m]} w_{v;c} \geq s$, summing over the $m$ best-scoring vertices.
7: end while
8: Let $\hat{S}$ be the set of $s$ most significant edges incident to the $m$ best-scoring vertices.
9: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{(u,v) \in \hat{S}} \mathrm{Bern}(a_{uv}; \hat{p}_{uv|y})\, \hat{\pi}_y$

DPE for Graph Classification
Setting: We observe a collection of graphs and their associated classes. The vertices may be labeled or unlabeled. We assume that only a subset of vertices is informative with regard to the classification task.
Goal: For a novel graph, find its most likely class and the vertices that encode the class-conditional signal.
Statistical Connectomics Application: Classify arbitrarily large graphs, including those with and without vertex labels, without requiring graph matching or estimating $O(n^2)$ parameters.

Fast Inexact Graph Matching
Setting: We observe a pair of unlabeled graphs.
Goal: Find the isomorphism that optimally matches the graphs.
Statistical Connectomics Application: A subroutine of our shuffled graph classifier.

Shuffled Graph Classification
Setting: We observe a collection of graphs without labeled vertices, together with their associated graph classes.
Goal: For a novel graph, find its most likely class.
Statistical Connectomics Application: Classify brain-graphs whose vertices lack labels. This includes collections of brain-graphs across species, or whenever vertices represent vertebrate neurons.

Labeled Graph Classification
Setting: We observe a collection of graphs with labeled vertices and associated graph classes. We assume that only a subset of edges/vertices is informative with regard to the classification task.
Goal: For a novel graph, find its most likely class and the edges/vertices that encode the class-conditional signal.
Statistical Connectomics Application: Classify brain-graphs whose vertices are labeled (for example, invertebrate brain-graphs where vertices represent neurons, or vertebrate brain-graphs where vertices represent brain regions), and find which edges/vertices encode various cognitive/behavioral properties.
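Step 1 of the signal subgraph estimator scores each edge by the significance of a Fisher exact test on its class-conditional occurrence counts across the training graphs. A stdlib-only sketch of that per-edge test (not from the poster; `fisher_exact_p` is an illustrative name, computing the standard two-sided hypergeometric p-value for one edge's 2x2 table):

```python
# Two-sided Fisher exact test p-value for one edge's 2x2 contingency table:
# rows = graph class (y = 0, y = 1), columns = edge present / edge absent.
from math import comb

def fisher_exact_p(a, b, c, d):
    """p-value for the table [[a, b], [c, d]] under the hypergeometric null."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def hyper(k):  # P(top-left cell = k) with all margins fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    p_obs = hyper(a)
    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    # two-sided: sum the probabilities of all tables at most as likely as observed
    return sum(hyper(k) for k in range(lo, hi + 1) if hyper(k) <= p_obs + 1e-12)

# An edge present in 9/10 class-0 graphs but only 1/10 class-1 graphs gets a
# small p-value, so it ranks as highly significant and is a candidate for S-hat.
p = fisher_exact_p(9, 1, 1, 9)
```

Ranking the edges by these p-values (smallest first) yields the significance ordering $T_{(1)} \geq T_{(2)} \geq \cdots$ used by both the incoherent and coherent estimators.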
[Figure: (left) missed-edge rate and misclassification rate vs. number of training samples for the coherent (coh), incoherent (inc), and naive Bayes (nb) estimators, with $\hat{L}_{coh} = 0.16$, $\hat{L}_{inc} = 0.27$, $\hat{L}_{nb} = 0.41$, $\hat{L}_{\hat{\pi}} = 0.5$; (right) misclassification rate vs. assumed number of signal edges and signal vertices, the coherent signal subgraph estimate, and the threshold coherogram.]
Figure: (Left) Simulation demonstrating that the coherent classifier outperforms the incoherent classifiers as a function of sample size. (Right) MR connectome sex signal subgraph estimation and analysis. By cross-validating over hyperparameters and models, we estimate that the "best" coherent signal subgraph (for this inference task on these data) has $\hat{m}_{coh} = 12$ and $\hat{s}_{coh} = 360$, achieving $\hat{L}_{coh} = 0.16$.

[Figure: Approximate QAP performance on the QAP benchmark library (chr12c, chr15a, chr15c, chr20b, chr22b, esc16b, rou12, rou15, rou20, tai10a, tai15a, tai17a, tai20a, tai30a, tai35a, tai40a), comparing QAP with 100, 3, and 1 restarts against PSOA, with results on the chemical and electrical connectomes:]

                 chemical   electrical   unit
Accuracy         100 (0)    59 (0.30)    %
Restarts         3 (0)      25 (6.7)     #
Solution Time    42 (0.42)  79 (20)      sec.

[Figure: Connectome classifier comparison; misclassification rate vs. number of training samples for $\delta$, $\tilde{\delta}$, $\hat{\pi}$, and $\delta_0$.]
Figure: Connectome misclassification rates for various classifiers. For each $s$, 2000 Monte Carlo sub-samples of the data were drawn, so that error bars are negligibly small.
Five classifiers were compared: $\delta$ is the $k_s$NN classifier on labeled graphs; $\tilde{\delta}$ is the graph-matched $k_s$NN on shuffled graphs; another is the $k_s$NN on a collection of graph invariants; $\hat{\pi}$ is chance; and $\delta_0$ is the $k_s$NN on shuffled graphs (without graph matching).

Large Graph Classification: Theory and Statistical Connectomics Applications
Joshua T. Vogelstein, Donniell E. Fishkind, Daniel L. Sussman & Carey E. Priebe | JHU, Dept of Applied Math & Statistics