Saurabh Nagrecha and Nitesh V. Chawla, University of Notre Dame
Recurrent Subgraph Prediction: PReSub
Horst Bunke, University of Bern
Interactions in networks aren’t always dyadic.
Fig 1a: Simple group messages (hub-and-spoke).
Fig 1b: More complex hierarchies in businesses.
Influence in social networks – [Leskovec, J., et al. Advances in Knowledge Discovery and Data Mining (2006)]
Inferring attacks in anonymized social networks – [Backstrom, L., et al. WWW (2007)]
Functional Discovery in biological networks – [Hu, H., et al. Bioinformatics (2005)]
Fig 2: Weekly snapshots of the Enron email corpus.
Source Node Destination Node t
432 23432 54
4254 437854 54
473743 32 55
93535 35443 55
Timeline of snapshots: t_(n-1), t_n, t_(n+1)
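As a rough illustration of how an edge list like the one above could be turned into snapshots, the sketch below groups rows by their t value. The function name `build_snapshots` and the choice to store each snapshot as a set of (source, destination) pairs are assumptions made for this example, not details taken from the slides.

```python
from collections import defaultdict

# Edge list rows copied from the table above: (source, destination, t).
edges = [
    (432, 23432, 54),
    (4254, 437854, 54),
    (473743, 32, 55),
    (93535, 35443, 55),
]

def build_snapshots(edge_rows):
    """Group edges by timestamp; each snapshot is a set of (src, dst) links.

    Hypothetical helper: the slides do not say how snapshots are stored.
    """
    snapshots = defaultdict(set)
    for src, dst, t in edge_rows:
        snapshots[t].add((src, dst))
    # Return a plain dict ordered by timestamp.
    return dict(sorted(snapshots.items()))

snapshots = build_snapshots(edges)
for t, links in snapshots.items():
    print(t, sorted(links))
```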
Fig 3: Distribution of recurrent edges in the Enron network (across all timestamps). Axis labels: Recurrent Edges; Frequency of Recurrence.
Say we have network snapshots as below:
G1 = {l1, l2, l3}
G2 = {l2, l3, l4}
G3 = {l1, l2, l4}
G4 = {l1, l2, l3}
We aren’t interested in the links that are static signals; instead we want to register the “blips”:
G’1 = { }
G’2 = { }
G’3 = {l1}
G’4 = {l3}
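One reading that reproduces the blip sets in this example is: a link is a blip at time t if it is present at t, was absent at t-1, and had appeared in some earlier snapshot. The slides do not state the rule explicitly, so the sketch below is an inferred interpretation, and `recurrent_blips` is a hypothetical name.

```python
def recurrent_blips(snapshots):
    """Return one blip set per snapshot.

    Assumed rule (inferred from the worked example, not stated in the slides):
    a link is a blip at t if it is in G_t, not in G_(t-1), and was seen
    in some snapshot before t.
    """
    blips = []
    seen = set()      # every link observed in any earlier snapshot
    previous = set()  # links in the immediately preceding snapshot
    for current in snapshots:
        blips.append((current - previous) & seen)
        seen |= current
        previous = current
    return blips

G = [
    {"l1", "l2", "l3"},
    {"l2", "l3", "l4"},
    {"l1", "l2", "l4"},
    {"l1", "l2", "l3"},
]
blips = recurrent_blips(G)
print(blips)
```

Under this rule, l2 never registers because it is always present, and l4 never registers because its first appearance is not a recurrence.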
Network instance = bag of links = set of transactions
Most frequently recurring links = GetFrequentTransactions(allTransactions, minSupport) = GetFrequentTransactions([G’1, …, G’n], minSupport)
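The slides name `GetFrequentTransactions` without defining it. As a minimal sketch of the idea, the code below treats each blip set as a transaction and counts every link set up to a small size by brute force. The name `get_frequent_link_sets`, the `max_size` cap, the reading of minSupport as an absolute count, and the sample transactions are all assumptions for illustration; a real miner would likely use a pruning algorithm such as Apriori instead.

```python
from collections import Counter
from itertools import combinations

def get_frequent_link_sets(transactions, min_support, max_size=3):
    """Keep link sets that occur in at least min_support transactions.

    Brute-force stand-in for GetFrequentTransactions; min_support is
    assumed to be an absolute count of transactions.
    """
    counts = Counter()
    for transaction in transactions:
        links = sorted(transaction)
        for size in range(1, min(max_size, len(links)) + 1):
            for subset in combinations(links, size):
                counts[subset] += 1
    return {subset: n for subset, n in counts.items() if n >= min_support}

# Hypothetical blip transactions, made up for illustration only.
transactions = [
    {"l1", "l3"},
    set(),
    {"l1", "l3", "l5"},
    {"l3"},
    {"l1", "l3"},
]
frequent = get_frequent_link_sets(transactions, min_support=3)
print(frequent)
```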
There are predictable patterns in networks. Can we identify:
o What these patterns are
o When they occur
o Effective “early warning” methods to predict them
Solution: Frequent Subgraph Mining
Solution: Subgraph Prediction
Solution: Early Warning Subgraphs
Predict individual links of the subgraph:
o If l ∈ subgraph, predict for occurrence of l.
o If l ∉ subgraph, predict for non-occurrence of l.
State-of-the-art link prediction methods are employed using the LPMade suite [Lichtenwalter, R. & Chawla, N. JMLR (2011)].
Fig 4: Exact graph matching.
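To make the link-by-link baseline concrete, the sketch below combines per-link predictions into a single score for an exact match: links in the target subgraph should occur, and all other candidate links should not. Multiplying the probabilities assumes the links are independent, which is a simplification introduced here and not something the slides claim. The probabilities are made up; in practice they would come from a link predictor such as those in LPMade, whose API is not shown here.

```python
from math import prod

def subgraph_score_from_links(link_probs, target_links):
    """Score an exact match of target_links from per-link probabilities.

    link_probs maps each candidate link to its predicted probability of
    occurring. Independence between links is an illustrative assumption.
    """
    missing = set(target_links) - set(link_probs)
    if missing:
        raise ValueError(f"no prediction for links: {sorted(missing)}")
    return prod(
        p if link in target_links else 1.0 - p
        for link, p in link_probs.items()
    )

# Hypothetical per-link predictions for three candidate links.
link_probs = {"l1": 0.9, "l2": 0.8, "l3": 0.1}
target = {"l1", "l2"}
score = subgraph_score_from_links(link_probs, target)
print(round(score, 3))
```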
We use graph edit distances (GEDs) to contextualize the subgraph’s occurrence with respect to the global scenario in the network. Advantages of using GEDs:
o Flexible definition allows for weighted/unweighted, directed/undirected graphs.
o Inexact matching allows for “unknown” links in graph instances.
o Near-linear time approximate implementation [Andoni, A. & Onak, K., SIAM Journal on Computing (2012)].
Fig 5: A pictorial summary of vector space embedding in GEDs.
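A simplified sketch of the embedding idea follows. It assumes nodes keep the same identity across snapshots and that every edit costs 1, in which case the edit distance between two snapshots reduces to the number of links that differ. Each graph is then represented by its distances to a set of prototype graphs. The prototype choice and function names are hypothetical, and this is not the near-linear approximation cited above.

```python
def edit_distance(links_a, links_b):
    """Unit-cost edit distance between two link sets.

    Assumes shared node identities, so the distance is the number of
    link insertions plus deletions (size of the symmetric difference).
    """
    return len(links_a ^ links_b)

def embed(graph, prototypes):
    """Map a graph to a vector of its distances to each prototype graph."""
    return [edit_distance(graph, p) for p in prototypes]

G1 = {"l1", "l2", "l3"}
G2 = {"l2", "l3", "l4"}
G3 = {"l1", "l2", "l4"}
G4 = {"l1", "l2", "l3"}

# Hypothetical choice: use the first two snapshots as prototypes.
prototypes = [G1, G2]
vectors = [embed(g, prototypes) for g in (G1, G2, G3, G4)]
print(vectors)
```

The resulting vectors can be fed to any standard classifier, which is the practical appeal of embedding graphs in a vector space.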
Networks often exhibit a telltale “build up” to the desired structure. Use these early warning subgraphs as features to predict the target subgraph, and learn from the given data how these breadcrumbs lead to it.
Fig 6: Early warning subgraphs as features to predict the target subgraph.
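The sketch below shows one way such features could be assembled: for each snapshot, record which early warning subgraphs are fully present, and pair that with whether the target subgraph appears in the next snapshot. The learner here is a plain frequency table, chosen only to keep the example self-contained; the slides do not specify the model used, and the snapshots, subgraphs, and function names are invented for illustration.

```python
from collections import defaultdict

def contains(snapshot, subgraph):
    """True if every link of the subgraph is present in the snapshot."""
    return subgraph <= snapshot

def build_training_pairs(snapshots, early_warning, target):
    """Pair early-warning indicators at t with target presence at t+1."""
    pairs = []
    for now, nxt in zip(snapshots, snapshots[1:]):
        features = tuple(int(contains(now, s)) for s in early_warning)
        label = int(contains(nxt, target))
        pairs.append((features, label))
    return pairs

def fit_frequency_model(pairs):
    """Estimate P(target at t+1 | feature pattern at t) by counting."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for features, label in pairs:
        totals[features] += 1
        hits[features] += label
    return {f: hits[f] / totals[f] for f in totals}

# Hypothetical snapshots in which {a, b} tends to precede {a, b, c}.
snapshots = [
    {"a"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
    {"a", "b"},
    {"a", "b", "c"},
]
early_warning = [frozenset({"a", "b"})]
target = frozenset({"a", "b", "c"})

pairs = build_training_pairs(snapshots, early_warning, target)
model = fit_frequency_model(pairs)
print(model)
```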
Our method achieves high AUROC performance in predicting subgraphs, and outperforms link prediction on:
Commercial cellular phone calls Wikipedia Co-authorship Enron Email Corpus Facebook Wall Posts
Conclusion: We need to think of subgraphs as emergent structures in their own right and not just a composition of links.
Fig 7: AUROC performance of our method vs. baseline link prediction.
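For readers unfamiliar with the metric, AUROC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below uses made-up labels and scores; it does not reproduce any result from the slides.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example is scored higher; ties contribute 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predictions: 1 = target subgraph occurred, 0 = it did not.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
value = auroc(labels, scores)
print(round(value, 3))
```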
Army Research Laboratory (ARL)
U.S. Air Force Office of Scientific Research (AFOSR)
Defense Advanced Research Projects Agency (DARPA)
National Science Foundation (NSF)
Research was supported in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053, National Science Foundation (NSF) Grant OCI-1029584, and by the grant FA9550-12-1-0405 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA).
– Leskovec, J., Singh, A. & Kleinberg, J. Patterns of influence in a recommendation network. In Advances in Knowledge Discovery and Data Mining, 380-389 (Springer, 2006).
– Backstrom, L., Dwork, C. & Kleinberg, J. Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th international conference on World Wide Web, 181-190 (ACM, 2007).
– Hu, H., Yan, X., Huang, Y., Han, J. & Zhou, X. J. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21, i213-i221 (2005).
– Lichtenwalter, R. N. & Chawla, N. V. LPMade: Link prediction made easy. Journal of Machine Learning Research 12, 2489-2492 (2011).
– Andoni, A. & Onak, K. Approximating Edit Distance in Near-Linear Time. SIAM Journal on Computing 41, 1635-1648 (2012).
Fig 9: If a very broad window is chosen, the fine-grained aspects of recurrence may be lost, and if a very narrow window is chosen, there may be very little difference between two consecutive snapshots.
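One way to explore this trade-off empirically is to rebuild the snapshots at several window widths and measure how much consecutive snapshots overlap. The sketch below uses Jaccard similarity as the overlap measure, which is an assumption of this example; the edge rows and function names are likewise hypothetical.

```python
from collections import defaultdict

def windowed_snapshots(edge_rows, width):
    """Bucket (src, dst, t) rows into consecutive windows of `width` time units."""
    buckets = defaultdict(set)
    for src, dst, t in edge_rows:
        buckets[t // width].add((src, dst))
    return [buckets[k] for k in sorted(buckets)]

def mean_consecutive_jaccard(snapshots):
    """Average Jaccard similarity between each snapshot and the next one."""
    sims = []
    for a, b in zip(snapshots, snapshots[1:]):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims) if sims else None

# Hypothetical timestamped edges, made up for illustration only.
edges = [
    ("a", "b", 0),
    ("a", "b", 1),
    ("b", "c", 1),
    ("a", "b", 2),
    ("c", "d", 3),
]

for width in (1, 2):
    snaps = windowed_snapshots(edges, width)
    print(width, len(snaps), mean_consecutive_jaccard(snaps))
```

Sweeping `width` over a range and inspecting both the number of snapshots and their overlap gives a quick sense of which window sizes preserve recurrence at a useful granularity for a given dataset.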