
Inside the Atoms: Mining a Network of Networks and Beyond by Hanghang Tong at BigMine16

Apr 14, 2017

Transcript
Page 1

Arizona State University

Inside the Atoms: Mining a Network of Networks and Beyond

Hanghang Tong

http://tonghanghang.org


@KDD BigMine 16: the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining

Page 2

Observation: Graphs are everywhere!

Examples: hospital networks, the US power grid, biological networks, collaboration networks, traffic networks, brain networks.

Page 3

Graph Mining: An Overview

Observation: Mining stops at the node/link (atom) level. Q: Is there a level x (x = 4, 5, …)? What is it?

Mining levels in the figure: (1) graph, (2) subgraph, (3) node/link.

Page 4

A Motivating Example: Cross-Network Association (e.g., candidate gene prioritization problem)

§ Problem Definition
– Given: (1) two networks P and G, and (2) their partial association A;
– Find: missing associations in A.

§ Solutions: Graph Ranking
– Given: a green node (disease);
– Find: the most relevant blue nodes (genes).

Figure: phenotype network P (green), PPI network G (blue), and the partial association A between them.

A powerful primitive in (A1) drug discovery, (A2) social recommendation, (A3) QA post tagging, etc.

Page 5

A Motivating Example: Cross-Network Association (e.g., candidate gene prioritization problem)

§ Problem Definition
– Given: (1) two networks P and G, and (2) their partial association A;
– Find: missing associations in A.

§ Solutions: Graph Ranking
– Given: a green node (disease);
– Find: the most relevant blue nodes (genes).

§ Limitations: Each green node (disease) might have its own PPI network!

O. Magger, Y. Y. Waldman, E. Ruppin, and R. Sharan. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Computational Biology, 8(9), 2012.

Figure: networks P and G with their association A.

Page 6

A Motivating Example: Cross-Network Association (e.g., candidate gene prioritization problem)

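• A Disease Network P
• A PPI Network G

Figure: disease network P (nodes 1 to 7), association A, and a single PPI network G (genes a, b, c, d).

• A Disease Network P
• A set of tissue-specific PPI Networks G1, …, G7

Figure: disease network P (nodes 1 to 7), association A, and tissue-specific PPI networks G1, G2, …, G7 (each over genes a, b, c, d).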

Page 7

A Set of Networks: More Applications

Examples: collaborations, system of systems, brain networks, cyber-physical systems.

Page 8

Roadmap

§ Motivations
§ NoN: A Network of Networks
– NoN Modeling
– NoN Mining
§ Beyond NoN
§ Some of Our Other Recent Work

Page 9

Modeling NoN

§ Q: How to represent a set of inter-connected networks (e.g., tissue-specific PPI networks)?

Figure: disease network P (nodes 1 to 7), association A, and tissue-specific PPI networks G1, G2, …, G7 (each over genes a, b, c, d).

Page 10

Introducing the NoN Model

§ A: Each green node (disease) is itself a network.

NoN (A Network of Networks) := a triplet R = <G, A, θ>
• G: Main Network (the green, disease-disease network)
• A: Domain Networks (the blue, tissue-specific PPI networks)
• θ: Mapping function (each green main node → a blue domain network)

J. Ni, H. Tong, W. Fan, X. Zhang: Inside the atoms: ranking on a network of networks. KDD 2014
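The following minimal Python sketch makes the triplet concrete as a small container for R = <G, A, θ>. Several choices here are assumptions for illustration and are not the data structure from the KDD 2014 paper: the class name `NoN`, the dense NumPy adjacency matrices, and θ stored as a list of indices (one domain network per main node, matching the mapping on this slide). The gene networks in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class NoN:
    """Hypothetical container for a Network of Networks R = <G, A, theta>."""

    G: np.ndarray            # g x g adjacency matrix of the main network
    A: list[np.ndarray]      # domain networks, each an n_k x n_k adjacency matrix
    theta: list[int]         # theta[i] = index in A of main node i's domain network

    def validate(self) -> None:
        """Raise ValueError if the triplet is inconsistent."""
        g = self.G.shape[0]
        if self.G.shape != (g, g):
            raise ValueError("main network G must be square")
        if len(self.theta) != g:
            raise ValueError("theta must map every main node")
        for k in self.theta:
            if not 0 <= k < len(self.A):
                raise ValueError(f"theta refers to missing domain network {k}")
        for Ak in self.A:
            if Ak.ndim != 2 or Ak.shape[0] != Ak.shape[1]:
                raise ValueError("each domain network must be a square matrix")

    def domain_of(self, i: int) -> np.ndarray:
        """Return the domain network that main node i is mapped to by theta."""
        return self.A[self.theta[i]]


# Toy example: two similar diseases, each with its own (made-up) PPI network
G = np.array([[0.0, 1.0], [1.0, 0.0]])
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3 genes, a path
A2 = np.ones((4, 4)) - np.eye(4)                               # 4 genes, a clique
non = NoN(G=G, A=[A1, A2], theta=[0, 1])
non.validate()
```

A soft mapping, where θ is one-to-many or many-to-many, could replace each `theta[i]` with a list of indices. That variant is not shown here.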

Page 11

NoN Models: Examples

Application | Main Network (G) | Domain Networks (A)
Gene-phenotype association | Disease similarity network | Tissue-specific PPI networks
LBSN | Geo-proximity network | Social networks
Brain Initiative | Person-person network | Brain networks
Team of Teams | Project dependence network | Team networks
Scholarly data | Research-area similarity network | Collaboration networks


Page 12

NoN - Generalizations

§ G1: Multi-layered NoN
– Candidate gene prioritization: disease-tissue-protein
– Geo-social networks: city-district-person

§ G2: Soft mapping function θ
– 1-to-many, or many-to-many

• C. Chen, J. He, N. Bliss and H. Tong: “On the Connectivity of Multi-layered Networks: Models, Measures and Optimal Control”, ICDM 2015.

Page 13

NoN vs. Some Popular Multi-Network Models

§ They are all special cases of our NoN model!
– Tensor: a special NoN with
1) a full-clique main network (G);
2) all domain networks (A) sharing the same node set.
– Hypergraph: a special NoN with
1) all domain networks (A) being empty.
– Multiplex: a special NoN with
1) two layers;
2) all domain networks (A) sharing the same node set.
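Here is one way to see the tensor special case. If the main network is a full clique, every domain network is related to every other one, so G carries no extra information. If, in addition, all g domain networks share one node set, the NoN collapses into an n × n × g array whose k-th slice is the k-th domain network. The sketch assumes each domain network comes with its own node-label ordering. The function name `non_to_tensor` is hypothetical.

```python
import numpy as np


def non_to_tensor(domains: list, labels: list) -> np.ndarray:
    """Stack domain networks that share one node set into an n x n x g tensor.

    domains[k] is an adjacency matrix whose rows/columns follow labels[k];
    every labels[k] must be a permutation of labels[0] (the shared node set).
    The full-clique main network is implicit and not stored.
    """
    ref = list(labels[0])
    n, g = len(ref), len(domains)
    T = np.zeros((n, n, g))
    for k, (W, lab) in enumerate(zip(domains, labels)):
        if sorted(lab) != sorted(ref):
            raise ValueError(f"domain network {k} does not share the common node set")
        local = {v: t for t, v in enumerate(lab)}
        order = [local[v] for v in ref]        # local index of each shared node
        T[:, :, k] = W[np.ix_(order, order)]   # re-order rows/cols to the shared order
    return T


# Two domain networks over nodes {a, b, c}, listed in different orders
W0 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # labels a, b, c: edge a-b
W1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # labels c, a, b: edge c-a
T = non_to_tensor([W0, W1], [["a", "b", "c"], ["c", "a", "b"]])
```

Aligning every slice to the same node order is what makes the slices comparable. Without a shared node set, the stacking step, and therefore the tensor view, is not defined.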

Page 14

Roadmap

§ Motivations
§ NoN: A Network of Networks
– NoN Modeling
– NoN Mining: Ranking and Clustering
§ Beyond NoN
§ Some of Our Other Recent Work

Page 15

NoN Mining - Ranking

A1: Given a disease (e.g., P1), what are the most relevant genes (blue nodes)?

A2: Who is most influential, considering both the within- and cross-area influence?

Page 16

Ranking on a Single Network

Background

Ranking vector r4 (query: node 4)

Node: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Score: 0.13 | 0.10 | 0.13 | 0.22 | 0.13 | 0.05 | 0.05 | 0.08 | 0.04 | 0.03 | 0.04 | 0.02

Figure: a 12-node graph colored by ranking score (more red, more relevant); nearby nodes get higher scores.

H. Tong, C. Faloutsos, J.-Y. Pan: Fast Random Walk with Restart and Its Applications. ICDM 2006. (ICDM 2006 Best Paper Award; ICDM 2015 10-Year Highest Impact Paper Award)

Page 17

Ranking on a Single Network


Background

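Footnote: “Maxwell Equation” for Web [Soumen Chakrabarti]

$r_i = c A r_i + (1 - c) e_i$

where A is the normalized adjacency matrix, e_i is the indicator vector of query node i, and 1 − c is the restart probability.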
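The fixed-point form suggests a simple solver: start from the query vector and apply the right-hand side repeatedly until the scores stop changing. This sketch makes three assumptions: A is the column-normalized adjacency matrix of an undirected graph, the default is c = 0.9, and iteration stops on an L1 tolerance. `rwr` is a hypothetical helper name, and the faster algorithms from the cited ICDM 2006 paper are not reproduced here.

```python
import numpy as np


def rwr(W: np.ndarray, query: int, c: float = 0.9,
        tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Random walk with restart via fixed-point iteration of r = c*A*r + (1-c)*e."""
    n = W.shape[0]
    col_sums = W.sum(axis=0).astype(float)
    col_sums[col_sums == 0] = 1.0           # isolated nodes keep all-zero columns
    A = W / col_sums                        # column-normalized adjacency
    e = np.zeros(n)
    e[query] = 1.0                          # query (restart) indicator vector
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (A @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:  # L1 change below tolerance
            return r_next
        r = r_next
    return r


# Path graph 0 - 1 - 2 - 3, query = node 0
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
r = rwr(W, query=0)
```

Because a column-stochastic A preserves total mass, the scores here sum to 1. The query node does not necessarily get the top score: in this path graph, node 1 scores higher than node 0 because it receives walks from both sides.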

Page 18

Ranking on a Single Network

Background

An optimization viewpoint of the “Maxwell Equation” for Web (symmetric A):

$r_i = c A r_i + (1 - c) e_i = \arg\min_{r} \; c\, r^\top (I - A)\, r + (1 - c)\, \lVert r - e_i \rVert_2^2$

The first term is network smoothness; the second is query preference.
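Setting the gradient 2c(I − A)r + 2(1 − c)(r − e_i) to zero gives (I − cA)r = (1 − c)e_i, which is exactly the fixed-point equation r_i = cAr_i + (1 − c)e_i. For a symmetric normalized A, the eigenvalues lie in [−1, 1], so the Hessian 2(I − cA) is positive definite and the minimizer is unique. The numerical check below assumes A = D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix. This is a common symmetric normalization and not necessarily the one intended on the slide. The check also assumes c = 0.9. The helper names are hypothetical.

```python
import numpy as np


def sym_normalize(W: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def objective(r: np.ndarray, A: np.ndarray, e: np.ndarray, c: float) -> float:
    """c * r'(I - A)r (network smoothness) + (1 - c) * ||r - e||^2 (query preference)."""
    smoothness = c * r @ (np.eye(len(r)) - A) @ r
    preference = (1 - c) * np.sum((r - e) ** 2)
    return float(smoothness + preference)


W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A, c = sym_normalize(W), 0.9
e = np.array([1.0, 0.0, 0.0, 0.0])   # query: node 0

# Zero gradient  <=>  (I - cA) r = (1 - c) e
r_star = np.linalg.solve(np.eye(4) - c * A, (1 - c) * e)

# r_star also satisfies the fixed-point form r = cAr + (1 - c)e ...
fixed_point_gap = np.abs(r_star - (c * A @ r_star + (1 - c) * e)).max()

# ... and small random perturbations never lower the (strictly convex) objective
rng = np.random.default_rng(0)
J_star = objective(r_star, A, e, c)
J_perturbed = [objective(r_star + 1e-3 * rng.standard_normal(4), A, e, c)
               for _ in range(20)]
```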

Page 19

Ranking on NoN

§ Optimization Formulation:
#1: within-network smoothness
#2: query preference
#3: cross-network consistency

§ Intuition:
– Similar ranking scores for an overlapped node (the same node appearing in domain networks i and j) if G(i, j) is high.
– A set of g correlated random walks

J. Ni, H. Tong, W. Fan, X. Zhang: Inside the atoms: ranking on a network of networks. KDD 2014

Page 20

Ranking on NoN

§ Optimization Formulation:
#1: within-network smoothness
#2: query preference
#3: cross-network consistency

§ Equivalence: J(r) = J(r1, …, rg), where r stacks the per-network ranking vectors r1, …, rg
– Intuition: a single random walk on the integrated graph Ã
– Property: J(r) is positive-definite!
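The exact objective was in a figure that did not survive extraction. What follows is only a sketch of a quadratic objective with the three named terms, and it rests on these assumptions:
– domain networks are symmetric-normalized as Ã_k;
– the smoothness and query terms are the same as in the single-network case;
– the consistency term is λ·G(i, j)·(r_i(v) − r_j(v))² for every node v shared by domain networks i and j.

Stacking r = [r_1; …; r_g], the first two terms give a block-diagonal matrix with blocks I − cÃ_k. The third term gives a graph Laplacian L over the copies of shared nodes. The minimizer therefore solves (blockdiag(I − cÃ_k) + L) r = (1 − c) e. That matrix is positive definite, in line with the slide's property. The function `rank_non` and parameter `lam` are hypothetical, and this is not the algorithm of Ni et al. (KDD 2014).

```python
import numpy as np


def sym_normalize(W: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} W D^{-1/2}; isolated nodes keep all-zero rows and columns."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    s = 1.0 / np.sqrt(d)
    return s[:, None] * W * s[None, :]


def rank_non(domains, labels, G, queries, c=0.9, lam=1.0):
    """Minimize an assumed three-term NoN ranking objective (hypothetical helper).

    domains: list of symmetric n_k x n_k adjacency matrices (domain networks)
    labels:  node labels per domain network; a shared label marks an overlapped node
    G:       g x g main-network weights
    queries: query-preference vectors e_k, one per domain network
    Returns one ranking vector per domain network.
    """
    sizes = [W.shape[0] for W in domains]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    M = np.zeros((offsets[-1], offsets[-1]))
    # Terms #1 + #2 (smoothness + query preference): diagonal blocks I - c * A_k
    for k, W in enumerate(domains):
        a, b = offsets[k], offsets[k + 1]
        M[a:b, a:b] = np.eye(sizes[k]) - c * sym_normalize(W)
    # Term #3 (cross-network consistency): Laplacian edges between copies of a shared node
    for i in range(len(domains)):
        for j in range(i + 1, len(domains)):
            w = lam * G[i, j]
            if w <= 0:
                continue
            pos_j = {v: t for t, v in enumerate(labels[j])}
            for t_i, v in enumerate(labels[i]):
                if v in pos_j:
                    p, q = offsets[i] + t_i, offsets[j] + pos_j[v]
                    M[p, p] += w
                    M[q, q] += w
                    M[p, q] -= w
                    M[q, p] -= w
    # Zero gradient of the quadratic objective: M r = (1 - c) e
    r = np.linalg.solve(M, (1 - c) * np.concatenate(queries))
    return [r[offsets[k]:offsets[k + 1]] for k in range(len(domains))]


# Toy data: two tissue-specific PPI networks that share genes b and c
W0 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # a - b - c
W1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # triangle b, c, d
labels = [["a", "b", "c"], ["b", "c", "d"]]
G = np.array([[0.0, 1.0], [1.0, 0.0]])              # the two diseases are similar
queries = [np.array([1.0, 0.0, 0.0]), np.zeros(3)]  # known gene a, network 0 only

isolated = rank_non([W0, W1], labels, G, queries, lam=0.0)
coupled = rank_non([W0, W1], labels, G, queries, lam=1.0)
strong = rank_non([W0, W1], labels, G, queries, lam=100.0)
```

With lam = 0 the networks decouple, so network 1, which has no query, scores all zeros. With lam > 0 the query's influence crosses over through the shared genes b and c. A larger lam pulls the two copies of each shared gene closer together.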

Page 21

Ranking on NoN

§ Equivalence: J(r) = J(r1, …, rg)
– Intuition: one single random walk on the integrated graph Ã
– Property: J(r) is positive-definite!

§ Algorithms:
– #1: A linear algorithm → the optimal solution
– #2: Any existing fast solution on a single network
– #3: Further speedup: O(T(m + ng)) → O(T(g log(g) + z))
• g << n and z << m (key idea: use the main network for pruning)

Page 22

NoN Ranking - Results

A1: Candidate Gene Prioritization
• Which genes are most relevant w.r.t. disease a?
Figure: ROC curve comparison

A2: Co-authorship Prediction
• Which DM authors are most likely to collaborate with a given Med author?
Figure: AUC and accuracy

Page 23

NoN Mining - Clustering

§ Objective Function: similar intuition!
§ Results: p-value vs. (biologically meaningful) clusters

J. Ni, H. Tong, W. Fan, X. Zhang: Flexible and Robust Multi-Network Clustering. KDD 2015

Page 24

Roadmap

§ Motivations
§ NoN: A Network of Networks
– NoN Modeling
– NoN Mining
§ Beyond NoN: From NoN to NoX
§ Some of Our Other Recent Work

Page 25

NoT: A Network of Time Series

§ Problem Definition
§ Models & Algorithms
§ Results

• Y. Cai, H. Tong, W. Fan and P. Ji: Fast Mining of a Network of Coevolving Time Series. SDM 2015.
• Y. Cai, H. Tong, W. Fan, P. Ji, Q. He: Facets: Fast Comprehensive Mining of Coevolving High-order Time Series. KDD 2015.

Figure: coordinate vs. frame # (frames 0 to 200) for the original sequence and the DCMF, DMF, and dynaMMo reconstructions; the slide also shows an excerpt of a motion-capture marker placement guide (Vicon 512 marker set).

Page 26

iBall: A Network of Regression Models

§ Problem Definition
§ Models & Algorithms
§ Results

Figure nodes: D1, D2, D3, D4

• Y. Yao, H. Tong, F. Xu, J. Lu: Predicting long-term impact of CQA posts: a comprehensive viewpoint. KDD 2014.
• L. Li, H. Tong: The Child is Father of the Man: Foresee the Success at the Early Stage. KDD 2015.
• “Data Mining Reveals the Secret to Getting Good Answers”, MIT Technology Review, 2013.

Page 27

Fascinate: Cross-Layer Dependence Inference on Multi-Layered Networks

§ Problem Definition: infer unobserved cross-layer links
§ Methods: cross-layer inference = collective CF
§ Results: effectiveness and efficiency

• C. Chen, J. He, N. Bliss and H. Tong: “On the Connectivity of Multi-layered Networks: Models, Measures and Optimal Control”, ICDM 2015.
• C. Chen, H. Tong, L. Xie, L. Ying and Q. He: “FASCINATE: Fast Cross-Layer Dependence Inference on Multi-layered Networks”, KDD 2016, 3:15pm, Monday, Plaza Room A/B.

Page 28

Conclusion: a Network of X

§ Summary
– NoN: Network + Networks
– NoT: Network + Time Series
– iBall: Network + Regression
– Fascinate: Network + Inference

§ Take-Home Messages
– Modeling: ‘No’ (i.e., a Network of X) as the answer
• Networks: as data → as context
– Algorithms: Networks as the contextual regularizer

Page 29

Roadmap

§ Motivations
§ NoN: A Network of Networks
§ Beyond NoN
§ Some of Our Other Recent Work
– Team Replacement
– TravelModeLogger
– BrainQuest
– Network Alignment
– Optimal Networks
– Visual Influence Sum

Page 30

Replacing the Irreplaceable: Team Replacement Recommendation

§ Problem Definition
§ System
§ Solution
§ Results

• L. Li, H. Tong, N. Cao, K. Ehrlich, Y.-R. Lin and N. Buchler: Replacing the Irreplaceable: Fast Algorithms for Team Member Recommendation, WWW 2015.
• N. Cao, Y.-R. Lin, L. Li, H. Tong: g-Miner: Interactive Visual Group Mining on Multivariate Graphs, ACM CHI 2015.
• System prototype & video demo: http://team-net-work.org

Page 31

Travel Mode Identification w/ Smartphones

§ Problem Definition
§ Method
§ Results
§ Open Challenges
– Battery consumption (sampling rates, sensor selection)
– On-line algorithms
– Adaptive (summer vs. winter; highway vs. local)

• X. Su, H. Tong and P. Ji: Accelerometer-based Activity Recognition on Smartphone. CIKM 2014.
• X. Su, H. Caceres, H. Tong and Q. He: Travel Mode Identification with Smartphones. TRB 2015.

Page 32

BrainQuest: Visual Brain Comparison

Quest brains to spot picture differences.

• L. Shi, H. Tong, X. Mu: BrainQuest: Perception-Guided Visual Brain Comparison, ICDM 2015.
• L. Shi, H. Tong, M. Daianu, X. Mu and P. Thompson: Block-wise Human Brain Network Visual Comparison Using NodeTrix Representation. VIS'16.

Page 33

BrainQuest: Visual Brain Comparison

Quest computers to spot brain differences.

Page 34

BrainQuest: Visual Brain Comparison

Quest computers to spot brain differences.

AD group (n1) vs. control group (n2)

Page 35

BrainQuest: Visual Brain Comparison

§ Problem Definition: spot structural differences between two groups of brain networks
§ Model & Algorithm
§ VA Framework
§ Results

Page 36

Query-Specific Optimal Networks

L. Li, Y. Yao, J. Tang, W. Fan, H. Tong: QUINT: On Query-Specific Optimal Networks. KDD 2016, 10:00am, Monday, Plaza Room A/B

§ Goal: Optimal Networks
– Query-specific
– Optimal topology + weights
– On-line learning
§ + Error Estimation
§ Methods: a VERY efficient way to estimate

$\frac{\partial Q(x, s)}{\partial A_s(i, j)} \propto Q(j, s) \times Q(x, i)$

Figure: query node s, positive node x, and an edge (i, j); neighbors of neighbors of s and x.

§ Results: accuracy (MAP); scalability
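The proportionality follows from a standard matrix identity. Assume Q = (1 − c)(I − cA_s)^{-1}, so that Q(x, s) is the RWR score of x with respect to query s, in the same fixed-point form used for single-network ranking. Differentiating gives ∂Q = (c/(1 − c)) Q (∂A_s) Q. Hence ∂Q(x, s)/∂A_s(i, j) = (c/(1 − c)) Q(x, i) Q(j, s). The finite-difference sketch below only sanity-checks that identity on a random column-normalized matrix. It does not reproduce QUINT's neighbor-of-neighbor estimation, and the helper name `proximity` is hypothetical.

```python
import numpy as np


def proximity(A: np.ndarray, c: float) -> np.ndarray:
    """Q = (1 - c) (I - cA)^{-1}; column s holds the RWR scores w.r.t. query node s."""
    return (1 - c) * np.linalg.inv(np.eye(A.shape[0]) - c * A)


rng = np.random.default_rng(42)
n, c = 6, 0.8
W = rng.random((n, n))
A = W / W.sum(axis=0)                  # column-normalized edge weights
Q = proximity(A, c)

x, s, i, j = 2, 0, 3, 4                # positive node x, query node s, edge (i, j)
analytic = c / (1 - c) * Q[x, i] * Q[j, s]

# Central finite difference of Q(x, s) with respect to the single entry A(i, j)
eps = 1e-6
A_plus, A_minus = A.copy(), A.copy()
A_plus[i, j] += eps
A_minus[i, j] -= eps
numeric = (proximity(A_plus, c)[x, s] - proximity(A_minus, c)[x, s]) / (2 * eps)
```

Each gradient entry needs only two entries of Q. That is what makes the estimate cheap once the relevant proximities are available.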

Page 37

Attributed Network Alignment

• D. Koutra, H. Tong, D. Lubensky: BIG-ALIGN: Fast Bipartite Graph Alignment. ICDM 2013.
• S. Zhang and H. Tong: FINAL: Fast Attributed Network Alignment. KDD 2016, 3:15pm, Monday, Plaza Room A/B.

§ Problem Definition
§ Formulation
§ Algorithms
– Iterative algorithm
– Global optimum
– Same complexity as IsoRank
– Further speed-up: low-rank approximation; on-query alignment (linear)
§ Results: accuracy vs. time; accuracy vs. noise

Page 38

Vegas: Influence Graph Visual Summarization

§ Problem Definition: Who/What; How/Why
§ Solution
§ Results (example: “Stochastic High-Level Petri Net and Applications”)

• L. Shi, H. Tong, J. Tang and C. Lin: Flow-based Influence Graph Visual Summarization, ICDM 2014.
• L. Shi, H. Tong, J. Tang, C. Lin: VEGAS: Visual influEnce GrAph Summarization on Citation Networks. TKDE 2015.

Page 39

Q&A

Inside the atom is a whole new world!


•  “A whole new world •  Every turn a surprise •  With new horizons to pursue •  Every moment red-letter ……”

Page 40

Acknowledgement

§ Collaborators:
– Norbou Buchler, Nan Cao, Madelaine Daianu, Kate Ehrlich, Wei Fan, Qing He, Ping Ji, Yu-ru Lin, Lei Shi, Chuang Lin, Jie Tang, Paul M. Thompson, Lei Xie, Yuan Yao, Lei Ying, Xiang Zhang

§ Students:
– Liangyue Li
– Chen Chen
– Yongjie Cai (now at Google)
– Xing Su
– Si Zhang