Mining Large Dynamic Graphs and Tensors Kijung Shin Ph.D. Student ([email protected])

Mining Large Dynamic Graphs and Tensorskijungs/proposal/slides.pdf · 2018-03-13 · Temporal Locality •One interpretation: edges are more likely to form triangles with edges close

Mar 01, 2020

Page 1

Mining Large Dynamic Graphs and Tensors

Kijung Shin

Ph.D. Student

([email protected])

Page 2

Thesis Committee

• Prof. Christos Faloutsos (Chair)

• Prof. Tom M. Mitchell

• Prof. Leman Akoglu

• Prof. Philip S. Yu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 2/106

Page 3

Mining Large Dynamic Graphs and Tensors

Page 4

Graphs: Social Networks

Page 5

Graphs: Purchase History

Page 6

Graphs: Many More

Page 7

Properties of Real-world Graphs

• Large: many nodes, even more edges

◦ 2B+ active users, 500M+ products, 40B+ web pages, 5M+ articles

• Dynamic: additions/deletions of nodes and edges

Page 8

Properties of Real-world Graphs

• Rich with Attributes: timestamps, scores, text, etc.

Page 9

Matrices for Graphs

[Figure: a graph and its adjacency matrix]
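The slide's example matrix did not survive extraction. As a stand-in, here is a minimal sketch of building a graph's 0/1 adjacency matrix; it assumes nodes are numbered 0 to n − 1, and the helper name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return the n x n 0/1 adjacency matrix (list of lists) of a graph.

    Nodes are assumed to be the integers 0..n-1.
    """
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1              # edge u -> v
        if not directed:
            A[v][u] = 1          # undirected graphs give a symmetric matrix
    return A


print(adjacency_matrix(3, [(0, 1), (1, 2)]))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

A dense matrix needs n² entries, so for graphs of the sizes listed earlier one would keep only the non-zero entries.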

Page 10

Tensors for Rich Graphs

• Tensors: multi-dimensional arrays

[Figure: a 3-order tensor (3-dimensional array); adding stars gives a 4-order tensor, and adding text gives a 5-order tensor]
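One common way to hold such a tensor in memory, sketched here as an assumption rather than the author's data structure, is a sparse map from index tuples to counts: each record becomes one index tuple, and adding a mode (e.g., stars) simply lengthens the tuple. The function name and the record fields below are hypothetical.

```python
from collections import defaultdict


def build_tensor(records):
    """Sparse N-order count tensor: index tuple -> number of occurrences.

    Each record is a tuple of mode values, e.g. (account, restaurant, date)
    for a 3-order tensor or (account, restaurant, date, stars) for 4-order.
    """
    tensor = defaultdict(int)
    for rec in records:
        tensor[tuple(rec)] += 1   # repeated records increase the entry value
    return dict(tensor)


reviews = [("u1", "r1", "2017-01-01", 5),
           ("u1", "r1", "2017-01-01", 5),
           ("u2", "r1", "2017-01-02", 1)]
print(build_tensor(reviews))
```

Only non-zero entries are stored, so memory grows with the number of records rather than with the product of the mode sizes.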

Page 11

Research Goal and Tasks

• Goal: to understand large dynamic graphs and tensors on user behavior

• Tasks

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

Page 12

Tasks

[Figure: Structure; Anomaly & Fraud; Behavior Model; Contrast]

Page 13

Completed Work by Topics

• Graphs

◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*

◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*

◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]

• Tensors

◦ T1. Structure Analysis: Summarization [WSDM17]

◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]

◦ T3. Behavior Modeling: Progressive Behavior [WWW18]

* Duplicated (listed under more than one topic)

Page 14

Approaches (Tools)

• A1. Distributed or external-memory algorithms

• A2. Streaming algorithms based on sampling

• A3. Approximation algorithms

• and their combinations

Page 15

Roadmap

• Overview

• Completed Work <<

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 16

Completed Work by Topics

• Graphs

◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*

◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*

◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]

• Tensors

◦ T1. Structure Analysis: Summarization [WSDM17]

◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]

◦ T3. Behavior Modeling: Progressive Behavior [WWW18]

* Duplicated (listed under more than one topic)

Page 17

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

▪ T1.1 Waiting-Room Sampling <<

▪ T1.2-T1.3 Related Completed Work

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Kijung Shin, “WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams”, ICDM 2017

Page 18

Graph Stream Model

• Widely-used data model for graphs

• Sequence of edges

◦ the graph is given over time as a sequence of edges

◦ appropriate for dynamic graphs

• Limited memory

◦ cannot store all edges in the stream

◦ stores only samples or summaries

◦ appropriate for large graphs

[Figure: edges streamed from sources to a destination]

Page 19

Relaxed Graph Stream Model

• Chronological order

◦ edges are streamed in the order in which they are created

◦ natural for dynamic graphs

◦ temporal patterns can exist

◦ algorithms can exploit the patterns

[Figure: edges streamed from sources to a destination in creation order (created at 9:02 AM, 9:08 AM, and 9:21 AM)]

Page 20

Triangles in a Graph

• A triangle is a set of 3 nodes that are all connected to each other

• The count of triangles has many applications

◦ community detection, spam detection, query optimization

• Global triangle count: the number of all triangles in the graph

• Local triangle count: the number of triangles incident to each node

[Figure: an example graph with its global triangle count and the local triangle count of each node]
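Both counts can be made concrete with an exact, in-memory reference computation (the quantity the streaming algorithms estimate under limited memory). This is a minimal sketch; the function name is hypothetical and node labels are assumed to be mutually comparable.

```python
from collections import defaultdict
from itertools import combinations


def count_triangles(edges):
    """Exact global and local triangle counts of an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                       # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    local = defaultdict(int)
    global_count = 0
    for u in adj:
        # neighbour pairs v < w; requiring u < v counts each triangle once
        for v, w in combinations(sorted(adj[u]), 2):
            if u < v and w in adj[v]:
                global_count += 1
                for x in (u, v, w):      # each corner gets one local triangle
                    local[x] += 1
    return global_count, dict(local)


print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]))
# (2, {1: 1, 2: 2, 3: 2, 4: 1})
```

Nodes that belong to no triangle are simply absent from the local-count dictionary.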

Page 21

Problem Definition

• Given:

◦ a sequence of edges in chronological order

◦ a memory budget 𝑘 (i.e., up to 𝑘 edges can be stored)

• Estimate: the global triangle count

• To Minimize: the estimation error

“What are the temporal patterns in real graph streams?”

“How can we exploit the patterns for accurate triangle counting?”

Page 22

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

▪ T1.1 Waiting-Room Sampling

◦ Temporal Pattern <<

◦ Algorithm

◦ Experiments

▪ T1.2-T1.3 Related Completed Work

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 23

Time Interval of a Triangle

• Time interval of a triangle: (arrival order of its last edge) − (arrival order of its first edge)

[Figure: edges in arrival order 1 to 8; the triangle's first edge arrives 2nd and its last edge arrives 7th]

Time interval: 7 − 2 = 5
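The definition translates directly into a single pass over the stream: every triangle is closed by its last-arriving edge, at which point the arrival orders of its other two edges are already known. A minimal sketch, assuming 1-based arrival orders, undirected edges, and that duplicate edges and self-loops are ignored; the function name is hypothetical.

```python
from collections import defaultdict


def triangle_intervals(stream):
    """Time interval (last arrival - first arrival) of every triangle."""
    order = {}                    # frozenset({u, v}) -> arrival order
    adj = defaultdict(set)
    intervals = []
    for t, (u, v) in enumerate(stream, start=1):
        if u == v or frozenset((u, v)) in order:
            continue              # skip self-loops and repeated edges
        # edge u - v closes one triangle per common neighbour w
        for w in adj[u] & adj[v]:
            first = min(order[frozenset((u, w))], order[frozenset((v, w))])
            intervals.append(t - first)
        order[frozenset((u, v))] = t
        adj[u].add(v)
        adj[v].add(u)
    return intervals


# the triangle 1-2-3 has edges arriving at positions 2, 4 and 7
stream = [(10, 11), (1, 2), (12, 13), (2, 3), (14, 15), (16, 17), (1, 3), (18, 19)]
print(triangle_intervals(stream))  # [5]
```

Averaging these intervals for the chronological order and for a shuffled copy of the same edges is one way to compare the two orders; we do not claim this is exactly how the reported numbers were measured.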

Page 24

Time Interval Distribution

• Temporal Locality: the average time interval is 2× shorter in the chronological order than in a random order

[Figure: distributions of time intervals under the chronological arrival order and a random arrival order]

Page 25

Temporal Locality

• One interpretation: edges are more likely to form triangles with edges close in time than with edges far in time

• Another interpretation: new edges are more likely to form triangles with recent edges than with old edges

[Figure: time interval distributions in the chronological order and a random order]

“How can we exploit temporal locality for accurate triangle counting?”

Page 26

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

▪ T1.1 Waiting-Room Sampling

◦ Temporal Pattern

◦ Algorithm <<

◦ Experiments

▪ T1.2-T1.3 Related Completed Work

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 27

Algorithm Overview

• ∆: estimate of the triangle count

• 𝑝𝑢𝑣𝑤: probability that triangle (𝑢, 𝑣, 𝑤) is discovered

[Figure: (1) Arrival Step: a new edge 𝑢 − 𝑣 arrives while the memory stores 𝑢 − 𝑥, 𝑢 − 𝑦, 𝑣 − 𝑥, 𝑣 − 𝑦; (2) Counting Step: for each discovered triangle, e.g., (𝑢, 𝑣, 𝑦), ∆ ← ∆ + 1/𝑝𝑢𝑣𝑦; (3) Sampling Step: the memory is updated, e.g., to 𝑢 − 𝑥, 𝑢 − 𝑣, 𝑣 − 𝑥, 𝑣 − 𝑦]

Page 28

Algorithm Overview (cont.)

• ∆: estimate of the triangle count

• 𝑝𝑢𝑣𝑤: probability that triangle (𝑢, 𝑣, 𝑤) is discovered

[Figure: (1) Arrival Step: a new edge 𝑢 − 𝑣 arrives while the memory stores 𝑢 − 𝑥, 𝑢 − 𝑦, 𝑣 − 𝑥, 𝑣 − 𝑦]

Page 29

Algorithm Overview (cont.)

• ∆: estimate of the triangle count

• 𝑝𝑢𝑣𝑤: probability that triangle (𝑢, 𝑣, 𝑤) is discovered

[Figure: (1) Arrival Step: a new edge 𝑢 − 𝑣 arrives while the memory stores 𝑢 − 𝑥, 𝑢 − 𝑦, 𝑣 − 𝑥, 𝑣 − 𝑦; (2) Counting Step: triangle (𝑢, 𝑣, 𝑥) is discovered, and ∆ ← ∆ + 1/𝑝𝑢𝑣𝑥]

Page 30

Algorithm Overview (cont.)

• ∆: estimate of the triangle count

• 𝑝𝑢𝑣𝑤: probability that triangle (𝑢, 𝑣, 𝑤) is discovered

[Figure: (1) Arrival Step: a new edge 𝑢 − 𝑣 arrives while the memory stores 𝑢 − 𝑥, 𝑢 − 𝑦, 𝑣 − 𝑥, 𝑣 − 𝑦; (2) Counting Step: triangle (𝑢, 𝑣, 𝑦) is discovered, and ∆ ← ∆ + 1/𝑝𝑢𝑣𝑦]
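The counting step can be sketched as follows, assuming the memory is kept as an adjacency map over the stored edges and that the sampling scheme supplies 𝑝𝑢𝑣𝑤 through a callback. The function and parameter names are hypothetical; how 𝑝𝑢𝑣𝑤 is computed depends on the sampling step.

```python
def counting_step(memory, u, v, prob, estimate):
    """Counting step for a newly arrived edge u - v (hypothetical helper).

    memory:   dict mapping each node to the set of its neighbours among the
              edges currently stored in memory.
    prob:     prob(u, v, w) -> probability that triangle (u, v, w) is
              discovered, i.e., that u - w and v - w were both still stored.
    estimate: current value of the estimate (Delta).
    """
    # every common neighbour w of u and v closes a triangle (u, v, w)
    for w in memory.get(u, set()) & memory.get(v, set()):
        # weight each discovery by 1 / p so that rarely discovered
        # triangles are not under-counted on average
        estimate += 1.0 / prob(u, v, w)
    return estimate


# memory from the figure: u - x, u - y, v - x, v - y are stored
memory = {"u": {"x", "y"}, "v": {"x", "y"}, "x": {"u", "v"}, "y": {"u", "v"}}
print(counting_step(memory, "u", "v", lambda u, v, w: 0.5, 0.0))  # 4.0
```

With 𝑝 = 1 for every triangle, each discovery adds exactly one, which is exact counting when no edge is ever evicted.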

Page 31

Algorithm Overview (cont.)

• ∆: estimate of the triangle count

• 𝑝𝑢𝑣𝑤: probability that triangle (𝑢, 𝑣, 𝑤) is discovered

[Figure: (1) Arrival Step: a new edge 𝑢 − 𝑣 arrives while the memory stores 𝑢 − 𝑥, 𝑢 − 𝑦, 𝑣 − 𝑥, 𝑣 − 𝑦; (2) Counting Step: for each discovered triangle, e.g., (𝑢, 𝑣, 𝑦), ∆ ← ∆ + 1/𝑝𝑢𝑣𝑦; (3) Sampling Step: the memory is updated, e.g., to 𝑢 − 𝑥, 𝑢 − 𝑣, 𝑣 − 𝑥, 𝑣 − 𝑦]

Page 32

Goal of Sampling Step

• To maximize the discovering probability 𝑝𝑢𝑣𝑤

Estimation Error (mean squared) = Bias² + Variance

Theorem. Unbiasedness of our estimate: $\mathrm{Bias}[\Delta] = \mathbb{E}[\Delta] - \text{True Count} = 0$

Theorem. Variance of our estimate: $\mathrm{Var}[\Delta] \approx \sum_{(u,v,w)} \left( 1/p_{uvw} - 1 \right)$

[Figure: the distribution of the estimate, centered at the true count (bias 0)]
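The form of both theorems follows from a short derivation, sketched here under the assumption that 𝑋𝑢𝑣𝑤 is the indicator that triangle (𝑢, 𝑣, 𝑤) is discovered, with 𝑃(𝑋𝑢𝑣𝑤 = 1) = 𝑝𝑢𝑣𝑤, and that the ≈ in the variance comes from summing per-triangle variances while dropping covariance terms between triangles. Each term 1/𝑝𝑢𝑣𝑤 − 1 shrinks as 𝑝𝑢𝑣𝑤 grows, which is why the sampling step aims to maximize the discovering probability.

```latex
\begin{aligned}
\Delta &= \sum_{(u,v,w)} \frac{X_{uvw}}{p_{uvw}}, \\
\mathbb{E}[\Delta] &= \sum_{(u,v,w)} \frac{p_{uvw}}{p_{uvw}} = \text{True Count}, \\
\mathrm{Var}\!\left[\frac{X_{uvw}}{p_{uvw}}\right]
  &= \frac{p_{uvw}\,(1 - p_{uvw})}{p_{uvw}^{2}} = \frac{1}{p_{uvw}} - 1.
\end{aligned}
```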

Page 33

Increasing Discovering Prob.

• Recall Temporal Locality: new edges are more likely to form triangles with recent edges than with old edges

• Waiting-Room Sampling (WRS) treats recent edges better than old edges to exploit temporal locality

“How can we increase the discovering probabilities of triangles?”

Page 34

Waiting-Room Sampling (WRS)

• Divides the memory space into two parts

◦ Waiting Room: the latest edges are always stored

◦ Reservoir: the remaining edges are sampled

[Figure: a new edge 𝑒80 arrives; the Waiting Room (FIFO, 𝛼% of the budget) stores 𝑒79, 𝑒78, 𝑒77, 𝑒76; the Reservoir (Random Replace, (100 − 𝛼)% of the budget) stores 𝑒61, 𝑒7, 𝑒18, 𝑒25, 𝑒40, 𝑒1, 𝑒28]

Page 35

WRS: Sampling Steps (Step 1)

[Figure: the new edge 𝑒80 enters the Waiting Room (FIFO), which now stores 𝑒80, 𝑒79, 𝑒78, 𝑒77, and the oldest edge 𝑒76 is popped; the Reservoir (Random Replace) still stores 𝑒61, 𝑒7, 𝑒18, 𝑒25, 𝑒40, 𝑒1, 𝑒28]

Page 36

WRS: Sampling Steps (Step 2)

[Figure: the popped edge 𝑒76 is stored in the Reservoir (Random Replace), replaces a randomly chosen edge in it (here 𝑒40, giving 𝑒61, 𝑒7, 𝑒18, 𝑒25, 𝑒76, 𝑒1, 𝑒28), or is discarded; the Waiting Room (FIFO) stores 𝑒80, 𝑒79, 𝑒78, 𝑒77]
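The two sampling steps can be sketched as a small class. This is a minimal sketch under our own assumptions, not the paper's implementation: 𝛼 is given as a fraction, the budget 𝑘 is split into round(𝛼·𝑘) waiting-room slots and the rest reservoir slots, and the reservoir runs standard reservoir sampling over the edges popped from the waiting room. Under those assumptions, an edge still in the waiting room is in memory with probability 1 and popped edges survive with the usual reservoir probabilities. A triangle closed by a new edge would then get 𝑝𝑢𝑣𝑤 = `prob_in_memory(m)`, where m is how many of its two stored edges have left the waiting room. All names are hypothetical.

```python
import random
from collections import deque


class WaitingRoomSampler:
    """Sketch of WRS memory: a FIFO waiting room plus a reservoir.

    Assumptions (not from the slides): alpha is a fraction in [0, 1], the
    budget k is split into round(alpha * k) waiting-room slots and the rest
    reservoir slots, and the reservoir runs standard reservoir sampling over
    the edges popped from the waiting room.
    """

    def __init__(self, k, alpha=0.1, seed=None):
        self.k_w = round(alpha * k)      # waiting-room capacity
        self.k_r = k - self.k_w          # reservoir capacity
        self.waiting = deque()           # latest edges, oldest on the left
        self.reservoir = []              # sample of older edges
        self.n_popped = 0                # edges that have left the waiting room
        self.rng = random.Random(seed)

    def insert(self, edge):
        """Sampling step for one new edge."""
        self.waiting.append(edge)        # the latest edge is always stored
        if len(self.waiting) <= self.k_w:
            return
        popped = self.waiting.popleft()  # Step 1: pop the oldest edge
        self.n_popped += 1
        if len(self.reservoir) < self.k_r:
            self.reservoir.append(popped)          # Step 2: store ...
        else:
            j = self.rng.randrange(self.n_popped)
            if j < self.k_r:
                self.reservoir[j] = popped         # ... replace a random edge ...
            # ... or discard the popped edge

    def in_waiting_room(self, edge):
        return edge in self.waiting

    def edges(self):
        return set(self.waiting) | set(self.reservoir)

    def prob_in_memory(self, n_popped_edges):
        """Probability that n_popped_edges (0, 1 or 2) specific edges, already
        popped from the waiting room, are all still in the reservoir."""
        n, k = self.n_popped, self.k_r
        if n_popped_edges == 0 or n <= k:
            return 1.0
        if n_popped_edges == 1:
            return k / n
        return k * (k - 1) / (n * (n - 1))


s = WaitingRoomSampler(k=10, alpha=0.2, seed=0)
for i in range(100):
    s.insert((i, i + 1))
print(list(s.waiting))  # [(98, 99), (99, 100)]
```

Setting `alpha = 0` reduces the sketch to plain reservoir sampling; a larger `alpha` guarantees that more recent edges stay in memory, which is how the waiting room exploits temporal locality.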

Page 37

Summary of Algorithm

[Figure: (1) Arrival Step: a new edge 𝑢 − 𝑣 arrives while the memory stores 𝑢 − 𝑥, 𝑢 − 𝑦, 𝑣 − 𝑥, 𝑣 − 𝑦; (2) Discovery Step: triangle (𝑢, 𝑣, 𝑥) is discovered, and ∆ ← ∆ + 1/𝑝𝑢𝑣𝑥; (3) Sampling Step: Waiting-Room Sampling updates the memory]

Page 38

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

▪ T1.1 Waiting-Room Sampling

◦ Temporal Pattern

◦ Algorithm

◦ Experiments <<

▪ T1.2-T1.3 Related Completed Work

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 39

Experimental Results: Accuracy

• Datasets:

• WRS is the most accurate (reduces the estimation error by up to 47%)

Page 40

Discovering Probability

• WRS increases the discovering probability 𝑝𝑢𝑣𝑤

• WRS discovers up to 3× more triangles

[Figure: comparison of WRS, Triest-IMPR, and MASCOT]

Page 41

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

▪ T1.1 Waiting-Room Sampling

▪ T1.2-T1.3 Related Completed Work <<

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 42

T1.2 Distributed Counting of Triangles

• Goal: How can we utilize multiple machines for triangle counting in a graph stream?

[Figure: two architectures with sources, workers, and aggregators: Tri-Fly [PAKDD18] (broadcast, then shuffle) and DiSLR [submitted to KDD] (multicast, then shuffle)]

Kijung Shin, Mohammad Hammoud, Euiwoong Lee, Jinoh Oh, and Christos Faloutsos, “Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams”, PAKDD 2018

Page 43

T1.2 Performance of Tri-Fly and DiSLR

• Estimation Error (mean squared) = Bias² + Variance

[Figure: bias and variance of Tri-Fly and DiSLR (annotated improvements: 40×, 40×, 30×)]

Page 44

T1.3 Estimation of Degeneracy

• Goal: How can we estimate the degeneracy* in a graph stream?

• Core-Triangle Pattern

◦ 3:1 power law between the triangle count and the degeneracy

*degeneracy: the maximum 𝑘 such that there exists a subgraph in which every node has degree at least 𝑘

Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with Applications”, KAIS 2018 (previously ICDM 2016)
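The footnote's definition can be computed exactly, offline, with the standard peeling procedure: repeatedly delete a node of minimum degree; the largest minimum degree seen along the way is the degeneracy. This sketch (function name hypothetical) is an exact reference computation that needs the whole graph in memory, not Core-D's streaming estimate.

```python
import heapq
from collections import defaultdict


def degeneracy(edges):
    """Exact degeneracy of an undirected graph via min-degree peeling."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    deg = {u: len(ns) for u, ns in adj.items()}
    heap = [(d, u) for u, d in deg.items()]
    heapq.heapify(heap)
    removed = set()
    k = 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != deg[u]:
            continue                     # stale entry: degree has since dropped
        k = max(k, d)                    # largest minimum degree seen so far
        removed.add(u)
        for w in adj[u]:
            if w not in removed:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    return k


print(degeneracy([(1, 2), (2, 3), (1, 3), (3, 4)]))  # 2
```

Node labels are assumed to be comparable, because ties in the heap are broken by the label.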

Page 45

T1.3 Core-D Algorithm

• Core-D: a one-pass streaming algorithm for estimating the degeneracy

$\hat{d} = \exp(\alpha \cdot \log \hat{\Delta} + \beta)$, where $\hat{d}$ is the estimated degeneracy and $\hat{\Delta}$ is the estimated triangle count (obtained by WRS, etc.)

[Figure: accuracy comparison in which Core-D performs better]
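The formula is linear in log space, log 𝑑̂ = 𝛼 · log ∆̂ + 𝛽, so once 𝛼 and 𝛽 are known the estimate costs only a triangle-count estimate. The slide does not say how 𝛼 and 𝛽 are obtained; fitting them by least squares on graphs whose triangle counts and degeneracies are known is our assumption. Both helpers below are hypothetical.

```python
import math


def fit_core_d(triangle_counts, degeneracies):
    """Least-squares fit of log(d) = alpha * log(Delta) + beta (hypothetical)."""
    xs = [math.log(t) for t in triangle_counts]
    ys = [math.log(d) for d in degeneracies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = my - alpha * mx
    return alpha, beta


def core_d_estimate(triangle_estimate, alpha, beta):
    """d_hat = exp(alpha * log(Delta_hat) + beta)."""
    return math.exp(alpha * math.log(triangle_estimate) + beta)


# synthetic training graphs that follow d = exp(log(Delta) / 3 + 0.5) exactly
counts = [10, 100, 1000, 10000]
degs = [math.exp(math.log(t) / 3 + 0.5) for t in counts]
alpha, beta = fit_core_d(counts, degs)
```

The synthetic data uses an exponent of 1/3 only to illustrate the fit; it is not a claim about the fitted values on real graphs.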

Page 46

Structure Analysis of Graphs

• Models:

◦ Relaxed graph stream model

◦ Distributed graph stream model

• Patterns:

◦ Temporal locality

◦ Core-Triangle pattern

• Algorithms:

◦ WRS, Tri-Fly, and DiSLR

◦ Core-D

• Analyses: bias and variance

Page 47

Completed Work by Topics

• Graphs

◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*

◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*

◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]

• Tensors

◦ T1. Structure Analysis: Summarization [WSDM17]

◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]

◦ T3. Behavior Modeling: Progressive Behavior [WWW18]

* Duplicated (listed under more than one topic)

Page 48

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

▪ T2.1 M-Zoom <<

▪ T2.2-T2.3 Related Completed Work

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Kijung Shin, Bryan Hooi, and Christos Faloutsos, “Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining”, TKDD 2018 (previously ECML/PKDD 2016)

Page 49

Motivation: Review Fraud

[Figure: an example of review fraud involving Alice and the restaurants Alice’s, Bob’s, and Carol’s]

Page 50

Fraud Forms Dense Block

[Figure: adjacency matrix of accounts × restaurants in which fraud forms a dense block]

Page 51

Problem: Natural Dense Subgraphs

[Figure: adjacency matrix of accounts × restaurants containing suspicious dense blocks formed by fraudsters and natural dense blocks (cores, communities, etc.)]

• Question. How can we distinguish them?

Page 52

Solution: Tensor Modeling

• Along the time axis…

◦ natural dense blocks are sparse (formed gradually)

◦ suspicious dense blocks are dense (synchronized behavior)

• In the tensor model

◦ suspicious dense blocks become denser than natural dense blocks

[Figure: a tensor with accounts, restaurants, and time as its modes]

Page 53

Solution: Tensor Modeling (cont.)

• High-order tensor modeling:

◦ any side information can additionally be used

[Figure: additional modes such as IP address, keywords, and number of stars]

“Given a large-scale high-order tensor, how can we find dense blocks in it?”

Page 54

Problem Definition

• Given: (1) 𝑹: an 𝑁-order tensor,

(2) 𝝆: a density measure,

(3) 𝒌: the number of blocks we aim to find

• Find: 𝒌 distinct dense blocks maximizing 𝝆

[Figure: an example tensor 𝑹 and, for 𝒌 = 3, the set of three dense blocks found in it]

Page 55

Density Measures

• How should we define “density” (i.e., 𝜌)?

◦ there is no single absolute answer

◦ it depends on the data, the types of anomalies, etc.

• Goal: a flexible algorithm that works well with various reasonable measures

◦ Arithmetic avg. degree ρ𝐴

◦ Geometric avg. degree ρ𝐺

◦ Suspiciousness (KL divergence) ρ𝑆

◦ Traditional density: $\rho_T(B) = \mathrm{EntrySum}(B) / \mathrm{Vol}(B)$

- maximized by a single entry with the maximum value
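Three of the measures can be written down in a few lines. This sketch assumes a sparse tensor stored as a dict from index tuples to values and a block given as one set of indices per mode; it uses the standard definitions ρ𝐴(B) = mass / average mode size, ρ𝐺(B) = mass / geometric-mean mode size, and ρ𝑇(B) = mass / volume, and omits ρ𝑆, whose formula the slide does not give. All function names are hypothetical, and an empty block would divide by zero.

```python
import math


def block_mass(tensor, block):
    """Sum of the entries of `tensor` (dict: index tuple -> value) that lie
    inside `block` (a list with one set of indices per mode)."""
    return sum(val for idx, val in tensor.items()
               if all(i in s for i, s in zip(idx, block)))


def rho_arithmetic(tensor, block):
    sizes = [len(s) for s in block]
    return block_mass(tensor, block) / (sum(sizes) / len(sizes))


def rho_geometric(tensor, block):
    sizes = [len(s) for s in block]
    return block_mass(tensor, block) / math.prod(sizes) ** (1 / len(sizes))


def rho_traditional(tensor, block):
    return block_mass(tensor, block) / math.prod(len(s) for s in block)


matrix = {(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 0}
whole = [{0, 1}, {0, 1}]
single = [{0}, {0}]          # just the maximum entry
print(rho_traditional(matrix, whole), rho_traditional(matrix, single))  # 2.0 4.0
```

The last line shows the caveat from the slide: under ρ𝑇, the single largest entry (4.0) beats the whole 2 × 2 block (2.0).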

Page 56

Clarification of Blocks (Subtensors)

[Figure: the same block in an accounts × restaurants matrix, shown with rows and columns in two different orders]

• The concept of blocks (subtensors) does not depend on the order of rows and columns

• Entries in a block do not need to be adjacent

Page 57

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

▪ T2.1 M-Zoom [PKDD 16]

◦ Algorithm <<

◦ Experiments

▪ T2.2-T2.3 Related Completed Work

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 58

Single Dense Block Detection

• Greedy search

• Starts from the entire tensor

[Figure: the entire example tensor; 𝜌 = 2.9]

Page 59

Single Dense Block Detection (cont.)

• Remove a slice to maximize density 𝜌

[Figure: the example tensor after one slice is removed; 𝜌 = 3]

Page 60

Single Dense Block Detection (cont.)

• Remove a slice to maximize density 𝜌

[Figure: the example tensor after another slice is removed; 𝜌 = 3.3]

Page 61

Single Dense Block Detection (cont.)

• Remove a slice to maximize density 𝜌

[Figure: the example tensor after another slice is removed; 𝜌 = 3.6]

Page 62

Single Dense Block Detection (cont.)

• Until all slices are removed

[Figure: density vs. iteration (0 to 8); once all slices are removed, 𝜌 = 0]

Page 63

Single Dense Block Detection (cont.)

• Output: return the densest block so far

[Figure: the densest block found; 𝜌 = 3.6]

Page 64

Speeding Up Process

• Lemma 1 [Remove Minimum Sum First]

Among slices in the same dimension, removing the slice with the smallest sum of entries increases 𝜌 the most

[Figure: example slice sums 12 > 9 > 2]
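The greedy search from the preceding slides, combined with Lemma 1, can be sketched compactly: in every mode the only candidate is the slice with the smallest sum, the candidate whose removal gives the highest density is removed, and the densest block seen is returned. Assumptions: a sparse dict tensor, the arithmetic average degree ρ𝐴 as the density, and slice sums recomputed at every step instead of the bookkeeping that makes M-Zoom near-linear. This is an illustration, not the authors' code, and the function name is hypothetical.

```python
def greedy_dense_block(tensor, n_modes):
    """Greedy single-block search in the spirit of M-Zoom (a sketch).

    tensor: dict mapping index tuples to non-negative values.
    Returns (best block as a list of index sets, its density rho_A).
    """
    block = [set() for _ in range(n_modes)]
    for idx in tensor:                    # start from the entire tensor
        for m, i in enumerate(idx):
            block[m].add(i)
    entries = dict(tensor)
    mass = sum(entries.values())

    def density(mass, blk):               # arithmetic average degree
        total = sum(len(s) for s in blk)
        return mass / (total / n_modes) if total else 0.0

    best_rho, best = density(mass, block), [set(s) for s in block]
    while all(block):                     # until some mode runs out of slices
        move = None
        for m in range(n_modes):
            sums = {i: 0 for i in block[m]}
            for idx, val in entries.items():
                sums[idx[m]] += val
            i_min = min(sums, key=sums.get)           # Lemma 1
            trial = [s - {i_min} if k == m else s for k, s in enumerate(block)]
            rho = density(mass - sums[i_min], trial)
            if move is None or rho > move[0]:
                move = (rho, m, i_min, sums[i_min])
        rho, m, i, removed_mass = move
        block[m].discard(i)               # remove the chosen slice
        mass -= removed_mass
        entries = {idx: v for idx, v in entries.items() if idx[m] != i}
        if rho > best_rho:                # remember the densest block so far
            best_rho, best = rho, [set(s) for s in block]
    return best, best_rho


# a dense 2 x 2 block of 5s plus a few scattered 1s
t = {(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 5, (2, 2): 1, (3, 3): 1, (2, 0): 1}
print(greedy_dense_block(t, 2))  # ([{0, 1}, {0, 1}], 10.0)
```

Each step removes one slice, so the loop runs at most as many times as there are indices across all modes.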

Page 65

Accuracy Guarantee

• Theorem 1 [Approximation Guarantee]

$\rho_A(B) \geq \frac{1}{N}\,\rho_A(B^*)$, where $B$ is the M-Zoom result, $B^*$ is the densest block, and $N$ is the order

• Theorem 2 [Near-linear Time Complexity]

$O(NM \log L)$, where $N$ is the order, $M$ is the number of non-zeros, and $L$ is the number of entries in each mode

Page 66

Optional Post Process

• Local search

◦ grow or shrink until a local maximum is reached

[Figure: starting from the result of our previous greedy search (𝝆 = 3.29), the grow and shrink candidates have 𝝆 = 2 and 𝝆 = 1.8]

Page 67

Optional Post Process (cont.)

• Local search

◦ grow or shrink until a local maximum is reached

[Figure: from 𝝆 = 3.29, the grow and shrink candidates have 𝝆 = 3.25 and 𝝆 = 3.33]

Page 68

Optional Post Process (cont.)

• Local search

◦ grow or shrink until a local maximum is reached

[Figure: the search moves from 𝝆 = 3.29 to 𝝆 = 3.33; the next grow/shrink candidate has 𝝆 = 3.8]

Page 69

Optional Post Process (cont.)

• Local search

◦ grow or shrink until a local maximum is reached

• Return the local maximum

[Figure: the search reaches 𝝆 = 3.8 (via 𝝆 = 3.33); the remaining candidate has 𝝆 = 3, so 𝝆 = 3.8 is the local maximum]
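The local search can be sketched as a hill climb: consider every block that differs from the current one by a single added or removed slice, move to the best one if it raises the density, and stop otherwise. Assumptions: a sparse dict tensor, ρ𝐴 as the density, and a best-improvement move rule; the slides do not specify which improving move is taken. The function name is hypothetical.

```python
def local_search(tensor, block):
    """Grow or shrink `block` one slice at a time until no move raises rho_A.

    tensor: dict mapping index tuples to values; block: list of index sets.
    Returns (local maximum block, its density).
    """
    n = len(block)
    all_idx = [set() for _ in range(n)]
    for idx in tensor:
        for m, i in enumerate(idx):
            all_idx[m].add(i)

    def rho(b):                                   # arithmetic average degree
        if not all(b):
            return 0.0
        mass = sum(v for idx, v in tensor.items()
                   if all(i in s for i, s in zip(idx, b)))
        return mass / (sum(len(s) for s in b) / n)

    block = [set(s) for s in block]
    current = rho(block)
    while True:
        candidates = []
        for m in range(n):
            for i in all_idx[m] - block[m]:       # grow by one slice
                candidates.append([s | {i} if k == m else s for k, s in enumerate(block)])
            for i in block[m]:                    # shrink by one slice
                candidates.append([s - {i} if k == m else s for k, s in enumerate(block)])
        best = max(candidates, key=rho, default=None)
        if best is None or rho(best) <= current:
            return block, current                 # local maximum reached
        block, current = best, rho(best)


t = {(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 5, (2, 2): 1, (3, 3): 1, (2, 0): 1}
print(local_search(t, [{0, 1}, {0}]))  # ([{0, 1}, {0, 1}], 10.0)
```

Because every accepted move strictly increases the density and there are finitely many blocks, the loop always terminates.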

Page 70

Multiple Block Detection

• Deflation: remove found blocks before finding others

[Figure: Find → Remove → Find → Remove → Find → Restore]
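Deflation can be sketched independently of the single-block search by passing that search in as a function. Assumptions: the tensor is a sparse dict, "remove" means dropping the found block's entries from a working copy, and "restore" means the caller's tensor is never modified. The names `find_k_blocks` and `find_one` are hypothetical.

```python
def find_k_blocks(tensor, k, find_one):
    """Find up to k blocks by deflation (a sketch).

    tensor:   dict mapping index tuples to values (left unchanged).
    find_one: any single-block search, find_one(tensor) -> list of index sets.
    """
    residual = dict(tensor)          # work on a copy; the original is "restored"
    blocks = []
    for _ in range(k):
        if not residual:
            break                    # nothing left to search
        block = find_one(residual)
        blocks.append(block)
        # remove the found block's entries before searching for the next one
        residual = {idx: v for idx, v in residual.items()
                    if not all(i in s for i, s in zip(idx, block))}
    return blocks


# toy single-block search: the block made of the largest remaining entry
top_entry = lambda x: [{i} for i in max(x, key=x.get)]
t = {(0, 0): 5, (1, 1): 3, (2, 2): 1}
print(find_k_blocks(t, 2, top_entry))  # [[{0}, {0}], [{1}, {1}]]
```

Without the removal step, a deterministic search would return the same block 𝒌 times, which is why the found blocks are removed before the next search.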

Page 71

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

▪ T2.1 M-Zoom [PKDD 16]

◦ Algorithm

◦ Experiments <<

▪ T2.2-T2.3 Related Completed Work

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 72

Speed & Accuracy

• Datasets: …

[Figure: speed and accuracy under three density metrics, ρ_G, ρ_S, and ρ_A; annotations: 2X, 3X, 2X]

Page 73

Discoveries in Practice

• Korean Wikipedia (accounts × pages): 11 accounts revised 10 pages 2,305 times within 16 hours

• English Wikipedia (accounts × pages): 8 accounts revised 12 pages 2.5 million times

[Figure: detected dense blocks in the account × page data; annotation: 100%]

Page 74

Discoveries in Practice (cont.)

• App Market (4-order tensor): 9 accounts gave 1 product 369 reviews with the same rating within 22 hours

• TCP Dump (7-order tensor): a block whose volume = 2 and mass = 2 million

[Figure: detected dense blocks (axes include accounts and protocols); annotations: 100%, 100%]

Page 75

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

▪ M-Zoom

▪ T2.2-T2.3 Related Completed Work <<

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion

Page 76

T2.2 Extension to Web-scale Tensors

• Goal: to find dense blocks in a disk-resident or distributed tensor

• D-Cube: gives the same accuracy guarantee as M-Zoom with far fewer iterations

[Figure: entry sums in slices compared against their average]

• 100 B nonzeros in 5 hours

Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos,

“D-Cube: Dense-Block Detection in Terabyte-Scale Tensors”, WSDM 2017
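The slice-average idea in the figure can be illustrated with a single-machine, in-memory sketch. The loop works as follows:

- Repeatedly pick a mode and compute the entry sum of each slice in the current block.
- Remove the slices whose sum is at most the average, lightest first.
- Keep the densest block seen along the way.

This is not D-Cube itself, which works on disk-resident or distributed tensors. The following are assumptions made for the example:

- the dict-based tensor
- the arithmetic-average-mass density
- the rule "shrink the mode with the most remaining slices"

```python
def dcube_like(entries, ndim):
    """Greedy below-average slice removal in the spirit of D-Cube (in memory)."""

    def inside(idx, block):
        return all(idx[n] in block[n] for n in range(ndim))

    def density(block):
        # Arithmetic average mass (assumed density measure).
        sizes = [len(b) for b in block]
        if min(sizes) == 0:
            return 0.0
        mass = sum(v for idx, v in entries.items() if inside(idx, block))
        return mass / (sum(sizes) / ndim)

    block = [{idx[n] for idx in entries} for n in range(ndim)]
    best_block, best_rho = [set(b) for b in block], density(block)
    while all(block):
        # Assumed policy: shrink the mode with the most remaining slices.
        n = max(range(ndim), key=lambda m: len(block[m]))
        slice_mass = dict.fromkeys(block[n], 0)
        for idx, v in entries.items():
            if inside(idx, block):
                slice_mass[idx[n]] += v
        avg = sum(slice_mass.values()) / len(slice_mass)
        # Remove every slice whose mass is at most the average, lightest first,
        # remembering the densest block seen along the way.
        for i in sorted((i for i in slice_mass if slice_mass[i] <= avg),
                        key=slice_mass.get):
            block[n].discard(i)
            rho = density(block)
            if rho > best_rho:
                best_block, best_rho = [set(b) for b in block], rho
    return best_block, best_rho
```

The lightest slice is never heavier than the average, so each round removes at least one slice and the loop terminates. Removing all below-average slices in one round, rather than one slice per round, is the intuition behind needing fewer passes over the data.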

Page 77

T2.3 Extension to Dynamic Tensors

• Goal: to maintain a dense block in a dynamic tensor that changes over time

• DenseStream: incrementally computes a dense block with the same accuracy guarantee as M-Zoom

Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos,

“DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams”, KDD 2017

Page 78

Anomaly Detection in Tensors

• Algorithms:

◦ M-Zoom, D-Cube, and DenseStream

• Analyses: approximation guarantees

• Discoveries:

◦ Edit wars, vandalism, and bot activities

◦ Network intrusion

◦ Spam reviews

Page 79

Completed Work by Topics

• Graphs

◦ T1. Structure Analysis: Triangle Count [ICDM17][PAKDD18][submitted to KDD]; Degeneracy [ICDM16]* [KAIS18]*

◦ T2. Anomaly Detection: Anomalous Subgraph [ICDM16]* [KAIS18]*

◦ T3. Behavior Modeling: Purchase Behavior [IJCAI17]

• Tensors

◦ T1. Structure Analysis: Summarization [WSDM17]

◦ T2. Anomaly Detection: Dense Subtensors [PKDD16][WSDM17][KDD17][TKDD18]

◦ T3. Behavior Modeling: Progressive Behavior [WWW18]

* Duplicated

Page 80

Motivation

[Figure: a user's journey from Start ("Welcome to") through unknown stages (…? ? ?) to Goal, with profile snapshots along the way]

Kijung Shin, Mahdi Shafiei, Myunghwan Kim, Aastha Jain, and Hema Raghavan,

“Discovering Progression Stages in Trillion-Scale Behavior Logs”, WWW 2018

Page 81

Problem Definition

• Given:

◦ behavior log

◦ number of desired latent stages: k

• Find: k progression stages

◦ types of actions

◦ frequency of actions

◦ transitions to other stages

• To best describe the given behavior log

[Figure: behavior log with axes Users and Action types]

Page 82

Behavior Model

• Generative process:

◦ Θ_s: action-type distribution in stage s

◦ φ_s: time-gap distribution in stage s

◦ ψ_s: next-stage distribution in stage s

• Constraint: “no decline” (progression but no cyclic patterns)

[Figure: example of the generative process. ψ_0 selects the first stage. In each stage s, Θ_s and φ_s generate actions (e.g., "Welcome to", connect, message, connect, jobs), and ψ_s selects the next stage among stages 1, 2, 3]
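The generative process can be simulated with the sketch below. It is illustrative only, and rests on these assumptions:

- Stages are 0-indexed.
- Θ_s is a list of weights over a fixed list of action types.
- The time-gap distribution φ_s is exponential with rate `gap_rate[s]`.
- The “no decline” constraint is enforced by zeroing transitions to earlier stages.

The function and parameter names are hypothetical.

```python
import random


def sample_behavior_log(psi0, theta, gap_rate, psi, actions, n, seed=0):
    """Sample n (time, stage, action) triples from a stage-based generative model.

    psi0[s]     : probability of starting in stage s
    theta[s][a] : probability of action type actions[a] in stage s
    gap_rate[s] : rate of an exponential time-gap distribution (assumption)
    psi[s][t]   : probability of moving from stage s to stage t
    """
    rng = random.Random(seed)
    k = len(theta)
    stage = rng.choices(range(k), weights=psi0)[0]
    time, log = 0.0, []
    for _ in range(n):
        action = rng.choices(actions, weights=theta[stage])[0]
        time += rng.expovariate(gap_rate[stage])
        log.append((time, stage, action))
        # "No decline": transitions back to earlier stages get zero weight.
        weights = [w if t >= stage else 0.0 for t, w in enumerate(psi[stage])]
        stage = rng.choices(range(k), weights=weights)[0]
    return log
```

Every sampled stage sequence is non-decreasing, which is the property the optimization algorithm relies on.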

Page 83

Optimization Algorithm

• Goal: to fit our model to given data

◦ parameters: distributions (i.e., Θ_s, φ_s, ψ_s for every stage s) and latent stages

• Repeat until convergence:

◦ assignment step: assign latent stages while fixing the probability distributions

◦ update step: update the probability distributions while fixing the latent stages

▪ e.g., Θ_s ← ratio of the types of actions in stage s

[Figure: stages 1, 2, 3; the “no decline” constraint lets the assignment step use dynamic programming]
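Below is a simplified sketch of the alternating optimization. It fits only the action-type distributions Θ_s. The time-gap and transition distributions are left out to keep the example short, which simplifies the model above.

Because stages may not decline, the assignment step can find the best stage sequence exactly. It uses a Viterbi-style dynamic program over non-decreasing sequences, in O(n·k) time.

Names such as `assign_stages` are hypothetical, and `eps` is an assumed smoothing constant for action types a stage has not seen.

```python
import math
from collections import Counter


def assign_stages(actions, theta, eps=1e-12):
    """Assignment step: most likely non-decreasing stage sequence (dynamic programming).

    theta[s] maps an action type to its probability in stage s; `actions` must be non-empty.
    """
    k = len(theta)
    loglik = [[math.log(theta[s].get(a, 0.0) + eps) for s in range(k)] for a in actions]
    best = [loglik[0][:]]   # best[i][s]: best score with action i in stage s
    back = [[0] * k]        # back[i][s]: stage of action i-1 on that best path
    for i in range(1, len(actions)):
        row, ptr = [], []
        run_score, run_stage = -math.inf, 0
        for s in range(k):
            # "No decline": the previous action may be in any stage <= s.
            if best[i - 1][s] > run_score:
                run_score, run_stage = best[i - 1][s], s
            row.append(run_score + loglik[i][s])
            ptr.append(run_stage)
        best.append(row)
        back.append(ptr)
    stage = max(range(k), key=lambda s: best[-1][s])
    stages = [stage]
    for i in range(len(actions) - 1, 0, -1):
        stage = back[i][stage]
        stages.append(stage)
    return stages[::-1]


def update_theta(actions, stages, k):
    """Update step: theta[s] <- ratio of each action type among actions in stage s."""
    theta = []
    for s in range(k):
        counts = Counter(a for a, st in zip(actions, stages) if st == s)
        total = sum(counts.values())
        theta.append({a: c / total for a, c in counts.items()})
    return theta


def fit(actions, theta, max_iters=20):
    """Alternate the two steps until the stage assignment stops changing."""
    stages = None
    for _ in range(max_iters):
        new_stages = assign_stages(actions, theta)
        if new_stages == stages:
            break
        stages = new_stages
        theta = update_theta(actions, stages, len(theta))
    return stages, theta
```

Take the toy log welcome, welcome, connect, message, connect, jobs, jobs with three initial stage profiles. The assignment step returns stages 0, 0, 1, 1, 1, 2, 2, and a second pass leaves them unchanged.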

Page 84

Scalability & Convergence

• Three versions of our algorithm

◦ In-memory

◦ Out-of-core (or external-memory)

◦ Distributed

• 1 trillion actions in 2 hours

[Figure: convergence with 5, 10, 15, and 20 latent stages]

Page 85

Progression of Users in LinkedIn

[Figure: discovered progression of LinkedIn users, with labels Build one’s Profile, Onboarding Process, Poke around the service, Grow one’s Social Network, and Consume Newsfeeds; timeline markers: Join, Have 30 connections]


Page 87

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work <<

• Conclusion

Page 88

Proposed Work by Topics

• Graphs

◦ T1. Structure Analysis: P1. Triangle Counting in Fully Dynamic Streams

◦ T3. Behavior Modeling: P3. Polarization Modeling

• Tensors

◦ T1. Structure Analysis: P2. Fast and Scalable Tucker Decomposition


Page 90

P1: Problem Definition

• Given:

◦ a fully dynamic graph stream,

▪ i.e., a list of edge insertions and edge deletions

◦ memory budget k

• Estimate: the counts of global and local triangles

• To minimize: estimation error

[Figure: a stream of edge insertions (+) and deletions (−): …, +, −, +, −, …]
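To make the estimation target concrete, here is an exact counter for a fully dynamic stream. It is not the proposed method and ignores the memory budget k, because it stores every edge. Sampling-based estimators such as Triest-FD, MASCOT, Triest-IMPR, and WRS instead keep only a sample of edges and reweight their counts.

The counter relies on one observation: inserting or deleting edge (u, v) creates or destroys exactly one triangle per common neighbor of u and v. The class name is hypothetical.

```python
from collections import defaultdict


class ExactTriangleCounter:
    """Exact global and local triangle counts under edge insertions and deletions."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.global_count = 0
        self.local_count = defaultdict(int)

    def process(self, u, v, op):
        """Apply one stream element (u, v, op) with op in {'+', '-'}."""
        if op not in ("+", "-"):
            raise ValueError(f"unknown operation: {op!r}")
        if u == v:
            return  # ignore self-loops
        present = v in self.adj[u]
        if op == "+" and not present:
            delta = 1
        elif op == "-" and present:
            delta = -1
        else:
            return  # duplicate insertion, or deletion of a missing edge
        # Each common neighbor w closes (or opens) exactly one triangle {u, v, w}.
        common = self.adj[u] & self.adj[v]
        self.global_count += delta * len(common)
        self.local_count[u] += delta * len(common)
        self.local_count[v] += delta * len(common)
        for w in common:
            self.local_count[w] += delta
        if delta == 1:
            self.adj[u].add(v)
            self.adj[v].add(u)
        else:
            self.adj[u].discard(v)
            self.adj[v].discard(u)
```

A memory-bounded estimator would replace `adj` with a sample of at most k edges. It would also scale each discovered triangle by the inverse of the probability that its edges were sampled.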

Page 91

P1: Goal

Method         Accuracy   Handles Deletions?
Triest-FD      Lowest     Yes
MASCOT         Low        No
Triest-IMPR    High       No
WRS            Highest    No
Proposed       Highest    Yes


Page 93

P2: Problem Definition

• Tucker Decomposition (a.k.a. high-order PCA)

◦ Given: an N-order input tensor X

◦ Find: N factor matrices A(1), …, A(N) and a core tensor Y

◦ To satisfy: $\boldsymbol{X} \approx \boldsymbol{Y} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)}$

[Figure: a 3-order input X approximated by the core tensor Y multiplied by factor matrices A(1), A(2), A(3)]
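For a 3-order tensor, the Tucker model above can be written elementwise as X̂[i][j][l] = Σ_{p,q,r} Y[p][q][r] · A(1)[i][p] · A(2)[j][q] · A(3)[l][r]. The pure-Python sketch below evaluates this reconstruction and the squared error that Tucker decomposition minimizes. The nested-list representation and the function names are assumptions for illustration; a real implementation would use a numerical library.

```python
def tucker_reconstruct(core, a1, a2, a3):
    """X_hat = Y x_1 A(1) x_2 A(2) x_3 A(3) for a 3-order core Y (nested lists).

    A(n) has shape I_n x J_n, and the core has shape J_1 x J_2 x J_3.
    """
    j1, j2, j3 = len(core), len(core[0]), len(core[0][0])
    return [[[sum(core[p][q][r] * a1[i][p] * a2[j][q] * a3[l][r]
                  for p in range(j1) for q in range(j2) for r in range(j3))
              for l in range(len(a3))]
             for j in range(len(a2))]
            for i in range(len(a1))]


def squared_error(x, x_hat):
    """Squared Frobenius norm of X - X_hat, the quantity Tucker decomposition minimizes."""
    return sum((x[i][j][l] - x_hat[i][j][l]) ** 2
               for i in range(len(x))
               for j in range(len(x[0]))
               for l in range(len(x[0][0])))
```

With a 1 × 1 × 1 core, the reconstruction is a rank-1 outer product scaled by the single core value.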

Page 94

P2: Standard Algorithms

[Figure: Input (large & sparse) → Intermediate Data (large & dense), materialized → SVD → Output (small & dense). The materialized intermediate data is the scalability bottleneck. Size annotations: 400GB - 4TB, 2GB, 2GB]
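Assume the standard algorithm is HOOI (alternating least squares). The update of A(n) then takes the leading left singular vectors of the intermediate matrix Y_(n) = (X ×_{m≠n} A(m)ᵀ)_(n). That matrix has I_n rows and Π_{m≠n} J_m dense columns.

The back-of-the-envelope helpers below compute that size alongside the size of the factor matrices. The dimensions in the check are hypothetical, not the datasets behind the figure.

```python
from math import prod


def intermediate_bytes(dims, ranks, mode, bytes_per_value=8):
    """Size of the dense intermediate Y_(n) = (X x_{m != n} A(m)^T)_(n).

    Y_(n) has dims[mode] rows and prod(ranks[m] for m != mode) columns.
    """
    cols = prod(r for m, r in enumerate(ranks) if m != mode)
    return dims[mode] * cols * bytes_per_value


def factor_bytes(dims, ranks, bytes_per_value=8):
    """Size of the output factor matrices A(1), ..., A(N) (core tensor excluded)."""
    return sum(i * j for i, j in zip(dims, ranks)) * bytes_per_value
```

Take a hypothetical 4-order tensor with 10^6 indices per mode and rank 10 in every mode. The intermediate matrix needs 8 × 10^9 bytes (8 GB) of float64 values. All four factor matrices together need only 3.2 × 10^8 bytes (320 MB).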

Page 95

P2: Completed Work

• Our completed work [WSDM17]

◦ On-the-fly SVD: incurs repeated computation

[Figure: Input (large & sparse) → Intermediate Data (large & dense), computed on the fly → Output (small & dense)]

Jinoh Oh, Kijung Shin, Evangelos E. Papalexakis, Christos Faloutsos, and Hwanjo Yu,

“S-HOT: Scalable High-Order Tucker Decomposition”, WSDM 2017.

Page 96

P2: Proposed Work

• Proposed algorithm: partially materialize intermediate data!

[Figure: Input (large & sparse) → Intermediate Data, partly materialized and partly computed on the fly → Output (small & dense)]

Page 97

P2: Expected Performance Gain

• Which part of the intermediate data should we materialize?

• Exploit skewed degree distributions!

[Figure: % of saved computation vs. % of materialized data]
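The intuition behind exploiting skew can be checked with a simple cost model, which is an assumption here. Recomputing the intermediate row for index i costs time proportional to its degree (number of nonzeros). Under that model, materializing the highest-degree rows saves the largest share of repeated computation. The helper name is hypothetical.

```python
def saved_fraction(degrees, num_materialized):
    """Fraction of recomputation avoided by materializing the highest-degree rows.

    Assumption: recomputing the intermediate row of index i costs time
    proportional to degrees[i], its number of nonzeros.
    """
    ranked = sorted(degrees, reverse=True)
    return sum(ranked[:num_materialized]) / sum(ranked)
```

Take degrees 100, 50, 25, 10, 5, 5, 5. Materializing 2 of the 7 rows saves 75% of the recomputation under this model. With eight equal degrees, materializing 2 rows saves only 25%.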


Page 99

P3. Polarization Modeling

• Polarization in social networks: division into contrasting groups

[Figure: a network split over "Use of marijuana should be: Legal / Illegal"; polarization can arise through a change of beliefs OR a change of edges]

“How do people choose between two ways of polarization?”

Page 100

P3. Problem Definition

• Given: a time-evolving social network with nodes’ beliefs on controversial issues

◦ e.g., legalizing marijuana

• Find: an actor-based model with a utility function

◦ depending on network features, beliefs, etc.

• To best describe: the polarization in data

• Applications:

◦ predict future edges

◦ predict the cascades of beliefs


Page 102

Timeline

• Mar-May 2018

◦ P1. Triangle counting in fully dynamic graph streams

• Jun-Aug 2018

◦ P3. Polarization modeling

• Sep-Oct 2018

◦ P2. Fast and scalable Tucker decomposition

• Nov 2018 - Apr 2019

◦ Thesis writing & job application

• May 2019

◦ Defense

Page 103

Roadmap

• Overview

• Completed Work

◦ T1. Structure Analysis

◦ T2. Anomaly Detection

◦ T3. Behavior Modeling

• Proposed Work

• Conclusion <<

Page 104

Conclusion

• Goal: To Understand Large Dynamic Graphs and Tensors

• Subtasks:

◦ structure analysis

◦ anomaly detection

◦ behavior modeling

• Approaches:

◦ distributed or external-memory algorithms

◦ streaming algorithms based on sampling

◦ approximation algorithms

Page 105

References (Completed work)

[1] Kijung Shin, Bryan Hooi, and Christos Faloutsos, “M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees”, ECML/PKDD 2016

[2] Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “CoreScope: Graph Mining Using k-Core Analysis -Patterns, Anomalies and Algorithms”, ICDM 2016

[3] Kijung Shin, “WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams”, ICDM 2017

[4] Jinoh Oh, Kijung Shin, Evangelos E. Papalexakis, Christos Faloutsos, and Hwanjo Yu, “S-HOT: Scalable High-Order Tucker Decomposition”, WSDM 2017

[5] Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “D-Cube: Dense-Block Detection in Terabyte-Scale Tensors”, WSDM 2017

[6] Kijung Shin, Euiwoong Lee, Dhivya Eswaran, and Ariel D. Procaccia, “Why You Should Charge Your Friends for Borrowing Your Stuff”, IJCAI 2017

[7] Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams”, KDD 2017

[8] Kijung Shin, Bryan Hooi, and Christos Faloutsos, “Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining”, TKDD 2018

[9] Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with Applications”, KAIS 2018

[10] Kijung Shin, Mahdi Shafiei, Myunghwan Kim, Aastha Jain, and Hema Raghavan, “Discovering Progression Stages in Trillion-Scale Behavior Logs”, WWW 2018

[11] Kijung Shin, Mohammad Hammoud, Euiwoong Lee, Jinoh Oh, and Christos Faloutsos, PAKDD 2018

Page 106

Thank You

• Papers, software, data: http://www.cs.cmu.edu/~kijungs/proposal/

• Email: [email protected]

• Thanks to:

◦ Sponsors:

◦ Admins:

◦ Collaborators: