Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012

Feb 24, 2016


Page 1: Understanding and Managing Cascades on Large Graphs

Understanding and Managing Cascades on Large Graphs

B. Aditya Prakash

Computer Science, Virginia Tech.

CS Seminar 11/30/2012

Page 2: Understanding and Managing Cascades on Large Graphs

Prakash 2012

Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]

Page 3: Understanding and Managing Cascades on Large Graphs


Dynamical Processes over networks are also everywhere!

Page 4: Understanding and Managing Cascades on Large Graphs

Why do we care?
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• Localized effects: riots…

Page 5: Understanding and Managing Cascades on Large Graphs


Why do we care? (1: Epidemiology)

• Dynamical Processes over networks [AJPH 2007]

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

Diseases over contact networks

Page 6: Understanding and Managing Cascades on Large Graphs


Why do we care? (1: Epidemiology)

• Dynamical Processes over networks

• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients transferred

[US-MEDICARE NETWORK 2005]

Problem: Given k units of disinfectant, whom to immunize?

Page 7: Understanding and Managing Cascades on Large Graphs


Why do we care? (1: Epidemiology)

CURRENT PRACTICE OUR METHOD

~6x fewer!

[US-MEDICARE NETWORK 2005]

Hospital-acquired infections took 99K+ lives and cost $5B+ (per year)

Page 8: Understanding and Managing Cascades on Large Graphs


Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users

Page 9: Understanding and Managing Cascades on Large Graphs


Why do we care? (2: Online Diffusion)

• Dynamical Processes over networks

Celebrity

Buy Versace™!

Followers

Social Media Marketing

Page 10: Understanding and Managing Cascades on Large Graphs


Why do we care? (4: To change the world?)

• Dynamical Processes over networks

Social networks and Collaborative Action

Page 11: Understanding and Managing Cascades on Large Graphs


High Impact – Multiple Settings

Q. How to squash rumors faster?

Q. How do opinions spread?

Q. How to market better?

epidemic outbreaks

products/viruses

transmit s/w patches

Page 12: Understanding and Managing Cascades on Large Graphs


Research Theme

DATA: Large real-world networks & processes

ANALYSIS: Understanding

POLICY/ACTION: Managing

Page 13: Understanding and Managing Cascades on Large Graphs


Research Theme – Public Health

DATA: Modeling # patient transfers

ANALYSIS: Will an epidemic happen?

POLICY/ACTION: How to control outbreaks?

Page 14: Understanding and Managing Cascades on Large Graphs


Research Theme – Social Media

DATA: Modeling how tweets spread

POLICY/ACTION: How to market better?

ANALYSIS: How many cascades in the future?

Page 15: Understanding and Managing Cascades on Large Graphs


In this talk

Q1: How to immunize and control outbreaks better?
Q2: How to find the culprits of epidemics?

POLICY/ACTION: Managing

Page 16: Understanding and Managing Cascades on Large Graphs


In this talk

DATA: Large real-world networks & processes

Q3: What do cascades look like?
Q4: How does activity evolve over time?

Page 17: Understanding and Managing Cascades on Large Graphs


Outline
• Motivation
• Part 1: Policy and Action (Algorithms)
• Part 2: Learning Models (Empirical Studies)
• Conclusion

Page 18: Understanding and Managing Cascades on Large Graphs


Part 1: Algorithms
• Q1: Whom to immunize?
• Q2: How to detect culprits?

Page 19: Understanding and Managing Cascades on Large Graphs


• Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos. “Gelling, and Melting, Large Graphs by Edge Manipulation”, in ACM CIKM 2012 (Best Paper Award)

[Thanks to Hanghang Tong for some slides!]

Page 20: Understanding and Managing Cascades on Large Graphs

An Example: Flu/Virus Propagation

Healthy / Sick

Contact

1: Sneeze to neighbors
2: Some neighbors get sick
3: Try to recover

Q: How to guide propagation by optimizing the link structure?
- Q1: Understand the tipping point (existing work)
- Q2: Minimize the propagation (this paper)
- Q3: Maximize the propagation (this paper)

Page 21: Understanding and Managing Cascades on Large Graphs


Vulnerability measure λ [ICDM 2011, PKDD 2010]

Increasing λ → increasing vulnerability

λ, the largest eigenvalue of the adjacency matrix, determines the epidemic threshold

“Safe” “Vulnerable” “Deadly”
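As an illustration (not the authors' code), λ can be estimated with plain power iteration, and the resulting value plugged into the SIS-style threshold test λ · β/δ < 1 from the epidemic-threshold literature. The SIS model and its parameters β (per-contact infection rate) and δ (recovery rate) are assumptions of this sketch; the slide does not spell out the model.

```python
import math

def leading_eigenvalue(n, edges, iters=500):
    """Largest adjacency eigenvalue of an undirected graph with nodes 0..n-1.

    Power iteration runs on A + I: the shift keeps the eigenvectors but
    avoids oscillation on bipartite graphs.
    """
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    # Rayleigh quotient x^T A x for the unit vector x
    return sum(x[i] * sum(x[j] for j in adj[i]) for i in range(n))

def epidemic_dies_out(lam, beta, delta):
    """Assumed SIS test: below the epidemic threshold when lam * beta / delta < 1."""
    return lam * beta / delta < 1.0

star = [(0, 1), (0, 2), (0, 3)]        # hub 0 with three leaves
lam = leading_eigenvalue(4, star)      # sqrt(3) for a 3-leaf star
print(round(lam, 3), epidemic_dies_out(lam, beta=0.2, delta=0.5))  # prints: 1.732 True
```

Raising β/δ or densifying the graph (which raises λ) pushes the product past 1, matching the "Safe → Vulnerable → Deadly" progression on the slide.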

Page 22: Understanding and Managing Cascades on Large Graphs

Minimizing Propagation: Edge Deletion
• Given: a graph A, a virus propagation model and a budget k;
• Find: the k ‘best’ edges to delete from A to minimize λ

Bad

Good

Page 23: Understanding and Managing Cascades on Large Graphs

Q: How to find k best edges to delete efficiently?

Left eigen-score of source

Right eigen-score of target
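A minimal sketch of this edge scoring, assuming the score of an edge i → j is the product u(i) · v(j) of the source's left eigen-score and the target's right eigen-score (the same product appears in the edge-addition formula later in the talk). The eigenvectors come from plain power iteration; the helper names are hypothetical and this is not the published NetMelt implementation.

```python
import math

def leading_vectors(n, edges, iters=500):
    """Leading left (u) and right (v) eigenvectors of a directed adjacency
    matrix A, by power iteration on A^T + I and A + I respectively."""
    def iterate(step):
        x = [1.0] * n
        for _ in range(iters):
            y = step(x)
            norm = math.sqrt(sum(t * t for t in y))
            x = [t / norm for t in y]
        return x

    def right_step(x):      # (A + I) x: source i collects from its target j
        y = list(x)
        for i, j in edges:
            y[i] += x[j]
        return y

    def left_step(x):       # (A^T + I) x: target j collects from its source i
        y = list(x)
        for i, j in edges:
            y[j] += x[i]
        return y

    return iterate(left_step), iterate(right_step)

def best_edges_to_delete(n, edges, k):
    """Top-k existing edges ranked by u(source) * v(target)."""
    u, v = leading_vectors(n, edges)
    return sorted(edges, key=lambda e: u[e[0]] * v[e[1]], reverse=True)[:k]

# Undirected example stored in both directions: hub 0 - {1, 2, 3, 4}, plus 4 - 5.
und = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
edges = und + [(j, i) for i, j in und]
print(best_edges_to_delete(6, edges, 2))   # the two directions of edge 0-4
```

Edge 0-4 wins because it joins the hub to the leaf that leads further into the graph, so cutting it removes the most "eigen-mass".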

Page 24: Understanding and Managing Cascades on Large Graphs

Minimizing Propagation: Evaluations

[Plot: log (infected ratio) vs. time ticks; the arrow marks the better direction; curves include Our Method]

Data set: Oregon Autonomous System graph (14K nodes, 61K edges)

Page 25: Understanding and Managing Cascades on Large Graphs

Discussions: Node Deletion vs. Edge Deletion
• Observations:
– Node or edge deletion → λ decreases
– Edges of A = nodes of its line graph L(A)
• Questions:
– Edge deletion on A = node deletion on L(A)?
– Which strategy is better (when both are feasible)?

[Figure: original graph A and its line graph L(A)]
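To make the correspondence concrete, here is a small sketch (hypothetical helpers, undirected simple graphs only) that builds L(A): every edge of A becomes a node of L(A), and two such nodes are adjacent when the original edges share an endpoint. Deleting edge e from A then matches deleting node e, with its incident edges, from L(A).

```python
from itertools import combinations

def line_graph(edges):
    """Return (nodes, edges) of the line graph of an undirected simple graph."""
    nodes = sorted({tuple(sorted(e)) for e in edges})
    lg_edges = {(a, b) for a, b in combinations(nodes, 2) if set(a) & set(b)}
    return nodes, lg_edges

def delete_node(nodes, lg_edges, e):
    """Remove node e (an edge of A) and its incident edges from L(A)."""
    return ([x for x in nodes if x != e],
            {(a, b) for a, b in lg_edges if e not in (a, b)})

path = [(0, 1), (1, 2), (2, 3)]          # path graph 0-1-2-3
nodes, lg = line_graph(path)
print(sorted(lg))                        # [((0, 1), (1, 2)), ((1, 2), (2, 3))]
# Deleting edge (1, 2) from A == deleting node (1, 2) from L(A):
print(delete_node(nodes, lg, (1, 2)))    # ([(0, 1), (2, 3)], set())
```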

Page 26: Understanding and Managing Cascades on Large Graphs

Discussions: Node Deletion vs. Edge Deletion
• Q: Is edge deletion on A = node deletion on L(A)?
• A: Yes!

• But node deletion itself is not easy:

Theorem (Hardness of Node Deletion): finding the optimal k-node immunization is NP-hard.

Theorem (Line Graph Spectrum): relates the eigenvalues of A to the eigenvalues of L(A).

Page 27: Understanding and Managing Cascades on Large Graphs

Discussions: Node Deletion vs. Edge Deletion
• Q: Which strategy is better (when both are feasible)?
• A: Edge deletion > node deletion

(better)

Green: node deletion (e.g., shut down a Twitter account)
Red: edge deletion (e.g., unfriend two users)

Page 28: Understanding and Managing Cascades on Large Graphs

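Maximizing Propagation: Edge Addition
• Given: a graph A, a virus propagation model and a budget k;
• Find: the k ‘best’ new edges to add into A.

• By first-order perturbation, the eigenvalue λ_S after adding the edge set S satisfies $\lambda_S - \lambda \approx G_v(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$, where u(i_e) is the left eigen-score of the source and v(j_e) the right eigen-score of the target of edge e

• So, are we done? Not yet: scoring every candidate new edge needs $O(n^2 - m)$ time

[Figure: candidate edges ranging from low G_v to high G_v]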

Page 29: Understanding and Managing Cascades on Large Graphs

Maximizing Propagation: Edge Addition

$\lambda_S - \lambda \approx G_v(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$
• Q: How to find the k new edges with the highest $G_v(S)$?
• A: A modified Fagin's algorithm:
  #1: Sort sources by u
  #2: Sort targets by v
  #3: Search space: the top k+d sources × the top k+d targets, skipping existing edges
• Time complexity: $O(m + nt + kt^2)$, $t = \max(k, d)$
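The pruning idea can be sketched as follows, under stated assumptions: u and v are given nonnegative eigen-scores, d is taken to be the maximum in- or out-degree, and the candidate lists keep the top k + d + 1 sources and targets (one slot more than the slide's k + d, so the sketch stays exact when self-pairs i = j are excluded). For a fixed target, at most d top sources already link to it and one more may be the target itself, so every pruned pair is dominated by at least k valid pairs inside the search space. This illustrates the argument; it is not the paper's exact algorithm, and the function names are hypothetical.

```python
def best_edges_to_add(n, edges, u, v, k):
    """Top-k non-edges (i, j), i != j, by u[i] * v[j], searching only the
    top (k + d + 1) sources by u and the top (k + d + 1) targets by v."""
    existing = set(edges)
    out_deg, in_deg = [0] * n, [0] * n
    for i, j in existing:
        out_deg[i] += 1
        in_deg[j] += 1
    d = max(out_deg + in_deg, default=0)        # assumed meaning of d
    t = k + d + 1
    sources = sorted(range(n), key=lambda i: u[i], reverse=True)[:t]   # step 1
    targets = sorted(range(n), key=lambda j: v[j], reverse=True)[:t]   # step 2
    candidates = [(u[i] * v[j], i, j)                                   # step 3
                  for i in sources for j in targets
                  if i != j and (i, j) not in existing]
    candidates.sort(reverse=True)
    return [(i, j) for _, i, j in candidates[:k]]

def brute_force(n, edges, u, v, k):
    """Reference O(n^2) scan over all non-edges, for checking the pruned search."""
    existing = set(edges)
    cands = sorted(((u[i] * v[j], i, j) for i in range(n) for j in range(n)
                    if i != j and (i, j) not in existing), reverse=True)
    return [(i, j) for _, i, j in cands[:k]]

u = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2, 0.6, 0.4]
v = [0.2, 0.8, 0.3, 0.1, 0.9, 0.5, 0.4, 0.7]
edges = [(0, 4), (0, 1), (3, 4), (6, 1), (0, 7)]
print(best_edges_to_add(8, edges, u, v, 2))   # [(3, 1), (6, 4)]
```

The best-scoring pairs (0, 4), (0, 1) and (0, 7) already exist, so the search correctly falls through to the next-best non-edges.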

Page 30: Understanding and Managing Cascades on Large Graphs

Maximizing Propagation: Evaluation

[Plot: log (infected ratio) vs. time ticks; the arrow marks the better direction; curves include Our Method]

Page 31: Understanding and Managing Cascades on Large Graphs


Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos

Under Submission

Page 32: Understanding and Managing Cascades on Large Graphs

Previously: Full Static Immunization

Given: a graph A, a virus propagation model and a budget k;
Find: the k ‘best’ nodes for immunization (removal).

[Figure: example with k = 2]

Page 33: Understanding and Managing Cascades on Large Graphs


Fractional Asymmetric Immunization

• Fractional Effect [ f(x) = ]
• Asymmetric Effect

# antidotes = 3

x5.0

Page 34: Understanding and Managing Cascades on Large Graphs


Now: Fractional Asymmetric Immunization

• Fractional Effect [ f(x) = ]
• Asymmetric Effect

# antidotes = 3

x5.0

Page 35: Understanding and Managing Cascades on Large Graphs


Fractional Asymmetric Immunization

• Fractional Effect [ f(x) = ]
• Asymmetric Effect

# antidotes = 3

x5.0

Page 36: Understanding and Managing Cascades on Large Graphs


Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)

Page 37: Understanding and Managing Cascades on Large Graphs


Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)

= f

Page 38: Understanding and Managing Cascades on Large Graphs


Fractional Asymmetric Immunization

Hospital Another Hospital

Problem: Given k units of disinfectant, how to distribute them to maximize the number of hospitals saved?

Page 39: Understanding and Managing Cascades on Large Graphs


Our Algorithm “SMART-ALLOC”

CURRENT PRACTICE SMART-ALLOC

[US-MEDICARE NETWORK 2005]
• Each circle is a hospital, ~3000 hospitals
• More than 30,000 patients transferred

~6x fewer!

Page 40: Understanding and Managing Cascades on Large Graphs


Running Time

Wall-clock time (lower is better):
• Simulations: > 1 week
• SMART-ALLOC: 14 secs

> 30,000x speed-up!

Page 41: Understanding and Managing Cascades on Large Graphs


Experiments

Lower is better:
• PENN-NETWORK (K = 200): ~5x
• SECOND-LIFE (K = 2000): ~2.5x

Page 42: Understanding and Managing Cascades on Large Graphs


Part 1: Algorithms
• Q1: Whom to immunize?
• Q2: How to detect culprits?

Page 43: Understanding and Managing Cascades on Large Graphs

Prakash and Faloutsos 2012

• B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ in ICDM 2012, Brussels

Page 44: Understanding and Managing Cascades on Large Graphs


Culprits: Problem definition

2-d grid; ‘+’ → infected
Who started it?

Page 45: Understanding and Managing Cascades on Large Graphs


Culprits: Problem definition

2-d grid; ‘+’ → infected
Who started it?

Prior work: [Lappas et al. 2010, Shah et al. 2011]

Page 46: Understanding and Managing Cascades on Large Graphs


Culprits: Exoneration

Page 47: Understanding and Managing Cascades on Large Graphs


Culprits: Exoneration

Page 48: Understanding and Managing Cascades on Large Graphs


Who are the culprits?
• Two-part solution
– use MDL for the number of seeds
– for a given number:
  • exoneration = centrality + penalty
• Running time: linear! (in edges and nodes)

Page 49: Understanding and Managing Cascades on Large Graphs

Modeling using MDL
• Minimum Description Length Principle == induction by compression
• Related to Bayesian approaches
• MDL = Model + Data
• Model
– Scoring the seed-set: the cost of encoding the integer |S| plus the cost of picking S among the possible |S|-sized sets
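A sketch of this model cost, assuming (as is common in MDL work) that |S| is encoded with Rissanen's universal code for integers, L_N(z) = log*(z) + log2(c0) with c0 ≈ 2.865064, and that S is then identified among the C(N, |S|) possible |S|-sized subsets of the N nodes. The constant and the choice of N as the universe size are assumptions for illustration; the slide's own formula is not recoverable from the transcript.

```python
import math

C0 = 2.865064  # normalising constant of Rissanen's universal integer code (assumed)

def log_star(z):
    """log*(z) = log2(z) + log2(log2(z)) + ..., summing only positive terms."""
    total, x = 0.0, math.log2(z)
    while x > 0:
        total += x
        x = math.log2(x)
    return total

def universal_int_bits(z):
    """L_N(z): bits to encode an integer z >= 1 with no known upper bound."""
    return log_star(z) + math.log2(C0)

def seed_set_model_cost(num_seeds, num_nodes):
    """Model cost of a seed set S: encode |S|, then pick S among C(N, |S|) sets."""
    return universal_int_bits(num_seeds) + math.log2(math.comb(num_nodes, num_seeds))

for k in (1, 2, 5):
    print(k, round(seed_set_model_cost(k, 1000), 2))
```

Because the cost grows with |S|, the model term penalizes explaining an infection with many seeds; the data (ripple) cost pulls the other way, and MDL picks the balance.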

Page 50: Understanding and Managing Cascades on Large Graphs

Modeling using MDL
• Data: Propagation Ripples

[Figure: original graph, infected snapshot, ripples R1 and R2]

Page 51: Understanding and Managing Cascades on Large Graphs

Modeling using MDL
• Ripple cost: covers how long the ripple is and how the ‘frontier’ advances
• Total MDL cost

[Figure: ripple R]

Page 52: Understanding and Managing Cascades on Large Graphs

How to optimize the score?
• Two-step process
– Given k, quickly identify a high-quality set
– Given these nodes, optimize the ripple R

Page 53: Understanding and Managing Cascades on Large Graphs

Optimizing the score
• High-quality k-seed-set
– Exoneration
• Best single seed:
– Eigenvector of the smallest eigenvalue of the Laplacian sub-matrix
• Exonerate neighbors
• Repeat
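The "best single seed" step can be sketched as follows, assuming (our reading) that the Laplacian sub-matrix L_I is L = D − A of the whole graph restricted to the infected nodes, so infected nodes on the border keep their full-graph degree. Its smallest-eigenvalue eigenvector is found by power iteration on cI − L_I with c = 2 · (max degree), which keeps the iteration matrix nonnegative with its spectrum in [0, c]; the node with the largest entry is taken as the seed. The exoneration and repeat steps are not shown, and the function name is hypothetical.

```python
import math

def best_single_seed(n, edges, infected, iters=1000):
    """Infected node with the largest entry in the eigenvector of the smallest
    eigenvalue of L_I, the graph Laplacian D - A restricted to infected nodes."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    nodes = sorted(infected)
    index = {node: k for k, node in enumerate(nodes)}
    c = 2 * max(len(a) for a in adj)      # upper bound on the spectrum of L_I
    x = [1.0] * len(nodes)
    for _ in range(iters):
        # y = (c*I - L_I) x = (c - deg(i)) * x_i + sum over infected neighbours
        y = []
        for k, node in enumerate(nodes):
            neigh = sum(x[index[m]] for m in adj[node] if m in index)
            y.append((c - len(adj[node])) * x[k] + neigh)
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    return max(nodes, key=lambda node: x[index[node]])

# Path 0-1-2-3-4-5-6 with nodes 1..5 infected: the middle node is picked.
path = [(i, i + 1) for i in range(6)]
print(best_single_seed(7, path, {1, 2, 3, 4, 5}))   # 3
```

Intuitively, the chosen node is the one deepest inside the infected region, farthest from uninfected neighbours.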

Page 54: Understanding and Managing Cascades on Large Graphs

Optimizing the score
• Optimizing R

– Just get the MLE ripple!

• Finally use MDL score to tell us the best set

• NetSleuth: Linear running time in nodes and edges

Page 55: Understanding and Managing Cascades on Large Graphs

Experiments
• Evaluation functions:

– MDL based

– Overlap based

(JD == Jaccard distance)

Closer to 1 the better
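For reference, the overlap-based evaluation builds on the Jaccard distance between two sets, here illustrated on true vs. detected seed sets (our illustrative choice). It is 0 for identical sets and 1 for disjoint ones; how the slide's score is normalized so that "closer to 1 is better" is not recoverable from the transcript.

```python
def jaccard_distance(a, b):
    """JD(A, B) = 1 - |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

true_seeds = {3, 17, 42}
found_seeds = {3, 17, 40}
print(jaccard_distance(true_seeds, found_seeds))   # 1 - 2/4 = 0.5
```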

Page 56: Understanding and Managing Cascades on Large Graphs

Experiments

Page 57: Understanding and Managing Cascades on Large Graphs

Experiments

Page 58: Understanding and Managing Cascades on Large Graphs


Part 2: Empirical Studies
• Q3: What do cascades look like?
• Q4: How does activity evolve over time?

Page 59: Understanding and Managing Cascades on Large Graphs


Cascading Behavior in Large Blog Graphs

How does information propagate over the blogosphere?

[Figure: blogs, posts, links and an information cascade]

J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.

Page 60: Understanding and Managing Cascades on Large Graphs


Cascades on the Blogosphere

A cascade is the graph induced by a time-ordered propagation of information (edges).

[Figure: the blogosphere (blogs + posts), its blog network (links among blogs), its post network (links among posts), and the cascades extracted from it]
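A small sketch of how such a cascade can be read off the post network, assuming each post carries a timestamp and a list of earlier posts it links to (the data layout and function name are hypothetical). The cascade started by a post is everything that links to it, directly or transitively, with information flowing only forward in time.

```python
from collections import deque

def cascade_from(root, posts):
    """Cascade edges reachable from `root` along time-respecting citations.

    `posts` maps post id -> (timestamp, [ids of earlier posts it links to]).
    Returns the cascade's edges as (earlier_post, later_post) pairs.
    """
    cited_by = {p: [] for p in posts}
    for p, (t, links) in posts.items():
        for q in links:
            if posts[q][0] < t:          # keep only links that point back in time
                cited_by[q].append(p)
    edges, seen, queue = [], {root}, deque([root])
    while queue:                          # breadth-first walk forward in time
        q = queue.popleft()
        for p in sorted(cited_by[q]):
            edges.append((q, p))
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return edges

posts = {"a": (1, []), "b": (2, ["a"]), "c": (3, ["a"]),
         "d": (4, ["b", "c"]), "e": (5, ["d"]), "f": (6, [])}
print(cascade_from("a", posts))
# [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('d', 'e')]
```

Post "d" links to both "b" and "c", so the cascade is a DAG rather than a tree, which is why every edge is recorded even when its endpoint was already reached.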

Page 61: Understanding and Managing Cascades on Large Graphs


Blog data
• 45,000 blogs participating in cascades
• All their posts for 3 months (Aug-Sept ‘05)
• 2.4 million posts
• ~5 million links (245,404 inside the dataset)

[Plot: number of posts vs. time (1 day)]

Page 62: Understanding and Managing Cascades on Large Graphs


Popularity over time

Post popularity drops off – exponentially?

[Plot: # in-links vs. lag (days after post), for a post made @t and linked to @t + lag]

Page 63: Understanding and Managing Cascades on Large Graphs


Popularity over time

Post popularity drops off – exponentially? POWER LAW! Exponent?

[Plot: # in-links (log) vs. days after post (log)]

Page 64: Understanding and Managing Cascades on Large Graphs


Popularity over time

Post popularity drops off – exponentially? POWER LAW! Exponent? -1.6
• close to -1.5: Barabasi’s stack model
• and like the zero-crossings of a random walk

[Plot: # in-links (log) vs. days after post (log), slope -1.6]
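One simple way to read off such an exponent is a least-squares line fit in log-log space, sketched below on synthetic data (the numbers are made up for illustration, and a straight log-log fit is a rough estimator, not necessarily how the -1.6 was obtained). An exponential drop-off would instead curve downward on these axes.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

days = list(range(1, 31))                       # lag: days after the post
in_links = [1000 * d ** -1.6 for d in days]     # synthetic power-law drop-off
print(round(loglog_slope(days, in_links), 3))   # -1.6
```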

Page 65: Understanding and Managing Cascades on Large Graphs

-1.5 slope


J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

Page 66: Understanding and Managing Cascades on Large Graphs


Part 2: Empirical Studies
• Q3: What do cascades look like?
• Q4: How does activity evolve over time?

Page 67: Understanding and Managing Cascades on Large Graphs


• Meme (# of mentions in blogs): short phrases sourced from U.S. politics in 2008

“you can put lipstick on a pig”

“yes we can”

Rise and fall patterns in social media

Page 68: Understanding and Managing Cascades on Large Graphs


Rise and fall patterns in social media

• Can we find a unifying model, which includes these patterns?

• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]

Page 69: Understanding and Managing Cascades on Large Graphs


Rise and fall patterns in social media

• Answer: YES!

• We can represent all patterns by single model

In Matsubara+ SIGKDD 2012

Page 70: Understanding and Managing Cascades on Large Graphs


Main idea: SpikeM
1. Uninformed bloggers (uninformed about the rumor)
2. External shock at time n_b (e.g., breaking news)
3. Infection (word-of-mouth)

Infectiveness of a blog post at age n:
- β: strength of infection (quality of news)
- Decay function: how infective a blog posting remains as it ages (a power law)

[Figure: timeline at n = 0, n = n_b and n = n_b + 1]

Page 71: Understanding and Managing Cascades on Large Graphs

-1.5 slope


J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

Page 72: Understanding and Managing Cascades on Large Graphs


SpikeM with periodicity
• Full equation of SpikeM
• Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)

[Plot: activity vs. time n; peak activity at 12pm, low activity at 3am]

Details
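A rough simulation sketch of the model on these slides, written from our reading of the SpikeM paper (Matsubara+ SIGKDD 2012); treat the exact update rule, parameter names (N, beta, nb, Sb, Pa, Pp, Ps, eps) and defaults as assumptions. N bloggers start uninformed, a shock of size Sb at time nb starts the spread, every earlier post keeps infecting uninformed bloggers with strength β · age^(−1.5), and an optional periodic factor p(n) in [1 − Pa, 1] scales activity. Clamping new infections to the number of still-uninformed bloggers is an addition of this sketch that keeps the counts valid for large β.

```python
import math

def spikem(N, beta, nb, Sb, T, Pa=0.0, Pp=24, Ps=0, eps=0.0):
    """Return dB[0..T]: newly informed bloggers at each tick n (assumed model).

    U(n) = still-uninformed bloggers. A post written at time t infects with
    strength beta * (age ** -1.5); the shock Sb acts like Sb extra posts at nb.
    p(n) models periodic (e.g. daily) activity; Pa = 0 disables it.
    """
    def p(n):
        return 1 - 0.5 * Pa * (math.sin(2 * math.pi * (n + Ps) / Pp) + 1)

    U = [0.0] * (T + 1)
    dB = [0.0] * (T + 1)
    U[0] = float(N)
    for n in range(T):
        pressure = 0.0
        for t in range(nb, n + 1):
            shock = Sb if t == nb else 0.0
            pressure += (dB[t] + shock) * beta * (n + 1 - t) ** -1.5
        new = p(n + 1) * (U[n] * pressure + eps)
        dB[n + 1] = min(U[n], new)        # cannot inform more than remain
        U[n + 1] = U[n] - dB[n + 1]
    return dB

curve = spikem(N=1000, beta=0.002, nb=5, Sb=10, T=60)
peak = max(range(len(curve)), key=curve.__getitem__)
print(peak, round(sum(curve)))   # peak tick comes after nb; total informed <= N
```

The rise comes from word-of-mouth feeding on itself, while the fall combines the shrinking pool U(n) with the power-law decay of old posts, which is the shape the "tail-part" and "what-if" forecasts below rely on.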

Page 73: Understanding and Managing Cascades on Large Graphs


Tail-part forecasts
• SpikeM can capture the tail part

Page 74: Understanding and Managing Cascades on Large Graphs


“What-if” forecasting

e.g., given (1) the first spike, (2) the release date of two sequel movies, and (3) the access volume before the release date

(1) First spike (2) Release date (3) Two weeks before release

Page 75: Understanding and Managing Cascades on Large Graphs


“What-if” forecasting
– SpikeM can forecast not only the tail part, but also the rise part!

• SpikeM can forecast upcoming spikes

(1) First spike (2) Release date (3) Two weeks before release

Page 76: Understanding and Managing Cascades on Large Graphs


Outline
• Motivation
• Part 1: Policy and Action (Algorithms)
• Part 2: Learning Models (Empirical Studies)
• Conclusion

Page 77: Understanding and Managing Cascades on Large Graphs


Conclusions
• Fast Immunization
– Min/Max. drop in eigenvalue, NetGel, NetMelt
• Finding Culprits Automatically
– MDL + Exoneration, linear-time algorithm
• Bursts: SpikeM model
– Exponential growth, power-law decay

Page 78: Understanding and Managing Cascades on Large Graphs


[Figure: Propagation on Networks, linked to ML & Stats., Comp. Systems, Theory & Algo., Biology, Econ., Social Science and Engg.]

Page 79: Understanding and Managing Cascades on Large Graphs


References

1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon

2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) – In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.)

3. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) – In ICML 2011, Bellevue

4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain

5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.

6. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia

7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain

8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) – In SIGKDD 2010, Washington D.C.

9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) – In VLDB 2010, Singapore

10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India

11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France.

12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami

13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) – In Intl. Journal on Data Mining and Knowledge Discovery (DMKD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.

14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In IEEE ICDE 2007, Istanbul, Turkey.

Page 80: Understanding and Managing Cascades on Large Graphs


Understanding and Managing Cascades on Large Networks

B. Aditya Prakash http://www.cs.vt.edu/~badityap

Sounds interesting? I am looking for Ph.D. students: drop me an email with your CV!