Understanding and Managing Cascades on Large Graphs
B. Aditya Prakash, Computer Science, Virginia Tech.
CS Seminar, 11/30/2012
Feb 24, 2016
Prakash 2012
Networks are everywhere!
Human Disease Network [Barabasi 2007]
Gene Regulatory Network [Decourty 2008]
Facebook Network [2010]
The Internet [2005]
Dynamical Processes over networks are also everywhere!
Why do we care?
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• Localized effects: riots…
Why do we care? (1: Epidemiology)
• Dynamical Processes over networks [AJPH 2007]
CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts
Diseases over contact networks
Why do we care? (1: Epidemiology)
• Dynamical Processes over networks
• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients transferred
[US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?
Why do we care? (1: Epidemiology)
Figure: CURRENT PRACTICE vs. OUR METHOD
~6x fewer!
[US-MEDICARE NETWORK 2005]
Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)
Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010]
~100m active users
> 50m users
Why do we care? (2: Online Diffusion)
• Dynamical Processes over networks
Celebrity
Buy Versace™!
Followers
Social Media Marketing
Why do we care? (4: To change the world?)
• Dynamical Processes over networks
Social networks and Collaborative Action
High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic outbreaks
products/viruses
transmit software patches
Research Theme
DATA: Large real-world networks & processes
ANALYSIS: Understanding
POLICY/ACTION: Managing
Research Theme – Public Health
DATA: Modeling # patient transfers
ANALYSIS: Will an epidemic happen?
POLICY/ACTION: How to control outbreaks?
Research Theme – Social Media
DATA: Modeling Tweets spreading
POLICY/ACTION: How to market better?
ANALYSIS: # cascades in future?
In this talk
Q1: How to immunize and control outbreaks better?
Q2: How to find culprits of epidemics?
POLICY/ACTION: Managing
In this talk
DATA: Large real-world networks & processes
Q3: What do cascades look like?
Q4: How does activity evolve over time?
Outline
• Motivation
• Part 1: Policy and Action (Algorithms)
• Part 2: Learning Models (Empirical Studies)
• Conclusion
Part 1: Algorithms
• Q1: Whom to immunize?
• Q2: How to detect culprits?
• Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos. “Gelling, and Melting, Large Graphs by Edge Manipulation.” In ACM CIKM 2012 (Best Paper Award)
[Thanks to Hanghang Tong for some slides!]
An Example: Flu/Virus Propagation
Figure legend: Healthy / Sick / Contact
1: Sneeze to neighbors
2: Some neighbors get sick
3: Try to recover
Q: How to guide propagation by optimizing the link structure?
- Q1: Understand the tipping point (existing work)
- Q2: Minimize the propagation (this paper)
- Q3: Maximize the propagation (this paper)
Vulnerability measure λ [ICDM 2011, PKDD 2010]
Increasing λ ⇒ increasing vulnerability
λ, the first (largest) eigenvalue of the adjacency matrix A, determines the epidemic threshold
“Safe” “Vulnerable” “Deadly”
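To make the threshold concrete, here is a minimal sketch assuming the SIS (susceptible-infected-susceptible) model with per-contact infection rate β and recovery rate δ on an undirected graph; for that model the infection dies out when the effective strength s = λ·β/δ is below 1. The adjacency-dict input format and the function names are hypothetical, not taken from the talk.

```python
import math


def largest_eigenvalue(adj, iters=2000, tol=1e-12):
    """Largest eigenvalue of an undirected graph's adjacency matrix A.

    adj maps each node to the set of its neighbours (hypothetical format).
    Power iteration runs on A + I, so bipartite graphs (whose spectrum is
    symmetric around 0) still converge; the Rayleigh quotient of A is returned.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}  # (A + I) x
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
        new_lam = sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)  # x^T A x
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam


def sis_strength(adj, beta, delta):
    """Effective strength s = lambda * beta / delta; s < 1 means the SIS epidemic dies out."""
    return largest_eigenvalue(adj) * beta / delta
```

For a triangle (λ = 2) with β = 0.1 and δ = 0.5, s ≈ 0.4, so the infection is below the threshold.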
Minimizing Propagation: Edge Deletion
• Given: a graph A, virus propagation model and budget k;
• Find: delete the k ‘best’ edges from A to minimize λ
Figure: bad vs. good choices of edges to delete
Q: How to find the k best edges to delete efficiently?
Score of edge (i, j) = left eigen-score of source i × right eigen-score of target j
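A minimal sketch of this edge score, assuming a directed graph stored as a dict of out-neighbour sets with every node present as a key. The left eigenvector u and right eigenvector v of A are approximated by power iteration, and each existing edge (i, j) is ranked by u(i)·v(j). The helper names are hypothetical; this mirrors the scoring on the slide, not the authors' code.

```python
import math


def leading_eigenvector(out_adj, left=False, iters=500):
    """Leading right (A v = lam v) or left (u^T A = lam u^T) eigenvector of A.

    out_adj maps node -> set of out-neighbours, i.e. A[i][j] = 1 iff j in out_adj[i].
    Power iteration on M + I (M = A or A^T) avoids oscillation on periodic graphs.
    """
    x = {n: 1.0 for n in out_adj}
    for _ in range(iters):
        y = dict(x)  # the "+ I" term
        for i, targets in out_adj.items():
            for j in targets:
                if left:
                    y[j] += x[i]  # (A^T x)[j] = sum_i A[i][j] x[i]
                else:
                    y[i] += x[j]  # (A x)[i] = sum_j A[i][j] x[j]
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {n: val / norm for n, val in y.items()}
    return x


def edges_to_delete(out_adj, k):
    """Existing edges (i, j) with the k highest scores u(i) * v(j)."""
    u = leading_eigenvector(out_adj, left=True)   # left eigen-score of sources
    v = leading_eigenvector(out_adj, left=False)  # right eigen-score of targets
    edges = [(i, j) for i, targets in out_adj.items() for j in targets]
    return sorted(edges, key=lambda e: u[e[0]] * v[e[1]], reverse=True)[:k]
```

The scores are computed once from the original graph; after deleting the returned edges, recomputing λ shows the actual drop.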
Minimizing Propagation: Evaluations
Figure: log(infected ratio) vs. time ticks for our method (better direction marked)
Data set: Oregon Autonomous System graph (14K nodes, 61K edges)
Discussions: Node Deletion vs. Edge Deletion
• Observations:
  • Node or edge deletion ⇒ λ decreases
  • Edges on A = nodes on its line graph L(A)
• Questions:
  • Edge deletion on A = node deletion on L(A)?
  • Which strategy is better (when both are feasible)?
Figure: original graph A and its line graph L(A)
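The correspondence between A and L(A) can be sketched for a simple undirected graph given as an edge list (an assumption; the function name is hypothetical): every edge of A becomes a node of L(A), and two such nodes are joined when the original edges share an endpoint.

```python
from itertools import combinations


def line_graph(edges):
    """Line graph L(A) of a simple undirected graph A given as a list of edges.

    Every edge {a, b} of A becomes a node of L(A); two such nodes are
    adjacent iff the original edges share an endpoint.
    """
    nodes = [frozenset(e) for e in edges]
    adj = {n: set() for n in nodes}
    for e, f in combinations(nodes, 2):  # O(m^2) pairs: fine for illustration
        if e & f:
            adj[e].add(f)
            adj[f].add(e)
    return adj
```

For the path 0-1-2-3, deleting edge (1, 2) from A corresponds to deleting node frozenset({1, 2}) from L(A).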
Discussions: Node Deletion vs. Edge Deletion
• Q: Is edge deletion on A = node deletion on L(A)?
• A: Yes!
• But node deletion itself is not easy:
Theorem (Hardness of Node Deletion): finding the optimal k-node immunization is NP-hard.
Theorem (Line Graph Spectrum): relates the eigenvalues of A to the eigenvalues of L(A).
Discussions: Node Deletion vs. Edge Deletion
• Q: Which strategy is better (when both are feasible)?
• A: Edge deletion > node deletion
Figure: green = node deletion (e.g., shut down a Twitter account); red = edge deletion (e.g., unfriend two users)
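Maximizing Propagation: Edge Addition
• Given: a graph A, virus propagation model and budget k;
• Find: add the k ‘best’ new edges into A.
• By first-order perturbation, we have $\lambda_S - \lambda \approx G_v(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$, where edge e goes from source i_e to target j_e, u(i_e) is the left eigen-score of the source and v(j_e) is the right eigen-score of the target
• So, are we done? Not quite: scoring every candidate (non-existing) edge needs $O(n^2 - m)$ time
Figure: low-G_v vs. high-G_v edge additions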
Maximizing Propagation: Edge Addition
$\lambda_S - \lambda \approx G_v(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$
• Q: How to find the k new edges with the highest G_v(S)?
• A: A modified Fagin's algorithm
  #1: Sort sources by u
  #2: Sort targets by v
  #3: Search space: from the top k up to the top k+d sources × targets (existing edges are skipped)
• Time complexity: $O(m + nt + kt^2)$, $t = \max(k, d)$
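A sketch of the restricted search, assuming non-negative eigen-scores u and v are already available (for example from power iteration) and a directed graph stored as out-neighbour sets. Rather than the exact modified Fagin's algorithm, it uses a simplified variant: keep the t = k + d + 1 best sources and targets, where d is the maximum in- or out-degree (the extra 1 accounts for excluded self-loops), and scan only those t × t pairs. Under these assumptions every non-edge outside the window is dominated by at least k non-edges inside it, so the result matches a brute-force scan of all O(n^2 - m) candidates. Names are hypothetical.

```python
import heapq


def top_new_edges(u, v, out_adj, k):
    """k non-existing edges (i, j), i != j, with the highest u[i] * v[j].

    u, v: non-negative eigen-scores (dict node -> float);
    out_adj: node -> set of out-neighbours (every node is a key).
    """
    in_deg = dict.fromkeys(out_adj, 0)
    for targets in out_adj.values():
        for j in targets:
            in_deg[j] += 1
    d = max(max(map(len, out_adj.values())), max(in_deg.values()))
    t = k + d + 1
    sources = heapq.nlargest(t, out_adj, key=u.__getitem__)  # #1: sort sources by u
    targets = heapq.nlargest(t, out_adj, key=v.__getitem__)  # #2: sort targets by v
    window = (                                                 # #3: restricted search space
        (u[i] * v[j], i, j)
        for i in sources
        for j in targets
        if i != j and j not in out_adj[i]                      # skip self-loops and existing edges
    )
    return [(i, j) for _, i, j in heapq.nlargest(k, window, key=lambda c: c[0])]
```

With u and v set to the left and right eigenvectors of A, summing the chosen scores gives the first-order gain G_v(S) above, up to the constant c.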
Maximizing Propagation: Evaluation
Figure: log(infected ratio) vs. time ticks for our method (better direction marked)
Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
Under Submission
Previously: Full Static Immunization
Given: a graph A, virus propagation model and budget k;
Find: the k ‘best’ nodes for immunization (removal).
Figure example: k = 2
Now: Fractional Asymmetric Immunization
• Fractional Effect [ f(x) = ]
• Asymmetric Effect
# antidotes = 3
x5.0
Fractional Asymmetric Immunization
Hospital Another Hospital
Drug-resistant Bacteria (like XDR-TB)
Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved?
Our Algorithm “SMART-ALLOC”
Figure: CURRENT PRACTICE vs. SMART-ALLOC: ~6x fewer!
[US-MEDICARE NETWORK 2005]
• Each circle is a hospital; ~3000 hospitals
• More than 30,000 patients transferred
Running Time
Wall-clock time (lower is better):
• Simulations: > 1 week
• SMART-ALLOC: 14 secs
> 30,000x speed-up!
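As a quick sanity check of the speed-up figure, take the simulation time as exactly one week, which is a lower bound since the slide reports more than a week:

```latex
\frac{1\ \text{week}}{14\ \text{s}}
  = \frac{7 \times 24 \times 3600\ \text{s}}{14\ \text{s}}
  = \frac{604{,}800}{14}
  = 43{,}200 > 30{,}000
```

So the reported > 30,000x speed-up is, if anything, conservative.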
Experiments
• PENN-NETWORK (K = 200): ~5x
• SECOND-LIFE (K = 2000): ~2.5x
(lower is better)
Part 1: Algorithms
• Q1: Whom to immunize?
• Q2: How to detect culprits?
Prakash and Faloutsos 2012
• B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ in ICDM 2012, Brussels
Culprits: Problem definition
2-d grid; ‘+’ → infected
Who started it?
Prior work: [Lappas et al. 2010, Shah et al. 2011]
Culprits: Exoneration
Who are the culprits?
• Two-part solution
  – use MDL for the number of seeds
  – for a given number:
    • exoneration = centrality + penalty
• Running time = linear! (in edges and nodes)
Modeling using MDL
• Minimum Description Length Principle == Induction by compression
• Related to Bayesian approaches
• MDL = Model + Data
• Model
  – Scoring the seed-set: cost of encoding the integer |S| + log(number of possible |S|-sized sets)
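The model cost can be sketched as follows. We assume Rissanen's universal code for integers (log*) to encode |S|, and assume the seeds are drawn from num_candidates nodes; both choices, and the function names, are our assumptions rather than details given on the slide.

```python
import math

LOG2_C0 = math.log2(2.865064)  # normalising constant of Rissanen's universal integer code


def universal_int_bits(n):
    """log*-style code length, in bits, for a positive integer n."""
    bits = LOG2_C0
    x = math.log2(n)
    while x > 0:  # add log2(n) + log2(log2(n)) + ... while the terms stay positive
        bits += x
        x = math.log2(x)
    return bits


def seed_set_bits(num_seeds, num_candidates):
    """Model cost: encode the integer |S|, then which |S|-sized subset was chosen."""
    return universal_int_bits(num_seeds) + math.log2(math.comb(num_candidates, num_seeds))
```

Each extra seed raises the model cost, so MDL only accepts it if it shortens the description of the data (the ripples) by more.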
Modeling using MDL
• Data: Propagation Ripples
Figure: original graph, infected snapshot, ripples R1 and R2
Modeling using MDL
• Ripple cost: how long the ripple R is + how the ‘frontier’ advances
• Total MDL cost = model cost (seed set S) + data cost (ripple R)
How to optimize the score?
• Two-step process
  – Given k, quickly identify a high-quality set
  – Given these nodes, optimize the ripple R
Optimizing the score
• High-quality k-seed-set
  – Exoneration
    • Best single seed: smallest eigenvector of the Laplacian sub-matrix
    • Exonerate neighbors
    • Repeat
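A minimal sketch of the exoneration loop under stated assumptions: the graph is undirected (node -> set of neighbours), L_I is the full graph Laplacian restricted to the rows and columns of the infected nodes, and its smallest eigenvector is found by power iteration on c·I − L_I. We mimic "exonerate neighbors" by removing each chosen seed from the infected set, which pulls its neighbours' scores down; that update is our simplification, not necessarily NetSleuth's exact rule, and this naive loop does not have NetSleuth's linear running time.

```python
def smallest_eigvec(adj, infected, iters=3000):
    """Eigenvector for the smallest eigenvalue of L_I, the graph Laplacian
    restricted to the infected nodes (power iteration on c*I - L_I)."""
    nodes = sorted(infected)
    c = 2 * max(len(adj[v]) for v in nodes) + 1  # Gershgorin: exceeds every eigenvalue of L_I
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # (L_I x)[v] = deg(v) * x[v] - sum of x over *infected* neighbours
        y = {
            v: c * x[v] - (len(adj[v]) * x[v] - sum(x[w] for w in adj[v] if w in infected))
            for v in nodes
        }
        top = max(abs(val) for val in y.values())
        x = {v: val / top for v, val in y.items()}
    return x


def exoneration_seeds(adj, infected, k):
    """Pick k seeds: take the highest-scoring node, remove it from the infected
    set (exonerating its neighbours), and repeat."""
    remaining = set(infected)
    seeds = []
    for _ in range(k):
        scores = smallest_eigvec(adj, remaining)
        best = max(scores, key=scores.get)
        seeds.append(best)
        remaining.discard(best)
    return seeds
```

On a path 0-1-…-10 with nodes 2 to 8 infected, the first seed is the centre node 5; after it is removed, the next seed is the centre of one of the two remaining infected segments.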
Optimizing the score
• Optimizing R
  – Just get the MLE ripple!
• Finally use MDL score to tell us the best set
• NetSleuth: Linear running time in nodes and edges
Experiments
• Evaluation functions:
  – MDL based
  – Overlap based (JD == Jaccard distance)
Closer to 1 is better
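For reference, the Jaccard distance between two node sets, such as the true and the recovered seed sets (our assumption about what is compared), can be sketched as below. How the slide's overlap score is derived from JD is not stated, so only JD itself is shown.

```python
def jaccard_distance(a, b):
    """JD(A, B) = 1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)
```

For example, {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct nodes, so their Jaccard distance is 0.5.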
Experiments
Part 2: Empirical Studies
• Q3: What do cascades look like?
• Q4: How does activity evolve over time?
Cascading Behavior in Large Blog Graphs
How does information propagate over the blogosphere?
Figure: blogs, posts, links, and an information cascade
J. Leskovec, M.McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
Cascades on the Blogosphere
A cascade is the graph induced by a time-ordered propagation of information (edges)
Figure: blogs B1–B4 with posts a–e, shown as the blogosphere, the blog network (with link counts), the post network, and the extracted cascades
• Blogosphere: blogs + posts
• Blog network: links among blogs
• Post network: links among posts
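A minimal sketch of how cascades could be extracted from the post network, assuming each post has a timestamp and links are given as (citing, cited) pairs. We keep only links that point back in time, treat a post that cites no other post as a cascade initiator, and collect every post that reaches an initiator by following links. The rule and the names are illustrative assumptions; a post that cites two initiators ends up in both cascades.

```python
from collections import deque


def extract_cascades(post_time, links):
    """Map each cascade initiator to the set of posts in its cascade.

    post_time: dict post -> timestamp; links: iterable of (citing, cited) pairs.
    """
    cites = {p: set() for p in post_time}
    cited_by = {p: set() for p in post_time}
    for src, dst in links:
        if post_time[dst] < post_time[src]:  # keep only time-ordered links
            cites[src].add(dst)
            cited_by[dst].add(src)
    cascades = {}
    for root in post_time:
        if cites[root]:
            continue  # cites an earlier post, so it is not an initiator
        members, queue = {root}, deque([root])
        while queue:  # breadth-first search over "is cited by" edges
            p = queue.popleft()
            for q in cited_by[p]:
                if q not in members:
                    members.add(q)
                    queue.append(q)
        cascades[root] = members
    return cascades
```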
Blog data
• 45,000 blogs participating in cascades
• All their posts for 3 months (Aug-Sept ‘05)
• 2.4 million posts
• ~5 million links (245,404 inside the dataset)
Figure: number of posts vs. time [1 day]
Popularity over time
Post popularity drops off – exponentially?
Figure: # in-links vs. lag (days after the post); a post published @t receives in-links @t + lag
Post popularity drops off – exponentially? No: a POWER LAW! Exponent? -1.6
• close to -1.5: Barabasi’s stack model
• and like the zero-crossings of a random walk
Figure: log(# in-links) vs. log(days after post); fitted slope -1.6 against a -1.5 reference slope
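To make "slope -1.6 on log-log axes" concrete: if in-links decay as count ∝ lag^(-α), then log(count) is linear in log(lag) with slope -α. The sketch below fits that slope by ordinary least squares on noiseless synthetic data (hypothetical values); it is a quick check rather than a careful estimator for noisy counts.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) versus log10(x), i.e. the power-law exponent."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x, mean_y = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


lags = list(range(1, 31))                        # days after the post (hypothetical)
in_links = [1000 * lag ** -1.6 for lag in lags]  # synthetic power-law decay
print(round(loglog_slope(lags, in_links), 3))    # -1.6
```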
J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
Part 2: Empirical Studies
• Q3: What do cascades look like?
• Q4: How does activity evolve over time?
Rise and fall patterns in social media
• Meme (# of mentions in blogs)
  – short phrases sourced from U.S. politics in 2008
  – e.g., “you can put lipstick on a pig”, “yes we can”
Rise and fall patterns in social media
• Can we find a unifying model, which includes these patterns?
• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]
Rise and fall patterns in social media
• Answer: YES!
• We can represent all patterns by a single model
In Matsubara et al., SIGKDD 2012
Main idea: SpikeM
1. Un-informed bloggers (uninformed about the rumor)
2. External shock at time n_b (e.g., breaking news)
3. Infection (word-of-mouth)
Infectiveness of a blog post at age n: $f(n) = \beta \cdot n^{-1.5}$
- β: strength of infection (quality of news)
- $n^{-1.5}$: decay function (how infective a blog posting is), a power law with slope -1.5
Figure: snapshots at time n = 0, n = n_b and n = n_b + 1
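A sketch of the base SpikeM recurrence without periodicity, written from our reading of Matsubara et al. (treat the exact form as an assumption). With U(n) un-informed bloggers, ΔB(n) newly informed bloggers, an external shock S(n_b) = S_b (zero at other times) and f(n) = β·n^(-1.5), it iterates $\Delta B(n+1) = U(n) \sum_{t=n_b}^{n} (\Delta B(t) + S(t))\, f(n+1-t) + \varepsilon$ and $U(n+1) = U(n) - \Delta B(n+1)$. Parameter values in the usage line are hypothetical.

```python
def spikem(N, beta, n_b, S_b, T, eps=0.0):
    """Base SpikeM recurrence; returns dB with dB[n] = newly informed bloggers at time n."""

    def f(age):
        return beta * age ** -1.5  # infectiveness of a post of the given age (>= 1)

    def S(t):
        return S_b if t == n_b else 0.0  # external shock only at time n_b

    U = float(N)  # un-informed bloggers
    dB = [0.0] * T
    for n in range(n_b, T - 1):
        infection = sum((dB[t] + S(t)) * f(n + 1 - t) for t in range(n_b, n + 1))
        dB[n + 1] = U * infection + eps
        U -= dB[n + 1]
    return dB


curve = spikem(N=1000, beta=1e-4, n_b=2, S_b=100, T=20)  # hypothetical parameters
```

Here curve[3] = 1000 · 100 · 10^-4 = 10 new bloggers right after the shock, and later steps decay as the power-law term weakens.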
SpikeM with periodicity
• Full equation of SpikeM (details omitted here)
• Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)
Figure: activity vs. time n, with peak activity at 12pm and low activity at 3am
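The slide omits the periodicity term. One sinusoidal form, which to our understanding is the one used by Matsubara et al. (treat it as an assumption), scales each step's new infections by p(n) = 1 − ½·P_a·(sin(2π(n + P_s)/P_p) + 1), with amplitude P_a in [0, 1], period P_p (e.g., 24 for hourly data with a daily cycle) and phase shift P_s, so activity swings between 1 − P_a and 1.

```python
import math


def periodicity(n, P_a, P_p=24, P_s=0):
    """Activity multiplier p(n) in [1 - P_a, 1]: amplitude P_a, period P_p, phase shift P_s."""
    return 1 - 0.5 * P_a * (math.sin(2 * math.pi * (n + P_s) / P_p) + 1)


# Daily cycle on hourly data (hypothetical amplitude 0.8):
print([round(periodicity(n, 0.8), 2) for n in (0, 6, 12, 18)])  # [0.6, 0.2, 0.6, 1.0]
```

In the recurrence, each ΔB(n+1) would be multiplied by p(n+1).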
Tail-part forecasts
• SpikeM can capture the tail part
“What-if” forecasting
e.g., given (1) the first spike, (2) the release date of two sequel movies, and (3) the access volume before the release date, forecast the upcoming spikes
Figure: (1) first spike, (2) release date, (3) two weeks before release
“What-if” forecasting
– SpikeM can forecast not only the tail part, but also the rise part!
• SpikeM can forecast upcoming spikes
Figure: (1) first spike, (2) release date, (3) two weeks before release
Outline
• Motivation
• Part 1: Policy and Action (Algorithms)
• Part 2: Learning Models (Empirical Studies)
• Conclusion
Conclusions
• Fast Immunization
  – Min/Max. drop in eigenvalue: NetGel, NetMelt
• Finding Culprits Automatically
  – MDL + Exoneration, linear-time algorithm
• Bursts: SpikeM model
  – Exponential growth, power-law decay
Figure: Propagation on Networks, linked to ML & Stats., Comp. Systems, Theory & Algo., Biology, Econ., Social Science, and Engg.
References
1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon
2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) – In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.)
3. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) – In ICML 2011, Bellevue
4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain
5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.
6. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia
7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain
8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) – In SIGKDD 2010, Washington D.C.
9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) – In VLDB 2010, Singapore
10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India
11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France.
12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami
13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) – In Intl. Journal on Data Mining and Knowledge Discovery (DMKD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.
14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In IEEE ICDE 2007, Istanbul, Turkey.
Understanding and Managing Cascades on Large Networks
B. Aditya Prakash http://www.cs.vt.edu/~badityap
Sounds interesting? I am looking for Ph.D. students: drop me an email with your CV!