Rise and Fall Patterns of Information Diffusion: Model and Implications
Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU)
KDD 2012
Mar 29, 2015
Motivation
Social media facilitate faster diffusion of news and rumors
Q: How do news and rumors spread in social media?
News spread in social media
MemeTracker [Leskovec et al. KDD’09]: short phrases sourced from U.S. politics in 2008
[Figure: # of mentions in blogs (per hour, 1 week) for “you can put lipstick on a pig” and “yes we can”; annotations: breaking news, news spread, decay]
Rise and fall patterns in social media
- Twitter (# of hashtags per hour): “#assange”, “#stevejobs”
- Google trend (# of queries per week): “harry potter” (2010 - 2011), “tsunami” (in 2005)
[Figure: Twitter panels cover 1 week at hourly resolution; Google trend panels cover 1 year and 2 years at weekly resolution]
Rise and fall patterns in social media
How many patterns are there?
- Earlier work claims there are several classes:
  • four classes on YouTube [Crane et al. PNAS’08]
  • six classes on media [Yang et al. WSDM’11]
Q. How many classes are there after all?
A. Our answer is “ONE”!
We can represent all patterns with a single model
Outline
- Motivation
- Problem definition
- Proposed method
- Experiments
- Discussion – SpikeM at work
- Conclusions
Problem definition
Goal: predict/model social activity
Problem 1 (What-if?)
Given:
- Network of bloggers/users
- External shock/event
- Quality of the event β
Find:
- How blogging activity will evolve over time
Problem 2 (Model design)
Given:
- Behavior of spikes
Find:
- Equation/model that can explain them (epidemic process by word-of-mouth), e.g.,
  - # of potential bloggers
  - Strength of external shock
  - Quality of the event β
Proposed method
SpikeM captures 3 properties of real spikes:
1. periodicities
2. avoidance of infinity
3. power-law fall
SpikeM captures the behavior of real spikes using few parameters
Main idea (details)
1. Un-informed bloggers (clique of N bloggers/nodes), time n=0
   Nodes (bloggers) are in one of two states:
   - U: un-informed of the rumor
   - B: informed, and blogged about the rumor
2. External shock at time nb (e.g., breaking news), time n=nb
   - The event happens at time nb
   - Bloggers are informed and blog about the news
3. Infection (word-of-mouth effects), time n=nb+1
   Infectiveness of a blog post:
   - β: strength of infection (quality of news)
   - Decay function: how infective a blog posting is; a power law with exponent -1.5 (plotted in linear and log scale)
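A minimal sketch of such a decay function, assuming the power-law form f(τ) = β · τ^(-1.5), where τ is the age of a blog post in time ticks. The function name `decay` and the default values are illustrative assumptions.

```python
def decay(tau, beta=1.0, exponent=-1.5):
    """Infectiveness of a post that is `tau` time ticks old (tau >= 1).

    Assumed form: beta * tau ** exponent, i.e. a power law.
    """
    if tau < 1:
        raise ValueError("tau must be at least 1")
    return beta * tau ** exponent


# On a log-log plot these values fall on a straight line of slope -1.5.
values = [decay(t) for t in (1, 10, 100)]
```

Unlike an exponential decay, this function keeps a heavy tail: an old post still carries a small but non-negligible influence.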
SpikeM-base (details)
Equations of SpikeM (base), defined over two quantities: un-informed and blogged
Parameters:
- N: total population of available bloggers
- β: strength of infection/news
- External shock at birth (time nb)
- Background noise
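The parameters above can be combined into a small simulation. This is a sketch under stated assumptions, not the authors' code: the update rule (new bloggers at tick n+1 equal the un-informed count times the summed, decayed influence of all earlier posts and of the shock, plus noise), the symbol names `S_b` and `eps`, and the clamping step are all introduced here for illustration.

```python
def spikem_base(N, beta, n_b, S_b, eps, n_ticks):
    """Return delta_B, the number of newly blogging users at each tick.

    Assumed rule: delta_B(n+1) = U(n) * sum_t (delta_B(t) + S(t)) * f(n+1-t) + eps,
    with f(tau) = beta * tau**-1.5 and S(t) = S_b only at t = n_b.
    """
    U = float(N)                   # un-informed bloggers, U(0) = N
    delta_B = [0.0] * n_ticks      # new bloggers per tick, delta_B(0) = 0
    for n in range(n_ticks - 1):
        # Sum the remaining infectiveness of every earlier post and of the shock.
        pressure = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            pressure += (delta_B[t] + shock) * beta * (n + 1 - t) ** -1.5
        new = U * pressure + eps
        # Assumption: clamp so that no more bloggers are informed than remain.
        new = max(0.0, min(new, U))
        delta_B[n + 1] = new
        U -= new
    return delta_B


series = spikem_base(N=1000, beta=0.001, n_b=5, S_b=10, eps=0.0, n_ticks=50)
```

Because the un-informed population is finite and shrinks with every new blogger, the simulated activity rises, peaks, and then falls instead of growing without bound.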
SpikeM - with periodicity (details)
Full equation of SpikeM, with terms for un-informed, blogged, and periodicity
Bloggers change their activity over time (e.g., daily, weekly, yearly)
[Figure: activity vs. time n; peak activity at 12pm, low activity at 3am]
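One way to model such periodic activity is a sinusoidal factor that scales the number of new posts at each tick. The sketch below assumes that form; the function name `periodicity` and the parameter names `P_a` (amplitude), `P_p` (period), and `P_s` (phase shift) are assumptions made for this illustration.

```python
import math


def periodicity(n, P_a=0.5, P_p=24, P_s=0):
    """Multiplicative activity factor at time tick n, between 1 - P_a and 1.

    Assumed form: 1 - 0.5 * P_a * (sin(2*pi/P_p * (n + P_s)) + 1).
    """
    return 1.0 - 0.5 * P_a * (math.sin(2.0 * math.pi / P_p * (n + P_s)) + 1.0)


# Hourly ticks with a daily period: the factor repeats every 24 ticks.
factors = [periodicity(n) for n in range(48)]
```

In a simulation, the count of new bloggers at tick n would be multiplied by `periodicity(n)`; setting `P_a=0` switches the periodic effect off.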
Model fitting (details)
SpikeM consists of 7 parameters
Learning parameters:
- Given a real time sequence
- Minimize the error (Levenberg-Marquardt (LM) fitting)
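To make the fitting step concrete, here is a minimal pure-Python sketch of Levenberg-Marquardt on a toy two-parameter power-law curve. It is not the authors' implementation: in the SpikeM setting the residual function would instead run the model with its seven parameters. The helper names `solve`, `levenberg_marquardt`, and `residuals`, the damping schedule, and the toy data are all assumptions.

```python
def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def levenberg_marquardt(residuals, theta, n_iter=100, lam=1e-3, h=1e-6):
    """Minimize the sum of squared residuals(theta) by damped Gauss-Newton steps."""
    theta = list(theta)
    r = residuals(theta)
    cost = sum(v * v for v in r)
    k = len(theta)
    for _ in range(n_iter):
        # Forward-difference Jacobian, stored column by column.
        cols = []
        for j in range(k):
            bumped = theta[:]
            bumped[j] += h
            cols.append([(b - a) / h for a, b in zip(r, residuals(bumped))])
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r.
        A = [[sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(k)]
             for i in range(k)]
        g = [-sum(x * y for x, y in zip(cols[i], r)) for i in range(k)]
        for i in range(k):
            A[i][i] += lam * A[i][i] + 1e-12
        trial = [t + s for t, s in zip(theta, solve(A, g))]
        r_trial = residuals(trial)
        cost_trial = sum(v * v for v in r_trial)
        if cost_trial < cost:   # accept the step and relax the damping
            theta, r, cost, lam = trial, r_trial, cost_trial, lam / 10
        else:                   # reject the step and damp harder
            lam *= 10
    return theta


# Toy data: an exact power-law fall y = 5 * t^-1.5 observed at t = 1..20.
ts = list(range(1, 21))
ys = [5.0 * t ** -1.5 for t in ts]


def residuals(theta):
    a, b = theta
    return [a * t ** -b - y for t, y in zip(ts, ys)]


a_hat, b_hat = levenberg_marquardt(residuals, [1.0, 1.0])
```

The damping factor `lam` interpolates between Gauss-Newton (small `lam`) and a short gradient-like step (large `lam`), which is what makes the method tolerant of poor starting values.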
Analysis
SpikeM matches reality: exponential rise and power-law fall
SpikeM vs. SI model (susceptible-infected model), compared in linear-log and log-log plots
- Rise part (x-axis reversed): SpikeM: exponential; SI model: exponential
- Fall part: SpikeM: power law; SI model: exponential
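The distinction between the two kinds of fall can be checked numerically: a power law is a straight line on a log-log plot, while an exponential is a straight line on a linear-log plot. The sketch below compares the two straight-line fits; the function names `line_fit` and `classify_fall` are illustrative, and this simple test is an assumption, not the analysis used in the talk.

```python
import math


def line_fit(xs, ys):
    """Least-squares line through (xs, ys): return (slope, squared error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    err = sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))
    return slope, err


def classify_fall(ts, ys):
    """Return 'power law' or 'exponential', whichever plot is straighter."""
    logs = [math.log(y) for y in ys]
    _, err_loglog = line_fit([math.log(t) for t in ts], logs)
    _, err_linlog = line_fit(list(ts), logs)
    return "power law" if err_loglog < err_linlog else "exponential"


ts = list(range(1, 51))
power_fall = [t ** -1.5 for t in ts]             # straight in log-log
exp_fall = [math.exp(-0.3 * t) for t in ts]      # straight in linear-log
labels = (classify_fall(ts, power_fall), classify_fall(ts, exp_fall))
```

The slope returned by `line_fit` on the log-log data is the power-law exponent, so the same helper can be used to read off the decay rate of a fall.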
Experiments
We answer the following questions:
Q1. Match real spikes
- Q1-1: K-SC clusters
- Q1-2: MemeTracker
- Q1-3: Twitter
- Q1-4: Google trend
Q2. Forecast future patterns
Q1-1 Explaining K-SC clusters
Six patterns of K-SC [Yang et al. WSDM’11]
SpikeM can generate all patterns in K-SC
Q1-2 Matching MemeTracker patterns
MemeTracker (memes in blogs) [Leskovec et al. KDD’09]
SpikeM can fit various patterns in blogs
[Figure: fits in linear and log scale; the fitting is noise-robust in the presence of outliers]
Q1-3 Matching Twitter data
Twitter data (hashtags)
SpikeM can generate various patterns in social media
[Figure: fits in linear and log scale]
Q1-4 Matching Google trend data
Volume of searches for queries on Google
SpikeM can capture various patterns
Q2 Tail-part forecasts
- Given the first part of a spike, forecast the tail part
SpikeM can capture the tail part, whereas the autoregressive (AR) model fails
SpikeM at work
SpikeM is capable of various applications:
- A1. What-if forecasting
- A2. Outlier detection
- A3. Reverse engineering
A1. “What-if” forecasting
Forecast not only the tail part, but also the rise part!
e.g., given
(1) the first spike,
(2) the release dates of two sequel movies, and
(3) the access volume before each release date (two weeks before release)
SpikeM can forecast upcoming spikes
A2. Outlier detection
Fitting result of “tsunami” (Google trend) in log-log scale
[Figure annotations: Indian Ocean earthquake; another earthquake; one year after the Indian Ocean earthquake]
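A simple way to surface such outliers is to compare the observed sequence with the fitted curve in log scale and flag large gaps. The sketch below assumes that rule; the function name `find_outliers`, the log-ratio criterion, and the threshold value are illustrative choices, not the procedure used in the talk.

```python
import math


def find_outliers(observed, fitted, threshold=1.0):
    """Indices where |log(observed) - log(fitted)| exceeds `threshold`.

    Points with a non-positive value on either side are skipped, because
    their logarithm is undefined.
    """
    flagged = []
    for i, (o, f) in enumerate(zip(observed, fitted)):
        if o > 0 and f > 0 and abs(math.log(o) - math.log(f)) > threshold:
            flagged.append(i)
    return flagged


# Toy example: a power-law fall with one extra burst at index 6.
fitted = [100.0 * t ** -1.5 for t in range(1, 11)]
observed = fitted[:]
observed[6] *= 10.0
outliers = find_outliers(observed, fitted)
```

Working in log scale means a tenfold excess is treated the same whether it occurs near the peak or far out in the tail.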
A3. Reverse engineering
SpikeM provides an intuitive explanation
PDF of parameters over 1,000 memes/hashtags
Observation 1: Total population N is almost the same
Observation 2: Strength of first burst (news) is
Observation 3: Daily periodicity with phase shift
- (Meme) Every meme has the same periodicity without lag
- (Twitter) Daily periodicity with more spread in the phase shift (i.e., multiple time zones)
Conclusions
SpikeM has the following advantages:
• Unification power: It includes earlier patterns/models
• Practicality: It matches real datasets
• Parsimony: It requires only 7 parameters
• Usefulness: What-if scenarios, outliers, etc.
Acknowledgements
Thanks to Jaewon Yang & Jure Leskovec for the six clusters [WSDM’11]
Funding
Thank you
Code: http://www.kecl.ntt.co.jp/csl/sirg/people/yasuko/software.html
Email: matsubara.yasuko lab.ntt.co.jp
Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos