StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization
Suqi Cheng
Research Center of Web Data Sciences & Engineering, Institute of Computing Technology, Chinese Academy of Sciences
[email protected], [email protected]
http://www.nascgroup.org/~ chengsuqi
Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng
• Greedy approximation algorithm
– iteratively selects the node with the largest marginal influence spread
– provides a $(1 - 1/e - \epsilon)$-approximation
• Properties of $I(S)$ under the independent cascade model
– submodularity: $I(S \cup \{v\}) - I(S) \ge I(T \cup \{v\}) - I(T)$ for all $v \in V$ and $S \subseteq T \subseteq V$
– monotonicity: $I(S \cup \{v\}) \ge I(S)$
Influence spread estimation
Preliminaries-2
• Monte Carlo simulation for influence spread estimation
– approximates the true value of the influence spread by averaging over random realizations
Method | An instance | Advantage | Disadvantage
simulation | modeling the information cascade process | relatively low time complexity | estimates one seed set at a time
snapshot [Chen, KDD’09] | removing each edge (u, v) from G with probability 1-p(u, v) | can estimate any seed set simultaneously | relatively high time complexity
The two methods are equivalent.
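The two estimation methods compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the graph is a dict mapping each node to a list of `(neighbor, probability)` pairs, and all function names are hypothetical.

```python
import random
from collections import deque


def simulate_once(graph, seeds, rng):
    """Simulation method: run one independent-cascade process from the seeds."""
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # Each newly active node gets one chance to activate each neighbor.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def sample_snapshot(graph, rng):
    """Snapshot method: remove each edge (u, v) with probability 1 - p(u, v)."""
    return {u: [v for v, p in nbrs if rng.random() < p] for u, nbrs in graph.items()}


def reachable_count(snapshot, seeds):
    """Number of nodes reachable from the seeds in one snapshot (BFS)."""
    seen = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return len(seen)


def estimate_by_simulation(graph, seeds, R, seed=0):
    """Average activated-set size over R cascade simulations."""
    rng = random.Random(seed)
    return sum(simulate_once(graph, seeds, rng) for _ in range(R)) / R


def estimate_by_snapshots(graph, seeds, R, seed=0):
    """Average reachable-set size over R freshly sampled snapshots."""
    rng = random.Random(seed)
    return sum(reachable_count(sample_snapshot(graph, rng), seeds) for _ in range(R)) / R
```

Both estimators target the same expected spread; the difference is that a sampled snapshot can be reused to evaluate any seed set, while a simulation run is tied to the seed set it started from.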
Motivation
• In existing greedy algorithms
– the estimated influence spread function is not guaranteed to be submodular and monotone
[Figure: influence graph, snapshot 1 (iteration 1), snapshot 2 (iteration 2)]
Submodularity is broken!
$I(S_0 \cup \{v_4\}) - I(S_0) = I(\{v_4\}) - I(\emptyset) = 1$
$I(S_1 \cup \{v_4\}) - I(S_1) = I(\{v_2, v_4\}) - I(\{v_2\}) = 3$
– caused by using different Monte Carlo simulation results for different influence spread estimations
– a very large value of R is required, e.g. R = 20000
R: number of Monte Carlo simulations per estimation
StaticGreedy algorithm
• Core idea: always use the same snapshots for influence spread estimation
– the influence spread function is submodular and monotone
– a small value of R suffices, e.g. R = 100
Part 1: Generate R static snapshots
Part 2: Greedy selection
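The two parts can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: the graph is assumed to be a dict mapping each node to a list of `(neighbor, probability)` pairs, node labels are assumed sortable, ties are broken by smallest label, and every function name is hypothetical.

```python
import random
from collections import deque


def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v)."""
    return {u: [v for v, p in nbrs if rng.random() < p] for u, nbrs in graph.items()}


def reachable(snapshot, source):
    """Set of nodes reachable from source in one snapshot (BFS)."""
    seen = {source}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen


def static_greedy(graph, k, R, seed=0):
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}

    # Part 1: generate R static snapshots once and reuse them everywhere.
    snapshots = [sample_snapshot(graph, rng) for _ in range(R)]

    # Part 2: greedy selection, always evaluated on the same snapshots.
    covered = [set() for _ in range(R)]  # nodes already reached by the seeds, per snapshot
    seeds = []
    for _ in range(min(k, len(nodes))):
        best, best_gain = None, -1.0
        for v in sorted(nodes - set(seeds)):
            # Marginal gain of v: newly reached nodes, averaged over snapshots.
            gain = sum(len(reachable(s, v) - c) for s, c in zip(snapshots, covered)) / R
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        for s, c in zip(snapshots, covered):
            c |= reachable(s, best)
    return seeds
```

Because every marginal gain is computed on the same fixed snapshots, the estimated spread is an average of coverage functions, which is why it stays submodular and monotone regardless of R.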
Performance analysis: Convergence rate
• provide (1-1/e-ε)-approximation with a small value of R
$d_{R,k} = \frac{I(S_k^*) - I(S_{R,k})}{I(S_k^*)}$
[Figure: $d_{R,k}$ versus $\log R$]
seed set size = 50
NetHEPT: a benchmark network
uniform independent cascade (UIC) model: p(u, v) = p = 0.01
weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
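As a small illustration of the two propagation models, the sketch below assigns edge probabilities from a list of directed edges. The function names are hypothetical, and the edge list is assumed to contain no duplicate edges.

```python
def uic_probabilities(edges, p=0.01):
    """UIC model: every edge gets the same probability p."""
    return {(u, v): p for u, v in edges}


def wic_probabilities(edges):
    """WIC model: p(u, v) = 1 / (number of in-neighbors of v)."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}
```

Under WIC, the probabilities of the edges entering any node sum to 1, so low in-degree nodes are easier to activate through a single neighbor.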
Performance analysis: Scalability
$R_{\min} = \min\{R \mid d_{R,k} \le 0.005\}$
[Figure, panel "Minimal R required": $\log R_{\min}$ versus seed set size. R is significantly reduced.]
[Figure, panel "Running time": log running time (sec) versus seed set size. Running time is significantly reduced.]
Annotations in the figure: ≈$10^3$ times, ≈$10^2$ times
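A minimal sketch of how the minimal R could be read off from measured spreads. It assumes that the difference ratio is the relative gap between a reference spread and the spread obtained with R snapshots, and that the threshold is 0.005 as on the slide. The function names and the `spread_by_r` mapping (R to measured spread) are hypothetical.

```python
def difference_ratio(reference_spread, spread):
    """Relative gap between the reference spread and the spread for a given R."""
    return (reference_spread - spread) / reference_spread


def minimal_r(spread_by_r, reference_spread, threshold=0.005):
    """Smallest R whose difference ratio is at most the threshold, or None."""
    for r in sorted(spread_by_r):
        if difference_ratio(reference_spread, spread_by_r[r]) <= threshold:
            return r
    return None
```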
Performance analysis: Complexity
$R' \approx 10^{-2} R$
$m' = \sum_{(u,v)} p_{u,v} \le m$
n: number of nodes in the social influence graph
m: number of edges in the social influence graph
m’: expected number of edges in a snapshot
Speed up StaticGreedy
• A dynamic update strategy
– calculates the marginal gain in an efficient incremental manner
• At each step t, for each snapshot: $M(v) \leftarrow M(v) - |R(v) \cap R(v_t^*)|$, $R(v) \leftarrow R(v) - R(v) \cap R(v_t^*)$
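The update rule can be sketched as follows, reading R(v) as the set of nodes reachable from v in a snapshot that no selected seed has reached yet, and M(v) as its size. This is a naive illustration that stores every reachable set explicitly, so it is memory-hungry and is not claimed to match the authors' implementation. Snapshots are assumed to be dicts mapping a node to its list of successors, node labels are assumed sortable, and all names are hypothetical.

```python
from collections import deque


def reachable(snapshot, source):
    """Set of nodes reachable from source in one snapshot (BFS)."""
    seen = {source}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen


def static_greedy_du(snapshots, nodes, k):
    # reach[i][v] plays the role of R(v) in snapshot i.
    reach = [{v: reachable(s, v) for v in nodes} for s in snapshots]
    # gain[v] is M(v) summed over all snapshots.
    gain = {v: sum(len(r[v]) for r in reach) for v in nodes}
    seeds = []
    for _ in range(min(k, len(nodes))):
        # Pick the node with the largest remaining gain (ties: smallest label).
        best = max(sorted(gain), key=gain.get)
        seeds.append(best)
        for r in reach:
            newly_covered = set(r[best])
            for v in nodes:
                overlap = r[v] & newly_covered
                if overlap:
                    if v in gain:
                        gain[v] -= len(overlap)  # M(v) <- M(v) - |R(v) ∩ R(v*)|
                    r[v] -= overlap              # R(v) <- R(v) - R(v) ∩ R(v*)
        del gain[best]
    return seeds
```

The point of the incremental form is that marginal gains are never recomputed from scratch: each selection only subtracts the overlap with the newly covered nodes.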
• Independent cascade models
– uniform independent cascade (UIC) model: p(u, v) = p = 0.01
– weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
• Metrics: Influence spread, running time
Experiments: influence spread
• StaticGreedy achieves better accuracy than other heuristics
[Figure: influence spread on NetPHY and DBLP, each under the UIC model and the WIC model]
Experiments: running time
• StaticGreedy runs more than $10^3$ times faster than CELFGreedy
• StaticGreedy has comparable scalability to state-of-the-art heuristics
• StaticGreedyDU always runs faster than StaticGreedyCELF
[Figure: log running time (sec) under the UIC model and the WIC model]
Conclusion
• Root cause of the inefficiency of existing greedy algorithms
– submodularity and monotonicity are not guaranteed
– caused by using different Monte Carlo simulations for different estimations
– a very large value of R is required → guaranteed accuracy + inefficiency
• StaticGreedy algorithm
– guaranteed submodularity and monotonicity
– uses the same Monte Carlo simulations for all estimations
– a small value of R suffices → guaranteed accuracy + high scalability
– runs more than $10^3$ times faster than conventional greedy algorithms
• A dynamic update strategy to speed up StaticGreedy
– about 10 times faster