Debunking the Myths of Influence Maximization
Akhil Arora¹, Sainyam Galhotra¹, Sayan Ranu
University of Massachusetts, Amherst
January 27, 2017, NEDB 2017
¹ The first two authors have contributed equally to this work.
Influence Maximization January 27, 2017 NEDB, 2017 1 / 19
Other Applications
Using the word-of-mouth effect for:
- Viral marketing: product/topic/event promotion
- Managing celebrity/political campaigns
- Detecting and preventing outbreaks/epidemics/rumours
- Many more...
Existing Information Propagation models
- Independent Cascade (IC) and Weighted Cascade (WC) models
- Linear Threshold (LT) model
- Other models, e.g., Heat Diffusion
Figure: Example network with nodes A to H connected by ten weighted edges (weights 0.1 to 0.7)
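The IC and WC models differ only in how edge probabilities are set. Below is a minimal Python sketch of one IC run. It assumes a graph stored as a dict that maps each node to its out-neighbours and edge probabilities p(u, v). The names `weighted_cascade` and `simulate_ic` are hypothetical. WC is modelled here as IC with p(u, v) = 1 / in-degree(v), which is the usual WC convention.

```python
import random
from collections import defaultdict

# Hypothetical representation: graph[u][v] = probability that u activates v.
Graph = dict[str, dict[str, float]]


def weighted_cascade(edges: list[tuple[str, str]]) -> Graph:
    """Build a WC graph: every edge (u, v) gets p(u, v) = 1 / in-degree(v)."""
    indeg: dict[str, int] = defaultdict(int)
    for _, v in edges:
        indeg[v] += 1
    graph: dict[str, dict[str, float]] = defaultdict(dict)
    for u, v in edges:
        graph[u][v] = 1.0 / indeg[v]
    return dict(graph)


def simulate_ic(graph: Graph, seeds: set[str], rng: random.Random) -> set[str]:
    """One Independent Cascade run: each newly activated node u gets a single
    chance to activate each inactive out-neighbour v, succeeding with p(u, v)."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    wc = weighted_cascade([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
    print(wc)  # D has in-degree 2, so p(B, D) = p(C, D) = 0.5
    print(simulate_ic(wc, {"A"}, random.Random(42)))
```

Under WC, nodes with many in-neighbours are individually harder to influence. Under IC, one probability is often fixed for every edge.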
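The LT model can be sketched in a similar way. This is a minimal Python sketch under the standard LT assumptions:
- Each node v draws a threshold uniformly at random.
- Each edge u → v carries a weight b(u, v), and the weights into any v sum to at most 1.
- v becomes active once the total weight of its active in-neighbours reaches its threshold.

The representation and the name `simulate_lt` are hypothetical.

```python
import random

# Hypothetical representation: in_weights[v][u] = weight b(u, v) of edge u -> v.
# The LT model requires the incoming weights of every node v to sum to at most 1.
InWeights = dict[str, dict[str, float]]


def simulate_lt(in_weights: InWeights, seeds: set[str], rng: random.Random) -> set[str]:
    """One Linear Threshold run: each node v draws a threshold uniformly from [0, 1)
    and becomes active once the total weight of its active in-neighbours reaches it."""
    nodes = set(in_weights) | set(seeds)
    for nbrs in in_weights.values():
        nodes |= set(nbrs)
    theta = {v: rng.random() for v in sorted(nodes)}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can be activated
        changed = False
        for v in sorted(nodes - active):
            weight = sum(b for u, b in in_weights.get(v, {}).items() if u in active)
            if weight >= theta[v]:
                active.add(v)
                changed = True
    return active


if __name__ == "__main__":
    # Toy example: B is influenced by A (0.6) and C (0.4); D only by B (0.7).
    weights = {"B": {"A": 0.6, "C": 0.4}, "D": {"B": 0.7}}
    print(simulate_lt(weights, {"A"}, random.Random(7)))
```

Activation is monotone, so the final active set of a run does not depend on the order in which nodes are checked. Only the thresholds drawn at the start matter.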
The Influence Maximization (IM) Problem
Input: A graph G and an information-diffusion model I
Constraints: The budget k = |S|, i.e., the size of the seed set
Task: Identify the set of most influential nodes in the network, i.e., maximize σ(S) = E[F(S)], the expected number of nodes active at the end if the set S is targeted for initial activation
Tractability: The IM problem is NP-hard, so approximate solutions are needed! The spread function σ is monotone and submodular; thus, a simple GREEDY algorithm provides a (1 − 1/e)-approximation, the best possible ratio in polynomial time (unless P = NP).
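The GREEDY algorithm can be sketched as follows. The sketch assumes the IC model with σ estimated by Monte Carlo (MC) simulation. The names `ic_spread_once`, `estimate_spread` and `greedy`, and the default of 1000 runs, are illustrative assumptions. Because σ is only estimated, the (1 − 1/e) guarantee holds only approximately. Each of the k rounds also costs one batch of simulations per candidate node.

```python
import random

# Hypothetical representation: graph[u][v] = IC probability of edge u -> v.
Graph = dict[str, dict[str, float]]


def ic_spread_once(graph: Graph, seeds: list[str], rng: random.Random) -> int:
    """Number of nodes active at the end of one Independent Cascade run."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in graph.get(u, {}).items():
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph: Graph, seeds: list[str], runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of sigma(S) = E[F(S)]."""
    return sum(ic_spread_once(graph, seeds, rng) for _ in range(runs)) / runs


def greedy(graph: Graph, k: int, runs: int = 1000, seed: int = 0) -> list[str]:
    """Plain GREEDY: k times, add the node with the largest estimated sigma(S + v)."""
    rng = random.Random(seed)
    nodes = set(graph)
    for nbrs in graph.values():
        nodes |= set(nbrs)
    chosen: list[str] = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        if not candidates:
            break
        best = max(candidates, key=lambda v: estimate_spread(graph, chosen + [v], runs, rng))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    toy = {"A": {"B": 1.0, "C": 1.0, "D": 1.0}, "E": {"F": 1.0}}
    print(greedy(toy, k=2, runs=10))  # ['A', 'E'] since every edge fires with probability 1
```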
Need for benchmarking? Wide variety of techniques
- MC (Monte Carlo) Simulation: 1. Run MC simulations from each node to estimate its spread. 2. Exploit submodularity to prune out nodes with low spread.
- Sampling: Store a DAG for a sample of nodes and use it to estimate influence.
- Approximate Scoring: Estimate the influence of the nodes using heuristics, as exact computation is #P-hard.
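Exploiting submodularity to prune candidates is what lazy-forward evaluation (as in CELF) does. A node's marginal gain can only shrink as the seed set grows, so a gain cached in an earlier round is an upper bound. A node therefore needs re-simulation only when it reaches the top of a priority queue.

Below is a minimal Python sketch under the IC model. The names `celf`, `ic_spread_once` and `estimate_spread` are hypothetical, and it is not any paper's reference implementation.

```python
import heapq
import random

# Hypothetical representation: graph[u][v] = IC probability of edge u -> v.
Graph = dict[str, dict[str, float]]


def ic_spread_once(graph: Graph, seeds: list[str], rng: random.Random) -> int:
    """Number of nodes active at the end of one Independent Cascade run."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in graph.get(u, {}).items():
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph: Graph, seeds: list[str], runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of sigma(S)."""
    return sum(ic_spread_once(graph, seeds, rng) for _ in range(runs)) / runs


def celf(graph: Graph, k: int, runs: int = 1000, seed: int = 0) -> list[str]:
    """Lazy-forward greedy: cached marginal gains act as upper bounds."""
    rng = random.Random(seed)
    nodes = set(graph)
    for nbrs in graph.values():
        nodes |= set(nbrs)
    # Heap entries: (-marginal gain, node, size of the seed set when the gain was computed).
    heap = [(-estimate_spread(graph, [v], runs, rng), v, 0) for v in sorted(nodes)]
    heapq.heapify(heap)
    chosen: list[str] = []
    spread = 0.0  # running estimate of sigma(chosen)
    while heap and len(chosen) < k:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # Fresh gain: every other entry has a cached gain at most this large, and by
            # submodularity its current gain is at most its cached one (exact only for exact sigma).
            chosen.append(v)
            spread += -neg_gain
        else:
            gain = estimate_spread(graph, chosen + [v], runs, rng) - spread
            heapq.heappush(heap, (-gain, v, len(chosen)))
    return chosen


if __name__ == "__main__":
    toy = {"A": {"B": 1.0, "C": 1.0, "D": 1.0}, "E": {"F": 1.0}}
    print(celf(toy, k=2, runs=10))  # ['A', 'E']
```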
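For the sampling paradigm, one widely used scheme is reverse-reachable (RR) sets, as in the TIM/IMM family. Each RR set collects the nodes that reach a random root in one sampled live-edge graph. σ(S) is then estimated as n times the fraction of RR sets that intersect S. This scheme may differ from the per-node DAGs described on the slide, so the sketch below only illustrates sampling in general, assuming the IC model.

The number of RR sets (`num_sets`) is fixed here as an assumption. TIM and IMM instead derive it from ε and other parameters. The names `random_rr_set` and `rr_greedy` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical representation: graph[u][v] = IC probability of edge u -> v.
Graph = dict[str, dict[str, float]]


def random_rr_set(in_edges: dict[str, list[tuple[str, float]]], nodes: list[str],
                  rng: random.Random) -> set[str]:
    """Nodes that reach a random root when each edge u -> v is kept with probability p(u, v)."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def rr_greedy(graph: Graph, k: int, num_sets: int = 2000, seed: int = 0) -> tuple[list[str], float]:
    """Pick k seeds by greedy maximum coverage over sampled RR sets."""
    rng = random.Random(seed)
    in_edges: dict[str, list[tuple[str, float]]] = defaultdict(list)
    nodes = set(graph)
    for u, nbrs in graph.items():
        for v, p in nbrs.items():
            in_edges[v].append((u, p))
            nodes.add(v)
    node_list = sorted(nodes)
    rr_sets = [random_rr_set(in_edges, node_list, rng) for _ in range(num_sets)]
    covered = [False] * num_sets
    seeds: list[str] = []
    for _ in range(k):
        # Count, for each node, how many still-uncovered RR sets contain it.
        counts: dict[str, int] = defaultdict(int)
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    counts[u] += 1
        if not counts:
            break
        best = max(sorted(counts), key=lambda u: counts[u])
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    # sigma(S) is estimated as n times the fraction of RR sets covered by S.
    spread = len(node_list) * sum(covered) / num_sets
    return seeds, spread


if __name__ == "__main__":
    toy = {"A": {"B": 1.0, "C": 1.0, "D": 1.0}, "E": {"F": 1.0}}
    print(rr_greedy(toy, k=2))  # (['A', 'E'], 6.0)
```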
Need for Benchmarking? Ambiguities
Existing literature: uses IC and WC interchangeably.
Actual scenario: techniques behave differently under the two models in terms of the spread of the seed nodes, efficiency, and scalability.
Figure: IMM (ε = 0.5) for Orkut dataset: running time (secs, log scale) vs. number of seeds k (0 to 200) under IC and WC
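A tiny example shows why IC and WC are not interchangeable. On the same graph with the same seed, the expected spread can differ substantially. The sketch below computes σ exactly under IC via the live-edge view: it enumerates every live/blocked outcome of the edges, which is only feasible for a handful of edges.

The uniform IC probability of 0.1 and the names `exact_ic_spread` and `wc_probabilities` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical edge map: (u, v) -> probability that u activates v.
Edges = dict[tuple[str, str], float]


def exact_ic_spread(edges: Edges, seeds: set[str]) -> float:
    """Exact sigma(S) under IC: enumerate all 2^m live/blocked edge outcomes and
    weight the size of the set reachable from S by the outcome's probability."""
    edge_list = list(edges.items())
    total = 0.0
    for outcome in product((False, True), repeat=len(edge_list)):
        prob = 1.0
        live: dict[str, list[str]] = {}
        for ((u, v), p), is_live in zip(edge_list, outcome):
            prob *= p if is_live else 1.0 - p
            if is_live:
                live.setdefault(u, []).append(v)
        reached, stack = set(seeds), list(seeds)
        while stack:
            u = stack.pop()
            for v in live.get(u, []):
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        total += prob * len(reached)
    return total


def wc_probabilities(pairs: list[tuple[str, str]]) -> Edges:
    """WC convention: p(u, v) = 1 / in-degree(v)."""
    indeg: dict[str, int] = {}
    for _, v in pairs:
        indeg[v] = indeg.get(v, 0) + 1
    return {(u, v): 1.0 / indeg[v] for u, v in pairs}


if __name__ == "__main__":
    pairs = [("A", "B"), ("A", "C"), ("D", "B")]
    ic = {e: 0.1 for e in pairs}  # assumed uniform IC probability
    wc = wc_probabilities(pairs)
    print(exact_ic_spread(ic, {"A"}))  # about 1.2
    print(exact_ic_spread(wc, {"A"}))  # 2.5
```

The seed A reaches about 1.2 nodes in expectation under the uniform IC setting but 2.5 under WC. Benchmark results obtained under one model therefore need not carry over to the other.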
Need for Benchmarking? Ambiguities
A technique that is state-of-the-art in one aspect can behave the worst in another aspect of the problem.
Figure: EaSyIM vs. IMM for 0 to 200 seeds (k)
(a) Memory (MB, log scale)
(b) Running Time (secs, log scale)
Important Questions
How to choose the most appropriate IM technique for a given scenario?
What does it really mean to claim to be the state of the art?
Are the claims made by recent papers true?
Our Framework
Generic framework applicable to all techniques.
Unified approach to tuning the external parameters.
Convergence calculation: select the parameters that provide the best quality without hampering the efficiency and scalability of the technique.
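The convergence idea can be sketched with a simple stopping rule. Walk through parameter settings from cheapest to most expensive, and stop at the first setting whose quality is within a relative tolerance of the next, more expensive one. Examples are the number of MC runs, or decreasing values of ε for TIM+/IMM.

This rule, the 1% tolerance and the name `converged_setting` are hypothetical. The sketch is not the exact procedure of the benchmarking framework.

```python
from typing import Callable, Sequence


def converged_setting(settings: Sequence[float], quality: Callable[[float], float],
                      tol: float = 0.01) -> float:
    """Return the cheapest setting whose quality is within a relative `tol` of the
    next, more expensive setting (settings are ordered cheapest first)."""
    prev = settings[0]
    prev_q = quality(prev)
    for s in settings[1:]:
        q = quality(s)
        if abs(q - prev_q) <= tol * abs(q):
            return prev
        prev, prev_q = s, q
    return prev  # never converged: fall back to the most expensive setting


if __name__ == "__main__":
    # Toy quality curve standing in for estimated spread vs. number of MC runs.
    runs = [1, 2, 4, 8, 16, 32, 64, 128]
    print(converged_setting(runs, lambda r: 100 * (1 - 1 / r)))  # 64
```

In practice, `quality` would be the estimated spread of the seeds a technique returns for that setting. Each call may be expensive, which is why the sketch evaluates every setting only once.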
Myths
IMM is always faster than TIM+?
Model | ε (TIM+) | ε (IMM) | Time (TIM+) | Time (IMM) | Gain
IC    | 0.05     | 0.05    | 8582.23     | 829.6      | 10.3x
LT    | 0.35     | 0.1     | 0.79        | 1.2        | 0.65x
Table: Comparison of the convergence parameter (ε) and running time (secs) for IMM and TIM+ on the HepPH dataset for 200 seeds
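The Gain column is consistent with the ratio Time (TIM+) / Time (IMM). A value below 1 therefore means IMM is slower than TIM+, which is the case under LT. A short Python check on the reported running times follows; the function name `gain` is hypothetical.

```python
def gain(time_tim_plus: float, time_imm: float) -> float:
    """Speed-up of IMM over TIM+: values above 1 mean IMM is faster."""
    return time_tim_plus / time_imm


if __name__ == "__main__":
    # Running times (secs) from the HepPH table, 200 seeds.
    for model, t_tim, t_imm in [("IC", 8582.23, 829.6), ("LT", 0.79, 1.2)]:
        print(f"{model}: {gain(t_tim, t_imm):.3f}x")
```

This prints 10.345x for IC and 0.658x for LT.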
Myths
CELF++ is the fastest IM technique in the MC estimation paradigm?
Figure: Running time (in min) of CELF and CELF++ over 12 independent runs
(c) Nethept (WC)
(d) Nethept (LT)
Myths
SIMPATH is faster than LDAG?
Figure: Running time (in min) of LDAG and SIMPATH vs. number of seeds k (40 to 200)
(g) Nethept
(h) DBLP
Conclusions
No technique is the best on all aspects of IM.
(k) Qualitative categorization of IM techniques along quality, efficiency, and memory footprint: TIM/IMM, PMC, CELF/CELF++, EaSyIM, IRIE, IMRank, StaticGreedy, LDAG, SIMPATH
(l) Which technique to choose and when?
Thanks!
For more details, please refer to: A. Arora, S. Galhotra, S. Ranu. Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study. SIGMOD 2017.