
Debunking the Myths of Influence Maximization

Akhil Arora¹, Sainyam Galhotra¹, Sayan Ranu

University of Massachusetts, Amherst

January 27, 2017
NEDB, 2017

¹ The first two authors have contributed equally to this work.

Information Propagation²: Need for Modelling?

Many real-world processes can be interpreted using concepts from information propagation.
For example: spread of diseases.

² Propagation, flow, spread, and diffusion are used interchangeably.

Need for Modelling?

Traffic congestion and its propagation.

Other Applications

Using the word-of-mouth effect for:

Viral marketing: product/topic/event promotion
Managing celebrity/political campaigns
Detecting and preventing outbreaks/epidemics/rumours
Many more ...





Existing Information Propagation models

Independent Cascade (IC) and Weighted Cascade (WC) models
Linear Threshold (LT) model
Other models: Heat Diffusion, etc.

Figure: example graph on nodes A to H with edge weights between 0.1 and 0.7, shown across several animation steps.
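As a minimal sketch of how the cascade models listed above behave, the following simulates one Independent Cascade (IC) run: each newly activated node gets a single attempt to activate each inactive out-neighbour v, succeeding with probability p(u, v). Weighted Cascade (WC) is the special case p(u, v) = 1 / in-degree(v). The toy edge list and the names `weighted_cascade` and `simulate_ic` are illustrative assumptions; the graph is not the one shown in the slide's figure.

```python
import random

def weighted_cascade(edges):
    """Assign WC probabilities: p(u, v) = 1 / in-degree(v)."""
    indeg = {}
    for _, v in edges:
        indeg[v] = indeg.get(v, 0) + 1
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append((v, 1.0 / indeg[v]))
    return graph

def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one attempt per inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical toy graph (an assumption, not the graph from the slide).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
wc_graph = weighted_cascade(edges)

rng = random.Random(0)
runs = 2000
avg_spread = sum(len(simulate_ic(wc_graph, {"A"}, rng)) for _ in range(runs)) / runs
print(f"estimated spread of A under WC: {avg_spread:.2f}")
```

Averaging the cascade size over many runs is the Monte-Carlo estimate of the spread used throughout the rest of the talk.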


The Influence Maximization (IM) Problem

Input: a graph G and an information-diffusion model I.
Constraint: the budget k = |S|, which defines the size of the seed set.
Task: identify the set of most influential nodes in the network, i.e. maximize σ(S) = E[F(S)], the expected number of nodes active at the end if set S is targeted for initial activation.
Tractability: the IM problem is NP-hard, so approximate solutions are needed.
The spread function σ is monotone and submodular; thus, a simple GREEDY algorithm provides the best possible (1 − 1/e) approximation.
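The GREEDY procedure can be sketched as follows: starting from an empty seed set, repeatedly add the node that yields the largest estimated σ(S ∪ {v}). This is a minimal illustration under the IC model, with σ estimated by Monte-Carlo simulation; the graph, its probabilities, and the names `estimate_spread` and `greedy_im` are assumptions made for the example. Because σ is only estimated here, the (1 − 1/e) guarantee holds up to the estimation error.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the set of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def estimate_spread(graph, seeds, runs, rng):
    """Monte-Carlo estimate of sigma(S) = E[F(S)]."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

def greedy_im(graph, nodes, k, runs=500, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    chosen, current = [], 0.0
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            spread = estimate_spread(graph, chosen + [v], runs, rng)
            if spread > best_spread:
                best_node, best_spread = v, spread
        chosen.append(best_node)
        current = best_spread
    return chosen, current

# Hypothetical graph: node -> list of (neighbour, activation probability).
graph = {
    "A": [("B", 0.9), ("C", 0.9), ("D", 0.9)],
    "B": [("C", 0.5)],
    "E": [("F", 0.9), ("G", 0.9)],
}
nodes = ["A", "B", "C", "D", "E", "F", "G"]

seeds, spread = greedy_im(graph, nodes, k=2)
print(seeds, round(spread, 2))
```

Each round re-estimates the spread for every remaining node, which is what makes plain greedy expensive and motivates the techniques on the following slides.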






Need for benchmarking? Wide variety of techniques

MC Simulation
1. Run MC simulation from each node to estimate its spread.
2. Exploit submodularity to prune out nodes with low spread.

Sampling
Store a DAG for a sample of nodes and use it to estimate influence.

Approximate Scoring
Estimate the influence of the nodes using heuristics, as exact computation is #P-hard.
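The pruning step in the MC Simulation family is commonly realized as lazy evaluation, in the style of CELF: by submodularity, a node's marginal gain can only shrink as the seed set grows, so a previously computed gain is a valid upper bound and most nodes never need to be re-evaluated. The sketch below assumes a generic `spread` callable; to keep the output reproducible, a deterministic coverage function stands in for the Monte-Carlo estimate. The `reach` sets and all function names are hypothetical.

```python
import heapq

def celf(nodes, spread, k):
    """Lazy greedy seed selection; `spread(S)` must be monotone and submodular."""
    # Heap entries: (-marginal_gain, node, seed-set size when the gain was computed).
    heap = [(-spread([v]), v, 0) for v in nodes]
    heapq.heapify(heap)
    seeds, total, evaluations = [], 0.0, len(nodes)
    while len(seeds) < k and heap:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # Gain is current for this seed set, so v is the best remaining node.
            seeds.append(v)
            total += -neg_gain
        else:
            # Stale upper bound: recompute the marginal gain and push it back.
            gain = spread(seeds + [v]) - total
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, total, evaluations

# Hypothetical stand-in for sigma: number of nodes covered by the seeds' reach sets.
reach = {
    "A": {"A", "B", "C", "D"},
    "B": {"B", "C"},
    "C": {"C"},
    "D": {"D", "E"},
    "E": {"E"},
}

def coverage(seed_list):
    covered = set()
    for s in seed_list:
        covered |= reach[s]
    return len(covered)

seeds, total, evaluations = celf(sorted(reach), coverage, k=2)
print(seeds, total, evaluations)
```

Plain greedy would call `coverage` nine times for k = 2 on these five nodes (five in the first round, four in the second); the lazy variant needs fewer because nodes whose stale bound is already below the current best are skipped.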

















Need for benchmarking? : Ambiguities

Existing literature: uses IC and WC interchangeably.
Actual scenario: different techniques behave differently under the two models in terms of seed-node spread, efficiency, and scalability.

Figure: IMM (ε = 0.5) on the Orkut dataset. Running time (secs, log scale) against number of seeds k (0 to 200), for IC and WC.


Need for benchmarking? : Ambiguities

A technique that is state-of-the-art in one aspect can behave the worst in another aspect of the problem.

Figure: EaSyIM and IMM against number of seeds k (0 to 200). (a) Memory (MB, log scale). (b) Running time (secs, log scale).


Important Questions

How to choose the most appropriate IM technique in a given scenario?
What does it really mean to claim to be the state-of-the-art?
Are the claims made by the recent papers true?




Our Framework

Generic framework applicable to all techniques.
Unified approach to tune the external parameters.

Figure: framework overview. Labels: Setup (Algorithms, Propagation Models, Datasets, Configurations); IM Framework (Seed Selection, Spread Computation); Monte-Carlo (MC) Simulations; Insights; Evaluation; Quality; Efficiency; Scalability; Node Ordering; Influence Estimation; Convergence; Robustness; Update Data Structures; #Parameters.

Convergence calculation: select the parameters which provide the best quality without hampering the efficiency and scalability of the technique.
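One hypothetical way to picture such a convergence calculation, for the case where the external parameter is the number of MC simulations: keep doubling the number of runs until the spread estimate stops changing by more than a relative tolerance, then use the smallest setting that passed. This is an illustrative assumption, not the procedure from the study; the graph, the 1% tolerance, and the name `converge_runs` are invented for the example.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the set of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def converge_runs(graph, seeds, tol=0.01, start=100, max_runs=100_000, seed=0):
    """Double the number of MC runs until successive estimates differ by <= tol (relative)."""
    rng = random.Random(seed)
    runs, prev = start, None
    while runs <= max_runs:
        est = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs
        if prev is not None and abs(est - prev) <= tol * prev:
            return runs, est
        prev, runs = est, runs * 2
    # Budget exhausted: report the last setting that was actually evaluated.
    return runs // 2, prev

# Hypothetical graph: node -> list of (neighbour, activation probability).
graph = {
    "A": [("B", 0.9), ("C", 0.9), ("D", 0.9)],
    "B": [("C", 0.5)],
}

runs, est = converge_runs(graph, ["A"])
print(f"converged at {runs} runs, estimated spread {est:.2f}")
```

The same stopping idea carries over to other external parameters, such as ε in TIM+ and IMM: tighten the parameter only while quality still improves noticeably.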




Myths

IMM is always faster than TIM+?

Model   ε (TIM+)   ε (IMM)   Time (TIM+)   Time (IMM)   Gain
IC      0.05       0.05      8582.23       829.6        10.3x
LT      0.35       0.1       0.79          1.2          0.65x

Table: Comparison of convergence parameter and running time (secs) for IMM and TIM+ over the HepPH dataset for 200 seeds
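The Gain column appears to be the ratio Time (TIM+) / Time (IMM); this reading is inferred from the reported numbers rather than stated on the slide. Under that assumption, a short check reproduces the IC entry (about 10.3) and gives roughly 0.66 for LT, which the slide reports as 0.65x. A value below 1 means IMM was the slower of the two.

```python
# Running times (secs) copied from the table above.
rows = {
    "IC": {"tim_plus": 8582.23, "imm": 829.6},
    "LT": {"tim_plus": 0.79, "imm": 1.2},
}

def gain(tim_plus_secs, imm_secs):
    """Speed-up of IMM over TIM+; values below 1 mean IMM is slower."""
    return tim_plus_secs / imm_secs

gains = {model: gain(t["tim_plus"], t["imm"]) for model, t in rows.items()}
for model, g in gains.items():
    print(f"{model}: {g:.2f}x")
```

Note that the two techniques are compared at different ε values under LT (0.35 for TIM+ and 0.1 for IMM), each being the setting at which the respective technique converged.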




Myths

CELF++ is the fastest IM technique in the MC estimation paradigm?

Figure: running time (in min) of CELF and CELF++ over 12 independent runs on Nethept. (c) Nethept (WC), y-axis from 50 to 80 min. (d) Nethept (LT), y-axis from 120 to 180 min.




Myths

SIMPATH is faster than LDAG?

Figure: time (in min) against #Seeds (k, 40 to 200) for LDAG and SIMPATH. (g) Nethept, y-axis from 0 to 1.2 min. (h) DBLP, y-axis from 5 to 35 min.




Conclusions

No technique is the best on all aspects of IM.

Figure: (k) Qualitative categorization of IM techniques along Quality, Efficiency, and Memory Footprint. Labels in the figure: TIM/IMM, PMC, CELF/CELF++, EaSyIM, ME, IRIE, IMRank, StaticGreedy, LDAG, SIMPATH. (l) Which technique to choose and when?


Thanks!

For more details, please refer to:
A. Arora, S. Galhotra, S. Ranu. Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study. SIGMOD 2017.

