
Aslay Ph.D. Defense

Jan 26, 2017



Cigdem Aslay
Page 1: Aslay Ph.D. Defense

From Viral Marketing to Social Advertising: Ad Allocation Under Social Influence

Çiğdem Aslay (UPF)

Supervisors: Prof. Dr. Ricardo Baeza-Yates (UPF), Dr. Francesco Bonchi (ISI)

Page 2: Aslay Ph.D. Defense

Outline


• Introduction

• Influence in Online Social Networks

• Viral Marketing and Influence Maximization

• Social Advertising: Promoted Posts

• Part I - Online Topic-aware Influence Maximization Queries

• Part II - Social Advertising: Regret Minimization

• Part III - Social Advertising: Revenue Maximization

• Conclusion

Page 3: Aslay Ph.D. Defense

Influence in Online Social Networks

• Social influence induced viral phenomena, e.g., Grumpy Cat:
  • 25K+ votes on Reddit (< 1 day)
  • 1M+ views on Imgur
  • 300+ variants on Reddit
  • 100+ Quickmeme macros

(< 2 days)

Page 4: Aslay Ph.D. Defense

Influence in Online Social Networks: Viral Marketing*

Exploit the “word of mouth” effect in a social network to achieve marketing goals through self-replicating viral processes.

• Attached a promotional message with a clickable URL for free sign-up
• Spent merely $50K
• 12M users signed up within the first 18 months

• Sign-up to the service only through an invitation from a friend
• No money spent on marketing
• Resulted in bidding on eBay for invites

* S. Jurvetson, “What Exactly is Viral Marketing”, Red Herring

Page 5: Aslay Ph.D. Defense

Influence Maximization: Discrete Optimization Problem*

• Given
  • a directed social network G = (V,E)
  • a propagation model m
  • a cardinality budget k
• Define
  • S: initial set of k (seed) nodes to start the propagation
  • σm(S): expected size of the influence propagation from S
• Find

$$S^* = \arg\max_{S \subseteq V,\ |S| = k} \sigma_m(S)$$

* Kempe et al., “Maximizing the spread of influence through a social network”, KDD 2003

Page 6: Aslay Ph.D. Defense

Influence Propagation Models

Independent Cascade (IC) Model
• Each arc (u,v) is associated with an influence probability p_uv
• A node u activated at time t tries to influence each inactive neighbor v, with success probability p_uv

Topic-aware Independent Cascade (TIC) Model*
• An item i is described as a distribution over K topics: $\gamma_i = (\gamma_i^1, \dots, \gamma_i^K)$, with $\sum_{z=1}^{K} \gamma_i^z = 1$
• Topic-specific influence probabilities on arcs: $p^z_{u,v}$ for each topic z
• Item-specific success probabilities on arcs: $p^i_{u,v} = \sum_{z=1}^{K} \gamma_i^z \, p^z_{u,v}$

* N. Barbieri, F. Bonchi and G. Manco, “Topic-aware Social Influence Propagation Models”, ICDM 2012
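A minimal Python sketch of the two models, assuming arc probabilities are kept in plain dicts keyed by (u, v); all function names are hypothetical. It mixes topic-specific probabilities into item-specific ones as in TIC, then estimates the expected spread under IC by Monte Carlo simulation.

```python
import random
from collections import defaultdict


def item_probabilities(topic_probs, gamma):
    """TIC: p^i_{u,v} = sum_z gamma_i^z * p^z_{u,v} for every arc (u, v)."""
    return {arc: sum(g * p for g, p in zip(gamma, probs))
            for arc, probs in topic_probs.items()}


def simulate_ic(arc_probs, seeds, rng):
    """One Independent Cascade run; returns the set of activated nodes."""
    out_arcs = defaultdict(list)
    for (u, v), p in arc_probs.items():
        out_arcs[u].append((v, p))
    active, frontier = set(seeds), list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            # each newly activated node gets one chance per inactive out-neighbor
            for v, p in out_arcs[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def expected_spread(arc_probs, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of sigma(S) under IC."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(arc_probs, seeds, rng)) for _ in range(runs)) / runs


topic_probs = {("a", "b"): [0.9, 0.1], ("b", "c"): [0.5, 0.5]}
p_item = item_probabilities(topic_probs, gamma=[0.5, 0.5])  # both arcs get 0.5
print(expected_spread(p_item, {"a"}))  # close to 1 + 0.5 + 0.25 = 1.75
```

Because TIC collapses to IC once the item is fixed, the same IC routine serves every item; only the arc probabilities change.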

Page 7: Aslay Ph.D. Defense

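Complexity and Approximation

• Influence Maximization is NP-hard under both models
  • TIC boils down to IC on the probabilistic graph $G_i = (V, A, p^i)$
  • Reduction from the Set Cover problem
• Computing σm(S) exactly is #P-hard
• Greedy algorithm
  • (1 – 1/e)-approximation* using the monotonicity and submodularity of σm(·): $\sigma_m(S) \le \sigma_m(T)$ and $\sigma_m(S \cup \{v\}) - \sigma_m(S) \ge \sigma_m(T \cup \{v\}) - \sigma_m(T)$ for all $S \subseteq T \subseteq V$, $v \notin T$

* Nemhauser et al., “An analysis of approximations for maximizing submodular set functions I”, Mathematical Programming 1978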
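The greedy algorithm can be sketched in a few lines, assuming a small graph stored as an adjacency dict and Monte Carlo estimates of σm(S); the helper names are hypothetical. This naive version re-simulates every candidate in every round, which is the cost that optimized variants such as CELF++ reduce.

```python
import random


def simulate_ic(out_arcs, seeds, rng):
    """One Independent Cascade run over out_arcs: node -> [(neighbor, p_uv), ...]."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in out_arcs.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def mc_spread(out_arcs, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    if not seeds:
        return 0.0
    return sum(len(simulate_ic(out_arcs, seeds, rng)) for _ in range(runs)) / runs


def greedy_im(nodes, out_arcs, k, runs=1_000, seed=0):
    """Repeatedly add the node with the largest estimated marginal gain in spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        base = mc_spread(out_arcs, chosen, runs, rng)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = mc_spread(out_arcs, chosen + [v], runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen


# Deterministic toy graph (all probabilities 1): hub h reaches 3 nodes, x reaches y.
out_arcs = {"h": [("a", 1.0), ("b", 1.0), ("c", 1.0)], "x": [("y", 1.0)]}
print(greedy_im(["h", "a", "b", "c", "x", "y", "z"], out_arcs, k=2))  # ['h', 'x']
```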

Page 8: Aslay Ph.D. Defense

Social Advertising

A market that did not exist until Facebook launched its first advertising service in May 2005, projected to generate $11 billion in revenue by 2017*

• Implemented by online social networking platforms
• “Promoted Posts” are injected into the social feeds of users
  • Similar to organic posts from friends in a social network
  • Contain an advertising message: text, image or video
  • Can propagate to friends via social actions: “likes”, “shares”
• Each click on a promoted post produces social proof for friends
• Advertisers have to pay for engagements / clicks

* http://www.unified.com/historyofsocialadvertising/

Page 9: Aslay Ph.D. Defense

Motivation

• Part I - Online Topic-aware Influence Maximization Queries
  • Enable online social influence analytics in support of viral marketing decision making
• Part II - Social Advertising: Regret Minimization
• Part III - Social Advertising: Revenue Maximization
  • Parts II and III: at the intersection of Influence Maximization and Computational Advertising

Page 10: Aslay Ph.D. Defense

Part I - Online Topic-aware Influence Maximization Queries

• C. Aslay, N. Barbieri, F. Bonchi, and R. Baeza-Yates. “Online Topic-aware Influence Maximization Queries”. Published in the International Conference on Extending Database Technology (EDBT), 2014.

Page 11: Aslay Ph.D. Defense

Topic-aware Influence Maximization (TIM) Queries

• Given
  • a social graph G = (V,E)
  • a space of Z topics
  • topic-specific peer-influence probabilities on arcs, $p^z_{u,v}$
  • a query item q
  • a cardinality budget k
• A TIM query asks to find a seed set of k nodes that maximizes the expected number of nodes adopting item q in the network:

$$S_q = \arg\max_{S \subseteq V,\ |S| = k} \sigma_q(S)$$

Page 12: Aslay Ph.D. Defense

Topic-aware Influence Maximization (TIM) Queries

• A TIM query can be processed by any influence maximization algorithm:
  • Reduce TIC to IC via the derived graph $G_q = (V, A, p^q)$
  • Enjoy the (1 – 1/e)-approximation guarantee
• Challenge: enormous number of potential queries
  • Any point lying on the probability simplex is a potential query
  • Each potential query induces a different probabilistic graph
• Indexing is necessary for online TIM query processing
  • Millisecond response times are needed to enable online viral marketing analytics
• Efficiency compromised: processing a single query for k = 50 on a graph with 30K nodes and 425K edges takes days with CELF++*

* Goyal et al., “CELF++: optimizing the greedy algorithm for influence maximization in social networks”, WWW 2011

Page 13: Aslay Ph.D. Defense

INFLEX: Influence Index

• Index over pre-computed solutions of a limited number of TIM queries
• Similar items are likely to interest similar users:
  • Similar peer influence probabilities
  • Similar influence propagation patterns

Page 14: Aslay Ph.D. Defense

INFLEX

Index Construction (Offline)
• Phase 1: seed node extraction
• Phase 2: tree-based index construction
• Phase 3: list-based index construction

Query Processing (Online)
• Phase 1: topic-wise NN retrieval
• Phase 2: aggregation of the pre-computed seed sets of the NNs w.r.t. topic-wise similarity

Page 15: Aslay Ph.D. Defense

Selection of Index Items

• Space-based selection: equidistantly positioned topic distributions on the probability simplex
  • (+) Fair coverage of the simplex
  • (-) Disregards the available workload
• Data-driven selection: catalog of items learnt from the log of past propagations
  • (+) Queried items are likely to follow the distributions learnt from past data
  • (-) Sparsity issues for skewed topic distributions in the catalog
• Simplex Sampling: the best of both approaches

Page 16: Aslay Ph.D. Defense

Selection of Index Items

• Sampling from the probability simplex
  • Estimate the Dirichlet distribution maximizing the log-likelihood of the available workload
  • Generate a large sample with good simplex coverage
  • Run Bregman K-means++ clustering on the generated sample
  • Take the distributions at the centroids as the index items
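The sampling-plus-clustering pipeline can be approximated as below. This is a sketch under stated assumptions: the Dirichlet parameters alpha are taken as already estimated (the log-likelihood fit is omitted), the divergence is KL(x || c), and the function names are hypothetical. Using the arithmetic mean as the cluster centroid is justified because, for any Bregman divergence with the data point as its first argument, the mean minimizes the total divergence of a cluster.

```python
import math
import random


def sample_dirichlet(alpha, n, rng):
    """Draw n points from Dir(alpha) by normalizing independent Gamma(alpha_z, 1) draws."""
    points = []
    for _ in range(n):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        points.append([x / total for x in g])
    return points


def kl(p, q, eps=1e-12):
    """KL(p || q); eps guards against zero entries in q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def bregman_kmeans_pp(points, k, iters=20, rng=None):
    """k-means++ seeding and Lloyd iterations with KL(x || c) as the divergence."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d = [min(kl(x, c) for c in centers) for x in points]
        r, acc = rng.random() * sum(d), 0.0
        for x, dx in zip(points, d):
            acc += dx
            if acc >= r:  # pick x with probability proportional to its divergence
                centers.append(x)
                break
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: kl(x, centers[j]))
            clusters[nearest].append(x)
        # mean of each cluster; an empty cluster keeps its previous center
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers


rng = random.Random(42)
sample = sample_dirichlet([0.8, 1.2, 2.0], n=500, rng=rng)
index_items = bregman_kmeans_pp(sample, k=8, rng=rng)
```

Each returned centroid is itself a topic distribution (a convex combination of simplex points), so it can be used directly as an index item.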

Page 17: Aslay Ph.D. Defense

Tree Construction

• KL-divergence for measuring similarity between probability distributions: non-metric search space!
• Bregman Ball Trees¹,²
  • Hierarchical space partition based on convex Bregman balls
  • Bregman k-means++ to generate child nodes from parent nodes
  • Gaussian clustering to find the optimal number of child nodes (k in k-means)

¹ Cayton, “Fast Nearest Neighbor Retrieval for Bregman Divergences”, ICML 2008
² Nielsen et al., “Tailored Bregman Ball Trees for Effective Nearest Neighbors”, EuroCG 2009

Page 18: Aslay Ph.D. Defense

Similarity Search

• Neither range nor k-NN search
  • Anderson-Darling statistical test as stopping criterion
  • If the leaves visited so far provide “good enough” neighbours, return
• DFS from the root node to the leaf nodes
  • Navigation via projection of the query point onto Bregman balls
• Pruning strategy
  • Use an upper bound from the current NN set
  • Visit a subtree only if it can improve the current bound

Page 19: Aslay Ph.D. Defense

Rank Aggregation

• Combine the seed node rankings of the NNs into a “consensus” ranking
• Kemeny-optimal rank aggregation
  • Find a ranked list that has the minimum Kendall-Tau distance to the input lists
  • Kendall-Tau distance: number of pairwise disagreements between two ranked lists
  • NP-hard even for 4 input permutations*
• Approximation via techniques from Social Choice Theory

* Dwork et al., “Rank aggregation methods for the web”, WWW 2001
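Kendall-Tau distance and the Kemeny objective can be illustrated directly. The brute-force search below is a hypothetical helper that enumerates all permutations, so it only works for a handful of items, which is precisely why approximations are needed.

```python
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Number of item pairs ordered differently by rankings a and b (same item set)."""
    pos = {x: i for i, x in enumerate(b)}
    # combinations keeps a's order, so (x, y) means x precedes y in a
    return sum(1 for x, y in combinations(a, 2) if pos[x] > pos[y])


def kemeny_bruteforce(rankings):
    """Exact Kemeny-optimal ranking by enumeration; only feasible for a few items."""
    return min(permutations(rankings[0]),
               key=lambda cand: sum(kendall_tau(cand, r) for r in rankings))


print(kendall_tau(["a", "b", "c"], ["c", "b", "a"]))  # 3
print(kemeny_bruteforce([["a", "b", "c"], ["a", "b", "c"], ["b", "a", "c"]]))  # ('a', 'b', 'c')
```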

Page 20: Aslay Ph.D. Defense

INFLEX – Rank Aggregation

Social Choice Theory strives for fairness...

• Aggregation weights: non-linear transformation of the KL-divergence
• Weighted Borda Aggregation
  • Borda score: total number of list elements preceded in all the input lists
  • 5-approximation to the optimal Kemeny ranking
• Weighted Copeland Aggregation
  • Copeland score: total number of list elements defeated in the pairwise comparisons across all the input lists
  • 4-approximation to the optimal Kemeny ranking
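A sketch of weighted Borda and weighted Copeland aggregation over the NNs' seed rankings. The weights are placeholders: INFLEX derives them from a non-linear transformation of the KL-divergence, which this sketch does not reproduce. As an assumption, an element missing from a list earns no Borda points from it and loses Copeland comparisons against elements that the list does contain. Function names are hypothetical.

```python
from itertools import combinations


def weighted_borda(lists, weights):
    """Score = weighted number of elements each item precedes, summed over lists."""
    score = {}
    for lst, w in zip(lists, weights):
        n = len(lst)
        for pos, x in enumerate(lst):
            score[x] = score.get(x, 0.0) + w * (n - pos - 1)
    return sorted(score, key=lambda x: -score[x])


def weighted_copeland(lists, weights):
    """Score = number of items beaten in weighted pairwise majority contests."""
    items = sorted({x for lst in lists for x in lst})
    ranks = [{x: i for i, x in enumerate(lst)} for lst in lists]
    wins = {x: 0 for x in items}
    for x, y in combinations(items, 2):
        margin = 0.0
        for r, w in zip(ranks, weights):
            rx, ry = r.get(x, len(r)), r.get(y, len(r))  # absent = below the whole list
            if rx < ry:
                margin += w
            elif ry < rx:
                margin -= w
        if margin > 0:
            wins[x] += 1
        elif margin < 0:
            wins[y] += 1
    return sorted(items, key=lambda x: -wins[x])


nn_rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
nn_weights = [0.5, 0.3, 0.2]  # hypothetical similarity-derived weights
print(weighted_borda(nn_rankings, nn_weights))     # ['a', 'b', 'c']
print(weighted_copeland(nn_rankings, nn_weights))  # ['a', 'b', 'c']
```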

Page 21: Aslay Ph.D. Defense

Experiments

• Real-world FLIXSTER dataset
  • Social graph: 30K users, 425K unidirectional social links
  • Propagation log {(User, Movie, Time)}
  • Ratings on 12K movies
• Benchmarks devised from various INFLEX components
  • exactKNN: exact K-NN search (with best performing K)
  • approxKNN: approximate K-NN search (with best performing K)
  • approxKNN + Sel: approximate K-NN search + automatic list selection
  • approxAD: Anderson-Darling test based approximate NN search
  • INFLEX: Anderson-Darling test based approximate NN search with automatic list selection

https://github.com/aslayci/INFLEX

Page 22: Aslay Ph.D. Defense

Experiments

• Ground truth: standard (offline) greedy algorithm

Page 23: Aslay Ph.D. Defense

Part II - Social Advertising: Regret Minimization

• C. Aslay, W. Lu, F. Bonchi, A. Goyal, and L. V. Lakshmanan. “Viral Marketing Meets Social Advertising: Ad Allocation with Minimum Regret”. Published in the International Conference on Very Large Data Bases (VLDB), 2015.

Page 24: Aslay Ph.D. Defense

Social Advertising: Cost per Engagement (CPE) Model

• The social network platform owner (a.k.a. host)
  • Sells “ad-engagements” (“clicks”) to advertisers
  • Inserts promoted posts into the social feeds of users likely to click, i.e., with high click-through probability (CTP)
• Advertiser
  • Willing to pay a fixed CPE to the host for each click

Ad allocation under social influence: strategically allocate users to advertisers, leveraging social influence and the propensity of ads to propagate, subject to the advertisers' limited budgets.

Page 25: Aslay Ph.D. Defense

TIC-CTP Propagation Model: Extending the TIC Model with Click-Through Probabilities

• Balance between intrinsic relevance in the absence of social proof and peer influence
• Ad-specific CTP for each user: δ(u,i)
  • Probability that user u will click ad i in the absence of social proof
• Lemma 4.1: TIC-CTP reduces to the TIC model with $p^i_{H,u} = \delta(u,i)$
• When δ(u,i) = 1 for all u and i, TIC-CTP coincides with TIC

[Figure: node H with arcs to users u, v, w (probabilities p_Hu, p_Hv, p_Hw) and peer arcs u → v, u → w (probabilities p_uv, p_uw)]

Page 26: Aslay Ph.D. Defense

Budget and Regret

• Host:
  • Owns the directed social graph G = (V,E) and a TIC-CTP model instance
  • Sets a user attention bound κu for each user u ∈ V
• Advertiser i:
  • Agrees to pay CPE(i) for each click, up to his budget Bi
• Expected revenue of the host from allocating seed set Si to advertiser i: min(σi(Si) × CPE(i), Bi)
• Host's regret:
  • σi(Si) × CPE(i) < Bi: lost revenue opportunity for the host
  • σi(Si) × CPE(i) > Bi: free service to the advertiser

Page 27: Aslay Ph.D. Defense

Budget and Regret

(Raw) Allocation Regret
• Regret of the host from allocating seed set Si to advertiser i:

$$R_i(S_i) = |B_i - \sigma_i(S_i) \times CPE(i)|$$

• Overall allocation regret:

$$R(S_1, \dots, S_h) = \sum_{i=1}^{h} R_i(S_i)$$

Penalized Allocation Regret
• λ: penalty to discourage selecting a large number of poor-quality seeds
• Regret of the host with seed set size penalization:

$$R_i(S_i) = |B_i - \sigma_i(S_i) \times CPE(i)| + \lambda \times |S_i|$$
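The regret definitions translate directly into code. The sketch below takes the expected spread σi(Si) of each advertiser's seed set as a given number (hypothetical values) rather than estimating it; the function names are illustrative.

```python
def raw_regret(budget, cpe, spread):
    """R_i(S_i) = |B_i - sigma_i(S_i) * CPE(i)|."""
    return abs(budget - spread * cpe)


def penalized_regret(budget, cpe, spread, seed_size, lam):
    """Raw regret plus lambda * |S_i|."""
    return raw_regret(budget, cpe, spread) + lam * seed_size


def host_revenue(budget, cpe, spread):
    """Expected revenue: engagements are billed only up to the budget."""
    return min(spread * cpe, budget)


def overall_regret(ads, lam=0.0):
    """R(S_1, ..., S_h): sum of per-advertiser (penalized) regrets."""
    return sum(penalized_regret(a["budget"], a["cpe"], a["spread"], a["seeds"], lam)
               for a in ads)


ads = [
    {"budget": 100.0, "cpe": 2.0, "spread": 40.0, "seeds": 3},  # under-delivers: lost revenue
    {"budget": 50.0, "cpe": 1.0, "spread": 60.0, "seeds": 4},   # over-delivers: free service
]
print(overall_regret(ads))           # 30.0
print(overall_regret(ads, lam=0.5))  # 33.5
```

Note that both failure modes contribute to the regret symmetrically, while the revenue only reflects the first one.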

Page 28: Aslay Ph.D. Defense

Regret Minimization

• Given
  • a social graph G = (V,E)
  • the TIC-CTP propagation model
  • h advertisers, with budget Bi and CPE(i) for each advertiser i
  • an attention bound κu for each user u ∈ V
  • a penalty parameter λ ≥ 0
• Find a valid allocation S = (S1, …, Sh), i.e., one in which each user u appears in at most κu seed sets, that minimizes the overall regret of the host from the allocation:

$$\min_{(S_1, \dots, S_h) \text{ valid}} R(S_1, \dots, S_h)$$

Page 29: Aslay Ph.D. Defense

Theoretical Analysis

• Regret-Minimization is NP-hard and is NP-hard to approximate
  • Reduction from the 3-PARTITION problem
• The regret function is neither monotone nor submodular
• Still, a greedy algorithm: selects the (ad, user) pair that gives the maximum reduction in regret
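A sketch of the greedy regret-minimization loop under simplifying assumptions: the expected spread σi(Si) is replaced by a deterministic coverage count over hypothetical reach sets (no CTPs, no randomness), and all names are illustrative. The loop stops as soon as no feasible (ad, user) pair lowers the regret, which matters because the regret, unlike the spread, can increase when seeds are added.

```python
def greedy_regret(ads, users, reach, kappa, lam=0.0):
    """Greedy allocation: repeatedly assign the (ad, user) pair that most reduces regret.

    ads:   ad -> (budget, cpe)
    reach: user -> set of nodes the user reaches (deterministic stand-in for sigma_i)
    kappa: user -> attention bound (max number of ads the user can be a seed for)
    """
    alloc = {i: set() for i in ads}
    load = {u: 0 for u in users}

    def spread(seed_set):
        covered = set()
        for u in seed_set:
            covered |= reach[u]
        return len(covered)

    def regret(i, seed_set):
        budget, cpe = ads[i]
        return abs(budget - spread(seed_set) * cpe) + lam * len(seed_set)

    while True:
        best, best_drop = None, 0.0
        for i in ads:
            current = regret(i, alloc[i])
            for u in users:
                if u in alloc[i] or load[u] >= kappa[u]:
                    continue  # already a seed for i, or attention bound reached
                drop = current - regret(i, alloc[i] | {u})
                if drop > best_drop:
                    best, best_drop = (i, u), drop
        if best is None:  # no remaining pair reduces regret
            return alloc
        i, u = best
        alloc[i].add(u)
        load[u] += 1


ads = {"ad1": (8.0, 2.0), "ad2": (2.0, 1.0)}
reach = {"a": {"a", "x1", "x2", "x3"}, "b": {"b", "y"}, "c": {"c"}}
print(greedy_regret(ads, ["a", "b", "c"], reach, kappa={"a": 1, "b": 1, "c": 1}))
# {'ad1': {'a'}, 'ad2': {'b'}}
```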

Page 30: Aslay Ph.D. Defense

Theoretical Analysis

Approximation guarantee w.r.t. the total budget of all advertisers
• Penalized allocation regret: Theorem 4.2
• Raw allocation regret: Theorem 4.3, Theorem 4.4

Page 31: Aslay Ph.D. Defense

Scalable Algorithms: Two-Phase Iterative Regret Minimization (TIRM)

Built on the Reverse Influence Sampling framework of TIM.

Two-Phase Influence Maximization (TIM) Algorithm*
• Estimates the influence spread of the most influential s nodes from a random sample of θ(s) RR-sets
  • θ(s): statistically sufficient sample size for accurately estimating the influence spread of s nodes
• Estimator: $\sigma(S) \approx n \cdot F_{\mathcal{R}}(S)$, where n = |V| and $F_{\mathcal{R}}(S)$ is the fraction of sampled RR-sets that contain at least one node of S

TIM cannot be used for minimizing the regret:
• Does not handle CTPs
• Requires a predefined seed set size s

* Tang et al., “Influence maximization: Near-optimal time complexity meets practical efficiency”, SIGMOD 2014
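The RR-set estimator can be checked on a toy graph. This sketch assumes the IC model, stores incoming arcs as a dict node -> [(in-neighbor, p_uv)], and uses hypothetical function names; it is not the TIM implementation, which additionally derives the sample size θ(s).

```python
import random


def random_rr_set(in_arcs, nodes, rng):
    """Reverse reachable set under IC: pick a uniform random root and walk arcs
    backwards, keeping each incoming arc (u, v) independently with probability p_uv."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_arcs.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def estimate_spread(rr_sets, seeds, n):
    """sigma(S) ~ n * (fraction of RR-sets that intersect S)."""
    s = set(seeds)
    return n * sum(1 for r in rr_sets if r & s) / len(rr_sets)


nodes = ["a", "b", "c"]
in_arcs = {"b": [("a", 1.0)], "c": [("b", 1.0)]}  # chain a -> b -> c
rng = random.Random(7)
rr_sets = [random_rr_set(in_arcs, nodes, rng) for _ in range(20_000)]
print(estimate_spread(rr_sets, {"a"}, len(nodes)))  # 3.0: every RR-set contains a
print(estimate_spread(rr_sets, {"c"}, len(nodes)))  # about 1.0
```

The same sample answers queries for any seed set, which is what makes the approach fast compared with per-candidate Monte Carlo simulation.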

Page 32: Aslay Ph.D. Defense

TIRM

(1) RR-set sampling under the TIC-CTP model: RRC-sets
• Sample a random RR-set R for advertiser i
• Remove every node u in R with probability 1 – δ(u,i)
• Form the “RRC-set” from the remaining nodes
• Scalability compromised: requires a sample size at least 2 orders of magnitude bigger for CTP = 0.01
• Theorem 4.5: MG(u | S) in IC-CTP = δ(u) × MG(u | S) in IC, where MG(u | S) is the marginal gain of u w.r.t. seed set S
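A sketch of RRC-set sampling on the same kind of toy graph, assuming each node's CTP for the ad is given in a dict (hypothetical values) and using illustrative function names. For a single seed u, the RRC-based estimate should match δ(u) times the plain RR-set estimate, in line with Theorem 4.5; with small CTPs most RRC-sets come out empty, which is the source of the sample-size blow-up noted above.

```python
import random


def random_rr_set(in_arcs, nodes, rng):
    """Reverse reachable set under IC from a uniformly random root."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_arcs.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def random_rrc_set(in_arcs, nodes, ctp, rng):
    """RRC-set: an RR-set from which each node u is dropped with probability 1 - ctp[u]."""
    return {u for u in random_rr_set(in_arcs, nodes, rng) if rng.random() < ctp[u]}


nodes = ["a", "b", "c"]
in_arcs = {"b": [("a", 1.0)], "c": [("b", 1.0)]}  # chain a -> b -> c
ctp = {"a": 0.5, "b": 0.5, "c": 0.5}
rng = random.Random(11)
theta = 20_000
rrc = [random_rrc_set(in_arcs, nodes, ctp, rng) for _ in range(theta)]
rr = [random_rr_set(in_arcs, nodes, rng) for _ in range(theta)]
est_ctp = len(nodes) * sum("a" in r for r in rrc) / theta
est_ic = len(nodes) * sum("a" in r for r in rr) / theta
print(est_ctp, ctp["a"] * est_ic)  # both close to 1.5 = 0.5 * 3
```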

Page 33: Aslay Ph.D. Defense

TIRM

(2) Iterative Seed Set Size Estimation

For each advertiser i:
• Start with a “safe” initial seed set size si
• Sample the θi(si) RR-sets required for si
• Update si based on the current regret
• Revise θi(si), sample additional RR-sets, revise estimates

Estimation accuracy of TIRM: Theorem 4.6

Page 34: Aslay Ph.D. Defense

Experiments: Datasets and Parameters

• Peer influence probabilities: TIC EM learning; exponential distribution; WC model; WC model
• CTPs: sampled uniformly at random from [0.01, 0.03]

https://github.com/aslayci/TIRM

Page 35: Aslay Ph.D. Defense

Experiments: Algorithms Tested

• MYOPIC: top κu ads for which u has the highest δ(u,i) × CPE(i)
• MYOPIC+: budget-aware enhancement of MYOPIC
• Greedy-IRIE: instantiation of the Greedy algorithm with the IRIE* heuristic
• TIRM:
  • ε set to 0.1 for quality experiments on FLIXSTER and EPINIONS
  • ε set to 0.2 for scalability experiments on DBLP and LIVEJOURNAL

* K. Jung, W. Heo, and W. Chen, "IRIE: Scalable and Robust Influence Maximization in Social Networks", ICDM 2012
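The MYOPIC baseline is simple enough to state in code. This sketch assumes CTPs are given per (user, ad) pair and uses hypothetical names and values; it deliberately ignores budgets and propagation (MYOPIC+ adds budget awareness).

```python
import heapq


def myopic(users, ads, ctp, cpe, kappa):
    """MYOPIC baseline: give each user u the kappa[u] ads with the highest
    ctp[(u, i)] * cpe[i]; budgets and social influence are ignored."""
    return {u: heapq.nlargest(kappa[u], ads, key=lambda i: ctp[(u, i)] * cpe[i])
            for u in users}


cpe = {"A": 1.0, "B": 2.0, "C": 3.0}
ctp = {("u1", "A"): 0.5, ("u1", "B"): 0.1, ("u1", "C"): 0.1,
       ("u2", "A"): 0.1, ("u2", "B"): 0.3, ("u2", "C"): 0.1}
print(myopic(["u1", "u2"], ["A", "B", "C"], ctp, cpe, kappa={"u1": 2, "u2": 1}))
# {'u1': ['A', 'C'], 'u2': ['B']}
```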

Page 36: Aslay Ph.D. Defense

Overall Regret

[Figure: overall regret comparison; annotated values 6.5%, 16%, 145%, 205% and 2.5%, 26%, 122%, 141%]

Page 37: Aslay Ph.D. Defense

Scalability Experiments – Running Time

[Figure annotations: 16 min (47 seeds); 5 hours (4649 seeds); 1.5 hours (5866 seeds)]

Page 38: Aslay Ph.D. Defense

Part III - Social Advertising: Revenue Maximization

• C. Aslay, F. Bonchi, L. V. Lakshmanan, and W. Lu. “Revenue Maximization in Incentivized Social Advertising”. Submitted to the International Conference on Very Large Data Bases (VLDB) 2017. (arXiv e-prints, arXiv:1612.00531)

Page 39: Aslay Ph.D. Defense

Incentivized Social Advertising: CPE Model with Seed User Incentives

• Advertiser
  • Pays a fixed CPE to the host for each engagement
  • Pays a monetary incentive to each seed user engaging with his ad
  • Total payment subject to his budget
• Host
  • Sells ad-engagements to advertisers
  • Inserts promoted posts into the feeds of users in exchange for monetary incentives
• Seed users take a cut of the social advertising revenue

Page 40: Aslay Ph.D. Defense

Revenue Maximization

• Given
  • a social graph G = (V,E)
  • the TIC propagation model
  • h advertisers, with budget Bi and CPE(i) for each ad i
  • seed user incentives ci(u) for each user u ∈ V and each ad i
• Find an allocation S = (S1, …, Sh) that maximizes the overall revenue of the host from the allocation.

Page 41: Aslay Ph.D. Defense

Theoretical Analysis

• The Revenue-Maximization problem is NP-hard
  • Restricted special case with h = 1: the NP-hard Submodular-Cost Submodular-Knapsack* (SCSK) problem
• Constraints: a partition matroid and submodular knapsack constraints
• The family 𝘊 of feasible solutions forms an Independence System
• Two greedy approximation algorithms, differing in their sensitivity to seed user costs during node selection

* Iyer et al., “Submodular optimization with submodular cover and submodular knapsack constraints”, NIPS 2013

Page 42: Aslay Ph.D. Defense

Theoretical Analysis

• Cost-agnostic greedy algorithm
  • Selects the (node, ad) pair giving the maximum marginal increase in revenue
  • Theorem 5.2: approximation guarantee follows* from 𝘊 forming an independence system, where
    • R and r are, respectively, the upper and lower rank of 𝘊
    • κπ is the curvature of the total revenue function π(·)

* Conforti et al., "Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem", Discrete Applied Mathematics 1984

Page 43: Aslay Ph.D. Defense

Theoretical Analysis

• Cost-sensitive greedy algorithm
  • Selects the (node, ad) pair giving the maximum rate of marginal gain in revenue per marginal gain in payment
  • Theorem 5.3: approximation guarantee obtained, where
    • ρmax and ρmin are, respectively, the maximum and minimum singleton payments
    • κρi is the curvature of ad i's payment function ρi(·)
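The two selection rules can be contrasted in a small sketch. Assumptions, none of which come verbatim from the slides: the host's revenue from ad i is CPE(i) × σi(Si); the advertiser's payment is that amount plus the seed incentives Σ ci(u) over u ∈ Si, which must stay within Bi (following the incentivized CPE model above); each user may seed at most one ad; and σi is replaced by a deterministic coverage count over hypothetical reach sets. All names are illustrative. On this toy instance the cost-agnostic rule spends the budget on one expensive, influential seed, while the cost-sensitive rule picks two cheap seeds and earns more.

```python
def spread(reach, seed_set):
    """Deterministic coverage stand-in for sigma_i(S_i)."""
    covered = set()
    for u in seed_set:
        covered |= reach[u]
    return len(covered)


def greedy_revenue(ads, users, reach, cost, cost_sensitive=False):
    """ads: ad -> {"budget": B_i, "cpe": CPE(i)}; cost: (user, ad) -> incentive c_i(u)."""
    alloc = {i: set() for i in ads}
    used = set()  # assumption: each user is a seed for at most one ad

    def revenue(i, s):
        return ads[i]["cpe"] * spread(reach, s)

    def payment(i, s):
        return revenue(i, s) + sum(cost[(u, i)] for u in s)

    while True:
        best, best_score = None, 0.0
        for i in ads:
            for u in users:
                if u in used:
                    continue
                trial = alloc[i] | {u}
                if payment(i, trial) > ads[i]["budget"]:
                    continue  # would violate ad i's budget
                gain = revenue(i, trial) - revenue(i, alloc[i])
                if gain <= 0:
                    continue
                if cost_sensitive:  # marginal revenue per unit of marginal payment
                    score = gain / (payment(i, trial) - payment(i, alloc[i]))
                else:  # plain marginal revenue
                    score = gain
                if score > best_score:
                    best, best_score = (i, u), score
        if best is None:
            return alloc
        i, u = best
        alloc[i].add(u)
        used.add(u)


ads = {"ad": {"budget": 10.0, "cpe": 1.0}}
reach = {"a": {"a", "a1", "a2", "a3", "a4"}, "b": {"b", "b1", "b2"}, "c": {"c", "c1", "c2"}}
cost = {("a", "ad"): 5.0, ("b", "ad"): 0.5, ("c", "ad"): 0.5}
for flag in (False, True):
    alloc = greedy_revenue(ads, ["a", "b", "c"], reach, cost, cost_sensitive=flag)
    print(flag, sorted(alloc["ad"]), spread(reach, alloc["ad"]))
# False ['a'] 5
# True ['b', 'c'] 6
```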

Page 44: Aslay Ph.D. Defense

Scalable Algorithms: Two-Phase Iterative Revenue Maximization

• Built on the Reverse Influence Sampling framework of TIRM (Part II)
  • Latent seed set size estimation
• Two-Phase Iterative Cost-Agnostic Revenue Maximization (TI-CARM)
• Two-Phase Iterative Cost-Sensitive Revenue Maximization (TI-CSRM)

Page 45: Aslay Ph.D. Defense

Experiments: Datasets and Parameters

• Peer influence probabilities: TIC EM learning; TIC WC model; WC model; WC model

Page 46: Aslay Ph.D. Defense

Experiments: Algorithms Tested

• TI-CARM and TI-CSRM
  • ε set to 0.1 for quality experiments on FLIXSTER and EPINIONS
  • ε set to 0.2 for scalability experiments on DBLP and LIVEJOURNAL
• PageRank
  • For each ad i, select the best candidate user w.r.t. the PageRank ordering
  • Among those, select the (user, ad) pair giving the maximum marginal increase in the revenue of the host

Page 47: Aslay Ph.D. Defense

Experiments: Revenue vs. Seed User Incentive Costs

Page 48: Aslay Ph.D. Defense

Experiments: Revenue vs. Window Size

Page 49: Aslay Ph.D. Defense

Experiments: Scalability Results - Running Time

Page 50: Aslay Ph.D. Defense

Experiments: Scalability Results - Memory (GB)

Page 51: Aslay Ph.D. Defense

Contributions: Part I - Online Topic-aware Influence Maximization Queries

• Novel problem formulation
• Initiated the investigation of topic-aware influence indexing techniques in the influence maximization literature
• First step towards enabling online social influence analytics
• Orthogonal to efforts on scalable and efficient influence maximization algorithms
• Many direct follow-ups¹,²,³

C. Aslay, N. Barbieri, F. Bonchi, and R. Baeza-Yates. “Online Topic-aware Influence Maximization Queries”. Published in EDBT 2014.

¹ S. Chen et al., "Online Topic-aware Influence Maximization", VLDB 2015
² Li et al., "Real-time Targeted Influence Maximization for Online Advertisements", VLDB 2015
³ W. Chen et al., "Real-Time Topic-aware Influence Maximization using Preprocessing", ICCS 2015

Page 52: Aslay Ph.D. Defense

Contributions: Part II - Social Advertising: Regret Minimization

• Initiated the investigation of Social Advertising through the Viral Marketing lens, to address problems that the Influence Maximization and Computational Advertising literatures fail to address in isolation
• Introduced a novel discrete optimization problem with provable approximation guarantees
• Introduced the TIC-CTP propagation model
• Extended the state-of-the-art influence maximization algorithms for scalable greedy approximation
  • Latent seed set size estimation
  • Handling the TIC-CTP propagation model

C. Aslay, W. Lu, F. Bonchi, A. Goyal, and L. V. Lakshmanan. “Viral Marketing Meets Social Advertising: Ad Allocation with Minimum Regret”. Published in VLDB 2015.

Page 53: Aslay Ph.D. Defense

Contributions: Part III - Social Advertising: Revenue Maximization

• Initiated the investigation of Incentivized Social Advertising through the Viral Marketing lens
• Introduced a novel discrete optimization problem
• Provided cost-agnostic and cost-sensitive approximation guarantees for submodular function maximization subject to a matroid and multiple submodular knapsack constraints
  • Generalization of the restricted single-submodular-knapsack version of the problem (SCSK*)
  • Theoretical results also valid for linear knapsack constraints
  • κρi = 0 when the payment function for ad i is modular

C. Aslay, F. Bonchi, L. V. Lakshmanan, and W. Lu. “Revenue Maximization in Incentivized Social Advertising”. Submitted to VLDB 2017. (arXiv:1612.00531)

* Iyer et al., “Submodular optimization with submodular cover and submodular knapsack constraints”, NIPS 2013

Page 54: Aslay Ph.D. Defense

Thank you!
