Predicting Information Diffusion in Social Networks Using Content and User’s Profiles

Predicting Information Diffusion on Social Networks with Partial Knowledge

Anis Najar
Laboratoire d’informatique de Paris 6
University Pierre et Marie Curie - UPMC
Paris, France
[email protected]

Ludovic Denoyer
Laboratoire d’informatique de Paris 6
University Pierre et Marie Curie - UPMC
Paris, France
[email protected]

Patrick Gallinari
Laboratoire d’informatique de Paris 6
University Pierre et Marie Curie - UPMC
Paris, France
[email protected]

ABSTRACT
Models of information diffusion and propagation over large social media usually rely on a Closed World Assumption: information can only propagate along the relational structure of the network, it cannot come from external sources, and the network structure is assumed to be fully known by the model. These assumptions are unrealistic for many propagation processes extracted from social websites. We address the problem of predicting information propagation when the network diffusion structure is unknown and without making any closed world assumption. Instead of modeling a diffusion process, we propose to directly predict the final propagation state of the information over a whole user set. We describe a general model that learns to predict which users are the most likely to be contaminated by the information, given an initial state of the network. Different instances are proposed and evaluated on artificial datasets.

Categories and Subject Descriptors
I.m [Computing Methodologies]: Miscellaneous

Keywords
Diffusion, Social Networks, Machine Learning

1. INTRODUCTION
The diffusion and propagation of information over large social media has been an active research domain recently. Propagation models are often inspired by earlier work in epidemiology or marketing. Most of them consider that a node is either active or inactive and that active nodes can contaminate other nodes, i.e. propagate the information to them. The models differ in their assumptions about the way information spreads from one node to another. Besides providing models for the propagation process, these techniques can be used for tasks like opinion leader detection [3]. Most models rely on strong assumptions:

1. the propagation network is completely known,

2. information can only propagate over this network and cannot come from external sources,

3. only one type of information is considered.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
WWW 2012 Companion, April 16–20, 2012, Lyon, France.
ACM 978-1-4503-1230-1/12/04.

For many practical cases, these assumptions are hardly met. The network might be partly known or even completely unknown. Most often, it is impossible to characterize the whole network due to its size and to the difficulty of tracking the different events characterizing the relational structure. In social networks, information often comes from external sources, so that the same information may appear at different places and times in the network without being propagated through the network [1]. Users in a network will propagate information differently according to their profile or to their domain of interest. They will then play different roles according to the nature of the information to be propagated. Models generally attempt to mimic the diffusion/propagation behavior at the node level in order to reproduce cascades of information observed at different places in the network or to reproduce the global contagion mechanism. Their main goal is then to explain or analyze the actual diffusion process.

We consider here the problem of information propagation prediction on social media: given a state of contamination of the network at a given time $t$, what will be the state of contamination at time $t' > t$? This is related to, but different from, the problem of diffusion modeling. For the former, the goal is to predict the state of the network at a given time, while for the latter, the goal is to model each step of the diffusion process. We present a prediction model which does not rely on hypotheses 1 and 2. The model does not require knowledge of the network structure, although it could benefit from partial or complete knowledge of this structure. This model is based on a regression framework. It can incorporate the effect of external information sources, so that the information is not restricted to propagating over the network only. Extensions of this model can also deal with multiple sources (hypothesis 3) but are not detailed here. The paper is organized as follows. In Section 2 we introduce notations and define the prediction task. In Section 3 we present our model. In Section 4 we describe large scale experiments made to evaluate the model. In Section 5 we review related work.

2. NOTATIONS AND TASKS DEFINITION

2.1 Notations
We introduce here the notations used throughout the paper.

WWW 2012 – MSND'12 Workshop April 16–20, 2012, Lyon, France


• A social network is modeled as a graph $G = (\mathcal{N}, E)$, where $\mathcal{N} = (n_1, \dots, n_N)$ is a set of nodes or users, and $E = \{e_{i,j} \in [0; 1]\}$ denotes edges representing relations between users, such that $e_{i,j}$ is the weight of the relation between nodes $i$ and $j$; $e_{i,j} = 0$ means that there is no link between user $i$ and user $j$.

• Information propagation is modeled here as a discrete process, so that at each step of this process, the network state may be represented as a contamination vector. Let $M^k$ be a contamination matrix representing the propagation of an information item $k$¹:

$$
M^k = \begin{pmatrix}
m^k_{1,1} & m^k_{1,2} & \cdots & m^k_{1,T^k} \\
m^k_{2,1} & m^k_{2,2} & \cdots & m^k_{2,T^k} \\
\vdots & \vdots & \ddots & \vdots \\
m^k_{N,1} & m^k_{N,2} & \cdots & m^k_{N,T^k}
\end{pmatrix}
$$

$m^k_{i,t}$ is the contamination of user $i$ at time $t$. Classically, $m^k_{i,t} \in \{0, 1\}$, i.e. the user is either contaminated or not. We will also consider the case $m^k_{i,t} \in [0; 1]$ when our knowledge about the contamination is uncertain. $T^k$ corresponds to the duration of the contamination process: after time $T^k$ no more individuals will be contaminated. We use here a relative time scale, i.e. $t = 1$ corresponds to the first time a user has been contaminated by information $k$ (e.g. the date of the first appearance of a particular tweet on a microblogging site), and $T^k$ is the time at which the propagation has finished. For a given network, different information cascades will be observed, corresponding to different matrices $M^k$.

The model parameters will be estimated from samples of propagation cascades. We denote by $(M^1, \dots, M^{\ell})$ a set of training propagation matrices used for estimating the model parameters and by $(M^{\ell+1}, \dots, M^{M})$ a set of test matrices used for evaluation. For example, $(M^1, \dots, M^{\ell})$ may correspond to past observations and $(M^{\ell+1}, \dots, M^{M})$ to future observations to be predicted.

2.2 Prediction task
Existing propagation models are used to model how information spreads over a social network. These models may also be used for predicting the information propagation: given an initial state of the network, the model is run and predicts at each step the propagation at each node. They usually make Closed World Assumptions 1 and 2: they consider that the diffusion network is known and that information can only propagate through the network, without interaction with the external world. In many cases, this is not realistic. We propose here a prediction model which does not rely on these assumptions. For this, we focus on the following task: predict the final contamination state of the network given an initial contamination. This task amounts to learning a correspondence between the initial and final states of the network without considering the intermediate steps:

$$(G, m^k_1) \Rightarrow m^k_{T^k} \qquad (1)$$

where $m^k_1 = (m^k_{1,1}, \dots, m^k_{N,1})^T$ ² is the vector of initial contamination, i.e. the contamination of all the users by a given information item the first time this item appears in the observed data, $m^k_{T^k} = (m^k_{1,T^k}, \dots, m^k_{N,T^k})^T$ is the vector representing the final state of contamination that we want to predict, and $G$ denotes the social network, which might be partially observed or even completely unknown.

¹ For simplification, we consider that all information items follow the same propagation process (assumption 3), i.e. we do not differentiate the propagation according to the message content.
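As an illustration of the task definition, the following minimal sketch extracts the pair (initial state, final state) from one contamination matrix. It assumes that a matrix is stored as a list of rows, one per user, with one column per time step; the helper name `initial_and_final` and the toy cascade are hypothetical and are not taken from the experiments.

```python
from typing import List, Tuple

# Rows are users, columns are time steps 1..T^k (assumed storage layout).
Matrix = List[List[float]]


def initial_and_final(cascade: Matrix) -> Tuple[List[float], List[float]]:
    """Return (m_1, m_T): the first and the last column of a contamination matrix."""
    m_first = [row[0] for row in cascade]
    m_final = [row[-1] for row in cascade]
    return m_first, m_final


# Hypothetical toy cascade over N = 3 users and T = 3 time steps.
toy_cascade = [
    [1, 1, 1],  # user 1 is contaminated from the start
    [0, 1, 1],  # user 2 is contaminated at t = 2
    [0, 0, 0],  # user 3 is never contaminated
]

m_first, m_final = initial_and_final(toy_cascade)
print(m_first, m_final)  # [1, 0, 0] [1, 1, 0]
```

Only these two vectors are used as input and target; the intermediate columns of the matrix are ignored by the prediction task.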

3. PROPOSED APPROACH

3.1 General Model
The proposed approach directly predicts the final contamination values without modeling the whole diffusion process at each time step and at each node, as most models do. To compare the two methodologies, one can make an analogy with the predictive and the modeling (or generative) approaches for discrimination or regression problems. Predictive approaches take a direct route to the prediction problem, while modeling ones learn the generative process of the data. Both approaches have their own advantages: generative methods work better when they are able to capture the real data distribution or when only little training data is available. If there is no hint about the distribution of the data, predictive models generally achieve better performance. We will come back to this point in the experiments section. As far as we know, this model is the first attempt to use direct predictive models in the context of information diffusion.

Let us denote by $f_\theta$ a parameterized regression model, where $\theta$ is a set of parameters to be learned on the training set. $f_\theta$ will be trained to associate a final contamination state with any initial contamination state:

$$f_\theta(G, m^k_1) \Rightarrow m^k_{T^k} \qquad (2)$$

Different types of predictors may be used. We focus here on a family of predictors with the following form:

$$f_{\theta,j}(m_1) = g\left(\sum_{i=1}^{N} w_{j,i}\,\theta_{j,i}\,m_{i,1}\right) \qquad (3)$$

where $f_{\theta,j}(m_1)$ is the predicted final contamination of user $j$, $w_{j,i}$ is a predefined parameter that may be used to represent the graph structure of the network when available, $g$ is a transfer function and $\{\theta_{j,i}\}_{(j,i) \in [1;N]^2}$ is the set of parameters to be learned.
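Equation (3) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `theta` and `w` are stored as nested lists indexed as `[j][i]`, users are numbered from 0, and the function names and toy values are hypothetical.

```python
from typing import Callable, List


def predict_user(j: int,
                 m_first: List[float],
                 theta: List[List[float]],
                 w: List[List[float]],
                 g: Callable[[float], float]) -> float:
    """Predicted final contamination of user j, following Equation (3)."""
    total = sum(w[j][i] * theta[j][i] * m_first[i] for i in range(len(m_first)))
    return g(total)


def predict_all(m_first: List[float],
                theta: List[List[float]],
                w: List[List[float]],
                g: Callable[[float], float]) -> List[float]:
    """Predicted final contamination vector for all users."""
    return [predict_user(j, m_first, theta, w, g) for j in range(len(theta))]


# Toy setting: two users, no graph knowledge (all w equal to 1), identity transfer.
theta = [[0.5, 0.2],
         [1.0, 0.0]]
w = [[1.0, 1.0],
     [1.0, 1.0]]
print(predict_all([1.0, 0.0], theta, w, lambda x: x))  # [0.5, 1.0]
```

Setting every `w[j][i]` to 1 corresponds to the case where no graph structure is used, so that all $N^2$ weights are free.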

3.2 Instances of the General Model
We present here different variants of the model that have been used in the tests presented in the experimental section. They correspond to different assumptions made on the propagation scheme that has to be learned.

• Linear Model (LM) The simplest model is a classical linear regression:

$$f^{LM}_{\theta,j}(m_1) = \sum_{i=1}^{N} \theta_{j,i}\,m_{i,1} \qquad (4)$$

This model learns one influence weight $\theta_{j,i}$ for each pair of users $(u_j, u_i)$. These weights are real-valued and can represent either positive or negative influence between users. The number of parameters is $N^2$.

² T is the transpose operator.

• Logistic Model (LoM) LoM is the logistic version of the LM model. It can be written as:

$$f^{LoM}_{\theta,j}(m_1) = \mathrm{logit}\left(\sum_{i=1}^{N} \theta_{j,i}\,m_{i,1}\right) \qquad (5)$$

where logit denotes here the classical logistic function $x \mapsto 1/(1 + e^{-x})$. This model forces the predicted contamination to be between 0 and 1. The number of parameters is also $N^2$.

• Positive Linear Model (PLM) The positive linear model is a constrained version of the linear model where all influence weights are forced to be positive. Here we have used the following implementation of the constraint:

$$f^{PLM}_{\theta,j}(m_1) = \sum_{i=1}^{N} \theta_{j,i}^2\,m_{i,1} \qquad (6)$$

where the $\theta_{j,i}$ are, as before, real values. Since each weight enters the model as $\theta_{j,i}^2$, the influence of a user on another one cannot be negative here. The number of parameters is again $N^2$. The PLoM model is the logistic equivalent of this model.

• Graph Based Positive Linear Model (GPLM) None of the above models considers the structure of the social network $G$. Knowledge of this structure can easily be taken into account in our general predictive formulation. In the case of the PLM model, this variant, denoted GPLM, takes the following form:

$$f^{GPLM}_{\theta,j}(m_1) = \sum_{i=1}^{N} w_{j,i}\,\theta_{j,i}^2\,m_{i,1} \qquad (7)$$

where $w_{j,i}$ is the weight of the edge between $u_i$ and $u_j$ in the network. When there is no edge, this weight is 0 and there is no propagation between the two nodes. GPLM thus restricts the propagation to the known graph structure. One advantage of this model is that it only learns $|E|$ parameters instead of $N^2$, resulting in a faster algorithm. A drawback is that when the closed world assumption is false, it will usually do worse than the more general models above.
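The four variants above differ only in how the weights enter the sum and in the transfer function. The sketch below puts Equations (4) to (7) side by side; it assumes nested-list storage indexed as `[j][i]`, and the function names and toy values are hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]
Weights = List[List[float]]


def sigmoid(x: float) -> float:
    """Classical logistic function, called logit in Equation (5)."""
    return 1.0 / (1.0 + math.exp(-x))


def lm(theta: Weights, m: Vector) -> Vector:
    """Linear Model, Equation (4)."""
    return [sum(t * x for t, x in zip(row, m)) for row in theta]


def lom(theta: Weights, m: Vector) -> Vector:
    """Logistic Model, Equation (5): logistic function applied to the LM output."""
    return [sigmoid(s) for s in lm(theta, m)]


def plm(theta: Weights, m: Vector) -> Vector:
    """Positive Linear Model, Equation (6): weights enter as theta squared."""
    return [sum(t * t * x for t, x in zip(row, m)) for row in theta]


def gplm(theta: Weights, w: Weights, m: Vector) -> Vector:
    """Graph based PLM, Equation (7): squared weights masked by edge weights w."""
    return [sum(wji * t * t * x for wji, t, x in zip(w_row, row, m))
            for w_row, row in zip(w, theta)]


# Toy parameters for two users, both initially contaminated.
theta = [[0.0, -1.0],
         [2.0, 0.5]]
w = [[0.0, 1.0],
     [1.0, 0.0]]
m_first = [1.0, 1.0]

print(lm(theta, m_first))       # [-1.0, 2.5]
print(plm(theta, m_first))      # [1.0, 4.25]
print(gplm(theta, w, m_first))  # [1.0, 4.0]
print(lom(theta, m_first))      # two values strictly between 0 and 1
```

In this toy case the LM output is negative for the first user because of the negative weight, while PLM and GPLM cannot produce negative influence, which matches the constraint discussed above.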

3.3 Learning
Learning the predictor is performed classically by minimizing a loss function on the training set over the model parameters. Let $\Delta(a, b)$ measure the cost of predicting $a$ when the target is $b$; the global loss function takes the form:

$$L(\theta) = \sum_{k=1}^{\ell} \Delta\left(f_\theta(m^k_1), m^k_{T^k}\right) + \lambda \|\theta\|^2$$

where $\ell$ is the number of cascades (examples) in the training set, $m^k_1$ is the initial state of the network for cascade $k$, $m^k_{T^k}$ is the corresponding target state, and $\lambda \|\theta\|^2$ is a regularization term. Here again, different loss functions $\Delta(\cdot, \cdot)$ could be used. In the experiments we have used a classical square loss. The training problem then amounts to solving

$$\theta^* = \arg\min_\theta L(\theta)$$

This is solved using a gradient-descent method. While most of our models have a complexity of $O(N^2)$, note that this computation can easily be done on GPU-based computers, resulting in models that are able to learn quickly from a very large amount of training data.
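To make the optimization concrete, here is a minimal sketch of full-batch gradient descent for the LM variant with a square loss and a squared-norm regularizer. The text only states that a gradient-descent method is used; the full-batch update, the learning rate, the number of epochs, the function name `train_lm` and the toy cascades are all assumptions made for this example.

```python
from typing import List, Tuple

Vector = List[float]


def train_lm(pairs: List[Tuple[Vector, Vector]],
             n_users: int,
             lam: float = 0.01,
             lr: float = 0.05,
             epochs: int = 200) -> List[List[float]]:
    """Minimize sum_k ||theta m_1^k - m_T^k||^2 + lam * ||theta||^2 by gradient descent."""
    theta = [[0.0] * n_users for _ in range(n_users)]
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad = [[2.0 * lam * theta[j][i] for i in range(n_users)]
                for j in range(n_users)]
        # Gradient of the square loss, accumulated over all training cascades.
        for m_first, m_final in pairs:
            for j in range(n_users):
                pred = sum(theta[j][i] * m_first[i] for i in range(n_users))
                err = pred - m_final[j]
                for i in range(n_users):
                    grad[j][i] += 2.0 * err * m_first[i]
        # Gradient step.
        for j in range(n_users):
            for i in range(n_users):
                theta[j][i] -= lr * grad[j][i]
    return theta


# Toy training set: user 0 contaminates user 1, but not the other way round.
training_pairs = [([1.0, 0.0], [1.0, 1.0]),
                  ([0.0, 1.0], [0.0, 1.0])]
theta = train_lm(training_pairs, n_users=2)
print([[round(t, 3) for t in row] for row in theta])
```

On this toy problem the learned weight from user 0 to user 1 is close to 1, while the weight from user 1 to user 0 stays at 0; the regularizer shrinks the non-zero weights slightly below 1.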

3.4 Complexity
We give here an overview of the learning complexity of the different models and discuss the consequences. For the general model, we aim at learning one parameter for each pair of users, resulting in $N^2$ parameters. On a network of 1,000 users, this means that our approach needs to estimate 1 million parameters. This is a major drawback of the proposed model: its complexity is $O(N^2)$, so it cannot be applied to very large networks, even using implementation tricks or GPUs. The GPLM model, which is based on the structure of the graph, has a lower complexity of $O(|E|)$, where $|E|$ is the number of edges in the original network. It is easier and faster to train, but it cannot model complex diffusion processes because of the closed world assumption it is based on. The work presented here is preliminary, and we plan to study different ways to reduce the complexity of the general model. A first simple idea is to consider a GPLM model that learns one parameter for each pair of users connected by a path of length at most $L$. In this case, the number of parameters to estimate is greater than $|E|$ but lower than $N^2$, depending on the structure of the graph, and the model is able to learn longer-range (multi-hop) propagation. The value of $L > 1$ then determines the trade-off between the complexity and the expressive power of the model. The other perspective is to rewrite the model using sparse $L_1$ regularizers that encourage the algorithm to find a sparse solution, where many of the $\theta_{j,i}$ parameters are set to 0. This can be done by writing the objective function as:

$$L(\theta) = \sum_{k=1}^{\ell} \Delta\left(f_\theta(m^k_1), m^k_{T^k}\right) + \lambda |\theta|$$

where $\lambda$ is the meta-parameter used to choose the sparsity of the model. The higher $\lambda$, the sparser and faster the resulting solution. This solution will be explored in a future paper.
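Returning to the first idea of this section (one parameter per pair of users connected by a path of length at most $L$), the following sketch counts how many parameters such a model would have. It uses a breadth-first search limited to `max_hops` steps on a directed adjacency list; the function name and the toy chain graph are hypothetical, and this is only one possible way to enumerate the pairs.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def pairs_within_hops(adj: Dict[int, List[int]], max_hops: int) -> Set[Tuple[int, int]]:
    """Ordered pairs (source, target), source != target, joined by a path of at most max_hops edges."""
    pairs: Set[Tuple[int, int]] = set()
    for source in adj:
        depth = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if depth[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in adj.get(node, ()):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    pairs.add((source, nxt))
                    queue.append(nxt)
    return pairs


# Toy directed chain 0 -> 1 -> 2 -> 3, so |E| = 3 and N^2 = 16.
chain = {0: [1], 1: [2], 2: [3], 3: []}
for hops in (1, 2, 3):
    print(hops, len(pairs_within_hops(chain, hops)))
# 1 3
# 2 5
# 3 6
```

With `max_hops = 1` the count equals $|E|$, as in GPLM; larger values add parameters for indirect influence while staying below $N^2$ on this example.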

4. EXPERIMENTS

4.1 Datasets
Experiments have been performed using artificially generated cascades over real social networks. Since capturing real representative cascades is not trivial, this allows us to perform extensive experiments on many different situations and to compare with baseline propagation models in situations where they can be used. We have captured social graphs from different Web sites and provide here results for two of them: UsAir97 (direct flights between airports) and PolBlogs (political blogs). The statistics for the two graphs are provided in Table 1; these graphs are representative of real diffusion structures. We have then used classical Independent Cascade Models (ICM) and Linear Threshold Models (LTM) in order to generate artificial cascades over these structures. Different structures and different parameterizations of the ICM and LTM³ have been used in order to provide a variety of training and testing situations. The generated cascades are then considered as the gold standard, and the goal is to predict the propagation on these data. For each experiment, the ICM or LTM model is used to generate 2 000 contamination matrices: 1 000 for training and 1 000 for testing. From these graphs we have also extracted partial graphs by keeping 50%, 75% and 100% of the original nodes (100% corresponds to the full network). This allows us to compare the generative ICM and LTM approaches with predictive models in situations where the known graph only imperfectly reflects the true diffusion structure used for generating the data, and also to analyze how the ICM and LTM models degrade when the graph structure is only imperfectly known. These partial graphs are generated as follows. Out of a complete graph $G$ with $N$ nodes, one selects $N' \leq N$ nodes and builds a subgraph $G'$ consisting of these nodes plus the edges of the original graph $G$ between them.

Network     Nb. Nodes   Nb. Links
UsAir97     332         2 126
PolBlogs    1 493       19 091

Table 1: Statistics over the UsAir97 and PolBlogs datasets

By varying the parameters of the models and of the generation processes, we have performed a large set of experiments. We will present here only some representative results. Note that the behavior of the models is very similar on the different datasets.
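The construction of a partial graph described above can be sketched as an induced subgraph. The text does not say how the $N'$ nodes are selected, so the uniform random sampling used here is an assumption, as are the function names and the toy graph.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def induced_subgraph(edges: Sequence[Edge], nodes_kept: Sequence[int]) -> List[Edge]:
    """Keep only the edges of the original graph whose two endpoints are both kept."""
    kept = set(nodes_kept)
    return [(u, v) for (u, v) in edges if u in kept and v in kept]


def sample_partial_graph(nodes: List[int],
                         edges: Sequence[Edge],
                         fraction: float,
                         seed: int = 0) -> Tuple[List[int], List[Edge]]:
    """Select a fraction of the nodes uniformly at random (assumption) and induce the subgraph."""
    rng = random.Random(seed)
    n_kept = round(fraction * len(nodes))
    kept = rng.sample(nodes, n_kept)
    return kept, induced_subgraph(edges, kept)


# Toy directed cycle over four nodes.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(induced_subgraph(edges, [0, 1, 2]))  # [(0, 1), (1, 2)]
kept, partial_edges = sample_partial_graph(nodes, edges, 0.5)
print(len(kept), partial_edges)
```

With `fraction = 1.0` the function returns the full edge list, which corresponds to the complete-network setting.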

4.2 Evaluation
Prediction models produce scores at each node. The predictive models directly produce real-valued final contamination scores. ICM and LTM models can be used with Monte Carlo simulation in order to estimate the final probability for each user to be contaminated: starting from an initial contamination state for a given graph, the diffusion process of the model is simulated on the graph structure until it stabilizes and produces the final contamination state. LTM models are deterministic, so they produce only one final state for a given initial condition. The contaminated nodes will have a value of 1 and the others a value of 0. ICM models are stochastic (see Section 5), so different runs from the same initial state will produce different propagation values on the nodes. For estimating the node scores, 1,000 propagations are run for the same initial contamination state and the scores obtained at each node after stabilization are averaged over all the runs, giving a probability of contamination. The scores produced are thus real values which play the same role as the scores obtained with the predictive models.
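The Monte Carlo estimation described above can be sketched as follows for the ICM with a single activation probability `p`, as used in the experiments. The graph representation (adjacency lists), the function names and the toy chain are assumptions made for this illustration.

```python
import random
from typing import Dict, Iterable, List, Set


def simulate_icm(adj: Dict[int, List[int]],
                 seeds: Iterable[int],
                 p: float,
                 rng: random.Random) -> Set[int]:
    """Run one Independent Cascade propagation and return the final set of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for v in frontier:
            # Each newly active node gets a single chance to activate each neighbour.
            for w in adj.get(v, ()):
                if w not in active and rng.random() < p:
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return active


def estimate_scores(adj: Dict[int, List[int]],
                    nodes: List[int],
                    seeds: List[int],
                    p: float,
                    runs: int = 1000,
                    seed: int = 0) -> Dict[int, float]:
    """Average the final states over several runs to estimate contamination probabilities."""
    rng = random.Random(seed)
    counts = {n: 0 for n in nodes}
    for _ in range(runs):
        for n in simulate_icm(adj, seeds, p, rng):
            counts[n] += 1
    return {n: counts[n] / runs for n in nodes}


# Toy directed chain 0 -> 1 -> 2 with node 0 initially contaminated.
chain = {0: [1], 1: [2], 2: []}
print(estimate_scores(chain, [0, 1, 2], [0], p=0.3))
```

On this chain the score of node 1 should be close to `p` and the score of node 2 close to `p * p`, up to sampling noise.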

For the performance measure we have used precision-recall (P-R) curves [5]. The node scores obtained with a given model are sorted in decreasing order of their values (in the case of binary scores, all nodes with score 1 are above the nodes with score 0), and P-R curves are then computed from these ranked lists, as is classically done, for example, for lists returned by search engines. This avoids defining decision thresholds for the contamination values and provides richer information on the systems’ behavior. P-R curves reflect the ability of the prediction model to produce a high rank for users that are likely to be contaminated given an initial network state.

³ A description of the IC and LTM models used in the experiments is provided in Section 5.
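The ranking-based evaluation can be sketched as below: nodes are sorted by decreasing score, and a (recall, precision) point is computed after each rank. The function name and the toy scores are hypothetical, and no interpolation of the curve is performed in this sketch.

```python
from typing import Dict, Hashable, List, Set, Tuple


def precision_recall_points(scores: Dict[Hashable, float],
                            contaminated: Set[Hashable]) -> List[Tuple[float, float]]:
    """Return the (recall, precision) pair obtained after each position of the ranked list."""
    if not contaminated:
        raise ValueError("at least one contaminated user is needed to compute recall")
    ranked = sorted(scores, key=lambda user: scores[user], reverse=True)
    points = []
    hits = 0
    for rank, user in enumerate(ranked, start=1):
        if user in contaminated:
            hits += 1
        points.append((hits / len(contaminated), hits / rank))
    return points


# Toy example: four users, two of which are contaminated in the final state.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
truth = {"a", "c"}
for recall, precision in precision_recall_points(scores, truth):
    print(round(recall, 2), round(precision, 2))
# 0.5 1.0
# 0.5 0.5
# 1.0 0.67
# 1.0 0.5
```

Because only the order of the scores matters, the same function applies to the real-valued scores of the predictive models and to the averaged or binary scores of ICM and LTM.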

4.3 Experiments
Figure 1 illustrates the performance of the predictive models on the UsAir97 dataset for cascades generated by an ICM model, and Figure 2 on the PolBlogs dataset for cascades generated by an LTM model. Each figure plots the P-R curves for the model used for generating the cascades (respectively ICM and LTM), for the best predictive model, and for the alternative generative model (LTM if the data have been generated via ICM, and vice versa). For comparison, we have also plotted the performance of a random prediction model, which predicts a random score, and of an Identity model, which predicts the initial input state. Figures 1-1 to 1-3 give respectively the performance for the partial graphs with 50% and 75% of the initial graph nodes and for the complete graph. In all cases, the predictive models do not take the graph structure into account and learn and predict only from the initial and final contaminations.

For ICM generated data, the predictive model is almost as good as the ICM model for the complete graph (Figure 1-3). ICM performance slightly degrades on partial graphs but remains good, meaning that on these datasets ICM is robust to a degradation of the graph structure. The performance of the predictive model, on the other hand, does not degrade and progressively becomes higher than that of the generating model (Figures 1-2 and 1-1). The less is known about the graph, the larger the difference between the two models. Although the predictive model has learned on complete graphs, it is extremely robust to missing information and predicts well in all situations. The performance of the best LTM model is rather low in all situations: LTM cannot predict cascades generated by ICM and is sometimes worse at it than the simple identity model.

Figures 3-1, 3-2 and 3-3 compare different predictive models on the ICM and LTM generated cascades. Models with positivity constraints behave better than unconstrained predictors. There is no negative interaction between nodes in any of the data generated for the experiments, and the positivity constraints help the model to learn solutions that generalize better. Note, however, that our predictive models are able to learn both positive and negative interactions, which is an interesting property since negative interactions do happen in many cases [2]. ICM or LTM inspired models can also handle negative interactions [2], but they must be adapted for such cases, whereas the same predictive model can handle both cases since it only relies on the data to be predicted and does not make any hypothesis on the way information propagates.

Table 2 provides F1 scores on the UsAir97 dataset for the different models. It can be seen that the predictive models are better than ICM or LTM as soon as the network information is incomplete, and close to the performance of the model used for generating the data when the full graph is known to this model.

Another set of experiments has been performed by incorporating the knowledge of the graph or of the partial graph in the predictive model. Figures 5-1 to 5-3 compare prediction with and without this graph knowledge for predictive models. Performance is quite similar for both models in all cases, and even slightly lower sometimes in the case of



Prediction Model   Partial Network 50%   Partial Network 75%   Full Network
IC 0.1             62.7                  64.9                  74.7
IC 0.3             83.1                  85.2                  89.0
IC 0.5             87.0                  87.1                  87.4
LTM 0.1            86.9                  86.9                  86.1
LTM 0.3            69.9                  74.3                  71.4
LTM 0.5            47.2                  47.6                  44.1
Identity           36.4                  37.6                  37.0
Random             60.2                  59.9                  60.0
LM                 87.8                  86.6                  86.6
PLM                88.7                  88.2                  88.6
LoM                88.7                  88.8                  88.1
PLoM               86.4                  85.6                  86.1

Table 2: F1 measure on the UsAir97 corpus with a generating model IC 0.3 (i.e. the probability for a node to activate any of its neighbors is 0.3, see Section 5)

[Figure 1: three P-R plots. Panels: Partial Network 50%, Partial Network 75%, Full Network. Curves: PLM, ICM 0.3, LTM 0.1, identity, random.]

Figure 1: P-R curves obtained on the UsAir97 network, with a simulation model IC 0.3, for different sizes of the partial network. Only the best model of each family (IC, LTM and Discriminant) is shown.

[Figure 2: three P-R plots. Curves: PLM, IC 0.1, LTM 0.5, identity, random.]

Figure 2: P-R curves obtained on the PolBlogs network, with a simulation model LTM 0.5, for different sizes of the partial network (i.e. the threshold for being contaminated at a node is 0.5, see Section 5). Only the best model of each family (ICM, LTM and Discriminant) is shown. For the full network, LTM 0.5 gives perfect results.



[Figure 3: three P-R plots. Curves: LM, PLM, LoM, PLoM, identity, random.]

Figure 3: P-R curves obtained on the UsAir97 network, with a simulation model IC 0.3, for different sizes of the partial network and for the different discriminant models described in the paper.

[Figure 4: three P-R plots.]

Figure 4: P-R curves obtained on the PolBlogs network, with a simulation model LTM 0.5, for different sizes of the partial network and for the different discriminant models described in the paper.

[Figure 5: three P-R plots. Curves: PLM, GPLM, identity, random.]

Figure 5: P-R curves obtained on the PolBlogs network, with a simulation model LTM 0.5, for different sizes of the partial network. Comparison between PLM and the graph based GPLM.



partial graphs. There is no performance gain obtained by exploiting the graph structures for predictive models. However, the complexity of the graph-based predictive model is lower than that of the full predictive models, which might become advantageous in the case of large or very large graphs.

5. RELATED WORK
Most LTM and ICM inspired models make use of predefined parameters. Recently, some papers have proposed to learn the model parameters from data using maximum likelihood [9, 4, 8]. An interesting problem also recently addressed by learning from observations is the inference of diffusion networks [7, 6]. Finally, modeling the diffusion over unknown networks is addressed in [10]; that paper focuses on modeling the temporal dynamics of the diffusion and global statistics like the volume of infection.

We provide here a brief description of the Independent Cascade (ICM) and Linear Threshold (LTM) models used in this paper. ICM and LTM are two basic reference models which have been widely studied and for which many extensions have been considered. [3] describes a unified view of these models and several extensions. Both models operate on a directed graph $G$. A node may be active or inactive. Starting from an initial set of active nodes, a discrete process unfolds in time where, at each time step, more nodes may become active under the influence of their neighbours. In both models, an active node remains active, although in variants or in related models a node may recover and become inactive again. Information propagates on the graph until no more nodes can become active.

ICM operates in a push mode. It starts from a set of active nodes $A_0$. When a node $v$ becomes active at time $t$, it gets a unique chance to activate each of its neighbours $w$. $w$ will become active at time $t + 1$ with probability $p_{v,w}$. Whether or not $w$ becomes active, $v$ is not allowed to attempt activating $w$ in later steps. The $p_{v,w}$ are parameters of the model. In the experiments performed here, all nodes have the same probability $p$ to contaminate their neighbors. ICM 0.3, for example, denotes a model with $p = 0.3$.

LTM operates in a pull mode. Each node $v$ is given a threshold $t_v$, which may be chosen uniformly at random in $[0, 1]$, although related models use fixed threshold values. Edges $(w, v)$ in $G$, with $w$ a parent of $v$, are weighted by a positive value $b_{vw}$ such that $\sum_{w \in N(v)} b_{vw} \leq 1$. Starting from an initial set of active nodes $A_0$, the contamination process unfolds as follows: at time step $t$, if $v$ is active, it remains so; otherwise it becomes active if the total weight of its active parents in $G$ reaches its threshold $t_v$: $\sum_{w \in N(v),\ w\ \text{active}} b_{vw} \geq t_v$. The $b_{vw}$ are parameters of the model. In the experiments, all the nodes have the same threshold $t$, and LTM 0.3 denotes a model with $t = 0.3$.
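The LTM update rule with a single shared threshold can be sketched as follows. The synchronous update (all nodes are tested against the active set of the previous step), the dictionary-based representation, the function name and the toy weights are assumptions made for this illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def simulate_ltm(parents: Dict[int, List[int]],
                 weights: Dict[Tuple[int, int], float],
                 seeds: Iterable[int],
                 threshold: float) -> Set[int]:
    """Run the Linear Threshold process until no more nodes become active.

    parents[v] lists the parents w of node v, and weights[(v, w)] is b_vw.
    """
    active = set(seeds)
    while True:
        newly_active = []
        for v in parents:
            if v in active:
                continue  # an active node remains active
            influence = sum(weights[(v, w)] for w in parents[v] if w in active)
            if influence >= threshold:
                newly_active.append(v)
        if not newly_active:
            return active
        active.update(newly_active)


# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3, with incoming weights summing to at most 1.
parents = {0: [], 1: [0], 2: [0, 1], 3: [2]}
weights = {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.5, (3, 2): 0.4}

print(sorted(simulate_ltm(parents, weights, [0], threshold=0.5)))  # [0, 1, 2]
print(sorted(simulate_ltm(parents, weights, [0], threshold=0.3)))  # [0, 1, 2, 3]
```

For a fixed threshold the process is deterministic, which is why a single run is enough to obtain the final state for a given initial contamination.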

6. CONCLUSION
For predicting the final information propagation state over information networks, we have introduced a new approach which directly aims at predicting this final state without modeling the whole diffusion process over the network. This approach does not make the closed world assumptions common to most information diffusion models. We have proposed a general predictive model implementing this approach and different instances of this model. Tests have been performed on different artificially generated cascades over real social network structures. These experiments have shown that the predictive approach is able to learn to predict final contamination states from data generated by different models, and that it outperforms these models as soon as the information about the network structure becomes unreliable. Future work will examine the behavior and performance of these predictive models on real propagation processes observed in large datasets.

7. ACKNOWLEDGMENTS
This work was partially supported by the French National Agency of Research (ExDeuss/Cedres and MLVIS Projects).

8. REFERENCES
[1] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM).

[2] Wei Chen, Alex Collins, Rachel Cummings, Te Ke, Zhenming Liu, David Rincon, Xiaorui Sun, Yajun Wang, Wei Wei, and Yifei Yuan. Influence maximization in social networks when negative opinions may emerge and propagate. In SDM, pages 379–390, 2011.

[3] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’03, page 137, New York, New York, USA, August 2003. ACM Press.

[4] Masahiro Kimura, Kazumi Saito, Kouzou Ohara, and Hiroshi Motoda. Learning information diffusion model in a social network for predicting influence of nodes. Intell. Data Anal., 15(4):633–652, 2011.

[5] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.

[6] Seth A. Myers and Jure Leskovec. On the convexity of latent social network inference. In NIPS, pages 1741–1749, 2010.

[7] Manuel Gomez Rodriguez, David Balduzzi, and Bernhard Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Lise Getoor and Tobias Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML ’11, pages 561–568, New York, NY, USA, June 2011. ACM.

[8] Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction of information diffusion probabilities for independent cascade model. In KES (3), pages 67–75, 2008.

[9] Kazumi Saito, Kouzou Ohara, Yuki Yamagishi, Masahiro Kimura, and Hiroshi Motoda. Learning diffusion probability based on node attributes in social networks. In Marzena Kryszkiewicz, Henryk Rybinski, Andrzej Skowron, and Zbigniew W. Ras, editors, ISMIS, volume 6804 of Lecture Notes in Computer Science, pages 153–162. Springer, 2011.

[10] Jaewon Yang and Jure Leskovec. Modeling information diffusion in implicit networks. In ICDM, pages 599–608, 2010.

