An introduction to social network challenges · people.irisa.fr/Arnaud.Martin/publi/slideeEGC2018.pdf · Arnaud Martin, Arnaud.Martin@univ-rennes1.fr

Post on 25-Sep-2020


An introduction to social network challenges

Arnaud Martin
Arnaud.Martin@univ-rennes1.fr

Université de Rennes 1 - IRISA - DRUID, Lannion, France

Paris, January 22nd, 2018

An introduction to social network challenges, A. Martin - 22/01/18


Outline

1. What is a social network?
2. How to model a social network?
3. How to model information on social networks?
4. How to analyse a social network?

What is a social network?

[Figure: collaborative platforms]

A definition: a finite set of social actors (individuals, organisations) with relations (collaboration, advice, control, influence, etc.) between them.

Remarks:

- Technical definition
- Is it really always finite?
- Relations and actors are never fixed
- Most of the time there is not only one social network, and not only one kind of group (community)

Application domains

- Sociology
- Ethnology
- Economics
- Demography
- Criminal networks
- Social media
- Literature
- Ecology
- etc.

Notion of community

Social network: a matter of scale

- 31% of the world population is connected to social networks
- Facebook: 1.8 billion users/month, 17.9 billion $
- Qzone: 653 million users/month
- Instagram: 600 million users/month
- Twitter: 317 million users/month
- LinkedIn: 106 million users/month
- Snapchat: 150 million users/day

Social network: main challenges

- Economic challenges: games, advertising, business image, marketing (viral marketing), etc.
- Political challenges: social influence, e.g. Jasmine Revolution, Obama elections, Trump tweets, etc.
- Social challenges: sharing knowledge (all information at any time), communication (to find a job, a partner, etc.), etc.

Social network: main scientific challenges

- Big data management: How to access the data? How to query the data? How to reduce the complexity of the processing? etc.
- Social mining: How to extract information from the data? How to characterise the data? etc.
- Privacy and security: How to protect people's data? How to ensure people's security? etc.

2. How to model a social network?

How to model a social network?

A graph $G$: a pair $(V, E)$ with $V = \{v_1, \ldots, v_V\}$ a set of vertices/nodes and $E = \{e_1, \ldots, e_E\}$ a set of edges/links; each $e_k \in E$ is a pair $(v_i, v_j)$.

- $|V| = V$: order of the graph
- $|E| = E$: number of edges
- $v_i$ and $v_j$ are neighbours (adjacent) if $\exists e_k \in E$ such that $e_k = (v_i, v_j)$
- $N(u) = \{v \in V : (u, v) \in E\}$: the neighbourhood of $u$
- Node degree: $d(u) = |N(u)|$, i.e. the number of edges from $u$
- Centrality of a node: $\frac{d(u)}{E - 1}$
- Link density: $D = \frac{2E}{V(V - 1)}$

See Ernesto Estrada's talk for more graph features.
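The neighbourhood, degree and link density defined above can be sketched in a few lines of Python. This is only an illustration: the six-node graph and the function names are assumptions chosen for the example, and the graph is stored as a dictionary of neighbour sets.

```python
# Example graph (an assumption for illustration): node -> set of neighbours.
graph = {
    1: {2},
    2: {1, 3, 5, 6},
    3: {2, 5},
    4: {6},
    5: {2, 3},
    6: {2, 4},
}


def neighbourhood(g, u):
    """N(u): the set of nodes adjacent to u."""
    return g[u]


def degree(g, u):
    """d(u) = |N(u)|."""
    return len(neighbourhood(g, u))


def edge_count(g):
    """E: every undirected edge appears in two neighbour sets."""
    return sum(len(nbrs) for nbrs in g.values()) // 2


def link_density(g):
    """D = 2E / (V (V - 1)), with V the order of the graph."""
    v = len(g)
    return 2 * edge_count(g) / (v * (v - 1))


if __name__ == "__main__":
    print({u: degree(graph, u) for u in graph})
    print(edge_count(graph), link_density(graph))
```

A dictionary of sets keeps neighbourhood lookups cheap, which matters for the sparse graphs typical of social networks.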

How to model a social network?

A graph (figure) and its adjacency matrix:

      1  2  3  4  5  6
  1   1  1  0  0  0  0
  2   1  1  1  0  1  1
  3   0  1  1  0  1  0
  4   0  0  0  1  0  1
  5   0  1  1  0  1  0
  6   0  1  0  1  0  1

How to model a social network?

A graph (figure) and its list of adjacent nodes:

  1: 2
  2: 1, 3, 5, 6
  3: 2, 5
  4: 6
  5: 2, 3
  6: 2, 4
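The two representations carry the same information, and converting between them is straightforward. The sketch below is illustrative: the helper names `to_matrix` and `to_list` and the `self_loops` flag (which sets the diagonal to 1) are assumptions, and nodes are assumed to be labelled 1 to n.

```python
# Adjacency list of the six-node example: node -> list of adjacent nodes.
adjacency_list = {
    1: [2],
    2: [1, 3, 5, 6],
    3: [2, 5],
    4: [6],
    5: [2, 3],
    6: [2, 4],
}


def to_matrix(adj, self_loops=False):
    """Build a 0/1 adjacency matrix from an adjacency list.

    Nodes are assumed to be labelled 1..n; row i-1 describes node i.
    """
    n = len(adj)
    matrix = [[0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        if self_loops:
            matrix[u - 1][u - 1] = 1
        for v in nbrs:
            matrix[u - 1][v - 1] = 1
    return matrix


def to_list(matrix):
    """Recover the adjacency list, ignoring the diagonal."""
    n = len(matrix)
    return {
        i + 1: [j + 1 for j in range(n) if matrix[i][j] and i != j]
        for i in range(n)
    }


if __name__ == "__main__":
    for row in to_matrix(adjacency_list, self_loops=True):
        print(row)
```

The matrix costs memory quadratic in the number of nodes, while the list grows with the number of edges, so the list is usually preferred for large sparse networks.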

How to model a social network?

Challenge: drawing large graphs

Specificity of social networks

The main social networks are scale-free networks: their degree distribution follows a power law:

$$P(k) = C k^{-\gamma}$$

where $P(k)$ is the proportion of nodes with degree $k$, $C$ is a constant and, in general, $2 \le \gamma \le 3$. The density of a graph depends on the application domain (Melancon, 2006).

[Figure: Flickr online social network (Scholz 2015)]
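To make $P(k)$ concrete, the sketch below computes the empirical degree distribution of a small graph and the constant $C$ that normalises a discrete power law on a finite degree range. The example graph, the function names and the truncation to $[k_{min}, k_{max}]$ are assumptions made for the illustration.

```python
from collections import Counter

# Small example graph (an assumption): node -> set of neighbours.
graph = {
    1: {2},
    2: {1, 3, 5, 6},
    3: {2, 5},
    4: {6},
    5: {2, 3},
    6: {2, 4},
}


def degree_distribution(g):
    """Empirical P(k): proportion of nodes having degree k."""
    counts = Counter(len(nbrs) for nbrs in g.values())
    return {k: c / len(g) for k, c in sorted(counts.items())}


def normalising_constant(gamma, k_min, k_max):
    """C such that sum_{k=k_min}^{k_max} C * k**(-gamma) equals 1."""
    return 1.0 / sum(k ** (-gamma) for k in range(k_min, k_max + 1))


def power_law(k, gamma, c):
    """P(k) = C * k**(-gamma)."""
    return c * k ** (-gamma)


if __name__ == "__main__":
    print(degree_distribution(graph))
    c = normalising_constant(2.5, 1, 100)
    print(c, power_law(1, 2.5, c), power_law(10, 2.5, c))
```

On real data, the exponent $\gamma$ would be estimated by fitting the empirical distribution; a six-node graph is far too small for that and only shows the bookkeeping.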

Specificity: Milgram paradox

(Milgram 1967): On average, the number of links between two persons (nodes) is small (around 6).
(Facebook, 2011): On average, each person is linked to any other by 4.74 relations.

Specificity: Small world network

In social networks:

- The number of neighbours of a given node is approximately the same as the number of neighbours of its neighbours
- The distance $L$ between two randomly chosen nodes is given by: $L \simeq \ln E$

Distance on graph

- geodesic distance: the length (number of edges) of the shortest path between two vertices
- eccentricity: $\varepsilon(u)$ is the greatest geodesic distance between $u$ and any other vertex
- radius: $\min_{u \in V} \varepsilon(u)$
- graph diameter: $\max_{u \in V} \varepsilon(u)$

Problem: detection of cycles - NP-hard algorithms

- intermediary (betweenness) centrality of a node:

$$IC(u) = \sum_{s \neq u,\, t \neq u,\, s \neq t} \frac{\sigma_{st}(u)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(u)$ is the number of shortest paths between $s$ and $t$ passing through $u$.
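For an unweighted graph, geodesic distances can be obtained with a breadth-first search, from which eccentricity, radius and diameter follow directly. The sketch below is a minimal illustration on an assumed six-node connected graph; the function names are hypothetical, and the code does not handle disconnected graphs.

```python
from collections import deque

# Connected example graph (an assumption): node -> set of neighbours.
graph = {
    1: {2},
    2: {1, 3, 5, 6},
    3: {2, 5},
    4: {6},
    5: {2, 3},
    6: {2, 4},
}


def geodesic_distances(g, source):
    """Breadth-first search: shortest path length from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity(g, u):
    """Greatest geodesic distance between u and any other vertex."""
    return max(geodesic_distances(g, u).values())


def radius(g):
    return min(eccentricity(g, u) for u in g)


def diameter(g):
    return max(eccentricity(g, u) for u in g)


if __name__ == "__main__":
    print({u: eccentricity(graph, u) for u in graph})
    print(radius(graph), diameter(graph))
```

Running one search per node costs on the order of V(V + E) operations, which is acceptable for small graphs but motivates sampling or approximation on large social networks.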

How to model a social network?

A directed graph (e.g. followers on Twitter) and its adjacency matrix:

      1  2  3  4  5  6
  1   0  1  0  0  0  0
  2   0  0  0  0  1  0
  3   0  1  0  0  0  0
  4   0  0  0  0  0  0
  5   0  0  1  0  0  0
  6   0  1  0  1  0  0
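In a directed graph, the adjacency matrix is no longer symmetric, and each node has an out-degree and an in-degree. The sketch below reads them off the matrix above, under the assumption (not stated on the slide) that entry (i, j) equal to 1 means an edge from node i to node j.

```python
# Directed adjacency matrix from the slide; row i, column j.
# Assumption: A[i][j] == 1 encodes an edge from node i+1 to node j+1.
A = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]

# Out-degree: number of edges leaving a node (row sums).
out_degree = [sum(row) for row in A]

# In-degree: number of edges entering a node (column sums).
in_degree = [sum(col) for col in zip(*A)]

if __name__ == "__main__":
    for node, (d_out, d_in) in enumerate(zip(out_degree, in_degree), start=1):
        print(f"node {node}: out-degree {d_out}, in-degree {d_in}")
```

With the opposite orientation convention, the two lists would simply be swapped.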

What is a community on a network?

(Fortunato, 2010) some properties of a community:

- Two neighbours in the same community are approximately the same
- Two neighbours in the same community must be near
- The nodes of a community have a high average degree
- A community contains a high proportion of triplets (high clustering coefficient):

$$C(u) = \frac{2\,|\{e_{ij} = (v_i, v_j) \in E : v_i, v_j \in N(u)\}|}{|N(u)|\,(|N(u)| - 1)}, \qquad C(G) = \frac{1}{V} \sum_{u \in V} C(u)$$

- A community has a large embeddedness (ratio between internal and external degree). For a given sub-graph $G_c$ of $G$, with $A$ the adjacency matrix and $u \in G_c$:

$$k_u^{int} = \sum_{j \in G_c} A_{uj}, \qquad k_u^{ext} = \sum_{j \notin G_c} A_{uj}, \qquad \xi_u = \frac{k_u^{int}}{k_u^{int} + k_u^{ext}}$$

Challenge: Give a definition of a community on a social network

See Mauro Sozio and Florence Sedes talks...
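The clustering coefficient and the embeddedness can be computed directly from a neighbour-set representation. The sketch below is illustrative only: the six-node graph, the candidate community {2, 3, 5} and the convention that C(u) = 0 for nodes with fewer than two neighbours are assumptions.

```python
# Example graph (an assumption): node -> set of neighbours.
graph = {
    1: {2},
    2: {1, 3, 5, 6},
    3: {2, 5},
    4: {6},
    5: {2, 3},
    6: {2, 4},
}


def local_clustering(g, u):
    """C(u): fraction of pairs of neighbours of u that are themselves linked."""
    nbrs = g[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here: undefined ratio treated as 0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in g[a])
    return 2 * links / (k * (k - 1))


def global_clustering(g):
    """C(G): average of C(u) over all nodes."""
    return sum(local_clustering(g, u) for u in g) / len(g)


def embeddedness(g, community, u):
    """xi_u = k_int / (k_int + k_ext) for node u of the given community.

    Assumes u has at least one neighbour.
    """
    k_int = sum(1 for v in g[u] if v in community)
    k_ext = len(g[u]) - k_int
    return k_int / (k_int + k_ext)


if __name__ == "__main__":
    community = {2, 3, 5}
    print(global_clustering(graph))
    print({u: embeddedness(graph, community, u) for u in community})
```

In this example node 3 has all its links inside the community (embeddedness 1), while node 2 splits its links evenly between the inside and the outside.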

Challenges: classical problems

- Travelling salesman problem: find the shortest route that visits each given node exactly once (NP-hard); generalised by the vehicle routing problem
- Graph labelling and colouring: give a label to all nodes (or links) (NP-hard)
- Maximum flow: in a flow network (valued directed graph), find the largest possible total flow
- Large graph compression
- Maximal clique enumeration (NP-hard)
- Independent set problem: find the largest possible independent set (a set of vertices no two of which are adjacent) (NP-hard)

Most problems on graphs are equivalent to NP-hard optimisation problems. Some approximation algorithms have been developed.

Challenge: realistic social network generation

For social networks with different communities, the Lancichinetti-Fortunato-Radicchi (LFR) benchmark, based on power-law distributions, needs:

- number of nodes
- minimum and maximum community sizes
- average and maximum degree
- etc.

and defines:

- number of edges
- number of communities

Challenge: realistic social network generation

[Figures: Zachary's Karate Club network; LFR generation]

Challenge: realistic social network generation

(Largeron, et al, 2015), see Christine Largeron's talk...

- Local preferential attachment: new links attach to vertices with high degree
- Small world
- Community structure: vertices are more connected to vertices of the same group than to those of other groups (large embeddedness)
- Community homogeneity: similarity of vertices in the same group
- Homophily: vertices in the same group are more similar to each other than to vertices of other groups

allows:

- dynamical generation of social networks
- fixing the number of vertices
- fixing the number of communities

3. How to model information on social networks?

How to model information on social networks?

On a social network, several kinds of information can be considered:

- information on the links: LinkedIn, etc.
- information on the nodes: Facebook, LinkedIn, etc.
- information (messages) through the network: Twitter, collaborative platforms, etc.

Valued graphs

$G = (V, E, w)$ where $w : e \in E \longrightarrow X$

An adjacency matrix:

      1     2     3     4     5     6
  1   0     w12   0     0     0     0
  2   w12   0     w23   0     w25   w26
  3   0     w23   0     0     w35   0
  4   0     0     0     0     0     w46
  5   0     w25   w35   0     0     0
  6   0     w26   0     w46   0     0

The values on the links can also be:

- probabilities: $G = (V, E, p)$ where $p : e \in E \longrightarrow X$, with the same adjacency matrix where each $w_{ij}$ is replaced by $p_{ij}$. Example: $p_{12}(friend) = 0.8$, $p_{12}(family) = 0.15$, $p_{12}(colleague) = 0.05$
- mass functions: $G = (V, E, m)$ where $m : e \in E \longrightarrow X$, with the same adjacency matrix where each $w_{ij}$ is replaced by $m_{ij}$. Veracity of information, doubt, reliability

Limits of the theory of probabilities

A probability is a positive and additive measure; $p$ is defined on a σ-algebra of $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and takes values in $[0, 1]$.

It verifies: $p(\emptyset) = 0$, $p(\Omega) = 1$, $\sum_{X \in \Omega} p(X) = 1$

- Difficulty in modelling the absence of knowledge (e.g. Sirius)
- Constraint on the classes (exhaustive and exclusive)
- Constraint on the measures (additivity)

If a symptom $f$ (fever) is always present when a patient has an illness $A$ (flu), i.e. $p(f|A) = 1$, and if we observe this symptom $f$, then the probability that the patient has $A$ increases (because $p(A|f) = p(A)/p(f)$, so $p(A|f) \ge p(A)$).
The additivity constraint then requires that the probability that the patient does not have $A$ decreases: $p(\bar{A}|f) = 1 - p(A|f)$, so $p(\bar{A}|f) \le p(\bar{A})$. Yet there is no reason for this if the symptom $f$ can also be observed in other diseases.

Bases of belief functions

- Use of functions defined on subsets instead of singletons, as is the case for probabilities
- Frame of discernment: $\Omega = \{\omega_1, \ldots, \omega_n\}$, where the $\omega_i$ are exclusive and exhaustive classes
- Power set: $2^\Omega = \{\emptyset, \omega_1, \omega_2, \omega_1 \cup \omega_2, \ldots, \Omega\}$
- Several functions in one-to-one correspondence model uncertainty and imprecision: mass functions, belief functions, plausibility functions
- Extension of $2^\Omega$ to $D^\Omega$, the hyper power set, in order to model the conflicts
  - $D^\Omega$: set closed under the union and intersection operators
  - $D^\Omega_r$: reduced set with constraints ($\omega_2 \cap \omega_3 \equiv \emptyset$)

Mass functions

- The basic belief assignments (bba, or mass functions) are defined on $2^\Omega$ and take values in $[0, 1]$
- Normalisation condition: $\sum_{X \in 2^\Omega} m(X) = 1$
- A focal element is an element $X$ of $2^\Omega$ such that $m(X) > 0$
- Closed world: $m(\emptyset) = 0$
- We denote by $m_j$ the mass function of the source $S_j$

Mass functions

Special cases:

- If the only focal elements are the singletons $\omega_i$, then $m_j$ is a probability
- $m_j(\Omega) = 1$: total ignorance of $S_j$
- categorical mass function: $m_j(X) = 1$ (denoted $m_X$): $S_j$ has an imprecise knowledge
- $m_j(\omega_i) = 1$: $S_j$ has a precise knowledge
- simple mass function $X^w$: $m_j(X) = w$ and $m_j(\Omega) = 1 - w$: $S_j$ has an uncertain and imprecise knowledge

Discounting

From (Shafer, 1976):

$$m_j^{\alpha}(X) = \alpha_j\, m_j(X), \quad \forall X \in 2^\Omega,\ X \neq \Omega$$

$$m_j^{\alpha}(\Omega) = 1 - \alpha_j (1 - m_j(\Omega))$$

$\alpha_j \in [0, 1]$, the discounting coefficient, can be seen as the reliability of the source $S_j$.
If $\alpha_j = 0$, the source is completely unreliable and all the mass is transferred to $\Omega$, the total ignorance.
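The discounting operation is easy to check numerically. In the sketch below, a mass function is represented as a dictionary from `frozenset` focal elements to masses; this representation, the three-class frame and the function name `discount` are assumptions made for the illustration.

```python
# Frame of discernment with three classes (an assumption for the example).
OMEGA = frozenset({"w1", "w2", "w3"})

# A mass function: focal element (frozenset) -> mass.
m1 = {frozenset({"w1"}): 0.5, frozenset({"w2"}): 0.1, OMEGA: 0.4}


def discount(mass, alpha, frame):
    """Discount a mass function by the reliability alpha in [0, 1].

    Every focal element other than the frame is scaled by alpha; the
    remaining mass is transferred to the frame (total ignorance).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    discounted = {x: alpha * v for x, v in mass.items() if x != frame}
    discounted[frame] = 1.0 - alpha * (1.0 - mass.get(frame, 0.0))
    return discounted


if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.0):
        print(alpha, discount(m1, alpha, OMEGA))
```

With `alpha = 1` the mass function is unchanged, and with `alpha = 0` the whole mass sits on the frame, matching the two extreme cases of reliability.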

Fusion architecture

$s$ sources $S_1, S_2, \ldots, S_s$ must take a decision on an observation $x$ among a set of $n$ classes, $x \in \Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$:

$$\begin{array}{c|ccccc}
 & \omega_1 & \ldots & \omega_i & \ldots & \omega_n \\ \hline
S_1 & m_1^1(x) & \ldots & m_i^1(x) & \ldots & m_n^1(x) \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
S_j & m_1^j(x) & \ldots & m_i^j(x) & \ldots & m_n^j(x) \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
S_s & m_1^s(x) & \ldots & m_i^s(x) & \ldots & m_n^s(x)
\end{array}$$

Conjunctive rules

- Assume two cognitively independent and reliable sources $S_1$ and $S_2$.
- The conjunctive rule is given, for $m_1$ and $m_2$ the bbas of $S_1$ and $S_2$, for all $X \in 2^\Omega$, with $X \neq \emptyset$, by:

$$m_{Conj}(X) = \sum_{Y_1 \cap Y_2 = X} m_1(Y_1)\, m_2(Y_2) \qquad (1)$$

           ∅      ω1     ω2     ω3     Ω
  m1       0      0.5    0.1    0      0.4
  m2       0      0.2    0      0.5    0.3
  m        0.32   0.33   0.03   0.2    0.12
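The table above can be reproduced with a short script. The sketch below represents each mass function as a dictionary keyed by `frozenset` focal elements (a representation assumed for the illustration) and multiplies every pair of focal elements, assigning the product to their intersection; the mass collected on the empty set is kept, as in the last row of the table.

```python
from itertools import product

# Frame of discernment and the two mass functions of the example table.
OMEGA = frozenset({"w1", "w2", "w3"})
m1 = {frozenset({"w1"}): 0.5, frozenset({"w2"}): 0.1, OMEGA: 0.4}
m2 = {frozenset({"w1"}): 0.2, frozenset({"w3"}): 0.5, OMEGA: 0.3}


def conjunctive(ma, mb):
    """Conjunctive combination: sum of ma(Y1) * mb(Y2) over Y1 & Y2 == X."""
    combined = {}
    for (y1, v1), (y2, v2) in product(ma.items(), mb.items()):
        x = y1 & y2  # set intersection; frozenset() is the empty set
        combined[x] = combined.get(x, 0.0) + v1 * v2
    return combined


if __name__ == "__main__":
    m = conjunctive(m1, m2)
    for focal, value in m.items():
        print(sorted(focal), round(value, 2))
```

The mass on the empty set comes from pairs of focal elements with no class in common, here for instance `w1` for the first source and `w3` for the second.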

Dempster's rule

- Dempster's rule:

$$m_D(X) = \frac{1}{1 - \kappa}\, m_{Conj}(X) \qquad (2)$$

where $\kappa = \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)$ is generally called the conflict or global conflict. It is the sum of the partial conflicts.

- It is not a conflict measure.
- Conjunctive rules are not idempotent
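Dempster's rule amounts to a conjunctive combination followed by a renormalisation by $1 - \kappa$. The sketch below is a minimal illustration, assuming mass functions stored as dictionaries keyed by `frozenset` focal elements and reusing the example masses of the conjunctive rule; the function names are hypothetical.

```python
from itertools import product

OMEGA = frozenset({"w1", "w2", "w3"})
m1 = {frozenset({"w1"}): 0.5, frozenset({"w2"}): 0.1, OMEGA: 0.4}
m2 = {frozenset({"w1"}): 0.2, frozenset({"w3"}): 0.5, OMEGA: 0.3}


def conjunctive(ma, mb):
    """Unnormalised conjunctive combination of two mass functions."""
    combined = {}
    for (y1, v1), (y2, v2) in product(ma.items(), mb.items()):
        x = y1 & y2
        combined[x] = combined.get(x, 0.0) + v1 * v2
    return combined


def dempster(ma, mb):
    """Dempster's rule: returns the normalised masses and the conflict kappa."""
    conj = conjunctive(ma, mb)
    kappa = conj.pop(frozenset(), 0.0)  # mass of the empty set
    if kappa >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {x: v / (1.0 - kappa) for x, v in conj.items()}, kappa


if __name__ == "__main__":
    m_d, kappa = dempster(m1, m2)
    print("kappa =", round(kappa, 2))
    for focal, value in m_d.items():
        print(sorted(focal), round(value, 4))
```

Applying `dempster(m1, m1)` does not return `m1`, which illustrates the non-idempotence noted above.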

Decision on Ω

- In general the decision is made on $\Omega$ and not on $2^\Omega$
- Pessimistic: $\max_{\omega \in \Omega} bel(\omega)$
- Optimistic: $\max_{\omega \in \Omega} pl(\omega)$
- Compromise: $\max_{\omega \in \Omega} betP(\omega)$

Pignistic probability:

$$betP(\omega) = \sum_{Y \in 2^\Omega,\ \omega \cap Y \neq \emptyset} \frac{1}{|Y|}\, \frac{m(Y)}{1 - m(\emptyset)} \qquad (3)$$
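The pignistic transformation shares the mass of each focal element equally among the classes it contains. The sketch below applies it to a mass function with the values 0.32, 0.33, 0.03, 0.2 and 0.12 on ∅, ω1, ω2, ω3 and Ω; the dictionary-of-`frozenset` representation and the function name are assumptions made for the illustration.

```python
OMEGA = frozenset({"w1", "w2", "w3"})

# Mass function with some mass on the empty set (frozenset()).
m = {
    frozenset(): 0.32,
    frozenset({"w1"}): 0.33,
    frozenset({"w2"}): 0.03,
    frozenset({"w3"}): 0.2,
    OMEGA: 0.12,
}


def pignistic(mass, frame):
    """betP(w): sum over focal elements Y containing w of m(Y) / (|Y| (1 - m(empty)))."""
    empty = mass.get(frozenset(), 0.0)
    betp = {w: 0.0 for w in frame}
    for focal, value in mass.items():
        for w in focal:  # the empty set contributes nothing
            betp[w] += value / (len(focal) * (1.0 - empty))
    return betp


if __name__ == "__main__":
    betp = pignistic(m, OMEGA)
    decision = max(betp, key=betp.get)
    print({w: round(p, 4) for w, p in sorted(betp.items())})
    print("decision:", decision)
```

Since the result is a probability distribution over the classes, the compromise decision simply selects the class with the largest pignistic probability.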


Node-attributed graphs

$G = (V, E, F)$ where $F : V \longrightarrow X$, $F(v) = [f_1(v), \ldots, f_a(v)]$

Attributes can be qualitative or quantitative (fuzzy, interval, probabilistic, belief, etc.).
See Christine Largeron's talk...

Node and link-attributed graphs

$G = (V, E, m_u, m_e)$ where $m_u : V \longrightarrow X$ and $m_e : e \in E \longrightarrow X$, $m_u(v) = [m_1(v), \ldots, m_a(v)]$

(Ben Dhaou, 2014, 2017)


Information on social networks

Characteristics of the messages:

- A message is a text (possibly short: 140 characters on Twitter)
- It is generally not literary text (many typos, errors, etc.)
- A message has an author
- A message can be sent to some recipients
- A message generally has a date
- A message can have a label (type of message)
- A message can have an influence on the evolution of the network

Evolution of information on social networks

Information on

- the existence of a node in the network
- the existence of a link between two nodes
- existence at time $t$ can be modelled by a probability or a belief

Challenge: privacy-preserving data mining

How can we protect our personal data?
How can we avoid sending personal information?
What is personal, what is public?

- Cryptography and network security
- Watermarking (Gross-Amblard, 2003)
- Preference elicitation in Personal Information Management Systems (Allard et al., 2017)

See Oana Goga's talk.

4. How to analyse a social network?

Message mining

Challenges:

- Understand the messages
- Characterise emotion in the message
- Characterise the writer from the text (level of expertise, social level, etc.)
- Characterise positive/negative/neutral messages
- Detect fake news
- Detect new topics, centres of interest, etc.

Methods: coming from text mining, they must be language-independent, robust to the form of the message, time-dependent, etc.

Person identification

Challenges:

- Find criminals on a social network
- Find influencers for viral marketing
- Find spammers on participatory platforms
- Find experts on participatory platforms
- etc.

Expert identification in Stack Overflow

(Attiaoui, et al. 2017)

[Figure: evolution of the percentage of each class over 15 months.]

Data set: 37 GB, 2 million users, 2.5 million answers, 1.7 million questions; data from December 2013 to March 2015

Influencers identification

Problem: Given a social network, find a set of influencers that are able to trigger a large cascade.

Influencers identification

Solution: Influencers on Twitter (Jendoubi, et al, 2016, 2017)

- Define an influence measure based on belief functions by:
  - $\Omega = \{I, P\}$, $I$ for influencer, $P$ for passive
  - Calculate belief weights on each edge $(u, v)$ from the number of common neighbours, the number of tweets where $u$ mentions $v$, and the number of tweets where $v$ retweets from $u$
  - Integrate the opinion of the tweet:
    - Give a label to each word in the tweet using the Stanford POS Tagger with the GATE Twitter part-of-speech tagger model
    - Use the SentiWordNet dictionary to get the polarity of each word in the tweet
    - Build a belief function on $\Theta = \{Pos, Neg, Neut\}$
  - Combine the mass functions
- Compute influence maximisation with the CELF algorithm (Leskovec et al. 2007)

Community detection

First define the type of communities expected:

- Hard communities: each node $v$ belongs to one and only one community in $\Omega = \{C_1, \ldots, C_n\}$: $\mu_{vk} = 1$ if $v \in C_k$, $\mu_{vk} = 0$ otherwise
- Fuzzy communities: each node $v$ has a degree of membership $\mu_{vk} \in [0, 1]$ to each community, with $\sum_{k=1}^{n} \mu_{vk} = 1$
- Possibilistic communities: the condition $\sum_{k=1}^{n} \mu_{vk} = 1$ is relaxed. $\mu_{vk}$ can be interpreted as a degree of possibility that node $v$ belongs to the community $C_k$
- Rough communities: the membership of node $v$ to community $C_k$ is described by a pair $(\underline{\mu}_{vk}, \overline{\mu}_{vk}) \in \{0, 1\}^2$ indicating its membership to the lower and upper approximations of community $C_k$
- Belief communities: the membership of each node $v$ is described by a belief function $m_v$ over $\Omega$
- Overlapped communities: each node $v$ can belong to more than one community in $\Omega$; $C_1, \ldots, C_n$ are not exclusive. With belief functions, work on $D^\Omega$, the hyper power set, in order to model the overlapped communities:
  - $D^\Omega$: set closed under the union and intersection operators
  - $D^\Omega_r$: reduced set with constraints ($C_2 \cap C_3 \equiv \emptyset$)

See Remy Cazabet's talk...

Community detection

Methods: depend on the information given as input and expected as output

1. Selection: can be from databases by queries, or by scanning the web
2. Preprocessing: depends on the data; transform the data into a graph, a list of adjacent nodes, belief-function information, etc.
3. Transformation: calculate extracted features (by supervised or unsupervised methods)
4. Data mining: classify the data (by supervised or unsupervised methods)
5. Evaluation: calculate some measures on the obtained patterns

Community detection

Characterisation of classical clustering methods (a challenge):

1. Hierarchical methods, by division or agglomeration, build partitions. Examples: Louvain algorithm, spectral approaches, etc.
2. Partitioning methods. Examples: C-means, Fuzzy C-means, Evidential C-means (Zhou et al., 2015)
3. Label propagation methods

They need:

- a distance (or similarity) on the data (structure of the graph and information on the graph)
- an optimisation process

Community detection: SELP

Semi-supervised Evidential Label Propagation algorithm (Zhou et al., 2018), flowchart:

1. Input the labeled nodes in the graph
2. Initialize the bba of each node in the network
3. Select a node $n_x$ in $V_U$, find its $r_x$ direct neighbors and construct $r_x$ bbas
4. Calculate the fused bba of node $n_x$
5. Test: is the maximum of the mass assignment of $n_x$ larger than a given threshold? If yes, move node $n_x$ from set $V_U$ to set $V_L$
6. Test: is there no node left in $V_U$? If yes, output the bba of each node; if no, continue iterating

Community detection: SELPSocial NetworkModelModel informationMining

Example on Karate Club network

1

2

3

4

5

6

78

9

10

11

12

13

14

15

16

17

18

19

20

2122

23

24

25

26

27

28

29

30

31

32

33

34

Initialization

Labeled data in ω1Labeled data in ω2Labeled as noisy dataunlabeled data

An introduction to social network challenges, A. Martin - 22/01/18

66/73

Community (15/16)

Community detection: SELPSocial NetworkModelModel informationMining

Example on Karate Club network

1

2

3

4

5

6

78

9

10

11

12

13

14

15

16

17

18

19

20

2122

23

24

25

26

27

28

29

30

31

32

33

34

Iteration 1

Labeled data in ω1Labeled data in ω2Labeled as noisy dataunlabeled data

[Figure: SELP on the Karate Club network, iteration 2; same legend as at initialization.]

[Figure: SELP on the Karate Club network, iteration 3; same legend as at initialization.]

[Figure: SELP on the Karate Club network, iteration 4; same legend as at initialization.]

[Figure: SELP on the Karate Club network, iteration 5; same legend as at initialization.]

Community detection

Challenges:

- How to learn information on graphs?

- How many communities? A difficult problem in clustering in general.

- How to combine methods? Information fusion methods can be used.

- How to properly account for the dynamic aspect of social networks?

- How to reduce the running time of algorithms? Some algorithms can be parallelised.

- How to evaluate the obtained communities? A difficult problem in clustering, and harder still when we do not know what a community is.

- etc.
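For the evaluation question, one commonly used structural score is Newman's modularity, which compares the fraction of edges falling inside communities with the fraction expected if edges were placed at random with the same degrees. The sketch below is only an illustration under that choice: the function name `modularity`, its arguments and the toy graph are assumptions made for the example. Such a score does not settle what a community is; it only makes one definition explicit.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges     : iterable of (u, v) pairs, each undirected edge listed once
    community : dict node -> community identifier
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edges / m) - (degree / 2m)^2
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy graph: two triangles joined by a single edge (invented data).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

Comparing such scores across candidate partitions is also one way to approach the question of how many communities to keep, with the usual caveat that a structural score ignores the information attached to the nodes.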

To end

Many challenges around social networks:

- We don't know exactly what a social network is

- We cannot be sure of the information given on social networks (veracity, precision, existence, etc.)

- We don't know exactly what a community is

- We have a lot of information

- Almost all our problems are NP-hard

The next presentations during these two days will give you some answers.

My proposal: use the theory of belief functions to properly model the uncertainty and imprecision of information.
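To make the proposal concrete, here is a minimal sketch of how a mass function expresses both uncertainty and imprecision, following the standard definitions of belief and plausibility (Shafer, 1976). The frame, the mass values and the function names are assumptions made for the example.

```python
def belief(m, a):
    """Bel(A): total mass of the non-empty focal sets included in A."""
    return sum(w for b, w in m.items() if b and b <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal sets that intersect A."""
    return sum(w for b, w in m.items() if b & a)


# Frame of discernment with two communities (invented example).
omega = frozenset({"w1", "w2"})

# Mass function for one node: some evidence for each community, and 0.3
# left on the whole frame because the source cannot decide (imprecision).
m = {frozenset({"w1"}): 0.5, frozenset({"w2"}): 0.2, omega: 0.3}

target = frozenset({"w1"})
bel = belief(m, target)        # evidence that strictly supports w1
pl = plausibility(m, target)   # evidence that does not contradict w1
```

The interval between `bel` and `pl` (here roughly 0.5 to 0.8) is what a single probability would hide: the mass left on the whole frame is not forced onto either community.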

References

- Stanford POSTagger: http://nlp.stanford.edu/software/tagger.shtml

- GATE Twitter part-of-speech tagger: https://gate.ac.uk/wiki/twitter-postagger.html

- SentiWordNet: http://sentiwordnet.isti.cnr.it/

- Santo Fortunato, Community detection in graphs, Physics Reports, 486(3):75-174, 2010

- Santo Fortunato, Darko Hric, Community detection in networks: A user guide, Physics Reports, 659, pp. 1-44, 2016

- Guy Melancon, Just how dense are dense graphs in the real world?: a methodological note, in Proceedings of the 2006 AVI workshop on BEyond time and errors: novel evaluation methods for information visualization, pp. 1-7, ACM, 2006

- C. Largeron, P.N. Mougel, R. Rabbany, O.R. Zaiane, Generating attributed networks with communities, PLoS ONE, 10(4), 2015

- David Gross-Amblard, Query-preserving watermarking of relational databases and XML documents, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2003

- Tristan Allard, Tassadit Bouadi, Joris Dugueperoux, Virginie Sans, From Self-Data to Self-Preferences: Towards Preference Elicitation in Personal Information Management Systems, International Workshop on Personal Analytics and Privacy (in conjunction with ECML PKDD 2017)

- G. Shafer, A mathematical theory of evidence, Princeton University Press, 1976

- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, Cost-effective outbreak detection in networks, KDD 2007

Personal references: with belief functions

- Kuang Zhou, Arnaud Martin, Quan Pan, Zhunga Liu, SELP: Semi-supervised evidential label propagation algorithm for graph data clustering, International Journal of Approximate Reasoning, Elsevier, 2018, 92, pp. 139-154

- Dorra Attiaoui, Arnaud Martin, Boutheina Ben Yaghlane, Belief Temporal Analysis of Expert Users: case study Stack Overflow, Big Data Analytics and Knowledge Discovery DAWAK, Aug 2017, Lyon, France

- Dorra Attiaoui, Arnaud Martin, Boutheina Ben Yaghlane, Belief Measure of Expertise for Experts Detection in Question Answering Communities: case study Stack Overflow, 21st International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Sep 2017, Marseille, France

- Siwar Jendoubi, Arnaud Martin, Ludovic Lietard, Hend Ben Hadji, Boutheina Ben Yaghlane, Two Evidential Data Based Models for Influence Maximization in Twitter, Knowledge-Based Systems, 2017

- Salma Ben Dhaou, Kuang Zhou, Mouloud Kharoune, Arnaud Martin, Boutheina Ben Yaghlane, The Advantage of Evidential Attributes in Social Networks, 20th International Conference on Information Fusion, Jul 2017, Xi’an, China

- Kuang Zhou, Arnaud Martin, Quan Pan, Zhun-Ga Liu, Median evidential c-means algorithm and its application to community detection, Knowledge-Based Systems, 2015, 74, pp. 69-88

- Kuang Zhou, Arnaud Martin, Quan Pan, A similarity-based community detection method with multiple prototype representation, Physica A: Statistical Mechanics and its Applications, Elsevier, 2015, pp. 519-531

Personal references: www-druid.irisa.fr

- Imen Ouled Dlala, Dorra Attiaoui, Arnaud Martin, Boutheina Ben Yaghlane, Trolls Identification within an Uncertain Framework, International Conference on Tools with Artificial Intelligence - ICTAI, Nov 2014, Limassol, Cyprus

- Salma Ben Dhaou, Mouloud Kharoune, Arnaud Martin, Boutheina Ben Yaghlane, Belief Approach for Social Networks, Belief 2014, Oxford, United Kingdom

- Kuang Zhou, Arnaud Martin, Quan Pan, Evidential Communities for Complex Networks, 15th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Jul 2014
