Subscriber Churn Prediction Model using Social Network Analysis in Telecommunication Industry
Chettapong Punyachonkool
Dr. Arnond Sakworawich
The First NIDA Business Analytics and Data Sciences Contest/Conference, September 2, 2016
Chettapong Punyachonkool
Data Engineer, Business Intelligence Strategy
The Siam Commercial Bank
[email protected]/in/chettapong-punyachonkool
Business Analytic and Research
Applied Statistics, NIDA
Topics
› Social Network Analysis basic concepts
› Social Network Analysis with R
› Visualizing Social Network
› Using SNA to predict Subscriber Churn in Telco
Social Network Analysis Basic Concepts
Social Network
Social Network: A social structure composed of individuals (or organizations) interconnected by one or more specific types of interdependencies such as friendship, kinship, financial exchanges, communication exchanges, etc.
Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu
Social Network Analysis
Social Network Analysis: The application of graph theory to understand, categorize and quantify relationships in a social network.
Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu
Why should you care about SNA? Traditional marketing practices are becoming obsolete.
• Test and control group methodologies no longer work as intended.
• Information exchange between individuals within an online social network is extremely high.
• Difficult to keep control group “pure”.
• Need to understand behaviour across and within communities rather than focusing just on individuals.
• Leverage (and protect against) high velocity of information exchange within on-line social networks.
Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu
Why should you care about SNA?
Customers are sceptical: if you want to sell your products to your customers, convince their friends.
Use social network analysis to understand more about your customers and their communities.
Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu
Customer with the Role of an Influencer
• Influential user adopts a product or behaviour.
• Influential user tells (and influences) his or her immediate contacts within the community.
• These immediate contacts tell their contacts.
It is important...
• To identify these people.
• To influence these people.
• To monitor the behaviour of these people.
Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu
Social Network Analysis Application
Source: http://www.martingrandjean.ch/connected-world-air-traffic-network/
Social Network Analysis Application
https://linkurious.icij.org/widget/4c040978
Social Network Analysis Application
Source: Apichart Wisitkitchakarn (2013), Risk Analysis of East Asian Stock Markets, The Capital Market Research Institute, The Stock Exchange of Thailand.
Social Network Analysis Application
Source: Valdis Krebs (2001). Connecting the Dots. Tracking Two Identified Terrorists. http://orgnet.com/tnet.html
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Social Network Analysis (SNA) including a tutorial on concepts and methods
Social Media – Dr. Giorgos Cheliotis ([email protected])
Communications and New Media, National University of Singapore
Practical applications
Newman et al, 2006
A very early example of network analysis comes from the city of Königsberg (now Kaliningrad). The famous mathematician Leonhard Euler used a graph to prove that there is no path that crosses each of the city’s bridges exactly once (Newman et al, 2006).
SNA has its origins in both social science and in the broader fields of network analysis and graph theory
Network analysis concerns itself with the formulation and solution of problems that have a network structure; such structure is usually captured in a graph (see the circled structure to the right)
Graph theory provides a set of abstract concepts and methods for the analysis of graphs. These, in combination with other analytical tools and with methods developed specifically for the visualization and analysis of social (and other) networks, form the basis of what we call SNA methods.
But SNA is not just a methodology; it is a unique perspective on how society functions. Instead of focusing on individuals and their attributes, or on macroscopic social structures, it centers on relations between individuals, groups, or social institutions
Basic Concepts
} Networks: How to represent various social networks
} Tie Strength: How to identify strong/weak ties in the network
} Key Players: How to identify key/central nodes in network
} Cohesion: Measures of overall network structure
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Representing relations as networks
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
[Figure: graph with vertices 1, 2, 3, 4 labelled Anne, Jim, Mary, John]
Can we study their interactions as a network?
Communication
Anne: Jim, tell the Murrays they’re invited
Jim: Mary, you and your dad should come for dinner!
Jim: Mr. Murray, you should both come for dinner
Anne: Mary, did Jim tell you about the dinner? You must come.
John: Mary, are you hungry?
…
Network terminology
[Figure: graph with vertices 1, 2, 3, 4, annotated "Vertex" and "Edge"]
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Points      Lines             Field
vertices    edges, arcs       math
nodes       links             computer science
sites       bonds             physics
actors      ties, relations   sociology
Entering data on a directed graph
[Figure: Graph (directed) with vertices 1, 2, 3, 4]
Vertex Vertex
1 2
1 3
2 3
2 4
3 4
Edge list
Vertex 1 2 3 4
1 - 1 1 0
2 0 - 1 1
3 0 0 - 0
4 0 0 1 -
Adjacency matrix
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Representing an undirected graph
Vertex Vertex
1 2
1 3
2 3
2 4
3 4
Edge list remains the same
Vertex 1 2 3 4
1 - 1 1 0
2 1 - 1 1
3 1 1 - 1
4 0 1 1 -
Adjacency matrix becomes symmetric
[Figure: Graph (undirected) with vertices 1, 2, 3, 4]
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
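The step from edge list to symmetric adjacency matrix can be sketched in R with igraph, the package used later in this deck. This is only an illustration under the assumption that the five edges listed on the slide are all undirected; the variable names are made up for the example.

```r
library(igraph)

# Edge list from the slide: one row per edge (vertex, vertex)
edge_list <- matrix(c(1, 2,
                      1, 3,
                      2, 3,
                      2, 4,
                      3, 4), ncol = 2, byrow = TRUE)

# Treat every edge as undirected
g_undirected <- graph_from_edgelist(edge_list, directed = FALSE)

# Dense 0/1 adjacency matrix; the diagonal is 0 because there are no self-loops
adj_undirected <- as_adjacency_matrix(g_undirected, sparse = FALSE)
print(adj_undirected)
```

Apart from printing 0 on the diagonal where the slide shows "-", the printed matrix should match the symmetric matrix on the slide.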
Basic Concepts
Networks: How to represent various social networks
} Tie Strength: How to identify strong/weak ties in the network
Key Players: How to identify key/central nodes in network
Cohesion: Measures of overall network structure
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Adding weights to edges
Vertex Vertex Weight
1 2 30
1 3 5
2 3 22
2 4 2
3 4 37
Edge list: add column of weights
Vertex 1 2 3 4
1 - 30 5 0
2 30 - 22 2
3 5 22 - 37
4 0 2 37 -
Adjacency matrix: add weights instead of 1
[Figure: undirected graph with vertices 1, 2, 3, 4 and edge weights 30, 5, 22, 2, 37]
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
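A weighted edge list like the one above might be loaded into igraph as follows. This is a minimal sketch: the column names `from`, `to` and `weight` are choices made for this example, and igraph treats a column called `weight` as the edge weight.

```r
library(igraph)

# Weighted edge list from the slide (vertex, vertex, weight)
weighted_edges <- data.frame(
  from   = c(1, 1, 2, 2, 3),
  to     = c(2, 3, 3, 4, 4),
  weight = c(30, 5, 22, 2, 37)
)

# Extra columns of the data frame become edge attributes
g_weighted <- graph_from_data_frame(weighted_edges, directed = FALSE)

# Adjacency matrix holding the weights instead of 1
adj_weighted <- as_adjacency_matrix(g_weighted, attr = "weight", sparse = FALSE)
print(adj_weighted)

# Strength = sum of the weights of the edges attached to each vertex
node_strength <- strength(g_weighted)
print(node_strength)
```

Strength is not mentioned on the slide; it is included here only as one possible weighted counterpart of degree.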
Adding weights to edges
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
[Figure: the communication example (Anne, Jim, Mary, John as vertices 1, 2, 3, 4) redrawn as an undirected graph with weights added to the edges]
Edge weights as relationship strength
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
} Edges can represent interactions, flows of information or goods, similarities/affiliations, or social relations
} Specifically for social relations, a ‘proxy’ for the strength of a tie can be:
(a) the frequency of interaction (communication) or the amount of flow (exchange)
(b) reciprocity in interaction or flow
(c) the type of interaction or flow between the two parties (e.g., intimate or not)
(d) other attributes of the nodes or ties (e.g., kin relationships)
(e) the structure of the nodes’ neighborhood (e.g. many mutual ‘friends’)
} Surveys and interviews allow us to establish the existence of mutual or one-sided strength/affection with greater certainty, but the proxies above are also useful
Basic Concepts
Networks: How to represent various social networks
Tie Strength: How to identify strong/weak ties in the network
} Key Players: How to identify key/central nodes in network
Cohesion: Measures of overall network structure
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Interpretation of measures
Centrality measure: Interpretation in social networks
} Degree: How many people can this person reach directly?
} Betweenness: How likely is this person to be the most direct route between two people in the network?
} Closeness: How fast can this person reach everyone in the network?
} Eigenvector: How well is this person connected to other well-connected people?
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Degree centrality
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
} A node’s (in-) or (out-)degree is the number of links that lead into or out of the node
} In an undirected graph they are of course identical
} Often used as measure of a node’s degree of connectedness and hence also influence and/or popularity
} Useful in assessing which nodes are central with respect to spreading information and influencing others in their immediate ‘neighborhood’
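[Figure: hypothetical graph with 7 nodes, annotated with degree values: node 1 = 2, node 2 = 3, node 3 = 4, node 4 = 1, node 5 = 4, node 6 = 1, node 7 = 1]
Nodes 3 and 5 have the highest degree (4)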
Betweenness centrality
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
} For a given node v, calculate the number of shortest paths between nodes i and j that pass through v, and divide by all shortest paths between nodes i and j
} Sum the above values for all node pairs i,j
} Sometimes normalized such that the highest value is 1 or that the sum of all betweenness centralities in the network is 1
} Shows which nodes are more likely to be in communication paths between other nodes
} Also useful in determining points where the network would break apart (think of who would be cut off if node 3 or node 5 disappeared)
[Figure: the hypothetical graph annotated with betweenness values: node 1 = 0, node 2 = 1.5, node 3 = 6.5, node 4 = 0, node 5 = 9, node 6 = 0, node 7 = 0]
Node 5 has higher betweenness centrality than 3
Closeness centrality
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
} Calculate the mean length of all shortest paths from a node to all other nodes in the network (i.e. how many hops on average it takes to reach every other node)
} Take the reciprocal of the above value so that higher values are ‘better’ (indicate higher closeness) like in other measures of centrality
} It is a measure of reach, i.e. the speed with which information can reach other nodes from a given starting node
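[Figure: the hypothetical graph annotated with closeness values: node 1 = 0.5, node 2 = 0.67, node 3 = 0.75, node 4 = 0.46, node 5 = 0.75, node 6 = 0.46, node 7 = 0.46]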
Nodes 3 and 5 have the highest (i.e. best) closeness, while node 2 fares almost as well
Note: Sometimes closeness is calculated without taking the reciprocal of the mean shortest path length. Then lower values are ‘better’.
Eigenvector centrality
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
} A node’s eigenvector centrality is proportional to the sum of the eigenvector centralities of all nodes directly connected to it
} In other words, a node with a high eigenvector centrality is connected to other nodes with high eigenvector centrality
} This is similar to how Google ranks web pages: links from highly linked-to pages count more
} Useful in determining who is connected to the most connected nodes
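[Figure: the hypothetical graph annotated with eigenvector centrality values: node 1 = 0.36, node 2 = 0.49, node 3 = 0.54, node 4 = 0.19, node 5 = 0.49, node 6 = 0.17, node 7 = 0.17]
Node 3 has the highest eigenvector centrality, closely followed by 2 and 5
Note: The term ‘eigenvector’ comes from mathematics (matrix algebra), but it is not necessary for understanding how to interpret this measure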
Network centralities
[Figure: the hypothetical 7-node graph shown in four panels, annotated with degree centrality, betweenness centrality, closeness centrality and eigenvector centrality]
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
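The four measures can be computed with igraph as in the sketch below. The original figure is not available in this text, so the edge set is an assumption: it was reconstructed so that it reproduces the degree, betweenness and closeness values quoted on the slides. Rescaling the eigenvector scores to unit length is likewise a guess at how the slide values were normalized.

```r
library(igraph)

# Assumed edge set for the hypothetical 7-node graph (reconstructed, not from the source)
g <- make_graph(edges = c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7),
                directed = FALSE)

deg <- degree(g)                        # number of direct contacts
btw <- betweenness(g)                   # share of shortest paths passing through a node
clo <- closeness(g, normalized = TRUE)  # reciprocal of the mean shortest-path length
eig <- eigen_centrality(g)$vector       # igraph scales the largest score to 1

# Rescale to unit Euclidean length (assumed normalization of the slide values)
eig_unit <- eig / sqrt(sum(eig^2))

centralities <- data.frame(node = 1:7, degree = deg, betweenness = btw,
                           closeness = clo, eigenvector = eig_unit)
print(round(centralities, 2))
```

With `normalized = TRUE`, closeness is (n - 1) divided by the sum of distances, which is the reciprocal-of-the-mean definition used on the closeness slide.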
Closeness centrality
Source: www.activenetwork.net
Betweenness centrality
Source: www.activenetwork.net
Eigenvector centrality
Source: www.activenetwork.net
Interpretation of measures
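Centrality: Description / Value meaning
Degree
- Description: indicates the level of information flow through the node.
- Value meaning: low value, little information flows through; high value, much information flows through.
Closeness
- Description: indicates how central the node is within the network; used to measure the speed at which information spreads from the node through the network.
- Value meaning: low value, more important; high value, less important. Low value, spreads fast; high value, spreads slowly. (This reading corresponds to closeness computed without taking the reciprocal of the mean shortest path length.)
Betweenness
- Description: indicates the node's level of importance or role within the network.
- Value meaning: low value, less important; high value, more important.
Eigenvector
- Description: measures the node's level of influence within the network.
- Value meaning: low value, little influence; high value, much influence.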
Basic Concepts
Networks: How to represent various social networks
Tie Strength: How to identify strong/weak ties in the network
Key Players: How to identify key/central nodes in network
} Cohesion: How to characterize a network’s structure
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Reciprocity (degree of)
[Figure: directed graph with vertices 1, 2, 3, 4]
} The ratio of the number of relations which are reciprocated (i.e. there is an edge in both directions) over the total number of relations in the network
} …where two vertices are said to be related if there is at least one edge between them
} In the example to the right this would be 2/5=0.4 (whether this is considered high or low depends on the context)
} A useful indicator of the degree of mutuality and reciprocal exchange in a network, which relate to social cohesion
} Only makes sense in directed graphs
Reciprocity for network = 0.4
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
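The ratio described above can be checked with igraph's `reciprocity()`. The slide's figure is not recoverable from the text, so the graph below is a hypothetical one with five related pairs of vertices, two of which are reciprocated. Note that igraph's default mode uses a different definition (the share of edges that are reciprocated); `mode = "ratio"` is the one that matches the slide.

```r
library(igraph)

# Hypothetical directed graph (not the slide's figure):
# pairs (1,2) and (3,4) are mutual; (1,3), (2,3), (2,4) are one-way
g_dir <- make_graph(edges = c(1,2, 2,1, 3,4, 4,3, 1,3, 2,3, 2,4),
                    directed = TRUE)

# Reciprocated pairs over all related pairs: 2 / 5
recip_ratio <- reciprocity(g_dir, mode = "ratio")

# igraph's default: reciprocated edges over all edges: 4 / 7
recip_default <- reciprocity(g_dir)

print(c(ratio = recip_ratio, default = recip_default))
```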
Density
[Figure: example network with vertices 1, 2, 3, 4]
} A network’s density is the ratio of the number of edges in the network over the total number of possible edges between all pairs of nodes (which is n(n-1)/2, where n is the number of vertices, for an undirected graph)
} In the example network to the right density=5/6=0.83 (i.e. it is a fairly dense network; opposite would be a sparse network)
} It is a common measure of how well connected a network is (in other words, how closely knit it is) – a perfectly connected network is called a clique and has density=1
} A directed graph will have half the density of its undirected equivalent, because there are twice as many possible edges, i.e. n(n-1)
} Density is useful in comparing networks against each other, or in doing the same for different regions within a single network
[Figure: the same four vertices as an undirected graph (density = 5/6 = 0.83) and as a directed graph (density = 5/12 = 0.42). Legend: edge present in network; edge possible but not present]
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
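Both density values can be reproduced with `edge_density()`. The sketch assumes the example network is the four-vertex, five-edge graph used on the earlier edge-list slides.

```r
library(igraph)

# Assumed example network: four vertices, five edges
edges <- c(1,2, 1,3, 2,3, 2,4, 3,4)

g_und <- make_graph(edges = edges, directed = FALSE)
g_dir <- make_graph(edges = edges, directed = TRUE)

# Undirected: 5 edges out of n(n-1)/2 = 6 possible
density_und <- edge_density(g_und)

# Directed: 5 edges out of n(n-1) = 12 possible
density_dir <- edge_density(g_dir)

print(c(undirected = density_und, directed = density_dir))
```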
Clustering
[Figure: the hypothetical 7-node graph annotated with node clustering coefficients: node 1 = 1, node 2 = 0.67, node 3 = 0.33, node 4 = N/a, node 5 = 0.17, node 6 = N/a, node 7 = N/a; the nodes are grouped into Cluster A and Cluster B]
} A node’s clustering coefficient is the number of closed triplets in the node’s neighborhood over the total number of triplets in the neighborhood. It is also known as transitivity.
$\text{clustering coefficient} = \frac{CT}{TT}$, where $CT$ is the number of closed triplets and $TT$ is the total number of triplets
Network clustering coefficient = 0.375 (3 nodes in each triangle x 2 triangles = 6 closed triplets divided by 16 total)
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
} E.g., node 1 in the figure has a value of 1 because it is only connected to 2 and 3, and these nodes are also connected to one another (i.e. the only triplet in the neighborhood of 1 is closed). We say that nodes 1, 2, and 3 form a clique.
$\text{clustering coefficient (node 1)} = \frac{1}{1} = 1$
$K_n$ = degree of node $n$
$N_n$ = number of links between neighbors of node $n$
$CC_n$ = clustering coefficient of node $n$
$CC_n = \frac{2 N_n}{K_n (K_n - 1)}$
$CC(\text{node 2}) = \frac{2(2)}{3(3 - 1)} = \frac{4}{6} = 0.67$
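The node-level formula and the network-level value can be cross-checked with igraph's `transitivity()`. As in the other sketches on this graph, the edge set is an assumption reconstructed from the values quoted on the slides.

```r
library(igraph)

# Assumed edge set for the hypothetical 7-node graph (reconstructed, not from the source)
g <- make_graph(edges = c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7),
                directed = FALSE)

# Per-node clustering coefficient; NaN where a node has fewer than two neighbors
cc_local <- transitivity(g, type = "local")

# Network clustering coefficient: closed triplets over all triplets
cc_global <- transitivity(g, type = "global")

# Manual check of CC_n = 2 * N_n / (K_n * (K_n - 1)) for node 2
nbrs <- neighbors(g, 2)
k_n  <- length(nbrs)                      # degree of node 2
n_n  <- ecount(induced_subgraph(g, nbrs)) # links among the neighbors of node 2
cc_node2 <- 2 * n_n / (k_n * (k_n - 1))

print(round(cc_local, 2))
print(cc_global)
print(cc_node2)
```

The NaN entries correspond to the nodes marked N/a on the slide.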
[Figure: the hypothetical 7-node graph annotated with betweenness values and split into two clusters, Cluster A and Cluster B]
} Clustering algorithms identify clusters or ‘communities’ within networks based on network structure and specific clustering criteria
} Hierarchical clustering
} Similarity based clustering
} Betweenness clustering (the example in the figure, with two clusters, is based on edge betweenness, an equivalent for edges of the betweenness centrality presented earlier for nodes)
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
Clustering
Average and longest distance
[Figure: the hypothetical 7-node graph with the diameter marked]
} The longest shortest path (distance) between any two nodes in a network is called the network’s diameter
} The diameter of the network on the right is 3; it is a useful measure of the reach of the network (as opposed to looking only at the total number of vertices or edges)
} It also indicates how long it will take at most to reach any node in the network (sparser networks will generally have greater diameters)
} The average of all shortest paths in a network is also interesting because it indicates how far apart any two nodes will be on average (average distance)
Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
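Diameter and average distance can be obtained directly from igraph. The sketch again uses an edge set reconstructed from the slide values, which is an assumption, not the author's original figure.

```r
library(igraph)

# Assumed edge set for the hypothetical 7-node graph (reconstructed, not from the source)
g <- make_graph(edges = c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7),
                directed = FALSE)

# Matrix of shortest-path lengths between every pair of nodes
dist_matrix <- distances(g)

# Longest shortest path in the network
net_diameter <- diameter(g)

# Mean shortest-path length over all pairs of distinct nodes
avg_distance <- mean_distance(g)

print(dist_matrix)
print(c(diameter = net_diameter, average_distance = avg_distance))
```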
Social Network Analysis with R & Visualizing Social Network
Visualizing Social Network
Network Analysis and Visualization with R and igraph
Katherine Ognyanova, www.kateto.net
Assistant Professor at the School of Communication and Information at Rutgers University.
NetSciX 2016 School of Code Workshop, Wroclaw, Poland
Basic SNA with R and igraph
} Networks
    } Create network
    } Edge, Vertex and Attributes
    } Read network data from files
    } Turning networks into igraph objects
} Tie Strength
} Key Players
} Cohesion
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Create Network
› library(igraph)
› # Undirected graph with three vertices and three edges
› g1 <- make_graph( edges=c(1,2, 2,3, 3,1), n=3, directed=FALSE )
› plot(g1)
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Create Network
› # n=10 adds seven isolated vertices; the graph is directed by default
› g2 <- make_graph( edges=c(1,2, 2,3, 3,1), n=10 )
› plot(g2)
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Create Network
› # With named vertices the number of nodes does not need to be given
› g3 <- make_graph( c("John", "Jim", "Jim", "Jill", "Jill", "John") )
› plot(g3)
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Create Network
› # Named vertices plus a list of isolates (vertices without edges)
› g4 <- make_graph( c("John", "Jim", "Jim", "Jack", "Jim", "Jack", "John", "John"), isolates=c("Jesse", "Janis", "Jennifer", "Justin") )
› plot(g4, edge.arrow.size=.5, vertex.color="gold", vertex.size=15, vertex.frame.color="gray", vertex.label.color="black", vertex.label.cex=0.8, vertex.label.dist=2, edge.curved=0.2)
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Create Network
› plot(graph_from_literal(a---b, b---c))
› plot(graph_from_literal(a--+b, b+--c))
› plot(graph_from_literal(a+-+b, b+-+c))
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Edge, Vertex and Attributes
› # The edges of the object
› E(g4)
› # The vertices of the object
› V(g4)
› # The network matrix
› g4[]
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Edge, Vertex and Attributes
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
[Figure: igraph object g4 with vertex attributes (vertex_attr: name, gender) and edge attributes (edge_attr: type, weight)]
Read Network from files
3.1 DATASET 1: edgelist
› Dataset1-Media-Example-NODES.csv
› Dataset1-Media-Example-EDGES.csv
3.2 DATASET 2: matrix
› Dataset2-Media-User-Example-NODES.csv
› Dataset2-Media-User-Example-EDGES.csv
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Turning networks into igraph objects
We start by converting the raw data to an igraph network object. Here we use igraph’s graph.data.frame function (named graph_from_data_frame in current igraph releases), which takes two data frames: d and vertices.
d describes the edges of the network. Its first two columns are the IDs of the source and the target node for each edge. The following columns are edge attributes (weight, type, label, or anything else).
vertices starts with a column of node IDs. Any following columns are interpreted as node attributes.
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
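A small sketch of that conversion follows. The two data frames are hypothetical stand-ins for the NODES and EDGES files, whose actual columns are not shown in this text; only the roles of `d` and `vertices` are taken from the description above.

```r
library(igraph)

# Hypothetical node table: first column is the node ID, the rest are node attributes
nodes <- data.frame(
  id    = c("s01", "s02", "s03"),
  media = c("NY Times", "Washington Post", "Wall Street Journal"),
  stringsAsFactors = FALSE
)

# Hypothetical edge table: first two columns are source and target IDs,
# the remaining columns become edge attributes
links <- data.frame(
  from   = c("s01", "s01", "s02"),
  to     = c("s02", "s03", "s03"),
  weight = c(10, 12, 5),
  type   = c("hyperlink", "mention", "hyperlink"),
  stringsAsFactors = FALSE
)

net <- graph_from_data_frame(d = links, vertices = nodes, directed = TRUE)

print(V(net)$media)   # node attribute taken from the nodes table
print(E(net)$weight)  # edge attribute taken from the links table
```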
Basic SNA with R and igraph
} Networks
} Tie Strength
    } Plotting networks with igraph
    } Assign weight
    } Assign type of Tie
    } Deleted Tie
} Key Players
} Cohesion
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Plotting networks with igraph
We can set the node & edge options in two ways:
› Specify them in the plot() call
› Set attributes and add them to the igraph object
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Basic SNA with R and igraph
} Networks
} Tie Strength
} Key Players
    } Network centrality with igraph
    } Degree
    } Closeness
    } Betweenness
    } Eigenvector
} Cohesion
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Network centrality with igraph
› Degree centrality
› Closeness centrality
› Betweenness centrality
› Eigenvector centrality
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Network centrality with igraph
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Basic SNA with R and igraph
} Networks
} Tie Strength
} Key Players
} Cohesion
    } Density and Reciprocity
    } Clustering
    } Averages & longest distance
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Transitivity with igraph
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Community Clustering based on edge betweenness
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Averages & longest distance
Source: Asst. Prof Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorialthe NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
Using SNA to predict Subscriber churn
in Telecommunication industry
Customer Churn
What is Customer Churn?
“Churn represents the loss of an existing customer to a competitor”
A prevalent problem in:
› Telecommunication services
› Home mortgage refinance
› Credit card
Churn is especially important to mobile phone service providers
› Easy for a subscriber to switch services.
› Mobile Number Portability (MNP) will remove the last important obstacle
Core CRM
The CRM core objectives include:
– Customer Acquisition
– Customer Retention
    › Churn Prediction
– Cross-sell/Up-sell
– Maximizing Lifetime Customer Value
Core CRM in Telecommunication
§ Segmentation
§ Predictive Analytics
§ Customer Acquisition
    § Costs of Customer Acquisition (COCA)
§ Servicing
    § QoS (Network)
    § Call Center, Service Hall/Center (Net Promoter Score)
§ Customer Retention
    § Churn Prediction
§ Customer Lifetime Value (CLV)
Segmentation
Most telcos define their customer segments using some of the following:
› Payment type (prepaid vs. postpaid)
› ARPU (Average Revenue Per User: revenue generated)
› Tenure (age of user: AOU)
› Demographics (location, income, job, gender, age, etc.)
› Usage: voice, data, other non-voice, roaming
› Handsets/Devices: 2G, 3G, 4G device, smartphone vs. feature phone
› Package: package, price plans
Predictive Analytics
Process current and historical data in order to make predictions about future events.
› Making customer decisions.
› Next Best Offer
› Package & Price plan
› Cross-sell & Up-sell opportunities
› Credit scoring for setting dynamic limits (risk management)
› Fraud detection (postpaid only)
› Revenue Allocation
› Customer Lifetime Value
Servicing
› Quality of Service
– Network Utilization
– Drop Call
› Call Center, Service Hall
– The number of times that a customer makes contact (complains) via Call Center/Service Hall
– Service Scoring (Net Promoter Score)
Customer Retention
› Type of Churn
– Voluntary Churn
– Involuntary Churn
› Type of Customer Retention
– Reactive
– Proactive
Source: Mohammed Akram Ayyubi Define - The Churn Reasons Taxonomy
The Churn Taxonomy
Data Source
Customer Demographic:
• Zip code
• Income
• Occupation
• Age
• Gender
• Living Address
• Occupation Address
Order:
• Customer Type (Corp/SME/Indy)
• Payment Type (Pre/Post)
• Current Package
• Package Plan
• ARPU
• Additional product/service
Customer Relation:
• Number of Questions about the services, e.g. from IVR
• Number of Visits to retail shops or online website
• Number of Complaints solved
• Number of total complaints
Service Usage (CDR):
• Number of calls
• Volume of Data usage
• Number of Outgoing calls
• Number of Incoming calls
• Number of Roaming calls
• Number of International calls
• Total minutes of usage (MOU)/Volume
• Number of Drop calls
Billing Data:
• Total amount of bill
• Total number of barred (one-way barred)
• Total number of full barred (two-way barred)
Network:
• Cell Site Location
• Network Type (4G/3G/2G)
• Network Utilization
• QoS
Data Mining Functionalities, Techniques & CRM Applications
Churn Modeling Techniques
Subscribers: 25M
Prepaid Subs: 4M
Postpaid Subs: 21M
Prepaid ARPU: 550 THB
Postpaid ARPU: 160 THB
Source: NBTC (http://ttid.nbtc.go.th/mobile_db.html)
Study population: monthly-plan (postpaid) individual subscribers of one mobile network operator who had the status of having cancelled the service between July and August 2015 (B.E. 2558).
July to August
Subscribers: 1.5 M
Churners: 8.2 K
1. Subscriber Centrality
– X1: Subscriber Centrality Degree
– X2: Subscriber Centrality Closeness
– X3: Subscriber Centrality Betweenness
– X4: Subscriber Centrality Eigenvector
2. Number of Neighbors
– X5: Number of Churn Neighbors
– X6: Number of Non-Churn Neighbors
3. Churn Neighbors Centrality
– X7: Churn Neighbors Centrality Degree
– X8: Churn Neighbors Centrality Closeness
– X9: Churn Neighbors Centrality Betweenness
– X10: Churn Neighbors Centrality Eigenvector
4. Non-Churn Neighbors Centrality
– X11: Non-Churn Neighbors Centrality Degree
– X12: Non-Churn Neighbors Centrality Closeness
– X13: Non-Churn Neighbors Centrality Betweenness
– X14: Non-Churn Neighbors Centrality Eigenvector
[Figure: conceptual framework relating Subscriber Centrality, Number of Neighbors, Churn Neighbors Centrality, and Non-Churn Neighbors Centrality to Subscriber Churn Status]
[Figure: six-step analysis workflow covering CDR and churner data, K-means clustering (Cluster 1, Cluster 2, Cluster 3), cluster selection, adjacency matrix, centrality calculation, and churn prediction. Timeline labels: June, July, August.]
Summarize the call minutes (Call Duration) and the call counts (Number of Calls) into one row per caller (MO) and recipient (MT) pair.
This yields 1,747,835 relationships (links).

MO   MT   Call Duration   Number of Calls
A    B    3               1
A    C    2               4
A    D    16              2
B    D    23              9
E    D    1               1
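The summarization step can be sketched as below. This is a minimal illustration, not the authors' implementation: the raw call rows and the `summarize_links` helper are hypothetical, chosen so that the aggregated output reproduces three of the example rows in the table above.

```python
from collections import defaultdict

# Hypothetical raw CDR rows: (caller MO, recipient MT, minutes of one call).
# These values are illustrative only; they are not taken from the real data set.
raw_cdr = [
    ("A", "B", 3.0),
    ("A", "C", 0.5), ("A", "C", 0.5), ("A", "C", 0.5), ("A", "C", 0.5),
    ("A", "D", 10.0), ("A", "D", 6.0),
    ("E", "D", 1.0),
]

def summarize_links(rows):
    """Aggregate call records into one link per (MO, MT) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [call duration, number of calls]
    for mo, mt, minutes in rows:
        totals[(mo, mt)][0] += minutes
        totals[(mo, mt)][1] += 1
    return {pair: (duration, calls) for pair, (duration, calls) in totals.items()}

links = summarize_links(raw_cdr)
for (mo, mt), (duration, calls) in sorted(links.items()):
    print(mo, mt, duration, calls)
```

Each key of `links` corresponds to one relationship (link) in the sense used on the slide.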
Segment the links with K-means clustering, using Call Duration and Number of Calls as the clustering variables and setting k = 3. The resulting clusters are shown in the figure.
[Figure legend: Blue: Cluster 1; Yellow: Cluster 2; Red: Cluster 3]
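A minimal sketch of K-means on (Call Duration, Number of Calls) pairs follows. Several things here are assumptions rather than details from the slides: the sample points are hypothetical, the features are left unscaled, and the farthest-first initialisation is used only to make the example deterministic.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point that is farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical links as (call duration, number of calls).
points = [
    (2.0, 2.0), (3.0, 1.0), (1.0, 1.0), (16.0, 2.0),   # short and infrequent
    (210.0, 100.0), (220.0, 105.0),                    # moderate duration, frequent
    (1990.0, 340.0), (2005.0, 355.0),                  # very long duration
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

Because the features are unscaled in this sketch, Call Duration dominates the distance; whether the original analysis standardised the variables is not stated.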
Why choose K = 3?
1. Consider the number of members in each cluster as the number of clusters increases.

Number of transactions in each cluster group, by number of clusters:
Cluster   K=2         K=3         K=4         K=5         K=6
1         1,737,166   1,725,216   1,705,381   1,675,572   1,638,302
2         10,669      21,801      38,463      61,628      88,167
3         -           818         3,831       9,388       16,924
4         -           -           160         1,160       3,701
5         -           -           -           87          664
6         -           -           -           -           77

2. Consider the ratio of inter-cluster distances to intra-cluster distances as the number of clusters increases.
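The slides do not define how the ratio of inter-cluster to intra-cluster distances was computed. The sketch below shows one possible definition, stated here as an assumption: the mean distance between cluster centroids divided by the mean distance from each point to its own centroid. The function names and the toy points are hypothetical.

```python
import math
from itertools import combinations

def centroid(members):
    """Mean position of a list of points."""
    return tuple(sum(v) / len(members) for v in zip(*members))

def inter_intra_ratio(points, labels):
    """Mean centroid-to-centroid distance over mean point-to-own-centroid distance.

    Requires at least two clusters and at least one point off its centroid.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: centroid(m) for lab, m in clusters.items()}
    intra = sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = list(combinations(centroids.values(), 2))
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return inter / intra

# Toy example: two tight groups that are far apart.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_split = inter_intra_ratio(toy_points, [0, 0, 1, 1])  # groups match the geometry
poor_split = inter_intra_ratio(toy_points, [0, 1, 0, 1])  # groups cut across it
print(good_split, poor_split)
```

Under this definition a larger ratio indicates better-separated, more compact clusters, so the ratio can be compared across candidate values of K.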
[Figure legend: Blue: Cluster 1; Yellow: Cluster 2; Red: Cluster 3]

Cluster     Median of Call Duration   Median of Number of Calls   Group Description
Cluster#1   2.13                      2.00                        Few calls, short calls
Cluster#2   1,998.03                  348.50                      Few calls, long calls
Cluster#3   215.97                    101.00                      Frequent calls, short calls

Merging Cluster#2 and Cluster#3 gives the cluster shown in the figure.
[Figure legend: Yellow: Cluster 2; Red: Cluster 3]
Prepaid Customer Heavy Voice Call on Net
Adjacency matrix:
    A   B   C   D   E
A   0   1   1   1   0
B   1   0   0   1   0
C   1   0   0   0   0
D   1   1   0   0   1
E   0   0   0   1   0

Edge list:
MO   MT
A    B
A    C
A    D
B    D
E    D

[Figure: social network graph drawn from the edge list]
[Figure legend: Churner; Non-churner]
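The conversion from the edge list to the adjacency matrix shown above can be sketched as follows. The `adjacency_matrix` helper is a hypothetical name; the matrix is built as symmetric (call direction ignored), which is how the example matrix on the slide reads.

```python
# Edge list from the slide: one (MO, MT) pair per link.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("E", "D")]

def adjacency_matrix(edge_list):
    """Build a symmetric 0/1 adjacency matrix from (MO, MT) pairs."""
    nodes = sorted({n for edge in edge_list for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for mo, mt in edge_list:
        # Mark both directions so the relationship is undirected.
        matrix[index[mo]][index[mt]] = 1
        matrix[index[mt]][index[mo]] = 1
    return nodes, matrix

nodes, matrix = adjacency_matrix(edges)
print("  " + " ".join(nodes))
for name, row in zip(nodes, matrix):
    print(name, " ".join(str(v) for v in row))
```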
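Take the adjacency matrix from step 4 and compute four centrality values for each subscriber, based on graph theory, as follows:

Centrality: Degree
– Description: indicates how much information flows through the node.
– Value meaning: a low value means little information flows through; a high value means much information flows through.
Centrality: Closeness
– Description: indicates how central the node is within the network; used to measure how quickly information spreads from the node through the network.
– Value meaning: a low value means very important and fast spreading; a high value means less important and slow spreading.
Centrality: Betweenness
– Description: indicates the node's level of importance or role within the network.
– Value meaning: a low value means less important; a high value means more important.
Centrality: Eigenvector
– Description: measures the node's level of influence within the network.
– Value meaning: a low value means little influence; a high value means much influence.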
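The four centrality measures can be sketched on the five-node example graph (A to E) as below. This is an illustrative standard-library implementation, not the tooling used in the study. It assumes an undirected, connected graph, unnormalised betweenness, and eigenvector scores scaled so the largest is 1. Closeness is computed here as (n - 1) divided by the total distance to all other nodes, so a higher score means more central; the slide's reading that a low value means important would correspond to the total distance itself.

```python
from collections import deque

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["A", "B", "E"],
    "E": ["D"],
}

def bfs(graph, source):
    """Return (distance, number of shortest paths) from source to every node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]
    return dist, paths

def degree(graph):
    return {n: len(neighbors) for n, neighbors in graph.items()}

def closeness(graph):
    scores = {}
    for n in graph:
        dist, _ = bfs(graph, n)
        scores[n] = (len(graph) - 1) / sum(dist.values())
    return scores

def betweenness(graph):
    """Sum, over node pairs, of the share of shortest paths passing through each node."""
    info = {n: bfs(graph, n) for n in graph}
    nodes = sorted(graph)
    scores = {n: 0.0 for n in nodes}
    for i, s in enumerate(nodes):
        dist_s, paths_s = info[s]
        for t in nodes[i + 1:]:
            for v in nodes:
                if v in (s, t):
                    continue
                dist_v, paths_v = info[v]
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    scores[v] += paths_s[v] * paths_v[t] / paths_s[t]
    return scores

def eigenvector(graph, iterations=200):
    """Power iteration on the adjacency structure, rescaled to a maximum of 1."""
    scores = {n: 1.0 for n in graph}
    for _ in range(iterations):
        new = {n: sum(scores[m] for m in graph[n]) for n in graph}
        top = max(new.values())
        scores = {n: value / top for n, value in new.items()}
    return scores

deg, clo, btw, eig = degree(graph), closeness(graph), betweenness(graph), eigenvector(graph)
for n in sorted(graph):
    print(n, deg[n], round(clo[n], 4), btw[n], round(eig[n], 4))
```

In this toy graph A and D play symmetric roles, so they tie on every measure, while the leaf nodes C and E score lowest.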
5.1 Compute all Subscriber Centrality values, giving part 1.
5.2 Then compute the Neighbors Centrality values for every subscriber, giving parts 2, 3, 4 and 5 respectively.
[Figure: starting from the adjacency matrix of nodes A to E, Step 5.1 produces (1) Subscriber Centrality, and Step 5.2 (Neighbors Centrality) produces (2) Number of Neighbors, (4) Churn Neighbors Centrality, and (5) Non-Churn Neighbors Centrality]
Compute the centrality values to create a total of 14 independent variables:
1. Subscriber Centrality
– X1: Subscriber Centrality Degree
– X2: Subscriber Centrality Closeness
– X3: Subscriber Centrality Betweenness
– X4: Subscriber Centrality Eigenvector
2. Number of Neighbors
– X5: Number of Churn Neighbors
– X6: Number of Non-Churn Neighbors
3. Churn Neighbors Centrality
– X7: Churn Neighbors Centrality Degree
– X8: Churn Neighbors Centrality Closeness
– X9: Churn Neighbors Centrality Betweenness
– X10: Churn Neighbors Centrality Eigenvector
4. Non-Churn Neighbors Centrality
– X11: Non-Churn Neighbors Centrality Degree
– X12: Non-Churn Neighbors Centrality Closeness
– X13: Non-Churn Neighbors Centrality Betweenness
– X14: Non-Churn Neighbors Centrality Eigenvector
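The neighbor-based variables can be sketched as below for the degree measure only. The slides do not say how a neighbor group's centrality is aggregated into one value per subscriber, so summing over neighbors is an assumption made for illustration; the churn labels and the `neighbor_features` helper are likewise hypothetical.

```python
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["A", "B", "E"],
    "E": ["D"],
}
churned = {"B", "E"}  # hypothetical churn labels, not from the source data
degree = {n: len(neighbors) for n, neighbors in graph.items()}

def neighbor_features(graph, churned, centrality):
    """Split each subscriber's neighbors by churn status and aggregate their centrality."""
    features = {}
    for node, neighbors in graph.items():
        churn_nb = [m for m in neighbors if m in churned]
        stay_nb = [m for m in neighbors if m not in churned]
        features[node] = {
            "X5_n_churn_neighbors": len(churn_nb),
            "X6_n_non_churn_neighbors": len(stay_nb),
            # Aggregation by sum is an assumption; a mean would be equally plausible.
            "X7_churn_neighbors_degree": sum(centrality[m] for m in churn_nb),
            "X11_non_churn_neighbors_degree": sum(centrality[m] for m in stay_nb),
        }
    return features

features = neighbor_features(graph, churned, degree)
for node in sorted(features):
    print(node, features[node])
```

Passing a closeness, betweenness, or eigenvector dictionary in place of `degree` would give the analogous variables (X8 to X10 and X12 to X14) under the same assumption.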
Principal Component Analysis
Run Principal Component Analysis on independent variables X3–X14, selecting 3 factors in total so that the meaning of each factor can be defined clearly.

Principal Component Analysis
Factor extraction from the PCA gives the following loadings:

Factor 1: Churner Neighbors Centrality
Variable                                          Loading
X8   Churn Neighbors Centrality Closeness         1
X5   Number of Churn Neighbors                    0.998
X9   Churn Neighbors Centrality Betweenness       0.701
X7   Churn Neighbors Centrality Degree            0.555

Factor 2: Non-Churner Neighbors Centrality
Variable                                          Loading
X6   Number of Non-Churn Neighbors                1.013
X12  Non-Churn Neighbors Centrality Closeness     1.013
X11  Non-Churn Neighbors Centrality Degree        0.596
X13  Non-Churn Neighbors Centrality Betweenness   0.315

Factor 3: Social Network Eigenvalue
Variable                                          Loading
X10  Churn Neighbors Centrality Eigenvector       1.048
X4   Subscriber Centrality Eigenvector            0.866
X14  Non-Churn Neighbors Centrality Eigenvector   0.407
Principal Component Analysis
Kept as individual variables:
– X1: Subscriber Centrality Degree
– X2: Subscriber Centrality Closeness
– X3: Subscriber Centrality Betweenness
PA1: Churn Neighbors Centrality
– X8: Churn Neighbors Centrality Closeness
– X5: Number of Churn Neighbors
– X9: Churn Neighbors Centrality Betweenness
– X7: Churn Neighbors Centrality Degree
PA2: Non-Churn Neighbors Centrality
– X6: Number of Non-Churn Neighbors
– X12: Non-Churn Neighbors Centrality Closeness
– X11: Non-Churn Neighbors Centrality Degree
– X13: Non-Churn Neighbors Centrality Betweenness
PA3: Social Network Eigenvalue
– X10: Churn Neighbors Centrality Eigenvector
– X4: Subscriber Centrality Eigenvector
– X14: Non-Churn Neighbors Centrality Eigenvector
Y: Subscriber Churn Status
In this step, build the churn prediction models and evaluate them.
[Figure: churn model evaluation workflow. The inputs X1: Subscriber Centrality Degree, X2: Subscriber Centrality Closeness, X3: Subscriber Centrality Betweenness, PC1: Churn Neighbors Centrality, PC2: Non-Churn Neighbors Centrality, and PC3: Social Network Eigenvalue feed the Churn Models; data is split into Train and Test; the Churn Prediction is evaluated with ROC]
Dependent Variable (DV), all models: Y: Subscriber Churn Status
Independent Variables (IV):
– Model#1: X1: Subscriber Centrality Degree; PC1: Churn Neighbors Centrality; PC2: Non-Churn Neighbors Centrality
– Model#2: X1: Subscriber Centrality Degree; X2: Subscriber Centrality Closeness; X3: Subscriber Centrality Betweenness
– Model#3: X1: Subscriber Centrality Degree; X2: Subscriber Centrality Closeness; PC3: Social Network Eigenvalue

                                      Model#1       Model#2       Model#3
Number of independent variables       3             3             3
Akaike information criterion (AIC)    22710         23546         19540
Pseudo R-Squared
llh                                   -11351.1129   -11769.0146   -9765.7565
llhNull                               -14840.6155   -14840.6155   -14840.6155
McFadden's pseudo r-squared           0.2351        0.2070        0.3420
Cragg and Uhler's pseudo r-squared    0.3687        0.3305        0.5009
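McFadden's pseudo r-squared and the AIC can be recomputed from the reported log-likelihoods, as sketched below. McFadden's measure is 1 - llh / llhNull. For the AIC, the parameter count of 4 (three independent variables plus an intercept) is an assumption, though it is consistent with the reported values. Cragg and Uhler's measure is omitted because it needs the training sample size, which the slides do not give.

```python
llh_null = -14840.6155
models = {
    "Model#1": -11351.1129,
    "Model#2": -11769.0146,
    "Model#3": -9765.7565,
}
n_params = 4  # assumption: 3 independent variables plus an intercept

def mcfadden_r2(llh, llh_null):
    """McFadden's pseudo r-squared: 1 minus the ratio of model to null log-likelihood."""
    return 1.0 - llh / llh_null

def aic(llh, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * llh

results = {
    name: (mcfadden_r2(llh, llh_null), aic(llh, n_params))
    for name, llh in models.items()
}
for name, (r2, criterion) in results.items():
    print(name, round(r2, 4), round(criterion))
```

Model#3 has both the highest pseudo r-squared and the lowest AIC of the three.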
Model evaluation

Metric                 Model#1            Model#2            Model#3
Accuracy               0.7247             0.7201             0.7270
95% CI                 (0.7155, 0.7338)   (0.7108, 0.7292)   (0.7178, 0.736)
Kappa                  0.4072             0.3969             0.4115
Sensitivity            0.9891             0.9864             0.9944
Specificity            0.3938             0.3867             0.3923
Pos Pred Value         0.6713             0.6681             0.6719
Neg Pred Value         0.9666             0.9579             0.9824
Prevalence             0.5559             0.5559             0.5559
Detection Rate         0.5498             0.5483             0.5527
Detection Prevalence   0.8191             0.8207             0.8226
Balanced Accuracy      0.6915             0.6866             0.6934
AUC                    0.8556             0.8511             0.8581
Confusion Matrix

Model#1        Reference 1   Reference 0
Predicted 1    1,622         56
Predicted 0    2,497         5,099

Model#2        Reference 1   Reference 0
Predicted 1    1,593         70
Predicted 0    2,526         5,085

Model#3        Reference 1   Reference 0
Predicted 1    1,616         29
Predicted 0    2,503         5,126
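The evaluation metrics follow directly from the confusion matrix counts. The sketch below recomputes them for Model#1 (counts 1,622; 56; 2,497; 5,099). The reported Sensitivity, Specificity, and Prevalence are only reproduced when class 0 is treated as the positive class, so that mapping is used here; it is inferred from the numbers rather than stated on the slides. The `confusion_metrics` helper is a hypothetical name.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from the four confusion matrix cells."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, used for Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Model#1 with class 0 as the positive class (inferred mapping):
#   tp = predicted 0, reference 0    fn = predicted 1, reference 0
#   fp = predicted 0, reference 1    tn = predicted 1, reference 1
model1 = confusion_metrics(tp=5099, fn=56, fp=2497, tn=1622)
for name, value in model1.items():
    print(name, round(value, 4))
```

Read this way, and if class 1 denotes churners, the Specificity of about 0.39 is the share of reference-1 cases that the model also predicts as 1.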
[Figure: ROC curves for Model#1 (AUC 0.8556), Model#2 (AUC 0.8511), and Model#3 (AUC 0.8581)]
[Figure: three network graph panels comparing actual and predicted status. Legend: Churner / Predicted Churner; Non-churner / Predicted Non-churner; Non-churner / Predicted Churner; Churner / Predicted Non-churner]