www.the-data-mine.co.uk Mining Social Networks Dr Andy Pryke Andy@the-data-mine.co.uk Commercial Programming Lecture October 2011
www.the-data-mine.co.uk

Mining Social Networks

Dr Andy Pryke

Andy@the-data-mine.co.uk

Commercial Programming Lecture
October 2011

Contents

What are Social Networks?
Why Analyse Them?
Analysis Techniques
Example Applications

Social Network Analysis

Also called Organizational Network Analysis.
Pre-dates data mining.
Developed by sociologists and anthropologists to formalise their understanding of family and community relationships.

What is a Network

Referred to technically as a "graph".
Each person (or organisation etc.) is represented as a node.
  Visually this is normally a dot or square.
Connections are called “links” or “edges”.
  Represented as a line.
  Indicates communications (e.g. emails), purchases, visits, or less tangible things such as emotional relationships.
  Can be “directed” or “undirected”,
  e.g. on Twitter, you follow Stephen Fry, but he doesn’t follow you!
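A minimal sketch of how such a graph might be held in code, assuming a plain adjacency-set representation. The user names and the `build_directed` / `build_undirected` helpers are hypothetical, chosen only to illustrate the directed versus undirected distinction.

```python
from collections import defaultdict

# Hypothetical "follows" relationships: (follower, followed).
follows = [
    ("you", "stephenfry"),
    ("alice", "stephenfry"),
    ("alice", "you"),
    ("you", "alice"),
]

def build_directed(edges):
    """Map each node to the set of nodes it links to."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        graph[target]  # touch the key so nodes with no outgoing links still appear
    return dict(graph)

def build_undirected(edges):
    """Ignore direction: a link either way connects both nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

directed = build_directed(follows)
undirected = build_undirected(follows)

print("stephenfry" in directed["you"])  # True: you follow Stephen Fry
print("you" in directed["stephenfry"])  # False: he does not follow you
```

In the undirected version the same pair is simply "connected", which is the natural reading for something like a friendship.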

Source: Erickson Data Blog

Email communication Graph

Nodes = People
Links = Emails
Source: orgnet.com

Example - Mapping Links between Blogs

Sources:
http://discovermagazine.com/2007/may/map-welcome-to-the-blogosphere
http://datamining.typepad.com/gallery/blog-map-gallery.html

1 - Daily Kos

2 - BoingBoing

3 - LiveJournal Users

4 - Highly Interlinked Blogs

5 - Porn Blogs - not linked in

6 - Sports Blogs - Separate but connected

Example - Twitter Social Network

Source: Bruno Peeters

http://bvlg.blogspot.com/2007/04/twitter-vrienden.html

Video

Nicholas Christakis

The hidden influence of social networks

TED Talk, Feb 2010

Applications of Social Network DM

Typical applications of social network analysis and data mining:
  Detection of criminal activity
  Counter terrorism, "homeland security" and intelligence
  Analysis of relationships within companies
  Sociological and anthropological studies
  Reciprocal trust schemes such as eBay ratings
  Recommended friends on Facebook
  Filter or recommend social media content
  Etc.

Complex Network Example

How do we Analyse Networks?

Graph Statistics - Individual Nodes

Degree Centrality
  Number of connections to other nodes.
  High values mean many connections.
  Can measure links in and out separately.

Applications
  Who is most listened to on Twitter?
  Who has most contacts within a company?
  Which user’s reviews influence others the most?
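As a rough illustration of degree centrality, the sketch below counts links in and out separately for a small directed edge list. The names and the `degree_counts` helper are made up for this example.

```python
from collections import Counter

# Hypothetical directed links: (follower, followed).
links = [
    ("ann", "bob"),
    ("cat", "bob"),
    ("dan", "bob"),
    ("bob", "ann"),
    ("cat", "ann"),
]

def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree

in_degree, out_degree = degree_counts(links)

# "Who is most listened to?" becomes "who has the highest in-degree?"
most_followed, followers = in_degree.most_common(1)[0]
print(most_followed, followers)  # bob 3
```

Out-degree answers a different question (who follows or contacts the most people), which is why it can be worth measuring the two directions separately.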

Graph Statistics - Individual Nodes

Closeness Centrality
  The average number of steps required to reach any other node.
  Communications are easier if you don't have to go through too many people.

Applications
  Is this person central to the group?
  Is your message likely to reach the audience?
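The "average number of steps" can be sketched with a breadth-first search from each node. The network below and the helper names are hypothetical, and the sketch assumes an undirected, unweighted graph. Closeness is often reported as the reciprocal of this average, so that higher means more central; here the raw average is kept to match the definition above.

```python
from collections import deque

# Hypothetical undirected network as adjacency sets.
network = {
    "ann": {"bob"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"bob"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}

def steps_from(graph, start):
    """Breadth-first search: fewest steps from start to each reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

def average_steps(graph, node):
    """Average number of steps from node to every other reachable node."""
    distance = steps_from(graph, node)
    others = [d for n, d in distance.items() if n != node]
    return sum(others) / len(others) if others else float("inf")

scores = {node: average_steps(network, node) for node in network}
most_central = min(scores, key=scores.get)
print(most_central, scores[most_central])  # bob 1.25
```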

Graph Statistics - Individual Nodes

Betweenness Centrality
  How much of a link between other nodes is this node?

Applications
  Someone who has a high betweenness centrality is often a broker between others.
  What happens if this person leaves the network?
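One common way to put a number on "how much of a link" a node is: count the shortest paths between other pairs of nodes that pass through it. The sketch below uses Brandes' algorithm for an unweighted, undirected graph. This is one standard choice rather than anything prescribed by the lecture, and the example network is hypothetical.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    Assumes every node appears as a key in graph.
    """
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths to every node.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, sharing out credit.
        credit = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                credit[pred] += paths[pred] / paths[node] * (1 + credit[node])
            if node != source:
                score[node] += credit[node]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical network: bob and dan sit between the others.
network = {
    "ann": {"bob"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"bob"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}

scores = betweenness(network)
print(scores["bob"], scores["dan"], scores["ann"])  # 5.0 3.0 0.0
```

In this toy network, removing bob would cut ann and cat off from everyone else, which is the kind of question the broker view raises.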

Graph Statistics - Networks as a Whole

Structural holes
  Gaps in linkage between groups.

Applications
  People who bridge a gap can access information from both groups, suggesting influence and understanding of an organisation.
  Can we create a bridge?
  Is there an opportunity to control or influence communications between groups?
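A very simple way to look for such gaps, assuming group membership is already known: list the links that cross between groups and the people at either end of them. This is only a counting sketch, not a formal structural-holes measure, and the teams and names are invented.

```python
# Hypothetical undirected links and group memberships.
links = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),  # sales team
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),  # engineering team
    ("cat", "dan"),                                  # the only cross-team link
]
group = {
    "ann": "sales", "bob": "sales", "cat": "sales",
    "dan": "engineering", "eve": "engineering", "fay": "engineering",
}

def cross_group_links(edges, membership):
    """Links whose two endpoints belong to different groups."""
    return [(a, b) for a, b in edges if membership[a] != membership[b]]

bridges = cross_group_links(links, group)
brokers = sorted({person for link in bridges for person in link})

print(bridges)  # [('cat', 'dan')]
print(brokers)  # ['cat', 'dan']
```

Few or no cross-group links suggest a hole between the groups; the people on the links that do exist are candidates for the bridging role described above.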

Graph Statistics - Networks as a Whole

Degree of centralisation
  Is the network held together by just a few nodes?
  Or is it more cohesive?
  Measures include average and variance of degree centrality.

Applications
  Is a crime network vulnerable to disruption?
  What happens to a company if a few key people leave?
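To make the average and variance of degree concrete, the sketch below compares two hypothetical five-node networks: a star held together by one hub, and a ring where everyone has the same number of links. The `degree_summary` helper is an illustrative name.

```python
from statistics import mean, pvariance

def degree_summary(graph):
    """Mean and population variance of node degree for an undirected graph."""
    degrees = [len(neighbours) for neighbours in graph.values()]
    return mean(degrees), pvariance(degrees)

# Hypothetical star: everything goes through "hub".
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}

# Hypothetical ring: every node has exactly two links.
ring = {
    "a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "a"},
}

star_mean, star_var = degree_summary(star)
ring_mean, ring_var = degree_summary(ring)

print(star_mean, star_var)
print(ring_mean, ring_var)
```

The star has the higher variance: one node holds all the connections, so losing it disconnects everyone. The ring has zero variance and survives the loss of any single node.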

Graph Statistics – More…

There are many other measures. For examples, see:
  http://faculty.ucr.edu/~hanneman/networkshop/index.html
  http://en.wikipedia.org/wiki/Social_network

Data Mining Approaches to Networks

Structural Equivalence
  Find nodes with similar roles in the network

Cluster Analysis
  Identify groups of nodes which are closely connected - and characterise them

Identifying the Most Influential People
Predicting Node Types (e.g. Fraudster)
Profiling Sub-networks (e.g. terrorist cell)

Twitter - Clustered Network

To reduce clutter, we can cluster people who reference each other, and only show links within clusters.

http://www.neoformix.com/2009/TorontoTwitterCommunity.html
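One simple way to approximate this idea, under the assumption that "reference each other" means a mention in both directions: keep only mutual mentions, treat the connected groups that remain as clusters, then drop any link that crosses clusters. The linked visualisation may well use a different clustering method; the data and helper names here are hypothetical.

```python
# Hypothetical mentions between Twitter users: (author, mentioned).
mentions = [
    ("ann", "bob"), ("bob", "ann"),
    ("bob", "cat"), ("cat", "bob"),
    ("dan", "eve"), ("eve", "dan"),
    ("cat", "dan"),  # one-way mention between the two groups
]

def mutual_links(edges):
    """Keep pairs that reference each other in both directions."""
    edge_set = set(edges)
    return {
        frozenset((a, b))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }

def clusters_from(pairs):
    """Label each node with its connected component in the mutual-link graph."""
    neighbours = {}
    for pair in pairs:
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    label = {}
    for start in sorted(neighbours):  # sorted so labels are reproducible
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other not in label:
                    label[other] = start
                    stack.append(other)
    return label

label = clusters_from(mutual_links(mentions))

# Only show links whose two ends fall in the same cluster.
within = [(a, b) for a, b in mentions if a in label and label[a] == label.get(b)]
print(label)
print(within)
```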

Data Mining Social Networks - Challenges

Standard problems
  Incompleteness – We don’t know everything
  Incorrectness – What we think we know is wrong
  Inconsistency – We have contradictions in our data

Data transformation - Getting data into a form acceptable to your tools

Fuzzy Boundaries - Networks do not normally have distinct boundaries

Network Dynamics - Relationships change over time
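As a small illustration of the data transformation step, the sketch below turns a raw email log into weighted links, tidying inconsistent addresses and skipping incomplete rows on the way. The log contents and the `sender` / `recipient` column names are assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical email log; the column names are assumed for illustration.
raw_log = """sender,recipient
Ann@Example.com,bob@example.com
ann@example.com ,bob@example.com
bob@example.com,cat@example.com
,cat@example.com
"""

def weighted_edges(log_text):
    """Count emails per (sender, recipient) pair, skipping incomplete rows."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        # Inconsistency: the same address may differ in case or spacing.
        sender = (row["sender"] or "").strip().lower()
        recipient = (row["recipient"] or "").strip().lower()
        # Incompleteness: drop rows where one end of the link is unknown.
        if sender and recipient:
            counts[(sender, recipient)] += 1
    return counts

edges = weighted_edges(raw_log)
for (sender, recipient), count in sorted(edges.items()):
    print(sender, "->", recipient, count)
```

Whether to drop, repair, or flag bad rows is a judgement call that depends on the application; dropping them is just the simplest option to show.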

Example Application - Viral Marketing

"In our experiments with the Epinions knowledge-sharing Web site, the most valuable customer had a network value of over 20,000, meaning that marketing to that customer was as effective as marketing to over 20,000 others in the absence of network effects, but the customer's number of direct links to others in the network (i.e., people who read his reviews) was much smaller."

Pedro Domingos, Mining Social Networks for Viral Marketing - http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf

Example - Identifying Academic Groups

Community Detection in Large-Scale Social Networks
Nan Du, Bin Wu, Xin Pei, Bai Wang and Liutong Xu, SIGKDD Workshop on Web Mining and Social Network Analysis, August 12-15, 2007, San Jose, California

Software for Social Network Analysis / DM

StatNet – R Packages - http://statnet.org/

StatNet Tutorial - http://www.jstatsoft.org/v24/i09/paper

JUNG – Open Source Java toolkit for SNA - http://jung.sourceforge.net/

NetMiner - Commercial, Comprehensive SNA - http://www.netminer.com/

Pajek - Comprehensive Social Network Analysis, free for academic use - http://pajek.imfm.si/doku.php

Subdue - Graph based data mining tool. Copyright but freely downloadable - http://ailab.wsu.edu/subdue/

More - http://en.wikipedia.org/wiki/Social_network_analysis_software

Looking Forward

Lots and lots of network data out there.
What about:
  Applications for individuals
  Social applications (e.g. like TheyWorkForYou.com)
  Applications within a University
  Applications which make money

Potential final year / M.Sc. projects?

Bibliography

Very out of date - do look for newer papers and references!

Bibliography - Overview

Paper credited with launching the field - Barnes, J. (1954). Class and Committees in a Norwegian Island Parish. Human Relations, 7, 39-58.

List of systems for Mining Graph data - http://hms.liacs.nl/graphs.html

Introduction to Social Network Analysis - http://www.orgnet.com/sna.html

Network Theory and Analysis in Organizations, a brief overview - http://www.tcw.utwente.nl/theorieenoverzicht/Theory%20clusters/Organizational%20Communication/Network%20Theory%20and%20analysis_also_within_organizations.doc/

Bibliography - Journals and Workshops

Social Networks Journal - http://www.elsevier.com/wps/find/journaldescription.cws_home/505596/description

Workshop on Link Analysis and Group Detection - http://kt.ijs.si/Dunja/LinkKDD2006/

SIGKDD Workshop on Web Mining and Social Network Analysis - http://workshops.socialnetworkanalysis.info/websnakdd2007/

Bibliography - Data Mining Papers

Maitrayee Mukherjee and Lawrence B. Holder, Graph-based Data Mining on Social Networks - http://www-2.cs.cmu.edu/~dunja/LinkKDD2004/Maitrayee-Mukherjee-LinkKDD-2004.pdf

Ingrid Fischer and Thorsten Meinl, Graph Based Molecular Data Mining - An Overview - http://www2.informatik.uni-erlangen.de/Forschung/Publikationen/download/graphBasedDM_SMC2004.pdf

Jennifer Xu and Hsinchun Chen, Criminal Network Analysis and Visualization: A Data Mining Perspective, Communications of the ACM - http://ai.eller.arizona.edu/COPLINK/publications/crimenet/Xu_CACM.doc

Bibliography - Data Mining Papers (2)

Pedro Domingos, Mining Social Networks for Viral Marketing - http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf

David Jensen and Jennifer Neville, Data Mining in Social Networks - Looks specifically at predicting film receipts from IMDB data - http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf

Bootstrapping the FOAF-Web: An Experiment in Social Network Mining - http://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/bootstrapping_the_foaf_web/

Impact of Computers on SNA

The rise in the power and use of computers has had two main impacts.

1. New data is available from logs of email conversations, phone calls, chat and website usage, Facebook friends, tweets etc.

2. Computers can be employed for analysis and data mining.

Role of computer analysis

Data collected about social networks can be complex and large.

Imagine a network documenting each purchase you've made using a credit/debit card, every phone call and SMS, each email etc.

When these kinds of data are collected over large populations, the resulting graphs are much too large to be understood by eye.