
Social Media Data Collection & Network Analysis with Netlytic and R

Jan 15, 2017

Page 1: Social Media Data Collection & Network Analysis with Netlytic and R

Social Media Data Collection & Network Analysis with Netlytic and R

Anatoliy Gruzd
@gruzd

Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University

HKBU, Hong Kong

Dec 3, 2015

Twitter: @gruzd ANATOLIY GRUZD 1

Page 2: Social Media Data Collection & Network Analysis with Netlytic and R

Research at the Social Media Lab

Page 3: Social Media Data Collection & Network Analysis with Netlytic and R

Presentation Slides

http://bit.ly/hk15slides


Page 4: Social Media Data Collection & Network Analysis with Netlytic and R

Social Media sites have become an integral part of our daily lives!

Growth of Social Media Data

• Facebook: 1.5B users
• Instagram: 400M users
• Twitter: 300M users

Page 5: Social Media Data Collection & Network Analysis with Netlytic and R

Decision Making in domains such as Politics, Health Care and Education

How to Make Sense of Social Media Data?

• Self-collected/reported
• Public APIs
• Data Resellers

Page 6: Social Media Data Collection & Network Analysis with Netlytic and R

How to Make Sense of Social Media Data?

Big Data Technology

Credit: Nathan Lapierre

Page 7: Social Media Data Collection & Network Analysis with Netlytic and R

Social Media Analytics Tools

http://socialmedialab.ca/apps/social-media-toolkit/

Page 8: Social Media Data Collection & Network Analysis with Netlytic and R

How to Make Sense of Social Media Data?

Data -> Visualizations -> Understanding

Page 9: Social Media Data Collection & Network Analysis with Netlytic and R

How to Make Sense of Social Media Data?

Example: Geo-based Analysis

Page 10: Social Media Data Collection & Network Analysis with Netlytic and R

How to Make Sense of Social Media Data?

Example: Geo-based Analysis

Geography of Twitter Networks

Page 11: Social Media Data Collection & Network Analysis with Netlytic and R

How to Make Sense of Social Media Data?

Example: Geo-based + Content Analysis

Tracking Hate Speech on Twitter

Source: http://www.fenuxe.com/tag/geo-coded

Page 12: Social Media Data Collection & Network Analysis with Netlytic and R

How to Make Sense of Social Media Data?

Social Network Analysis (SNA)

• Nodes = People
• Edges/Ties (lines) = Relations: “Who retweeted/replied/mentioned whom”

Page 13: Social Media Data Collection & Network Analysis with Netlytic and R

Advantages of Social Network Analysis

Makes it much easier to understand what is going on in a group.

Once the network is discovered, we can find out:

• How people interact with each other,
• Who the most/least active members are,
• Who is influential in a group,
• Who is susceptible to being influenced, etc.

[Figure: Twitter network with clusters labeled Liberal, Conservative, NDP, Green, Bloc, Left, Spam, Other, and Unknown & Undecided]

Gruzd, A. and Roy, J. (2014). Political Polarization on Social Media: Do Birds of a Feather Flock Together on Twitter? Policy & Internet.

Page 14: Social Media Data Collection & Network Analysis with Netlytic and R

How Do We Collect Information About Online Social Networks?

Common approach for collecting social network data:

• Self-reported social network data may not be available/accurate
• Surveys or interviews

Problems with surveys or interviews

• Time-consuming
• Questions can be too sensitive
• Answers are subjective or incomplete
• Participants can forget people and interactions
• Different people perceive events and relationships differently

Page 15: Social Media Data Collection & Network Analysis with Netlytic and R

Studying Online Social Networks

• Forum networks
• Blog networks
• Friends’ networks (Facebook, Twitter, Google+, etc.)
• Networks of like-minded people (YouTube, Flickr, etc.)

http://www.visualcomplexity.com/vc

Page 16: Social Media Data Collection & Network Analysis with Netlytic and R

How Do We Collect Information About Online Social Networks?

Goal: Automated Networks Discovery

Challenge: Figuring out what content-based features of online interactions can help to uncover nodes and ties between group members

Page 17: Social Media Data Collection & Network Analysis with Netlytic and R

Automated Discovery of Social Networks

Emails

[Figure: example email network of Nick, Rick and Dick]

• Nodes = People
• Ties = “Who talks to whom”
• Tie strength = The number of messages exchanged between individuals
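The tie-strength definition on this slide can be illustrated with a minimal Python sketch. The message list is hypothetical sample data (only the names come from the slide), and the function name `tie_strengths` is an assumption made for illustration, not part of Netlytic.

```python
from collections import Counter

# Hypothetical sample data: one (sender, recipient) pair per email.
messages = [
    ("Nick", "Rick"),
    ("Rick", "Nick"),
    ("Nick", "Dick"),
    ("Nick", "Rick"),
]

def tie_strengths(pairs):
    """Count the messages exchanged between each unordered pair of people."""
    strengths = Counter()
    for sender, recipient in pairs:
        if sender == recipient:
            continue  # a message to oneself does not form a tie
        strengths[frozenset((sender, recipient))] += 1
    return strengths

ties = tie_strengths(messages)
for pair, weight in ties.items():
    print(" - ".join(sorted(pair)), weight)
```

Each key is an unordered pair, so a message from Nick to Rick and a message from Rick to Nick strengthen the same tie.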

Page 18: Social Media Data Collection & Network Analysis with Netlytic and R

Automated Discovery of Social Networks

“Many to Many” Communication

• Chat
• Mailing listserv
• Forum
• Comments

Page 19: Social Media Data Collection & Network Analysis with Netlytic and R

Automated Discovery of Social Networks

Twitter Networks

[Figure: example network of @John, @Peter and @Paul]

• Nodes = People
• Ties = “Who retweeted/replied/mentioned whom”
• Tie strength = The number of retweets, replies or mentions

Page 20: Social Media Data Collection & Network Analysis with Netlytic and R

Automated Discovery of Social Networks

Twitter Data Examples

Network Ties

@Cheeflo -> @JoeProf
@Cheeflo -> @VMosco
@JoeProf -> @VMosco

Network Tie

@Gruzd -> @SidneyEve

Connection type: Mention

Connection type: Reply
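One way ties such as "@Gruzd -> @SidneyEve" might be derived from tweet text is sketched below in Python. This is not Netlytic's actual procedure: the function name `extract_ties`, the rule that a tweet starting with @user is a reply, and the rule that "RT @user" marks a retweet are all assumptions made for illustration, and the tweet texts in the usage note are invented.

```python
import re

# Matches an @username; a rough approximation of Twitter handles.
MENTION = re.compile(r"@(\w+)")

def extract_ties(author, text):
    """Return (source, target, connection_type) tuples for one tweet.

    Assumed convention: "RT @user ..." is a retweet of that user, a tweet
    that starts with "@user" is a reply, any other "@user" is a mention.
    """
    ties = []
    is_retweet = text.startswith("RT @")
    for match in MENTION.finditer(text):
        target = match.group(1)
        if target.lower() == author.lower():
            continue  # skip self-references
        if is_retweet and match.start() == 3:
            kind = "retweet"
        elif match.start() == 0:
            kind = "reply"
        else:
            kind = "mention"
        ties.append((author, target, kind))
    return ties
```

For example, `extract_ties("Gruzd", "@SidneyEve thanks for sharing")` yields one reply tie, while `extract_ties("Cheeflo", "Great panel with @JoeProf and @VMosco")` yields two mention ties.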

Page 21: Social Media Data Collection & Network Analysis with Netlytic and R

Sample Twitter Searches

#ELECTION2016 #HONGKONG

3557 records (Dec 3, 2015)
1394 records (Oct 29, 2015)


Page 23: Social Media Data Collection & Network Analysis with Netlytic and R


What do these visualizations tell us?

Page 24: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Micro-level

• In-degree centrality
• Out-degree centrality
• Betweenness centrality
• Other centrality measures (e.g., closeness, eigenvector)

Macro-level

• Density
• Diameter
• Reciprocity
• Centralization
• Modularity

Page 25: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Micro-level: In-degree centrality

In-degree suggests “prestige”, highlighting the most mentioned or replied-to Twitter users.
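A minimal Python sketch of in-degree, assuming the network is stored as a list of directed (source, target) ties. The usernames are hypothetical, and counting each distinct tie once (rather than weighting by number of messages) is an assumption.

```python
from collections import Counter

# Hypothetical directed ties: (source, target) means that
# source mentioned or replied to target.
edges = [
    ("ann", "star"),
    ("bob", "star"),
    ("cat", "star"),
    ("star", "ann"),
    ("ann", "bob"),
]

def in_degree(edge_list):
    """Number of distinct users who mentioned or replied to each user."""
    return Counter(target for _, target in set(edge_list))

# Users ranked from most to least mentioned/replied-to.
ranking = in_degree(edges).most_common()
```

Here "star" receives ties from three different users and ranks first.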

Page 26: Social Media Data Collection & Network Analysis with Netlytic and R

In-degree centrality

#HongKong Twitter network

Note: SEVENTEEN (SVT) is a South Korean boy group formed by Pledis Entertainment

Page 27: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Micro-level: Out-degree centrality

Out-degree reveals active Twitter users with a good awareness of others in the network.
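Out-degree can be sketched the same way in Python, again assuming a list of directed (source, target) ties. The data is hypothetical, and repeated messages to the same user are counted once here, which is an assumption.

```python
from collections import defaultdict

# Hypothetical directed ties: (source, target) means that
# source retweeted, replied to, or mentioned target.
edges = [
    ("fan", "a"),
    ("fan", "b"),
    ("fan", "c"),
    ("fan", "a"),  # a repeated tie; counted once below
    ("a", "fan"),
]

def out_degree(edge_list):
    """Map each user to the number of distinct users they addressed."""
    targets = defaultdict(set)
    for source, target in edge_list:
        targets[source].add(target)
    return {user: len(addressed) for user, addressed in targets.items()}

activity = out_degree(edges)
```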

Page 28: Social Media Data Collection & Network Analysis with Netlytic and R

Out-degree centrality

#HongKong Twitter network

Note: A music fan (many retweets & replies to others)

Page 29: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Micro-level: Betweenness centrality

Betweenness shows actors who are located on the largest number of information paths and who often connect different groups of users in the network.
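As an illustration, betweenness can be computed with Brandes' algorithm, a standard method for this measure. The Python sketch below assumes a directed, unweighted network and returns unnormalized scores; the slide does not say which variant or normalization Netlytic uses, so treat those choices as assumptions.

```python
from collections import defaultdict, deque

def betweenness(edge_list):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    adjacency = defaultdict(set)
    nodes = set()
    for source, target in edge_list:
        adjacency[source].add(target)
        nodes.update((source, target))

    scores = dict.fromkeys(nodes, 0.0)
    for start in nodes:
        # Breadth-first search that also counts shortest paths from `start`.
        order = []
        predecessors = defaultdict(list)
        path_count = dict.fromkeys(nodes, 0)
        path_count[start] = 1
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating path shares.
        dependency = dict.fromkeys(nodes, 0.0)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != start:
                scores[node] += dependency[node]
    return scores

# Hypothetical example: "fan" is the only link between two groups.
bridge = [("a1", "fan"), ("a2", "fan"), ("fan", "b1"), ("fan", "b2")]
result = betweenness(bridge)
```

In this toy network every shortest path from the "a" users to the "b" users passes through "fan", so "fan" gets the only non-zero score.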

Page 30: Social Media Data Collection & Network Analysis with Netlytic and R

Betweenness centrality

#HongKong Twitter network

Note: A fan (retweets/replies to messages from two different fan communities/sites)


Page 32: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Macro-level: Density

Density indicates the overall connectivity in the network (the total number of connections divided by the total number of possible connections).

It equals 1 when everyone is connected to everyone.

[Figure: User1, User2 and User3, all connected to one another; Density = 1]
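The density definition can be written as a short Python function. The slide does not say whether ties are treated as directed, so the `directed` parameter and its default are assumptions; with N nodes there are N(N-1) possible directed ties, or half that many undirected ones.

```python
def density(num_nodes, num_edges, directed=True):
    """Existing connections divided by all possible connections."""
    if num_nodes < 2:
        return 0.0  # no ties are possible with fewer than two nodes
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2
    return num_edges / possible

# Three users who are all connected to one another:
# 6 directed ties, or equivalently 3 undirected ties.
full_directed = density(3, 6)
full_undirected = density(3, 3, directed=False)
```

Both calls describe the three-user example from the slide and give a density of 1.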

Page 33: Social Media Data Collection & Network Analysis with Netlytic and R

Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)

Page 34: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Macro-level: Diameter

Diameter gives a general idea of how “wide” the network is: it is the longest of the shortest paths between any two nodes in the network.

[Figure: four-user example (User1 to User4) with path steps #1, #2 and #3; Diameter = 3]
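A minimal Python sketch of the diameter calculation, using a breadth-first search from every node. It treats ties as undirected and ignores unreachable pairs; both choices are assumptions, since the slide does not state how Netlytic handles direction or disconnected networks. The four-user chain is an assumed layout consistent with the slide's "Diameter = 3".

```python
from collections import defaultdict, deque

def diameter(edge_list):
    """Longest shortest path between any two reachable nodes."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)

    longest = 0
    for start in list(neighbours):
        # Breadth-first search gives shortest path lengths from `start`.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(distance.values()))
    return longest

# Assumed layout: four users connected in a chain.
chain = [("User1", "User2"), ("User2", "User3"), ("User3", "User4")]
width = diameter(chain)
```

Going from one end of the chain to the other takes three steps, which matches the diameter shown on the slide.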

Page 35: Social Media Data Collection & Network Analysis with Netlytic and R

Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)
Diameter         28               14

Page 36: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Macro-level: Reciprocity

Reciprocity shows how many online participants are having two-way conversations.

When everyone replies to everyone, the reciprocity value is 1.

[Figure: four-user example (User1 to User4); Reciprocity = 1]
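One common way to quantify reciprocity is the share of directed ties that are returned. The Python sketch below uses that definition as an assumption; the slide does not give Netlytic's exact formula.

```python
from itertools import permutations

def reciprocity(edge_list):
    """Fraction of distinct directed ties that have a tie in the reverse direction."""
    ties = {(s, t) for s, t in edge_list if s != t}
    if not ties:
        return 0.0
    mutual = sum(1 for s, t in ties if (t, s) in ties)
    return mutual / len(ties)

# Four users who all reply to one another (every ordered pair is a tie).
users = ["User1", "User2", "User3", "User4"]
everyone = list(permutations(users, 2))
full = reciprocity(everyone)

# Hypothetical mixed case: one two-way conversation and one unanswered reply.
partial = reciprocity([("a", "b"), ("b", "a"), ("a", "c")])
```

The first case reproduces the slide's scenario with a value of 1; in the second, two of the three ties are returned.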

Page 37: Social Media Data Collection & Network Analysis with Netlytic and R

Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)
Diameter         28               14
Reciprocity      0.006 (0.6%)     0.003 (0.3%)

Page 38: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Macro-level: Centralization

Centralization indicates whether a network is dominated by a few central participants (values closer to 1), or whether more people are contributing to discussion and information dissemination (values closer to 0).

[Figure: four-user example (User1 to User4); Centralization = 1]
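The slide does not name the formula behind this measure. A widely used option is Freeman's degree centralization, sketched below in Python for undirected ties; using it here is an assumption, not a statement of how Netlytic computes the value. It sums each node's gap to the highest degree and divides by the largest possible sum, (N-1)(N-2), which a star network attains.

```python
from collections import defaultdict

def degree_centralization(edge_list):
    """Freeman degree centralization for an undirected, simple graph."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 3:
        return 0.0  # the measure is undefined for fewer than three nodes
    degrees = [len(adjacent) for adjacent in neighbours.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

# Assumed layouts with four users: a star (one user linked to all others)
# and a ring (every user linked to exactly two others).
star = [("User1", "User2"), ("User1", "User3"), ("User1", "User4")]
ring = [("User1", "User2"), ("User2", "User3"), ("User3", "User4"), ("User4", "User1")]
```

The star gives a centralization of 1 (one dominant participant) and the ring gives 0 (everyone equally connected).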

Page 39: Social Media Data Collection & Network Analysis with Netlytic and R

Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)
Diameter         28               14
Reciprocity      0.006 (0.6%)     0.003 (0.3%)
Centralization   0.05             0.11

Page 40: Social Media Data Collection & Network Analysis with Netlytic and R

SNA Measures

Macro-level: Modularity

Modularity provides an estimate of whether a network consists of one coherent group of participants who are engaged in the same conversation and who are paying attention to each other (values closer to 0), or whether a network consists of different conversations and communities with a weak overlap (values closer to 1).
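For a concrete sense of the scale, the Python sketch below computes Newman's modularity Q for a given split of users into communities, treating ties as undirected and unweighted. Tools normally search for the split that maximizes Q; that search is omitted here, and both the formula choice and the `community_of` mapping are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edge_list, community_of):
    """Newman's Q for an undirected graph and a fixed community assignment.

    edge_list lists each undirected tie once; community_of maps node -> label.
    """
    m = len(edge_list)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # ties with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for a, b in edge_list:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical network: two separate three-user conversations.
triangles = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "y"), ("y", "z"), ("z", "x")]
split = {"a": 1, "b": 1, "c": 1, "x": 2, "y": 2, "z": 2}
lumped = dict.fromkeys("abcxyz", 1)
```

With the two conversations treated as separate communities, Q is 0.5; if all six users are placed in a single community, Q drops to 0.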

Page 41: Social Media Data Collection & Network Analysis with Netlytic and R

Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)
Diameter         28               14
Reciprocity      0.006 (0.6%)     0.003 (0.3%)
Centralization   0.05             0.11
Modularity       0.42             0.92

Page 42: Social Media Data Collection & Network Analysis with Netlytic and R

Practice with Netlytic + R

Twitter hashtag: #HongKong

Instructions at http://bit.ly/hknet15