Group Formation in Large Social Networks: Membership, Growth, and Evolution. Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, Xiangyang Lan. Presented by Natalia Dragan. Data Mining Techniques (CS6/73015-001), Fall 2006, Kent State University

Dec 22, 2015

Page 1:

Group Formation in Large Social Networks: Membership, Growth, and Evolution

Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, Xiangyang Lan

Presented by Natalia Dragan

Data Mining Techniques (CS6/73015-001), Fall 2006, Kent State University

Page 2:

Outline

- Introduction
- Motivation
- Present work contributions
- Related work
- Membership, Growth, Evolution
- Conclusions

Page 3:

Introduction

- Structure of society: people tend to come together and form groups
- Why it is important to study
  - To better understand decision-making behavior
  - Tracking the early stages of an epidemic
  - Following the popularity of new ideas and technologies
- Significant growth in the scale and richness of on-line communities and social media (MySpace, Facebook, LiveJournal, Flickr)

Page 4:

Problems with studying social processes

- Easy to build theoretical models (a subgraph branching out rapidly over links in the network, or a collection of small disconnected components growing in a ‘speckled’ fashion)
- Hard to make concrete empirical statements about these types of processes
- Lack of a reasonable vocabulary for studying group evolution

Page 5:

Present work

- Principles by which groups develop and evolve in large-scale social networks
- Crucial point: focusing on networks where the members have explicitly identified themselves as a group or community (vs. an unsupervised graph clustering problem of inferring “community structures” in a network)
- 3 main types of questions: Membership, Growth, Change

Page 6:

Membership, Growth, Change

- Membership
  - Structural features that influence whether a given individual will join a particular group
- Growth
  - Structural features that influence whether a given group will grow significantly over time
- Change
  - How the focus of interest changes over time
  - How these changes are correlated with changes in the set of group members

Page 7:

Related work

- Identifying tightly-connected clusters within a given graph
  - Dill et al. consider implicitly identified communities
  - For a set of features (ZIP code, particular keyword, etc.) they consider a subgraph of the Web consisting of all pages containing this feature
- Use of online social networks for data mining
  - The structure of the communities is not exploited
- Diffusion of innovation studies
  - Property which is “diffusing” in the present work: membership in a given group

Page 8:

Diffusion of Innovations

Diffusion of Innovations is a theory that analyzes, as well as helps explain, the adaptation of a new innovation. In other words it helps to explain the process of social change.

An innovation is an idea, practice, or object that is perceived as new by an individual or other unit of adoption. The perceived newness of the idea for the individual determines his/her reaction to it (Rogers, 1995).

In addition, diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system. Thus, the four main elements of the theory are the innovation, communication channels, time, and the social system.

http://hsc.usf.edu/~kmbrown/Diffusion_of_Innovations_Overview.htm

Page 9:

Related work on Diffusion of Innovation

- How a social network evolves as its members' attributes change (Sarkar and Moore, Holme and Newman)
- Social network evolution in a university setting (Kossinets and Watts)
- Evolution of topics over time (Wang and McCallum)
- The property which is “diffusing” in the present work is membership in a given group

Page 10:

Road map

- Introduction
- Motivation
- Present work contributions
- Related work
- Membership, Growth, Evolution
- Conclusions

Page 11:

Sources of data

- LiveJournal
  - Free on-line community with ~10 million members
  - 300,000 update their content in a 24-hour period
  - Maintaining journals, individual and group blogs
  - Declaring who their friends are and to which communities they belong
- DBLP
  - On-line database of computer science publications (about 400,000 papers)
  - Friendship network: co-authors of a paper
  - Community: conference

Page 12:

Community Membership

- Study of processes by which individuals join communities in a social network
- Fundamental question about the evolution of communities: who will join in the future?
- Membership in a community is a “behavior” that spreads through the network
  - Diffusion of innovation study perspective for this question

Page 13:

Dependence on number of friends: start towards membership prediction

- Underlying premise in diffusion studies: an individual's probability of adopting a new behavior increases with the number of friends (K) already engaging in the behavior
- Theoretical models concentrate on the effect of K, while the structural properties are more influential in determining membership

Page 14:

Approach hypothesis

For moderate values of K, an individual with K friends in a group is significantly more likely to join if these K friends are themselves mutual friends than if they are not

Page 15:

Dependence on number of friends

[Figure: community C and its fringe at the 1st and 2nd snapshots; example with K = 3 and P(k) = 2/3]

Probability P(k) of joining a community = fraction of triples (u, C, k), where user u is outside community C and has k friends in C at the 1st snapshot, for which u belongs to C at the 2nd snapshot

u - user, C - community, k - number of friends of u in C
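
The definition of P(k) above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the input format (dictionaries of friend sets and membership sets), the function name `join_probability`, and the toy data are all assumptions, and the toy data encodes one reading of the slide's example (three fringe users with K = 3 friends in C, two of whom join).

```python
from collections import defaultdict

def join_probability(friends, members_t1, members_t2):
    """Estimate P(k) from two snapshots.

    friends:     dict user -> set of friends (hypothetical input format)
    members_t1:  dict community -> set of members at the 1st snapshot
    members_t2:  dict community -> set of members at the 2nd snapshot
    """
    total = defaultdict(int)   # number of triples (u, C, k) for each k
    joined = defaultdict(int)  # triples where u is in C at the 2nd snapshot
    for community, members in members_t1.items():
        for user, user_friends in friends.items():
            if user in members:
                continue  # only non-members can join
            k = len(user_friends & members)
            if k == 0:
                continue  # user is not in the fringe of this community
            total[k] += 1
            if user in members_t2.get(community, set()):
                joined[k] += 1
    return {k: joined[k] / total[k] for k in total}


# Toy data: three users each have K = 3 friends in C; two of them join.
friends = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "b", "c"},
    "a": set(), "b": set(), "c": set(),
}
members_t1 = {"C": {"a", "b", "c"}}
members_t2 = {"C": {"a", "b", "c", "u1", "u2"}}
p = join_probability(friends, members_t1, members_t2)
```

With this toy input the only populated entry is `p[3]`, the fraction of the three K = 3 triples whose user joined.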

Page 16:

Law of Diminishing returns

In economics, diminishing returns is the short form of diminishing marginal returns. In a production system, having fixed and variable inputs, keeping the fixed inputs constant, as more of a variable input is applied, each additional unit of input yields less and less additional output. This concept is also known as the law of increasing opportunity cost or the law of diminishing returns.

http://en.wikipedia.org/wiki/Diminishing_returns

Page 17:

Dependence on number of friends: LiveJournal

Page 18:

Dependence on number of friends: DBLP

Page 19:

Discussion of results

- The plots for LJ and DBLP have similar shapes, dominated by the “diminishing returns” property (the curve continues increasing, but more and more slowly, even for large K)
- P(2) > 2P(1): the benefit of having a second friend is particularly strong (S-shaped behavior)
- The curve for LJ is quite smooth (1/2 billion triples vs. 7.8 million for DBLP)

Page 20:

A broader range of features

- Features related to the community C (11)
  - Number of members (|C|)
  - Number of individuals with a friend in C (the fringe of C)
  - Number of edges with both ends in the community (|Ec|)
  - etc.
- Features related to an individual u and her set S of friends in community C (8)
  - Number of friends in the community (|S|)
  - Number of adjacent pairs in S
  - Number of pairs in S connected via a path in Ec
  - etc.
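
To make the three listed individual features concrete, here is a small Python sketch that computes them for one user. The function name, the edge-list input format, and the toy graph are assumptions made for illustration; the remaining features ("etc." on the slide) are not reconstructed.

```python
from itertools import combinations

def individual_features(friends_in_c, community_edges):
    """Features of user u's friend set S inside community C.

    friends_in_c:    set S of u's friends that belong to C
    community_edges: iterable of (a, b) edges with both ends in C (the set Ec)
    """
    adjacency = {}
    for a, b in community_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Label connected components of the graph (C, Ec) with a simple DFS.
    component = {}
    for start in adjacency:
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt not in component:
                    component[nxt] = start
                    stack.append(nxt)

    adjacent_pairs = 0
    connected_pairs = 0
    for a, b in combinations(sorted(friends_in_c), 2):
        if b in adjacency.get(a, set()):
            adjacent_pairs += 1
        # Two friends are connected if they lie in the same component of Ec.
        if a in component and component.get(a) == component.get(b):
            connected_pairs += 1
    return {
        "num_friends": len(friends_in_c),
        "adjacent_pairs": adjacent_pairs,
        "connected_pairs": connected_pairs,
    }


edges = [("a", "b"), ("b", "c"), ("d", "e")]
features = individual_features({"a", "b", "c", "d"}, edges)
```

In the toy graph, a-b and b-c are adjacent pairs, a-c is connected only through b, and d sits in a separate component.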

Page 21:

Estimating probability on a broader range of features

- Decision-tree techniques were applied to these features to make advances in estimating the probability of an individual joining a community
- The technique incorporates
  - Individual's position in the network (structural features)
  - Level of activity among members (group features)

Page 22:

Predictions for LJ and DBLP

[Figure: community C and its fringe at the 1st and 2nd snapshots, with a fringe member u]

- Data point (u, C): a fringe member u and a community C; the quantity estimated is the probability that u joins C
- LJ: 17,076,344 data points, 875 communities; 14,448 joined a community
- DBLP: 7,651,013 data points; 71,618 joined a community
- 20 decision trees were built to estimate the probability of joining

Page 23:

Splitting process for LJ

- Each of the 875 communities has about half of its fringe members included in the training set (each member is included independently with probability 0.5)
- At each node in the decision tree, every possible feature and every binary split threshold for that feature were examined
- Of all such pairs, the split that produces the largest decrease in entropy was chosen
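
The split rule described above (try every feature and every binary threshold, keep the one with the largest entropy decrease) can be sketched as follows. This is a generic illustration of entropy-based splitting under assumed inputs (numeric feature vectors, 0/1 labels); the function names and toy data are hypothetical and not taken from the paper.

```python
import math

def entropy(pos, total):
    """Binary entropy (in bits) of a node with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(rows, labels):
    """Return (entropy_decrease, feature_index, threshold) of the best split.

    rows:   list of numeric feature vectors
    labels: list of 0/1 outcomes (1 = the individual joined)
    A split sends rows with feature value <= threshold to the left child.
    """
    n = len(rows)
    parent = entropy(sum(labels), n)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        # The largest value is skipped: it would leave the right child empty.
        for threshold in sorted({row[f] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            children = ((len(left) / n) * entropy(sum(left), len(left))
                        + (len(right) / n) * entropy(sum(right), len(right)))
            gain = parent - children
            if gain > best[0]:
                best = (gain, f, threshold)
    return best


rows = [[1, 5], [2, 3], [3, 8], [4, 1]]
labels = [0, 0, 1, 1]
gain, feature, threshold = best_split(rows, labels)
```

On the toy data, feature 0 with threshold 2 separates the two classes perfectly, so it wins with an entropy decrease of one bit.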

Page 24:

Splitting process for LJ

- New splits were installed until there were fewer than 100 positive cases at the node
- Leaf nodes predict the ratio of positive to total cases for that node
- Averaging technique
  - For every case, take the set of decision trees for which this case is not included in the training set
  - The average of these trees' predictions is the prediction for the case
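
The averaging step can be illustrated with a short Python sketch. Each tree is represented here by a hypothetical pair (set of training case ids, prediction function); real trees would replace the constant lambdas used in the toy data.

```python
def averaged_prediction(case_id, case_features, trees):
    """Average the predictions of the trees that did not train on this case.

    trees: list of (training_ids, predict) pairs, where training_ids is the
           set of case ids used to build the tree and predict maps features
           to a probability (hypothetical stand-ins for real trees).
    Returns None if every tree saw the case during training.
    """
    held_out = [predict(case_features)
                for training_ids, predict in trees
                if case_id not in training_ids]
    if not held_out:
        return None
    return sum(held_out) / len(held_out)


trees = [
    ({2, 3}, lambda x: 0.2),   # case 1 held out
    ({2, 4}, lambda x: 0.4),   # case 1 held out
    ({1, 3}, lambda x: 0.9),   # case 1 was used for training, so it is skipped
]
prediction = averaged_prediction(1, {"num_friends": 3}, trees)
```

Only the first two trees contribute for case 1, so the averaged prediction is the mean of 0.2 and 0.4.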

Page 25:

Averaging model (Simple description)

- When selecting a model that explains the data from all the possible models, the one that best fits the data is usually selected.
- But sometimes more than one model explains the data well, creating model selection uncertainty, which is usually ignored.
- BMA (Bayesian Model Averaging) provides a coherent mechanism for accounting for this model uncertainty, combining predictions from the different models according to their probability.

Hoeting J. A., Madigan D., Raftery A. E., and Volinsky C. T. Bayesian model averaging: A tutorial (with discussion). Statistical Science, 14(4):382-417, 1999

Page 26:

Averaging model (Simple description)

- Example: we have evidence D and 3 possible hypotheses h1, h2 and h3.
- The posterior probabilities for those hypotheses are P( h1 | D ) = 0.4, P( h2 | D ) = 0.3 and P( h3 | D ) = 0.3
- Given a new observation, h1 classifies it as true and h2 and h3 classify it as false; the result of the global classifier (BMA) would then be calculated as follows:
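
The calculation itself did not survive in this transcript. A minimal sketch of the standard posterior-weighted vote for the example's numbers follows; the function name `bma_classify` is an assumption, and the code shows the generic weighting rule rather than the exact formula from the slide.

```python
def bma_classify(posteriors, votes):
    """Weight each hypothesis's vote by its posterior probability P(h | D)."""
    weight = {True: 0.0, False: 0.0}
    for h, p in posteriors.items():
        weight[votes[h]] += p
    label = max(weight, key=weight.get)
    return label, weight


posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
votes = {"h1": True, "h2": False, "h3": False}
label, weight = bma_classify(posteriors, votes)
```

Here "true" receives weight 0.4 and "false" receives 0.3 + 0.3 = 0.6, so the combined classifier answers false even though the single most probable hypothesis, h1, answers true.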

Page 27:

Top two level splits for predicting single individuals joining communities in LJ

Page 28:

Performance achieved with the decision trees

Prediction performance for single individuals joining communities in LJ

Prediction performance for single individuals joining communities in DBLP

Page 29:

Internal connectedness of friends

Individuals whose friends in a community are linked to one another are significantly more likely to join the community

Page 30:

Road map

- Introduction
- Motivation
- Present work contributions
- Related work
- Membership, Growth, Evolution
- Conclusions

Page 31:

Community Growth

- Prediction task: identifying which communities will grow significantly over a given period of time
- Binary classification problem
- Training set, split by community growth: Class 0 (50.6%), Class 1 (49.4%); threshold labels on the slide: >9 %, < 18 %
- Data set: 13,570 communities
- To make predictions: 100 decision trees were built on 100 independent samples using the community features
- Binary splits are installed until a node has fewer than 50 data points

Page 32:

Top two levels of decision tree splits for predicting community growth in LJ

- The features and splits varied depending on the sample, but the top 2 splits were stable

Page 33:

Solution to the problem

- The averaging tree technique was used
- Three baselines with a single feature were considered
  - Size of the community
  - Number of people in the fringe of the community
  - Ratio of these two features
- and a combination of all three features

Page 34:

Results

Predicting community growth: baselines based on three different features, and performance using all features

By including the full set of features, predictions with reasonably good performance were obtained

Page 35:

Road map

- Introduction
- Motivation
- Present work contributions
- Related work
- Membership, Growth, Evolution
- Conclusions

Page 36:

Movement between communities

- How people and topics move between communities
- Fundamental question: given a set of overlapping communities, do topics tend to follow people, or do people tend to follow topics?
- Experiment setup:
  - 87 conferences for which there is DBLP data over at least a 15-year period
  - Cumulative set of words in titles is a proxy for top-level topics

Page 37:

Time Series and Detected Bursts: Term Bursts

[Figure: time series Tw,C(y) for years 2000 to 2005 with detected term bursts. Example shown: OOPSLA’03 papers “Micro-Pattern Evolution” and “Micro-Pattern in Java”, Tw,C(y) = 2/6]

“Micro-Pattern” is hot at OOPSLA in 2003

Page 38:

Time Series and Detected Bursts: Term Bursts

- Tw,C(y): fraction of paper titles at conference C in year y that contain the word w
- Bursts in the usage of w
  - For each time series Tw,C, a burst is an interval in which Tw,C(y) is twice the average rate (“burst rate”)
  - A burst detection technique exploiting a stochastic model for term generation is used
  - Burst intervals serve to identify the “hot topics” (focus of interest at a conference)
- Word w is hot at conference C in year y if year y is contained in a burst interval of the time series Tw,C
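
A rough Python sketch of the term time series follows. Note the simplification: the slide mentions a burst detection technique based on a stochastic model, whereas `hot_years` below only applies the "twice the average rate" threshold year by year. The function names and all titles other than the two "Micro-Pattern" ones are hypothetical.

```python
def term_series(titles_by_year, word):
    """T_{w,C}(y): fraction of titles in each year that contain `word`.

    titles_by_year: dict year -> list of paper titles at one conference C
    """
    series = {}
    for year, titles in titles_by_year.items():
        hits = sum(1 for t in titles if word.lower() in t.lower().split())
        series[year] = hits / len(titles) if titles else 0.0
    return series

def hot_years(series, factor=2.0):
    """Years whose rate is at least `factor` times the average rate.

    This threshold rule is a simplification, not the stochastic burst
    detection model mentioned on the slide.
    """
    average = sum(series.values()) / len(series)
    return sorted(y for y, rate in series.items()
                  if average > 0 and rate >= factor * average)


titles = {
    2001: ["Garbage Collection Revisited", "Type Inference", "Aspect Weaving"],
    2002: ["Refactoring Tools", "Traits", "Dynamic Dispatch"],
    2003: ["Micro-Pattern Evolution", "Micro-Pattern in Java", "Traits Again",
           "Refactoring at Scale", "Generic Types", "Aspect Mining"],
}
series = term_series(titles, "micro-pattern")
hot = hot_years(series)
```

In the toy data, two of the six 2003 titles contain the term, matching the 2/6 value in the slide's example, and 2003 is the only year flagged as hot.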

Page 39:

Time Series and Detected Bursts: Movement Bursts

- Author movement
  - Authors do not publish every year
  - Movement is asymmetric
- Member of a conference C in year y
  - Has published there in the 5 years leading up to y
- Author a moves into C from B in year y (B -> C)
  - a has a paper in conference C in year y, and a is a member of B in year y-1
  - Property of two conferences and a year

[Figure: conferences B (2002) and C (2003); author Smith; paper “Micro-Pattern Evolution”]

Page 40:

Time Series and Detected Bursts: Movement Bursts

[Figure: time series MB,C(y) for years 2000 to 2005 with detected movement bursts. Example shown: authors Dill and Brown, conferences B (2001) and C (2002), MB,C(y) = 2/5]

Page 41:

Time Series and Detected Bursts: Movement Bursts

- MB,C(y): fraction of authors at conference C in year y with the property that they are moving into C from B (B -> C)
- MB,C: time series representing author movement
- B -> C movement burst: an interval of years y in which the value MB,C(y) exceeds the overall average by an absolute difference of 0.10
- Burst detection is used to find burst intervals
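
The membership and movement definitions can be sketched in Python as below. Several details are assumptions: the record format (author, conference, year), whether the 5-year window includes year y itself, and the per-year threshold used in place of a real burst detector. Authors Lee, Kim and Park are invented to fill out the toy data; Dill and Brown echo the names on the figure.

```python
def is_member(author, conference, year, papers, window=5):
    """True if `author` published at `conference` in the `window` years up to `year`.

    papers: set of (author, conference, year) records (hypothetical format).
    Including `year` itself in the window is an assumption made here.
    """
    return any((author, conference, y) in papers
               for y in range(year - window + 1, year + 1))

def movement_fraction(b, c, year, papers):
    """M_{B,C}(y): fraction of authors at C in `year` who were members of B in year - 1."""
    authors_at_c = {a for (a, conf, y) in papers if conf == c and y == year}
    if not authors_at_c:
        return 0.0
    movers = {a for a in authors_at_c if is_member(a, b, year - 1, papers)}
    return len(movers) / len(authors_at_c)

def movement_burst_years(series, margin=0.10):
    """Years where M_{B,C}(y) exceeds the overall average by more than `margin`."""
    average = sum(series.values()) / len(series)
    return sorted(y for y, v in series.items() if v - average > margin)


papers = {
    ("Dill", "B", 2001), ("Brown", "B", 2000),
    ("Dill", "C", 2002), ("Brown", "C", 2002),
    ("Lee", "C", 2002), ("Kim", "C", 2002), ("Park", "C", 2002),
    ("Lee", "C", 2001), ("Kim", "C", 2001),
}
series = {y: movement_fraction("B", "C", y, papers) for y in (2001, 2002)}
bursts = movement_burst_years(series)
```

In 2002, two of the five authors at C (Dill and Brown) were members of B the year before, which reproduces the 2/5 value from the figure.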

Page 42:

Goal of Experiment 1

- Identify how word burst and movement burst intervals are aligned in time
- Word burst intervals identify hot terms
- Movement burst intervals identify conference pairs B, C and the periods during which there was significant movement between them

Page 43:

Experiment 1: Papers contributing to Movement Bursts

- Characteristics of papers associated with some movement burst into a conference C
  - They exhibit different properties from arbitrary papers at C
    - Use of terms currently hot at C
    - Use of terms that will be hot at C in the future
- A paper at C in year y contributes to some movement burst at C if
  - one of its authors is moving B -> C in y, and
  - y is part of a B -> C movement burst

[Figure: movement burst over 2002 to 2004; author Smith, ICPC’02 to OOPSLA’03, with the 2003 paper “Micro-pattern Evolution”]

Page 44:

Papers contributing to Movement Bursts

- A paper uses a hot term if one of the words in its title is hot for the conference and year in which it appears
- Question: do papers contributing to movement bursts differ from arbitrary papers in the way they use hot terms?
- Papers contributing to a movement burst contain elevated frequencies of currently hot and expired hot terms, but lower frequencies of future hot terms
- A burst of authors moving into C from B is drawn to topics currently hot at C

Page 45:

Experiment 2: Alignment between different conferences

- Conferences B and C are topically aligned in a year y if some word is hot at both B and C in year y
  - Property of two conferences and a specific year
- Hypothesis: two conferences are more likely to be topically aligned in a given year if there is also a movement burst going between them

[Figure: the term “Micro-pattern” at both OOPSLA’03 and ICSM’03]

Page 46:

Results

- 56.34% of all triples (B, C, y) such that there is a B -> C movement burst containing year y have the property that B and C are topically aligned in year y
- 16.2% of all triples (B, C, y) have the property that B and C are topically aligned in year y
- The presence of a movement burst between 2 conferences increases the chance that they share a hot term by a factor of about 3.5 (56.34% vs. 16.2%)
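
The comparison behind these two percentages can be sketched as follows: compute the fraction of topically aligned triples over all (B, C, y) triples, and again over only those triples covered by a movement burst. The data below is a hypothetical toy set (the hot term "slicing" and the choice of which triple has a burst are invented), so it does not reproduce the reported figures.

```python
def aligned(hot_terms, b, c, year):
    """B and C are topically aligned in `year` if some word is hot at both."""
    return bool(hot_terms.get((b, year), set()) & hot_terms.get((c, year), set()))

def alignment_rate(triples, hot_terms):
    """Fraction of (B, C, y) triples that are topically aligned."""
    triples = list(triples)
    if not triples:
        return 0.0
    return sum(aligned(hot_terms, b, c, y) for b, c, y in triples) / len(triples)


# Hypothetical hot terms per (conference, year).
hot_terms = {
    ("OOPSLA", 2003): {"micro-pattern"},
    ("ICSM", 2003): {"micro-pattern"},
    ("ICPC", 2003): {"slicing"},
}
all_triples = [("ICSM", "OOPSLA", 2003), ("ICPC", "OOPSLA", 2003),
               ("ICPC", "ICSM", 2003), ("ICSM", "ICPC", 2003)]
burst_triples = [("ICSM", "OOPSLA", 2003)]  # triples inside a movement burst

baseline = alignment_rate(all_triples, hot_terms)
with_burst = alignment_rate(burst_triples, hot_terms)
```

Comparing `with_burst` against `baseline` is the same kind of comparison as 56.34% against 16.2% on the slide.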

Page 47:

Do movement bursts or term bursts come first?

- Setting: there is a B -> C movement burst, and a hot term w such that B and C are topically aligned via w in some year y inside the movement burst
- 3 events of interest
  - The start of the burst for w at conference B
  - The start of the burst for w at conference C
  - The start of the B -> C movement burst

Page 48:

Four patterns of author movement and topical alignment

[Figure: timelines of the B -> C movement burst and the term burst intervals for the four patterns; counts shown: 194, 32, 35, 61]

- Shared interest is 50% more frequent than the others
- It is much more frequent for B and C to have a shared burst term that is already underway before the increase in author movement takes place

Page 49:

Conclusions

- The ways in which communities in social networks grow over time were considered
  - At the level of individuals and their decisions to join communities
  - At a more global level, in which a community can evolve in membership and content

Page 50:

Thank you!