Classifying Online Social Network Users Through the Social ...conferences.telecom-bretagne.eu/fps2012/program/slides/07.pdf · 1 Introduction 2 Classi er proposal 3 The experiments

Classifying Online Social Network Users Through the Social Graph

Cristina Pérez Solà and Jordi Herrera Joancomartí

Departament d’Enginyeria de la Informació i les Comunicacions
Universitat Autònoma de Barcelona

October 25th, 2012

1 Introduction

2 Classifier proposal

3 The experiments

4 Conclusions and further work

About the title

Classifying...

Definition

Classification is the problem of identifying to which of a set of categories a new observation belongs. The decision is made on the basis of a training set of data containing observations whose category membership is already known.

About the title

... Online Social Network Users...

About the title

...Through the Social Graph

Definition

A social graph is a graph where nodes represent users in a social network and edges represent relationships between these users.

What do we want to do?

Goals

Design a user (node) classifier that uses the graph structure alone (no semantic information is needed).

Apply the previously designed classifier to label OSN users.

Demonstrate that OSN user classification is possible with naively anonymized graphs.

Why is it interesting?

Motivation

User classification as a privacy attack

User classification allows an attacker to infer (private) attributes of the user.

Attributes may be sensitive by themselves.

Attribute disclosure may have undesirable consequences for the user.

In any case, users can no longer control the disclosure of information about themselves.

Architecture overview

Classifier Architecture

The proposed classifier is implemented with a five-module architecture, which includes two different classifiers: an initial classifier and a relational classifier.

[Figure: block diagram of the five modules (data preprocessing, initial classifier, neighborhood analysis, data preprocessing, relational classifier). Labeled data flows: clustering coefficient & degrees, class labels, new class labels.]

Classifier modules

Initial classifier

The initial classifier analyzes the graph structure and maps each node to a 2-dimensional sample: degree and clustering coefficient. The output is an initial assignment of nodes to categories.
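
The mapping from nodes to 2-dimensional samples can be sketched as follows. This is only an illustration under simplifying assumptions: the graph is treated as undirected (the slides later note that the real graph is directed), and the names `structural_features` and `toy_graph` are hypothetical, not taken from the authors' implementation.

```python
from itertools import combinations


def structural_features(adj):
    """Map each node to (degree, local clustering coefficient).

    adj: dict mapping node -> set of neighbor nodes
         (undirected graph, no self-loops).
    """
    features = {}
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            # With fewer than two neighbors no neighbor pair can be linked.
            features[node] = (k, 0.0)
            continue
        # Count the pairs of neighbors that are themselves connected.
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        features[node] = (k, 2.0 * links / (k * (k - 1)))
    return features


# Toy graph: a triangle a-b-c plus a pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_features = structural_features(toy_graph)
```

Each resulting pair would then be one sample for the initial classifier.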

Classifier modules

Neighborhood analysis

The neighborhood analysis module reports, for every node, which kinds of nodes it is connected to, using the labels assigned by the initial classifier.
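
One possible reading of this step is sketched below. It assumes that the report is the fraction of a node's neighbors carrying each label; the slides do not say whether counts or fractions are used. The function name `neighborhood_profile` and the toy data are hypothetical.

```python
def neighborhood_profile(adj, labels, categories):
    """For each node, return the fraction of its neighbors per category.

    adj:        dict mapping node -> set of neighbor nodes
    labels:     dict mapping node -> category assigned by the initial classifier
    categories: ordered list of category names (fixes the vector layout)
    """
    profile = {}
    for node, neighbors in adj.items():
        counts = [0] * len(categories)
        for neighbor in neighbors:
            counts[categories.index(labels[neighbor])] += 1
        total = len(neighbors)
        # Isolated nodes get an all-zero profile.
        profile[node] = [c / total if total else 0.0 for c in counts]
    return profile


categories = ["individual", "company"]
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_labels = {"a": "individual", "b": "company", "c": "individual", "d": "company"}
toy_profile = neighborhood_profile(toy_graph, toy_labels, categories)
```

With one entry per category, the profile grows with the number of categories.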

Classifier modules

Relational classifier

The relational classifier maps users to n-dimensional samples, using both the degree and clustering coefficient and the neighborhood information to classify users. The output is a new assignment of nodes to categories, which can differ from the initial classification.

Specific design details

Some details about the classifier

The graph is directed, so we distinguish between indegree and outdegree (instead of having just degree).

This distinction increases by 2 the number of dimensions in the neighborhood analysis.

We can have as many categories as we want: we just have to add more dimensions!

Classifiers are instantiated as Support Vector Machines with soft margins.

The relational classifier is applied iteratively.
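
The iterative application of the relational classifier might look roughly like the sketch below. It is not the authors' implementation: to stay dependency-free, a nearest-centroid rule stands in for the soft-margin SVM, the graph is treated as undirected, features are left unscaled, and every name (`relational_iterations`, `fit_centroids`, `predict`) is hypothetical.

```python
def fit_centroids(samples, labels):
    """Stand-in classifier (NOT an SVM): one mean vector per category."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the category whose centroid is closest to sample x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda y: sq_dist(centroids[y]))


def relational_iterations(adj, base, known, initial, categories, rounds=10):
    """Re-label unknown nodes until labels stop changing or rounds run out.

    adj:     node -> set of neighbors
    base:    node -> tuple of structural features (e.g. degree, clustering)
    known:   node -> true label, for training nodes only
    initial: node -> label from the initial classifier, for all nodes
    """
    labels = dict(initial)
    labels.update(known)  # training labels are never overwritten
    for _ in range(rounds):
        samples = {}
        for node, nbs in adj.items():
            # Neighborhood part: fraction of neighbors in each category.
            frac = [sum(labels[n] == c for n in nbs) / max(len(nbs), 1)
                    for c in categories]
            samples[node] = list(base[node]) + frac
        model = fit_centroids([samples[n] for n in known],
                              [known[n] for n in known])
        new_labels = {n: predict(model, samples[n]) for n in adj if n not in known}
        new_labels.update(known)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


toy_graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
toy_base = {"a": (2, 1.0), "b": (2, 1.0), "c": (3, 1.0 / 3.0), "d": (1, 0.0)}
toy_known = {"a": "individual", "d": "company"}
toy_initial = {"a": "individual", "b": "individual", "c": "company", "d": "company"}
toy_result = relational_iterations(
    toy_graph, toy_base, toy_known, toy_initial, ["individual", "company"])
```

In this toy run node c changes label after the first round because its sample is closer to the "individual" centroid, which illustrates how the new assignment can differ from the initial one.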

Experiment design

The main goal

Research question

Is an attacker able to recover attributes of OSN users knowing just the social graph structure and the attributes of a small subset of the nodes in the graph?

We are facing a within-network classification problem, where nodes for which the labels are unknown are linked to nodes for which the label is known.

Experiment design

Data used in the experiments

We collected data from 936,423 Twitter users, which were all the neighbors of a subset of 300 nodes.

We constructed two disjoint graphs G1 = (V1, E1) and G2 = (V2, E2) with users and their relationships.

We labeled the nodes of the graphs to obtain the ground truth:

Binary classification: individual or company.

Multiclass classification: normal user, blogger, celebrity, media and organization.

Experiment design

An experiment

Each of the experiments consisted of:

Randomly selecting a subset of nodes (Vtrain) to be used as training samples: 65%, 50%, 35% and 20% of nodes.

Training the classifiers with those samples.

Classifying the rest of the nodes (Vtest = V \ Vtrain).

Evaluating the overall performance using the ground truth.

We performed 100 experiments for each of the training set sizes and for both classification problems.
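
The protocol above can be sketched as a small harness. This is a hedged illustration only: averaging the correct rate over repetitions is an assumption about how results were summarized, and `run_experiments`, `split_nodes`, `correct_rate` and the placeholder `majority_classifier` are hypothetical names, not the authors' code.

```python
import random


def split_nodes(nodes, train_fraction, rng):
    """Randomly split nodes into disjoint (Vtrain, Vtest) sets."""
    shuffled = sorted(nodes)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return set(shuffled[:cut]), set(shuffled[cut:])


def correct_rate(predicted, truth, test_nodes):
    """Fraction of test nodes whose predicted label matches the ground truth."""
    hits = sum(predicted[n] == truth[n] for n in test_nodes)
    return hits / len(test_nodes)


def run_experiments(nodes, truth, classify,
                    fractions=(0.65, 0.50, 0.35, 0.20),
                    repetitions=100, seed=0):
    """Return the mean correct rate for each training set size.

    classify(known, test_nodes) must return a dict node -> predicted label.
    """
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        rates = []
        for _ in range(repetitions):
            train, test = split_nodes(nodes, fraction, rng)
            predicted = classify({n: truth[n] for n in train}, test)
            rates.append(correct_rate(predicted, truth, test))
        results[fraction] = sum(rates) / len(rates)
    return results


def majority_classifier(known, test_nodes):
    """Placeholder classifier: always predicts the most common training label."""
    values = list(known.values())
    winner = max(sorted(set(values)), key=values.count)
    return {n: winner for n in test_nodes}


# Synthetic ground truth, for illustration only (not the Twitter data).
toy_truth = {i: ("company" if i % 4 == 0 else "individual") for i in range(40)}
toy_results = run_experiments(list(toy_truth), toy_truth,
                              majority_classifier, repetitions=5)
```

Any classifier with the same `classify(known, test_nodes)` shape could be plugged in place of the placeholder.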

Experiment results

Binary Classification Results

[Figure: correct rate (y-axis, 0.5 to 0.75) versus iteration (x-axis, 0 to 10) for datasets D1 and D2, each with 65%, 50%, 35% and 20% of nodes used for training.]

Experiment results

Multiclass Classification Results

[Figure: correct rate (y-axis, 0.3 to 0.6) versus iteration (x-axis, 0 to 10) for the Cata dataset with 65%, 50%, 35% and 20% of nodes used for training.]

Conclusions

Conclusions

Information found in the social graph is enough to perform classification.

It is possible to classify OSN users using a naively anonymized copy of a social graph.

Naive anonymization does not protect OSN users from attribute disclosure.

Success rate varies depending on the training set size.

Further work

Further work

Integrate both structural and semantic information to improve classification.

Study the impact of different graph anonymization techniques (other than the naive anonymization) on the classification.

Analyze the performance of other classification techniques for relational data.

Linear SVM

Non-linear SVM
