Slides: conferences.telecom-bretagne.eu/fps2012/program/slides/07.pdf

  • Classifying Online Social Network Users Through the Social Graph

    Cristina Pérez Solà and Jordi Herrera Joancomartí

    Departament d’Enginyeria de la Informació i les Comunicacions
    Universitat Autònoma de Barcelona

    October 25th, 2012

  • Introduction Classifier proposal The experiments Conclusions and further work

    1 Introduction

    2 Classifier proposal

    3 The experiments

    4 Conclusions and further work


  • About the title

    Classifying...

    Definition

    Classification is the problem of identifying to which of a set of categories a new observation belongs. The decision is made on the basis of a training set of data containing observations whose category membership is already known.

  • About the title

    ... Online Social Network Users...

  • About the title

    ...Through the Social Graph

    Definition

    A social graph is a graph where nodes represent users in a social network and edges represent relationships between these users.
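
    As an illustration of the definition, a social graph can be held as two maps from each user to a set of users. This is a minimal sketch: the direction of the edges (a "follows" relationship), the user names and the function name `build_graph` are assumptions made for the example.

```python
def build_graph(edges):
    """Build successor and predecessor maps from a list of (source, target) edges."""
    successors, predecessors = {}, {}
    for source, target in edges:
        # Register both endpoints so that every user appears in both maps.
        for node in (source, target):
            successors.setdefault(node, set())
            predecessors.setdefault(node, set())
        successors[source].add(target)
        predecessors[target].add(source)
    return successors, predecessors


# Hypothetical users: an edge (u, v) is read here as "u follows v".
edges = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]
successors, predecessors = build_graph(edges)
print(sorted(successors["alice"]))
print(sorted(predecessors["alice"]))
```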

  • What do we want to do?

    Goals

    Design a user (node) classifier that uses the graph structure alone (no semantic information is needed).

    Apply the previously designed classifier to label OSN users.

    Demonstrate that OSN user classification is possible with naively anonymized graphs.

  • Why is it interesting?

    Motivation

    User classification as a privacy attack

    User classification allows an attacker to infer (private) attributes of the user.

    Attributes may be sensitive by themselves.

    Attribute disclosure may have undesirable consequences for the user.

    In any case, the user can no longer control the disclosure of information about themselves.

  • 2 Classifier proposal

    Architecture overview

    Classifier modules

    Specific design details

  • Architecture overview

    Classifier Architecture

    The proposed classifier is implemented with a five-module architecture, which includes two different classifiers: an initial classifier and a relational classifier.

    [Figure: architecture diagram. Blocks: data preprocessing, initial classifier, neighborhood analysis, data preprocessing, relational classifier. Other labels: clustering coefficient & degrees, class labels, new class labels.]

  • Classifier modules

    Initial classifier

    The initial classifier analyzes the graph structure and maps each node to a 2-dimensional sample: degree and clustering coefficient. The output is an initial assignment of nodes to categories.
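
    The mapping from nodes to 2-dimensional samples could look like the sketch below. It assumes an undirected graph stored as an adjacency map and uses the standard local clustering coefficient; the function name `structural_features` and the toy graph are hypothetical. The classification step itself is not shown.

```python
from itertools import combinations


def structural_features(adjacency):
    """Map each node to a (degree, local clustering coefficient) pair."""
    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        if degree < 2:
            # With fewer than two neighbors no triangle is possible.
            clustering = 0.0
        else:
            # Count the pairs of neighbors that are themselves connected.
            links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
            clustering = 2.0 * links / (degree * (degree - 1))
        features[node] = (degree, clustering)
    return features


# Toy undirected graph: a triangle a-b-c plus a pendant node d attached to a.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
features = structural_features(adjacency)
for node in sorted(features):
    print(node, features[node])
```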

  • Classifier modules

    Neighborhood analysis

    The neighborhood analysis module reports which kinds of nodes each node is connected to, using the labels assigned by the initial classifier.
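
    One way to sketch this module is to count, for every node, how many of its neighbors carry each label. Whether the original module reports raw counts or some normalized value is not stated in the slides, so raw counts are an assumption here, as are the function name `neighborhood_profile` and the toy labels.

```python
def neighborhood_profile(adjacency, labels, categories):
    """For each node, count its neighbors per category, in the order of `categories`."""
    profile = {}
    for node, neighbors in adjacency.items():
        counts = [0] * len(categories)
        for neighbor in neighbors:
            counts[categories.index(labels[neighbor])] += 1
        profile[node] = counts
    return profile


categories = ["individual", "company"]
adjacency = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
# Labels as an initial classifier might have assigned them (made up for the example).
labels = {"a": "company", "b": "individual", "c": "individual", "d": "company"}
profile = neighborhood_profile(adjacency, labels, categories)
print(profile["a"])
```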

  • Classifier modules

    Relational classifier

    The relational classifier maps users to n-dimensional samples, using both degree and clustering coefficient and the neighborhood information to classify users. The output is a new assignment of nodes to categories, which can differ from the initial classification.
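
    A sketch of how such n-dimensional samples might be assembled: the two structural values are concatenated with the per-category neighbor counts. The exact layout of the n dimensions is not given in the slides; the ordering below, the function name `relational_samples` and the input values are assumptions.

```python
def relational_samples(features, profile):
    """Concatenate (degree, clustering) with the neighbor counts of each node."""
    samples = {}
    for node, (degree, clustering) in features.items():
        samples[node] = [float(degree), float(clustering)] + [float(c) for c in profile[node]]
    return samples


# Hypothetical inputs: structural features and neighbor label counts for two categories.
features = {"a": (3, 1.0 / 3.0), "b": (2, 1.0)}
profile = {"a": [2, 1], "b": [1, 1]}
samples = relational_samples(features, profile)
print(samples["b"])
```

    With two categories and an undirected graph, each sample in this sketch has 2 + 2 = 4 dimensions.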

  • Specific design details

    Some details about the classifier

    The graph is directed, so we distinguish between indegree and outdegree (instead of having just degree).

    This distinction increases by 2 the number of dimensions in the neighborhood analysis.

    We can have as many categories as we want: we just have to add more dimensions!

    Classifiers are instantiated with Support Vector Machines with soft margins.

    The relational classifier is applied iteratively.
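
    The iterative application can be sketched as follows, under loudly stated assumptions: a nearest-centroid rule stands in for the soft-margin SVM, the samples contain only neighbor label counts (degree and clustering coefficient are left out for brevity), the graph is treated as undirected, and the number of iterations is fixed. All function names are hypothetical.

```python
def neighbor_counts(adjacency, labels, categories):
    """Count, for every node, how many neighbors currently carry each category."""
    return {
        node: [sum(1 for n in neighbors if labels[n] == c) for c in categories]
        for node, neighbors in adjacency.items()
    }


def nearest_centroid_fit(samples, labels):
    """Stand-in for SVM training: average the training samples of each class."""
    sums, counts = {}, {}
    for sample, label in zip(samples, labels):
        total = sums.setdefault(label, [0.0] * len(sample))
        for i, value in enumerate(sample):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def nearest_centroid_predict(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(sorted(centroids), key=lambda label: squared_distance(centroids[label]))


def iterate_relational(adjacency, known, initial, categories, iterations=3):
    """Re-label the unknown nodes `iterations` times from their neighbors' labels."""
    labels = {**initial, **known}  # known labels always override initial guesses
    train_nodes = sorted(known)
    for _ in range(iterations):
        counts = neighbor_counts(adjacency, labels, categories)
        centroids = nearest_centroid_fit(
            [counts[n] for n in train_nodes], [known[n] for n in train_nodes]
        )
        updated = dict(known)
        for node in adjacency:
            if node not in known:
                updated[node] = nearest_centroid_predict(centroids, counts[node])
        labels = updated
    return labels


# Toy graph: two triangles; one node per triangle has a known label.
categories = ["individual", "company"]
adjacency = {
    "i1": {"i2", "i3"}, "i2": {"i1", "i3"}, "i3": {"i1", "i2"},
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2"},
}
known = {"i1": "individual", "c1": "company"}
# Initial guesses for the unknown nodes; c2 starts with a wrong label on purpose.
initial = {"i2": "individual", "i3": "individual", "c2": "individual", "c3": "company"}
result = iterate_relational(adjacency, known, initial, categories)
print(result)
```

    In this toy run the wrong initial label of c2 is corrected after the first pass, which illustrates why the new assignment can differ from the initial one.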

  • 3 The experiments

    Experiment design

    Experiment results

  • Experiment design

    The main goal

    Research question

    Is an attacker able to recover attributes of OSN users knowing just the social graph structure and the attributes of a small subset of the nodes in the graph?

    We are facing a within-network classification problem, where nodes whose labels are unknown are linked to nodes whose labels are known.

  • Experiment design

    Data used in the experiments

    We collected data from 936,423 Twitter users, which were all the neighbors of a subset of 300 nodes.

    We constructed two disjoint graphs G1 = (V1, E1) and G2 = (V2, E2) with users and their relationships.

    We labeled the nodes of the graphs to obtain the ground truth:

      Binary classification: individual or company.

      Multiclass classification: normal user, blogger, celebrity, media and organization.

  • Experiment design

    An experiment

    Each of the experiments consisted of:

    Randomly selecting a subset of nodes (Vtrain) to be used as training samples: 65%, 50%, 35% and 20% of nodes.

    Training the classifiers with those samples.

    Classifying the rest of the nodes (Vtest = V \ Vtrain).

    Evaluating the overall performance using the ground truth.

    We performed 100 experiments for each of the training set sizes and for both classification problems.
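
    The split and evaluation steps above can be sketched as follows. The training and classification steps are replaced by a placeholder that labels every test node "individual", and the node names, the toy ground truth, the random seed and the function names are all assumptions made for the example.

```python
import random


def split_nodes(nodes, train_fraction, rng):
    """Randomly split the nodes into a training set and a disjoint test set."""
    shuffled = sorted(nodes)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return set(shuffled[:cut]), set(shuffled[cut:])


def correct_rate(predicted, ground_truth, test_nodes):
    """Fraction of test nodes whose predicted label matches the ground truth."""
    hits = sum(1 for n in test_nodes if predicted[n] == ground_truth[n])
    return hits / len(test_nodes)


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
nodes = [f"u{i}" for i in range(20)]
# Toy ground truth: every fourth node is a company.
ground_truth = {n: ("company" if i % 4 == 0 else "individual") for i, n in enumerate(nodes)}

for fraction in (0.65, 0.50, 0.35, 0.20):
    train, test = split_nodes(nodes, fraction, rng)
    # Placeholder for "train the classifiers, then classify the rest of the nodes".
    predicted = {n: "individual" for n in test}
    print(fraction, len(train), len(test), correct_rate(predicted, ground_truth, test))
```

    Repeating the loop body many times per training set size and averaging the correct rate would mirror the repeated experiments described above.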

  • Experiment results

    Binary Classification Results

    [Figure: correct rate (y-axis, 0.5 to 0.75) against iteration (x-axis, 0 to 10). Series: D1 and D2, each with 65%, 50%, 35% and 20% of nodes used for training.]

  • Experiment results

    Multiclass Classification Results

    [Figure: correct rate (y-axis, 0.3 to 0.6) against iteration (x-axis, 0 to 10). Series: Cata with 65%, 50%, 35% and 20% of nodes used for training.]

  • 4 Conclusions and further work

  • Conclusions

    Information found in the social graph is enough to perform classification.

    It is possible to classify OSN users using a naively anonymized copy of a social graph.

    Naive anonymization does not protect OSN users from attribute disclosure.

    Success rate varies depending on the training set size.

  • Further work

    Integrate both structural and semantic information to improve classification.

    Study the impact of different graph anonymization techniques (other than the naive anonymization) on the classification.

    Analyze the performance of other classification techniques for relational data.

  • Appendix: Linear SVM

  • Appendix: Non-linear SVM