Perceived versus Actual Predictability of Personal Information
in Social Networks
Eleftherios (Lefteris) Spyromitros-Xioufis (1), Georgios Petkos (1), Symeon Papadopoulos (1), Rob Heyman (2), Yiannis Kompatsiaris (1)
(1) Center for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI)
(2) iMinds-SMIT, Vrije Universiteit Brussel, Brussels, Belgium
INSCI 2016, Sep 12-14, 2016, Florence, Italy
Disclosure of Personal Information in OSNs
- Online Social Networks (OSNs) have had a transformative impact!
  - People use them for communication, as a news source, to do business, etc.
- However, participation in OSNs comes at a price!
  - User-related data is shared with: a) other OSN users, b) the OSN itself, c) third parties (e.g. ad networks)
  - Disclosure of specific types of data (e.g. gender, age, ethnicity, political or religious beliefs, sexual preferences, financial status) has implications, e.g. unjustified discrimination in personnel selection or loan approval
- Information need not be explicitly disclosed!
  - Several types of personal information can be accurately inferred from implicit cues (e.g. Facebook likes) using machine learning!
Inferring Personal Information
[1] Kosinski, et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
[2] Schwartz, et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 2013.
Inferred Information & Privacy in OSNs
- The study of user awareness of inferred information has been largely neglected by social research on OSN privacy
- Privacy is usually presented as a question of giving access to, or communicating, personal information to a particular party
  - E.g. Westin's [1] definition of privacy: "The claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others."
- However, access control is non-existent for inferred information:
  - Users are unaware of the inferences being made
  - Users have no control over the logic behind them
- Aim of our work: investigate if and how users intuitively grasp what can be inferred from their disclosed data!
[1] Alan Westin. Privacy and freedom. Bodley Head, London, 1970.
Main Research Questions
Our study attempts to answer the following questions:
- Q1. Predictability: How predictable are different types of personal information, based on users' OSN data?
- Q2. Actual vs perceived predictability: How realistic are user perceptions about the predictability of their personal information?
- Q3. Predictability vs sensitivity: What is the relationship between the perceived sensitivity and the predictability of personal information?
Previous work has focused mainly on Q1. We address Q1 using a variety of data and methods, and additionally address Q2 and Q3.
What data is needed for this study?
We collected 3 types of data about 170 Facebook users:
1. OSN data: likes, posts, images
   - Collected through a test Facebook application (Databait, https://databait.hwcomms.com, developed within the USEMP FP7 project, http://www.usemp-project.eu/)
2. Answers to questions about 96 personal attributes, organized into 9 categories (disclosure dimensions; see http://usemp-mklab.iti.gr/usemp/prepilot_survey_data_statistics.pdf)
   - E.g. health factors, sexual orientation, income, political attitude, etc.
3. Answers to questions about their perceptions of the predictability and sensitivity of the 9 disclosure dimensions
What is the purpose of each data type?
- 1 & 2 allow assessing the actual predictability of personal information
  - They serve as training sets for supervised learning algorithms
- 3 facilitates a comparison between actual predictability and perceived predictability/sensitivity of personal information
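As a rough illustration of how data types 1 and 2 could be paired into a training set, the sketch below joins per-user feature vectors with questionnaire answers and skips users without a usable answer. The function name, the dictionary layout, and the toy values are assumptions made for this example; the slides do not describe the actual pipeline.

```python
def build_training_set(features_by_user, answers_by_user, attribute):
    """Pair OSN feature vectors with questionnaire answers for one attribute.

    Hypothetical helper: users with a missing or "n/a" answer are skipped,
    since they provide no ground-truth label for supervised learning.
    """
    X, y = [], []
    for user_id, features in sorted(features_by_user.items()):
        answer = answers_by_user.get(user_id, {}).get(attribute)
        if answer is None or answer == "n/a":
            continue
        X.append(features)
        y.append(answer)
    return X, y


# Toy data (invented for illustration only).
features_by_user = {
    "u1": [1, 0, 1],
    "u2": [0, 0, 1],
    "u3": [1, 1, 0],
}
answers_by_user = {
    "u1": {"political_attitude": "left"},
    "u2": {"political_attitude": "n/a"},
    "u3": {"political_attitude": "right"},
}

X, y = build_training_set(features_by_user, answers_by_user, "political_attitude")
```

The resulting `X` and `y` could then be passed to any supervised learning algorithm; which classifiers the study actually used is not recoverable from this text.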
Example from the questionnaire

"What is your sexual orientation?" (ground truth)

Response        No. of participants
heterosexual    147
homosexual      14
bisexual        7
n/a             2

"Do you think the information on your Facebook profile reveals your sexual orientation? Either because you yourself have put it online, or it could be inferred from a combination of posts." (measures perceived predictability)

Response        No. of participants
yes             134
no              33
n/a             3

"How sensitive do you find the information you had to reveal about your sexual orientation in the previous section? (1 = not sensitive at all, 7 = very sensitive)" (measures perceived sensitivity)
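The response counts above can be turned into shares with a few lines of arithmetic. The sketch below uses only the counts reported on the slide; the `shares` helper and the idea of reading the largest share as a majority-class reference point are additions for illustration, not something the slides state.

```python
def shares(counts):
    """Convert response counts into fractions of all participants."""
    total = sum(counts.values())
    return {response: n / total for response, n in counts.items()}


# Counts as reported on the slide.
orientation = {"heterosexual": 147, "homosexual": 14, "bisexual": 7, "n/a": 2}
reveals = {"yes": 134, "no": 33, "n/a": 3}

orientation_shares = shares(orientation)
reveals_shares = shares(reveals)

# Share of the most common answer: what a predictor that always outputs
# the majority response would get right (an illustrative reference point).
majority_share = max(orientation_shares.values())
```

Both tables sum to the 170 participants, and roughly 79% of them believe their profile reveals their sexual orientation.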
Predictive Attributes Extracted from OSN Data
- likes: binary vector denoting presence/absence of a like (#3.6K)
- likesCats: histogram of like category frequencies (#191)
- likesTerms: Bag-of-Words (BoW) of terms in the description, title and about sections of likes (#62.5K)
- msgTerms: BoW vector of terms in user posts (#25K)
- lda-t: distribution of topics in the textual contents of both likes (description, title and about section) and posts
  - Latent Dirichlet Allocation with t = 20, 30, 50, 100
- visual: concepts depicted in user images (#11.9K)
  - Detected using a CNN, top 12 concepts per image, 3 variants
  - visual-bin: hard 0/1 encoding
  - visual-freq: concept frequency histogram
  - visual-conf: sum of detection scores across all images
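To make the encodings above more concrete, here is a minimal sketch of the likes vector and the three visual variants for a single user. The function names, the input layout (a list of (concept, score) pairs per image), and the toy values are assumptions for this example; the slides do not specify how the features were implemented, and the frequency variant here simply counts detections.

```python
from collections import Counter


def encode_likes(user_likes, vocabulary):
    """Binary presence/absence vector over a fixed vocabulary of likes."""
    liked = set(user_likes)
    return [1 if page in liked else 0 for page in vocabulary]


def encode_visual(images, concepts):
    """Return (visual_bin, visual_freq, visual_conf) for one user.

    `images` holds one list per image, each containing (concept, score)
    pairs for the concepts detected in that image (assumed layout).
    """
    freq = Counter()  # number of detections of each concept
    conf = Counter()  # sum of detection scores across all images
    for detections in images:
        for concept, score in detections:
            freq[concept] += 1
            conf[concept] += score
    visual_bin = [1 if freq[c] > 0 else 0 for c in concepts]
    visual_freq = [freq[c] for c in concepts]
    visual_conf = [conf[c] for c in concepts]
    return visual_bin, visual_freq, visual_conf


# Toy data (invented for illustration only).
vocabulary = ["page_a", "page_b", "page_c"]
likes_vec = encode_likes(["page_c", "page_a"], vocabulary)

concepts = ["beach", "dog", "car"]
images = [
    [("beach", 0.5), ("dog", 0.25)],
    [("beach", 0.25)],
]
visual_bin, visual_freq, visual_conf = encode_visual(images, concepts)
```

The three visual variants share one pass over the detections and differ only in what is kept per concept: a flag, a count, or an accumulated score.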
Experimental Setup
Results 1: Evaluating Classifiers
Results 2: Evaluating Features
Results 3: Combining Features
Results 4: Best Performance per Attribute
Ranking of Dimensions

Rank | Perceived predictability                  | Actual predictability (rank change vs perceived)       | Actual predictability according to [1]
1    | Demographics                              | Demographics (no change)                               | Demographics
2    | Relationship status and living condition  | Political views (+3)                                   | Political views
3    | Sexual orientation                        | Sexual orientation (no change)                         | Religious views
4    | Consumer profile                          | Employment/Income (+4)                                 | Sexual orientation
5    | Political views                           | Consumer profile (-1)                                  | Health status
6    | Personality traits                        | Relationship status and living condition (-4)          | Relationship status and living condition
7    | Religious views                           | Religious views (no change)                            |
8    | Employment/Income                         | Health status (+1)                                     |
9    | Health status                             | Personality traits (-3)                                |

[1] Kosinski, et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
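The rank changes shown next to the actual-predictability column can be recomputed from the two rankings. The sketch below is an illustration only: the two lists are transcribed from the table, and the `rank_shifts` helper and its sign convention (positive means a dimension ranks higher in actual than in perceived predictability) are assumptions chosen to match the slide's +/- notation.

```python
# Rankings transcribed from the table, most predictable first.
perceived = [
    "Demographics",
    "Relationship status and living condition",
    "Sexual orientation",
    "Consumer profile",
    "Political views",
    "Personality traits",
    "Religious views",
    "Employment/Income",
    "Health status",
]
actual = [
    "Demographics",
    "Political views",
    "Sexual orientation",
    "Employment/Income",
    "Consumer profile",
    "Relationship status and living condition",
    "Religious views",
    "Health status",
    "Personality traits",
]


def rank_shifts(perceived, actual):
    """Perceived rank minus actual rank for every dimension."""
    perceived_rank = {dim: i for i, dim in enumerate(perceived, start=1)}
    return {dim: perceived_rank[dim] - i for i, dim in enumerate(actual, start=1)}


shifts = rank_shifts(perceived, actual)

# Dimensions that are more predictable than participants believe.
underestimated = [dim for dim in actual if shifts[dim] > 0]
```

With these inputs, the dimensions whose predictability is underestimated are political views, employment/income and health status.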
Perceived/Actual Predictability vs Sensitivity
Conclusions & Future Work
Conclusions
- Users hold both correct and incorrect perceptions about predictability
  - The predictability of sensitive information is underestimated
- Sophisticated privacy assistance tools are needed
  - They should support users in managing the disclosure of personal information
  - Databait: a privacy assistance tool (still in beta mode)
Thank you!
Resources
- Code/models: https://github.com/MKLab-ITI/usemp-pscore
- Databait: https://databait.hwcomms.com
Contact us
- http://www.usemp-project.eu/
- @sympap, @kompats