Personalized Privacy-aware Image Classification. Eleftherios (Lefteris) Spyromitros-Xioufis (1), Symeon Papadopoulos (1), Adrian Popescu (2), Yiannis Kompatsiaris (1). (1) Center for Research and Technology Hellas – Information Technologies Institute (CERTH-ITI); (2) CEA-LIST. ICMR 2016, June 6-9, 2016, New York.

Personalized Privacy-Aware Image Classification

Jan 13, 2017

Transcript
Page 1: Personalized Privacy-Aware Image Classification

Personalized Privacy-aware Image Classification

Eleftherios (Lefteris) Spyromitros-Xioufis (1), Symeon Papadopoulos (1), Adrian Popescu (2), Yiannis Kompatsiaris (1)

(1) Center for Research and Technology Hellas – Information Technologies Institute (CERTH-ITI)
(2) CEA-LIST

ICMR 2016, June 6-9, 2016, New York

[Figure labels: children, drinking, erotic, relatives, vacations, wedding]

Page 2: Personalized Privacy-Aware Image Classification

Personalized image privacy classification

Photo sharing may compromise privacy

Can we make photo sharing safer?
• Yes: build “private” image detectors
• Alert the user whenever a “private” image is shared
• Personalization is needed because privacy is subjective!

- Would you share such an image?
- It depends:
  • Teenager?
  • Life insurance?

Page 3: Personalized Privacy-Aware Image Classification

Previous work & limitations

1. Focus on a generic (“community”) notion of privacy
• Models trained on PicAlert [1]
  • Flickr images annotated according to a common privacy definition
• Consequences:
  • Variability in user perceptions is not captured
  • Overoptimistic performance estimates

2. Justifications are hardly comprehensible

[1] Zerr et al., I know what you did last summer!: Privacy-aware image classification and search, CIKM, 2012.

Page 4: Personalized Privacy-Aware Image Classification

Our main contributions

Study personalization in image privacy classification

• Compare personalized vs generic models

• Compare two types of personalized models

Semantic visual features

• Better justifications and privacy insights

YourAlert: more realistic than existing benchmarks

Page 5: Personalized Privacy-Aware Image Classification

Personalization approaches

1. Full personalization:
• A different model for each user, relying only on that user's feedback
• Disadvantage: requires a lot of feedback

2. Partial personalization:
• Models rely on the user's feedback plus feedback from other users
• The amount of personalization is controlled via instance weighting
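
The instance-weighting idea behind partial personalization can be illustrated with a minimal sketch. The slides do not give the exact weighting scheme, so the function names, the constant weight of 1.0 for other users' examples, and the `user_share` helper are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# (feature vector, label) where label 1 = private and 0 = public
Example = Tuple[Sequence[float], int]


def build_hybrid_training_set(
    user_examples: List[Example],
    other_examples: List[Example],
    user_weight: float,
) -> Tuple[List[Example], List[float]]:
    """Pool both sources and return one instance weight per pooled example.

    Assumption: examples from other users get weight 1.0 and the target
    user's examples get `user_weight`; a larger value means more personalization.
    """
    if user_weight <= 0:
        raise ValueError("user_weight must be positive")
    pooled = list(user_examples) + list(other_examples)
    weights = [user_weight] * len(user_examples) + [1.0] * len(other_examples)
    return pooled, weights


def user_share(n_user: int, n_other: int, user_weight: float) -> float:
    """Fraction of the total instance weight carried by the user's own examples."""
    total = user_weight * n_user + n_other
    return user_weight * n_user / total if total else 0.0
```

The returned weights could then be passed to any classifier that accepts per-instance weights. With 1 user example, 2 other examples and `user_weight=10`, the user's example carries 10/12 of the total weight.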

Page 6: Personalized Privacy-Aware Image Classification

Visual and Semantic Features

vlad [1]: aggregation of local image descriptors

cnn [2]: deep visual features

semfeat [3]: outputs of ~17K concept detectors

• Trained using cnn

• Top 100 concepts per image

[1] Spyromitros-Xioufis et al., A comprehensive study over vlad and product quantization in large-scale image retrieval, IEEE Transactions on Multimedia, 2014.
[2] Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, ArXiv, 2014.
[3] Ginsca et al., Large-Scale Image Mining with Flickr Groups, MultiMedia Modeling, 2015.

Page 7: Personalized Privacy-Aware Image Classification

Justifications via semfeat

[Figure: tag cloud of semfeat concepts: knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian]

semfeat can be used to justify predictions
• A tag cloud of the most discriminative visual concepts

Justifications can be noisy
• Concept detectors are not perfect
• The semfeat vocabulary is not privacy-oriented
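
One way such a justification could be produced with a linear model is sketched below. The slides do not state how the "most discriminative" concepts are selected, so ranking by coefficient times concept score is an assumption, and `justify` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def justify(
    concept_scores: Dict[str, float],
    coefficients: Dict[str, float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank the concepts that push an image towards the 'private' class.

    Assumption: the contribution of a concept is its linear-model coefficient
    multiplied by the detector score of that concept on the image.
    """
    contributions = {
        concept: coefficients.get(concept, 0.0) * score
        for concept, score in concept_scores.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    # Keep only concepts that actually argue for 'private'.
    return [(concept, value) for concept, value in ranked[:top_n] if value > 0]
```

The returned (concept, contribution) pairs could be used to size the words of a tag cloud. A noisy detector score or a vocabulary term unrelated to privacy would show up here directly, which is the limitation noted on the slide.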

Page 8: Personalized Privacy-Aware Image Classification

semfeat-LDA: an improved semantic representation

Solution: project semfeat to a latent space
• Images treated as text documents (top 10 concepts)
• A text corpus created from private images (PicAlert + YourAlert)
• LDA is applied to create a topic model (30 topics)
• 6 privacy-related topics are identified (manually)

A 2nd level semantic representation: semfeat-LDA

Topic       Top-5 semfeat concepts assigned to each topic
children    dribbler, child, godson, wimp, niece
drinking    drinker, drunk, tipper, thinker, drunkard
erotic      slattern, erotic, cover-girl, maillot, back
relatives   great-aunt, second-cousin, grandfather, mother, great-grandchild
vacations   seaside, vacationer, surf-casting, casting, sandbank
wedding     groom, bride, celebrant, wedding, costume
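
The first step of this pipeline, turning each private image into a short text document of its top concepts, can be sketched as follows. The helper names and the bag-of-words output format are assumptions; the topic model itself (LDA with 30 topics) would be fit on this corpus with an LDA implementation that is not shown here.

```python
from collections import Counter
from typing import Dict, List


def image_to_document(concept_scores: Dict[str, float], top_k: int = 10) -> List[str]:
    """Represent an image by the names of its top_k highest-scoring concepts."""
    ranked = sorted(concept_scores.items(), key=lambda item: item[1], reverse=True)
    return [concept for concept, _ in ranked[:top_k]]


def build_corpus(private_images: List[Dict[str, float]], top_k: int = 10) -> List[Counter]:
    """Build one bag-of-words document per private image.

    Assumption: each concept appears at most once per document, so every
    count is 1; detector scores are used only for ranking.
    """
    return [Counter(image_to_document(scores, top_k)) for scores in private_images]
```

Once a topic model is trained on such a corpus, the topic distribution of an image's document would serve as its 2nd level (semfeat-LDA) representation.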

Page 9: Personalized Privacy-Aware Image Classification

semfeat-LDA: more intuitive justifications

[Figure: justifications at two levels]
• 1st level semantic representation: knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian
• 2nd level semantic representation: children, drinking, erotic, relatives, vacations, wedding

Page 10: Personalized Privacy-Aware Image Classification

YourAlert: a realistic benchmark

User study
• Participants annotate their own photos
• Loose guidance allowed adoption of personal privacy notions
  • Private: “would share only with close OSN friends or not at all”
  • Public: “would share with all OSN friends or even make public”
• Automated extraction and annotation software
• Reduced privacy concerns: only features and annotations shared
• Users gave their informed consent to use their data

The resulting dataset: YourAlert (1)
• Stats: 1.5K photos, 27 users, ~16/~40 private/public per user
• Main advantages:
  • Facilitates realistic evaluation of privacy models
  • Allows development of personalized models

(1) Publicly available at: http://mklab.iti.gr/datasets/image-privacy/

Page 11: Personalized Privacy-Aware Image Classification

Experimental evaluation

Goals
• Compare different visual features
• Evaluate generic models in a realistic setting
• Evaluate personalized and partially personalized models
• Gain insights into privacy perceptions via semfeat

Experimental setup
• Classifier: regularized logistic regression (LibLinear)
• Evaluation measure: Area under ROC (AUC)
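
For reference, AUC can be computed directly from classifier scores as the probability that a randomly chosen private image is scored higher than a randomly chosen public one. The sketch below is a plain pairwise implementation for illustration, not the evaluation code used in the study; the label convention (1 = private, 0 = public) is an assumption.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one private and one public example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why AUC is convenient when the private/public ratio differs from user to user.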

Page 12: Personalized Privacy-Aware Image Classification

Generic models on PicAlert vs YourAlert

[Figure: bar chart of AUC (y-axis, 0.50 = random to 1.00 = perfect) for generic models evaluated on PicAlert and on YourAlert, grouped by feature type (edch, bow, vlad, cnn, semfeat)]
• edch, bow: best visual features in [Zerr et al., 2012]
• vlad: visual features based on aggregation of local descriptors
• cnn: deep visual features
• semfeat: semantic visual features based on cnn
• Annotations on the chart: “Significantly worse on YourAlert!”, “+20%”

Page 13: Personalized Privacy-Aware Image Classification

Key findings on generic models

Almost perfect performance on PicAlert with cnn
• semfeat performs similarly to cnn

Significantly worse performance on YourAlert
• Similar performance for all features

Additional findings
• Using more generic training examples does not help
• Large variability in performance across users

Page 14: Personalized Privacy-Aware Image Classification

Personalized privacy models

Evaluation carried out on YourAlert
• A modified k-fold cross-validation for unbiased estimates

Personalized model types
• ‘user’: only user-specific examples from YourAlert
• ‘hybrid’: a mixture of user-specific examples from YourAlert and generic examples from PicAlert
  • User-specific examples are weighted higher
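
A rough sketch of how such per-user splits could be generated is given below. The slides do not spell out the exact modification to k-fold cross-validation, so this is an assumed reading: test folds come only from the target user's YourAlert examples, a ‘user’ model trains on that user's remaining folds, and a ‘hybrid’ model additionally receives all generic examples with weight 1. All names are hypothetical.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int], List[float], List[int]]


def user_folds(n_examples: int, k: int = 3) -> List[List[int]]:
    """Assign the user's example indices to k folds in round-robin order."""
    return [list(range(i, n_examples, k)) for i in range(k)]


def personalized_splits(
    n_user: int,
    n_generic: int,
    k: int = 3,
    user_weight: float = 1.0,
    hybrid: bool = False,
) -> Iterator[Split]:
    """Yield (train_user_idx, train_generic_idx, weights, test_idx) per fold.

    weights lists the user training examples first, then the generic ones.
    """
    folds = user_folds(n_user, k)
    for held_out in range(k):
        test_idx = folds[held_out]
        train_user = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        # Generic (PicAlert-style) examples never appear in the test fold.
        train_generic = list(range(n_generic)) if hybrid else []
        weights = [user_weight] * len(train_user) + [1.0] * len(train_generic)
        yield train_user, train_generic, weights, test_idx
```

Because the test fold is always drawn from the user's own photos, ‘user’, ‘hybrid’ and generic models can be compared on exactly the same test examples.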

Page 15: Personalized Privacy-Aware Image Classification

Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘user’]

Page 16: Personalized Privacy-Aware Image Classification

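Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘user’. Highlighted: training set $D_{train}^{u_1}$ and model $h_{user}^{1}$]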

Page 17: Personalized Privacy-Aware Image Classification

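Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘user’. Highlighted: training set $D_{train}^{u_2}$ and model $h_{user}^{2}$]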

Page 18: Personalized Privacy-Aware Image Classification

Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘hybrid w=1’]

Page 19: Personalized Privacy-Aware Image Classification

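Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘hybrid w=1’. Highlighted: training set $D_{train}^{u_1}$ and model $h_{hybrid,\,w=1}^{1}$]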

Page 20: Personalized Privacy-Aware Image Classification

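Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘hybrid w=1’. Highlighted: training set $D_{train}^{u_2}$ and model $h_{hybrid,\,w=1}^{2}$]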

Page 21: Personalized Privacy-Aware Image Classification

Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘hybrid w=2’]

Page 22: Personalized Privacy-Aware Image Classification

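Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘hybrid w=2’. Highlighted: training set $D_{train}^{u_1}$ and model $h_{hybrid,\,w=2}^{1}$]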

Page 23: Personalized Privacy-Aware Image Classification

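Evaluation of personalized models

[Figure: evaluation diagram. Data sources: PicAlert and YourAlert (users u1, u2, u3); 3-fold cv; fold k=1 is the test set. Model type: ‘hybrid w=2’. Highlighted: training set $D_{train}^{u_2}$ and model $h_{hybrid,\,w=2}^{2}$]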

Page 24: Personalized Privacy-Aware Image Classification

Personalized privacy models

[Figure: three panels, one per feature type (vlad, semfeat, cnn), plotting AUC (y-axis, 0.55 to 0.90) against the number of user-specific examples (x-axis, 0 to 35). Curves: generic, user, hybrid-g w=1, hybrid-g w=10, hybrid-g w=100, hybrid-g w=1000]

Page 25: Personalized Privacy-Aware Image Classification

Key findings on personalized models

‘user’ catches up with ‘generic’ after only a few examples

‘hybrid’ is better than both ‘user’ and ‘generic’
• Even with very few user-specific examples
• ‘user’ is expected to outperform ‘hybrid’ with more examples

Weighting user-specific examples higher leads to significantly better performance!

Page 26: Personalized Privacy-Aware Image Classification

Privacy insights via semfeat

An exploratory analysis
• Get insights into the average perception of privacy
• Identify deviations from the average perception of privacy

Setup
• Build 1 generic and 27 personalized (‘user’) models
• Identify the 50 most positive and 50 most negative coefficients

Results
• Generic
  • private: child, mate, son
  • public: uphill, lakefront, waterside
• Interesting deviations
  • Alcoholic is private for generic and public for [user shown as an image; not recoverable from the transcript]
  • Tourist is private for [user shown as an image; not recoverable from the transcript] and public for generic
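
The coefficient analysis described in the setup can be sketched as below. The assumption is that a positive coefficient of the linear model pushes towards ‘private’ and a negative one towards ‘public’; both function names are hypothetical, and the sign-flip test for deviations is one simple reading of "deviation from the average perception".

```python
from typing import Dict, List, Tuple


def extreme_coefficients(
    coefficients: Dict[str, float], n: int = 50
) -> Tuple[List[str], List[str]]:
    """Return (most positive, most negative) concept names, strongest first."""
    ranked = sorted(coefficients.items(), key=lambda item: item[1])
    negative = [concept for concept, w in ranked[:n] if w < 0]
    positive = [concept for concept, w in reversed(ranked[-n:]) if w > 0]
    return positive, negative


def deviations(generic: Dict[str, float], user: Dict[str, float]) -> List[str]:
    """Concepts whose coefficient sign differs between the generic and a user model."""
    return sorted(c for c in generic if c in user and generic[c] * user[c] < 0)
```

Applied to the generic model, the first function would give the kind of private and public concept lists shown on the slide; applied to a generic/user pair, the second would surface cases such as a concept that is private on average but public for one user.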

Page 27: Personalized Privacy-Aware Image Classification

Identifying recurring privacy themes

A prototype semfeat-LDA vector for each user
• The centroid of the semfeat-LDA vectors of that user's private images

K-means (k=5) clustering on the prototype vectors

[Figure: bar chart of factor loadings (y-axis, 0.00 to 0.20) of the six topics (children, drinking, erotic, relatives, vacations, wedding) for each cluster of users]
• c0: {2,3,19,23,25,26,27}
• c1: {1,5,6,11,12,13,14,20,21}
• c2: {8,10,17,24}
• c3: {4,16}
• c4: {7,9,15,18,22}
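
The two steps above, one prototype per user followed by k-means over the prototypes, can be sketched in plain Python. This is an illustrative implementation with random initialization from the data points; the initialization, iteration count and seed are assumptions and not details from the study.

```python
import random
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean, e.g. of one user's private-image topic vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Sequence[float]], k: int, iterations: int = 50, seed: int = 0
) -> List[int]:
    """Return a cluster index for every point (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for each prototype vector.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = centroid(members)
    return assignment
```

With 27 user prototypes and `k=5`, the returned assignment would group users with similar privacy themes, analogous to the clusters c0 to c4 listed above.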

Page 28: Personalized Privacy-Aware Image Classification

Future work

Predict fine-grained privacy classes
• E.g. close-friends, all-friends, friends-of-friends, public

More sophisticated instance sharing strategies
• E.g. taking inter-user similarities into account

Adaptation of the semantic vocabulary towards privacy

In a larger context
• Images are just one piece of the puzzle in users’ privacy preservation…
• Deal with data acquisition and sharing problems
  • Collaboration with other groups to conduct a larger-scale study
• Cross-domain collaboration (e.g. legal, social sciences)
  • The USEMP (1) project is a good example

(1) http://www.usemp-project.eu/

Page 29: Personalized Privacy-Aware Image Classification

Thank you!

Resources
• Datasets: http://mklab.iti.gr/datasets/image-privacy
• Code: https://github.com/MKLab-ITI/image-privacy

Contact us
• @espyromi / [email protected]
• @sympap / [email protected]
• @kompats / [email protected]

http://www.usemp-project.eu/