Presenter : Jian-Ren Chen Authors : Cihan Kaleli , Huseyin Polat 2012 , KBS

Intelligent Database Systems Lab


Privacy-preserving SOM-based recommendations on horizontally distributed data


Outline

• Motivation
• Objectives
• Methodology
• Privacy analysis
• Experiments
• Conclusions
• Comments


Motivation

• Collaborative Filtering (CF) systems are used to suggest web pages.
  – limited number of users’ data -> lack of accuracy -> Cold Start Problem
• Users’ data are horizontally partitioned among multiple vendors.

http://www.bridgewell.com/recommendation%20Engine.html


Objectives

• Companies holding an inadequate amount of users’ data might decide to combine their data.
  – accurate predictions
  – performance
• Privacy-preserving scheme


Methodology

Privacy-preserving SOM clustering on horizontally distributed data

Privacy-preserving k-nn-based predictions on horizontally distributed data

a. Off-line
   i. Cluster users’ data distributed among multiple parties using SOM while preserving data owners’ privacy.
   ii. Compute aggregate data values required for recommendation estimations.

b. Online
   i. Determine the cluster of the active user a.
   ii. Estimate the prediction after receiving the required aggregate data from other parties. Return the referral to a.


Methodology

SOM clustering (first stage, followed by k-nn-based collaborative filtering)

1. Determine values of initial constants.
2. Find the winning Kohonen layer neuron.
3. Update the weight vectors of all neurons.
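The three SOM steps on this slide (initial constants, winner search, weight update) can be illustrated with a minimal Python sketch. The slide's own equations are not reproduced in this text, so everything here is a generic Kohonen SOM under stated assumptions: a 1-D neuron layout, a Gaussian neighborhood, linearly decaying learning rate and radius, and hypothetical names (`find_winner`, `update_weights`, `train_som`). It is not the authors' exact formulation.

```python
import math
import random

def find_winner(x, weights):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))

def update_weights(x, weights, winner, alpha, radius):
    """Move every neuron toward x, scaled by a Gaussian neighborhood
    over the 1-D neuron index (an assumption of this sketch)."""
    for j, w in enumerate(weights):
        h = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
        weights[j] = [wi + alpha * h * (xi - wi) for xi, wi in zip(x, w)]

def train_som(data, n_neurons, epochs=20, alpha0=0.5, radius0=1.0, seed=0):
    """Step 1: fix the initial constants; steps 2 and 3: repeat winner
    search and weight update for every user vector in every epoch."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(epochs):
        decay = 1.0 - t / epochs              # assumed linear decay schedule
        alpha = alpha0 * decay
        radius = max(radius0 * decay, 1e-3)   # keep the radius strictly positive
        for x in data:
            update_weights(x, weights, find_winner(x, weights), alpha, radius)
    return weights

# Toy rating vectors scaled to [0, 1]; each user is assigned to its winning neuron.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
weights = train_som(data, n_neurons=2)
clusters = [find_winner(x, weights) for x in data]
```

In this sketch a user's cluster is simply the index of its winning neuron after training, which is the role SOM plays before the k-nn-based prediction stage.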


Methodology

k-nn-based collaborative filtering (second stage, after SOM clustering)

• Pearson correlation coefficient
• The prediction for a on q
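The two quantities named on this slide, the Pearson correlation between the active user a and another user u, and the prediction for a on item q, can be sketched in Python as follows. The slide's formulas are not reproduced in this text, so the sketch assumes the common z-score form of the prediction, p_aq = mean(a) + std(a) * sum(w_au * z_uq) / sum(|w_au|). The absolute value in the denominator and the function names are assumptions of this sketch, not taken from the source.

```python
import math

def mean_std(ratings):
    """Mean and population standard deviation of one user's ratings."""
    vals = list(ratings.values())
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, math.sqrt(var)

def pearson(a, u):
    """Pearson correlation over the items both users rated."""
    common = [j for j in a if j in u]
    if len(common) < 2:
        return 0.0
    ma = sum(a[j] for j in common) / len(common)
    mu = sum(u[j] for j in common) / len(common)
    num = sum((a[j] - ma) * (u[j] - mu) for j in common)
    den = math.sqrt(sum((a[j] - ma) ** 2 for j in common)
                    * sum((u[j] - mu) ** 2 for j in common))
    return num / den if den else 0.0

def predict(a, neighbors, q):
    """Predict a's rating on q from neighbors who rated q (assumed z-score form)."""
    mean_a, std_a = mean_std(a)
    num = den = 0.0
    for u in neighbors:
        if q not in u:
            continue
        mean_u, std_u = mean_std(u)
        if std_u == 0:
            continue                      # z-score undefined for constant raters
        w = pearson(a, u)
        num += w * (u[q] - mean_u) / std_u
        den += abs(w)
    return mean_a + std_a * num / den if den else mean_a

# Ratings are dicts mapping item id -> rating.
a = {1: 5, 2: 3, 3: 4}
u = {1: 4, 2: 2, 3: 3, 4: 5}
p_aq = predict(a, [u], q=4)
```

When no neighbor has rated q, this sketch falls back to a's mean rating, which is one simple convention among several.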


Methodology

Privacy-preserving SOM clustering on horizontally distributed data

1. Determine values of initial constants:
   a. number of clusters
   b. sequence of active parties
2. The first party runs SOM on its own data:
   a. all users it holds are assigned to a cluster
   b. it sends the updated Wj vectors to the second party
3. The next party:
   a. repeats step 2
   b. sends the newly updated Wj vectors to the next party
4. The last party sends the updated Wj vectors to the IP.
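The message flow described on this slide, where only the Wj vectors travel from party to party and each party's user data stays local, can be sketched as below. This is a simplified illustration under assumptions: a winner-only weight update with no neighborhood function, a single pass over the party sequence, and hypothetical names (`local_pass`, `distributed_som`). The protections the actual scheme adds on top of this flow are not modeled.

```python
def nearest(x, weights):
    """Index of the weight vector closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))

def local_pass(local_users, weights, alpha):
    """One party's step: update the received Wj vectors using only its own
    users, then assign each of those users to a cluster."""
    weights = [list(w) for w in weights]      # work on a copy of what was received
    for x in local_users:
        j = nearest(x, weights)
        weights[j] = [wi + alpha * (xi - wi) for xi, wi in zip(x, weights[j])]
    assignments = [nearest(x, weights) for x in local_users]
    return weights, assignments

def distributed_som(parties, init_weights, alpha=0.5):
    """Pass the Wj vectors along the agreed sequence of parties."""
    weights = init_weights
    assignments = []
    for users in parties:                     # sequence of active parties
        weights, local = local_pass(users, weights, alpha)
        assignments.append(local)             # cluster labels stay with their owner
    return weights, assignments               # final Wj vectors go to the IP

# Two parties, each holding one user; two clusters.
parties = [[[0.0, 0.0]], [[1.0, 1.0]]]
init_weights = [[0.2, 0.2], [0.8, 0.8]]
final_weights, assignments = distributed_som(parties, init_weights)
```

The point of the sketch is the data flow: `local_pass` never sees another party's users, and the only value handed on is the updated list of weight vectors.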


Methodology

Privacy-preserving k-nn-based predictions on horizontally distributed data

• The prediction is paq = va + P; among C parties, P can be written in terms of values each party computes on its own data.
• Choose j percent of the users who did not rate q, where j in (0,)
• Choose j percent of their zuj values, remove their values, and replace them with zero, where j in (0,].
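Because the data are horizontally partitioned, the sums inside P split into one partial sum per party, so each party can send aggregates instead of individual ratings. The sketch below illustrates that decomposition only. It assumes the z-score form P = std(a) * sum(w_au * z_uq) / sum(|w_au|), which is not shown in this text, uses hypothetical function names, and takes the weights w_au and z-scores z_uq as already computed. The masking steps listed on the slide are not modeled.

```python
def local_aggregates(neighbors):
    """One party's contribution: partial numerator and denominator of P.
    neighbors is a list of (w_au, z_uq) pairs for that party's users who rated q."""
    num = sum(w * z for w, z in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    return num, den

def combine(mean_a, std_a, aggregates):
    """Assemble paq = va + P from the per-party aggregate pairs."""
    num = sum(n for n, _ in aggregates)
    den = sum(d for _, d in aggregates)
    p = std_a * num / den if den else 0.0
    return mean_a + p

# Two parties (C = 2) report only their aggregate pairs.
party1 = [(0.5, 1.0)]
party2 = [(0.5, -1.0), (1.0, 0.5)]
aggregates = [local_aggregates(party1), local_aggregates(party2)]
p_aq = combine(mean_a=3.0, std_a=2.0, aggregates=aggregates)
```

Summing the per-party pairs gives the same numerator and denominator as pooling all neighbors in one place, which is why the split does not change the prediction in this sketch.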


Privacy analysis

• Attacks and vulnerabilities:
  1) A1: Parties can form a coalition to capture a target party’s data
  2) A2: Paying-off
  3) V1: Not able to return any result
  4) V2: Missing values in the aggregate values vector


Experiments

• Data sets

(The data set table and result figures on the experiment slides are not reproduced in this text.)


Conclusions

• Integrating split data significantly improves preciseness.
• Although privacy concerns make accuracy worse, accuracy losses are smaller than the accuracy gains due to collaboration.


Comments

• Advantages
  – accuracy, performance, and privacy
• Disadvantages
  – cost, accuracy
• Applications
  – Collaborative Filtering
  – Privacy-preserving scheme
