Intelligent Database Systems Lab
Privacy-preserving SOM-based recommendations on horizontally distributed data
Authors: Cihan Kaleli, Huseyin Polat (2012, KBS)
Presenter: JIAN-REN CHEN
Feb 24, 2016
Outlines
• Motivation
• Objectives
• Methodology
• Privacy analysis
• Experiments
• Conclusions
• Comments
Motivation
• Collaborative Filtering (CF) systems are used to suggest web pages.
  – A limited number of users’ data leads to a lack of accuracy (the cold start problem).
• Users’ data is horizontally partitioned among multiple vendors.
Objectives
• Companies holding an inadequate number of users’ data might decide to combine their data.
  – Accurate predictions
  – Performance
• Privacy-preserving scheme
Methodology
Privacy-preserving SOM clustering on horizontally distributed data
Privacy-preserving k-nn-based predictions on horizontally distributed data
a. Off-line
   i. Cluster users’ data distributed among multiple parties using SOM while preserving data owners’ privacy.
   ii. Compute the aggregate data values required for recommendation estimations.
b. Online
   i. Determine a’s cluster.
   ii. Estimate the prediction after receiving the required aggregate data from the other parties. Return the referral to a.
Methodology
SOM clustering
k-nn-based collaborative filtering
• Determine values of initial constants:
• Find the winning Kohonen layer neuron:
• Update the weight vectors of all neurons:
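The formulas for these three SOM steps did not survive on this slide, so the following is only a minimal sketch of a textbook SOM, not necessarily the authors' exact formulation. It assumes a 1-D map, a Gaussian neighborhood, and linearly decaying learning rate and radius. All function names and default constants (`train_som`, `lr0`, `radius0`, and so on) are hypothetical.

```python
import math
import random


def find_winner(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))


def update_weights(weights, x, winner, learning_rate, radius):
    """Move every neuron toward x, scaled by a Gaussian neighborhood (assumed)."""
    for j, w in enumerate(weights):
        # Neurons far from the winner on the 1-D map are moved less.
        h = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
        for i in range(len(w)):
            w[i] += learning_rate * h * (x[i] - w[i])


def train_som(data, n_neurons, epochs=20, lr0=0.5, radius0=1.0, seed=0):
    """Train a small SOM; the initial constants here are illustrative only."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: initial constants and random initial weight vectors.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(epochs):
        decay = 1.0 - t / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 1e-3)
        for x in data:
            winner = find_winner(weights, x)                 # Step 2
            update_weights(weights, x, winner, lr, radius)   # Step 3
    return weights
```

After training, a user's cluster would be the index returned by `find_winner(weights, user_vector)`.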
Methodology
SOM clustering
k-nn-based collaborative filtering
• Pearson correlation coefficient:
• The prediction for a on q:
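The two formulas on this slide were lost, so the sketch below falls back on the widely used textbook forms: Pearson correlation over co-rated items, and a mean-centered weighted average over the k most similar neighbors. Treat this as an assumption. The paper may normalize differently (for example with z-scores), and the names `pearson`, `predict`, and `k` are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(ratings_a, ratings_u):
    """Pearson correlation between two users over their co-rated items."""
    common = [i for i in ratings_a if i in ratings_u]
    if len(common) < 2:
        return 0.0
    mean_a = mean([ratings_a[i] for i in common])
    mean_u = mean([ratings_u[i] for i in common])
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den = den_a * den_u
    return num / den if den else 0.0


def predict(ratings_a, neighbors, q, k=2):
    """Predict a's rating on item q from the k most similar users who rated q."""
    mean_a = mean(list(ratings_a.values()))
    scored = [(pearson(ratings_a, r), r) for r in neighbors if q in r]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Mean-centered weighted average (assumed form, not taken from the slide).
    num = sum(w * (r[q] - mean(list(r.values()))) for w, r in top)
    den = sum(abs(w) for w, _ in top)
    return mean_a if den == 0 else mean_a + num / den
```

Each user is represented as a dict mapping item ids to ratings. In the clustered setting, `neighbors` would be the users in a's cluster.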
Methodology
Privacy-preserving SOM clustering on horizontally distributed data
Privacy-preserving k-nn-based predictions on horizontally distributed data
1. Determine values of initial constants:
   – number of clusters
   – sequence of active parties
2. The first party runs SOM on its own data:
   – all users it holds are assigned to a cluster
   – it sends the updated Wj vectors to the second party
3. The next party repeats step 2 and sends the newly updated Wj vectors to the next party.
4. The last party sends the updated Wj vectors to the IP.
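The sequential hand-off of the Wj vectors can be sketched as below. This only illustrates the message flow, assuming a simplified winner-only update (no neighborhood function) and a single pass per party. Raw user vectors never leave `local_som_pass`; only the weight vectors are passed on. The names `local_som_pass` and `run_protocol` are hypothetical.

```python
def nearest(weights, x):
    """Index of the weight vector closest to x (squared Euclidean distance)."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))


def local_som_pass(weights, users, learning_rate):
    """One party's turn: assign its own users to clusters and update Wj.

    Only the returned weight vectors are meant to be sent to the next party;
    the assignments stay with the party that owns the users.
    """
    weights = [list(w) for w in weights]  # work on a copy of the received Wj
    assignments = []
    for x in users:
        j = nearest(weights, x)
        assignments.append(j)
        for i in range(len(x)):
            weights[j][i] += learning_rate * (x[i] - weights[j][i])
    return weights, assignments


def run_protocol(parties, initial_weights, learning_rate=0.5):
    """Pass Wj through the parties in the agreed sequence."""
    weights = initial_weights
    local_assignments = {}
    for name, users in parties:
        weights, local_assignments[name] = local_som_pass(
            weights, users, learning_rate
        )
    # After the last party, the final Wj vectors go back to the IP.
    return weights, local_assignments
```

Here `parties` is an ordered list of `(name, users)` pairs standing in for the agreed sequence of active parties. In a real deployment each pass would run on a different machine.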
Methodology
Privacy-preserving SOM clustering on horizontally distributed data
Privacy-preserving k-nn-based predictions on horizontally distributed data
• p_aq = v_a + P, where P is:
• Among C parties, P can be written:
• Choose j percent of the users who did not rate q, where j in (0,)
• Choose j percent of their z_uj values, remove their values, and replace them with zero, where j in (0,].
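The expression for P is missing from this slide, so the sketch below rests on an assumption: P is a similarity-weighted average of z-score ratings, whose numerator and denominator each split into one partial sum per party. The masking helper illustrates only the "replace with zero" step. Its `fraction` parameter is hypothetical, because the upper bounds of the intervals above are not recoverable.

```python
import random


def mask_z_values(z_values, fraction, rng):
    """Return a copy with a random `fraction` of the entries replaced by zero."""
    masked = list(z_values)
    n_hide = int(len(masked) * fraction)
    for idx in rng.sample(range(len(masked)), n_hide):
        masked[idx] = 0.0
    return masked


def partial_sums(weights_au, z_uq):
    """One party's contribution: (sum of w_au * z_uq, sum of w_au).

    Assumed decomposition; the slide's exact formula did not survive.
    """
    num = sum(w * z for w, z in zip(weights_au, z_uq))
    den = sum(weights_au)
    return num, den


def combine(v_a, partials):
    """Combine the C parties' partial sums into p_aq = v_a + P."""
    num = sum(n for n, _ in partials)
    den = sum(d for _, d in partials)
    p = num / den if den else 0.0
    return v_a + p
```

With this split, each party shares only two aggregate numbers per query rather than individual ratings.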
Privacy analysis
• Attacks and Vulnerabilities:
  1) A1: Parties can coalesce to capture a target party’s data
  2) A2: Paying-off
  3) V1: Not able to return any result
  4) V2: Missing values in aggregate values vector
Experiments
• Data sets
Conclusions
• Integrating split data significantly improves preciseness.
• Although privacy concerns make accuracy worse, the accuracy losses are smaller than the accuracy gains due to collaboration.
Comments
• Advantages
  – accuracy, performance, and privacy
• Disadvantage
  – cost, accuracy
• Applications
  – Collaborative Filtering
  – Privacy-preserving scheme