Privacy Preserving K-means Clustering on Vertically Partitioned Data. Presented by: Jaideep Vaidya. Joint work: Prof. Chris Clifton

Dec 21, 2015

Page 1

Privacy Preserving K-means Clustering on Vertically Partitioned Data

Presented by: Jaideep Vaidya

Joint work: Prof. Chris Clifton

Page 2

Overview

• Global Problem
  – Privacy Preserving Distributed Data Mining

• Specific Problem
  – Clustering (K-Means)

• For
  – Vertically Partitioned Data

• Using
  – Cryptographic Tools

Page 3

Vertical Partitioning of Data

Global Database View

TID   Brain Tumor?   Diabetes?   Model   Battery

Medical Records

TID   Brain Tumor?   Diabetes?
RPJ   Yes            Diabetic
CAC   No Tumor       No
PTR   No Tumor       Diabetic

Cell Phone Data

TID   Model   Battery
RPJ   5210    Li/Ion
CAC   none    none
PTR   3650    NiCd
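To make the partitioning concrete, here is a minimal sketch in Python. It assumes each site stores its attributes in a dictionary keyed by TID; the names `medical_site`, `phone_site` and `global_view` are illustrative and do not come from the slides. The join is shown only to explain what the global view means. In the privacy-preserving setting it is never actually materialized at any single site.

```python
# Each site holds different attributes for the same entities (same TIDs).
medical_site = {
    "RPJ": {"brain_tumor": "Yes", "diabetes": "Diabetic"},
    "CAC": {"brain_tumor": "No Tumor", "diabetes": "No"},
    "PTR": {"brain_tumor": "No Tumor", "diabetes": "Diabetic"},
}
phone_site = {
    "RPJ": {"model": "5210", "battery": "Li/Ion"},
    "CAC": {"model": "none", "battery": "none"},
    "PTR": {"model": "3650", "battery": "NiCd"},
}


def global_view(*sites):
    """Join per-site attribute dictionaries on the shared TID (conceptual only)."""
    shared_tids = set.intersection(*(set(site) for site in sites))
    return {
        tid: {name: value for site in sites for name, value in site[tid].items()}
        for tid in sorted(shared_tids)
    }


if __name__ == "__main__":
    for tid, row in global_view(medical_site, phone_site).items():
        print(tid, row)
```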

Page 4

Is the problem trivial?

Page 5

Privacy Preserving Data Mining

• Perturbation
  – Agrawal & Srikant, Agrawal & Aggarwal
  – Rizvi & Haritsa, Evfimievski et al.

• Cryptographic
  – Lindell & Pinkas, Du & Zhan
  – Vaidya & Clifton, Kantarcioglu & Clifton

Page 6

Secure Multiparty Computation (SMC)

• Given a function f and n inputs $x_1, x_2, \ldots, x_n$, distributed at n sites, compute the result

$y = f(x_1, x_2, \ldots, x_n)$

while revealing nothing to any site except its own input(s) and the result.

Page 7

Results

• Cluster assignment for entities
  – Not private

• Cluster centers
  – Semi-private

[Figure: example cluster center with attribute values 2.3, 34, 19, 15.5, 5210, Li/Ion, Piezo]

Page 8

Secure K-means clustering

Arbitrarily select k starting points $\mu'_1, \mu'_2, \ldots, \mu'_k$

Repeat
  – Assign $\mu'_1, \mu'_2, \ldots, \mu'_k$ to $\mu_1, \mu_2, \ldots, \mu_k$ respectively
  – (re)assign each object to closest cluster based on distance from mean
  – Re-compute the cluster means $\mu'_1, \mu'_2, \ldots, \mu'_k$

Until no change

K-means clustering
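As a reference point, the loop on this slide can be sketched in plain, non-secure Python. This is only the centralized baseline that the secure protocol has to reproduce, not the privacy-preserving algorithm itself. The function names, the squared Euclidean distance, the `max_iter` cap and the seeded random start are assumptions made for the sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of tuples; returns (means, clusters)."""
    rng = random.Random(seed)
    new_means = rng.sample(points, k)       # arbitrarily select k starting points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        means = new_means                   # assign mu' to mu
        clusters = [[] for _ in range(k)]
        for p in points:                    # (re)assign each object to closest cluster
            closest = min(range(k), key=lambda i: squared_distance(p, means[i]))
            clusters[closest].append(p)
        new_means = [                       # re-compute the cluster means
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_means == means:              # until no change
            break
    return new_means, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, groups = kmeans(data, k=2)
    print(centers)
    print(groups)
```

An empty cluster keeps its previous mean here; that is one simple convention among several and is not taken from the slides.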

Page 9

Assigning objects to closest cluster

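For every object/entity, with the attributes held by sites $P_1, P_2, \ldots, P_r$:

Compute $\arg\min_{i = 1, \ldots, k} \sum_{j=1}^{r} x_{ij}$

where $x_{ij}$ is the component of the distance to cluster $i$ contributed by site $P_j$.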
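The computation above, done in the clear, looks like the following sketch. It assumes a distance that decomposes additively over attributes (for example squared Euclidean distance), so that each site can compute its own component locally. The function name and the 0-based cluster index are choices made for this illustration.

```python
def closest_cluster(components):
    """Return the 0-based index i minimizing the sum over sites of x_ij.

    components[j][i] is site j's component of the distance to cluster i.
    """
    k = len(components[0])
    totals = [sum(site[i] for site in components) for i in range(k)]
    return min(range(k), key=totals.__getitem__)


if __name__ == "__main__":
    # Two sites, three clusters: column totals are 5.0, 6.0 and 9.5.
    x = [
        [4.0, 1.0, 9.0],   # components held by the first site
        [1.0, 5.0, 0.5],   # components held by the second site
    ]
    print(closest_cluster(x))
```

The secure protocol has to produce this same index without any site seeing another site's row.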

Page 10

Key Idea

• Disguise site components with random values

• Compare distances while revealing only comparison result

• Permute order of clusters to conceal meaning of comparison results

Page 11

Closest Cluster Computation

• 3 special sites: $P_1$, $P_2$ and $P_r$

• $P_1$ generates
  – r random vectors $\vec{V}_i$ such that $\sum_{i=1}^{r} \vec{V}_i = \vec{0}$
  – Permutation $\pi$ (over 1 .. k)
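A minimal sketch of what $P_1$ generates, under the assumption that all values live in the integers modulo a fixed modulus (the slide only states that the vectors sum to zero). The function names and the modulus are hypothetical.

```python
import secrets


def zero_sum_vectors(r, k, modulus):
    """Return r random length-k vectors whose element-wise sum is 0 mod modulus."""
    if r < 2:
        raise ValueError("need at least two vectors")
    vectors = [[secrets.randbelow(modulus) for _ in range(k)] for _ in range(r - 1)]
    # The last vector cancels the sum of the others in every coordinate.
    last = [(-sum(column)) % modulus for column in zip(*vectors)]
    return vectors + [last]


def random_permutation(k):
    """Return a random permutation of 0 .. k-1 as a list."""
    perm = list(range(k))
    secrets.SystemRandom().shuffle(perm)
    return perm


if __name__ == "__main__":
    masks = zero_sum_vectors(r=4, k=3, modulus=2**32)
    print([sum(column) % 2**32 for column in zip(*masks)])
    print(random_permutation(3))
```

Because the masks cancel, adding one mask to each site's distance vector hides the individual vectors while leaving their total unchanged.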

Page 12

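Permutation Protocol (Du and Atallah ’01)

• A holds $\vec{X}$; B holds $\pi$ and $\vec{V}$
• A sends $E(\vec{X})$ and $E$ to B
• B returns $\pi(E(\vec{X} + \vec{V}))$ to A
• A decrypts to obtain $\pi(\vec{X} + \vec{V})$

Homomorphic encryption: $E_k(x) \cdot E_k(y) = E_k(x + y)$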
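The exchange can be sketched with a toy additively homomorphic scheme. The slides do not name a cryptosystem; the sketch below assumes Paillier with generator n + 1 and deliberately tiny primes, so it is insecure and for illustration only. All function names are hypothetical, and both parties are simulated in one process. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                 # valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def permute(perm, values):
    """Entry j of the result is values[perm[j]]."""
    return [values[j] for j in perm]


def permutation_protocol(x, perm, v):
    """A holds x; B holds perm and v. A ends up with perm applied to x + v."""
    n, private_key = keygen()                              # A generates keys
    enc_x = [encrypt(n, xi) for xi in x]                   # A -> B: E(X) and the public key
    n2 = n * n
    # B multiplies ciphertexts, which adds plaintexts: E(x) * E(v) = E(x + v).
    enc_sum = [(c * encrypt(n, vi)) % n2 for c, vi in zip(enc_x, v)]
    enc_permuted = permute(perm, enc_sum)                  # B -> A: pi(E(X + V))
    return [decrypt(n, private_key, c) for c in enc_permuted]


if __name__ == "__main__":
    print(permutation_protocol(x=[5, 17, 3], perm=[2, 0, 1], v=[100, 200, 300]))
```

A learns only the permuted, masked vector; B sees only ciphertexts. Every entry of x + v must stay below n for decryption to be exact.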

Page 13

Closest Cluster Computation

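Stage 1

• $P_1$ holds $\pi$ and the vectors $\vec{V}_i$
• $P_2$ holds $\vec{X}_2$, sends $E_2(\vec{X}_2)$ and $E_2$ to $P_1$, and receives $\pi(E_2(\vec{X}_2 + \vec{V}_2))$
• $P_r$ holds $\vec{X}_r$, sends $E_r(\vec{X}_r)$ and $E_r$ to $P_1$, and receives $\pi(E_r(\vec{X}_r + \vec{V}_r))$

Stage 2

• $P_1, P_3, \ldots, P_{r-1}$ send $\pi(\vec{X}_1 + \vec{V}_1), \pi(\vec{X}_3 + \vec{V}_3), \ldots, \pi(\vec{X}_{r-1} + \vec{V}_{r-1})$ to $P_r$
• $P_r$ forms $\sum_{i \ne 2} \pi(\vec{X}_i + \vec{V}_i)$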

Page 14

Closest Cluster Computation

• Stage 3
  – $P_2$ and $P_r$ determine i, the index of the cluster with minimum distance

• Stage 4
  – $P_1$ computes and broadcasts $\pi^{-1}(i)$
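The four stages can be traced end to end with a small simulation. This is a sketch under strong simplifications: all sites run in one process, the encryption of Stage 1 is omitted, the Stage 3 comparison is done in the clear rather than with a secure comparison, and distances are assumed to be non-negative integers whose totals stay below a hypothetical modulus. It shows only why the masks and the permutation leave the final answer unchanged.

```python
import secrets

MODULUS = 2**32  # assumed working modulus, not from the slides


def closest_cluster_masked(site_components, modulus=MODULUS):
    """site_components[j][i]: site j's integer distance component for cluster i."""
    r = len(site_components)
    k = len(site_components[0])
    if r < 3:
        raise ValueError("the protocol uses three special sites")

    # P1 generates masks that sum to zero and a secret permutation.
    masks = [[secrets.randbelow(modulus) for _ in range(k)] for _ in range(r - 1)]
    masks.append([(-sum(column)) % modulus for column in zip(*masks)])
    perm = list(range(k))
    secrets.SystemRandom().shuffle(perm)

    # Stage 1: every site ends up with pi(X_j + V_j).
    shares = [
        [(site_components[j][i] + masks[j][i]) % modulus for i in perm]
        for j in range(r)
    ]

    # Stage 2: the last site adds up every share except P2's (list index 1).
    y = [sum(shares[j][i] for j in range(r) if j != 1) % modulus for i in range(k)]

    # Stage 3: P2 and the last site find the minimum of y + P2's share.
    totals = [(y[i] + shares[1][i]) % modulus for i in range(k)]
    permuted_index = min(range(k), key=totals.__getitem__)

    # Stage 4: P1 undoes the permutation and broadcasts the real index.
    return perm[permuted_index]


if __name__ == "__main__":
    x = [[4, 1, 9], [1, 5, 0], [2, 2, 2]]   # column totals: 7, 8, 11
    print(closest_cluster_masked(x))
```

Since the masks cancel in the sum, `totals` is just the permuted vector of true distances, so the unpermuted index matches the non-secure arg min.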

Page 15

When to stop?

• Locally compute difference in means

• Globally known threshold

• Use simple random-adding technique to disguise actual values
  – First party adds random value to its distance and sends to next party
  – Each party adds its value to total and sends on
  – Last party compares with first party’s random + threshold
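The random-adding technique above can be sketched as a ring pass. The sketch assumes integer (for example fixed-point) differences, arithmetic modulo a hypothetical modulus, and a "less than or equal" test against the threshold; none of these details are given on the slide. The final comparison is unmasked in the clear here, whereas in the protocol the last and first parties would compare without revealing the total.

```python
import secrets


def should_stop(local_differences, threshold, modulus=2**32):
    """Each entry of local_differences is one party's locally computed change in means."""
    mask = secrets.randbelow(modulus)                 # first party's random value
    running = (mask + local_differences[0]) % modulus
    for difference in local_differences[1:]:          # each party adds its value and sends on
        running = (running + difference) % modulus
    # Last party holds `running`; first party holds `mask`.
    # Comparing running with mask + threshold is the same as comparing the true total.
    total = (running - mask) % modulus
    return total <= threshold


if __name__ == "__main__":
    print(should_stop([1, 2, 3], threshold=10))
    print(should_stop([5, 6, 7], threshold=10))
```

Every intermediate value a party sees is offset by the first party's random value, so it reveals nothing about the partial sums.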

Page 16

Communication Cost

• r parties, n data elements, m bit distances

Method                Bits          Rounds
Basic Algorithm       O(knr)        O(r+k)
Optimized Algorithm   O(kmr)        O(r)
Generic Method        O(kmnr^3)     1
Non-Secure Method     O(n)          1

Page 17

Conclusion

• Presented a solution for the Privacy Preserving K-Means Clustering problem

• How to use clusters?

• Will parties share required information for the possible benefits?

• Improve Efficiency

• Working on EM-Clustering, implementations