Top Banner
Cryptographic methods for privacy aware computing: applications
23

Cryptographic methods for privacy aware computing: applications.

Dec 13, 2015

Download

Documents

Madeline Taylor
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Cryptographic methods for privacy aware computing: applications.

Cryptographic methods for privacy aware computing: applications

Page 2: Cryptographic methods for privacy aware computing: applications.

Outline Review: three basic methods Two applications

Distributed decision tree with horizontally partitioned data

Distributed k-means with vertically partitioned data

Page 3: Cryptographic methods for privacy aware computing: applications.

Three basic methods 1-out-K Oblivious Transfer Random share Homomorphic encryption

* Cost is the major concern

Page 4: Cryptographic methods for privacy aware computing: applications.

Two example protocols The basic idea is

Do not release original data Exchange intermediate result

Applying the three basic methods to securely combine them

Page 5: Cryptographic methods for privacy aware computing: applications.

Building decision trees over horizontally partitioned data Horizontally partitioned data

Each party owns a number of records

Entropy-based information gain The classical metric for training decision

tree (ID3 algorithm)

Major ideas in the protocol

Page 6: Cryptographic methods for privacy aware computing: applications.

Horizontally Partitioned Data Table with key and r set of attributes

key X1…Xd

key X1…Xd

Site 1

key X1…Xd

Site 2

key X1…Xd

Site r…

K1k2

kn

K1k2

ki

Ki+1

ki+2

kj

Km+1

km+2

kn

Page 7: Cryptographic methods for privacy aware computing: applications.

Review decision tree algorithm (ID3 algorithm) Find the cut that maximizes gain

certain attribute Ai, sorted v1…vn Certain value in the attribute

For categorical data we use Ai=vi For numerical data we use Ai<vi

Ai label

v1v2

vn

l1l2

ln

cutE(): Entropy of label distribution

2

1

)()()(i

ii SEN

nSEcutGain

Choose the attribute/value that gives the highest gain!

Ai<vi?yes no

Aj<vj? …

Page 8: Cryptographic methods for privacy aware computing: applications.

Key points Calculating entropy

Ai label

v1v2

vn

l1l2

ln

cut

)()(

log)(log)()(Slabelv

vv

Slabelv n

n

n

nvPvPSEntropy

The key is calculating x log x, where x is the sum of values from the two partiesP1 and P2 , i.e., x1 and x2, respectively

-decomposed to several steps-Each step each party knows only a random share of the result

Page 9: Cryptographic methods for privacy aware computing: applications.

stepsStep1: compute shares for xlnx = (u1+u2)(v1+v2) * a major protocol is used to compute ln x = v1 + v2

Step 2: for a condition Ai<vi, find the random shares for E(S), E(S1) and E(S2) respectively.

Step3: repeat step1&2 to all possible (Ai, vi) pairs

Step4: a circuit gate to determine which (Ai, vi) pair results in maximum gain. (Yao’s protocol)

Page 10: Cryptographic methods for privacy aware computing: applications.

2. K-means over vertically partitioned data

Vertically partitioned data Normal K-means algorithm

Applying secure sum and secure comparison among multi-sites in the secure distributed algorithm

Page 11: Cryptographic methods for privacy aware computing: applications.

Vertically Partitioned Data Table with key and r set of attributes

key X1…Xi Xi+1…Xj … Xm+1…Xd

key X1…Xi

Site 1

key Xi+1…Xj

Site 2

key Xm+1…Xd

Site r…

Page 12: Cryptographic methods for privacy aware computing: applications.

Motivation Naïve approach: send all data to a

trusted site and do k-mean clustering there Costly Trusted third party?

Preferable: distributed privacy preserving k-means

Page 13: Cryptographic methods for privacy aware computing: applications.

Basic K-means algorithm 4 main steps:

step1.Randomly select k initial cluster centers (k means)

repeat

step 2. Assign any point i to its closest cluster center

step 3. Recalculate the k means with the new point assignment

Until step 4. the k means do not change

Page 14: Cryptographic methods for privacy aware computing: applications.

Distributed k-means Why k-means can be done over

vertically partitioned data Distance calculation is decomposable All of the 4 steps are decomposable The most costly part (step 2 and 3) can

be done locally We will focus on the step 2 (Assign any

point i to its closest cluster center)

Page 15: Cryptographic methods for privacy aware computing: applications.

step 1 All sites share the index of the initial

random k records as the centroids

µ11 … µ1i

Site 1 Site 2 Site r…

µk1 … µki

µ1i+1 … µ1j

µki+1 … µkj

µ1m …µ1d

µkm … µkd

µ1

µk

Page 16: Cryptographic methods for privacy aware computing: applications.

Step 2: Assign any point x to its closest cluster

center1. Calculate distance of point X (X1, X2, … Xd) to each

cluster center µk

-- each distance calculation is decomposable! d2 = [(X1- µk1)2 +… (Xi- µki)2] + [(Xi+1- µki+1)2 +… (Xj- µkj)2] + …

2. Compare the k full distances to find the minimum one

Site1 site2

Partial distances: d1 + d2 + …

For each X, each site has a k-element vector that is the result for thepartial distance to the k centroids, notated as Xi

Page 17: Cryptographic methods for privacy aware computing: applications.

Privacy concerns for step 2 Some concerns:

Partial distances d1, d2 … may breach privacy (the Xi and µki ) – need to hide it

distance of a point to each cluster may breach privacy – need to hide it

Basic ideas to ensure security Disguise the partial distances Compare distances so that only the comparison result

is learned Permute the order of clusters so the real meaning of

the comparison results is unknown. Need at least 3 non-colluding sites (P1, P2, Pr)

Page 18: Cryptographic methods for privacy aware computing: applications.

Secure Computing of Step 2 Stage1: prepare for secure sum of partial

distances p1 generates V1+V2 + …Vr = 0, Vi is random k-element vector,

used to hide the partial distance for site i

P1 uses “Homomorphic encryption” to do randomization: (paillier encryption)Ei(Xi)Ei(Vi) = Ei(Xi+Vi)

Stage2: calculate secure sum for r-1 parties P1, P3, P4… Pr-1 send their perturbed and

permuted partial distances to Pr Pr sums up the r-1 partial distances (including its

own part)

Page 19: Cryptographic methods for privacy aware computing: applications.

Secure Computing of Step 2

* Xi contains the partial distances to the k partial centroids at site i* Ei(Xi)Ei(Vi) = Ei(Xi+Vi) : Homomorphic encryption, Ei is public key* (Xi) : permutation function, perturb the order of elements in Xi* V1+V2 + …Vr = 0, Vi is used to hide the partial distances

Stage 1 Stage 2

Page 20: Cryptographic methods for privacy aware computing: applications.

Stage 3: secure_add_and_compare to find the minimum distance Involves only Pr and P2

Use a standard Secure Multiparty Computation protocol to find the result

Stage 4:

the index of minimum distance (permuted cluster id) is sent back to P1.

P1 knows the permutation function thus knows the original cluster id.

P1 broadcasts the cluster id to all parties.

212

2 m

r

imi

r

illi xxxx

K-1 comparisons:

Page 21: Cryptographic methods for privacy aware computing: applications.

Step 3: can be done locally Update partial means µi locally

according to the new cluster assignments.

X11 … X1i

Site 1 Site 2 Site r…

Xn1 … Xni

X1i+1 … X1j

Xni+1 … Xnj

X1m …X1d

Xnm … Xnd

Cluster 2

Cluster k

Cluster labels

X21 … X2iCluster k

Page 22: Cryptographic methods for privacy aware computing: applications.

Extra communication cost O(nrk)

n : # of records r: # of parties k: # of means

Also depends on # of iterations

Page 23: Cryptographic methods for privacy aware computing: applications.

Conclusion It is appealing to have cryptographic

privacy preserving protocols The cost is the major concern

It can be reduced using novel algorithms