Security in Outsourced Association Rule Mining

Security in Outsourced Association Rule Mining. Agenda Introduction Approximate randomized technique Encryption Summary and future work.

Jan 18, 2016

Josephine Poole

Security in Outsourced Association Rule Mining

Agenda

Introduction
Approximate randomized technique
Encryption
Summary and future work

Introduction

Data mining lets a company learn about the past activities of its customers in order to make strategic decisions.

Types of data mining: association rule mining, clustering, classification.

Association rules

“X => Y”: if a transaction contains itemset X, the transaction will probably also contain itemset Y.

Support: the number of supporting transactions, i.e. transactions that contain both X and Y.

Confidence: the proportion of transactions containing X that also contain Y.
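As a small illustration, both measures can be computed straight from their definitions. The transactions and item names below are made up for the example.

```python
from typing import FrozenSet, List

def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(transactions: List[FrozenSet[str]], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    sx = support(transactions, x)
    return support(transactions, x | y) / sx if sx else 0.0

T = [frozenset(t) for t in ({"milk", "bread"}, {"milk"}, {"milk", "bread", "eggs"}, {"bread"})]
print(support(T, frozenset({"milk", "bread"})))                  # 2
print(confidence(T, frozenset({"milk"}), frozenset({"bread"})))  # 2 of the 3 milk transactions
```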

Performing data mining

Build an application: development cost? Time?

Buy software: does it fit the requirements? Maintenance?

Outsource

Concerns in outsourcing

Output: execution assurance, correctness

Security: privacy of records, information about the company

[Diagram: Company, DB, Data Miner]

Approximate randomized technique

Approximate solution

Privacy Preserving Mining of Association Rules, SIGKDD 2002. Authors: Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke

Problem formulation

Let the set of transactions be $T = \{t_1, t_2, \ldots, t_N\}$.

Transform T into $T' = \{t'_1, t'_2, \ldots, t'_N\}$ and mine T′, while limiting privacy breaches.

An itemset A causes a privacy breach of level p if, for some item $a \in A$,

$P[a \in t_i \mid A \subseteq t'_i] \ge p$

Select-a-size randomization

For each transaction $t_i$ in T, let $m = |t_i|$:

Select an integer j from [0, m] at random, using a non-uniform distribution.

Copy j items of $t_i$, chosen uniformly at random, into $t'_i$.

Consider every item a not in $t_i$, and add a to $t'_i$ with a given probability $p_m$.
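A minimal sketch of select-a-size randomization for one transaction, assuming the caller supplies the distribution of j: the slide only says j is chosen non-uniformly, so `size_probs` (one probability per j = 0..m) is a hypothetical parameter, and `p_m` is the insertion probability from the slide.

```python
import random
from typing import FrozenSet, Sequence

def select_a_size(t: FrozenSet[int], items: Sequence[int], size_probs: Sequence[float],
                  p_m: float, rng: random.Random) -> FrozenSet[int]:
    """Randomize one transaction t into t'.

    size_probs[j] is a hypothetical probability of keeping exactly j items (j = 0..m);
    p_m is the probability of inserting each item that is not in t.
    """
    m = len(t)
    if len(size_probs) != m + 1:
        raise ValueError("size_probs needs one entry per j in [0, m]")
    # 1. Select j in [0, m] from the (non-uniform) distribution size_probs.
    j = rng.choices(range(m + 1), weights=size_probs, k=1)[0]
    # 2. Copy j items of t, chosen uniformly at random, into t'.
    kept = set(rng.sample(sorted(t), j))
    # 3. Add each item a not in t independently with probability p_m.
    noise = {a for a in items if a not in t and rng.random() < p_m}
    return frozenset(kept | noise)

rng = random.Random(0)
t_prime = select_a_size(frozenset({1, 2, 3}), range(10), [0.1, 0.2, 0.3, 0.4], 0.05, rng)
```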

Run on real data

Privacy breaches limited to level 50%: $P[a \in t_i \mid A \subseteq t'_i] \le 50\%$

Accuracy = # true positives / # found itemsets

Set 1

Itemset size | True itemsets | True positives | False drops | False positives | Accuracy
1 | 65 | 65 | 0 | 0 | 100%
2 | 228 | 212 | 16 | 28 | 88%
3 | 22 | 18 | 4 | 5 | 78%

Accuracy

Set 2

Itemset size | True itemsets | True positives | False drops | False positives | Accuracy
1 | 266 | 254 | 12 | 31 | 89%
2 | 217 | 195 | 22 | 45 | 81%
3 | 48 | 43 | 5 | 26 | 62%
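The columns of both tables are internally consistent: true positives plus false drops give the number of true itemsets, and accuracy is true positives divided by all found itemsets (true positives plus false positives). The short check below recomputes the Accuracy column from the other columns, rounded to whole percent.

```python
# Rows copied from the two tables: (itemset size, true itemsets, true positives,
# false drops, false positives, reported accuracy in %).
SET_1 = [(1, 65, 65, 0, 0, 100), (2, 228, 212, 16, 28, 88), (3, 22, 18, 4, 5, 78)]
SET_2 = [(1, 266, 254, 12, 31, 89), (2, 217, 195, 22, 45, 81), (3, 48, 43, 5, 26, 62)]

def accuracy(true_pos: int, false_pos: int) -> float:
    """True positives divided by all itemsets found (true + false positives)."""
    return true_pos / (true_pos + false_pos)

def check(rows) -> bool:
    """Found + dropped must equal the true itemsets, and accuracy must match."""
    return all(tp + fd == true_cnt and round(100 * accuracy(tp, fp)) == reported
               for _size, true_cnt, tp, fd, fp, reported in rows)

print(check(SET_1), check(SET_2))  # True True
```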

Problems

Estimated counts of large itemsets vary, which lowers the accuracy of the association rules.

The "beer and diapers" story: customers who buy diapers also tend to buy beer. Some true rules are strange and hard to believe.

Making a wrong decision is expensive, e.g. layout design in a supermarket, or identifying a new disease in a health center.

Security concerns

Individual transactions are protected, but private association rules can still be estimated by other parties, and adversaries may act on the association rules they find.

Encryption

Problem formulation

Let the set of transactions be $T = \{t_1, t_2, \ldots, t_N\}$.

I is the entire set of items; every $t_i$ is a subset of I.

Transform T into $T' = \{t'_1, t'_2, \ldots, t'_N\}$. A third party mines T′ and obtains AR′, which is then transformed back into AR.

Architecture

[Diagram: DB, DBTransformer, AssociationRules, AssociationRules, Mappings]

Encryption

To protect a message, simple encryption can be applied: “GOOD DOG” can be encrypted as “PLLX XLP”.

Association rule encryption: 752 => 891 may stand for Milk => Bread.

Transaction encryption: <8, 69, 153, 756> may stand for <Cheese, Fork, Ice-cream, Clock>.

Simple scheme

Encryption: for every transaction $t_i$, for every item x in $t_i$, add f(x) to $t'_i$, where f is a bijective function.

Decryption: for every association rule $r_i$, for every item y in $r_i$, replace y by $f^{-1}(y)$.
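A minimal sketch of the simple scheme, assuming items are strings and codes are integers. The key generation in `make_bijection` (distinct random codes below 1000) is a hypothetical choice made only for this example.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Tuple

def make_bijection(items: List[str], rng: random.Random) -> Dict[str, int]:
    """Hypothetical key generation: give every item a distinct random integer code."""
    codes = rng.sample(range(1, 1000), len(items))   # distinct, so f is injective
    return dict(zip(items, codes))

def encrypt(transactions: Iterable[Iterable[str]], f: Dict[str, int]) -> List[FrozenSet[int]]:
    """For every transaction t_i and every item x in t_i, add f(x) to t'_i."""
    return [frozenset(f[x] for x in t) for t in transactions]

def decrypt_rule(lhs: Iterable[int], rhs: Iterable[int],
                 f_inv: Dict[int, str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Replace every item y of an encrypted rule by f^{-1}(y)."""
    return frozenset(f_inv[y] for y in lhs), frozenset(f_inv[y] for y in rhs)

f = make_bijection(["Milk", "Bread", "Car"], random.Random(7))
f_inv = {code: item for item, code in f.items()}
T_enc = encrypt([{"Milk", "Bread"}, {"Car"}], f)
print(decrypt_rule({f["Milk"]}, {f["Bread"]}, f_inv))
```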

Problems with simple encryption

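They are easy to crack: “PLLX XLP” uses only three distinct letters, so there are just 26P3 = 15,600 letter assignments to try, and each word must contain at least one vowel.

Association rules: item frequencies leak, e.g. # Bread > # Car.

The number of association rules and the number of large itemsets are disclosed.

Solution: use a more complex scheme.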
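The frequency leak can be illustrated with a simple rank-matching guess: sort the encrypted codes by support and pair them with items ranked by popularity. This is a hypothetical attacker model for illustration (the attacker is assumed to know the popularity ranking); the codes and counts are made up. `math.perm` needs Python 3.8 or later.

```python
import math
from typing import Dict, List

# "PLLX XLP" uses three distinct cipher letters, so there are 26P3 ways to map them.
print(math.perm(26, 3))  # 15600

def rank_attack(encrypted_support: Dict[int, int], known_ranking: List[str]) -> Dict[int, str]:
    """Guess f^-1 by pairing the k-th most frequent code with the k-th most popular item.

    `known_ranking` is hypothetical background knowledge, most popular item first.
    """
    codes = sorted(encrypted_support, key=encrypted_support.get, reverse=True)
    return dict(zip(codes, known_ranking))

guess = rank_attack({5: 40, 12: 35, 99: 1}, ["Bread", "Milk", "Car"])
print(guess)
```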

Fake items

The probability of correctly guessing a single mapping is 1 / |I|.

Randomly adding some fake items (the set F) to each transaction decreases this probability to 1 / (|I| + |F|).

One-to-n Mapping

Originally, we use a “one-to-one” mapping, one item to one item: A → 1, B → 2, C → 3.

We form a “one-to-n” mapping instead: A → {1, 4, 5}, B → {2}, C → {3, 5}. This greatly increases the number of possible mappings of an item:

$\binom{|I|+|F|}{1} + \binom{|I|+|F|}{2} + \cdots + \binom{|I|+|F|}{|F|}$
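The count of possible one-to-n mappings can be evaluated directly with `math.comb`. The sizes |I| = 3 and |F| = 2 below are illustrative only.

```python
from math import comb

def one_to_n_mappings(n_items: int, n_fake: int) -> int:
    """C(|I|+|F|, 1) + C(|I|+|F|, 2) + ... + C(|I|+|F|, |F|), as counted on the slide."""
    n = n_items + n_fake
    return sum(comb(n, k) for k in range(1, n_fake + 1))

print(one_to_n_mappings(3, 2))  # 15, versus 5 possible targets for a one-to-one mapping
```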

Example transformation

T = {A} {B} {C} {A, B} {A, C} {B, C} {A, B, C}

T′ = {1, 4, 5} {2} {3, 5} {1, 2, 4, 5} {1, 3, 4, 5} {2, 3, 5} {1, 2, 3, 4, 5}

Mapping: A → {1, 4, 5}, B → {2}, C → {3, 5}
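The example transformation replaces each item by the union of its images. The sketch below applies only the mapping (no noise and no fake items) and reproduces T′ for the seven transactions of T.

```python
from typing import Dict, FrozenSet, Iterable, List

def apply_mapping(transactions: Iterable[Iterable[str]],
                  f: Dict[str, FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Replace each item x of a transaction by all elements of f(x)."""
    return [frozenset().union(*(f[x] for x in t)) for t in transactions]

f = {"A": frozenset({1, 4, 5}), "B": frozenset({2}), "C": frozenset({3, 5})}
T = [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for t, t_prime in zip(T, apply_mapping(T, f)):
    print(sorted(t), "->", sorted(t_prime))
```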

Limitation on the mapping f

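For any item x, there must not exist items $y_1, y_2, \ldots, y_k$, all different from x, such that $f(x) \subseteq f(y_1) \cup f(y_2) \cup \cdots \cup f(y_k)$.

Consider the example A → {1, 2}, B → {2, 3}, C → {3, 4}. Then AC → {1, 2, 3, 4} and ABC → {1, 2, 3, 4}: since f(B) is covered by f(A) ∪ f(C), the two itemsets cannot be told apart.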

Limitation on the mapping f

For any item x: $f(x) - \bigcup_{i \in I,\, i \neq x} f(i) \neq \emptyset$

Every item must map to something unique.
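The condition can be checked mechanically by computing, for every item, the part of its image that no other item maps to (its unique mapping set). Because the union of all other images is the largest possible cover, a non-empty remainder for every item is equivalent to the "no cover by other items" condition of the previous slide. The two mappings below are the slides' examples.

```python
from typing import Dict, FrozenSet

def unique_sets(f: Dict[str, FrozenSet[int]]) -> Dict[str, FrozenSet[int]]:
    """U(x) = f(x) minus the union of f(i) over every other item i."""
    return {x: fx - frozenset().union(*(fi for i, fi in f.items() if i != x))
            for x, fx in f.items()}

def is_valid_mapping(f: Dict[str, FrozenSet[int]]) -> bool:
    """True iff every item keeps at least one element that no other item maps to."""
    return all(unique_sets(f).values())

good = {"A": frozenset({1, 4, 5}), "B": frozenset({2}), "C": frozenset({3, 5})}
bad = {"A": frozenset({1, 2}), "B": frozenset({2, 3}), "C": frozenset({3, 4})}
print(unique_sets(good))
print(is_valid_mapping(good), is_valid_mapping(bad))  # True False
```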

Mapping generation – Item Extend

Initialize every item to map to its own unique element of I′.

For every item x in IE: randomly pick some mappings and extend each picked mapping by x.
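A minimal sketch of Item Extend, assuming each extension element is added to between 1 and `max_pick` randomly chosen mappings. The slide only says "some mappings", so `max_pick` is a hypothetical parameter. Since core elements are never added to another item's mapping, every item keeps a unique element and the uniqueness condition holds by construction.

```python
import random
from typing import Dict, FrozenSet, List

def item_extend(items: List[str], core: List[int], extension: List[int],
                rng: random.Random, max_pick: int = 2) -> Dict[str, FrozenSet[int]]:
    """Item Extend sketch.

    core[k] is the unique element of I' first assigned to items[k]; every element
    of `extension` (IE) is then appended to 1..max_pick randomly chosen mappings.
    max_pick is a hypothetical parameter; the slide only says "some mappings".
    """
    if len(core) != len(items):
        raise ValueError("need exactly one core element per item")
    f = {x: {c} for x, c in zip(items, core)}
    for e in extension:
        k = rng.randint(1, min(max_pick, len(items)))
        for x in rng.sample(items, k):
            f[x].add(e)
    return {x: frozenset(s) for x, s in f.items()}

f = item_extend(["A", "B", "C"], [1, 2, 3], [4, 5], random.Random(0))
print(f)
```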

Example run

Initial mapping: A → {1}, B → {2}, C → {3}; IE = {4, 5}

Considering item 4

Before: A → {1}, B → {2}, C → {3}

Pick A: A → {1, 4}, B → {2}, C → {3}

Considering item 5

Before: A → {1, 4}, B → {2}, C → {3}

Pick A and C: A → {1, 4, 5}, B → {2}, C → {3, 5}

Item Extend

Every item must map to something unique. Say 1 is unique to f(A); then $\mathrm{supp}_T(A) = \mathrm{supp}_{T'}(1)$, which reveals the support of A.

For a transaction t without item A, add a subset of the unique mapping set of f(A) to t′ with some probability.

{1, 4} is the unique mapping set in f(A), so {}, {1}, {4} or {1, 4} may be added.

Mapping: A → {1, 4, 5}, B → {2}, C → {3, 5}

Fake items again

Now every item in $t'_i$ belongs to some mapping.

Randomly add some fake items from F to each transaction.

The items of T′ are then drawn from I′ ∪ IE ∪ F, where I′: core "unique" items, IE: expanding items, F: fake items.

Basic transformation framework

For each transaction t:

For each item x in t, add f(x) to t′.

For each item i in I − t, add a random subset of the unique mapping set of f(i) to t′.

For each fake item in F, toss a biased coin and add the item to t′ on heads (the bias should differ from item to item).
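A minimal sketch of the basic transformation framework, under stated assumptions. The subset of U(i) is drawn by an independent coin with probability `p_noise` per element, and the fake items and their biases are hypothetical. One design choice departs from the slides: the sketch only ever adds a proper subset of U(i). With the slides' mapping, adding all of U(A) = {1, 4} to a transaction {C} would give {1, 3, 4, 5}, which contains all of f(A) = {1, 4, 5} and would inflate the support of A. With that restriction, the final loop confirms on the example that every itemset's support is preserved.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Set

Mapping = Dict[str, FrozenSet[int]]

def unique_set(f: Mapping, x: str) -> FrozenSet[int]:
    """Elements of f(x) that no other item maps to (the unique mapping set)."""
    return f[x] - frozenset().union(*(f[i] for i in f if i != x))

def transform_transaction(t: Set[str], f: Mapping, fake_bias: Dict[int, float],
                          rng: random.Random, p_noise: float = 0.5) -> FrozenSet[int]:
    out: Set[int] = set()
    for x in t:                           # items in t: add the whole image f(x)
        out |= f[x]
    for i in f:                           # items in I - t: add a proper subset of U(i)
        if i in t:
            continue
        u = sorted(unique_set(f, i))
        picked = [e for e in u if rng.random() < p_noise]
        if len(picked) < len(u):          # a draw of the full U(i) adds nothing instead
            out.update(picked)
    for fake, bias in fake_bias.items():  # fake items: one biased coin each
        if rng.random() < bias:
            out.add(fake)
    return frozenset(out)

def supp(db, itemset) -> int:
    return sum(1 for t in db if itemset <= t)

f = {"A": frozenset({1, 4, 5}), "B": frozenset({2}), "C": frozenset({3, 5})}
fake_bias = {6: 0.2, 7: 0.5, 8: 0.7}      # hypothetical fake items and biases
T = [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
rng = random.Random(42)
T_prime = [transform_transaction(t, f, fake_bias, rng) for t in T]

# Supports are preserved: supp_T(X) == supp_T'(union of f(x) for x in X).
for r in (1, 2, 3):
    for X in combinations("ABC", r):
        image = frozenset().union(*(f[x] for x in X))
        assert supp(T, set(X)) == supp(T_prime, image)
```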

Recovering association rules

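Given an encrypted rule r′: X => Y in AR′:

If there exist $i_1, i_2, \ldots, i_m \in I$ such that $\bigcup_{k=1}^{m} f(i_k) = X$, and there exist $j_1, j_2, \ldots, j_n \in I$ such that $\bigcup_{k=1}^{n} f(j_k) = X \cup Y$,

then r: $\{i_1, i_2, \ldots, i_m\} \Rightarrow \{j_1, j_2, \ldots, j_n\} - \{i_1, i_2, \ldots, i_m\}$ is a rule in AR.

Otherwise, the rule is not correct and is rejected.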

Example

Given:
1 => 4 (rejected)
2 => 1, 5 (rejected)
2 => 1, 3, 5 (rejected)
2 => 1, 3, 4, 5 (B => AC)
2, 3, 5 => 1, 4 (BC => A)

Here {2, 3, 5} decodes to BC and {1, 2, 3, 4, 5} decodes to ABC.

Mapping f: A → {1, 4, 5}, B → {2}, C → {3, 5}
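The recovery rule can be sketched as follows. `decode` collects every item whose image fits inside the encrypted itemset and checks that together they cover it exactly. If the mapping satisfies the uniqueness condition, that collection is the only possible preimage, so no search over subsets is needed. The loop replays the five rules of the example.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Mapping = Dict[str, FrozenSet[int]]

def decode(s: FrozenSet[int], f: Mapping) -> Optional[FrozenSet[str]]:
    """Return the items whose images union exactly to s, or None if none exist."""
    cand = frozenset(x for x, fx in f.items() if fx <= s)
    covered = frozenset().union(*(f[x] for x in cand))
    return cand if covered == s else None

def recover_rule(lhs: Iterable[int], rhs: Iterable[int],
                 f: Mapping) -> Optional[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Map an encrypted rule X => Y back to a rule over I, or None to reject it."""
    x = frozenset(lhs)
    items_x = decode(x, f)
    items_xy = decode(x | frozenset(rhs), f)
    if items_x is None or items_xy is None:
        return None
    return items_x, items_xy - items_x

f = {"A": frozenset({1, 4, 5}), "B": frozenset({2}), "C": frozenset({3, 5})}
for lhs, rhs in [({1}, {4}), ({2}, {1, 5}), ({2}, {1, 3, 5}),
                 ({2}, {1, 3, 4, 5}), ({2, 3, 5}, {1, 4})]:
    print(sorted(lhs), "=>", sorted(rhs), ":", recover_rule(lhs, rhs, f))
```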

Correctness

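Proposition: for any items x, y, where f is the transformation mapping,

$\mathrm{supp}_T(x) = \mathrm{supp}_{T'}(f(x))$, $\mathrm{supp}_T(x \cup y) = \mathrm{supp}_{T'}(f(x) \cup f(y))$

For any itemsets X, Y, where F is the transformation mapping,

$\mathrm{supp}_T(X) = \mathrm{supp}_{T'}(F(X))$, $\mathrm{supp}_T(X \cup Y) = \mathrm{supp}_{T'}(F(X) \cup F(Y))$

Hence there are no false drops and no false positives.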

Summary

Generation of mappings: one-to-n mappings, Item Extend

Transformation of transactions: mapping f(x), subsets of unique mapping sets, fake items

Recovering association rules: reverse mappings and filtering

Test run

Number of items = 1k, |T| = 1k

Without transformation: one rule, time 8 s

Item Extend: 147 rules, total time 26 s; mapping generation and transformation: 219 ms

Future Work

Define the parameters of the problem: the size of IE, the size of F

Give a clear measure of security, a clear measure of overhead, and the correctness of association rules

Query execution proof: result verification

The End

Choosing probability

A uniform distribution, or any fixed distribution, gives patterns that may be easily identified.

Use a random probability distribution instead, e.g. {}: 70%, {1}: 5%, {4}: 15%, {1, 4}: 20%.

Storage: this needs additional storage.

Algorithm for transformation

Transformation is the most costly process: its execution time is linear in the database size |T|, so it should be as fast as possible.

Optimization

Mapping retrieval: for an item x, use a hash table to retrieve the mapping, h(x).

Adding fake items: first randomly determine the number of items to add, according to the probability of adding items; then randomly pick that many items from the set (non-uniform distribution).

This gives a much shorter runtime on average.
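A minimal sketch of the two-step fake-item addition, under stated assumptions. Step 1 draws how many fake items to add as a Binomial(|F|, `mean_p`) count, computed by skipping over runs of tails with geometric gaps (expected O(|F| · `mean_p`) work instead of one coin per fake item). Step 2 picks that many distinct items with hypothetical `weights`, rejecting duplicates. `mean_p` and `weights` are illustrative parameters, and the per-item inclusion probabilities this produces only approximate a per-item biased coin.

```python
import itertools
import math
import random
from typing import Callable, Sequence, Set

def binomial_by_skipping(n: int, p: float, rng: random.Random) -> int:
    """Number of heads in n tosses with P(head) = p, jumping over runs of tails."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log(1.0 - p)
    pos, heads = -1, 0
    while True:
        # Geometric gap: number of tails before the next head.
        pos += 1 + int(math.log(1.0 - rng.random()) / log_q)
        if pos >= n:
            return heads
        heads += 1

def make_fake_sampler(fakes: Sequence[int], weights: Sequence[float],
                      mean_p: float) -> Callable[[random.Random], Set[int]]:
    cum = list(itertools.accumulate(weights))   # computed once, reused per transaction
    def sample(rng: random.Random) -> Set[int]:
        k = binomial_by_skipping(len(fakes), mean_p, rng)   # step 1: how many
        chosen: Set[int] = set()
        while len(chosen) < k:                              # step 2: which ones
            chosen.add(rng.choices(fakes, cum_weights=cum, k=1)[0])
        return chosen
    return sample

sampler = make_fake_sampler([10, 11, 12, 13], [1.0, 2.0, 3.0, 4.0], mean_p=0.5)
print(sampler(random.Random(3)))
```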

Choice of mapped items

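Mapped items are numbered $1, 2, \ldots, (|I| + |IE| + |F|) \cdot (1 + \delta)$.

Any numbering is acceptable as long as it is not easy to identify I′, IE and F.

One way is to use a random permutation of the first |I| + |IE| + |F| natural numbers: the first |I| numbers form I′, the next |IE| numbers form IE, and the remaining |F| numbers form F.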
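The permutation approach can be sketched with `random.sample`, which returns a random ordering when asked for the whole population. This covers only the δ = 0 case described on the slide, and the sizes in the usage line are illustrative.

```python
import random
from typing import List, Tuple

def assign_ids(n_items: int, n_ext: int, n_fake: int,
               rng: random.Random) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle 1..N (N = |I| + |IE| + |F|) and cut it into I', IE and F."""
    total = n_items + n_ext + n_fake
    perm = rng.sample(range(1, total + 1), total)   # a random permutation of 1..N
    core = perm[:n_items]                            # first |I| numbers: I'
    ext = perm[n_items:n_items + n_ext]              # next |IE| numbers: IE
    fake = perm[n_items + n_ext:]                    # remaining |F| numbers: F
    return core, ext, fake

core, ext, fake = assign_ids(3, 2, 4, random.Random(1))
print(core, ext, fake)
```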

Cut and paste randomization

One case of select-a-size randomization: a specific way to select j.

Given an integer $K_m > 0$:

Randomly choose j in $[0, K_m]$; if j > m, set j = m.

Overall input parameters: $K_m$ and $p_m$.
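A minimal sketch of cut-and-paste randomization for one transaction. It assumes j is drawn uniformly from [0, K_m] (the slide only says "randomly"); the remaining steps follow select-a-size.

```python
import random
from typing import FrozenSet, Sequence

def cut_and_paste(t: FrozenSet[int], items: Sequence[int], k_m: int, p_m: float,
                  rng: random.Random) -> FrozenSet[int]:
    """Cut-and-paste randomization of one transaction t."""
    m = len(t)
    j = rng.randint(0, k_m)        # j in [0, K_m], assumed uniform
    j = min(j, m)                  # if j > m, set j = m
    kept = set(rng.sample(sorted(t), j))
    noise = {a for a in items if a not in t and rng.random() < p_m}
    return frozenset(kept | noise)

t_prime = cut_and_paste(frozenset({1, 2, 3}), range(10), k_m=2, p_m=0.1, rng=random.Random(0))
print(t_prime)
```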

Effects on support

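Support of A in T′ comes from:
A in t, and A is not removed;
A′ in t (A absent), and A is randomly added.

Support of AB in T′ comes from:
AB in t, and neither A nor B is removed;
AB′ in t, A is kept, and B is randomly added;
A′B in t, B is kept, and A is randomly added;
A′B′ in t, and both A and B are randomly added.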

Estimating original support

Let x be the support of A in T and y the support of A in T′. Then

$x \cdot P(\text{A remains in the original transaction}) + (|DB| - x) \cdot p_m = y$

Support of AB in T: estimated from the support of AB in T′, the supports of AB′ and A′B in T′, and the support of A′B′ in T′.
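Solving the single-item equation for x gives $x = \frac{y - |DB| \cdot p_m}{P(\text{A remains}) - p_m}$, which is defined only when the keep probability differs from $p_m$. The sketch below takes the keep probability as an input (`p_keep`, a name introduced here), since its value depends on the randomization parameters; the numbers in the usage line are illustrative.

```python
def estimate_original_support(y: float, db_size: int, p_keep: float, p_m: float) -> float:
    """Solve x * p_keep + (db_size - x) * p_m = y for x, the support of A in T."""
    if p_keep == p_m:
        raise ValueError("p_keep must differ from p_m, otherwise x cannot be recovered")
    return (y - db_size * p_m) / (p_keep - p_m)

# 200 true occurrences, 75% kept, 25% insertion rate: y = 150 + 200 = 350.
print(estimate_original_support(350, 1000, 0.75, 0.25))  # 200.0
```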

Apriori property

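Suppose m = 2 for all t in T, |T| = 10, I = {A, B}, $p_m = 0$ and j = 1.

Support of B in T′: $\mathrm{supp}_{T'}(B) = 0$, so $E(\mathrm{supp}_T(B)) = 0$.

$\mathrm{supp}_{T'}(A) = 10$ and $\mathrm{supp}_{T'}(AB) = 0$, yet $E(\mathrm{supp}_T(AB)) = \mathrm{supp}_{T'}(A) \cdot 1 = 10$.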

Apriori property

An expected large itemset may have an expected small subset.

But generally, the supports of the subsets are not too small.

So instead of using the support threshold to filter out all small candidates, use a smaller value.

Apriori algorithm

Generate candidate sets.

Scan the database for counts.

Recover the predicted support.

Discard candidates whose support is below the candidate limit.

Save for output the candidates whose support is ≥ the support threshold.

Apriori_gen(remaining candidates).
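A minimal sketch of this Apriori variant. The database scan and support recovery are abstracted into a hypothetical `estimate` callback, and `apriori_gen` here is a straightforward join-and-prune written for the example. Candidates at or above `candidate_limit` are carried to the next level, while only those reaching `s_min` are reported. In the usage, B's predicted support (4) is below s_min = 5 but above the limit 3, so AB (8) can still be found; with the limit raised to 5, AB becomes a false drop.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Set

Itemset = FrozenSet[str]

def apriori_gen(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Join surviving (k-1)-itemsets into k-itemsets whose (k-1)-subsets all survived."""
    out: Set[Itemset] = set()
    for a in prev:
        for b in prev:
            u = a | b
            if len(u) == k and all(frozenset(s) in prev for s in combinations(u, k - 1)):
                out.add(u)
    return out

def randomized_apriori(items: Iterable[str], estimate: Callable[[Itemset], float],
                       s_min: float, candidate_limit: float) -> Dict[Itemset, float]:
    """Apriori over predicted supports; `estimate` stands in for scan + recovery."""
    result: Dict[Itemset, float] = {}
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        predicted = {c: estimate(c) for c in candidates}
        # Keep anything at or above the (lower) candidate limit for the next level...
        survivors = {c for c, s in predicted.items() if s >= candidate_limit}
        # ...but only report itemsets that reach the support threshold.
        result.update({c: predicted[c] for c in survivors if predicted[c] >= s_min})
        k += 1
        candidates = apriori_gen(survivors, k)
    return result

table = {frozenset("A"): 10, frozenset("B"): 4, frozenset("C"): 9, frozenset("AB"): 8,
         frozenset("AC"): 3, frozenset("BC"): 2, frozenset("ABC"): 1}
res = randomized_apriori("ABC", lambda c: table.get(c, 0), s_min=5, candidate_limit=3)
print(res)
```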

Candidate limit

A high value: increases the number of false drops (poor correctness).

A small value: increases the number of candidate sets (high running time).

Experiment: with support threshold $s_{min}$ and estimated standard deviation δ, $s_{min} - \delta$ is found to be a good value.

Other applications

Secure outsourced storage of transaction databases

Outsourced association rule mining on data streams

Secure distributed association rule mining with a third-party miner

Outsourced database with association rule mining service

[Diagram: DB, Transformer, AssociationRules, AssociationRules, Mappings, Transactions, Query]