L8: Introduction to privacy-preserving computations Privacy-preserving Technologies / LTAT.04.007 Dan Bogdanov [email protected]

Jan 29, 2022

Page 1: L8: Introduction to privacy-preserving computations

L8: Introduction to privacy-preserving computations
Privacy-preserving Technologies / LTAT.04.007

Dan Bogdanov
[email protected]

Page 2: L8: Introduction to privacy-preserving computations

Motivation – why privacy-preserving computing?

Page 3: L8: Introduction to privacy-preserving computations

Example: Linking Tax And Education Data


Page 4: L8: Introduction to privacy-preserving computations

Regulatory Barriers On Data Linking


Page 5: L8: Introduction to privacy-preserving computations

Using privacy technologies to solve it

Source data: 10 million tax records and 600 000 education records.

Each record is uploaded using secret sharing (think: “encryption”).
Records are linked and processed using secure multi-party computation (think: “data not decrypted for processing”).
Data never existed outside the source in an unencrypted state.
Solution based on Sharemind MPC.

[Figure: education records from the Ministry of Education and Research and employment tax records from the Estonian Tax and Customs Board flow to three hosts: Cybernetica, the Estonian Information System's Authority and the Ministry of Finance IT Center.]

Page 6: L8: Introduction to privacy-preserving computations


Page 7: L8: Introduction to privacy-preserving computations

[Figure: data flow of the study.
Sources: Tax and Customs Board (employment tax payments); Ministry of Education and Science (higher study events).
At each source: extract data, then secret share and upload.
Employment tax payments: aggregate by month (monthly income), aggregate by year (average yearly income), aggregate by person (employment record of a person).
Higher study events: expand by years and aggregate by person (university career of a person).
Merge by person's ID (complete record of a person), then compute additional attributes and align tax payments (analysis table).
Statistical analyst: queries the analysis table and recovers the analysis results from shares.
Data is stored with secret sharing and processed with secure multi-party computation.]

Page 8: L8: Introduction to privacy-preserving computations

Sharemind-powered Analytics

Data scientists used analytics tools based on secure multi-party computation.
The MPC system also prevented queries outside the study plan.
Reports were given to industry, universities and the government.
Result: no clear relation between working during studies and not graduating.

[Figure: a data analyst queries the hosts (Cybernetica, Estonian Information System's Authority, Ministry of Finance IT Center); results go to universities, companies and policymakers.]

Page 9: L8: Introduction to privacy-preserving computations


A privacy-preserving statistics tool inspired by R

Page 10: L8: Introduction to privacy-preserving computations


Page 11: L8: Introduction to privacy-preserving computations

[Excerpt from the study report, translated from Estonian]

2. Results

The results confirm that the share of students who graduate within the nominal study period is low among students in general and among ICT students in particular. Among ICT students in bachelor's studies, the share graduating within the nominal period varies around 20 percent, which is lower than the corresponding figure for students in other curricula (see Figure 1). The same tendency appears in professional higher education. In master's studies the share of nominal-time graduates is somewhat higher, varying between 30% and 40% depending on the year, but here too it is lower among ICT students than among others.

Figure 1. Share of nominal-time graduates by year of matriculation, ICT and non-ICT curricula, bachelor's studies

Male students are less likely than female students to graduate within the nominal period (see Figure 2, Figure 3). The lower probability of ICT students graduating within the nominal period appears for both sexes.

Slide annotations:
• Non-IT graduation rate is around 40%
• IT graduation rate is around 20%

Page 12: L8: Introduction to privacy-preserving computations

[Excerpt from the study report, translated from Estonian]

By school, the share of ICT students graduating within the nominal period in bachelor's studies is highest at the University of Tartu, followed by TTÜ and TLÜ. In professional higher education the share is highest at TLÜ, followed by Infotehnoloogia Kolledž and TTÜ. In master's studies the share is highest at TÜ, followed by TTÜ and TLÜ (the ordering varies by year).

Looking at employment during the nominal study period, it turns out, surprisingly, that ICT students do not work during their studies more than students in non-ICT curricula. In bachelor's studies, across all years of study, the employment rate in most years is higher among non-ICT students than among ICT students (see Figure 4). The same conclusion holds for students in professional higher education. In master's studies, where employment rates exceed 80%, the result is the opposite: the employment rate is higher among ICT students than among non-ICT students.

Figure 4. Share of students who worked during the nominal study period among all students, by year, ICT and non-ICT curricula, bachelor's studies

The employment rate of female students is somewhat higher than that of male students, among both ICT and non-ICT students (Figure 5, Figure 6). Gender differences in employment rates vary by year: in some years the difference is considerable, and in others there is no significant difference.

Slide annotation: Non-IT and IT students have similar employment ratios, but IT students lost more in the financial crisis

Page 13: L8: Introduction to privacy-preserving computations

Regulatory status of the project

In an official response, after a study of the system, the Estonian DPA suggested that neither the hosts of the servers running the statistics nor the analysts making the queries could feasibly re-identify individuals in the source database (this was pre-GDPR).

The Internal Supervision Department of the Tax and Customs Board agreed to provide unmodified tax records after a code and process review.

A follow-up legal review in the FP7 PRACTICE project by a researcher from the University of Göttingen suggested that the same precedent could hold under GDPR as well.

DATA-DRIVEN SERVICES ON CONFIDENTIAL DATA


Page 14: L8: Introduction to privacy-preserving computations

A general model for privacy-preserving computing

Page 15: L8: Introduction to privacy-preserving computations

Concept of secure computing

When a standard computer encrypts data, it must be decrypted before analysis.

Secure computing systems can analyze data without removing the encryption.

[Figure: an encrypted database analyzed with standard tools versus with secure computing.]

Page 16: L8: Introduction to privacy-preserving computations

Extended definition of secure multi-party computation

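[Figure: the three groups of parties and the data passed between them.
Input parties IP1, ..., IPk hold inputs x1, ..., xk.
Computing parties CP1, ..., CPl: computing party CPi receives the pieces x1i, ..., xki of the inputs and is labelled with an output piece yi.
Result parties RP1, ..., RPm receive results y1, ..., ym.]

Step 1: upload and storage of inputs
Step 2: secure computing
Step 3: publishing of results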
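The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses additive secret sharing modulo 2^32 as the protection mechanism, it computes nothing more than a sum, and the names `share`, `reconstruct` and `run_sum_protocol` are hypothetical rather than taken from the lecture or from Sharemind.

```python
import secrets

MODULUS = 2**32  # assumed ring size for the shares


def share(value, n_parties):
    """Split value into n_parties random pieces that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all pieces of a value."""
    return sum(shares) % MODULUS


def run_sum_protocol(inputs, n_computing):
    # Step 1: every input party splits its input x_i and sends piece x_ij to CP_j.
    stored = [[] for _ in range(n_computing)]
    for x in inputs:
        for j, piece in enumerate(share(x, n_computing)):
            stored[j].append(piece)
    # Step 2: every computing party works only on the pieces it holds.
    result_pieces = [sum(pieces) % MODULUS for pieces in stored]
    # Step 3: the result pieces y_j go to a result party, which recombines them.
    return reconstruct(result_pieces)


total = run_sum_protocol([52000, 31000, 47000], n_computing=3)
print(total)
```

No computing party sees an input in the clear, because each piece on its own is a uniformly random number. A sum needs no communication between computing parties; operations such as multiplication or comparison would need an interactive protocol, which this sketch does not cover.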

Page 17: L8: Introduction to privacy-preserving computations

Technique: property-preserving cryptography

Analogy: symmetric crypto that preserves a relation on inputs (e.g., order, equality).

Pros:
• Low performance overhead.
• Fits well into existing systems.

Cons:
• Only allows a few operations (e.g., only equality comparison or ordering).
• Multi-user systems are a challenge, but can be done with proxy re-encryption.
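To make the "preserves equality" idea concrete, here is a small Python sketch. It is a stand-in, not a property-preserving encryption scheme: it uses a keyed hash (HMAC-SHA256 from the standard library) to produce deterministic tokens, so the tokens cannot be decrypted, but they do show how a server can compare protected values for equality and nothing else. The key, the identifiers and the function name `equality_token` are assumptions made for the example.

```python
import hashlib
import hmac


def equality_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: equal inputs give equal tokens under one key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"hypothetical-demo-key"  # assumption: a key shared by the data owners

# Two data owners protect their person identifiers before upload.
table_a = {equality_token(key, pid): "tax" for pid in ["id-1", "id-2", "id-3"]}
table_b = {equality_token(key, pid): "edu" for pid in ["id-2", "id-3", "id-4"]}

# A server that sees only tokens can still join the tables on equality.
matches = sorted(set(table_a) & set(table_b))
print(len(matches))
```

The sketch also shows the limitation listed under the cons: the server can test equality, but it cannot add, average or order the protected values.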

Page 18: L8: Introduction to privacy-preserving computations

Technique: homomorphic encryption

Analogy: asymmetric crypto that allows addition and multiplication of ciphertexts.

Pros:
• Fits well into existing systems.

Cons:
• High performance overhead.
• Multi-user systems are a challenge, but can be done with proxy re-encryption.
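As a rough illustration of computing on ciphertexts, the sketch below implements a toy version of the Paillier cryptosystem. Note the assumptions: Paillier supports only addition of encrypted values (not multiplication of two ciphertexts, which fully homomorphic schemes add), the primes are far too small for any real use, and the function names are chosen for this example.

```python
import math
import secrets

# Toy parameters: real deployments use primes of 1024 bits or more.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
G = N + 1
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)


def _l(u):
    """The Paillier L function: L(u) = (u - 1) / N."""
    return (u - 1) // N


MU = pow(_l(pow(G, LAMBDA, N2)), -1, N)  # modular inverse, Python 3.8+


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    return (_l(pow(c, LAMBDA, N2)) * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo N."""
    return (c1 * c2) % N2


c_sum = add_encrypted(encrypt(1200), encrypt(345))
print(decrypt(c_sum))
```

Whoever computes `add_encrypted` needs only the public values `N` and `G`; the result can be read only by the holder of `LAMBDA` and `MU`.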

Page 19: L8: Introduction to privacy-preserving computations

Technique: garbled circuits

Analogy: cryptographic versions of electrical circuits.

Pros:
• Flexible programming model.

Cons:
• Medium performance overhead.
• Fixed number of parties (can be solved by combining with other techniques).
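The circuit analogy can be made concrete with a single gate. The following Python sketch garbles one AND gate in the textbook style: every wire gets two random labels (one per bit value), and each row of the truth table is encrypted under the pair of input labels that selects it. It is a simplified illustration under assumptions: SHA-256 serves as the key-derivation function, a block of zero bytes marks the correctly decrypted row, and the oblivious transfer that would deliver the evaluator's input label is left out. All function names are hypothetical.

```python
import hashlib
import secrets

LABEL_LEN = 16  # bytes per wire label


def _pad(key_a, key_b):
    """Derive a 32-byte one-time pad from the two input labels."""
    return hashlib.sha256(key_a + key_b).digest()


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def garble_and_gate():
    """Garbler: pick labels for wires a, b, out and encrypt the AND truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    rows = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            # Output label followed by zero bytes that act as a validity tag.
            plaintext = labels["out"][bit_a & bit_b] + bytes(LABEL_LEN)
            rows.append(_xor(_pad(labels["a"][bit_a], labels["b"][bit_b]), plaintext))
    secrets.SystemRandom().shuffle(rows)  # hide which row belongs to which inputs
    return labels, rows


def evaluate(rows, key_a, key_b):
    """Evaluator: holds one label per input wire and learns one output label."""
    pad = _pad(key_a, key_b)
    for row in rows:
        plaintext = _xor(pad, row)
        if plaintext[LABEL_LEN:] == bytes(LABEL_LEN):
            return plaintext[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")


labels, rows = garble_and_gate()
out_label = evaluate(rows, labels["a"][1], labels["b"][1])
print(out_label == labels["out"][1])
```

The evaluator sees only random-looking labels, so it learns the output label without learning which bits the input labels stand for. A full circuit chains many such gates, feeding output labels into the next gate as input labels.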

Page 20: L8: Introduction to privacy-preserving computations

Example: millionaire’s problem


Page 21: L8: Introduction to privacy-preserving computations

Technique: secret sharing

Analogy: give a number of people a random piece of each secret value and let them collaborate to compute results.

Pros:
• Low-to-medium performance overhead.
• Flexible programming model.

Cons:
• Distributed deployments do not fit into all existing systems.
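One well-known way to hand out "random pieces" is Shamir's threshold scheme, sketched below in Python. The choice of Shamir's scheme is an assumption made for this example; the slide does not name a particular scheme. The secret is the constant term of a random polynomial over a prime field, each party receives one point of the polynomial, and any `threshold` points recover the secret by Lagrange interpolation.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime, used as the field size


def share(secret, threshold, n_parties):
    """Return n_parties points of a random polynomial whose constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n_parties + 1)]


def reconstruct(points):
    """Lagrange interpolation at x = 0 from at least `threshold` points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def add_shares(a, b):
    """Each party adds its two pieces locally; the result is a sharing of the sum."""
    return [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]


salary_a = share(2100, threshold=2, n_parties=3)
salary_b = share(1900, threshold=2, n_parties=3)
print(reconstruct(add_shares(salary_a, salary_b)[:2]))
```

With `threshold=2`, a single piece reveals nothing about the secret, while any two parties together can recover it. Addition happens without any party seeing either salary.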

Page 22: L8: Introduction to privacy-preserving computations

Technique: trusted execution environments

Analogy: think of a computer process that hides the data from its owner.

Pros:
• Minimal performance overhead.
• Relatively easy to convert applications to work on trusted execution environments.

Cons:
• Side-channel attack mitigations are complicated to implement.

[Figure labels: Ik, Rn, CSC.]

Page 23: L8: Introduction to privacy-preserving computations

Lecture exercise: modelling parties for a COVID-19 social distancing tracking application

Page 24: L8: Introduction to privacy-preserving computations

Lecture task

Think of an application that would support social distancing and limit infection rates. Write down very clearly what the expected benefit of the system is.

• Write down the list of input parties and the data they would provide.
• Write down the list of computing parties and describe the kind of processing they would perform.
• Write down the list of result parties and describe the outputs they would receive.

Bonus tasks, time permitting:
• Think of minimizing personal data processing using process redesign.
• See if any of the secure computing paradigms described above could support your application.

Prepare in 12 minutes and then we’ll have 1-2 students present their ideas.

Page 25: L8: Introduction to privacy-preserving computations

Programmable privacy-preserving computations

Page 26: L8: Introduction to privacy-preserving computations

PDK as an abstraction of a secure computing paradigm

A protection domain kind (PDK) is a set of data representations, algorithms and protocols for storing and computing on protected data.

Examples:
• SMC based on secret sharing,
• SMC based on garbled circuits,
• (fully) homomorphic encryption,
• trusted hardware (e.g., Intel SGX).

Page 27: L8: Introduction to privacy-preserving computations

Protection domain as an instance of a PDK

A protection domain (PD) is a set of data that is protected with the same resources and for which there is a well-defined set of algorithms and protocols for computing on that data while keeping the protection.

Examples:
• data held by a fixed group of servers performing secure multi-party computation,
• data encrypted under a fixed key of a homomorphic encryption scheme.

Page 28: L8: Introduction to privacy-preserving computations

Application model for privacy-preserving computing

Layers: secure primitive operations, privacy-preserving algorithms, application logic.

• private outputs from private inputs,
• have privacy proofs,
• remain private under sequential or parallel composition,
• optimized to have a low resource footprint.

• publish selected results to make system useful,
• do not leak private inputs or show leakage as acceptable,
• compositions of secure primitive operations,
• optimize for running time.

Page 29: L8: Introduction to privacy-preserving computations

Converting an algorithm to a privacy-preserving one

We pick frequent itemset mining as a problem of choice.
Frequent itemset mining is a data mining problem that helps with shopping basket analysis and the simplest kinds of recommender systems.

• What kind of things do people buy from stores together most often?
• If the service provider knows this, they can recommend one to a customer who is planning to buy the other.

The simpler algorithms include Apriori (breadth-first search) and Eclat (depth-first search).
We will now look at the basic primitive of frequent itemset mining and then build a privacy-preserving approach.

Page 30: L8: Introduction to privacy-preserving computations

Privacy-preserving data representations

Transactions:

t1  rendang, nasi lemak, chicken satay
t2  nasi lemak, lontong
t3  chicken satay
t4  rendang, nasi lemak
t5  nasi lemak, chicken satay
t6  nasi lemak, chicken satay
t7  lontong

The same transactions as a bit matrix (one row per transaction, one column per item):

     rendang   nasi lemak   lontong   chicken satay
t1      1          1           0            1
t2      0          1           1            0
t3      0          0           0            1
t4      1          1           0            0
t5      0          1           0            1
t6      0          1           0            1
t7      0          0           1            0

Private data representations are the key toward designing privacy-preserving algorithms.
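The conversion from baskets to the bit matrix can be written down directly. In this Python sketch the transaction data is taken from the slide, while the function name `to_bit_matrix` is a hypothetical one chosen for the example.

```python
ITEMS = ["rendang", "nasi lemak", "lontong", "chicken satay"]

TRANSACTIONS = {
    "t1": {"rendang", "nasi lemak", "chicken satay"},
    "t2": {"nasi lemak", "lontong"},
    "t3": {"chicken satay"},
    "t4": {"rendang", "nasi lemak"},
    "t5": {"nasi lemak", "chicken satay"},
    "t6": {"nasi lemak", "chicken satay"},
    "t7": {"lontong"},
}


def to_bit_matrix(transactions, items):
    """One row per transaction, one 0/1 entry per item in a fixed column order."""
    return {
        tid: [1 if item in basket else 0 for item in items]
        for tid, basket in transactions.items()
    }


matrix = to_bit_matrix(TRANSACTIONS, ITEMS)
for tid, row in matrix.items():
    print(tid, row)
```

Every row has the same length regardless of how many items the basket contains, so once each bit is protected (for example by secret sharing), the shape of the data no longer reveals basket sizes.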

Page 31: L8: Introduction to privacy-preserving computations

Calculating the support of an item

The data representation allows for very efficient calculation of item supports.

nasi lemak column (t1 to t7):   1 1 0 1 1 1 0

∑ = 5

Page 32: L8: Introduction to privacy-preserving computations

Calculating support for a set of items

Checking the joint support of a pair of items simply requires a multiplication.

nasi lemak                     1 1 0 1 1 1 0
chicken satay                × 1 0 1 0 1 1 0
                               =============
nasi lemak & chicken satay     1 0 0 0 1 1 0

∑ = 3
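Both support calculations reduce to an element-wise product of columns followed by a sum. The Python sketch below works on plain bits to show the arithmetic; in a privacy-preserving setting the same products and sums would be carried out on protected values. The columns come from the slide's matrix, and the function names `cover` and `support` are assumptions for this example.

```python
# Item columns of the bit matrix, ordered t1 to t7.
COLUMNS = {
    "rendang":       [1, 0, 0, 1, 0, 0, 0],
    "nasi lemak":    [1, 1, 0, 1, 1, 1, 0],
    "lontong":       [0, 1, 0, 0, 0, 0, 1],
    "chicken satay": [1, 0, 1, 0, 1, 1, 0],
}


def cover(itemset):
    """Element-wise product of the item columns: 1 where a transaction has every item."""
    bits = [1] * 7
    for item in itemset:
        bits = [b * c for b, c in zip(bits, COLUMNS[item])]
    return bits


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(cover(itemset))


print(support(["nasi lemak"]))
print(cover(["nasi lemak", "chicken satay"]))
print(support(["nasi lemak", "chicken satay"]))
```

Because the computation uses only multiplications and additions, with no branching on the data, it maps directly onto the primitive operations that secure computing techniques provide.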

Page 33: L8: Introduction to privacy-preserving computations

Evaluating itemsets with a depth-first strategy

Depth-first search would be intuitive for pruning.

Size 1: { rendang } { nasi lemak } { lontong } { chicken satay }
Size 2: { rendang, nasi lemak } { rendang, lontong } { rendang, chicken satay } { nasi lemak, lontong } { nasi lemak, chicken satay } { lontong, chicken satay }
Size 3: { rendang, nasi lemak, lontong } { rendang, nasi lemak, chicken satay } { nasi lemak, lontong, chicken satay } { rendang, lontong, chicken satay }
Size 4: { rendang, nasi lemak, lontong, chicken satay }

Page 34: L8: Introduction to privacy-preserving computations

Evaluating itemsets with a breadth-first strategy

However, breadth-first search can be done in parallel.

Size 1: { rendang } { nasi lemak } { lontong } { chicken satay }
Size 2: { rendang, nasi lemak } { rendang, lontong } { rendang, chicken satay } { nasi lemak, lontong } { nasi lemak, chicken satay } { lontong, chicken satay }
Size 3: { rendang, nasi lemak, lontong } { rendang, nasi lemak, chicken satay } { nasi lemak, lontong, chicken satay } { rendang, lontong, chicken satay }
Size 4: { rendang, nasi lemak, lontong, chicken satay }

Page 35: L8: Introduction to privacy-preserving computations

Balancing optimizations with privacy preservation

Challenge: exploring all possible itemsets is slow due to combinatorial explosion.
Pruning the search tree requires us to declassify itemset supports during computation (leak?).

Solution: consider that the algorithm will publish all frequent itemsets, as that is its intended goal.
• We will compare support to the threshold privately, only declassifying the result bit.
• We will prune the search tree based on that bit.
• Not a leak: if the itemset is frequent, we would have learned it from the outputs anyway.
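The pruning idea can be sketched as a breadth-first (Apriori-style) search in Python. This is a plain-Python mock, not a secure implementation: `private_support` stands in for a computation that would run on protected data, and `declassify_is_frequent` marks the only place where a value, a single bit, would be revealed. Both function names and the threshold values are assumptions for this example.

```python
from itertools import combinations

# Item columns of the bit matrix, ordered t1 to t7.
COLUMNS = {
    "rendang":       [1, 0, 0, 1, 0, 0, 0],
    "nasi lemak":    [1, 1, 0, 1, 1, 1, 0],
    "lontong":       [0, 1, 0, 0, 0, 0, 1],
    "chicken satay": [1, 0, 1, 0, 1, 1, 0],
}


def private_support(itemset):
    """Stand-in for a secure computation; the count itself would stay protected."""
    bits = [1] * 7
    for item in itemset:
        bits = [b * c for b, c in zip(bits, COLUMNS[item])]
    return sum(bits)


def declassify_is_frequent(itemset, threshold):
    """The only revealed value: one bit saying whether support >= threshold."""
    return private_support(itemset) >= threshold


def frequent_itemsets(threshold):
    level = [frozenset([item]) for item in COLUMNS
             if declassify_is_frequent([item], threshold)]
    found = list(level)
    while level:
        size = len(level[0]) + 1
        known = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: a candidate is tested only if all its smaller subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in known for s in combinations(c, size - 1))
                 and declassify_is_frequent(c, threshold)]
        found.extend(level)
    return found


for itemset in frequent_itemsets(threshold=3):
    print(sorted(itemset))
```

The search never looks at a support count directly. It branches only on the declassified bits, and all candidates of one size can be tested in parallel, which matches the breadth-first strategy.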

Page 36: L8: Introduction to privacy-preserving computations