Dec 22, 2015
1
Towards an end-to-end architecture for
handling sensitive data
Hector Garcia-Molina
Rajeev Motwani
and students
2
DB Perspective
• Performance
• Preservation
• Distribution (P2P)
• Bad Guys:
• eavesdrop
• corrupt
• Trust
3
DB Perspective
• Preservation
[Figure: 2×2 grid of privacy (+/-) versus preservation (+/-); two cells are marked "easy" and one is marked "goal".]
4
Privacy Spectrum
• Prevention
• Detection
• Containment
5
Prevention: Our Work
• Privacy-Preserving OLAP
• Distributed Architecture for Secure DBMS (P)
• Data Preservation in P2P Systems
• P2P Trust and Reputation Management (P)
• P2P Privacy Preserving Indexing (P)
6
Distributed Architecture for Secure DBMS
• Motivation: Outsourcing
– Secure Database Provider (SDP)
[Figure: Client, Encrypt, Service Provider]
7
Performance Problem
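[Figure: Client, Client-side Processor, Encrypt, Service Provider. Query Q from the client is rewritten as Q’ for the service provider, which returns the “Relevant Data”; the client-side processor produces the Answer.]
Problem: Q’ ≈ “SELECT *”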
8
The Power of Two
[Figure: Client connected to DSP1 and DSP2]
9
Basic Idea
{ CC#, expDate, name }
{ expDate, name }
{ CC# }
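The split above can be sketched as a vertical partition whose two fragments share a join key. This is only an illustration under stated assumptions: the tuple id `tid`, the sample values, and the function names `split_record` and `join_fragments` are hypothetical and do not come from the slides.

```python
# Illustrative sketch: one record split across two providers.
# "tid" (a shared tuple id) and both function names are assumptions.

def split_record(tid, record, left_attrs):
    """Return two fragments of `record`, each tagged with the join key `tid`."""
    left = {"tid": tid, **{k: v for k, v in record.items() if k in left_attrs}}
    right = {"tid": tid, **{k: v for k, v in record.items() if k not in left_attrs}}
    return left, right

def join_fragments(left, right):
    """Client-side join: recombine two fragments that share the same tid."""
    if left["tid"] != right["tid"]:
        raise ValueError("fragments belong to different tuples")
    merged = {**left, **right}
    del merged["tid"]
    return merged

# Made-up sample values, for illustration only.
record = {"CC#": "4111-0000-0000-0000", "expDate": "01/30", "name": "Alice"}
frag1, frag2 = split_record(7, record, left_attrs={"expDate", "name"})
# frag1 holds {expDate, name}; frag2 holds {CC#}; neither holds all three.
restored = join_fragments(frag1, frag2)
```

Only the client, which can reach both fragments, is able to rebuild the full record.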
10
Another Example
{ salary }
{ rand }
{ salary + rand }
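A minimal sketch of this masking idea follows. The slide shows plain addition; the sketch swaps in addition modulo a fixed `MODULUS` so that the masked value is uniformly distributed, which is an assumption of this example and not something stated in the slides. The names `split_salary` and `recover_salary` are likewise hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed value range; salaries must be smaller than this

def split_salary(salary):
    """Split a salary into two shares: a random mask and the masked value."""
    rand = secrets.randbelow(MODULUS)
    masked = (salary + rand) % MODULUS
    return rand, masked          # store `rand` at one provider, `masked` at the other

def recover_salary(rand, masked):
    """Client-side reconstruction from both shares."""
    return (masked - rand) % MODULUS

rand_share, masked_share = split_salary(52000)
salary = recover_salary(rand_share, masked_share)
```

Each provider sees a single share, and the client needs both to recover the salary.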
11
The Power of Two
[Figure: the Client-side Processor splits Query Q into Q1 and Q2 for DSP1 and DSP2]
Key: Ensure Cost(Q1) + Cost(Q2) is comparable to Cost(Q)
12
Challenges
• Find a decomposition that
– Obeys all privacy constraints
– Minimizes execution cost for given workload
• For given query, find good plan
13
Example
R(id, a, b, c), privacy constraint: { a, b, c }
Candidate decompositions:
R1(id, a), R2(id, b, c)
R1(id, a, b), R2(id, c)
R1(id, a, b), R2(id, b, c)
R1(id, a, c), R2(id, b, c)
…
Most popular queries:
• Select on a, b
• Select on b, c
Chosen decomposition: R1(id, a, b), R2(id, b, c)
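The search in this example can be sketched by brute force. The cost model here is a toy assumption, not the authors' model: it counts how many workload queries need both providers and ignores the storage cost of replicating an attribute. All names (`decompositions`, `is_private`, `cost`) are hypothetical.

```python
from itertools import product

ATTRS = ("a", "b", "c")
CONSTRAINT = {"a", "b", "c"}          # must never all sit at one provider
WORKLOAD = [{"a", "b"}, {"b", "c"}]   # attributes each popular query selects on

def decompositions(attrs):
    """Yield (R1, R2) attribute sets; each attribute goes to R1, R2, or both."""
    for placement in product(("R1", "R2", "both"), repeat=len(attrs)):
        r1 = {x for x, p in zip(attrs, placement) if p in ("R1", "both")}
        r2 = {x for x, p in zip(attrs, placement) if p in ("R2", "both")}
        yield r1, r2

def is_private(r1, r2):
    """True if neither fragment contains the whole constrained attribute set."""
    return not CONSTRAINT <= r1 and not CONSTRAINT <= r2

def cost(r1, r2):
    """Toy cost: number of workload queries that need both providers."""
    return sum(1 for q in WORKLOAD if not (q <= r1 or q <= r2))

candidates = [(r1, r2) for r1, r2 in decompositions(ATTRS) if is_private(r1, r2)]
best_cost = min(cost(r1, r2) for r1, r2 in candidates)
best = [(sorted(r1), sorted(r2)) for r1, r2 in candidates if cost(r1, r2) == best_cost]
```

Under this toy cost, the only zero-cost candidates are R1(id, a, b), R2(id, b, c) and its mirror image, which agrees with the decomposition chosen on the slide.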
14
Detection: Our Work
• Simulatable Auditing (P)
• k-Anonymity
– algorithms and hardness
15
Containment: Our Work
• Paranoid Platform for Privacy Preferences (P)
• Entity Resolution
16
Containment
• Trusting
– privacy policies
• Paranoid
17
Example: Trusting
[Figure: alice interacting with dealsRus]
(1) browse policy
(2) give info
(3) cross fingers
• Example P3P Policies:
– Current purpose: completion and support of the recurring subscription activity
– Recipients: DealsRUs and/or entities acting as their agents or entities for whom DealsRUs are acting as an agent...
18
Example: Email
[Figure: alice (a@z), alice’s agent, and dealsRus]
(1) temp a12@w
(2) a12@w
(3) To: a12@w
(4) To: a@z
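The four steps can be sketched as a small forwarding agent. This is a hypothetical illustration, not the P4P implementation: the class `AliasAgent`, its method names, the generated alias format, and the `revoke` step (which the slide does not show) are all assumptions.

```python
import itertools

class AliasAgent:
    """Hypothetical agent that hands out temporary addresses and forwards mail."""

    def __init__(self, domain):
        self.domain = domain
        self._forward = {}                  # temp address -> real address
        self._counter = itertools.count(1)

    def new_alias(self, real_address):
        """Create a fresh temporary address that maps to `real_address`."""
        alias = f"tmp{next(self._counter)}@{self.domain}"
        self._forward[alias] = real_address
        return alias

    def revoke(self, alias):
        """Stop forwarding for `alias` (an assumed extension, not on the slide)."""
        self._forward.pop(alias, None)

    def deliver(self, to_address):
        """Return the real destination, or None if the alias is unknown or revoked."""
        return self._forward.get(to_address)

agent = AliasAgent("w")
alias = agent.new_alias("a@z")        # step (1): alice obtains a temp address
# step (2): alice gives `alias`, not a@z, to the merchant
destination = agent.deliver(alias)    # steps (3)-(4): mail to the alias reaches a@z
agent.revoke(alias)                   # later, alice can cut the merchant off
```

The merchant only ever learns the temporary address, so the real one stays with alice and her agent.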
19
P4P: Paranoid Platform for Privacy Preferences
[Figure: Framework; Data/Control Types: t1 ... tn; API; Strategy/Reference Implementation]
20
Private Information
[Figure: private information characterized along three axes (ownership, function, control). Labels: individual, organization; complete privacy, limited time use, no predicate input, no integration, accountable, sharable; identifier, service handle, input to predicate, copy.]
21
Entity Resolution
e1: N: a, A: b, CC#: c, Ph: e
e2: N: a, Exp: d, Ph: e
• Applications:
– mailing lists, customer files, counter-terrorism, ...
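The way e1 and e2 resolve to one entity can be sketched as a match step followed by a merge step. The match rule (agreement on N and Ph) and the functions `matches` and `merge` are assumptions made for this example; the slides do not specify a matching rule.

```python
def matches(r1, r2, keys=("N", "Ph")):
    """Assumed match rule: both records carry every key attribute and agree on it."""
    return all(k in r1 and k in r2 and r1[k] == r2[k] for k in keys)

def merge(r1, r2):
    """Union of attributes; raises if a shared attribute has conflicting values."""
    merged = dict(r1)
    for k, v in r2.items():
        if k in merged and merged[k] != v:
            raise ValueError(f"conflicting values for {k!r}")
        merged[k] = v
    return merged

e1 = {"N": "a", "A": "b", "CC#": "c", "Ph": "e"}
e2 = {"N": "a", "Exp": "d", "Ph": "e"}
resolved = merge(e1, e2) if matches(e1, e2) else None
```

The merged record carries both CC# and Exp, which neither input record held on its own.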
22
Privacy
[Figure: records about Alice, each with a confidence value]
Nm: Alice, Ad: 32 Fox, Ph: 5551212 (1.0)
Nm: Alice, Ad: 32 Fox, Ph: 5551212 (1.0)
Nm: Alice, Ad: 32 Fox (1.0)
Nm: Alice, Ad: 32 Fox, Ph: 5551212 (0.7)
Nm: Alice, Ad: 32 Fox, Ph: 5551212, Ad: 14 Cat (1.0)
Bob
Alice
23
Leakage
Nm: Alice, Ad: 32 Fox, Ph: 5551212 (1.0)
Nm: Alice, Ad: 32 Fox, Ph: 5550000 (0.7)
Bob
Alice
L = 0.6 (between 0 and 1)
24
Multi-Record Leakage
Nm: Alice, Ad: 32 Fox, Ph: 5551212 (1.0)
Bob
Alice
LL = 0.9 (between 0 and 1, e.g., max L)
r1, L = 0.9
r2, L = 0.8
r3, L = 0.7
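The "max L" combination mentioned above can be sketched as follows. The per-record values are those on the slide; the function name `multi_record_leakage` and the choice to return 0.0 when there are no records are assumptions of this example.

```python
def multi_record_leakage(per_record_leakage):
    """Combine per-record leakage values with max, per the slide's 'e.g., max L'."""
    values = list(per_record_leakage.values())
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("leakage values must lie in [0, 1]")
    # Assumption: with no records at all, nothing has leaked.
    return max(values, default=0.0)

leakage = {"r1": 0.9, "r2": 0.8, "r3": 0.7}   # values from the slide
LL = multi_record_leakage(leakage)            # 0.9, matching the slide
```

Other combination rules are possible; max is only the example the slide names.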
25
Q1: Added Vulnerability?
Bob
Alice
ΔLL = ??
r1 r2 r3 r4
p
r4 may cause Bob’s records to snap together!
26
Q2: Disinformation?
Bob
Alice
ΔLL = ??
r1 r2 r3 r4 (lies)
p
What is the most cost-effective disinformation?
27
Q3: Verification?
Bob
Alice
p
What is the best fact to verify to increase confidence in the hypothesis?
r1, 0.9
r2, 0.8
r3, 0.7
...
hypothesis h (0.6)
28
Privacy Spectrum
• Prevention
• Detection
• Containment