Generalizing PIR for Practical Private Retrieval of Public Data
Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science, UC Santa Barbara
DBSec 2010
The Problem
◦ Practical private retrieval of public data
Main Challenges
◦ Strong privacy, practical cost of retrieval
Our proposal
◦ Absolute privacy in a bounding box
Contributions
◦ Private retrieval service charge model
◦ Bounding-box PIR: generalizing k-Anonymity and PIR
◦ Query by key in one round
6/21/2010, S. Wang, D. Agrawal and A. El Abbadi
[Figure: a client applies a private query method to turn its query into an obfuscated query and sends it to an untrusted server that holds public data. Client: "I don’t want to reveal my personal interest." Server: "I can provide this private retrieval service, if you pay for it." Label: private data profile.]
Desiderata
◦ Practical: minimize computation and communication costs
◦ Flexible: allow clients to specify their desired degree of privacy ρ and service charge budget µ; satisfy ρ without exceeding µ
Metrics of interest
◦ Performance metrics: computation cost Ccomp, communication cost Ccomm
◦ Quality of service metrics: privacy breach probability Pbrh (Pbrh ≤ ρ), server charge Csrv (Csrv ≤ µ)
Challenge
◦ It is difficult to achieve both strong privacy and practical retrieval cost at the same time
k-Anonymity
Principle
◦ Blur a data value with a range or partition such that each value is indistinguishable among at least k values [Sama98, Swee02]
Analysis: use k data bits to anonymize 1 requested bit
◦ E.g. k = 30: the query "June 17, 1972" becomes the obfuscated query "June, 1972"
◦ Ccomp = k, Ccomm = k + 1
◦ Pbrh = 1/k, Csrv = k
Pros
◦ Flexible
◦ Computationally cheap
Cons
◦ Potential proximity breach for numeric data, due to a narrow anonymous range [Li08]
◦ Plain-text communication, subject to attacks that use background knowledge
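The date example above can be sketched in Python as follows. This is a minimal illustration, assuming the generalization rule is "replace the day by its month" (so a June date gets k = 30); `generalize_to_month` and `k_anonymity_costs` are hypothetical names, and the cost formulas are copied from the slide rather than derived.

```python
import calendar
from datetime import date


def generalize_to_month(d: date) -> tuple[str, list[date]]:
    """Replace an exact date by its month; the anonymity set is every day of that month."""
    _, n_days = calendar.monthrange(d.year, d.month)
    label = f"{calendar.month_name[d.month]}, {d.year}"
    return label, [date(d.year, d.month, day) for day in range(1, n_days + 1)]


def k_anonymity_costs(k: int) -> dict[str, float]:
    """Cost and privacy metrics of one k-Anonymity query, as listed on the slide."""
    return {"Ccomp": k, "Ccomm": k + 1, "Pbrh": 1 / k, "Csrv": k}


label, anonymity_set = generalize_to_month(date(1972, 6, 17))
k = len(anonymity_set)
print(label, k)  # June, 1972 30
print(k_anonymity_costs(k))
```

The server sees only "June, 1972", so the true day is one of k = 30 candidates; the same k also drives the computation, communication and charge.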
cPIR (computational PIR)
Principle
◦ Achieve computationally complete privacy by applying cryptographic computations over the entire public data [Kush97]
Pros
◦ Complete privacy for clients
◦ Secure communication
Cons
◦ Orders of magnitude less efficient than simply transferring the entire data from the server to the client [Sion07]
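[Figure: cPIR protocol. The server holds the public data X = (X1, X2, …, Xn). The client encrypts its query q = "give me the ith record" and sends encrypted(q); the server computes encrypted-result = f(X, encrypted(q)) and returns it, from which the client recovers Xi.]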
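Quadratic Residue (QR)
◦ Let $N = p_1 \times p_2$, where $p_1$ and $p_2$ are large primes of m/2 bits.
◦ x is a quadratic residue (QR) mod N if there exists y such that $y^2 \equiv x \pmod{N}$; otherwise x is a quadratic non-residue (QNR).
◦ E.g. N = 35: 11 is a QR ($9^2 = 81 \equiv 11 \pmod{35}$), while 3 is a QNR (no y satisfies $y^2 \equiv 3 \pmod{35}$).
◦ Essential properties: QR × QR = QR; QR × QNR = QNR.
Quadratic Residuosity Assumption (QRA)
◦ Determining whether a number is a QR or a QNR is computationally hard if $p_1$ and $p_2$ are not given. (The QNRs used in cPIR have Jacobi symbol +1, so computing the Jacobi symbol does not tell them apart from QRs.)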
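The QR/QNR sets for N = 35 can be checked with a short Python sketch. It is illustrative only: brute force works because N is tiny, and `is_qr_with_factors` (a hypothetical name) shows why knowing $p_1 = 5$ and $p_2 = 7$ makes the problem easy, which is exactly the trapdoor the QRA relies on.

```python
from math import gcd

# Toy modulus from the slide: N = p1 * p2 with p1 = 5, p2 = 7
# (real cPIR uses large primes of m/2 bits each).
P1, P2 = 5, 7
N = P1 * P2


def is_qr(x: int) -> bool:
    """Brute force: x is a QR mod N if some y satisfies y^2 = x (mod N)."""
    return any(y * y % N == x for y in range(1, N))


def legendre(x: int, p: int) -> int:
    """Euler's criterion for an odd prime p and x coprime to p: +1 or -1."""
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1


def is_qr_with_factors(x: int) -> bool:
    """With the factors (the trapdoor), residuosity is easy to decide."""
    return legendre(x, P1) == 1 and legendre(x, P2) == 1


units = [x for x in range(1, N) if gcd(x, N) == 1]
qr = [x for x in units if is_qr(x)]
# QNRs with Jacobi symbol +1: the non-residues that "look like" QRs.
qnr = [x for x in units if not is_qr(x) and legendre(x, P1) * legendre(x, P2) == 1]

print(qr)   # [1, 4, 9, 11, 16, 29]
print(qnr)  # [3, 12, 13, 17, 27, 33]
assert all(is_qr(a * b % N) for a in qr for b in qr)       # QR x QR = QR
assert all(not is_qr(a * b % N) for a in qr for b in qnr)  # QR x QNR = QNR
assert all(is_qr(x) == is_qr_with_factors(x) for x in units)
```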
Adapted from Tan’s presentation
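Public data size: n = 16. Organize the data in an s×t (4×4) binary matrix M (rows indexed by e, columns by g):
0 1 0 1
1 1 0 1
0 1 0 1
0 1 1 1
Query: get M2,3, with e = 2, g = 3, N = 35, m = 6
◦ QR = {1, 4, 9, 11, 16, 29}, QNR = {3, 12, 13, 17, 27, 33}
◦ The client sends y = (4, 16, 17, 11), in which only y3 = 17 is a QNR
◦ The server returns z1, …, z4, one per row; the client inspects z2:
z2 = QNR ⇒ M2,3 = 1; z2 = QR ⇒ M2,3 = 0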
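The exchange in this example can be sketched in Python in the style of [Kush97]. This is a minimal sketch under stated assumptions: the client draws its QRs and its QNR at random, every function name is hypothetical, and N = 35 is far too small to be secure. The client sends t numbers and receives s numbers, each m bits long.

```python
import random
from math import gcd

# Toy parameters from the example; a real deployment uses a 1024-bit N.
P1, P2 = 5, 7
N = P1 * P2


def legendre(x: int, p: int) -> int:
    """Euler's criterion: +1 if x is a QR mod the odd prime p, else -1."""
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1


def is_qr(x: int) -> bool:
    """Only the client can run this test: it needs the secret factors."""
    return legendre(x, P1) == 1 and legendre(x, P2) == 1


UNITS = [x for x in range(1, N) if gcd(x, N) == 1]
QR = [x for x in UNITS if is_qr(x)]
# QNRs with Jacobi symbol +1, indistinguishable from QRs without the factors.
QNR = [x for x in UNITS if legendre(x, P1) == -1 and legendre(x, P2) == -1]


def client_query(t: int, g: int, rng: random.Random) -> list[int]:
    """t numbers, one per column: a random QNR at column g (1-based), QRs elsewhere."""
    return [rng.choice(QNR) if j == g else rng.choice(QR) for j in range(1, t + 1)]


def server_answer(M: list[list[int]], y: list[int]) -> list[int]:
    """For each row r: z_r = prod_j (y_j if M[r][j] == 1 else y_j^2) mod N."""
    z = []
    for row in M:
        acc = 1
        for bit, yj in zip(row, y):
            acc = acc * (yj if bit else yj * yj) % N
        z.append(acc)
    return z


def client_decode(z: list[int], e: int) -> int:
    """z_e is a QNR exactly when M[e][g] = 1, since every other factor is a QR."""
    return 0 if is_qr(z[e - 1]) else 1


# The 4 x 4 binary matrix from the example (rows e, columns g).
M = [[0, 1, 0, 1],
     [1, 1, 0, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]

rng = random.Random(2010)
for e in range(1, 5):
    for g in range(1, 5):
        z = server_answer(M, client_query(4, g, rng))
        assert client_decode(z, e) == M[e - 1][g - 1]
print("all 16 bits recovered")
```

The server never needs the factors: it only multiplies, and the QNR at column g flips the residuosity of exactly those rows whose bit in column g is 1.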
Bounding-box PIR (bbPIR)
Principles
◦ Rely on cPIR cryptographic operations to achieve strong privacy
◦ Trade part of cPIR's privacy for practical performance
◦ Adopt the flexible privacy principle of k-Anonymity
Basic idea
◦ Bound the expensive cryptographic computations to an r×c bounding box BB, a sub-matrix of M
◦ (1) Satisfy the client's privacy requirement: r×c = 1/ρ
◦ (2) Minimize Ccomm, i.e., minimize (c + b×r)
Properties
◦ The bounding box contains both data whose values are close to the query value and data whose values are not
◦ bbPIR unifies k-Anonymity and cPIR by varying the dimensions of the bounding box
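[Figure: bbPIR on the same example (the 4×4 matrix M as above; get M2,3 with e = 2, g = 3, N = 35, m = 6; QNR = {3, 12, 13, 17, 27, 33}, QR = {1, 4, 9, 11, 16, 29}). Only the entries inside the bounding box BB are processed: the client sends y values for the BB columns only (shown: 16, 17) and receives z values for the BB rows only (shown: 17, 27). Decoding rule shown: z2 = QNR ⇒ M2,3 = 1.]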
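The same exchange restricted to a bounding box can be sketched in Python. The box position (rows 1 to 2, columns 2 to 3) is a hypothetical choice, as are all names; with r = c = 2 the breach probability is 1/(r×c) = 1/4, and the client sends c numbers and receives r numbers instead of t and s.

```python
import random

# Toy modulus and residue sets from the example (N = 35).
N = 35
QR = [1, 4, 9, 11, 16, 29]
QNR = [3, 12, 13, 17, 27, 33]  # Jacobi symbol +1, so they look like QRs


def bb_query(cols: range, g: int, rng: random.Random) -> dict[int, int]:
    """One number per bounding-box column only: a QNR at column g, QRs elsewhere."""
    return {j: (rng.choice(QNR) if j == g else rng.choice(QR)) for j in cols}


def bb_answer(M: list[list[int]], rows: range, y: dict[int, int]) -> dict[int, int]:
    """The server multiplies only inside the box (1-based row/column indices)."""
    z = {}
    for r in rows:
        acc = 1
        for j, yj in y.items():
            acc = acc * (yj if M[r - 1][j - 1] else yj * yj) % N
        z[r] = acc
    return z


def bb_decode(z: dict[int, int], e: int) -> int:
    """The client knows the factors of N; a set lookup stands in for that test."""
    return 0 if z[e] in QR else 1


M = [[0, 1, 0, 1],
     [1, 1, 0, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
rows, cols = range(1, 3), range(2, 4)  # hypothetical 2 x 2 box around (2, 3)
rng = random.Random(7)
y = bb_query(cols, 3, rng)
z = bb_answer(M, rows, y)
print(len(y), len(z), bb_decode(z, 2) == M[1][2])  # 2 2 True
```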
[Figure: private retrieval with cPIR, k-Anonymity and bbPIR on the same data (public data size n = 16), querying the item with key 53. Each panel shows the same 4×4 data matrix (axes e and g); the bbPIR panel marks the bounding box.
8 33 56 89
7 26 54 80
5 23 53 79
1 16 45 72
For k-Anonymity with k = 4: Ccomp = k = 4, Ccomm = k + 1 = 5, Pbrh = 1/k = 1/4, Csrv = k = 4.]
Limitation of the previous formulation: query by matrix address
Solution for query by key: find the address from the key
◦ Candidate solution I: third-party translation, as in Casper [Mokb07]. Cons: security depends on the third party
◦ Candidate solution II: an index structure on the server that maps keys to addresses [Chor97]. Cons: needs O(b × log n) times the communication
◦ Our proposal: the server publishes a histogram H on the key field to authorized clients. The client calculates an address range for the queried entry by searching for the bin in which the entry falls. Pros: if the bin size w ≤ s, only one round of bbPIR is needed
In the client's view, the server matrix M is a histogram matrix HM; thus the address of the requested item x maps to the address range of the items that fall in the same bin as x.
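The client-side lookup can be sketched in Python. This is a minimal sketch assuming the keys are stored in sorted order down each column of M (as in the key-53 example) and that the published histogram is a list of non-overlapping key ranges; `Bin`, `build_histogram` and `address_range` are hypothetical names. Because w ≤ s, every bin lies in a single column, so one bounding box covering the returned rows suffices for one round of bbPIR.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    lo: int      # smallest key in the bin
    hi: int      # largest key in the bin
    column: int  # column g of M that holds the bin (1-based)
    rows: range  # rows of M that hold the bin's entries (1-based)


def build_histogram(M: list[list[int]], w: int) -> list[Bin]:
    """Server side: split each column of M into bins of w consecutive rows (w <= s)."""
    s, t = len(M), len(M[0])
    bins = []
    for g in range(t):
        for start in range(0, s, w):
            keys = [M[r][g] for r in range(start, min(start + w, s))]
            bins.append(Bin(min(keys), max(keys), g + 1,
                            range(start + 1, start + len(keys) + 1)))
    return sorted(bins, key=lambda b: b.lo)


def address_range(H: list[Bin], key: int) -> Bin:
    """Client side: binary-search the published histogram for the bin covering key."""
    i = bisect_right([b.lo for b in H], key) - 1
    if i < 0 or key > H[i].hi:
        raise KeyError(key)
    return H[i]


# Key matrix from the key-53 example; keys are assumed sorted within and
# across columns, so bins never overlap.
M = [[8, 33, 56, 89],
     [7, 26, 54, 80],
     [5, 23, 53, 79],
     [1, 16, 45, 72]]
H = build_histogram(M, w=2)
b = address_range(H, 53)
print(b.lo, b.hi, b.column, list(b.rows))  # 45 53 3 [3, 4]
```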
[Figure: the server matrix M as the client sees it, a histogram matrix HM with bin size w = 2. Each bin covers a key range, e.g. 1–5, 16–23, 45–53, 72–79, 94–100, 7–13, 26–40, 54–70, 80–93 and 101–138; bin HM1,3 covers the entries (M1,3, M2,3), whose keys fall in 26–40.]
Implementation of three private retrieval methods
◦ bbPIR, cPIR
◦ k-Anonymity: anonymize the private query item by specifying a consecutive range that covers the item
Data set
◦ Generated n = 10^6 data records with 3 attributes based on the Adult census data set (32561 records, 15 attributes)
◦ For the experiment on proximity privacy of numeric data only, generated 10^6 numeric values following a Zipf distribution in [0.0, 1.0]
Settings
◦ Test bed: Intel 2.40 GHz CPU, 3 GB memory, Fedora Core 8 OS
◦ Default parameter values: ρ = 0.001, µ = 50, k = 1000, m = 1024
We proposed a practical, flexible and secure approach to private retrieval of public data in single-server settings, called Bounding-Box PIR (bbPIR).
bbPIR generalizes cPIR and k-Anonymity based private retrieval methods.
We incorporated the realistic assumption that clients are charged for the service data exposed to them.
We achieved query by key without running additional rounds of bbPIR.
[Sama98] P. Samarati et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, 1998.
[Swee02] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002.
[Li08] J. Li et al. Preservation of proximity privacy in publishing numerical sensitive data. In SIGMOD 2008.
[Mokb07] M. Mokbel et al. The new casper: A privacy-aware location-based database server. In ICDE 2007.
[Kush97] E. Kushilevitz et al. Replication is not needed: Single database, computationally-private information retrieval. In FOCS 1997.
[Sion07] R. Sion et al. On the computational practicality of private information retrieval. In NDSS 2007.
[Chor97] B. Chor et al. Private information retrieval by keywords. Technical report TRCS 0917, Technion, 1997.