Self-Enforcing Private Inference Control

Yanjiang Yang, Yingjiu Li∗, Jianying Zhou, and Feng Bao

Institute for Infocomm Research, Singapore
*School of Information Systems, Singapore Management University
{yjyang,jyzhou,baofeng}@i2r.a-star.edu.sg, [email protected]

Abstract. Private inference control enables simultaneous enforcement of inference control and protection of users' query privacy. Private inference control is a useful tool for database applications, especially as users are increasingly concerned about individual privacy. However, protection of query privacy on top of inference control is a double-edged sword: without letting the database server know the content of user queries, users can easily launch DoS attacks. To assuage DoS attacks in private inference control, we propose the concept of self-enforcing private inference control, whose intuition is to force users to make only inference-free queries by enforcing inference control themselves; otherwise, a penalty is inflicted upon the violating users.

Towards instantiating the concept, we formalize a model of self-enforcing private inference control and propose a concrete, provably secure scheme based on Woodruff and Staddon's work. In our construction, the "penalty" is instantiated as a deprivation of the user's access privilege: as soon as a user makes an inference-enabling query, his access privilege is forfeited and he is barred from querying the database any further. We also discuss several important issues that complement and enhance the basic scheme.

1 Introduction

The inference problem, first studied in statistical databases [2, 11] and then extended to multilevel databases and general-purpose databases [14], has been a long-standing issue in database security. Inference refers to the fact that sensitive information beyond one's privileges (with respect to the access control policy) can be inferred from the non-sensitive data to which one is granted access. Access control (e.g., a mandatory access control mechanism) cannot solve the inference problem, since inference results from the combination of a series of legitimate queries that are each authorized by access control. A set of queries whose responses lead to inference is said to form an inference channel. The aim of inference control is to prevent the formation of inference channels. Many inference control methods audit queries in order to ensure that a user's current query, together with his past queries, cannot form any inference channel (see, for example, [12, 25,

13, 8, 10, 16, 20, 21, 28]). By complementing access control, inference control works as an extra line of defence for database secrecy.

What forms an inference channel depends closely on the data to be protected and the protection objective. For example, in statistical databases the objective is to answer queries on statistics of records (e.g., sum, mean, etc.) without releasing individual records [4]; an inference channel is created if an individual value can be deduced from a set of queries, each of which returns a statistic over multiple records. In a multilevel database, a set of queries involving data classified at lower levels forms an inference channel if data classified at a higher level can be derived from them by means of database integrity constraints such as functional, multivalued, and join dependencies [3, 26]. Our concern in this work is inference channels that result in identifying the subjects contained in the database. As an example, consider a database of medical records for individuals: explicit identifying information such as patient name or social security number is clearly sensitive; in contrast, individual attributes such as age, ZIP code, date of birth, profession, and marital status are not personally identifiable, as each of them alone usually does not contain sufficient information to uniquely identify an individual, and thereby should not be classified as sensitive. However, a combination of all or some of these non-sensitive attributes may be uniquely identifying, thus forming an inference channel. Inference control in this context works by blocking users from obtaining responses to the queries that cover all the attributes necessary to complete an inference channel.

In the setting of database access, equally important is that the users who query the database also have privacy concerns. Exposing to the database server what data a user is accessing may lead to the compromise of user privacy [6, 7, 18, 19, 22], and possibly subject the user to targeted marketing, financial losses, etc. It is thus desirable that inference control be enforced by the server in a way that also preserves query privacy (i.e., the server does not learn the content of a query while controlling query inference). Note that to that end, a solution necessarily implements private information retrieval (PIR) (e.g., [18, 22]) so as to satisfy the query privacy requirement, but goes beyond that to also fulfil the inference control requirement. Woodruff and Staddon [27] are believed to be the first to systematically study this problem, and they proposed private inference control (PIC) to attain both requirements. Private inference control achieves the following: if the current query of a user, together with his prior queries, forms an inference channel, then the user obtains a random value instead of the desired data as the query response from the server, who learns nothing about

the query. Jagannathan and Wright [17] later extended private inference control to aggregate database queries. In particular, a user query is a set of indices of the data in the database; the response to a query computes an aggregate function (e.g., sum, average) of the data items indexed by the query. The techniques of Aiello et al. [1] can also be used to attain private inference control in some specific scenarios, but not in the general setting addressed by [27, 17].

Private inference control turns out to be a useful tool for database applications, especially as user privacy is increasingly a concern. Unfortunately, practical deployment of private inference control may encounter an enormous obstacle: since the database server knows nothing about user queries, database systems enforcing private inference control can be easily exploited by users launching DoS attacks, i.e., some users may deliberately query the database server to waste its precious resources. It is well known that inference control (even without privacy protection) is extremely computation-intensive [2], hence DoS attacks are expected to be particularly effective against private inference control. Directly incorporating access control into inference control cannot entirely solve the problem if inference control does not provide any extra information (e.g., whether a query leads to inference or not) to the enforcement of access control. Without assistance from inference control, rigid access control rules, such as stipulating the frequency of user queries or the number of consecutive user queries, would make the system extremely inconvenient.

Our Contribution. In this work, we propose the concept of self-enforcing private inference control, in an attempt to resolve the above problem in private inference control. The intuition behind self-enforcing private inference control is to force users not to make queries that form inference channels; otherwise, a penalty is incurred by the querying users. In other words, users are obliged to enforce costly inference control by themselves before making queries, and to make only inference-free queries that do not complete inference channels.

Further, we manage to instantiate the self-enforcing intuition. Specifically, we formalize a model of self-enforcing private inference control and propose a concrete scheme which is proven secure under the model. The proposed model and scheme are largely based on Woodruff and Staddon's work [27]. In our scheme, the "penalty" is instantiated as a deprivation of the user's access privilege. More precisely, if a user makes an inference-enabling query (one which completes an inference channel together with his past queries), then the user's access privilege is forfeited and he is barred from making queries

any further. We achieve this by incorporating access control into inference control and basing access control on one-time access keys: a user gets the access key for his next query only if his current query is inference-free. We point out that this approach lets the server learn, in retrospect, whether a user's last query was inference-free, which relaxes the stringent definition of query privacy in [27, 17]; we believe that it still provides sufficient privacy protection for many practical applications, since the actual content of user queries is not revealed to the server. Furthermore, we also discuss how to achieve the same level of query privacy as in [27, 17].

Organization. We review preliminaries in Section 2. We present our model and scheme for self-enforcing private inference control in Section 3, together with the security proof for the scheme. We discuss a variety of practical issues in Section 4 before concluding the paper in Section 5.

2 Preliminaries

For ease of understanding, we give a brief introduction to the cryptographicprimitives used in our scheme.

Homomorphic Encryption. A homomorphic encryption scheme E : (G_1, +) → (G_2, ·) is a public-key encryption scheme satisfying E(a) · E(b) = E(a + b), where a, b ∈ G_1. The Paillier homomorphic encryption scheme [23], introduced below, well suits our needs.

– KeyGen(1^κ): Given a security parameter 1^κ, this algorithm selects two large primes p, q and sets n = p·q; it chooses g ∈ Z*_{n²} whose order modulo n² is a multiple of n. The public key is then pk = (g, n) and the corresponding secret key is sk = λ(n) = lcm(p − 1, q − 1).

– Hom_Enc(m, pk): The encryption of a message m ∈ Z_n using pk is the following: choose a random r ∈ Z*_n and compute C = g^m r^n (mod n²). For notational brevity, we do not explicitly indicate pk in the input list.

– Hom_Dec(C, sk): The decryption of a ciphertext C using secret key sk is

m = L(C^λ (mod n²)) / L(g^λ (mod n²)) (mod n), where L is defined by L(x) = (x − 1)/n.

The scheme is homomorphic:

Hom_Enc(m_1) · Hom_Enc(m_2) = (g^{m_1} r_1^n) · (g^{m_2} r_2^n) = g^{m_1+m_2} (r_1 r_2)^n (mod n²) = Hom_Enc(m_1 + m_2)

Note that the Paillier homomorphic encryption scheme is semantically secure.
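To make the arithmetic above concrete, here is a minimal Python sketch of Paillier key generation, encryption, and decryption with the common choice g = n + 1. The tiny fixed primes and all function names are illustrative only; real deployments use randomly generated primes of at least 1024 bits.

```python
import math
import random

def keygen():
    # Toy primes for illustration only -- NOT secure parameters.
    p, q = 293, 433
    n = p * q
    g = n + 1                         # standard choice: order of n+1 in Z*_{n^2} is n
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    return (g, n), (n, lam)

def L(x, n):
    # The function L(x) = (x - 1) / n from the decryption formula.
    return (x - 1) // n

def encrypt(pk, m):
    g, n = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:        # ensure r is in Z*_n
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    n, lam = sk
    num = L(pow(c, lam, n * n), n)
    den = L(pow(n + 1, lam, n * n), n)   # L(g^lam mod n^2) with g = n + 1
    return (num * pow(den, -1, n)) % n   # multiply by modular inverse, reduce mod n

pk, sk = keygen()
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)
# Multiplying ciphertexts adds plaintexts: Dec(c1 * c2 mod n^2) = 17 + 25 = 42
assert decrypt(sk, (c1 * c2) % (pk[1] ** 2)) == 42
```

The final assertion exercises exactly the homomorphic property displayed above: the product of two ciphertexts decrypts to the sum of the plaintexts.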

Symmetric Private Information Retrieval. The notion of Private Information Retrieval (PIR) was first formalized by Chor et al. [7]. In their formalization, a database is modelled as an n-bit string held by one or multiple servers; supposing a user wants to retrieve the i-th bit, an execution of PIR between the user and the server(s) enables the former to get that bit from the latter without revealing to the latter any information about the index i. In its original form, PIR is not concerned with protecting the interest of the database, in the sense that the querying user can end up learning more bits than the requested one. Symmetric PIR (SPIR) [18] adds to PIR the functionality of also protecting the secrecy of the database, so that the user gets nothing but the requested bit. Our scheme needs to leverage single-server SPIR protocols, which usually achieve user query privacy relative to a computationally bounded server. Naor and Pinkas showed how to transform any PIR protocol into such a SPIR protocol [22]. In addition, we also require SPIR protocols to operate on a database of records instead of bits. In fact, it is easy to transform a bit-oriented SPIR protocol into a record-oriented one by treating a record as bits and running the former as many times as the bit size of the records. Optimizations of this straightforward method were given in [5].

For the purposes of the proof, the SPIR protocol used in our scheme should be secure in the real/ideal model, in the sense that there is a simulator that can extract the index of the requested bit learned by any user U* interacting with the honest server. Applying the transformation techniques of [22] to the PIR protocols in [9, 19] yields such SPIR protocols.

3 Self-Enforcing Private Inference Control

3.1 Model

We consider a single-server scenario: the server houses a database consisting of a set of records and allows users to retrieve records from the database. On the one hand, the server enforces inference control such that, in a query, a user gets the requested record only if it does not complete an inference channel together with his previously retrieved records. On the other hand, users want their query privacy protected, so that the server does not learn which records they are retrieving. The database is a string d ∈ ({0, 1}^m)^n of n records, where each record has m attributes (note that for simplicity of formalization, we here consider 1-bit attributes, but extending our formalization and scheme to attributes of larger size is straightforward). Let d_i denote the ith record and d_{i,j} denote the jth attribute value of the ith record. The inference channels we consider form a set IC of subsets S ⊆ [1..m], each of

which is a set of attribute indices. By IC, we mean that for all i ∈ [1..n] and S ∈ IC, a user should not learn d_{i,j} for all j ∈ S. As an example, let IC = {{1, 2, 4}, {5, 6}}; then inference control prevents a user from getting all of the 1st, 2nd, and 4th attributes, or all of the 5th and 6th attributes, of any record. Given a database, IC can be determined by the technique in [24]. Once IC is decided, it is input to both the server and the users.

Throughout the system's lifetime, an honest user U generates a sequence of |Q| tuples, Q = ((i_1, j_1), (i_2, j_2), ..., (i_{|Q|}, j_{|Q|})), in order to retrieve d_{i_1,j_1}, d_{i_2,j_2}, ..., d_{i_{|Q|},j_{|Q|}}. Users are allowed to query the database adaptively, so (i_t, j_t) can be generated depending on all the user's previous interactions with the server. In our scheme, all pairs in Q are distinct (i.e., no repeated queries), thus |Q| ≤ (m−1)n. As we show shortly in Section 4, this is not a weakness, but instead facilitates assuaging DoS attacks; we further discuss how to allow repeated queries if necessary. A sequence Q is said to be permissible if it does not complete any inference channel defined in IC, that is, for all S ∈ IC and all i ∈ [1..n], there exists a j ∈ S such that (i, j) ∉ Q. The sequence Q is non-permissible if it is not permissible. Formally, we can treat Q = Q(U, d) as a random variable induced by the uniform distribution on µ and ς, where U denotes the code of U, and µ, ς are the random bits consumed by the user and the server, respectively.
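Since users must enforce inference control themselves before querying, they need a mechanical permissibility check. The condition above translates directly into code; the following sketch (our own illustration, with hypothetical names) tests whether a query sequence Q leaves every channel in IC incomplete for every record:

```python
def is_permissible(Q, IC, n):
    """Return True if the query sequence Q completes no inference channel.

    Q  : list of (record_index, attribute_index) pairs, indices starting at 1
    IC : list of sets of attribute indices (the inference channels)
    n  : number of records in the database
    """
    queried = set(Q)
    for i in range(1, n + 1):
        for S in IC:
            # Q is non-permissible if, for some record i and some channel S,
            # every attribute j in S has already been queried for record i.
            if all((i, j) in queried for j in S):
                return False
    return True

IC = [{1, 2, 4}, {5, 6}]                                 # channels from the example above
assert is_permissible([(3, 1), (3, 2)], IC, n=10)        # channel {1,2,4} still incomplete
assert not is_permissible([(3, 1), (3, 2), (3, 4)], IC, n=10)  # completes {1,2,4} on record 3
```

Note the quantifier structure mirrors the definition: permissibility requires that for every record i and every S ∈ IC, at least one (i, j) with j ∈ S is missing from Q.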

Self-enforcing Private Inference Control (SePIC) is an interactive multi-round protocol between a user and the server. Let n, m, IC, 1^κ be common input to both parties and d be an extra input to the server, where κ is a security parameter. Another common input is the initial access key K_0 shared between the user and the server; the user registers with the server in advance and gets K_0. Let the total number of rounds in each execution of SePIC (i.e., processing one query) be R_d. Let m_{l,w} be the message computed and sent to the server by the user in the wth round of the lth execution of the protocol, and a_{l,w} be the response the server sends back to the user (1 ≤ l ≤ |Q|, 1 ≤ w ≤ R_d). We define M_{l,w} = {m_{i,j} | 1 ≤ i < l and 1 ≤ j ≤ R_d, or i = l and 1 ≤ j ≤ w}, and A_{l,w} = {a_{i,j} | 1 ≤ i < l and 1 ≤ j ≤ R_d, or i = l and 1 ≤ j ≤ w}. Let Ω_U(A_{l,w}) be the state of the user and Ω_S(M_{l,w}) be the state of the server at the end of the wth round of the lth execution. The states may include the entire query history and the one-time access keys.

Definition 1. (Self-enforcing Private Inference Control) Let QG be the query generation algorithm, AC the access control algorithm, QP the query processing algorithm, and AR the answer reconstruction algorithm. A database access protocol runs the following process between a user and the server:

For 1 ≤ l ≤ |Q|:

1. The user generates Q_l = (i_l, j_l) such that Q_1, Q_2, ..., Q_l are permissible.

2. For 1 ≤ w ≤ R_d:
   – The user computes m_{l,w} = QG(i_l, j_l, Ω_U(A_{l,w−1}), µ) and sends m_{l,w} to the server.
   – The server computes (a^(1)_{l,w}, Ω^(1)_S(M_{l,w})) = AC(m_{l,w}, d, Ω_S(M_{l,w−1}), ς) and (a^(2)_{l,w}, Ω^(2)_S(M_{l,w})) = QP(m_{l,w}, d, Ω_S(M_{l,w−1}), ς), where a_{l,w} = a^(1)_{l,w} ∪ a^(2)_{l,w} and Ω_S(M_{l,w}) = Ω^(1)_S(M_{l,w}) ∪ Ω^(2)_S(M_{l,w}). The server then sends a_{l,w} to the user.
   – The user computes Ω_U(A_{l,w}) = AR(a_{l,w}, i_l, j_l, Ω_U(A_{l,w−1}), µ). If w = R_d, AR(a_{l,w}, i_l, j_l, Ω_U(A_{l,w−1}), µ) also outputs outV_l, which is the requested record.

Note that AC(·) is executed only when w = 1. For w > 1, a^(1)_{l,w} = ∅ and Ω^(1)_S(M_{l,w}) = ∅. We define the view of an arbitrary user U* with respect to an honest-but-curious server to be View_{U*}(d, µ, ς) = (A_{|Q|,R_d}, µ, 1^κ, m, n, K_0), and the view of an arbitrary server S* with respect to an arbitrary user U* to be View_{S*}(d, µ, ς) = (M_{|Q|,R_d}, d, ς, 1^κ, m, n, K_0).

A database access protocol consisting of algorithms QG, AC, QP, and AR is a SePIC protocol if it satisfies correctness and security:

Correctness. For all d ∈ ({0, 1}^m)^n and all honest users U,

Pr_{µ,ς}[∀α ∈ Q(U, d), outV_α = d_{i_α,j_α}] = 1    (1)

Security. Security includes the following three aspects.

1. Query Privacy: For all d ∈ ({0, 1}^m)^n, for any honest user U and any two permissible query sequences Q_1 and Q_2 of equal length, and for all honest-but-curious servers S*, the probability that S* distinguishes Q_1 from Q_2 is negligible. That is, for every polynomial-time algorithm A,

|Pr_µ[A(View_{S*}(U, d, µ, ς)) = 1 | Q(U, d) = Q_1] − Pr_µ[A(View_{S*}(U, d, µ, ς)) = 1 | Q(U, d) = Q_2]| ≤ ε(κ)    (2)

where ε(κ) is a negligible function with respect to κ, and the probability is also taken over the random bits consumed by the adversary A.

REMARK 1. By this definition, we relax the query privacy definition in [27], in that our definition relates to distinguishing two permissible sequences, while their definition distinguishes any two sequences. However, our definition well suits the setting we consider, where an inference-enabling query immediately terminates the protocol.

2. Access Control: For any user U and any non-permissible query sequence Q(U, d), the last pair (i_{|Q|}, j_{|Q|}) ∈ Q(U, d) must be inference-enabling; that is, Pr[(i_{|Q|}, j_{|Q|}) ∈ Q(U, d) is inference-free | Q(U, d) is non-permissible] ≤ ε(κ). The intuition is that once an inference-enabling query is issued, the user cannot make any further query; thus the last pair must be the inference-enabling one.

3. Inference Control: The protocol runs in the real world, and we compare it with the ideal model, where a trusted third party, TTP, when given d ∈ ({0, 1}^m)^n and a permissible Q, gives the user d_{i,j} for all (i, j) ∈ Q. More precisely, we require that for every U* in the real world and every µ, there exists an efficient U' in the ideal model, given the code of U* and µ, that plays U*'s role, such that for every d ∈ ({0, 1}^m)^n, U' can find a permissible Q such that the output of U* interacting with the honest server and the output of U' interacting with TTP are computationally indistinguishable (both the honest server in the real world and TTP in the ideal model are given d). Note that the Q found by U' is a random variable induced by the uniform distribution on ς.

3.2 Our Scheme

We next propose a concrete SePIC protocol, assuming IC takes the particular form IC = {[1..m]}. The inference control rule is thus that, for any record, the user cannot get all of its attributes. The basic idea of our protocol is the following. Prior to query processing, the server enforces access control: query processing continues only if the user passes the access control step; otherwise, the server rejects the user. Access control is based on a one-time access key shared between the server and the user. When processing a query, the server selects a random one-time key for the user's next access, and the user obtains the key only if his current query is inference-free. In other words, if the user issues an inference-enabling query, he loses his privilege for further queries. This forces the user to be cautious in making queries, and thus to enforce inference control by himself, making sure that each query he issues is inference-free.

Suppose IC is established in an initialization phase. The input to the user is (1^κ, n, m, K_0) and to the server is (1^κ, d, n, m, K_0). We show the protocol for executing the user's lth query in Figure 1.

For ease of understanding, we next outline how the protocol works. In the Access Control algorithm, if m−1 or more indices among i_1, i_2, ..., i_{l−1} are equal to i_l, in which case the l queries thus far form a non-permissible sequence^1, then the total number of e's satisfying Hom_Enc(·) ≠ Hom_Enc(0) is at most (l−1) − (m−1) = l − m. This means that at most l − m shares can be recovered, which is not sufficient to recover K_{l+1} in an (l−m+1)-out-of-(l−1) secret sharing scheme (see Figure 1). So the user's access privilege for the next query is "lost": without K_{l+1}, the user is clearly unable to compute d_{i_l,j_l} in the Answer Reconstruction algorithm. In this way, inference control is enforced. On the contrary, if the l queries form a permissible sequence, K_{l+1} clearly can be correctly recovered, and so can the user recover d_{i_l,j_l}. Query privacy is enforced due to the use of semantically secure homomorphic encryption and of the SPIR protocol: first, from Hom_Enc(i_l) and Hom_Enc(j_l) the server learns nothing about (i_l, j_l); second, through the SPIR protocol the server learns nothing about the entry retrieved by the user in the Query Processing algorithm.
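The key-release mechanism hinges on threshold secret sharing: a permissible sequence leaves the user at least l−m+1 recoverable shares of K_{l+1}, while an inference-enabling sequence leaves at most l−m. A minimal Shamir-sharing sketch over a prime field illustrates this threshold behaviour (our own illustration with a toy field size; the protocol admits any secret sharing scheme):

```python
import random

P = 2**31 - 1  # a Mersenne prime; share arithmetic is done in the field Z_P

def make_shares(secret, t, k):
    """Split `secret` into k shares with threshold t (t-out-of-k Shamir sharing)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Share i is the point (i, f(i)) on the random degree-(t-1) polynomial f.
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, k + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0; recovers the secret from >= t shares."""
    secret = 0
    for x_i, y_i in shares:
        num, den = 1, 1
        for x_j, _ in shares:
            if x_j != x_i:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        secret = (secret + y_i * num * pow(den, -1, P)) % P
    return secret

# Example with l = 6 queries, m = 4: an (l-m+1) = 3-out-of-(l-1) = 5 sharing.
K_next = 123456789
shares = make_shares(K_next, t=3, k=5)
assert recover(shares[:3]) == K_next   # 3 shares suffice: permissible sequence
# With only l-m = 2 usable shares (inference-enabling sequence), interpolation
# yields the wrong value except with probability 1/P.
assert recover(shares[:2]) != K_next
```

In the actual protocol the shares are delivered blinded as Hom_Enc((i_t − i_l)s_t), so a share is usable exactly when the earlier query touched a different record.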

Performance. The communication cost for the lth query is O(l)+ |SPIR|,where |SPIR| denotes the communication cost of the SPIR protocol. Thecomputation overhead on the user side is O(l) plus that of the execution ofthe SPIR protocol. The computation overhead for the server is O(l + nm)plus the execution of the SPIR protocol.

3.3 Security Analysis

We next analyze the security of the above protocol. We have the following theorem.

Theorem 1. The protocol given in Figure 1 is a SePIC protocol with respect to Definition 1.

Proof. It is not difficult to check that the protocol satisfies the correctness definition, considering our earlier explanation of how the protocol works. In the following, we focus on the security analysis and give proof sketches.

1. Query Privacy. We prove by contradiction. Let us consider the actual view View_{S*} of an arbitrary server S* with respect to an arbitrary hon-

^1 This is because we do not allow repeated queries, so each such query must retrieve a different attribute of d_{i_l}.


Let the lth pair in the query sequence be q_l = (i_l, j_l), where 1 ≤ l ≤ |Q|, 1 ≤ i_l ≤ n, and 1 ≤ j_l ≤ m.

Query Generation QG: If l = 1, the user generates a public/private key pair for the homomorphic encryption scheme and passes the public key to the server. The user computes and sends Hom_Enc(i_l) and Hom_Enc(j_l) to the server.

Access Control AC:
1. If l < m, the server sets K_{l+1} = 0 and skips access control, going directly to the Query Processing algorithm.^a Else, the server and the user perform an interactive authentication protocol, both using K_l (here we assume that the user's (l−1)th query was successful, so that the user and the server share K_l). If the authentication fails, the server aborts. Otherwise, the server selects a random K_{l+1} and generates l−1 shares s_1, s_2, ..., s_{l−1}, forming an (l−m+1)-out-of-(l−1) sharing of K_{l+1} using any secret sharing scheme.
2. The server computes e_1 = Hom_Enc((i_1 − i_l)s_1), e_2 = Hom_Enc((i_2 − i_l)s_2), ..., e_{l−1} = Hom_Enc((i_{l−1} − i_l)s_{l−1}) using the user's previous queries. The server then returns e_1, e_2, ..., e_{l−1} to the user, and also adds (Hom_Enc(i_l), Hom_Enc(j_l)) to the user's query history.
3. The user decrypts e_1, e_2, ..., e_{l−1}; if the user's query sequence thus far is permissible, the user can recover at least l−m+1 shares and thus reconstruct K_{l+1}.

Query Processing QP:
1. The server selects random numbers r^(1)_{i,j} and r^(2)_{i,j} for every 1 ≤ i ≤ n, 1 ≤ j ≤ m. The server then forms a table d̃ with n × m entries d̃_{i,j} = Hom_Enc(r^(1)_{i,j}(i − i_l) + r^(2)_{i,j}(j − j_l) + K_{l+1} + d_{i,j}). Clearly, d̃_{i_l,j_l} = Hom_Enc(K_{l+1} + d_{i_l,j_l}), and each of the other entries is an encryption of a number that is random with respect to the user.
2. The user and the server perform a SPIR protocol upon d̃ to get the (i_l, j_l)th entry d̃_{i_l,j_l}.

Answer Reconstruction AR: The user decrypts d̃_{i_l,j_l} to obtain K_{l+1} + d_{i_l,j_l}. If and only if the user has successfully recovered K_{l+1} in AC can he then recover d_{i_l,j_l}.

^a For the first m−1 queries, access control is not necessary, because even if all m−1 queries access the same record, they still cannot form an inference channel. From the m-th query onward, the server begins to enforce access control; K_0 is then used for authentication in the m-th query, i.e., K_m = K_0.

Fig. 1. Our Protocol

est user U* in the protocol. After omitting the non-essential elements (with respect to query privacy) and the common parameters, View_{S*} can be written as View_{S*} = (⟨Hom_Enc(i_l, j_l), SPIR(i_l, j_l)⟩, ⟨0⟩)_{l ∈ [|Q|]}, where Hom_Enc(i_l, j_l) is shorthand for Hom_Enc(i_l) and Hom_Enc(j_l), SPIR(i_l, j_l) denotes the interaction transcript produced by an execution of the SPIR protocol, and 0 denotes the fact that the server learns that the query is inference-free. In View_{S*}, 0 can be omitted because every query contributes the same element to the server's view. Therefore, we need only consider View_{S*} = (Hom_Enc(i_l, j_l), SPIR(i_l, j_l))_{l ∈ [|Q|]}. The intuition for the proof is then apparent, because it reduces to distinguishing homomorphic encryptions and SPIR protocol transcripts. The actual proof is a series of hybrid arguments on distinguishing View_{S*}[Q] = (Hom_Enc(i_l, j_l), SPIR(i_l, j_l))_{l ∈ [|Q|]} and View_{S*}[Q'] = (Hom_Enc(i'_l, j'_l), SPIR(i'_l, j'_l))_{l ∈ [|Q'|]}, where Q = ((i_1, j_1), (i_2, j_2), ...) and Q' = ((i'_1, j'_1), (i'_2, j'_2), ...) are two permissible sequences of equal length. Suppose, to the contrary, that there is a PPT distinguisher D that can distinguish View_{S*}[Q] and View_{S*}[Q'] with non-negligible advantage Adv; then, by invoking D, a PPT D' can be constructed that breaks either the semantic security of the homomorphic encryption or the privacy property of SPIR with advantage Adv/|Q|. The details are straightforward, following the hybrid argument technique, which can be found in [15].

2. Access Control. It suffices to consider the scenario in which the lth query is inference-enabling, but the user nevertheless passes access control in the (l+1)th query. We show that the probability of this occurring is negligible. There are two cases to consider. First, the lth query is inference-enabling, but the user still manages to recover K_{l+1}; the probability of this is 1/|K|, where K is the space of one-time keys. Second, the user passes access control in the (l+1)th query without K_{l+1}; this probability relates entirely to the strength of the authentication protocol. We denote it Pr[auth-protocol failure], which is supposed to be negligible. To summarize, we have Pr[(i_{|Q|}, j_{|Q|}) ∈ Q(U, d) is inference-free | Q(U, d) is non-permissible] ≤ 1/|K| + Pr[auth-protocol failure].

3. Inference Control. We consider an arbitrary user U* with random tape µ. Using a SPIR protocol secure in the real/ideal model, a simulator S, when given U*'s code and µ in the Access Control algorithm, can be formulated to extract the indices (i_l, j_l) queried by U*. Let U' interact with TTP in the ideal model. U' runs the code of U* and uses the knowledge extractor of S to obtain (i'_l, j'_l) and Hom_Enc(i'_l, j'_l) for some i'_l and j'_l. If the extracted indices (i'_l, j'_l) ≠ (i_l, j_l), or if (i_l, j_l) together with the past pairs queried from TTP forms an inference channel, then in the second step of Access Control, U' provides Hom_Enc(r) for a random number r; otherwise, U' asks TTP for d_{i_l,j_l} and follows the protocol as the honest server would. U' also honestly carries out query generation, asks for one-time keys from TTP, and executes authentication. It is then not hard to see that U' can generate a view computationally indistinguishable from U*'s, and in turn a computationally indistinguishable output.

4 Discussion

In this section, we discuss several issues pertaining to our self-enforcing private inference control system.

Penalty Lifting and an Alternative. In our protocol, once a user issues an inference-enabling query, the user's access privilege is forfeited forever. One may argue that this violating-once-then-rejected penalty is too harsh. The problem is in fact not hard to solve in practical applications. For example, one may allow the user to regain his access privilege through some out-of-band channel: the server issues a new one-time access key to the user after the user accepts a certain penalty, e.g., a credit or financial penalty, and resets the user's access account with this new key.

Alternatively, it is possible to implement a violating-multiple-times-then-rejected penalty. To be specific, we allow a user to make up to a threshold number of inference-enabling queries; if he makes one more inference-enabling query, he forfeits his access right. The idea is the following (suppose the threshold is t). The one-time access keys {K_l} are no longer used for authentication in the Access Control algorithm. Instead, they help a user obtain the one-time authentication keys {AK_l}, which are actually used in authentication. Of course, the generation and sharing of {K_l} between the server and the user remain the same as in the original protocol. Without loss of generality, in the lth query the server needs to pass AK_{l+1}, selected randomly by the server, to the user for the next access. Note that at this point in time (i.e., the beginning of the lth query), the user has been given K_m(= K_0), K_{m+1}, ..., K_l, as per the original protocol. We should guarantee that the user gets AK_{l+1} only if he has made no more than t inference-enabling queries in the past l − 1 queries, in which case the user should have obtained at least (l − m + 1) − t keys among K_m, K_{m+1}, ..., K_l. Under this rationale, the server generates l − m + 1 shares of AK_{l+1} using an (l − m + 1 − t)-out-of-(l − m + 1) secret sharing scheme, then encrypts (e.g., using a block cipher) each share with a key from K_m, K_{m+1}, ..., K_l, and passes the encrypted shares to the user. Clearly, only if the user has made t or fewer inference-enabling queries can he decrypt enough shares to reconstruct AK_{l+1}. This method softens the violating-once-then-rejected penalty and is thereby more acceptable, but it has the disadvantage that users have to keep their one-time access keys {K_l}.
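The share-then-encrypt construction above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes Shamir secret sharing over a prime field and substitutes a hash-derived pad for the block cipher; all function names and parameters (split_key, reconstruct, pad, PRIME) are ours.

```python
# Sketch of the threshold penalty mechanism: the server splits the next
# authentication key AK_{l+1} into n = l-m+1 Shamir shares with threshold
# n-t, and encrypts the a-th share under one-time key K_a.
import hashlib
import random

PRIME = 2**127 - 1  # prime field for Shamir sharing

def split_key(secret: int, n: int, k: int):
    """Split `secret` into n Shamir shares, any k of which reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, PRIME) for e, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the given shares."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

def pad(one_time_key: bytes, x: int) -> int:
    # Stand-in for the block-cipher encryption of a share under K_a.
    return int.from_bytes(hashlib.sha256(one_time_key + bytes([x])).digest()[:16], "big")

# Server side: n one-time keys were issued; the user may lack up to t of them.
n, t = 5, 2
keys = [hashlib.sha256(bytes([a])).digest() for a in range(n)]  # K_m .. K_l
AK_next = random.randrange(PRIME)                               # AK_{l+1}
encrypted = [(x, (y + pad(keys[x - 1], x)) % PRIME)
             for x, y in split_key(AK_next, n, n - t)]

# User side: t inference-enabling queries cost the user t one-time keys.
known = keys[:n - t]
decrypted = [(x, (y - pad(known[x - 1], x)) % PRIME) for x, y in encrypted[:n - t]]
assert reconstruct(decrypted) == AK_next  # threshold met: AK_{l+1} recovered
```

Here the user, having lost t = 2 one-time keys to inference-enabling queries, can still decrypt n − t = 3 shares, which is exactly the threshold, so AK_{l+1} is recovered; losing one more key would leave him below the threshold.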

Repeat Queries. Our protocol does not allow repeat queries. This property is in fact an advantage with respect to our objective of curbing DoS attacks: if repeat queries were allowed, a user could simply issue an inference-free query to the server repeatedly for the purpose of a DoS attack.

While prohibiting repeat queries helps combat DoS attacks, some systems may still need to support repeat queries to cater to common access patterns. We can extend our protocol to provide this property, but with relaxed query privacy. More precisely, the server learns that a query is a repeat one. We believe that such information disclosure is necessary in an anti-DoS private inference control system, since without distinguishing repeat queries, one cannot expect the server to know when to take action.

We next present an approach to handling repeat queries. The server treats repeat queries separately from non-repeating queries: it still uses the original protocol in Figure 1 to answer non-repeating queries, but replies to repeat queries in the following way. A repeat query has the form ("repeat query", (Hom_Enc(i), Hom_Enc(j))). Suppose the user makes this repeat query after the lth non-repeating query; then the user is supposed to have K_{l+1} in his possession. First, the server selects random values r_a^{(1)} and r_a^{(2)} for every 1 ≤ a ≤ l, and generates e_a = Hom_Enc(r_a^{(1)}(i − i_a) + r_a^{(2)}(j − j_a) + K_{l+1} + R), where R is a random number and (Hom_Enc(i_a), Hom_Enc(j_a)) is the ath non-repeating query. Note that the involvement of K_{l+1} guarantees that the lth non-repeating query must be inference-free. The server then sends (e_a)_{1≤a≤l} to the user, who decrypts the matching e_a to obtain K_{l+1} + R and, knowing K_{l+1}, recovers R. It is easy to see that the user gets R only if the repeat query indeed repeats one of the l non-repeating queries.

Next, the server selects random values r_{a,b}^{(1)} and r_{a,b}^{(2)} for every 1 ≤ a ≤ n, 1 ≤ b ≤ m, and generates d′_{a,b} = Hom_Enc(r_{a,b}^{(1)}(i − a) + r_{a,b}^{(2)}(j − b) + R + d_{a,b}). The user then engages in a SPIR protocol with the server to get d′_{i,j}, and in turn decrypts it and removes R to recover d_{i,j}.
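Both masking steps rely on the same homomorphic trick: Enc(r1(i − i_a) + r2(j − j_a) + s) decrypts to s exactly when the queried indices match (i_a, j_a), and to a value masked by a fresh random multiple otherwise. Below is a toy sketch using the Paillier cryptosystem [23] as the homomorphic scheme; the tiny primes and all helper names (enc, dec, mask, smul) are ours, for illustration only.

```python
# Toy Paillier demo of the masking step: from the user's ciphertexts
# Enc(i), Enc(j), the server homomorphically computes
#   e_a = Enc(r1*(i - i_a) + r2*(j - j_a) + K + R),
# which decrypts to K + R iff (i, j) = (i_a, j_a).
import random
from math import gcd

p, q = 1000003, 1000033          # toy primes; real deployments use >= 1024 bits
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # Carmichael lambda(n)
g = n + 1

def enc(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m % n, n2) * pow(r, n, n2) % n2

def dec(c):
    L = lambda u: (u - 1) // n
    mu = pow(L(pow(g, lam, n2)), -1, n)
    return L(pow(c, lam, n2)) * mu % n

add = lambda c1, c2: c1 * c2 % n2       # Enc(a) * Enc(b) = Enc(a + b)
smul = lambda c, k: pow(c, k % n, n2)   # Enc(a)^k = Enc(k * a)

def mask(ci, cj, ia, ja, K, R):
    r1, r2 = random.randrange(1, n), random.randrange(1, n)
    t1 = smul(add(ci, enc(-ia)), r1)    # Enc(r1 * (i - i_a))
    t2 = smul(add(cj, enc(-ja)), r2)    # Enc(r2 * (j - j_a))
    return add(add(t1, t2), enc(K + R))

i, j, K, R = 7, 4, 12345, 67890
ci, cj = enc(i), enc(j)
assert dec(mask(ci, cj, 7, 4, K, R)) == K + R   # indices match: K + R exposed
assert dec(mask(ci, cj, 7, 5, K, R)) != K + R   # mismatch: plaintext is masked
```

The server computes e_a from the user's ciphertexts alone, using only ciphertext multiplication (homomorphic addition) and exponentiation (homomorphic scalar multiplication), so it never sees i or j.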

We should point out that the access control step is no longer needed in processing repeat queries. However, this does not necessarily weaken the ability to counter DoS attacks, since the server can in any case cap the number of consecutive repeat queries. Capping in this case should not be deemed rigid, because the server is dealing with the unusual scenario of repeat queries.

Stateful User. The countermeasure to DoS attacks in our protocol is essentially deterrence: if a user makes an inference-enabling query, he is denied further access to the database. This is expected to force users to be cautious in making queries and to enforce inference control themselves, determining whether or not a query results in inference before querying. Since inference control is based on a user's past queries, users in our system have to be stateful, i.e., a user needs to keep his query history. This is not a serious problem if a user always uses a single machine for database access. However, a more general scenario is that a user has multiple machines, and it is usually not convenient to keep the up-to-date query history synchronized among them. Fortunately, the database server in our system maintains the (encrypted) query history of all users. As such, a user can, at the point of access, download his past queries from the server's central repository, rather than keeping his query history on every machine.

Stricter Query Privacy. As mentioned earlier, our protocol achieves a relaxed level of query privacy compared to the private inference control protocols in [17, 27]. More precisely, by authenticating the user in the current query, the database server learns in retrospect whether the user's last query was inference-free or not (without learning the query content itself). This information is exactly what enables the server to stop a user's possible DoS attack in a timely manner in our protocol.

It is possible to modify our protocol to attain the same level of query privacy as in [17, 27], but the resulting protocol is not as effective in curbing DoS attacks as our original protocol. The modification still follows the "self-enforcing" rationale, but the explicit authentication step is removed. Once a user makes an inference-enabling query, he is unable to get the desired records in further queries, while the server learns nothing about the query. Since the server is not aware of inference-enabling queries, by default it has to allow any query to proceed; this is why the modified protocol is less effective in preventing DoS attacks. On the other hand, the protocol is still better than those in [17, 27] with respect to DoS attacks, because the user has to bear the consequences of a DoS attack, i.e., he can never retrieve useful data from the server again. In contrast, users in the protocols of [17, 27] have no qualms about issuing inference-enabling queries, since they can still get correct results as long as their further queries are inference-free.


The basic idea of the modification is the following. We remove the authentication step in the original protocol and alter the way the one-time access keys are generated. In the original protocol, one-time access keys are independent of each other and are selected as random numbers. In the modified protocol, a one-time access key is computed from a random number together with the previous one-time access key. For example, in the lth query, K_{l+1} is generated as K_{l+1} = h(K_l, r), where K_l is the one-time access key generated in the (l − 1)th query and r is a random number. In the Access Control algorithm, r is secret-shared instead of K_{l+1}; the Query Processing algorithm still uses K_{l+1}. As such, suppose the (l − 1)th query causes inference, so that the user cannot obtain K_l; then the user cannot compute K_{l+1} in the lth query, even if the lth query is inference-free and the user recovers r, and in turn he cannot recover the requested record in the Answer Reconstruction algorithm. Likewise, without K_{l+1} the user cannot compute K_{l+2}, and so forth.
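A minimal sketch of this key chaining (assuming h is a cryptographic hash; all variable names are ours): once a single r is withheld, every later access key becomes unreachable, even though the user keeps recovering the subsequent random values.

```python
# Key chaining: K_{l+1} = h(K_l, r_l). A user who misses one r_l (because
# that query was inference-enabling) can never derive any later key.
import hashlib
import os

def h(key: bytes, r: bytes) -> bytes:
    return hashlib.sha256(key + r).digest()

# Server side: derive a chain of keys from K_0 and per-query values r_l.
K = [os.urandom(32)]                       # K_0, initially shared with the user
rs = [os.urandom(32) for _ in range(5)]
for r in rs:
    K.append(h(K[-1], r))                  # K_{l+1} = h(K_l, r_l)

# User side: the query at step 2 was inference-enabling, so r_2 was never
# recovered; all later r_l are recovered as usual.
recovered_r = [rs[0], rs[1], None, rs[3], rs[4]]
user_K = K[0]
user_keys = [user_K]
for r in recovered_r:
    user_K = None if (r is None or user_K is None) else h(user_K, r)
    user_keys.append(user_K)               # chain broken once, broken forever

assert user_keys[1] == K[1] and user_keys[2] == K[2]  # keys before the violation
assert user_keys[3] is None and user_keys[5] is None  # later keys unreachable
```

The propagation of None mirrors the argument in the text: recovering r at a later inference-free step does not help, because computing K_{l+1} requires K_l as well.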

5 Conclusions

DoS attacks are particularly effective against private inference control systems, as the database server learns nothing about user queries. This motivated us to propose self-enforcing private inference control. The idea is to force users to be cautious in making queries, as a penalty will be inflicted upon users making inference-enabling queries. To avoid punishment, users are expected to enforce inference control themselves, determining whether or not a query will result in inference before issuing it. In this way, users are deterred from launching DoS attacks.

We instantiated the self-enforcing private inference control concept by presenting a formal model and a concrete scheme that is provably secure under the model. Several important issues pertaining to the practicality of the proposed scheme were also discussed.

References

1. W. Aiello, Y. Ishai, O. Reingold. Priced Oblivious Transfer: How to Sell Digital Goods, Proc. Eurocrypt'01, pp. 119-135, 2001.

2. N. R. Adam, J. C. Wortmann. Security-Control Methods for Statistical Databases: A Comparative Study, ACM Computing Surveys, Vol. 21(4), pp. 516-556, 1989.

3. A. Brodsky, C. Farkas, S. Jajodia. Secure Databases: Constraints, Inference Channels, and Monitoring Disclosures, IEEE Trans. Knowledge and Data Engineering, Vol. 12(6), pp. 1-20, 2000.

4. F. Y. Chin. Security Problems on Inference Control for SUM, MAX, and MIN Queries, J. ACM, Vol. 33, pp. 451-464, 1986.


5. B. Chor, N. Gilboa, M. Naor. Private Information Retrieval by Keywords, Technical Report CS0917, Israel Institute of Technology, 1997.

6. B. Chor, N. Gilboa. Computationally Private Information Retrieval, Proc. 29th STOC, pp. 304-313, 1997.

7. B. Chor, E. Kushilevitz, O. Goldreich, M. Sudan. Private Information Retrieval, Journal of the ACM, 1995.

8. F. Y. Chin, P. Kossowski, S. C. Loh. Efficient Inference Control for Range Sum Queries, Theor. Comput. Sci., Vol. 32, pp. 77-86, 1984.

9. C. Cachin, S. Micali, M. Stadler. Computationally Private Information Retrieval with Polylogarithmic Communication, Proc. Eurocrypt'99, 1999.

10. F. Y. Chin, G. Ozsoyoglu. Auditing and Inference Control in Statistical Databases, IEEE Trans. Softw. Eng., Vol. 6, pp. 574-582, 1982.

11. D. E. Denning. Cryptography and Data Security, Addison Wesley, 1982.

12. D. E. Denning, P. J. Denning, M. D. Schwartz. The Tracker: A Threat to Statistical Database Security, ACM Trans. Database Systems, Vol. 4(1), pp. 76-96, 1979.

13. D. Dobkin, A. K. Jones, R. J. Lipton. Secure Databases: Protection Against User Influence, ACM Trans. Database Systems, Vol. 4(1), pp. 97-106, 1979.

14. C. Farkas, S. Jajodia. The Inference Problem: A Survey, SIGKDD Explorations, Vol. 4(2), pp. 6-11, 2002.

15. O. Goldreich. Foundations of Cryptography: Basic Tools, Cambridge University Press, 2001.

16. L. J. Hoffman. Modern Methods for Computer Security and Privacy, Prentice-Hall, 1977.

17. G. Jagannathan, R. N. Wright. Private Inference Control for Aggregate Database Queries, Proc. 7th IEEE International Conference on Data Mining Workshops, ICDMW'07, pp. 711-716, 2007.

18. E. Kushilevitz, R. Ostrovsky. Replication Is Not Needed: Single Database, Computationally-Private Information Retrieval, Proc. 38th IEEE Symp. on Foundations of Computer Science, pp. 364-373, 1997.

19. H. Lipmaa. An Oblivious Transfer Protocol with Log-Squared Communication, Proc. Information Security Conference, ISC'05, LNCS 3650, pp. 314-328, 2005.

20. Y. Li, H. Lu, R. H. Deng. Practical Inference Control for Data Cubes (extended abstract), Proc. IEEE Symposium on Security and Privacy, pp. 115-120, 2006.

21. F. M. Malvestuto, M. Mezzini. Auditing Sum-Queries, Proc. International Conference on Database Theory, pp. 504-509, 2003.

22. M. Naor, B. Pinkas. Oblivious Transfer and Polynomial Evaluation, Proc. 31st ACM STOC, pp. 245-254, 1999.

23. P. Paillier. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes, Proc. Eurocrypt'99, pp. 223-238, 1999.

24. X. Qian, M. Stickel, P. Karp, T. Lunt, T. Garvey. Detection and Elimination of Inference Channels in Multilevel Relational Database Systems, Proc. IEEE Symposium on Research in Security and Privacy, S&P'93, pp. 196-205, 1993.

25. J. Schlorer. Disclosure from Statistical Databases: Quantitative Aspects of Trackers, ACM Trans. Database Systems, Vol. 5(4), pp. 467-492, 1980.

26. T. Su, G. Ozsoyoglu. Inference in MLS Database Systems, IEEE Trans. Knowledge and Data Engineering, Vol. 3(4), pp. 474-485, 1991.

27. D. Woodruff, J. Staddon. Private Inference Control, Proc. ACM CCS'04, pp. 188-197, 2004.

28. L. Wang, D. Wijesekera, S. Jajodia. Cardinality-based Inference Control in Sum-only Data Cubes, Proc. European Symposium on Research in Computer Security, ESORICS'02, LNCS 2502, pp. 55-71, 2002.