Secure Query Processing in an Untrusted (Cloud) Environment
Agenda
• Introduction to the model
• Overview of different approaches
Model
Data owner / Users
Database Service Provider (SP)
Data
Database
Provides professional database service
• Backup
• Performance
• …
EmpID HourlyRate WorkingHour
2 40 36
Get back data from SP for own use
Find Alice’s record
Introduction: security concern
Data owner / Users
Database Service Provider (SP)
Sensitive Data
Trusted Party Untrusted Party
Objectives of our work:
(1) Protect sensitive data from being seen by untrusted parties (including the SP)
(2) Users can still enjoy the database service from the SP
March 2009: Google Docs allowed unintended access to some private documents
June 2013: a Facebook bug leaked the contact information of 6 million users
Secure database system - overview
• Encrypt data before sending to SP
Data owner (DO) / User, plain table:
EmpID HourlyRate WorkingHour
1 30 23
2 40 36
Service provider (SP), encrypted table:
EmpID HourlyRate WorkingHour
79826 334164104 547322019
57309 23749300 489226453
Q: SELECT * WHERE HourlyRate * WorkingHour > 900
Q’: the transformed query, with some ‘trapdoors’ to help the SP compute the answer
Encrypted answer returned by the SP: 57309 23749300 489226453
Decrypted answer at the DO: 2 40 36
Approaches to solve the problem
• Hardware-based solutions
  – TrustedDB [SIGMOD 2011], Cipherbase [SIGMOD 2013]
• Homomorphic-encryption-based solutions
  – CryptDB [CACM 2012], MONOMI [PVLDB 2013]
• Secure Multiparty Computation (SMC) approach
  – ShareMind [PAISI 2012]
• Our solution
• Secure indexing approaches
  – Orthogonal to the above solutions; can be integrated with any of them
  – Domain partitioning [SIGMOD 2002]
Before we discuss different approaches…
• The SP is assumed to be more powerful than the users.
• Users are trusted. They can see plain data.
  – A baseline solution exists
    • Users retrieve the entire encrypted database
    • Decrypt it, then do whatever they want
  – Problems of the baseline solution
    • High communication cost and high processing cost at users
• What different approaches are trying to do
  – Delegate the query processing job to the SP
    • Utilize the power of the SP
  – Users obtain the (encrypted) query answers only
    • Low communication cost and low processing cost at users
  – We can always fall back to the baseline method!
Hardware-based solutions
• Use of secure co-processor
  • Can store a key on it
    – No party can observe the key stored on it
  • Provides an API for cryptographic actions using the stored key
  • Tamper-resistance
    – The device cannot be hacked through physical intrusion
Use of secure coprocessor
Users → SP
(1) Users encrypt the data using a secret key and store it in the database at the SP.
(2) Secure co-processor(s) is/are installed at the SP side. The key is sent to the secure co-processor through a secure channel.
    Note: the key is known to users and the secure co-processor only.
(3) Query from users: Find Alice’s record.
(4) The secure co-processor decrypts the records one by one and processes the query.
(5) Result: can be encrypted or plain. In this example, just return yes/no.
(6) The answer is returned to users, who decrypt it.
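To make the flow above concrete, here is a minimal Python sketch that simulates the secure co-processor in software. It rests on loud assumptions: the class `SecureCoprocessor`, the helper functions and the `Name,HourlyRate,WorkingHour` record format are hypothetical, and the HMAC-SHA256 keystream cipher is a toy stand-in for whatever cipher a real device provides (no integrity protection, not for real data). The only point is that the key lives inside the device object, while the SP handles ciphertexts and sees a yes/no answer.

```python
from __future__ import annotations

import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: str) -> tuple[bytes, bytes]:
    """Toy stream encryption: XOR the UTF-8 text with a fresh keystream."""
    nonce = os.urandom(16)
    data = plaintext.encode()
    ks = _keystream(key, nonce, len(data))
    return nonce, bytes(a ^ b for a, b in zip(data, ks))


def decrypt(key: bytes, record: tuple[bytes, bytes]) -> str:
    nonce, ct = record
    ks = _keystream(key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks)).decode()


class SecureCoprocessor:
    """Simulated trusted device at the SP: the key never leaves this object."""

    def __init__(self, key: bytes):
        self._key = key  # delivered by the users over a secure channel

    def contains_name(self, records, name: str) -> bool:
        # Decrypt the records one by one inside the device; reveal only yes/no.
        return any(decrypt(self._key, r).split(",")[0] == name for r in records)


# Users: encrypt hypothetical records and upload only ciphertexts to the SP.
user_key = os.urandom(32)
db_at_sp = [encrypt(user_key, row) for row in ["Bob,30,23", "Alice,40,36"]]

# SP: forwards the query to the co-processor, which never exposes the key.
coprocessor = SecureCoprocessor(user_key)
answer = coprocessor.contains_name(db_at_sp, "Alice")
print(answer)  # → True
```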
Optimization strategies
• Add more secure co-processors for parallel processing
• Compute the part of the query that does not involve encrypted data on the DBMS first
  – Example: SELECT * FROM T WHERE Price > 10 AND Order_Date < '22 Feb 2014'
    • If Price is encrypted while Order_Date is not, the DBMS first processes the predicate Order_Date < '22 Feb 2014'
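The second strategy can be sketched in Python by counting co-processor decryptions under both evaluation orders. The `Row` type, the four sample rows, the XOR "decryption" and the `MASK` value are all hypothetical stand-ins; only the ordering idea comes from the slide.

```python
from dataclasses import dataclass
from datetime import date

MASK = 0x5A5A  # hypothetical stand-in for the co-processor's secret key


@dataclass
class Row:
    order_date: date  # plain column, readable by the DBMS
    price_enc: int    # encrypted column, only the co-processor can decrypt it


def coprocessor_decrypt(value: int, counter: list) -> int:
    """Toy stand-in for a (costly) decryption inside the secure co-processor."""
    counter[0] += 1
    return value ^ MASK


def run_query(rows, cutoff: date, min_price: int, plain_first: bool):
    """Evaluate Price > min_price AND Order_Date < cutoff; return (rows, #decryptions)."""
    decryptions = [0]
    if plain_first:
        # The DBMS filters on the plain predicate; only survivors reach the co-processor.
        candidates = [r for r in rows if r.order_date < cutoff]
        result = [r for r in candidates
                  if coprocessor_decrypt(r.price_enc, decryptions) > min_price]
    else:
        # Naive order: decrypt Price for every row before looking at Order_Date.
        result = [r for r in rows
                  if coprocessor_decrypt(r.price_enc, decryptions) > min_price
                  and r.order_date < cutoff]
    return result, decryptions[0]


rows = [Row(date(2014, 1, 10), 12 ^ MASK), Row(date(2014, 3, 1), 50 ^ MASK),
        Row(date(2014, 2, 1), 5 ^ MASK), Row(date(2014, 4, 2), 99 ^ MASK)]
fast, fast_cost = run_query(rows, date(2014, 2, 22), 10, plain_first=True)
naive, naive_cost = run_query(rows, date(2014, 2, 22), 10, plain_first=False)
print(fast_cost, naive_cost)  # → 2 4
```

Both orders return the same rows; pushing the plain predicate down only reduces the work done inside the co-processor.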
Pros and cons
• Pros
  – Strong security protection as long as the secure co-processor is not compromised
  – Can process any query
• Cons
  – Requires special hardware
  – Expensive
[Figure: prices in USD (data obtained on 7 Feb 2014)]
Homomorphic-encryption-based solutions
• Homomorphic encryption
  – A special type of encryption that allows certain types of operations (on plain values) to be executed on encrypted values
  • Let E be an encryption function
  – Homomorphic property: E(f(x, y)) = g(E(x), E(y))
  – Examples
    • RSA: E(a)*E(b) = E(a*b) (mod n)
    • OPES [SIGMOD 04]: E(a) > E(b) if and only if a > b
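The RSA example can be checked with textbook RSA and the classic toy primes 61 and 53. This is a sketch for illustration only: real RSA uses large keys and padding, and padded RSA is not homomorphic. The property holds modulo n, so the decrypted product equals a*b only while a*b < n.

```python
# Textbook RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53
n = p * q                  # 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def enc(m: int) -> int:
    return pow(m, e, n)


def dec(c: int) -> int:
    return pow(c, d, n)


a, b = 50, 23
product_cipher = enc(a) * enc(b) % n  # computed without the private key
print(dec(product_cipher))            # → 1150, i.e. a*b mod n
```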
Using homomorphic encryptions
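Plain data at users (HR and WH are sensitive):
EmpID HourlyRate WorkingHour
1 50 23
2 30 36
Encrypted by OPES, stored at SP:
EID HR WH
1 ka6fj h3a45
2 d2s2a Anm24
Encrypted by RSA, stored at SP:
EID HR WH
1 Hj%3 45877
2 Ks12# AA244
Query: SELECT HR*WH WHERE HR > 35
• Users send E(35) by OPES = ejAAS
• SP evaluates HR > 35 on the OPES version: ka6fj > ejAAS (row 1 matches), d2s2a < ejAAS (row 2 does not)
• SP computes HR*WH for row 1 on the RSA version: z%^#5
• Users decrypt z%^#5 and obtain HR*WH = 1150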
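The walk-through above can be reproduced with a small Python sketch. Two assumptions are stated loudly: the order-preserving encoding is a toy strictly increasing lookup table (it is not the OPES algorithm from SIGMOD 04), and RSA uses tiny textbook keys. The SP keeps an OPE version of HR for the comparison and RSA versions of HR and WH for the multiplication, i.e. two encrypted versions of the same data.

```python
import random

# Toy order-preserving encoding (NOT the OPES algorithm): a random increasing table.
rng = random.Random(2014)         # fixed seed so the mapping is reproducible
DOMAIN = 100                      # assume HR values lie in [0, 100)
ope_table, total = [], 0
for _ in range(DOMAIN):
    total += rng.randint(1, 1000)  # strictly positive gaps keep the map increasing
    ope_table.append(total)


def ope_enc(v: int) -> int:
    return ope_table[v]


# Textbook RSA with tiny primes (illustration only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))


def rsa_enc(m: int) -> int:
    return pow(m, e, n)


def rsa_dec(c: int) -> int:
    return pow(c, d, n)


# Users: plain table (EmpID, HR, WH) from the slide, uploaded in encrypted form.
plain = [(1, 50, 23), (2, 30, 36)]
at_sp = [(eid, ope_enc(hr), rsa_enc(hr), rsa_enc(wh)) for eid, hr, wh in plain]

# Query: SELECT HR*WH WHERE HR > 35. Users send E(35) under the OPE scheme.
threshold = ope_enc(35)
encrypted_answers = [(eid, hr_rsa * wh_rsa % n)
                     for eid, hr_ope, hr_rsa, wh_rsa in at_sp if hr_ope > threshold]

# Users decrypt the RSA-encrypted products.
answers = [(eid, rsa_dec(c)) for eid, c in encrypted_answers]
print(answers)  # → [(1, 1150)]
```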
Pros and cons
• Pros
  – Low overheads in query processing at SP
    • Example: the SP just needs to multiply RSA-encrypted data, without any encryption or decryption
• Cons
  – Multiple encrypted versions of the same data may be needed
  – Does not support composition of operations
    • No data interoperability
    • Example: cannot compute HR*WH > 6000
Secure Multiparty Computation (SMC) approach
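Plain data at users:
EID HR WH
1 50 23
2 30 36
Secret sharing: each value is split into three shares, v = v1 + v2 + v3 mod 100, and SP #i stores share vi.
SP #1:
EID HR WH
1 60 28
2 31 18
SP #2:
EID HR WH
1 40 56
2 0 5
SP #3:
EID HR WH
1 50 39
2 99 13
Each SP can’t derive the plain value v by having one share vi only.
Query: SELECT EID, WH WHERE HR > 35
By exchanging some information (which may involve multiple rounds), the SPs compute the result securely. Each SP returns its share of the answer:
SP #1: EID 1, WH 28 · SP #2: EID 1, WH 56 · SP #3: EID 1, WH 39
Users reconstruct the answer (28 + 56 + 39 mod 100 = 23):
EID WH
1 23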
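The secret-sharing step can be checked with a short Python sketch using the slide's modulus 100 and the shares listed above. It covers only splitting and reconstruction; the comparison HR > 35 itself needs an interactive SMC protocol that is not shown. The names `share` and `reconstruct` are illustrative.

```python
from __future__ import annotations

import random

MOD = 100  # modulus used on the slide: v = v1 + v2 + v3 mod 100


def share(v: int, k: int = 3) -> list[int]:
    """Split v into k additive shares; any k - 1 of them are uniformly random."""
    shares = [random.randrange(MOD) for _ in range(k - 1)]
    shares.append((v - sum(shares)) % MOD)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Only the sum of all shares reveals the plain value."""
    return sum(shares) % MOD


# Shares held by (SP #1, SP #2, SP #3), copied from the slide.
hr_shares = {1: [60, 40, 50], 2: [31, 0, 99]}
wh_shares = {1: [28, 56, 39], 2: [18, 5, 13]}

plain = {eid: (reconstruct(hr_shares[eid]), reconstruct(wh_shares[eid])) for eid in (1, 2)}
print(plain)  # → {1: (50, 23), 2: (30, 36)}
```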
Pros and cons
• Pros
  – Theoretically supports any computation
  – Usually low processing cost at SPs
    • Most protocols do not need cryptographic operations
• Cons
  – High communication costs between SPs
    • Multiple rounds of communication
  – The SPs must not collude
  – 3 times the cost due to 3 SPs
Our solution
Users hold the keys; the SP holds the encrypted data (2-party secret sharing).
• Storage cost at users is linear in the schema size (number of tables and number of columns)
• Query: SELECT EID, WH WHERE HR > 35
• Users send some hints (derived from the keys) for the SP to process the query; the message size depends on the keys (small)
• The SP returns the encrypted results
Key features of our design
• Low processing cost at users
  – Operate on keys only
  – Make use of the SP’s processing power for query processing
• Allows composition of operations
  – Example: evaluate Quantity * Price + Fixed_cost
    • First compute A = Quantity * Price
    • Then compute Ans = A + Fixed_cost
  – Data interoperability
Key features of our design
• Allows operations between plain and encrypted data
  – Encrypting everything is not suggested
    • Processing encrypted data incurs overheads
  – Queries may involve both plain and encrypted data
  – Example: SELECT * WHERE Num_Stock * Stock_Price > 5000
    • Num_Stock is encrypted; Stock_Price and the constant are not
What can our system do?
• SQL structure
SELECT T1.Price*T2.Quantity
FROM Inventory AS T1 INNER JOIN SaleOrder AS T2 ON T1.itemID = T2.itemID
WHERE T1.Stock*T1.Price < 10000
  – SELECT: projection with numeric operations; can be expressions composed with addition and multiplication
  – INNER JOIN … ON: equi-join
  – WHERE: predicate(s) to filter result tuples; supports AND/OR/NOT and expressions
• On integer type data
More operations
• INSERT/UPDATE
  – Example: UPDATE T1 SET Salary = Salary * 1.05 WHERE PeerScore + ManagerScore > 30
• Basic aggregate functions: COUNT/SUM/AVG
  – Example: SELECT SUM(HR*WH) FROM T WHERE Age < 30
    • The argument of SUM can be an expression
    • The WHERE clause is handled just like selection
Limitations
• Incurs high processing cost at the SP, due to massive cryptographic operations
• Still under development
  – Currently focusing on integer type data
  – Query plan optimization
END.
ADDITIONAL MATERIALS
SMC Example: addition protocol
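Operation: z = x + y
Secret sharing: v = v1 + v2 + v3 mod n; SP #i holds the shares xi and yi.
(1) Each SP #i picks a random ri and computes si = xi + yi - ri.
(2) SP #1 sends s1 to SP #2, SP #2 sends s2 to SP #3, and SP #3 sends s3 to SP #1.
(3) Each SP computes its share of z:
    SP #1: z1 = s3 + r1
    SP #2: z2 = s1 + r2
    SP #3: z3 = s2 + r3
Then z1 + z2 + z3 = x + y (mod n).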
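The addition protocol above can be simulated in a few lines of Python. The modulus N = 100 is an assumption borrowed from the earlier SMC example (the slide keeps n generic), and list index i stands for SP #(i + 1).

```python
from __future__ import annotations

import random

N = 100  # assumed modulus; the slide keeps n generic


def share3(v: int) -> list[int]:
    """Additive 3-out-of-3 sharing: v = v1 + v2 + v3 mod N."""
    v1, v2 = random.randrange(N), random.randrange(N)
    return [v1, v2, (v - v1 - v2) % N]


def secure_add(x_sh: list[int], y_sh: list[int]) -> list[int]:
    """Return shares z_i of x + y; index i plays the role of SP #(i + 1)."""
    r = [random.randrange(N) for _ in range(3)]             # SP #i picks r_i
    s = [(x_sh[i] + y_sh[i] - r[i]) % N for i in range(3)]  # s_i = x_i + y_i - r_i
    # SP #1 receives s_3, SP #2 receives s_1, SP #3 receives s_2.
    return [(s[(i - 1) % 3] + r[i]) % N for i in range(3)]  # z_i = s_(i-1) + r_i


z_shares = secure_add(share3(23), share3(36))
print(sum(z_shares) % N)  # → 59
```

The random ri cancel in the sum, so the zi reconstruct x + y while each individual zi is freshly masked.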
Our solution
Plain data (X, Y) is split by 2-party secret sharing into two tables.
Users:
Row-id X Y
r1 x1a y1a
r2 x2a y2a
… … …
SP:
Row-id X Y
E(r1) x1b y1b
E(r2) x2b y2b
… … …
• Row-ids are encrypted by some existing encryption method
• Without knowing the shares at users, the SP can’t recover the plain data
• Storing the share table incurs a high storage overhead to users
• Instead, users keep one column key for each column (ckX for X, ckY for Y); combined with the row-ids r1, r2, …, the column keys generate the users’ shares as a table of pseudo-random numbers
The actual storage at both sides
Column keys at users: A<2, 2>, B<1, 3>
Users only remember the column keys (each contains two values); the item keys below are derived from them:
Row-id A B
1 8 8
2 32 29
Encrypted values at SP:
Row-id A B
E(1) 9 31
E(2) 22 29
Plain data (v = v1v2 mod n, n = 35):
A B
2 3
4 1
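The numbers on this slide can be reproduced with a small Python sketch of the item-key generator $v_k = m\,g^{x r} \bmod n$ (from the "Secure item key generator" slide) and the multiplicative sharing v = v1v2 mod n. Rows are indexed by their decrypted row keys for simplicity. The `encrypt` helper, which computes the SP's share as $v \cdot v_k^{-1} \bmod n$, is an inference from these formulas rather than something stated on the slides, and it requires the item key to be invertible mod n.

```python
from __future__ import annotations

N, G = 35, 2  # toy system parameters from the slides


def item_key(row_id: int, column_key: tuple[int, int]) -> int:
    """v_k = m * g^(x * r) mod n for row key r and column key <m, x>."""
    m, x = column_key
    return m * pow(G, x * row_id, N) % N


def decrypt(row_id: int, column_key: tuple[int, int], ve: int) -> int:
    """Plain value v = v_k * v_e mod n, where v_e is the SP's share."""
    return item_key(row_id, column_key) * ve % N


def encrypt(row_id: int, column_key: tuple[int, int], v: int) -> int:
    """SP share v_e = v * v_k^(-1) mod n (inferred; needs gcd(v_k, n) = 1)."""
    return v * pow(item_key(row_id, column_key), -1, N) % N


keys = {"A": (2, 2), "B": (1, 3)}                         # column keys A<2, 2>, B<1, 3>
sp_table = {1: {"A": 9, "B": 31}, 2: {"A": 22, "B": 29}}  # encrypted values at SP

plain = {r: {c: decrypt(r, keys[c], ve) for c, ve in cols.items()}
         for r, cols in sp_table.items()}
print(plain)  # → {1: {'A': 2, 'B': 3}, 2: {'A': 4, 'B': 1}}
```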
Operation on our encrypted data
Column keys at users: A<2, 2>, B<1, 3>
Encrypted values at SP:
Row-id A B
E(1) 9 31
E(2) 22 29
Operation: C = A + B
• Similar to SMC, there will be some communication between user and SP, but the communication is uni-directional (only user -> SP)
• The user holds a new column key C<4, 5> and sends some ‘hints’ to the SP to help it compute the operation
• The SP computes Ce = A’ + B’:
Row-id Ce
E(1) 20
E(2) 5
Retrieving the data
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>
Encrypted values at SP:
Row-id A B C D
E(1) … … … …
E(2) … … … …
(1) The SP finds the answers:
Row-id Match?
E(1) No
E(2) Yes
E(6) No
E(4) No
… …
(2) Projection on C only; the encrypted answer is sent back to the user (the row-ids must be included):
Row-id C
E(2) 3
E(16) 12
… …
Decrypting the result
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>
Encrypted answers from SP:
Row-id C
E(2) 3
E(16) 12
… …
The user decrypts the row-ids and computes its own item keys for C:
Row-id C
2 31
16 17
… …
Decrypt with v = v1v2 mod n, n = 35:
C
23
29
…
Without data interoperability
RSA: E1(x) * E1(y) = E1(x*y)
• Supports multiplication over encrypted data: E1(x) * E1(y) = E1(a), where a = x*y
OPES: E2(a) > E2(b) if a > b
• Supports comparison over encrypted data: E2(a) > E2(b)
How to compute x*y > b over encrypted data?
• The two schemes operate on different spaces, so the user must decrypt E1(a), then encrypt E2(a) before the comparison can be made
With data interoperability
How to compute x+y > b over encrypted data?
• E(x) + E(y) = E(a) = E(x+y)
• E(a) > E(b)
Other examples: $(x_1 - x_2)^2 + (y_1 - y_2)^2$ can be computed using addition and multiplication only
Secure item key generator
• INPUT: row key r, column key <m, x>
  – All are kept private
• System parameters: n, g
  – Selected by DO; n is public, g is not
• Generation function: $v_k = m\,g^{x r} \bmod n$
• Security:
  – Extension of the RSA function
  – Even if an attacker observes several item keys, it is computationally hard to derive the secret parameters and hence other item keys
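Illustration 1: Multiplication of 2 columns
System parameters: n = 35, g = 2
Plain data:
Row-id A B
1 2 3
2 4 1
Table schema and column keys at DO: A<2, 2>, B<1, 3>
Encrypted values at SP:
Row-id Ae Be
1 9 31
2 22 29
Operation: C = AB
• DO: new column key $C\langle m_a m_b,\ x_a + x_b\rangle$ = C<2, 5>
• SP: Ce = AeBe
Row-id Ce
1 34
2 8
• DO: item keys of C are 29 (row 1) and 18 (row 2), so C = AB = 29·34 mod 35 = 6 and 18·8 mod 35 = 4
Row-id C
1 6
2 4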
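The multiplication illustration can be replayed in Python with the same toy parameters (n = 35, g = 2). `item_key` is a hypothetical helper implementing $v_k = m\,g^{x r} \bmod n$; the sketch shows that the SP only multiplies ciphertexts, while the DO derives the column key of C from the keys of A and B.

```python
from __future__ import annotations

N, G = 35, 2  # toy parameters from the illustration


def item_key(r: int, ck: tuple[int, int]) -> int:
    """v_k = m * g^(x * r) mod n for row key r and column key <m, x>."""
    m, x = ck
    return m * pow(G, x * r, N) % N


ck_a, ck_b = (2, 2), (1, 3)
ae = {1: 9, 2: 22}   # encrypted A at SP
be = {1: 31, 2: 29}  # encrypted B at SP

# DO: column key of C = A * B is <m_a * m_b, x_a + x_b>.
ck_c = (ck_a[0] * ck_b[0] % N, ck_a[1] + ck_b[1])

# SP: multiply the encrypted columns row by row.
ce = {r: ae[r] * be[r] % N for r in ae}

# DO: decrypt C with its own item keys.
c = {r: item_key(r, ck_c) * ce[r] % N for r in ce}
print(ck_c, ce, c)  # → (2, 5) {1: 34, 2: 8} {1: 6, 2: 4}
```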
Proof of correctness
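Column keys at DO: A<m_a, x_a>, B<m_b, x_b>, and C<m_a m_b, x_a + x_b>
Encrypted values at SP for the row with row key r: Ae = a’, Be = b’, and Ce = a’b’
• We have
$$a = m_a g^{r x_a} a' \bmod n, \qquad b = m_b g^{r x_b} b' \bmod n$$
• Decryption on C:
$$m_a m_b\, g^{r(x_a + x_b)} (a'b') = (m_a g^{r x_a} a')(m_b g^{r x_b} b') = ab \pmod n$$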
Illustration 2: Addition
• C = A + B
  – Example: SELECT * WHERE salary + bonus > 40000
• Preparation stage
  – We add a constant column S (every value is 1) to the plain database
  – S is encrypted, i.e., DO keeps a column key of S, SP keeps a column of encrypted values
Plain data at DO:
A B S
2 3 1
4 1 1
Column keys at DO (n = 35, g = 2): A<2, 2>, B<1, 3>, S<11, 13>; target C<4, 5>
Encrypted values at SP:
Row-id Ae Be Se
E(1) 9 31 8
E(2) 22 29 4
• DO gives hints to SP, computed from the column keys only:
$$p_A = x_s^{-1}(x_c - x_a) \bmod \Phi(n), \qquad p_B = x_s^{-1}(x_c - x_b) \bmod \Phi(n)$$
$$q_A = m_a\, m_s^{p_A}\, m_c^{-1} \bmod n, \qquad q_B = m_b\, m_s^{p_B}\, m_c^{-1} \bmod n$$
  – $p_A = 13^{-1}(5 - 2) \bmod 24 = 15$, $p_B = 13^{-1}(5 - 3) \bmod 24 = 2$
  – $q_A = 2 \cdot 11^{15} \cdot 4^{-1} \bmod 35 = 18$, $q_B = 1 \cdot 11^{2} \cdot 4^{-1} \bmod 35 = 4$
• SP computes the encrypted answers:
$$A' = q_A A_e S_e^{p_A}, \qquad B' = q_B B_e S_e^{p_B}, \qquad C_e = A' + B' \pmod n$$
Row-id A’ B’ Ce
E(1) 29 26 20
E(2) 4 1 5
• DO decrypts with the item keys of C (23 for row 1, 1 for row 2): C = A + B = 5, 5
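The addition illustration can be checked end to end with the following Python sketch (toy parameters n = 35, g = 2, Φ(35) = 24). It derives the hints pA, pB, qA, qB from the column keys, lets the "SP" combine the encrypted columns, and decrypts with the item keys of C. `item_key` and the variable names are illustrative; modular inverses use Python 3.8+ `pow(x, -1, m)`.

```python
N, G = 35, 2
PHI = 24  # Phi(35) = (5 - 1) * (7 - 1)


def item_key(r, ck):
    """v_k = m * g^(x * r) mod n for row key r and column key <m, x>."""
    m, x = ck
    return m * pow(G, x * r, N) % N


ck_a, ck_b, ck_s, ck_c = (2, 2), (1, 3), (11, 13), (4, 5)
ae = {1: 9, 2: 22}
be = {1: 31, 2: 29}
se = {1: 8, 2: 4}  # encryption of the all-ones column S

# DO: hints derived from the column keys only.
xs_inv = pow(ck_s[1], -1, PHI)
p_a = xs_inv * (ck_c[1] - ck_a[1]) % PHI
p_b = xs_inv * (ck_c[1] - ck_b[1]) % PHI
mc_inv = pow(ck_c[0], -1, N)
q_a = ck_a[0] * pow(ck_s[0], p_a, N) * mc_inv % N
q_b = ck_b[0] * pow(ck_s[0], p_b, N) * mc_inv % N

# SP: A' = q_A * Ae * Se^pA, B' = q_B * Be * Se^pB, then Ce = A' + B' (mod n).
ce = {r: (q_a * ae[r] * pow(se[r], p_a, N) + q_b * be[r] * pow(se[r], p_b, N)) % N
      for r in ae}

# DO: decrypt with the item keys of C.
c = {r: item_key(r, ck_c) * ce[r] % N for r in ce}
print((p_a, p_b, q_a, q_b), ce, c)  # → (15, 2, 18, 4) {1: 20, 2: 5} {1: 5, 2: 5}
```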
Proof of correctness
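Column keys at DO: A<m_a, x_a>, B<m_b, x_b>, S<m_s, x_s>, C<m_c, x_c>
Encrypted values at SP for the row with row key r: a’, b’, s’ (and c’ for C); all arithmetic below is mod n
• We have
$$a = m_a g^{r x_a} a', \qquad b = m_b g^{r x_b} b', \qquad 1 = m_s g^{r x_s} s' \;\Rightarrow\; s' = m_s^{-1} g^{-r x_s}$$
• Following the procedure ($A' = q_A A_e S_e^{p_A}$, $B' = q_B B_e S_e^{p_B}$, $C_e = A' + B'$), we have
$$c' = q_A a' s'^{p_A} + q_B b' s'^{p_B} = (m_a m_s^{p_A} m_c^{-1})\, a' s'^{p_A} + (m_b m_s^{p_B} m_c^{-1})\, b' s'^{p_B}$$
• Since $s'^{p_A} = (m_s^{-1} g^{-r x_s})^{p_A} = m_s^{-p_A} g^{-r x_s p_A}$ (and likewise for $p_B$):
$$c' = (m_a m_c^{-1})\, a' g^{-r x_s p_A} + (m_b m_c^{-1})\, b' g^{-r x_s p_B}$$
• Since $p_A = x_s^{-1}(x_c - x_a) \bmod \Phi(n)$ and $p_B = x_s^{-1}(x_c - x_b) \bmod \Phi(n)$, we have $x_s p_A \equiv x_c - x_a$ and $x_s p_B \equiv x_c - x_b \pmod{\Phi(n)}$ (exponents of g can be reduced mod $\Phi(n)$ because $\gcd(g, n) = 1$):
$$c' = (m_a m_c^{-1})\, a' g^{-r(x_c - x_a)} + (m_b m_c^{-1})\, b' g^{-r(x_c - x_b)}$$
$$c' = m_c^{-1} g^{-r x_c} (m_a g^{r x_a} a' + m_b g^{r x_b} b') = m_c^{-1} g^{-r x_c} (a + b)$$
• Decryption on c’:
$$m_c g^{r x_c} c' = a + b$$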