AN APPROACH FOR SECURE AND LEAKAGE RESILIENT SEARCH OVER ENCRYPTED NOSQL DATABASES IN A PUBLIC CLOUD by MOHAMMAD AHMADIAN M.S. University of Central Florida, 2014 M.S. Amirkabir University of Technology, 2009 A Proposal submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in the Department of Electrical Engineering and Computer Science in the College of Engineering and Computer Science at the University of Central Florida Orlando, Florida Fall Term 2016 Major Professor: Dan C. Marinescu
a higher level of granularity and delivers a richer experience for modern programming techniques.
Document-oriented databases use a key to locate a document inside the data store. Most document stores use JSON or BSON (Binary JSON). Document stores are suited for applications where the input data can be represented in a document format. A document can contain complex data structures such as nested objects. A document store groups documents into collections, and each document in a collection should have a unique key. Unlike an RDBMS, where every row in a table follows the same schema, documents in a document store may each have a different structure. Document stores can index documents on the primary key as well as on the contents of the documents. Like key-value stores, they are inefficient for multiple-key transactions involving cross-document operations.
Graph Databases: This data model, based on graphs, can represent the complex structures and highly connected data often encountered in real-world applications. In graph databases, the nodes and edges have individual properties consisting of key-value pairs. Graph databases are a good alternative for social networking applications, pattern recognition, dependency analysis and recommendation systems. Some graph databases, such as Neo4J⁵, support ACID⁶ properties. Graph data stores are not as efficient as other NoSQL data stores and do not scale well horizontally when related nodes are distributed to different servers.
1.2.2 Searchable Security Scheme For NoSQL databases
Data security on a cloud platform is critical for applications running on public clouds because multiple virtual machines (VMs) often share the same physical platform [50, 51, 52]. Classic cryptographic primitives can protect data in storage, but even encrypted data has to be decrypted for processing. This is particularly troubling when searching databases containing personal information such as healthcare or financial records, because the entire plaintext database is then exposed on the shared platform. This motivates us to investigate methods for searching encrypted NoSQL databases. Though general computations on encrypted data are theoretically feasible using the algorithms for Fully Homomorphic Encryption (FHE) [24], this is by no means a practical solution at this time. Existing algorithms for homomorphic encryption increase the processing time of encrypted data by many orders of magnitude compared with the processing of plaintext data. A recent implementation of FHE [28] requires about six minutes per batch; after optimization this time drops to almost one second for computing a simple operation on encrypted data [20]. Other related methods are Learning With Errors (LWE) [7], lattice-based encryption [39, 10], and Attribute-Based Encryption [26].
⁵ http://neo4j.com
⁶ ACID (Atomicity, Consistency, Isolation, Durability) properties guarantee that transactions are processed reliably.
1.3 Leakage Proof Data Processing In Public Cloud
Encryption is commonly used to protect the privacy of data and queries, but encrypted data and queries are still vulnerable to information leakage on a cloud platform. A database can be encrypted by the data owner before it is outsourced to the cloud in such a way that client queries can still be processed on the transformed data. However, encryption does not hide all information about the encrypted data: for instance, the collection name (or table name in an RDBMS), the field names, and the number and length of the fields involved in a query often reveal information about the encrypted data. Moreover, a cloud insider can infer sensitive information from a sequence of queries. This type of attack on an encrypted database is classified as information leakage. An outsourced encrypted data set should leak as little sensitive information as possible. An acceptable level of security for searchable encryption can be achieved with the Oblivious RAM (ORAM) [25, 40, 34] method. The major problems of ORAM are its low efficiency, its high computational cost, and the intense communication it requires between client and server.
We will argue that any query is an object with several features, so any query can be considered a point in an n-dimensional feature space. We then use a linear classifier with a training data set to extract implicit information from the encrypted data set. Every query is distinct from others in terms of measurable features, such as the length of the query string, the number of fields involved, the number of objects, the operations between objects, aggregate functions, the domain of the query, and timing information. These features form a fingerprint that identifies each query uniquely. Furthermore, the fingerprint of a specific client can be obtained with high confidence by combining the fingerprints of the queries it issues most regularly. In this research work we will formulate the information leakage from encrypted data sets, and then define metrics and a cost coefficient for leakage prevention solutions in order to measure their performance.
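The idea of treating a query as a point in feature space and separating such points with a linear classifier can be sketched as follows. This is a minimal illustration, not the classifier used in this work: the four features are an assumed subset of those listed above, the function names are hypothetical, and a plain perceptron stands in for whichever linear classifier is eventually chosen.

```python
import json


def query_features(query: dict, elapsed_ms: float) -> list:
    """Map a JSON-style query to a point in feature space.

    The four features (string length, field count, operator count,
    timing) are an illustrative subset chosen for this sketch.
    """
    fields, operators = [], []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                # MongoDB-style operators start with '$'; other keys are fields.
                (operators if key.startswith("$") else fields).append(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(query)
    text = json.dumps(query, sort_keys=True)
    return [float(len(text)), float(len(fields)), float(len(operators)), elapsed_ms]


def train_perceptron(points, labels, epochs=200):
    """Train a linear classifier; labels are +1 or -1."""
    weights, bias = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the hyperplane toward x
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1
```

An observer who can label a few queries (for example by client) could train on their feature vectors and then classify unlabeled encrypted queries, which is the kind of leakage the metrics are meant to quantify.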
1.3.1 Cryptosystems For Outsourced Data Store
Data in cloud computing can be in one of three states: storage, transit, or processing. Developers of web applications need efficient tools to protect sensitive information from third parties, including the CSP. To maintain security and privacy, any comprehensive data security mechanism must take into account the protection criteria for data in each of these states.
The communication channels can be secured by using standard HTTP over the Secure Sockets Layer (SSL) protocol. Most CSPs provide a web service API that enables developers to use both standard HTTP and the secure HTTPS protocol. The security requirements of data in transit can be fully satisfied by using HTTPS for communication with the cloud. In addition, the endpoint authentication feature of the SSL protocol makes it possible to ensure that clients are communicating with an authentic cloud server.
The basic idea is to encrypt the data before uploading it to the cloud. However, the data must be decrypted by the cloud server before it can be processed. In other words, the data owner has to disclose the decryption key to the server so that it can decrypt the data before performing any required operation. The problem is that once the decryption key is compromised, data confidentiality is lost. Therefore, the cloud computing model requires a new set of cryptosystems. Encryption schemes that support operations on encrypted data are called homomorphic encryption schemes, and they have a very wide range of applications in cloud computing. In a nutshell, a fully homomorphic encryption scheme is a cryptosystem that allows the evaluation of arbitrarily complex operations on encrypted data.
A cloud developer is responsible for ensuring that the data in cloud storage is protected by authentication based on the user's credentials. Moreover, for highly sensitive data, the risk of illegitimate access should be considered. For instance, the data should be protected from a malevolent insider who may gain access to it. Thus, for protection purposes, sensitive information should be encrypted before being uploaded to the cloud. Any type of encryption can be used, since there is no required data format for cloud storage.
Random (RND). In an RND-type encryption scheme, a message is combined with a key and a random Initialization Vector (IV). This scheme is called probabilistic, since encrypting the same message with the same key yields a different ciphertext each time. This randomness provides the highest level of security. The randomness property is achievable with different encryption algorithms. Advanced Encryption Standard (AES) in Cipher Block Chaining (CBC) mode [19] is used for RND encryption. AES is a symmetric block cipher with a key size of 128, 192 or 256 bits and a block size of 128 bits. RND-type schemes are semantically secure against chosen plaintext attacks and hide all information about the plaintext. As a result, an RND scheme does not allow any efficient computation on the ciphertext. Equation 1.1 describes the encryption and decryption of a block cipher in CBC mode.
\[
\begin{aligned}
C_1 &= E_k(P_1 \oplus IV), & P_1 &= IV \oplus D_k(C_1) \\
C_j &= E_k(P_j \oplus C_{j-1}), & P_j &= C_{j-1} \oplus D_k(C_j), \quad j = 2, \dots, n
\end{aligned}
\tag{1.1}
\]
where E_k is the encryption algorithm, D_k is the decryption algorithm, k is the secret key, P_j is a block of plaintext data and C_j is a block of ciphertext.
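The chaining in Equation 1.1 can be illustrated with a short sketch. To keep it self-contained, a toy four-round Feistel permutation built from SHA-256 stands in for AES; this stand-in is an assumption made only for illustration and is not secure. The function names are hypothetical, and padding is omitted, so the plaintext length must be a multiple of the block size.

```python
import hashlib

BLOCK = 16  # bytes, matching the 128-bit AES block size


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (NOT a real cipher).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """E_k: a toy invertible block permutation standing in for AES."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """D_k: inverse of toy_encrypt_block."""
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """C_1 = E_k(P_1 xor IV); C_j = E_k(P_j xor C_{j-1})."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    prev, out = iv, []
    for j in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, _xor(plaintext[j:j + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """P_1 = IV xor D_k(C_1); P_j = C_{j-1} xor D_k(C_j)."""
    prev, out = iv, []
    for j in range(0, len(ciphertext), BLOCK):
        block = ciphertext[j:j + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)
```

Encrypting the same message twice under the same key with two different IVs (in practice drawn from a source such as `os.urandom(16)`) gives two different ciphertexts, which is the probabilistic behavior that defines RND.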
Deterministic (DET). A DET encryption scheme is a cryptosystem that always produces the same ciphertext for the same pair of plaintext and key. Block ciphers in Electronic Code Book (ECB) mode, which uses no initialization vector, or in a chained mode with a constant initialization vector, are deterministic. A deterministic encryption scheme leaks equality: identical plaintexts yield identical ciphertexts. AES in ECB mode is used for DET encryption over document-oriented NoSQL databases. This DET scheme enables the server to process pipeline aggregation stages such as group, count, retrieving distinct values and equality match⁷ on the fields within an embedded document. The embedded document can maintain the link with the primary document through the application of DET encryption. Equation 1.2 displays the encryption and decryption operations in DET.
\[
C_j = E_k(P_j), \qquad P_j = D_k(C_j), \qquad j = 1, \dots, n \tag{1.2}
\]
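The following sketch shows why the block-by-block scheme of Equation 1.2 lets a server group, count and match values it cannot read. As an assumption for illustration only, a toy Feistel permutation replaces AES, short values are zero-padded to one or more blocks, and all function names are hypothetical.

```python
import hashlib
from collections import Counter

BLOCK = 16  # bytes


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """E_k: toy Feistel permutation standing in for AES (not secure)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """D_k: inverse of toy_encrypt_block."""
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def det_encrypt(key: bytes, value: str) -> bytes:
    """C_j = E_k(P_j) for every block, with no IV and no chaining."""
    data = value.encode("utf-8")
    data += b"\0" * (-len(data) % BLOCK)  # zero padding: a simplification
    return b"".join(toy_encrypt_block(key, data[j:j + BLOCK])
                    for j in range(0, len(data), BLOCK))


def det_decrypt(key: bytes, ciphertext: bytes) -> str:
    """P_j = D_k(C_j) for every block, then strip the zero padding."""
    data = b"".join(toy_decrypt_block(key, ciphertext[j:j + BLOCK])
                    for j in range(0, len(ciphertext), BLOCK))
    return data.rstrip(b"\0").decode("utf-8")


def group_count(encrypted_column):
    """Server side: group and count ciphertexts without any key."""
    return Counter(encrypted_column)


def equality_match(encrypted_column, token: bytes):
    """Server side: positions whose ciphertext equals the query token."""
    return [i for i, c in enumerate(encrypted_column) if c == token]
```

The same property that makes `group_count` work is the leakage noted above: anyone who sees the column learns which rows hold equal values.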
Order-Preserving Encryption (OPE). OPE carries the order relation between plaintext data elements over to their ciphertext values. Because OPE leaks the order of the plaintexts, it provides a lower degree of security. Even Modular Order-Preserving Encryption (MOPE) [38], an extension of basic OPE that improves security, leaks information. OPE allows efficient inequality comparisons on encrypted data elements and therefore supports range queries, comparison, Min() and Max() on the ciphertext. We use the algorithm introduced in [6] and implemented in [4] for the cloud environment. Equation 1.3 shows that the order relation of the plaintext is preserved in the ciphertext.
\[
\forall x, y \in \text{Data Domain}: \quad x < y \implies OPE_k(x) < OPE_k(y) \tag{1.3}
\]
⁷ Equality matches over specific common fields in an embedded document select the documents in the collection where the embedded document contains the specified fields with the specified values.
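The property in Equation 1.3 can be demonstrated with a toy order-preserving map. This is not the algorithm of [6]; it is a simplified construction, assumed here for illustration, that sums key-dependent positive gaps over a small integer domain so that the result is strictly increasing. The function names are hypothetical.

```python
import hashlib


def _gap(key: bytes, i: int) -> int:
    """Key-dependent, strictly positive gap for domain element i."""
    digest = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
    return 1 + int.from_bytes(digest[:4], "big")


def toy_ope_table(key: bytes, domain_size: int) -> list:
    """table[x] plays the role of OPE_k(x) for x in 0..domain_size-1.

    Each entry adds a positive gap to the previous one, so
    x < y implies table[x] < table[y].
    """
    table, total = [], 0
    for i in range(domain_size):
        total += _gap(key, i)
        table.append(total)
    return table


def range_query(encrypted_column, low: int, high: int) -> list:
    """Server side: positions with low <= ciphertext <= high."""
    return [i for i, c in enumerate(encrypted_column) if low <= c <= high]
```

A client that wants the rows with plaintext values between 30 and 80 sends `table[30]` and `table[80]`; the server compares ciphertexts only, and in doing so learns the relative order of all stored values.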
Additive Homomorphic Encryption (AHOM). AHOM is a scheme that allows the server to perform computations on ciphertext, with the final result decrypted at the proxy. In spite of sustained research efforts [24, 8] on Fully Homomorphic Encryption (FHE), there is no efficient FHE scheme, except for limited operations. We applied the Paillier [41] scheme, which supports additive operations as shown by Equation 1.4, where m1, m2 ∈ Z_n are the messages to be encrypted and r1, r2 ∈ Z*_n are randomly selected.
\[
D_k\big(E_k(m_1, r_1) \cdot E_k(m_2, r_2) \bmod n^2\big) = m_1 + m_2 \bmod n \tag{1.4}
\]
In other words, the product of two ciphertexts decrypts to the sum of their corresponding plaintexts.
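The additive property of the Paillier scheme can be checked with a small sketch. It uses the common simplification g = n + 1 and tiny primes, both assumptions made for readability; real deployments use primes of at least 1024 bits. The function names are hypothetical.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int, r: int = None) -> int:
    """c = g^m * r^n mod n^2, with g = n + 1 and r in Z*_n."""
    n2 = n * n
    if r is None:
        while True:
            r = secrets.randbelow(n - 1) + 1
            if math.gcd(r, n) == 1:
                break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Server side: multiply ciphertexts to add the hidden plaintexts."""
    return (c1 * c2) % (n * n)
```

With `n, sk = keygen(61, 53)`, the server can combine `encrypt(n, 12)` and `encrypt(n, 30)` through `add_encrypted`, and the proxy recovers 42 on decryption without the server ever seeing either operand.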
Information leakage is the ability of an attacker to infer sensitive information either through multiple database searches or through statistical analysis of cloud database queries. In a nutshell, information leakage can be defined as using a combination of data, meta-data and queries that are classified at a lower level L1 to extract information that is classified at a higher level L2.
In this research, we restrict our discussion to secure query processing, particularly over encrypted NoSQL databases, with minimum information leakage. The key part of SecureNoSQL is the evaluation of a set of operations on encrypted databases. Moreover, novel algorithms designed to prevent information leakage from data or queries are added to SecureNoSQL. We also introduce a novel descriptive language based on the JSON⁸ notation which enables users to generate a security plan. The security plan is a useful tool that lets data owners manage security parameters without getting involved in the details. A security plan has four sections: the collection, the data elements, the cryptographic modules, and the mapping between them. Concurrent queries are supported by the present design; however, the relevant concurrency experiments require a network of multiple servers and clients. Such a configuration and hardware setup was not available at this time. Thus, for some experiments in this research we have used EC2 instances, which is consistent with the final goal of this study. Since a standard Database Management System (DBMS) is used in this work, concurrent queries over encrypted distributed data sets are supported automatically without extra cost.
1.4 Roadmap
The approaches and solutions outlined above are discussed in the rest of this proposal, which is organized as follows. The latest related work on secure query processing and information leakage prevention is reviewed in Chapter 2. Chapter 3 presents the research objectives, motivation, threat model, JSON and BSON, and finally the problem statement.
All the experiments with prototype systems are presented in Chapter 4. We propose two schemes, for secure query processing over encrypted data sets and for information leakage management. The organization and structure of the security plan and the notation of the descriptive language for generating it are discussed in Section 4.1. Afterwards, the mechanism for information leakage prevention is discussed in Section 4.2. Finally, this proposal is concluded in Chapter 5 with the timetable of in-progress and completed tasks as well as the published and under-review papers.
⁸ JSON (JavaScript Object Notation) is a lightweight text-based syntax for storing and exchanging data objects consisting of key-value pairs. It is used primarily to transmit data between a server and a web application. JSON's popularity is due to the fact that it is self-describing and easy for both humans and machines to understand.
CHAPTER 2: RELATED WORK
High scalability and distribution are the most important requirements for processing the large volumes of data created by humans or connected devices. DBaaS is extensively used for data processing and meets both requirements. Furthermore, DBaaS enables users to use a database without running their own server. In a DBaaS setup, the CSP takes responsibility for maintaining the hardware and the software. The cost of the service is proportional to the usage of resources. Although the easy launch of a database through a web-based console is an alluring option, DBaaS brings a series of new security risks which need to be addressed. Some of the studies on DBaaS focus on information leakage caused by sharing physical infrastructure among multiple virtual machines. The study conducted by Ristenpart et al. [43] showed that the Infrastructure as a Service (IaaS) model is susceptible to information leakage despite the isolation of virtual machines. A method called "Advanced Cloud Protection System (ACPS)" for secure virtualization in the cloud environment, proposed by Lombardi et al. [36], mitigates security risks from external attackers assuming the cloud is trustworthy.
The performance and efficiency of DBaaS have been extensively studied in the literature [27, 18, 17]. Techniques to improve workload balancing between clients and server, and a graph-based partitioning algorithm for improving performance and obtaining almost linear elastic scale-out, are introduced in [18]. Furthermore, a new benchmark framework compares the DBaaS performance offered by various CSPs [17].
The first SQL-aware query processing over encrypted databases was CryptDB [42]. CryptDB provides data confidentiality for relational databases. However, CryptDB cannot perform queries over data encrypted with different keys. One important application of searching on encrypted data [11, 45, 48] is in cloud computing, where clients outsource their storage and computation. In [11] a practical searchable security scheme is introduced which can search encrypted data sets in sub-linear time by using different types of indices; however, it is not practical for NoSQL data sets, which are designed to scale to millions of users doing updates simultaneously [13].
NoSQL databases suffer from a lack of proper data protection mechanisms because they were designed to meet high performance and scalability requirements. To protect personal and sensitive information, big data platforms require a privacy- and security-preserving mechanism. The integration of privacy-aware access control features into existing big data systems is discussed in [30]. The evolution of big data systems from the perspective of an information security application is studied in [23, 47]. A cloud-based monitoring and threat detection system for critical components is proposed in [16] to make infrastructure systems secure. Security in DBaaS has been studied by several research projects [42, 29, 48, 31]. In all of these works, cryptosystems are applied to encrypt databases before they are outsourced to the CSP; in the same way, queries are encrypted and processed on the server. This is a practical general approach for protecting sensitive data in an off-site data store. For example, in [42] CryptDB is introduced for processing queries over encrypted relational databases. Similarly, SecureNoSQL is proposed for processing queries over encrypted NoSQL databases on a cloud platform. The system supports access to an encrypted MongoDB¹ document-store database. SecureNoSQL is a secure proxy that allows applications to access and process queries on encrypted data sets. The proxy receives queries from clients, extracts the elements of each query, applies the security parameters to them, and, finally, forwards them to the cloud database server. After an encrypted query is processed by the database, the proxy receives the results, decrypts them and forwards them to the client. SecureNoSQL is an open infrastructure that is easily extended with new encryption modules. To implement the leakage prevention algorithms, SecureNoSQL has been further developed for the study discussed in this research; the leakage prevention mechanism is also implemented inside SecureNoSQL. We have implemented a number of cryptosystems for different types of queries, and we now describe the characteristics of these cryptosystems and their applications.
¹ MongoDB is a document-oriented NoSQL database which eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schema.
Information leakage in a single untrusted server is studied in [49], and the statistical measurement of information leakage is investigated in [15]. The weakness of k-anonymity, a solution for protection against identity disclosure, is remedied by t-closeness, introduced in [33], which requires the distribution of sensitive attributes in each equivalence class to be close to the global distribution of the attributes.
To protect sensitive data from an untrusted CSP, existing crypto-primitives that require the decryption key for processing are not applicable; consequently, the search for cryptosystems that allow processing over ciphertext has become appealing. Most research has focused on Homomorphic Encryption, which allows computations to be carried out over encrypted data [24]. Another cryptosystem, which relaxes the security notion, is Order-Preserving Encryption (OPE), introduced in [6] and implemented in [4] for the cloud platform. An untrusted CSP can still extract information from encrypted data. Most research works in the literature assume that applying cryptographic techniques provides adequate protection on an untrusted cloud platform, but this assumption is not entirely true. Information leakage from encrypted data in the cloud is a plausible risk, and very few works address it. The research reported in this thesis aims at leakage-free query processing over very large scale encrypted data sets. The ultimate goal is minimizing the information leakage with efficient solutions; therefore, a
Figure 4.3: Structure and description of Collection: (a) The chart outlines the structure of a collection, containing the name of the collection and the names of all fields, which are considered meta-data and thus should be protected with a proper cryptographic module, as well as the pointer to a cryptomodule, the encryption key, and the initialization vector used for the encryption of the items. (b) The description of a collection and its security parameters in the designed JSON-based language. In this specific case the Advanced Encryption Standard in deterministic (AES-DET) mode with a 128-bit key and an initialization vector (IV) is assigned to encrypt the name of the collection and the field names.
Typically, all documents in a collection are related with one another.
Cryptographic modules. There are various encryption algorithms for different applications, each with its own strengths and weaknesses. The choice of a particular cryptosystem depends on the security policy of the application. Criteria for algorithm selection include the security against theoretical attacks, the cost of implementation, and performance issues such as whether encryption and decryption can be parallelized in a CPU pool like that of cloud computing. Other factors that may be involved in the selection of an algorithm are the memory requirements and the integration into the overall system design. According to the proposed format, the cryptographic modules section introduces all encryption modules and their parameters such as key, key size, initialization vector and output size. The structure of this section is depicted in Figure 4.4a, and the listing in Figure 4.4b displays the second section of the security plan for the previous example.
Figure 4.4: Structure and description of Cryptographic modules: (a) Security plan with the second section, the cryptographic modules, expanded. The attributes included for each module are: name, type, key size, key, input and output size. (b) The OPE encryption including the cryptosystems and their attributes. The proxy applies these modules using the key-value pairs (KVP).
Our proof of concept uses the parametric Order-Preserving Encryption (OPE) and the Advanced Encryption Standard (AES) modules. The system is open-ended: users can add the cryptosystems best suited to the security requirements of their application. In our design the definitions of the cryptographic modules and of the pairs of encryption key and initialization value are separated, following the so-called key separation principle [22]. This security practice is based on the observation that users have long- and short-term security policies. The cryptographic modules are less likely to change, while the key and the initialization value change frequently.
The data elements. The third section of the security plan covers the data elements and their properties. Figure 4.5 presents the structure and description of the data element section of the security plan. The listing in Figure 4.5b displays the data elements and their JSON description for the previous example. To ensure the desired level of security, the third section of the security plan should describe all sensitive data elements of the database.
Figure 4.5: Structure and description of Data element: (a) The chart outlines the structure of the data elements section, containing attributes of the data elements such as name, type and value, followed by the security parameters for each data element. (b) The data element section of a sample database, represented in the designed notation. A data item has 7 fields: id, name, salary, balance, ccn, ssn, and email. The id, name, email and salary are required fields.
Mapping cryptographic modules to the fields. The last section of the security plan specifies the cryptographic modules for all sensitive data fields. Figure 4.6 and the listing presented in Figure 4.6b show the mapping of the cryptographic modules and the corresponding JSON format for a sample database.
Figure 4.6: Structure and description of Mapping cryptographic modules to the Data element: (a) Security plan with the fourth section expanded. This section establishes a correspondence between the data fields and the cryptographic modules used to encrypt and decrypt them. (b) The mapping section of the schema for a sample database with 7 fields. For example, the id and the name will be encrypted with OPE 128 bit and AES-DET, respectively.
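The four sections of a security plan described above can be pictured with the following sketch. The key names, module identifiers and the consistency check are all hypothetical; they only mirror the structure described in the text (collection, cryptographic modules, data elements, mapping) and are not the actual notation of the descriptive language. Keys and initialization values are left out, in line with the key separation principle.

```python
# Hypothetical security plan; every key name below is an assumption.
SECURITY_PLAN = {
    "collection": {"name": "customers", "cryptomodule": "AES-DET-128"},
    "cryptographic_modules": {
        "AES-DET-128": {"type": "DET", "key_size": 128},
        "OPE-128": {"type": "OPE", "key_size": 128, "output_size": 128},
    },
    "data_elements": {
        "id": {"type": "integer", "required": True},
        "name": {"type": "string", "required": True},
        "salary": {"type": "integer", "required": True},
        "balance": {"type": "integer", "required": False},
        "ccn": {"type": "string", "required": False},
        "ssn": {"type": "string", "required": False},
        "email": {"type": "string", "required": True},
    },
    "mapping": {"id": "OPE-128", "name": "AES-DET-128"},
}


def check_plan(plan: dict) -> list:
    """Return a list of inconsistencies between the plan's sections."""
    problems = []
    modules = plan["cryptographic_modules"]
    if plan["collection"]["cryptomodule"] not in modules:
        problems.append("collection refers to an undefined module")
    for field, module in plan["mapping"].items():
        if field not in plan["data_elements"]:
            problems.append(f"mapping refers to undefined field {field!r}")
        if module not in modules:
            problems.append(f"field {field!r} refers to undefined module {module!r}")
    return problems
```

A check of this kind would let the proxy reject a plan whose mapping section names a field or module that the other sections never define, before any data is encrypted.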
As outlined in Section 1, the method presented in this work can be easily extended to the other
NoSQL data models discussed in Section 2. Figure 4.7 shows how this extension from the KV to
the document store model can be carried out.
[Figure 4.7, panel (a): Key1 ... Keyn under cryptographic module z; Value1 ... Valuen under cryptographic modules 1 ... n. Panel (b): Collection name under cryptographic module x; Document ID under cryptographic module y; Key1 ... Keyn under cryptographic module z; Value1 ... Valuen under cryptographic modules 1 ... n.]
Figure 4.7: SecureNoSQL applied to: (a) The key-value data model; Key1, ..., Keyn are all encrypted using the cryptographic module z while the corresponding values, Value1, ..., Valuen, are encrypted with cryptographic modules 1, 2, ..., n, respectively. (b) The document store data model; the meta-data, such as the collection name, is encrypted as well as the attributes, with the assigned cryptographic modules.
Query and data validation. The proxy validates the data and query, as JSON-formatted input, against the reference security plan. Afterwards, it enforces the assigned crypto-primitives and generates a new query that respects the NoSQL query semantics; in this process it applies to each field the cryptographic modules described in the mapping section of the schema. Finally, the proxy forwards the new encrypted query or data to the NoSQL database server. Figure 4.8 depicts the schema validation process.
For better illustration, consider the listing in Figure 4.9a as input data; after running the validation process, the output shown in Figure 4.9b is generated. The output of the validation process is a single file which contains descriptive information for the data and meta-data in the designed format and is ready to execute on SecureNoSQL. As noted earlier in Section 3.5, the overhead of the proposed scheme is proportional to the desired security level, which is explicitly expressed in the security plan for any database. Table 4.1 contrasts the data overhead for different parameters of the crypto-primitives.
[Figure 4.8, flow chart: JSON Data/Query and Security plan -> Validation of data elements (format matching) -> Extraction of encryption parameters -> Applying cryptomodules to the data and metadata -> Forward encrypted Data/Query to cloud NoSQL server -> NoSQL server]
Figure 4.8: The validation process of input data against the security plan on the client side.
Table 4.1: Overhead of encryption as a function of security level
Database    Plain   OPE64   OPE128   OPE256   OPE512
Size (MB)   170     430     508      662      1000
4.1.3 Processing Queries On Encrypted Data
According to the proposed scheme, in order to process queries over encrypted data, the queries must be transformed into their encrypted version with respect to the security plan; this task is carried out by our secure proxy. The security plan provides the cryptographic modules assigned to the different fields of the query. Figure 4.10 displays the processing and rewriting of a sample query.
Figure 4.9: Security plan designed for sample input: (a) Data element section of sample security plan. (b) Output of JSON data validation for sample database.
To illustrate query encryption, Table 4.2 lists some sample queries as encrypted after enforcing the security plan. As can be seen, data elements and immediate values are encrypted, yet the output remains consistent with NoSQL semantics.
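The rewriting step can be sketched as a walk over the query that replaces field names and immediate values while leaving the operators untouched. Everything below is a simplified assumption: `det_token` is a keyed hash standing in for AES-DET (it is not decryptable), `ope_stub` is an arbitrary increasing function standing in for OPE, only top-level field conditions are handled, and assigning OPE to salary and balance is a guess made for the example.

```python
import hashlib
import hmac


def det_token(key: bytes, text: str) -> str:
    """Stand-in for AES-DET: a deterministic keyed token for field names."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()[:24]


def ope_stub(value: int) -> int:
    """Stand-in for OPE: any strictly increasing map preserves comparisons."""
    return 7919 * value + 104729


def rewrite_query(query: dict, mapping: dict, key: bytes) -> dict:
    """Encrypt field names and immediate values; keep operators as-is.

    `mapping` assigns to each field the callable that encrypts its values.
    """
    rewritten = {}
    for field, condition in query.items():
        encrypt_value = mapping[field]
        if isinstance(condition, dict):  # e.g. {"$gt": 5000}
            new_condition = {op: encrypt_value(v) for op, v in condition.items()}
        else:  # plain equality match on a value
            new_condition = encrypt_value(condition)
        rewritten[det_token(key, field)] = new_condition
    return rewritten
```

Applied to `{"salary": {"$gt": 5000}, "balance": {"$lt": 2000}}`, the result keeps `$gt` and `$lt` in place, so the server can evaluate it as an ordinary query over the encrypted collection.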
[Figure 4.10, panel (a): and( ≥(salary, 5000), ≤(balance, 2000) ). Panel (b): and( ≥(9mnGu8Q2VDstE+T9jFw2wQ==, 3986410786398723978941641627711702), ≤(5pgAxn6BF08WtM7zyuYaKg==, 161374267674800082431533686937402) ).]
Figure 4.10: The query db.customers.find({salary:{$gt:5000}, balance:{$lt:2000}}) received from an application. (a) The parsing tree of the query. (b) The cryptographic modules applied to the data elements according to the schema definition.
Table 4.2: Sample queries and their corresponding encrypted version
Query    Encrypted query
The experiments to measure the query time must be carefully designed. To obtain an average query processing time, each experiment has to be carried out repeatedly. We noticed a significant reduction in the database management system's response time after the first execution of a query, a sign that MongoDB is optimized and caches the results of the most recent queries. A solution is to disable the cache or, if this is not feasible, to clear the cache before repeating the query. Another important observation is that modern processors have a 64-bit architecture and are optimized for operations on 64-bit integers. This explains why for three of the five types of queries, Q2 (range), Q3 (equality), and Q4 (logical), the database response time is slightly shorter for the encrypted database than for the unencrypted one when the keys are 32-bit integers.
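A measurement loop along the lines described above might look like the following sketch. The `run_query` and `reset` callables are hypothetical placeholders: the first would issue one query through the proxy, and the second would clear whatever cache the deployment exposes before each repetition.

```python
import statistics
import time


def time_query(run_query, repetitions=10, reset=None):
    """Run a query repeatedly and summarize the wall-clock times.

    `reset`, if given, is called before every repetition, for example
    to clear a cache so that repeated runs stay comparable.
    """
    samples = []
    for _ in range(repetitions):
        if reset is not None:
            reset()
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return {
        "first": samples[0],
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }
```

Reporting the first sample separately from the mean makes a caching effect visible when no reset hook is available.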
[Figure 4.11, bar chart: x-axis: query type (Comparison, Range, Equality, Logical, Aggregation); y-axis: query processing time (microseconds), ticks from 300 to 700; series: 32 bit, 64 bit, 128 bit, 256 bit, 512 bit.]
Figure 4.11: Query processing time in milliseconds (ms) for the unencrypted database and for the encrypted databases when the 32-bit keys are encrypted as 64, 128, 256 and 512-bit integers.
Our measurements show that the response time of the NoSQL database management system to
encrypted data depends on the type of the query. The shortest and longest database response time
occur for Q1 (comparison) and Q5 (aggregated queries), respectively; for these two extremes the
time for the unencrypted database almost doubles, but the time for encrypted databases increases
only by 70− 80%. As expected, the query processing type for a given type of query increases, but
only slightly, less than 5% when the key length increases from 64, to 128, 256, and 512 bit. As
expected, the OPE encryption time increases significantly with the size of the encryption space; it
increases almost tenfold when the size of the encrypted output increases from 64 bits to 1024 bits,
and it is about 10 ms for 256 bits. The decryption time is considerably smaller; it increases only
slightly, from 0.11 ms to 0.17 ms, when the size of the encrypted key increases from 64 bits to 1024 bits.
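To illustrate what mapping keys into a larger encrypted space means, the following toy sketch builds a keyed, strictly increasing map from a small plaintext domain into a much larger ciphertext range. It is not the OPE scheme of [6] and is not secure; the function name toy_ope_encrypt and the parameters in_bits and out_bits are assumptions made for this example, and the running time is linear in the plaintext value, which is why the plaintext domain defaults to 16 bits rather than the 32 bits used in the experiments.

```python
import hashlib
import hmac


def toy_ope_encrypt(key: bytes, x: int, in_bits: int = 16, out_bits: int = 64) -> int:
    """Keyed, strictly increasing map from [0, 2**in_bits) into [0, 2**out_bits).

    The ciphertext of x is the sum of x + 1 pseudo-random positive gaps,
    so a larger plaintext always yields a larger ciphertext.
    """
    if out_bits <= in_bits:
        raise ValueError("ciphertext space must be larger than plaintext space")
    if not 0 <= x < 2 ** in_bits:
        raise ValueError("plaintext out of range")
    max_gap = 2 ** (out_bits - in_bits)   # each gap lies in [1, max_gap]
    total = -1                            # shift so the smallest output can be 0
    for i in range(x + 1):
        digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        total += 1 + int.from_bytes(digest, "big") % max_gap
    return total


key = b"demo-key"
plain = [3, 17, 42, 1000]
cipher = [toy_ope_encrypt(key, p) for p in plain]
print(cipher == sorted(cipher))  # order of the plaintexts is preserved
```

Increasing out_bits enlarges the gaps between consecutive ciphertexts, which is one way to see why the size of the encrypted output is a tunable parameter that trades cost against the size of the encryption space.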
The secure proxy is an important element of the proposed architecture; therefore, the potential attacks
on the proxy should also be taken into consideration. In general, the two major possible
attacks on the proxy are Denial of Service (DoS) and unauthorized access. In a DoS attack, the attacker
sends so much network traffic to the proxy that the system is not able to process it within the
expected time frame. A successful DoS attack can turn the proxy into a bottleneck of the system.
In an unauthorized access attack, attackers use the proxy to mask their connections while attacking
other targets. To harden the proxy against DoS attacks and to reduce their
impact, there are several solutions, including blocking undesired packets and using
multiple proxies with load balancers. To prevent unauthorized access attacks,
access to the proxy must be subject to appropriate authorization. User authentication based on group
membership, with different authorizations per group, is a practical solution.
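Authorization based on group membership can be sketched as a lookup from users to groups and from groups to permitted operations. The group names, user names and operations below are hypothetical and serve only to illustrate the idea; a deployed proxy would obtain them from its own user directory and policy store.

```python
from typing import Dict, Set

# Hypothetical groups and the proxy operations each group may perform.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "analysts": {"find"},
    "owners": {"find", "insert", "update", "delete"},
}

# Hypothetical assignment of authenticated users to groups.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"owners"},
    "bob": {"analysts"},
}


def is_authorized(user: str, operation: str) -> bool:
    """Allow an operation only if one of the user's groups permits it."""
    groups = USER_GROUPS.get(user, set())   # unknown users belong to no group
    return any(operation in GROUP_PERMISSIONS.get(g, set()) for g in groups)


print(is_authorized("alice", "insert"))  # owner: permitted
print(is_authorized("bob", "insert"))    # analyst: read-only
```

Requests from users who are not in any group are rejected by default, so the proxy cannot be used anonymously to relay traffic to other targets.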
4.2 Leakage Prevention In DBaaS
Encryption is a common practice for protecting the privacy of data and queries, but encrypted data and
queries are still vulnerable to information leakage in a cloud platform. A database can be encrypted
by the data owner before being outsourced to the cloud in such a way that client queries can still be
processed on the transformed data. However, the encryption does not hide all information about
the encrypted data. For instance, the collection name (or table name in an RDBMS), the field names, and the number
and length of the fields involved in a query often reveal sensitive information about the encrypted
data. Moreover, a cloud insider can infer sensitive information from a sequence of queries. This type
of attack on an encrypted database belongs to the information leakage class. An outsourced encrypted
data set should leak as little sensitive information as possible. An acceptable level of security
for searchable encryption can be achieved with the proposed scheme. To study information
leakage in the DBaaS model, we choose the NoSQL database model with a flexible schema. In the NoSQL data
model, a database is represented as a collection of documents C = {d1, d2, . . . , dn} and
each document is modeled as a set of key-value pairs {ki, vi}, each of which represents
an attribute of an object.
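The data model and the leakage sources listed above can be made concrete with a short sketch. A collection is a list of documents and each document is a dictionary of key-value pairs; the function summarizes what remains observable when only the values are encrypted. The name visible_metadata and the sample documents are hypothetical, and the hexadecimal strings merely stand in for ciphertexts.

```python
from typing import Any, Dict, List


def visible_metadata(collection_name: str,
                     docs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize what an observer still sees when only values are encrypted."""
    return {
        "collection": collection_name,
        "document_count": len(docs),
        "field_names": sorted({k for d in docs for k in d}),
        "fields_per_document": [len(d) for d in docs],
        "value_lengths": [{k: len(str(v)) for k, v in d.items()} for d in docs],
    }


# Collection C = {d1, d2}; values are placeholders for ciphertexts.
customers = [
    {"name": "a1b2", "salary": "ffee0011"},
    {"name": "c3d4e5", "ssn": "0a"},
]
print(visible_metadata("customers", customers))
```

Even with every value encrypted, the output exposes the collection name, the field names, the number of fields per document and the ciphertext lengths, which are exactly the side channels the text identifies.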
4.2.1 Problem Statement
We assume that the data is fully or partially encrypted before being outsourced to the CSP. However,
fully or partially encrypted databases in the cloud are at risk of information leakage in the
presence of a malicious cloud insider who could pool all databases and extract sensitive
information from correlations between the various hosted databases. This work characterizes the most
common sources of information leakage from encrypted NoSQL databases. We propose and analyze
a secure query processing system with minimum information leakage in an untrusted cloud.
A metric to quantify the information leakage is also introduced. This work is currently in
progress and the experimental results will be presented in the dissertation.
CHAPTER 5: CONCLUSION
We presented a novel secure searchable scheme over encrypted NoSQL databases which protects
sensitive information in the presence of two important threats confronting database-backed
applications. The proposed scheme meets all design objectives with respect to three principles:
i. Running queries efficiently over encrypted data using a novel JSON-aware encryption
strategy. The evaluation on a large trace of queries from a variety of databases running on the cloud
DBaaS shows that SecureNoSQL can support search operations over encrypted NoSQL data. The
throughput penalty of SecureNoSQL is modest, resulting in a reduction of 1425% in the performance
of query processing as compared to the plain database.
ii. The application security plan, a novel notion introduced in
this work, which automates the configuration of security parameters to enforce the security policy on the database
and the relevant queries. Intuitively, the lifetime of an encryption key is shorter than that of an encryption
algorithm, and we expect key changes to happen more frequently than changes of the cryptosystem itself.
By using the designed descriptive language, the data owner manages the security parameters of the
secure proxy with minimum effort.
iii. Our security analysis shows that SecureNoSQL protects
the most sensitive attributes of a collection with highly secure encryption schemes for a variety of
applications. Furthermore, the server application is kept unmodified and the user is never involved in the
complexity of the security measures.
The secure proxy is a critical component of the system; it is multi-threaded and its cache
management is non-trivial. The management of the security attributes is rather involved. On the other
hand, a proxy integrated in the client-side software can be lightweight and considerably
simpler. We are currently implementing the two versions of the proxy. Experimental results for multiple
large datasets with up to one million documents show that SecureNoSQL is rather efficient. Our
approach can be extended to a multi-proxy structure for big data applications. We are now
implementing a mechanism based on Paxos [32, 37] for maintaining the consistency of the database of
hash values held by the proxies. Outsourcing encrypted data sets to a third party
such as a cloud provider offers a good level of security; however, encrypted queries and data are still
vulnerable to information leakage in a cloud platform. Encryption does not hide all information
about the encrypted data, and this is an area for future research and investigation. We
introduced novel techniques to protect encrypted data sets and to prevent a malicious insider from discovering
implicit information, especially through cross-referencing attacks. The proposed method introduces a data
overhead which is proportional to the desired security level.
5.1 Work In Progress And Tasks Time Table
My research work in progress is on leakage prevention for both plaintext and ciphertext databases
hosted by DBaaS. We propose a solution to this problem that uses data encryption as the primary
approach to protecting sensitive information from intruders and malicious insiders. In the rest
of this research, we will implement the proposed algorithm on a real-world cloud service with NoSQL
databases hosted by DBaaS. The tasks completed so far are shown as blue bars and the tasks
in progress as red bars, all in Figure 5.1. The titles of all papers
resulting from this research work are listed in Table 5.1.
Table 5.1: List of publications
Paper No | Title | Authorship | Journal or Conference | Status
Paper 1 | Security of Applications Involving Multiple Organizations-OPE in Hybrid Cloud Environments [4] | M. Ahmadian, A. Paya, D. Marinescu | IEEE 28th International Parallel & Distributed Processing | Published (2014)
Paper 2 | A security scheme for geographic information databases in location based systems [3] | M. Ahmadian, J. Kho., D. Marinescu | IEEE SoutheastCon | Published (2015)
Paper 3 | SecureNoSQL: An approach to secure search on encrypted NoSQL databases in public cloud [5] | M. Ahmadian, F. Plochan, Z. Roessler, D. Marinescu | International Journal of Information Management (IJIM) | Published (2017)
Paper 4 | An Analysis of Information Leakage due to Insider and some Outsider Attackers in Computer Clouds | M. Ahmadian, D. Marinescu | Journal of Information Security and Applications | Under review
Paper 5 | Secure Query Processing in Cloud NoSQL [2] | M. Ahmadian | IEEE International Conference on Consumer Electronics | Published (2017)
Paper 6 | On information leakage in cloud database services | M. Ahmadian, D. Marinescu | Transaction of sustainable computation | Under review
5.2 Future Work
The current research will be continued in the following directions:
• Supporting multiple proxies in order to deal with a very large number of clients,
• Developing an efficient, fully homomorphic encryption for unlimited operations over the
encrypted data,
• Developing an encryption key management mechanism that periodically assigns new keys to the
cryptosystems in order to obtain higher levels of security.
LIST OF REFERENCES
[1] Amazon web services growth unrelenting. (last accessed 3rd May, 2016).
[2] M. Ahmadian. SECURE QUERY PROCESSING in CLOUD NoSQL. In 2017 IEEE international conference on consumer electronics (ICCE) (2017 ICCE), Las Vegas, USA, Jan. 2017.
[3] M. Ahmadian, J. Khodabandehloo, and D. Marinescu. A security scheme for geographic information databases in location based systems. IEEE SoutheastCon, pages 1–7, April 2015.
[4] M. Ahmadian, A. Paya, and D. Marinescu. Security of applications involving multiple organizations and order preserving encryption in hybrid cloud environments. IEEE International conf. on Parallel Distributed Processing Symposium Workshops (IPDPSW), pages 894–903, May 2014.
[5] M. Ahmadian, F. Plochan, Z. Roessler, and D. C. Marinescu. SecureNoSQL: An approach for secure search of encrypted nosql databases in the public cloud. International Journal of Information Management, 37(2):63–74, 2017.
[6] A. Boldyreva, N. Chenette, Y. Lee, and A. O’Neill. Order-preserving symmetric encryption. In Advances in Cryptology-EUROCRYPT 2009, pages 224–241. Springer, 2009.
[7] Z. Brakerski and V. Vaikuntanathan. Fully homomorphic encryption from ring-lwe and security for key dependent messages. Advances in Cryptology–CRYPTO, pages 505–524, 2011.
[8] Z. Brakerski and V. Vaikuntanathan. Efficient fully homomorphic encryption from (standard) lwe. SIAM Journal on Computing, 43(2):831–871, 2014.
[9] T. Bray. The javascript object notation (json) data interchange format. 2014.
[10] D. Cash, D. Hofheinz, E. Kiltz, and C. Peikert. Bonsai trees, or how to delegate a lattice basis. Journal of cryptology, 25(4):601–639, 2012.
[11] D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu, and M. Steiner. Dynamic searchable encryption in very-large databases: Data structures and implementation. Network and Distributed System Security Symposium (NDSS14), 2014.
[12] D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu, and M. Steiner. Highly-scalable searchable symmetric encryption with support for boolean queries. Advances in Cryptology–CRYPTO 2013, pages 353–373, 2013.
[13] R. Cattell. Scalable sql and nosql data stores. ACM SIGMOD Record, 39(4):12–27, 2011.
[14] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.
[15] K. Chatzikokolakis, T. Chothia, and A. Guha. Statistical measurement of information leakage. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 390–404. Springer, 2010.
[16] R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina. Controlling data in the cloud: outsourcing computation without outsourcing control. Proc. of the ACM workshop on Cloud computing security, pages 85–90, 2009.
[17] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154. ACM, 2010.
[18] C. Curino, E. P. Jones, R. A. Popa, N. Malviya, E. Wu, S. Madden, H. Balakrishnan, and N. Zeldovich. Relational cloud: A database-as-a-service for the cloud. 2011.
[19] J. Daemen and V. Rijmen. Aes proposal: Rijndael. 1999.
[20] L. Ducas and D. Micciancio. Fhew: Bootstrapping homomorphic encryption in less than a second. Advances in Cryptology–EUROCRYPT 2015, pages 617–640, 2015.
[21] S. Faber, S. Jarecki, H. Krawczyk, Q. Nguyen, M. Rosu, and M. Steiner. Rich queries on encrypted data: Beyond exact matches. In European Symposium on Research in Computer Security, pages 123–145. Springer, 2015.
[22] F. Galiegue and K. Zyp. Json schema: core definitions and terminology. Internet Engineering Task Force (IETF), 2013.
[23] J. Gantz and D. Reinsel. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east. IDC iView: IDC Analyze the Future, 2007:1–16, 2012.
[24] C. Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, 2009.
[25] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious rams. Journal of the ACM (JACM), 43(3):431–473, 1996.
[26] S. Gorbunov, V. Vaikuntanathan, and H. Wee. Attribute-based encryption for circuits. Proc. of the Forty-fifth Annual ACM Symposium on Theory of Computing, pages 545–554, 2013.
[27] H. Hacigumus, B. Iyer, and S. Mehrotra. Providing database as a service. In Data Engineering, 2002. Proceedings. 18th International Conference on, pages 29–38. IEEE, 2002.
[28] S. Halevi and V. Shoup. Algorithms in helib. CRYPTO–Advances in Cryptology, pages 554–571, 2014.
[29] H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 601–612. IEEE, 2011.
[30] M. Islam and M. Islam. An approach to provide security to unstructured big data. 8th International Conf. on Software, Knowledge, Information Management and Applications (SKIMA), pages 1–5, Dec 2014.
[31] M. Kuzu, M. S. Islam, and M. Kantarcioglu. Distributed search over encrypted big data. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, CODASPY ’15, pages 271–278, New York, NY, USA, 2015. ACM.
[32] L. Lamport. Paxos made simple. ACM Sigact News, 32(4):18–25, 2001.
[33] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In IEEE 23rd International Conference on Data Engineering, pages 106–115, 2007.
[34] C. Liu, L. Zhu, M. Wang, and Y.-a. Tan. Search pattern leakage in searchable encryption: Attacks and new construction. Information Sciences, 265:176–188, 2014.
[35] H. Liu. Amazon data center size. published March, 13, 2012.
[36] F. Lombardi and R. D. Pietro. Secure virtualization for cloud computing. Journal of Network and Computer Applications, 34:1113–1122, 2011. Advanced Topics in Cloud Computing.
[37] D. C. Marinescu. Cloud computing: theory and practice. Newnes, 2013.
[38] C. Mavroforakis, N. Chenette, A. O’Neill, G. Kollios, and R. Canetti. Modular order-preserving encryption, revisited. Proc. of the 2015 ACM SIGMOD International Conf. on Management of Data, pages 763–777, 2015.
[39] D. Micciancio. Lattice-based cryptography. Encyclopedia of Cryptography and Security, pages 713–715, 2011.
[40] R. Ostrovsky. Efficient computation on oblivious rams. In Proceedings of the twenty-second annual ACM symposium on Theory of computing, pages 514–523. ACM, 1990.
[41] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Advances in cryptology–EUROCRYPT ’99, pages 223–238. Springer, 1999.
[42] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan. Cryptdb: Protecting confidentiality with encrypted query processing. Proc. of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 85–100, 2011.
[43] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM conference on Computer and communications security, pages 199–212. ACM, 2009.
[44] S. Sivasubramanian. Amazon dynamodb: A seamlessly scalable non-relational database service. Proc. of ACM SIGMOD Int. Conf. on Management of Data, pages 729–730, 2012.
[45] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. Proc. IEEE Symposium on Security and Privacy, pages 44–55, 2000.
[46] M. Stonebraker. Sql databases v. nosql databases. Commun. ACM, 53(4):10–11, Apr. 2010.
[47] C. Tankard. Big data security. Network security, 2012(7):5–8, 2012.
[48] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. Proc. of the VLDB Endowment, 6(5):289–300, 2013.
[49] S. E. Whang and H. Garcia-Molina. Managing information leakage. 2010.
[50] L. Xu, C. Jiang, J. Wang, J. Yuan, and Y. Ren. Information security in big data: Privacy and data mining. Access, IEEE, 2:1149–1176, 2014.
[51] L. Xu, X. Zhang, X. Wu, and W. Shi. Abss: An attribute-based sanitizable signature for integrity of outsourced database with public cloud. Proc. of the 5th ACM Conf. on Data and Application Security and Privacy, pages 167–169, 2015.
[52] X. Yu and Q. Wen. A view about cloud data security from data life cycle. International Conf. on Computational Intelligence and Software Engineering (CiSE), pages 1–4, Dec 2010.