
SecureCloud

Joint EU-Brazil Research and Innovation Action
SECURE BIG DATA PROCESSING IN UNTRUSTED CLOUDS

https://www.securecloudproject.eu/

Requirements & Architecture Specification - Initial Version
D1.1

Due date: 30 September 2016
Submission date: 1 October 2016
Start date of project: 1 January 2016

Document type: Deliverable
Work package: WP1
Editor: Stefan Köpsell (TUD)
Reviewers: Peter Gray (CS), Andrey Elisio Monteiro Brito (UFCG)

Dissemination Level
  PU Public
√ CO Confidential, only for members of the consortium (including the Commission Services)
  CI Classified, as referred to in Commission Decision 2001/844/EC

SecureCloud has received funding from the European Union's Horizon 2020 research and innovation programme and was supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under grant agreement No 690111.


Contents

1 Introduction
2 State-of-the-Art
  2.1 Trusted computing
    2.1.1 Trusted platform module
    2.1.2 Secure co-processors
    2.1.3 Trusted execution environments
  2.2 Software isolation
    2.2.1 Virtual machines
    2.2.2 Containers
    2.2.3 Microservices
  2.3 Software approaches for cloud security
    2.3.1 Reduced TCB size
    2.3.2 Memory isolation
    2.3.3 Safe programming languages
3 Use Cases & Requirements
  3.1 Use Case 1: Dynamic Electrical Safety Assessment
    3.1.1 Overview
    3.1.2 Architecture
    3.1.3 Data
    3.1.4 Performance
    3.1.5 Summary of Main Requirements
  3.2 COPEL Use Cases
    3.2.1 Use Case 2: Fault Analysis
    3.2.2 Use Case 3: Fraud Detection Application
  3.3 Use Case 4: Smart Metering Scenarios
    3.3.1 Introduction
    3.3.2 Periodical Meter Scenario
    3.3.3 Low-Cost Meter Scenario
    3.3.4 Aggregated Metering Scenario
    3.3.5 General Use Cases
    3.3.6 Summary and Conclusion
4 Architecture
  4.1 General concepts
  4.2 Microservice runtime
    4.2.1 Runtime environment: SCONE
  4.3 Communication
  4.4 Orchestration
    4.4.1 Docker containers as microservice foundation
    4.4.2 Deployment: Secure Docker
    4.4.3 Management and lifecycle of secure containers
  4.5 Practicability / Evaluation
5 Summary & Conclusions


List of Tables

3.1 Actors in the fault analysis use case
3.2 Actors in the fraud detection use case
3.3 Actors used in the metering use cases
3.4 Overview of the use case related to measuring and registering consumption data
3.5 Success and failure cases
3.6 Overview of the use case related to retrieving consumption data
3.7 Success and failure cases
3.8 Overview of the use case related to retrieving billing data
3.9 Success and failure cases
3.10 Overview of the use case related to retrieving load data
3.11 Success and failure cases
3.12 Overview of the use case related to sending sensor data to the cloud
3.13 Success and failure cases
3.14 Overview of the use case related to calculating the consumption data in the cloud
3.15 Success and failure cases
3.16 Overview of the use case related to retrieving of the consumption data
3.17 Success and failure cases
3.18 Overview of the use case related to retrieving of the billing data
3.19 Success and failure cases
3.20 Overview of the use case related to retrieving of the load data
3.21 Success and failure cases
3.22 Overview of the use case related to sending the aggregated consumption data
3.23 Success and failure cases
3.24 Overview of the use case related to retrieving the billing data
3.25 Success and failure cases
3.26 Overview of the use case related to retrieving the consumption data
3.27 Success and failure cases
3.28 Overview of the use case related to remotely switching the power
3.29 Success and failure cases
3.30 Overview of use case regarding remote configuration of a meter
3.31 Success and failure cases
3.32 Overview of use case for installing time of use prices
3.33 Success and failure cases

List of Figures

3.1 Architecture for a few worker processes
3.2 Architecture for many worker processes
3.3 Overview of the fault analysis use case
3.4 Overview of the fraud detection use case
3.5 Use cases diagram of periodical meter scenario
3.6 Use cases diagram of low-cost meter scenario
3.7 Use cases diagram of aggregated metering scenario

4.1 Overview of the SecureCloud architecture
4.2 SCONE architecture
4.3 Components of Docker
4.4 Workflow of Docker
4.5 Components of Secure Docker
4.6 Workflow of Secure Docker
4.7 Verifiable Build
4.8 Secure Run
4.9 Participants and components of the SecureCloud architecture
4.10 Throughput versus latency for Apache and Redis
4.11 CPU utilization for Apache and Redis

1 Introduction

Despite a steady increase in cloud adoption over the past few years, some challenges still remain. While private and hybrid solutions in particular lead the way, there is still an assumption among application owners that they will inevitably sacrifice security (integrity) and privacy (confidentiality) in order to deploy in the cloud. The overall goal of the SecureCloud project is to develop a platform which enables the implementation, deployment and execution of secure applications within untrusted cloud environments. In the context of the SecureCloud project, the term security refers especially to the property that even the operator of the cloud infrastructure, who for instance has physical control over the cloud hardware, should not be able to access confidential data nor manipulate them undetectably.

In order to achieve this goal, the SecureCloud project will elaborate on how to utilise hardware-based security features built into modern general-purpose CPUs. Currently, the project focuses on utilising Intel Software Guard Extensions (Intel SGX). Besides the pure development of the SecureCloud platform, an important part of the project is the evaluation of the platform. Therefore, the project will implement a set of applications from the area of Smart Grids.

This deliverable serves two purposes:

• It will outline several Smart Grid related use cases of the SecureCloud platform;

• It will give a first overview of the envisioned architecture of the SecureCloud platform.

The area of application of the SecureCloud project, and thus of our use cases, is the field of big data processing in smart grids and smart metering. The description of the use cases is, on the one hand, a basis for deriving specific requirements for the SecureCloud platform and, on the other hand, defines means for evaluating the SecureCloud platform. The use cases were selected to encompass data processing deemed most likely to be outsourced into the cloud. Although the initial use cases are not really big data use cases, they are data intensive enough to be useful as a first benchmark for the practicability of the SecureCloud platform and its design principles. Additionally, we selected use cases which have certain security requirements, so that they fit well with the general SecureCloud vision of secure remote execution within untrusted cloud environments.

Based on the collected use case requirements, and following modern distributed systems design paradigms, we developed a first version of the SecureCloud platform and architecture. The current state of the architecture and the underlying principles and rationales are explained in the second part of this deliverable.

The architectural foundation of the SecureCloud platform, and of the applications to be executed on top of it, is based on the principle of microservices: small components of a distributed system which communicate over lightweight protocols. Note that this deliverable concentrates on the description of the parts of the SecureCloud platform necessary to execute a single microservice, whereas the related deliverable D4.1 focuses on the orchestration of multiple microservices and their interaction, i.e., the communication mechanisms and protocols used.

The initial set of use cases and the initial design of the SecureCloud platform presented in this deliverable will be refined and extended during the next months. The results will be reported in the succeeding deliverable D1.2.


2 State-of-the-Art

2.1 Trusted computing

Trusted computing provides mechanisms for guaranteeing that a computer system behaves in compliance with the expectations of a verifying party. The term was coined by the Trusted Computing Group (TCG) [42], a consortium of organisations that develops and implements specifications for trusted computing components. Trusted computing hinges on three fundamental features [41]: (i) protected capabilities: a set of commands with exclusive permissions to access shielded locations, which are safe to access and operate on sensitive data; (ii) attestation: the process of ensuring the accuracy of information through reliable evidence; and (iii) integrity measurement and reporting: the process of obtaining information about the platform characteristics that affect its trustworthiness.

2.1.1 Trusted platform module

A trusted platform module (TPM) [42] provides facilities for trusted boot, cryptographic key management, and secure storage. It is now present in many commodity server and desktop computers. Its features are implemented in hardware and are protected against software attacks. By utilising the TPM as a root-of-trust, a chain-of-trust can be created to check the integrity of an entire system or stack of applications. When combined with other components, such as a signing and encryption engine, a random number generator and a hash engine, a TPM can perform a range of security operations in cloud environments:

1. Integrity measurement: A TPM can measure the data and application code loaded onto the platform. This measurement, or fingerprint, can be used to verify that the data and application code have not been modified.

2. Sealed storage: A TPM can encrypt and bind data to a trusted execution state. It provides seal and unseal operations that are bound to the TPM's internal state. Data encrypted with the seal operation can only be decrypted using the unseal operation if the TPM state matches the original sealing state.

3. Remote attestation: Using integrity measurements, in the form of signatures, a remote party can check if a system is TPM-based and what the state of that system is. By comparing the expected to the measured state, it can ascertain the trustworthiness of the system.
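These three operations can be sketched in a simplified software model. The hash-chain structure mirrors the TPM's PCR extend operation; the HMAC-based sealing and the component names are illustrative stand-ins for the hardware-protected mechanisms, not the actual TPM interfaces:

```python
import hashlib
import hmac
import os

ROOT_KEY = os.urandom(32)  # stand-in for a storage key that never leaves the TPM

def extend(pcr: bytes, component: bytes) -> bytes:
    """Integrity measurement: fold the hash of a loaded component into a PCR."""
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

def seal(data: bytes, pcr: bytes) -> tuple:
    """Sealed storage: bind data to the sealing-time state (a real TPM also encrypts)."""
    tag = hmac.new(ROOT_KEY, pcr + data, hashlib.sha256).digest()
    return (pcr, data, tag)

def unseal(blob: tuple, current_pcr: bytes) -> bytes:
    """Release the data only if the platform is back in the sealing state."""
    sealed_pcr, data, tag = blob
    expected = hmac.new(ROOT_KEY, sealed_pcr + data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected) or sealed_pcr != current_pcr:
        raise ValueError("state does not match the sealing state")
    return data

def attest(pcr: bytes, expected_components: list) -> bool:
    """Remote attestation (verifier side): replay the expected measurements."""
    replay = bytes(32)  # PCRs start zeroed at reset
    for component in expected_components:
        replay = extend(replay, component)
    return replay == pcr

boot_chain = [b"firmware", b"boot loader", b"os kernel"]
pcr = bytes(32)
for component in boot_chain:
    pcr = extend(pcr, component)

assert attest(pcr, boot_chain)                      # measured state matches
blob = seal(b"disk encryption key", pcr)
assert unseal(blob, pcr) == b"disk encryption key"  # same state: unseal succeeds
```

Note that a real TPM reports attestation evidence as a signed quote over the PCR values rather than letting the verifier read them directly; the replay comparison above captures only the verifier's side of that check. The chain structure also explains the ordering property: a modified component changes every subsequent PCR value.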

There have been multiple proposals on how to utilise existing TPM functionality to protect applications. Terra [15] uses a trusted virtual machine monitor (TVMM) to partition a tamper-resistant hardware platform into multiple, isolated VMs, thus differentiating between general-purpose and trusted cloud applications. It utilises attestation as a building block for bootstrapping trust, and builds a chain-of-trust through attesting the firmware, the boot loader, the TVMM, and the VM that is loaded.

Flicker [23] permits applications to execute logic securely inside the untrusted host environment. It utilises a TPM for integrity measurements and remote attestation, and executes security-sensitive code in isolation, while trusting only 250 lines of code. It uses the late launch feature of AMD and Intel CPUs to reset the CPU to a trusted state before measuring the integrity of the secure loader block.

TrustVisor [22] protects small security-sensitive code blocks from an untrusted environment by using a special-purpose VMM. Similar to Flicker, it also enforces isolation using the late launch feature, but it uses a software-based µTPM for the logic that executes securely, avoiding the performance overheads of hardware-based TPM features. µTPMs are implemented and measured within TrustVisor, preventing modification by a malicious adversary.


Deliverable 1.1 Secure Big Data Processing in Untrusted Clouds

In practice, there are several shortcomings of TPMs in cloud environments that have hindered their adoption: (i) TPM use implies the existence of a TCB to be attested. This TCB typically includes the whole cloud stack, which consists of millions of lines of code in production environments; (ii) the TPM measurements are carried out only at boot time, and any run-time incident will not be reflected in the measurement (referred to as the time-of-check versus time-of-use issue); (iii) TPMs add complexity to the software update process because each update requires re-attestation; (iv) TPMs are not able to protect against attacks performed by a cloud infrastructure provider that has physical access to the machines; and (v) existing approaches suffer from the low performance of TPM hardware, especially when using it frequently for dynamic root-of-trust measurements.

2.1.2 Secure co-processors

In addition to TPM-based approaches, secure co-processors (SCs) can also provide strong security guarantees in untrusted cloud environments. An SC is a physically secure general-purpose CPU that is trusted to carry out its computation correctly, even if an attacker has physical access to the device (as is the case with cloud providers). Typically, an SC is used to perform cryptographic operations, such as the encryption and decryption of data, calculating signatures, and random-number generation. SCs can also store cryptographic keys and sensitive data. An SC is protected by tamper-proof hardware, which can detect a physical attack against the device.

Solutions that use SCs typically operate under a model in which all data is sent to the cloud in an encrypted form and only ever operated on inside the SC. The SC is responsible for decrypting the data, computing over the data, and then encrypting the result, all without ever exposing the data or private keys to the untrusted cloud environment.
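The decrypt-compute-encrypt model can be sketched as follows. The XOR-based keystream cipher is an insecure stand-in for the authenticated encryption a real SC would use, the shared-key provisioning is assumed to have happened out of band, and the summation workload is invented for illustration:

```python
import hashlib
import os

def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Symmetric toy cipher (illustration only; a real SC would use e.g. AES-GCM)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class SecureCoprocessor:
    """The only place where the data and the key exist in plaintext."""
    def __init__(self, key: bytes):
        self._key = key  # provisioned out of band; never leaves the device

    def process(self, nonce: bytes, ciphertext: bytes):
        readings = xor_stream(self._key, nonce, ciphertext)  # decrypt inside the SC
        total = sum(int(v) for v in readings.split(b","))    # compute over plaintext
        out_nonce = os.urandom(16)
        return out_nonce, xor_stream(self._key, out_nonce, str(total).encode())

key = os.urandom(32)
sc = SecureCoprocessor(key)

# The untrusted cloud only ever handles ciphertexts.
nonce = os.urandom(16)
ciphertext = xor_stream(key, nonce, b"10,20,12")
out_nonce, encrypted_result = sc.process(nonce, ciphertext)
assert xor_stream(key, out_nonce, encrypted_result) == b"42"
```

The sketch also makes the bottleneck mentioned below tangible: every operation forces a round trip of data across the boundary between the untrusted environment and the SC.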

Many systems have been proposed that use SCs to process data in untrusted cloud environments. For example, Mykletun et al. [27] describe a database-as-a-service model in which they run a complete database engine inside an SC. Since they use an IBM 4758 SC with a low clock rate of 99 MHz and only 2 MB of on-board memory, the hardware limitations of the SC inevitably become a performance bottleneck: data needs to be moved continuously between the outside environment and the SC to be processed.

TrustedDB [4] is a database system that uses two separate execution engines to process data: one for sensitive data inside the SC, and another for non-sensitive data outside. TrustedDB differentiates between the two types of data through annotations in the database schema. Despite using an SC, TrustedDB still incurs a performance overhead of a factor of 10 compared to an unmodified database system.

Cipherbase [2] outsources computation over sensitive data but, instead of an SC, it uses an FPGA as the trusted hardware platform. When all columns are strongly encrypted and every data operation needs to go through the FPGA, Cipherbase achieves 40% of the throughput of an unmodified database.

Since SCs constitute specialised hardware, however, their high cost and specialised nature mean that they are not widely available in public cloud environments. In addition, the capabilities and performance of different SCs vary widely. As a result, cloud applications using SCs struggle to achieve acceptable performance, mostly due to the low clock frequency of SCs and the need for expensive data movement operations between the main CPU and the SC.


2.1.3 Trusted execution environments

In recent years, hardware vendors like ARM and Intel have started to develop mechanisms integrated into their processors that allow the creation of so-called Trusted Execution Environments (TEEs). A TEE is a strictly isolated execution environment which runs in parallel to a feature-rich operating system. Applications running in a TEE are referred to as trusted applications. The confidentiality and integrity of the data they process are protected by the TEE from the rich operating system. Additionally, applications running in TEEs are isolated from each other. The effort of creating hardware support for TEEs emerged as ARM TrustZone [3], whose specification was released in 2002 with the ARMv6Z sub-architecture, and more recently as Intel Software Guard Extensions (SGX), introduced with the Skylake architecture in 2015 [1, 24]. Both technologies extend the processor's instruction set to establish TEEs for general-purpose applications.

Intel SGX

Intel Software Guard Extensions (Intel SGX) is a set of instruction extensions for Intel CPUs that provide secure execution environments at the CPU level. These environments are termed enclaves, and applications can instantiate them to protect memory regions within the address space of the application. The processor prevents unauthorised accesses to enclaves, even for privileged software such as the BIOS or VMM. All memory held in an enclave is encrypted when written to RAM, and is only ever decrypted inside the processor before being used. Intel SGX also supports many of the features seen in TPMs, such as remote attestation, integrity measurements of code and data, and privileged instructions. Intel SGX supports multi-threaded enclaves and uses special thread control structures to keep track of the thread state in the case of interrupts. During execution, all code loaded into an enclave has access to enclave memory as well as to unprotected memory in the parent process' address space. Intel SGX also provides sealing features that allow enclaves to store and retrieve data in persistent storage.
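The enclave memory model described above, plaintext only inside the processor and ciphertext in RAM, can be illustrated with a toy simulation. The cipher and the page-based interface are invented for illustration and do not reflect the actual SGX memory encryption engine:

```python
import hashlib
import os

def keystream_xor(key: bytes, page_id: int, data: bytes) -> bytes:
    """Illustration-only cipher standing in for the hardware memory encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            key + page_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class Enclave:
    """Toy model: enclave pages only ever reach RAM in encrypted form."""
    def __init__(self):
        self._cpu_key = os.urandom(32)  # held inside the processor
        self.ram = {}                   # what privileged software can observe

    def store(self, page_id: int, plaintext: bytes):
        # Encrypted on the way out of the CPU.
        self.ram[page_id] = keystream_xor(self._cpu_key, page_id, plaintext)

    def load(self, page_id: int) -> bytes:
        # Decryption happens only "inside the processor".
        return keystream_xor(self._cpu_key, page_id, self.ram[page_id])

e = Enclave()
e.store(0, b"secret record")
assert e.ram[0] != b"secret record"   # the OS/VMM sees only ciphertext
assert e.load(0) == b"secret record"  # enclave code sees plaintext
```

The point of the model is the asymmetry of the two views: even a privileged observer of `ram` learns nothing about the plaintext, while code executing inside the enclave operates on it transparently.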

Haven [5] uses Intel SGX to execute unmodified Windows applications, such as Microsoft SQL Server, inside a secure enclave. Haven places an entire Windows library OS inside the enclave so that complete Windows applications can be executed, which leads to a large TCB with millions of lines of code. In addition, the API used by the library OS to interact with the untrusted environment must be hardened to prevent attacks. With a standard TPC-E database benchmark, Haven reduces transaction processing throughput by 40%.

VC3 [35] uses Intel SGX to execute trustworthy MapReduce computations in an otherwise untrusted cloud environment. It provides both confidentiality of code and data, as well as integrity for the completeness of computation results. VC3 runs on unmodified Apache Hadoop [39], and places the OS and VMM outside of the TCB, thus reducing its size significantly. VC3 only incurs a low performance overhead compared to unprotected Hadoop: there is a 4.5% slow-down with write integrity and an 8% slow-down with read/write integrity.

However, both Haven and VC3 have essential limitations: Haven places the entire Windows OS into an enclave, thus relying on a large TCB whose security may be compromised by any vulnerability in the OS. While VC3 demonstrates that it is feasible to protect cloud applications with a small TCB and low performance overhead, it is limited to a single application (Hadoop) and restricts the implementation language for MapReduce jobs to C/C++.


Lastly, Intel encourages software developers to write secure applications from scratch using the Intel SDK. This approach, however, requires enormous effort and is impractical in real-world production environments, where existing software has been developed and improved over many years.

2.2 Software isolation

The use of virtual machines (VMs) and containers is central to the idea of virtualisation in cloud computing. It increases the efficiency of resource usage and thus reduces costs. Virtualisation enables cloud computing providers to abstract away from physical resources (i.e., compute, storage, and network) in order to pool and distribute them to end users in such a way that presents them with a complete virtual computing environment. The resources of a single (or multiple) physical machine (referred to as the "host") can be shared across multiple users (referred to as "guests") while maintaining a logical separation of virtual resources.

2.2.1 Virtual machines

Technological advances in both hardware and software have resulted in a market in which cloud providers can quickly and cost-effectively provision VMs on a massive scale never seen before.

In recent years, a huge variety of different VM-based software stacks has been developed. These include, on the one hand, commercial solutions like VMware's vSphere, Microsoft's Hyper-V or Amazon's AWS and, on the other hand, open-source solutions like Linux's KVM, Xen or OpenStack, to name just a few.

In particular, the advent of OpenStack [29] has simplified the set-up of infrastructure-as-a-service (IaaS) cloud environments that are ready to be configured according to end users' requirements. Services such as Chef [7], OpenStack's Heat or Amazon's AWS CloudFormation enable cloud consumers to automatically provision and configure new machines, rapidly deploy services on those VMs, and orchestrate large distributed applications based on VMs.
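The declarative style of such provisioning services can be sketched as follows; the template schema and the FakeCloud API are invented for illustration and do not correspond to the actual Heat or CloudFormation formats:

```python
# A declarative template describes the desired resources; a provisioner
# walks the template and creates them, instead of the user scripting each step.
template = {
    "resources": {
        "web": {"type": "vm", "image": "ubuntu-16.04", "count": 2},
        "db":  {"type": "vm", "image": "ubuntu-16.04", "count": 1},
    }
}

class FakeCloud:
    """Stand-in for a cloud API that can boot virtual machines."""
    def __init__(self):
        self.vms = []

    def boot_vm(self, name: str, image: str):
        self.vms.append((name, image))

def provision(cloud: FakeCloud, template: dict):
    """Create each resource declared in the template."""
    for name, spec in template["resources"].items():
        for i in range(spec["count"]):
            cloud.boot_vm(f"{name}-{i}", spec["image"])

cloud = FakeCloud()
provision(cloud, template)
assert len(cloud.vms) == 3
```

The design point is that the template, not imperative code, is the interface: the same description can be re-applied, versioned, and handed to different back-ends.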

Concurrently, the SecureCloud industrial partner CloudSigma has taken a unique approach by using its own proprietary cloud stack built on top of libvirt and KVM, a strategy that allows the company to roll out new features much faster based on customer demands. CloudSigma is available via a number of multi-cloud drivers (such as libcloud, fog.io (http://fog.io), jclouds, ansible and more) that enable customers to drive multiple cloud stacks, including CloudSigma and OpenStack, side-by-side. An OpenStack Heat plugin has been developed to enable the provisioning and decommissioning of VMs on the CloudSigma cloud. Flexibility is extended further by the PyCloudSigma library, made available via GitHub, offering an interface to the CloudSigma API.

Due to economies of scale, cloud providers can offer these services and resources at a fraction of the cost of traditional in-house data centres.

2.2.2 Containers

While VM-based solutions have dominated the virtualisation market, we have recently witnessed a move towards containers. In the highly competitive market of container technologies, which includes Solaris Containers, FreeBSD jails, Virtuozzo, Apache Mesos and OpenVZ, Docker is the container solution that has gained the most traction with developers and industry. The Docker container environment provides an effective way to develop and deploy applications in a standardised fashion.

Containers use OS-level virtualisation [20] and have become increasingly popular for packaging, deploying and managing services such as key/value stores [33, 14] and web servers [17, 34]. Unlike VMs, they do not require hypervisors or a dedicated OS kernel. Instead, they use kernel features to isolate processes, and thus do not need to trap system calls or emulate hardware devices. This means that container processes can run as normal system processes, though features such as overlay file systems [28] can add performance overheads [13]. Another advantage of containers is that they are lightweight: they do not include the suite of services typical of a standalone OS, as they rely on the services provided by the container host.
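The overlay file systems mentioned above resolve each access by searching a stack of layers from top to bottom, which is where part of their overhead comes from. A minimal model of that lookup, with layer contents invented for illustration:

```python
class OverlayFS:
    """Toy union mount: a writable upper layer over read-only lower layers."""
    def __init__(self, *layers):
        self.layers = layers  # ordered top (writable) to bottom (base image)

    def read(self, path: str) -> bytes:
        for layer in self.layers:
            if path in layer:
                return layer[path]  # first hit wins: upper layers shadow lower ones
        raise FileNotFoundError(path)

    def write(self, path: str, data: bytes):
        # Copy-on-write: changes land in the upper layer only, so the
        # lower (image) layers can be shared between containers.
        self.layers[0][path] = data

base = {"/etc/hostname": b"image-default"}   # shared read-only image layer
upper = {}                                   # per-container writable layer
fs = OverlayFS(upper, base)

assert fs.read("/etc/hostname") == b"image-default"
fs.write("/etc/hostname", b"container-42")
assert fs.read("/etc/hostname") == b"container-42"
assert base["/etc/hostname"] == b"image-default"  # the shared layer is untouched
```

This sharing of lower layers is what makes container images cheap to distribute: only the thin upper layer is unique to each running container.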

Inside a container, all files and libraries required to run an application are stored, which makes it an effective distribution approach for applications. While VMs could be used in the same way for deployment, containers consume significantly less bandwidth and can be deployed readily. Besides encapsulating software artefacts, a container also isolates the application from other containers on the same host machine. Unlike VMs, containers do not use hardware virtualisation mechanisms. Instead, LXC [16] and Docker [25] create containers using a number of Linux kernel features, including namespaces and the cgroups interface. By using the namespace feature, a parent process can create a child that has a restricted view of resources, including a remapped root file system and virtual network devices. The cgroups interface provides performance isolation between containers using scheduler features already present in the kernel.

For the deployment and orchestration of containers, frameworks such as Docker Swarm [12] and Kubernetes [20] instantiate and coordinate the interactions of containers across a cluster. For example, micro-service architectures [40] are built in this manner: a number of thin containers, each with a minimal set of processes, interact over well-defined network interfaces.

2.2.3 Microservices

Through the prevalence of container technologies, a new concept gaining traction is microservices [36, 26, 40]. Given lightweight container-based software isolation, the central idea of microservices is to develop and deploy many small and loosely coupled services. Large systems are then composed of many small microservices, with communication between the microservices as a central ingredient. In contrast to large and monolithic services, the benefit of microservices is their greater modularity, performance, scalability, flexibility, and maintainability.
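The composition of small services over well-defined network interfaces can be sketched in a few lines of Python. The "tariff" service, its endpoint, and the price value below are purely hypothetical illustrations; a real deployment would package each service in its own container and let an orchestrator assign ports.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class TariffHandler(BaseHTTPRequestHandler):
    """A single-purpose 'microservice' exposing one well-defined endpoint."""
    PRICES = {"energy_kwh": 0.42}  # illustrative tariff data

    def do_GET(self):
        body = json.dumps(self.PRICES).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging

def start_service():
    # Port 0 lets the OS pick a free port, much as an orchestrator would.
    server = HTTPServer(("127.0.0.1", 0), TariffHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def fetch_tariff(port):
    # A second component interacts with the service only via its interface.
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/tariff") as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    srv = start_service()
    print(fetch_tariff(srv.server_address[1]))
    srv.shutdown()
```

The loose coupling shown here is what allows each service to be scaled, replaced, or redeployed independently of the others.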

2.3 Software approaches for cloud security

Due to the lack of control that cloud tenants have when outsourcing their applications to cloud environments, they have to trust the cloud provider and its infrastructure. Clearly, the level of control a cloud customer has (and therefore the amount of trust they need) depends on the policies set by the cloud provider on the one hand, and on the kind of service requested by the customer (Metal as a Service, Infrastructure as a Service, Platform as a Service, Software as a Service, etc.) on the other. But for almost any cloud provider, the customer will not have any kind of physical control over the hardware resources provided by the cloud provider.


Therefore, when faced with malicious providers or privileged attackers, tenants must assume that the adversary can corrupt the underlying operating system (OS) or the virtual machine manager (VMM). An essential aspect of cloud security is therefore the size of the trusted computing base (TCB). The TCB of an application includes all software and hardware components that have to be trusted by the tenant (e.g., not to contain vulnerabilities) in order to consider the application to be secure. As detailed in the following subsections, a number of approaches have been proposed to increase the trustworthiness of cloud environments without relying on any specialised hardware support.

2.3.1 Reduced TCB size

To avoid having to trust large pieces of system software, such as an entire VMM, researchers have proposed solutions that minimise the TCB size. Previous work on protecting cloud applications splits the system into high-assurance and low-assurance partitions, executing the sensitive and insensitive functionality of an application, respectively. The high-assurance part is considerably smaller than the low-assurance part, and thus has a smaller TCB.

CloudVisor [43] protects the privacy and integrity of VMs on commodity infrastructures. It protects against a malicious VMM using nested virtualisation, creating a small trusted security monitor that is outside of the control of the VMM. By separating the responsibilities for protection and allocation, CloudVisor is small enough to be verified formally (around 5,500 lines of code). NGSCB [6, 31] implements a small verified isolation kernel, similar to a VMM, that allows multiple OSs to run on top of it. This way, a standard commodity OS can execute alongside a small, trusted high-assurance OS. Security-sensitive cloud applications can execute inside the restricted high-assurance OS, while untrusted applications use the low-assurance OS. Proxos [38] observes that, instead of requiring cloud application developers to partition applications into components that trust the OS and those that do not, they only need to partition the OS system call interface. Proxos relies on two separate VMs on a VMM, with a private VM containing the security-sensitive application. Any system call made by this application is routed to the trusted or untrusted OS. While Proxos reduces the TCB size considerably, it incurs substantial performance overheads due to context switches and inter-VM communication.

On the downside, all of these approaches define a software-based TCB that must be trusted. As such, they cannot protect against powerful attackers who have physical access to the cloud infrastructure and can therefore tamper directly with the TCB. In addition, existing approaches for introducing software-based trusted components (as part of the VMM or OS) incur a performance overhead, which prevents their adoption in performance-oriented cloud environments.

2.3.2 Memory isolation

Another approach to cloud security is to rely on mechanisms that isolate application memory, which may potentially contain sensitive application data, from the rest of the system, including the OS or VMM. Overshadow [8] provides an application and the OS with different views of memory: while the application can access plaintext memory contents, the OS can see only an encrypted version. This property, called memory cloaking, is enforced by a small trusted VMM. SecureME [9] also provides memory cloaking, but leverages hardware support. InkTag [18] uses para-verification for memory accesses: to perform any action, the OS must provide additional information to the VMM, which verifies its behaviour. H-SVM [19] protects VMs against malicious VMMs by introducing changes to the hardware. VirtualGhost


[11] instead utilises compiler instrumentation (sandboxing and control-flow integrity) and VMM-based runtime checks to protect the application and its memory from an untrusted OS. Although VirtualGhost requires applications to be modified to identify which memory should be protected, it outperforms the VMM-based approaches above.

Execute-only memory (XOM) [21] is an architectural idea for software isolation. Data is encrypted when written to memory and message authentication codes (MACs) are calculated. When the data is read, the MACs are used to check its integrity. AEGIS [37] also encrypts memory contents, but it interleaves memory accesses with encryption operations, thus reducing performance overheads. OASIS [30] explores a generic design of a CPU instruction set extension for externally verifiable initiation, execution, and termination of an isolated execution environment. It leverages the on-chip cache, treating it as main memory, and isolation is provided by using only on-die hardware.
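The encrypt-on-write, verify-on-read idea behind XOM can be illustrated with a small software model. The XOR keystream and the dictionary standing in for untrusted memory are toy assumptions for illustration, not a secure or performant implementation of the hardware mechanism.

```python
import hashlib
import hmac
import os

# Toy model of the XOM idea: data is encrypted when written to untrusted
# memory and a MAC is stored alongside it; on read, the MAC is checked
# before plaintext is returned. A SHA-256-based keystream stands in for
# the hardware cipher.
ENC_KEY = os.urandom(32)
MAC_KEY = os.urandom(32)

def _keystream(addr: int, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(ENC_KEY + addr.to_bytes(8, "big")
                              + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def write_block(memory: dict, addr: int, plaintext: bytes) -> None:
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(addr, len(plaintext))))
    tag = hmac.new(MAC_KEY, addr.to_bytes(8, "big") + ct, hashlib.sha256).digest()
    memory[addr] = (ct, tag)  # untrusted memory sees only ciphertext + MAC

def read_block(memory: dict, addr: int) -> bytes:
    ct, tag = memory[addr]
    expected = hmac.new(MAC_KEY, addr.to_bytes(8, "big") + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("memory integrity violation detected")
    return bytes(c ^ k for c, k in zip(ct, _keystream(addr, len(ct))))

if __name__ == "__main__":
    mem = {}
    write_block(mem, 0x1000, b"sensitive tenant data")
    assert read_block(mem, 0x1000) == b"sensitive tenant data"
    # Tampering with the stored ciphertext is caught on the next read.
    ct, tag = mem[0x1000]
    mem[0x1000] = (bytes([ct[0] ^ 1]) + ct[1:], tag)
    try:
        read_block(mem, 0x1000)
    except ValueError:
        print("tamper detected")
```

Performing these operations in software on every access is exactly the overhead that motivates hardware support for memory encryption.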

While memory isolation approaches achieve a high level of security, they suffer from a high performance overhead due to software encryption and decryption. They also require a trusted security monitor. Techniques that increase performance by relying on specific isolation features in the memory management unit have yet to be adopted.

2.3.3 Safe programming languages

While reducing the TCB is one meaningful measure to reduce the risk of vulnerabilities, even small code bases may contain severe, security-critical flaws. The main issue is that writing or generating error-free code is very difficult and requires enormous effort. To lower the risk that such flaws result in exploitable security vulnerabilities, software developers increasingly make use of both type-safe and memory-safe programming languages:

A type-safe programming language usually involves additional checks at compile time. These checks ensure that variables, constants, and methods (functions) used within an expression have an appropriate data type [32]. Examples of (partially) type-safe languages are C#, Java, Standard ML, and C.

Memory-safe programming languages provide mechanisms that prevent security vulnerabilities caused by insecure memory operations. Typical examples of such vulnerabilities are buffer overflows and dangling pointers. Countermeasures include techniques such as disallowing pointer arithmetic and automatic memory management. Examples of (partially) memory-safe languages include Java, C#, OCaml, Python, and Perl.
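The effect of such bounds checking can be seen in a short example in a memory-safe language: an out-of-bounds write, which in C could silently overflow into adjacent memory, is instead caught by the runtime.

```python
# In a memory-safe language an out-of-bounds access raises a runtime
# error instead of silently corrupting adjacent memory, as a C buffer
# overflow would.
buf = [0] * 8  # fixed-size buffer of 8 slots

def store(index: int, value: int) -> None:
    buf[index] = value  # bounds-checked by the runtime

store(7, 42)           # last valid slot: fine
try:
    store(8, 99)       # one past the end: raises IndexError, no overflow
except IndexError as exc:
    print("caught:", exc)
```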


3 Use Cases & Requirements

In the following sections, four use cases are described and related requirements are derived. The use cases are all from the area of smart grids, considering especially the situation in Brazil in terms of the technical, functional, organisational and regulatory situation and demands. The first use case deals with the management of the smart grid, especially regarding electrical safety assessments. The second and third use cases are based on data collected from smart meters, where one use case is related to fault analysis and the other covers fraud detection. The last use case also relates to smart metering, but is mainly concerned with billing and incorporates the end-user (i.e. the utility's customer) perspective. In fact, the last use case is somewhat different from the previous three, because it describes various scenarios and related sub-use cases. As they all relate to the same area of application, we refer to them as just one use case (use case 4).

3.1 Use Case 1: Dynamic Electrical Safety Assessment

3.1.1 Overview

Dynamic electrical safety1 assessment is a regular process employed by power system operators and is traditionally performed by engineers with little to no automation. Consequently, it is a relatively slow process with narrow scope. Based on advances in numerical algorithms and high-performance computation, higher levels of automation have been introduced, allowing faster and more comprehensive assessments of the power system safety state. Consequently, such automation was deployed at control centres of the Brazilian National System Operator (ONS) and has been introduced into its operational planning processes. Power system electrical safety assessments (i.e., evaluation of the risks of partial or total blackout and of equipment integrity) can be provided based on a given operating state (e.g., a snapshot representing the system's real-time state) or for a region surrounding an operating state in a three-dimensional parametric space, i.e., a safety region.

The computational effort depends basically on the system model size, the type of safety assessment and the number of scenarios to be investigated. System sizes typically vary from a few thousand equations for small countries to hundreds of thousands of equations for large, typically continental-interconnected, grids. There is also a trend towards larger simulation models with higher penetration of distributed generation such as wind and photovoltaic sources. The computation of a safety region can require thousands of time domain simulations (numerical integration of a set of differential and algebraic equations). Therefore, the overall computational effort can be huge, which demands parallel processing.

Other applications, such as risk analyses and system restoration evaluation and guidance, which are also computationally intensive, can significantly benefit from similar automation and algorithms.

For control centre applications, where the focus is on the current operating state, dedicated hardware has been used to execute the computations. However, the need for computational resources is much higher for the several operational planning processes. There would be significant cost and scalability advantages in hosting such applications in the cloud. In this case, the main concern of utilities and system operators is the confidentiality of the data. The correctness of the results is also a concern, but changes in input data or program logic are likely to produce easily noticeable effects.

1The term “electrical security assessment” is more common, but we decided to stay with the term “safety” instead of “security” here so as not to confuse the reader with the term “security” as used when referring to information/data security.


3.1.2 Architecture

Figure 3.1 shows a parallel processing Static and Dynamic Safety Assessment (SDSA) configuration used with a few worker processes, say up to 16. A more suitable configuration for a large number of workers is shown in Figure 3.2 (the dashed line implies multiple worker processes). Simulation data is transferred from a user to a central process, called the Manager, which distributes tasks to workers, collects and synthesises the respective simulation results, and sends overall results back to the user. Depending on the tasks to be performed, several nodes may be needed in the distributed process.

Figure 3.1: Architecture for a few worker processes

Figure 3.2: Architecture for many worker processes

The current implementation is based on the Message Passing Interface (MPI) standard. As the system has only been used on site so far, the Manager process retrieves the necessary information from its local network. In a cloud implementation, either the user or an automatic process would send the data to the Manager, which broadcasts them to all workers. Then, simulation tasks are assigned to workers on a first-asked-first-served basis. This is done with non-blocking point-to-point communication. Depending on the complexity of the assigned task, a worker can send partial or final results back to the Manager (again using point-to-point communication) and will receive another task or enter a standstill state, if there


is no other task to be performed. At the end of the overall computation process, the Manager sends the results to the user.
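The Manager/worker pattern described above can be sketched in a thread-based simulation. The queue, worker count, and dummy simulation function are illustrative assumptions; the real implementation uses MPI processes with non-blocking point-to-point messages rather than in-process queues.

```python
import queue
import threading

# Sketch of the Manager/worker pattern: the Manager hands out tasks on a
# first-asked-first-served basis; a worker with no remaining task enters a
# standstill state (here, the thread simply returns).
def run_assessment(tasks, simulate, num_workers=4):
    task_q = queue.Queue()
    results = []
    lock = threading.Lock()
    for t in tasks:
        task_q.put(t)

    def worker():
        while True:
            try:
                t = task_q.get_nowait()   # ask the Manager for the next task
            except queue.Empty:
                return                    # no task left: standstill
            r = simulate(t)
            with lock:
                results.append(r)         # send result back to the Manager

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results  # Manager synthesises the overall results

if __name__ == "__main__":
    # Hypothetical stand-in for one time-domain simulation per scenario.
    out = run_assessment(range(10), simulate=lambda s: s * s)
    print(sorted(out))
```

Because workers pull tasks as they become free, slow simulations do not hold up the rest of the batch, which matches the first-asked-first-served assignment above.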

The SDSA software is written in Fortran 90/95 and can interface with components written in C++ and C#. It can be compiled with a graphical user interface for Windows environments, or as a console application or background (daemon) process for Windows and Linux. The numerical methods used include the following:

• Power flow calculation by Full-Newton method;

• Continuation power flow by tangent vector;

• Optimal power flow by non-linear interior point;

• Time domain simulation with variable order and variable time step, and simultaneous solution of differential and algebraic equations;

• Network reduction methods;

• Dynamic equivalent reduction methods;

• Safety region calculation methods;

• Safety limits based on energy function.

3.1.3 Data

Practically all data sent from the user to the (General) Manager is replicated to the workers. This includes the network model, dynamic models and simulation details. Network models and the respective state are sent from the user to the application for every new safety assessment. Dynamic models and other data can either reside in the cloud or similarly be updated along with the network model.

The requirements regarding confidentiality of the data vary significantly depending on each country's regulations. Typically, the network model is considered to be highly confidential, either because it is considered a trade secret with respect to the energy market or because of general defence and safety considerations. The confidentiality of dynamic models is also critical, but to a lesser degree. Results showing potential risk of energy supply interruption can also be considered classified information.

Despite the simulation model being quite large and the simulations compute-intensive, the required input data is relatively small. For example, for the largest grids, all data can be stored in approximately 40-50 MB of text files.

3.1.4 Performance

Computational performance is paramount for this type of application, in particular for online safety assessments. Thus, the computational overhead introduced by the cryptographic mechanisms to protect the confidentiality of the data should be limited to a small percentage of the overall processing time, which ranges from 2 to 5 minutes.


3.1.5 Summary of Main Requirements

The following main requirements with respect to cyber security must be observed for this application.

• Data exchanged between a user and the application, as well as data stored in the cloud, must not be accessible to, or undetectably altered by, unauthorised parties.

• Similarly, the logic of the running processes must not be altered.

• It is necessary to establish a secure communication mechanism among multiple processes.

• The impact of cyber security mechanisms on the overall system performance should be tolerable, meaning that either it can be compensated by adding a few more nodes to the processing hardware, resulting in a small cost overhead, or it results in only a small performance deterioration.

3.2 COPEL Use Cases

This section describes two smart metering use cases based on end-user requirements provided by the Brazilian state-owned utility Companhia Paranaense de Energia (COPEL). COPEL is the largest electric utility and the largest company of the State of Paraná, Brazil, operating in the generation, transmission and distribution of electricity, with 13,451 GWh generated and 14,822 GWh distributed in the first half of 2016.

3.2.1 Use Case 2: Fault Analysis

This application aims to identify the weakest regions of the COPEL network and to prioritise maintenance based on the time and frequency of faults. Its results should help COPEL decide which network locations should be prioritised in infrastructure maintenance. In this application, the metering and customer data will be analysed and processed in order to cluster the faults by location and identify the duration and frequency of the faults for each cluster. With this information it will be possible to rank the clusters, indicating to COPEL the location of the weakest points in its network.

COPEL should use the ranking to define the investment strategy for its network maintenance. After the maintenance, COPEL will give feedback to the software, which will then analyse the adjusted locations and show the fault trends for the following months.
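The clustering and ranking step described above can be sketched as follows. The feeder names, the grouping key, and the ordering by outage count and total duration are illustrative assumptions, not COPEL's actual metric.

```python
from collections import defaultdict

# Sketch of the ranking step: fault events are clustered by location and
# each cluster is scored by outage frequency and total duration.
def rank_fault_clusters(events):
    # events: iterable of (location, duration_minutes)
    clusters = defaultdict(lambda: {"count": 0, "minutes": 0})
    for location, minutes in events:
        clusters[location]["count"] += 1
        clusters[location]["minutes"] += minutes
    # Worst cluster first: more outages and longer downtime rank higher.
    return sorted(clusters.items(),
                  key=lambda kv: (kv[1]["count"], kv[1]["minutes"]),
                  reverse=True)

if __name__ == "__main__":
    sample = [("feeder-A", 30), ("feeder-B", 5),
              ("feeder-A", 45), ("feeder-C", 120)]
    ranking = rank_fault_clusters(sample)
    print(ranking[0][0])  # feeder-A: two faults, 75 minutes in total
```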

System Overview

The main entities involved in the fault analysis use case are listed in Table 3.1. Figure 3.3 gives an overview of the different activities of this use case.

Actor                         | Goal
Metering Data Collector (MDC) | Store metering and customer information
COPEL                         | Give feedback on maintenance of the distribution network

Table 3.1: Actors in the fault analysis use case


Figure 3.3: Overview of the fault analysis use case

Functional Requirements

1. The following information must be included in the database: metering and customer information, including metering point and alarms data.

2. The data analysis will be performed using a pre-defined time-window size.

3. The output of the data analysis will be delivered to COPEL in a specific interface, separated from the MDM system.

4. The user interface will consist of a fault ranking table, one table for each metering point affected by the maintenance, and an indicative map.

5. COPEL needs to invest in its infrastructure according to the result of the metering point data analysis, so that the improvement analysis of each individual metering point can be performed.


Interface Functional Requirements

1. The fault ranking table consists of the following columns:

(a) Fault ranking: ranking of the fault, based on the frequency and time duration of outages for each metering point.

(b) Geographic location of the metering point.

(c) Outage time duration: total outage time duration at each metering point, in minutes.

(d) Frequency of interruptions: outage frequency for each metering point during a specific time window (usually a month).

(e) Maintenance: date on which new maintenance was performed on the COPEL network infrastructure. COPEL should enter this data manually.

(f) Analysis after maintenance: link to a new table with information on the fault analysis at the selected metering point, after investments were made in the network infrastructure.

2. The main characteristics of the table for each metering point affected by the maintenance are:

(a) This table contains the fault ranking, geographical location, interruption duration and frequency for a specific metering point.

(b) The time-window used to generate this table starts the day after the maintenance.

(c) The metering point table will be discarded when its time-window duration reaches the fault ranking table's time-window duration.

3. An indicative map with the metering points' geographical locations and fault rankings:

(a) The geographical location of each metering point will be pointed out on the map.

(b) The ranking of each metering point will be indicated on the map with either the ranking number or a label.

Non-Functional Requirements

1. Software performance: For the first run, where the software must analyse the whole data set, create clusters and rank the data, the software may take more time, up to several hours. However, in regular operation after user feedback, the software must present results within a few seconds.

2. Operating constraints: The software development must follow the COPEL requirements for the software interface.

3. Usability requirement: the software should be simple to use; software training should not be necessary.

4. External requirement: Data concerning customers and geographical locations must be anonymised.

5. Security and legal requirements: some of the metering information is confidential; the confidentiality of this data needs to be preserved.


6. Interoperability requirement: the system should communicate with an SQLite database.

7. Storage requirement: all databases must be stored in the cloud in order to meet the performance requirements. The developed solution should incur as little operational cost as possible.

Security

All datasets can be considered highly sensitive. They contain information about the customer, such as name, address, registration number and geographical location, as well as billing information and electrical data. The billing information is also extremely sensitive, as it is the target of metering fraud. Therefore, manipulation of this data by unauthorised persons can directly impact utility bills.

Additionally, the electrical data is acquired at a frequency that makes the construction of load curves possible, which can be used to determine the customer's behaviour. Therefore, access to the dataset must be restricted to specific employees in the metering and billing areas of the utility.

However, the data generated by this application is not as sensitive as the underlying database used by the application. In fact, data about electrical faults is treated as public under the Brazilian regulatory framework.

Conclusion

This application should help COPEL make better-informed decisions with regard to the company's investment strategy for network maintenance. To this end, the MDC will provide its metering and customer data so that the software can create clusters and rank these clusters by outage frequency and duration, indicating the critical network areas in a user interface separated from the MDM system. The user interface consists of a fault ranking table, one table for each metering point affected by the maintenance investment, and an indicative map. The initial data processing has relaxed performance requirements, since it is performed only once. However, in regular operation after user feedback, the software should present its results within a few seconds. All databases must be stored in the cloud in order to meet the performance requirements and to relieve COPEL of having to manage its own datacentre infrastructure.

Its operation should be in compliance with the COPEL requirements for the software interface. Legal and security requirements should also be observed. Thus, data concerning customers and geographical locations should be kept confidential, as well as the metering information.

3.2.2 Use Case 3: Fraud Detection Application

In the fraud detection application, the phasor report from each COPEL consumer with advanced metering infrastructure (AMI) will be analysed. The phasor report table is an electrical parameters report of the metering data, which is acquired every 15 minutes. It is composed of the following parameters: voltage (phases A, B and C), voltage angle (phases A, B and C), current (phases A, B and C), current angle (phases A, B and C), power (phases A, B and C), power factor (phases A, B and C), phase-to-phase voltage (AB, BC and AC), harmonic distortion, frequency, demand (pulse) (1, 2 and 3) and totaliser (pulse) (1, 2 and 3).

The required analyses include the calculation of the correlation coefficient matrix of every parameter in the phasor report over a fixed time window, correlating each one of the 14,000 AMI customers of COPEL. The correlation coefficients will be applied to an ANN algorithm in order to create automatic clusters, which will be used to create a ranking of potential fraud consumer units. By conducting field


tests based on the algorithm results, COPEL will be able to feed the algorithm with data to optimise its results.
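The correlation step can be sketched for a single phasor-report parameter as follows. The sample readings and the three-meter scale are illustrative assumptions; the full application would build one 14,000 by 14,000 matrix per parameter and time window.

```python
import math

# Pearson correlation over one phasor-report parameter (e.g. phase-A
# voltage), with one time series of 15-minute readings per consumer.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(series):
    # series: list of equal-length reading lists, one per meter
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    meters = [[220.1, 219.8, 220.5, 221.0],   # meter 0
              [220.0, 219.9, 220.4, 221.1],   # meter 1, moves with meter 0
              [230.0, 228.0, 226.0, 224.0]]   # meter 2, opposite trend
    m = correlation_matrix(meters)
    print(round(m[0][1], 2), round(m[0][2], 2))
```

At the stated scale the matrix is symmetric, so only the upper triangle actually needs to be computed and stored.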

System Overview

The main entities involved in the fraud detection use case are listed in Table 3.2. Figure 3.4 gives an overview of the different activities of this use case.

Actor                         | Goal
Metering Data Collector (MDC) | Store metering and customer information
COPEL                         | Give feedback on fraud-attempt field inspection

Table 3.2: Actors in the fraud detection use case

Figure 3.4: Overview of the fraud detection use case

Functional Requirements

1. The following information must be included in the database: metering and customer information, including metering point, alarms data and phasor report.

2. The correlation coefficient matrix will be calculated based on the phasor report.

(a) One correlation coefficient matrix (14,000 by 14,000) will be calculated for each parameter of the phasor report and for a pre-determined time-window duration and time interval (COPEL has 14,000 consumers with advanced metering infrastructure).


(b) All 29 parameters × time-window duration × time interval matrices will be stored in the database.

3. The algorithm will compare the correlation coefficients between meters, as well as the correlation coefficients of single meters over time, in order to indicate likely fraud attempts.

4. The correlation coefficients will be applied to an ANN algorithm in order to create automatic clusters. The results will be used to create a ranking of potential fraud consumer units.

5. The output of the data analysis will be delivered to COPEL in a specific interface, separated from the MDM system.

6. The user interface will consist of a fraud-attempt ranking table.

7. COPEL should conduct a field inspection at the metering points ranked as most likely to have a fraud attempt.

8. COPEL should give feedback through a specific interface.

9. The software must adjust the fraud detection algorithm based on user feedback.
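Requirement 3 above, comparing a single meter's correlation coefficients over time, could be sketched as follows. The "largest drop" score and the sample values are illustrative assumptions only; the actual ranking uses the ANN clustering of requirement 4.

```python
# Illustrative sketch: a meter whose consumption profile suddenly stops
# tracking its historical profile (a correlation drop between consecutive
# time windows) is ranked as a likely fraud attempt.
def fraud_ranking(window_correlations):
    # window_correlations: {meter_id: [corr(window t, window t+1), ...]}
    scores = {}
    for meter, corrs in window_correlations.items():
        # Larger drops from near-1.0 self-correlation score higher.
        scores[meter] = max(1.0 - c for c in corrs)
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    history = {
        "meter-17": [0.97, 0.95, 0.31],  # abrupt profile change
        "meter-04": [0.96, 0.94, 0.95],
        "meter-09": [0.88, 0.90, 0.86],
    }
    print(fraud_ranking(history))  # meter-17 is ranked first
```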

Interface Functional Requirements

1. The fraud-attempt ranking table consists of the following columns:

(a) Geographic location of the metering point.

(b) Fraud ranking: ranking of the potential fraud attempt, based on the correlation coefficient matrices.

(c) Fraud confirmation: a combo box in which the user can select whether the fraud was confirmed, was not confirmed, or has not been verified yet.

Non-Functional Requirements

1. Software performance: For the first run, where the software must analyse the whole data set, create all the correlation matrices and train the ANN algorithm, the software may take more time, from several hours to a few days. However, in regular operation after user feedback, the software must present results within a few seconds.

2. Operating constraints: The software development must follow the COPEL requirements for the software interface.

3. Usability requirement: the software should be simple to use; software training should not be necessary.

4. External requirement: Data concerning customers and geographical locations must be anonymised.

5. Security and legal requirements: some of the metering information is confidential and should be kept confidential.


6. Interoperability requirement: the system should communicate with an SQLite database.

7. Storage requirement: all databases must be stored in the cloud in order to meet the performance requirements.

Security

The security requirements that apply to the database and the electrical data are the same as described for the fault analysis application. However, the data generated by this application is also highly sensitive. As it will be used to rank and localise fraudulent behaviour within the metering system, access to this data by unauthorised persons can directly impact the fraud detection process and the validation of the algorithm.

Therefore, access to the dataset and the application must be restricted to specific employees in the metering and anti-fraud areas of the utility.

Conclusion

This application should help COPEL identify the main fraudulent consumer units. To this end, the MDC will provide the metering phasor report so that the software can calculate the correlation coefficient matrices and the algorithm can rank the potential fraud consumer units, outputting these data in a user interface.

The initial data processing has relaxed performance requirements, since it is performed only once. However, in regular operation after user feedback, the software should present its results within a few seconds. All databases must be stored in the cloud in order to meet the performance requirements.

Its operation should comply with COPEL's requirements for software interfaces. Legal and security requirements must also be observed. Thus, data concerning customers and geographical locations must be kept confidential, as must the metering information.

3.3 Use Case 4: Smart Metering Scenarios

3.3.1 Introduction

This section describes the functional and non-functional (security and privacy) requirements of a smart metering system, considering consumers and utilities as end users. These requirements are defined according to three scenarios. Each scenario consists of smart meters, customers and utilities with their interactions. Intrinsically, the meter's capabilities define each scenario presented.

In the first scenario discussed, each meter periodically sends its energy consumption information to the cloud. The utility can access this information from the cloud to bill the consumers or for demand control planning. Consumers receive consumption information in real time.

The second scenario covers a low-cost meter, whereas the third scenario introduces a more powerful smart meter which can execute some of the necessary computations locally.

System overview

Table 3.3 gives an overview of the entities involved in all the smart metering use cases and scenarios.


Actor        Goal
Smart Meter  Register consumption data
Consumer     Retrieve consumption data
Utility      Retrieve load data; retrieve billing data

Table 3.3: Actors used in the metering use cases

3.3.2 Periodical Meter Scenario

In this scenario the meter sends its energy usage measurements to the cloud. Each meter sends its data periodically. The utility requests the metering application running in the cloud to aggregate the consumption data in order to obtain either billing data or load data. Figure 3.5 shows an overview of the different use cases of this scenario.

Figure 3.5: Use cases diagram of periodical meter scenario
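The aggregation requested by the utility can be illustrated with a short sketch. The data layout, field names, and units below are hypothetical illustrations, not a format defined by the project: billing data sums each meter's readings over the billing period, while load data sums across all meters per time interval.

```python
from collections import defaultdict

# Hypothetical readings sent periodically by meters: (meter_id, interval, watt_hours).
readings = [
    ("meter-1", 0, 1200), ("meter-1", 1, 800),
    ("meter-2", 0, 2000), ("meter-2", 1, 1500),
]

def billing_data(readings):
    """Per-meter consumption total over the billing period (basis for billing)."""
    totals = defaultdict(int)
    for meter, _interval, wh in readings:
        totals[meter] += wh
    return dict(totals)

def load_data(readings):
    """Aggregate grid load per time interval, across all meters."""
    load = defaultdict(int)
    for _meter, interval, wh in readings:
        load[interval] += wh
    return dict(load)

print(billing_data(readings))  # {'meter-1': 2000, 'meter-2': 3500}
print(load_data(readings))     # {0: 3200, 1: 2300}
```

The point of the sketch is that the same raw readings serve both use cases; only the grouping key (meter versus time interval) differs.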

Use Case: Register consumption data

The first use case relates to the measurement of the energy consumption and the transmission of the measured values to the cloud.

If the use case is successful, the correct measured values are stored in the cloud. Everything else is considered a failure. This is especially the case if no measured values can be stored or if wrong values are stored, be it because of a communication error or because of an active attack.

Besides protecting its integrity, the confidentiality of the meter data has to be ensured.


Characteristic information
Goal in Context: To register electricity consumption in the cloud.
Scope: Metering system.
Level: Primary task.
Preconditions: Smart meter is connected to the grid and active. Sensors are working. Communication means are functioning.
Success End Condition: Data is stored in the cloud.
Failed End Condition: Data is not stored in the cloud. No consumption is registered. Tampered data is stored in the cloud.
Primary Actor: Smart meter.
Secondary Actor: Cloud.
Trigger: Period timeout.

Table 3.4: Overview of the use case related to measuring and registering consumption data

Main success flow:
1. Metering data reaches a period timeout.
2. Meter establishes a secure connection with the cloud.
3. Meter sends current metering data.

Failure scenarios:
2a. Meter establishes an insecure connection with the cloud.
2b1. Meter cannot establish a connection.
2b2. Data is never sent.
3a. Cloud receives modified metering data.

Table 3.5: Success and failure cases

Use Case: Retrieve consumption data

The second use case relates to the retrieval of the consumption data by the end user (customer).

Characteristic information
Goal in Context: To retrieve electricity consumption from the cloud.
Preconditions: Consumer already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Consumer.
Secondary Actor: Cloud.
Trigger: Consumer solicitation.

Table 3.6: Overview of the use case related to retrieving consumption data


Main success flow:
1. Consumer asks the server for his consumption.
2. Consumer establishes a secure connection with the cloud.
3. Consumer retrieves his relevant consumption data.

Failure scenarios:
2a. Consumer establishes an insecure connection with the cloud.
2b1. Consumer cannot establish a connection.
2b2. Data is never received.
3a. Consumer receives modified metering data.

Table 3.7: Success and failure cases

Use Case: Retrieve billing data

The third use case relates to the retrieval of the billing data by the utility.

Characteristic information
Goal in Context: To retrieve electricity billing data from the cloud.
Preconditions: Cloud already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Utility.
Secondary Actor: Cloud.
Trigger: Utility solicitation.

Table 3.8: Overview of the use case related to retrieving billing data

Main success flow:
1. Utility asks the server for the billing data.
2. Utility establishes a secure connection with the cloud.
3. Utility retrieves its relevant billing data.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Data is never received.
3a. Utility receives modified metering data.

Table 3.9: Success and failure cases


Use Case: Retrieve load data

The fourth use case relates to the retrieval of the load data by the utility.

Characteristic information
Goal in Context: To retrieve electricity load data from the cloud.
Preconditions: Cloud already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Utility.
Secondary Actor: Cloud.
Trigger: Utility solicitation.

Table 3.10: Overview of the use case related to retrieving load data

Main success flow:
1. Utility asks the server for the load data.
2. Utility establishes a secure connection with the cloud.
3. Utility retrieves its relevant load data.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Data is never received.
3a. Utility receives modified metering data.

Table 3.11: Success and failure cases

Security Requirements

Based on the scenario and the use cases outlined above, the following security requirements can be derived:

• Confidentiality (including privacy)

– Detailed load signatures must be kept confidential. Only the affected consumer should be able to see his consumption data in full detail.

– The utility should be allowed to access the data only in an aggregated manner, to ensure the users' privacy.

• Integrity (including non-repudiation)

– The integrity as well as the accountability (including non-repudiation) must be ensured. This is critical due to billing and load management.

• Availability

– Some availability needs to be ensured, but no guarantee of real-time access to the data is necessary.


3.3.3 Low-Cost Meter Scenario

This scenario is very similar to the previous one. The main difference is that here meters are employed in places where confidentiality requirements are not important; they can be used in public areas or places with low consumption. Each meter sends samples of voltage and current directly to the cloud, which then calculates the consumption (cf. Figure 3.6).

Figure 3.6: Use cases diagram of low-cost meter scenario

Use Case: Send sensor data

The first use case relates to the sending of metered values to the cloud.


Characteristic information
Goal in Context: To send the sensor data to the cloud.
Scope: Metering system.
Level: Primary task.
Preconditions: Smart meter is connected to the grid and active. Sensors are working. Communication means are functioning.
Success End Condition: Data is sent to the cloud.
Failed End Condition: Data is not sent. No consumption is registered. Wrong consumption is registered.
Primary Actor: Smart meter.
Secondary Actor: Cloud.
Trigger: Period timeout.

Table 3.12: Overview of the use case related to sending sensor data to the cloud

Main success flow:
1. Metering data reaches a period timeout.
2. Meter establishes a connection with the cloud.
3. Meter sends current readings.

Failure scenarios:
2b1. Meter cannot establish a connection.
2b2. Data is never sent.
3a. Cloud receives modified metering data.

Table 3.13: Success and failure cases

Use Case: Calculate consumption

The second use case is related to the calculation of the consumption data.

Characteristic information
Goal in Context: To calculate electricity consumption in the cloud.
Scope: Metering system.
Level: Primary task.
Preconditions: Sensor data received from the meter.
Success End Condition: Correctly calculated consumption data is stored in the cloud.
Failed End Condition: Calculated consumption data is not stored in the cloud or the calculation is wrong.
Primary Actor: Cloud.
Secondary Actor: Smart Meter.
Trigger: Data received.

Table 3.14: Overview of the use case related to calculating the consumption data in the cloud


Main success flow:
1. Data is received.
2. Cloud executes the consumption algorithm for the current readings.
3. Consumption data is stored.

Failure scenarios:
3a. Consumption data cannot be stored.
3b. Calculation leads to wrong results.

Table 3.15: Success and failure cases
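The consumption calculation performed by the cloud can be sketched as follows. This is a minimal illustration assuming evenly spaced, synchronous voltage/current samples; the sampling rate and values are hypothetical, and a real implementation would handle AC power factors and sensor calibration:

```python
def consumption_wh(voltages, currents, dt_seconds):
    """Approximate energy in watt-hours from synchronous voltage/current samples.

    Instantaneous power is p = v * i; the energy is the integral of p over
    time, approximated here by a rectangle rule at a fixed sample interval.
    """
    joules = sum(v * i for v, i in zip(voltages, currents)) * dt_seconds
    return joules / 3600.0  # 1 Wh = 3600 J

# Illustrative samples: a constant 230 V / 10 A load for one hour at 1 s resolution.
volts = [230.0] * 3600
amps = [10.0] * 3600
print(consumption_wh(volts, amps, 1.0))  # 2300.0 (Wh)
```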

Use Case: Retrieve consumption data

The third use case is related to the retrieving of the consumption data.

Characteristic information
Goal in Context: To retrieve electricity consumption from the cloud.
Preconditions: Consumer already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Consumer.
Secondary Actor: Cloud.
Trigger: Consumer solicitation.

Table 3.16: Overview of the use case related to retrieving of the consumption data

Main success flow:
1. Consumer asks the server for his consumption.
2. Consumer establishes a secure connection with the cloud.
3. Consumer retrieves his relevant consumption data.

Failure scenarios:
2a. Consumer establishes an insecure connection with the cloud.
2b1. Consumer cannot establish a connection.
2b2. Data is never received.
3a. Consumer receives modified metering data.

Table 3.17: Success and failure cases

Use Case: Retrieve billing data

The fourth use case is related to the retrieval of the billing data. As in the previous scenario, the entity interested in retrieving the billing data (as well as the load data, cf. the next use case) is the utility.


Characteristic information
Goal in Context: To retrieve electricity billing data from the cloud.
Preconditions: Cloud already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Utility.
Secondary Actor: Cloud.
Trigger: Utility solicitation.

Table 3.18: Overview of the use case related to retrieving of the billing data

Main success flow:
1. Utility asks the server for the billing data.
2. Utility establishes a secure connection with the cloud.
3. Utility retrieves its relevant billing data.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Data is never received.
3a. Utility receives modified metering data.

Table 3.19: Success and failure cases

Use Case: Retrieve load data

The fifth use case is related to the retrieval of the load data.

Characteristic information
Goal in Context: To retrieve electricity load data from the cloud.
Preconditions: Cloud already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Utility.
Secondary Actor: Cloud.
Trigger: Utility solicitation.

Table 3.20: Overview of the use case related to retrieving of the load data

Main success flow:
1. Utility asks the server for the load data.
2. Utility establishes a secure connection with the cloud.
3. Utility retrieves its relevant load data.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Data is never received.
3a. Utility receives modified metering data.

Table 3.21: Success and failure cases


Security Requirements

The security requirements are similar to those of the previous scenario, except that the confidentiality of the metered values is not important.

• Confidentiality (including privacy)

– Not critical

• Integrity (including non-repudiation)

– Critical due to billing and load management

• Availability

– Real-time availability of the data is not critical.

3.3.4 Aggregated Metering Scenario

In this scenario the meter calculates its consumption and stores the detailed usage only locally. When the billing period is reached, it sends the aggregated value to the cloud (Figure 3.7).

Figure 3.7: Use cases diagram of aggregated metering scenario


Use Case: Register consumption data

This use case is related to the sending of the consumption data by the smart meter. Note that, in contrast to the previous scenarios, not the measured values themselves are sent to the cloud but only the aggregate calculated locally by the smart meter. This increases the privacy of the user.

Characteristic information
Goal in Context: To register electricity consumption in the cloud.
Scope: Metering system.
Level: Primary task.
Preconditions: Smart meter is connected to the grid and active. Sensors are working. Communication means are functioning.
Success End Condition: Data is stored in the cloud.
Failed End Condition: Data is not stored in the cloud. No consumption is registered.
Primary Actor: Smart meter.
Secondary Actor: Cloud.
Trigger: Billing period.

Table 3.22: Overview of the use case related to sending the aggregated consumption data

Main success flow:
1. Meter reaches a billing period.
2. Meter establishes a secure connection with the cloud.
3. Meter sends calculated consumption data.

Failure scenarios:
2a. Meter establishes an insecure connection with the cloud.
2b1. Meter cannot establish a connection.
2b2. Data is never sent.

Table 3.23: Success and failure cases

Use Case: Retrieve billing data

This use case is related to the retrieving of the billing data by the utility.

Characteristic information
Goal in Context: To retrieve electricity billing data from the cloud.
Preconditions: Cloud already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Utility.
Secondary Actor: Cloud.
Trigger: Utility solicitation.

Table 3.24: Overview of the use case related to retrieving the billing data


Main success flow:
1. Utility asks the server for the billing data.
2. Utility establishes a secure connection with the cloud.
3. Utility retrieves its relevant billing data.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Data is never received.
3a. Utility receives modified metering data.

Table 3.25: Success and failure cases

Use Case: Retrieve consumption data

This use case is related to the retrieving of the consumption data by the customer.

Characteristic information
Goal in Context: To retrieve electricity consumption from the cloud.
Preconditions: Consumer already has data stored.
Success End Condition: Data is retrieved from the cloud.
Failed End Condition: Data is not retrieved from the cloud. Wrong data is retrieved.
Primary Actor: Consumer.
Secondary Actor: Cloud.
Trigger: Consumer solicitation.

Table 3.26: Overview of the use case related to retrieving the consumption data

Main success flow:
1. Consumer asks the server for his consumption.
2. Consumer establishes a secure connection with the cloud.
3. Consumer retrieves his relevant consumption data.

Failure scenarios:
2a. Consumer establishes an insecure connection with the cloud.
2b1. Consumer cannot establish a connection.
2b2. Data is never received.
3a. Consumer receives modified metering data.

Table 3.27: Success and failure cases

Security Requirements

This scenario has similar requirements to the first scenario, but parts of the confidentiality problem are solved by keeping the detailed consumption data stored locally in the meter instead of in the cloud.

• Confidentiality (including privacy)

– Detailed load signature is kept in the meter only.


• Integrity (including non-repudiation)

– Critical due to billing.

• Availability

– Data is not critical in real-time.

3.3.5 General Use Cases

The use cases presented below apply to all of the scenarios presented previously.

Use Case: Remote switch

The first use case relates to remotely switching the power on and off.

Characteristic information
Goal in Context: To turn a meter in the grid on or off.
Scope: Control system.
Preconditions: Smart meter is connected to the grid. Communication means are functioning.
Success End Condition: Smart Meter is turned on/off.
Failed End Condition: Wrong state is set.
Primary Actor: Utility.
Secondary Actor: Smart Meter, Cloud.
Trigger: Utility request.

Table 3.28: Overview of the use case related to remotely switching the power

Main success flow:
1. Utility requests to send the command to the meter.
2. Utility establishes a secure connection with the cloud.
3. Cloud executes the command in the meter.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Command is never received.
3a. Meter does not switch the power on/off.

Table 3.29: Success and failure cases

Security Requirements

For this use case, integrity and availability are especially important.

• Confidentiality


– Not critical.

• Integrity

– Critical to avoid invalid disconnections, which could happen if an attacker were able to undetectably manipulate the commands sent to the smart meter.

• Availability

– Important to turn the power back on as soon as possible.

Use Case: Remote configuration

The second use case deals with the remote configuration of the meter.

Characteristic information
Goal in Context: To remotely set up or calibrate a meter.
Scope: Control system.
Preconditions: Smart meter is connected to the grid. Communication means are functioning.
Success End Condition: Operational command is executed.
Failed End Condition: Wrong state is set.
Primary Actor: Utility.
Secondary Actor: Smart Meter, Cloud.
Trigger: Utility request.

Table 3.30: Overview of use case regarding remote configuration of a meter

Main success flow:
1. Utility requests to send the command to the meter.
2. Utility establishes a secure connection with the cloud.
3. Cloud executes the command in the meter.

Failure scenarios:
2a. Utility establishes an insecure connection with the cloud.
2b1. Utility cannot establish a connection.
2b2. Command is never received.
3a. Wrong command is executed.

Table 3.31: Success and failure cases

Security Requirements

Again, regarding the security requirements, integrity and availability are more important than confidentiality.

• Confidentiality


– Generally, not very important.

• Integrity

– Critical to avoid invalid results.

• Availability

– Important to return the meter to an operational state as soon as possible.

Use Case: Time of use price

This use case deals with the installation of time-of-use prices in the meter.

Characteristic information
Goal in Context: To send the time-of-use price table to the meter.
Scope: Dynamic pricing.
Preconditions: Communication means are functioning.
Success End Condition: Meter receives the price value.
Failed End Condition: Wrong price is received.
Primary Actor: Cloud.
Secondary Actor: Meter, Utility.
Trigger: Price change.

Table 3.32: Overview of use case for installing time of use prices

Main success flow:
1. Current pricing scheme is changed by the utility.
2. Cloud establishes a connection with the meter.
3. Cloud sends current pricing.

Failure scenarios:
2a. Cloud establishes an insecure connection with the meter.
2b1. Cloud cannot establish a connection.
2b2. Data is never received.
3a. Meter receives wrong prices.

Table 3.33: Success and failure cases

Security Requirements

• Confidentiality

– Not critical.

• Integrity

– Critical due to billing.


• Availability

– Critical due to billing.

3.3.6 Summary and Conclusion

In this chapter different use cases related to the smart grid were presented. Based on these use cases, various requirements on the SecureCloud platform were derived. Besides obvious ones like fast and cost-effective processing of large amounts of data, the security-related requirements are especially important. As illustrated, all three important security goals (namely confidentiality, integrity and availability) play an important role in the implementation of the described use cases, and it is therefore paramount that the SecureCloud platform enables application developers to reach these security goals without much effort.


4 Architecture

In this chapter we describe the envisioned SecureCloud architecture and platform. We first give an overview of the general concepts, technologies and mechanisms used. Afterwards we explain how a single process (microservice) can be securely executed on a single host. As cloud-based distributed applications are composed of many such processes, we then explain how these instances can securely communicate with each other. Having explained the basic building blocks of our architecture, we continue with a description of the orchestration process, i.e., how to set up and run a distributed application within the SecureCloud platform. Finally, we report on some early results regarding the practicability of the SecureCloud platform in terms of performance evaluations.

4.1 General concepts

In SecureCloud we envision an architecture built upon recent technologies that enable big data applications to run in the cloud without the risk that sensitive data could be disclosed or modified in an unauthorised way. For this we utilise novel hardware features in current Intel CPUs, namely the Software Guard Extensions (SGX).

SGX allows the execution of code in a secure environment in which the integrity of the application logic and the confidentiality of the processed data are protected. Even the provider of the hardware (i.e., the cloud provider) is not able to infer sensitive data by tampering with the application. An overview of the architecture is shown in Figure 4.1.

Figure 4.1: Overview of the SecureCloud architecture.

The fundamental building blocks of SecureCloud applications are microservices. Each microservice is a small program with a particular task that communicates with other microservices over the network. The collaboration of a number of microservices forms the actual application. This approach has several advantages compared to more “classical” component-based application designs: it increases cohesion and decreases coupling, so that functionality can be changed or added more easily. Microservices are exchangeable and can be implemented with different technologies.

It is very popular to focus on REST as a means for microservices to communicate with each other. Nevertheless, we also envision an event bus to connect microservices in a secure and efficient way.
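The event-bus style of coupling can be sketched as a minimal in-process publish/subscribe mechanism. This is illustrative only; the topic names and the two subscribing "services" are hypothetical, and the actual SecureCloud event bus, its API, and its security mechanisms are defined by the project, not here:

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal topic-based publish/subscribe bus (illustrative sketch)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Every subscriber of the topic receives the event; the publisher
        # does not need to know who (or how many) the consumers are.
        for handler in self._subscribers[topic]:
            handler(event)

# Two hypothetical microservices coupled only through the bus:
bus = EventBus()
billing_inbox = []
bus.subscribe("meter.readings", billing_inbox.append)  # billing service
bus.subscribe("meter.readings", lambda e: None)        # load-planning service
bus.publish("meter.readings", {"meter": "meter-1", "kwh": 1.2})
print(billing_inbox)  # [{'meter': 'meter-1', 'kwh': 1.2}]
```

The design point is decoupling: a new consumer of meter readings can be added by subscribing to the topic, without changing the publisher.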

Containers will be used to separate individual microservices. Containers are a virtualisation on the operating-system level; they enable the isolation of multiple applications with almost no overhead and may also provide some form of resource management. It is important to understand that our containers are not based on whole virtual machines and related hypervisors, which are able to isolate full operating systems from each other.


To leverage the advantages of containers, we will use a SecureCloud-enhanced version of Docker[1] to run our microservices. Docker is a software which allows the automatic deployment of applications in containers. Docker containers are based on Linux kernel functionalities like cgroups and namespaces to achieve isolation of resources (processor, memory, ...) and an isolated view of the operating system.

The main protection of the application in the SecureCloud architecture will be provided by Intel SGX, through so-called enclaves. Although it could be feasible to achieve the intended security level relying solely on Intel SGX, we will follow the principle of defense in depth. Multiple layers of security keep the system secure in the case of new vulnerabilities. Even if flaws in Intel SGX are found in the future, an attacker would have to overcome the other security layers to exploit them. By following this principle, attack surfaces are reduced and hardened.

As the first level of defense we utilise secure and trusted boot. This ensures that we start the correct operating system, so that we have a trusted computing base (TCB) for the execution of all further services.

We also leverage standard Docker security features like Docker Content Trust (DCT), which allows images to be digitally signed, as explained in more detail in Section 4.4.1.

4.2 Microservice runtime

The SecureCloud architecture of a single computation node is built around the SGX enclave as the main protection means. Ultimately a minimal TCB is aspired to, in which only the microservice implementation resides inside the enclave, while all management and communication tasks that can be implemented without endangering the objectives of the system are carried out in untrusted space. As the implementation of this goal is part of the concurrent research in WP 4, we currently do not want to restrict the solution space through over-specification of the single-node architecture. Instead, we developed SCONE to investigate the influence of the SGX protection on the performance of existing programs. This approach enables us to gain insight into the behaviour of real applications within enclaves, which will help the design process of the single-node architecture. Moreover, SCONE allows us to draw upon established web servers, such as Apache or NGINX, to build the REST bridge of our microservice architecture, instead of engaging in the error-prone task of creating new software for such common tasks as HTTP parsing and generation.

4.2.1 Runtime environment: SCONE

The Secure COntainer Environment (SCONE) is a secure container mechanism for Docker that uses the SGX trusted execution support of Intel CPUs to protect container processes from outside attacks. It executes whole applications within SGX enclaves.[2] The performance impact of SGX is mitigated through the use of user-level thread scheduling and an asynchronous system call interface.
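The idea behind the asynchronous system call interface can be illustrated with a toy model: the application side enqueues call descriptions instead of trapping into the kernel, and a worker outside the (simulated) enclave services the queue and posts responses back. Python's `queue.Queue` and the toy "write" call below merely stand in for SCONE's lock-free queues and real system calls; the actual mechanism is implemented in C inside the enclave runtime:

```python
import queue
import threading

# Stand-ins for SCONE's lock-free request and response queues, which live
# in untrusted memory shared between the enclave and the outside world.
requests = queue.Queue()
responses = queue.Queue()

def untrusted_worker():
    """Runs outside the simulated enclave and performs the actual calls."""
    while True:
        call_id, name, args = requests.get()
        if name == "exit":
            break
        # Toy 'syscall': pretend to write a buffer and report the byte count.
        result = len(args[0]) if name == "write" else -1
        responses.put((call_id, result))

worker = threading.Thread(target=untrusted_worker)
worker.start()

# 'Enclave' side: enqueue the call instead of trapping into the kernel,
# then pick up the response from the response queue.
requests.put((1, "write", (b"hello",)))
call_id, result = responses.get()
requests.put((0, "exit", ()))
worker.join()
print(call_id, result)  # 1 5
```

The enclave threads never leave the enclave to issue a call, which is the source of the performance benefit described above.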

SCONE is mainly implemented as a modification of the standard musl C library.[3] It can be used by linking a program against this library. Figure 4.2 shows SCONE's architecture. The whole application together with its specific libraries, as well as the majority of SCONE, is executed within the enclave and

[1] https://www.docker.com/
[2] Remember that our containers isolate just processes (applications) from each other, but not whole operating systems (like virtual machines and hypervisors do).
[3] https://www.musl-libc.org/


Figure 4.2: SCONE architecture. The enclave (trusted) contains the application code, the application-specific libraries, the file system and network shields, and the SGX-aware C library with M:N threading and an asynchronous system call interface; system call requests and responses are passed via lock-free queues between the enclave and the untrusted SCONE kernel module and Intel SGX driver in the host operating system (Linux).

therefore protected from unauthorized access. The communication of the application with the host OS is done via system calls. These, together with referenced buffers, are transferred by SCONE's system call interface via lock-free queues to the kernel, as the kernel is unable to read the buffers located within the enclave.
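The queue-based system call interface can be illustrated with a minimal sketch. All names here are our own, and Python queues and threads merely stand in for SCONE's lock-free queues shared with the kernel module; the sketch only shows how requests and responses flow between the enclave and the untrusted side.

```python
import queue
import threading

# Stand-ins for SCONE's lock-free queues: the enclave side enqueues
# syscall requests (arguments copied to untrusted memory, since the
# kernel cannot read enclave memory), an untrusted worker executes them
# and returns responses.
request_q = queue.Queue()   # enclave -> untrusted
response_q = queue.Queue()  # untrusted -> enclave

def syscall_worker():
    """Untrusted thread: performs the system call on behalf of the enclave."""
    while True:
        call_id, name, args = request_q.get()
        # In reality this would issue the real syscall; here we just echo.
        result = f"{name}({', '.join(map(str, args))}) done"
        response_q.put((call_id, result))

def enclave_syscall(call_id, name, *args):
    """Enclave side: copy arguments out, enqueue the request, await the response."""
    request_q.put((call_id, name, list(args)))
    rid, result = response_q.get()
    assert rid == call_id
    return result

threading.Thread(target=syscall_worker, daemon=True).start()
print(enclave_syscall(1, "write", 1, "hello"))  # -> write(1, hello) done
```

The asynchronous design matters because every enclave exit is expensive on SGX; batching requests through shared queues keeps the enclave threads running.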

So far, many popular services, such as Redis and Memcached, have been created under the assumption that the underlying OS is trusted. Such services therefore store files in the clear, communicate with other processes via unencrypted TCP channels (i.e., without TLS), and output to stdout and stderr directly. To protect such services in secure containers, SCONE supports a set of shields. Shields focus on (1) preventing low-level attacks, such as the OS kernel controlling pointers and buffer sizes passed to the service; and (2) ensuring the confidentiality and integrity of the application data passed through the OS. A shield is enabled by statically linking the service with a given shield library. SCONE supports shields for (1) the transparent encryption of files, (2) the transparent encryption of communication channels via TLS, and (3) the transparent encryption of console streams. When a file descriptor is opened, SCONE can associate the descriptor with a shield. A shield also has configuration parameters, which are encrypted and can only be accessed after the enclave has been initialized. Note that the shields described below only focus on application data and do not verify data maintained by the OS, such as file system metadata. If the integrity of such data is important, further shields can be added.

File system shield. The file system shield protects the confidentiality and integrity of files: files are authenticated and encrypted, transparently to the service. For the file system shield, a container image creator must define three disjoint sets of file path prefixes: prefixes of (1) unprotected files, (2) encrypted and authenticated files, and (3) authenticated files. When a file is opened, the shield determines the


longest matching prefix for the file name. Depending on the match, the file is authenticated, encrypted, or just passed through to the host OS. The file system shield splits files into blocks of fixed sizes. For each block, the shield keeps an authentication tag and a nonce in a metadata file. The metadata file is also authenticated to detect modifications. The keys used to encrypt and authenticate files, as well as the three prefix sets, are part of the configuration parameters passed to the file system shield during startup. For immutable file systems, the authentication tag of the metadata file is part of the configuration parameters for the file system shield. Containerized services often only use a read-only file system and consider writes to be ephemeral. While processes in a secure container have access to the standard Docker tmpfs, it requires the use of inefficient enclave memory. As an alternative that uses non-enclave memory, SCONE also supports a dedicated secure ephemeral file system through its file system shield. The shield ensures the integrity and confidentiality of ephemeral files: the ephemeral file system only maintains the state of modified files in non-enclave memory. The ephemeral file system implementation is resilient against rollback attacks: after restarting the container process, the file system returns to a preconfigured startup state that is validated by the file system shield, so it is not possible for an attacker to roll back the file system to an intermediate state.
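The two core mechanisms of the shield described above, longest-prefix path classification and per-block authentication, can be sketched as follows. The prefix sets, key, and block size are illustrative values, not SCONE's actual configuration format, and an HMAC stands in for the real authenticated encryption.

```python
import hashlib
import hmac
import os

# Three disjoint prefix sets decide how a file is handled; the longest
# matching prefix wins (illustrative paths, not a real configuration).
PREFIXES = {
    "/data/public/": "unprotected",
    "/data/":        "encrypted+authenticated",
    "/data/logs/":   "authenticated",
}

def classify(path):
    matches = [p for p in PREFIXES if path.startswith(p)]
    if not matches:
        return "unprotected"
    return PREFIXES[max(matches, key=len)]  # longest matching prefix

# Per-block integrity: each fixed-size block gets a nonce and an
# authentication tag, kept in a metadata structure (here a plain list).
BLOCK_SIZE = 16
KEY = b"demo-file-shield-key"

def protect(data):
    meta = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        nonce = os.urandom(12)
        meta.append((nonce, hmac.new(KEY, nonce + block, hashlib.sha256).digest()))
    return meta

def verify(data, meta):
    for i, (nonce, tag) in enumerate(meta):
        block = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        if not hmac.compare_digest(tag, hmac.new(KEY, nonce + block, hashlib.sha256).digest()):
            return False
    return True

print(classify("/data/logs/app.log"))  # -> authenticated (longest prefix wins)
```

In the real shield the metadata file is itself authenticated, which the sketch omits.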

Network shield. Some container services, such as Apache and NGINX, always encrypt network traffic; others, such as Redis and Memcached, assume that the traffic is protected by orthogonal means, such as TLS proxies, which terminate the encrypted connection and forward the traffic to the service in plaintext. Such a setup is only appropriate for data centers in which the communication between the proxy and the service is assumed to be trusted, which is incompatible with our threat model: an attacker could control the unprotected channel between the proxy and the service and modify the data. Therefore, for secure containers, a TLS network connection must be terminated inside the enclave. SCONE permits clients to establish secure tunnels to container services using TLS. It wraps all socket operations and redirects them to a network shield. The network shield, upon establishing a new connection, performs a TLS handshake and encrypts/decrypts any data transmitted through the socket. This approach does not require client- or service-side changes. The private key and certificate are protected by the file system shield.

Console shield. Container environments permit authorized processes to attach to the stdin, stdout, and stderr console streams. To ensure the confidentiality of application data sent to these streams, SCONE supports transparent encryption for them, and only an authorized SCONE client can access the encrypted data. Console streams are unidirectional, which means that they cannot be protected by the network shield, whose underlying TLS implementation requires bidirectional streams. A console shield encrypts a stream by splitting it into variable-sized blocks based on flushing patterns. A stream is protected against replay and reordering attacks by assigning each block a unique identifier, which is checked by the authorized SCONE client.
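The replay and reordering protection of the console shield can be sketched with strictly increasing block identifiers. Encryption is elided here; an HMAC stands in for the authentication, and all class and key names are our own illustration, not SCONE's API.

```python
import hashlib
import hmac

KEY = b"demo-console-key"  # illustrative shared key

class ConsoleSealer:
    """Enclave side: seals each flushed block with a unique, increasing id."""
    def __init__(self):
        self._next = 0

    def seal(self, payload: bytes):
        block_id = self._next
        self._next += 1
        header = block_id.to_bytes(8, "big")
        tag = hmac.new(KEY, header + payload, hashlib.sha256).digest()
        return (block_id, payload, tag)

class ConsoleVerifier:
    """Authorized client side: accepts blocks only in strict order."""
    def __init__(self):
        self.expected = 0

    def accept(self, block):
        block_id, payload, tag = block
        header = block_id.to_bytes(8, "big")
        if not hmac.compare_digest(tag, hmac.new(KEY, header + payload, hashlib.sha256).digest()):
            raise ValueError("tampered block")
        if block_id != self.expected:
            raise ValueError("replayed or reordered block")
        self.expected += 1
        return payload

sealer, verifier = ConsoleSealer(), ConsoleVerifier()
print(verifier.accept(sealer.seal(b"service started")).decode())
```

Because the identifiers must arrive in exact sequence, replaying an old block or swapping two blocks is immediately detected.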

SCONE may lead to a larger TCB size than a strictly separated application, but it allows us to rapidly move existing code into an enclave just by recompiling it. Therefore, we are able to deliver early performance estimates and directly use well-tested existing applications.

4.3 Communication

In the previous section we outlined how a single instance of a microservice can be securely executed on a single host. But modern distributed applications are composed of many such microservices, which need


to be able to communicate securely with each other. In this section we briefly describe the envisioned communication mechanisms. The secure distributed communication mechanisms are the subject of ongoing investigations in WP4, on which deliverable D4.1, concurrently developed by UniNE, will report. For these reasons we limit this text to a coarse-grained consideration and refer to D4.1 for further details. We will revisit the communication mechanisms of SecureCloud in more depth in the follow-up deliverable D1.2.

There are several frameworks, such as Kore4 or Mongoose5, that can be used to build microservices with an HTTP-based RESTful API for communication. However, we decided to use a secure event-bus-based communication mechanism as the main means for inter-microservice communication within SecureCloud. We have chosen this approach over leaving the communication logic to the application developer or using existing off-the-shelf solutions for the following reasons: (1) Security is a main objective of SecureCloud, thus inter-microservice communication should be secure by default. Legacy communication mechanisms can also be secured, for example with TLS, but history shows that the usage of these mechanisms is cumbersome and error-prone. A tiny misconfiguration, such as setting an accept-all certificate policy, can easily undermine the whole protection. Therefore, we want to provide a middleware that offers secure communication without any intervention of the application developer. (2) The application developer should not be troubled with the complex details of communication that emerge as a consequence of the system's dynamic nature. In the cloud, microservices may join and leave the application at any time due to dynamic scaling. Furthermore, joining microservices typically have unforeseeable IP addresses. Thus, service discovery becomes essential. By providing this service within the SecureCloud middleware, we disburden the application developer and decrease the overall TCB of complex applications built upon a variety of microservices from different developers, as all of them use the same codebase for service discovery. (3) Lastly, since SecureCloud is built for big data applications, efficient communication is a necessity. Therefore, a more lightweight event bus mechanism is beneficial compared to a full-fledged HTTP-based mechanism. Communication performance is, however, subordinate to security. Nevertheless, the communication middleware delivered with SecureCloud can be optimized for its specific use case within SecureCloud, so that we expect lower communication overhead than with a COTS solution.

One candidate for the communication mechanism of SecureCloud's infrastructure is content-based routing (CBR). CBR is a publish/subscribe approach in which clients can subscribe to messages with predicates over the message content.

Considering the area of application of the SecureCloud project (the field of smart grids and smart metering), an example would be that a billing agent subscribes only to billing-related information sent by a smart meter, while a smart grid monitoring agent subscribes just to the alarms (regarding power losses, frequency shifts, etc.) emitted by the smart meter. Since clients only receive messages matching their subscriptions, the network traffic and computation effort decrease and the data sets of big data use cases become manageable.
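The matching step of content-based routing can be sketched in a few lines. The router and the smart-grid roles below mirror the example in the text; the names and the predicate representation are our own illustration, not the SCBR interface.

```python
# Minimal content-based router: subscribers register predicates over the
# message content and receive only the messages that match.
class ContentRouter:
    def __init__(self):
        self.subscriptions = []  # (subscriber, predicate) pairs

    def subscribe(self, subscriber, predicate):
        self.subscriptions.append((subscriber, predicate))

    def publish(self, message: dict):
        # Deliver to every subscriber whose predicate matches the content.
        return [s for s, pred in self.subscriptions if pred(message)]

router = ContentRouter()
router.subscribe("billing-agent", lambda m: m.get("type") == "billing")
router.subscribe("monitoring-agent", lambda m: m.get("type") == "alarm")

print(router.publish({"type": "billing", "kwh": 42.0}))  # -> ['billing-agent']
```

Note that to make these routing decisions, the router must see the message content, which is exactly the confidentiality problem the paragraph below discusses.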

Due to the approach's nature, however, the routing engine of a CBR system has to inspect the content of all messages and know the publishers, clients, and subscriptions of the system to make its routing decisions. Therefore, the cloud provider executing the routing engine might learn all details of a system just by observing it. To prevent this, the service owner has to ensure that the routing engine is not

4 https://kore.io/
5 https://www.cesanta.com/products/mongoose


spied on, which basically means executing it on owned hardware and barring the service owner from the economic benefits of the cloud.

UniNE is currently developing a secure content-based routing (SCBR) mechanism based on Intel SGX. UniNE's SCBR leverages SGX to ensure that the routing engine's memory and computations cannot be spied on by the cloud provider. SCBR might provide the foundation for a secure, scalable communication infrastructure for SecureCloud microservices.

4.4 Orchestration

4.4.1 Docker containers as microservice foundation

Docker follows a client-server architecture (Figure 4.3). The Docker client serves as a user front end and communicates with the Docker daemon via a REST API. The actual work is done on the server side: the Docker host. The Docker daemon launches containers using images, which it obtains from a Docker registry. An image is a read-only template for containers and can contain a whole operating system, which could serve as a basis for other images, or only an application with some necessary libraries.

Figure 4.3: Components of Docker

The workflow of using Docker containers consists of three steps: build, ship, run (Figure 4.4). At first a Docker image has to be created. For the build step the actual application and the necessary libraries are needed. To provide the dependencies, an existing image can also be used as a basis. After the build step the image has to be transferred to the Docker host. This is realized by pushing the image to the Docker registry. If a Docker host gets the command to start a particular container for the first time, it automatically pulls the image from the Docker registry. In the last step the Docker host sets up the isolated environment based on the image and executes the application.

Docker Content Trust (DCT) allows a publisher to digitally sign an image so that its integrity and the authenticity of its publisher can be verified, even if the image is downloaded from a potentially untrusted Docker registry. It also allows the freshness of images to be verified, at least to a certain extent, as the validity of each image can be certified for a certain period of time only and will expire if no further action is taken. To extend the lifetime of an image, a publisher has to actively recertify its validity.
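The expiry-based freshness check can be illustrated with a small sketch. This is loosely modelled on the idea behind DCT rather than its actual trust metadata format: an HMAC stands in for the real public-key signatures, and all names and fields are our own.

```python
import hashlib
import hmac
import json
import time

PUBLISHER_KEY = b"publisher-signing-key"  # illustrative stand-in key

def certify(image_digest: str, valid_seconds: int, now=None):
    """Publisher side: sign the image digest together with an expiry time."""
    now = time.time() if now is None else now
    payload = json.dumps({"digest": image_digest,
                          "expires": now + valid_seconds}, sort_keys=True)
    sig = hmac.new(PUBLISHER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify(cert, image_digest: str, now=None):
    """Consumer side: accept only an untampered, unexpired certification."""
    now = time.time() if now is None else now
    expected = hmac.new(PUBLISHER_KEY, cert["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(cert["sig"], expected):
        return False  # forged or tampered metadata
    meta = json.loads(cert["payload"])
    return meta["digest"] == image_digest and now < meta["expires"]

cert = certify("sha256:abc123", valid_seconds=3600, now=1000.0)
print(verify(cert, "sha256:abc123", now=2000.0))  # -> True (still valid)
print(verify(cert, "sha256:abc123", now=5000.0))  # -> False (expired)
```

Without the active recertification step, an image silently falls out of trust once its validity period ends.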

As a container should only be started on the correct Docker host, each Docker client, Docker host, and Docker registry will be properly authenticated using certificate-based client-server authentication. The communication between all parties will be protected by means of secure communication via HTTPS


Figure 4.4: Workflow of Docker

based on TLS with known certificates. By allowing only connections from clients and to hosts identified by a certificate signed by a central Certificate Authority (CA), an attacker is prevented from pretending to be part of the Docker infrastructure.

4.4.2 Deployment: Secure Docker

Docker is a well-established solution to deploy containerized programs. Its build, ship, run methodology has proven itself generic and useful. The deployment of SGX-protected programs, however, requires additional activities that are not part of the original Docker deployment workflow. With Secure Docker we add the necessary functionality on top of the Docker infrastructure to provide an integrated solution for the deployment of SGX-protected programs. We do so without modifying Docker's codebase, relying instead upon well-established interfaces exposed by Docker. This yields a system that can be developed independently of Docker, and we expect low long-term maintenance effort. For Secure Docker only two additional components are introduced to the architecture (Figure 4.5): the Secure Docker client and a runtime executed within the SGX enclave.

Figure 4.5: Components of Secure Docker


Being a part of SCONE (explained in Section 4.2.1), Secure Docker assumes that whole programs are executed within an enclave, making the system call interface the boundary between the enclave and the untrusted operating system (OS). This means, for example, that the standard input and output streams of the process are terminated within the enclave, providing a means to directly communicate with the enclave. Since with Intel's SGX an enclave is only protected after its initialization, an attacker may tamper with it while at rest. Consequently, one of the main objectives of Secure Docker is to build trust into freshly started enclaves through remote attestation.

Furthermore, Secure Docker protects services provided by the OS and used frequently by programs, such as console input and output and file storage. One advantage of this approach is that developers of microservices neither need to implement the related security mechanisms on their own, nor does the microservice need to trust any component executed outside of the enclave. Note that especially the latter would render the enclave-based protection mechanisms absurd.

In Secure Docker the established deployment workflow of Docker's build, ship, run is retained, but it is extended to provide the necessary properties (Figure 4.6). In particular, the build and run activities have to be adapted, while the ship step can be used without modification, as the image can easily be protected with cryptographic means during its transmission. In the following paragraphs the adaptations of the build and run activities are explained before the secure image and the Secure Docker client are detailed.

Figure 4.6: Workflow of Secure Docker

Secure Image

A secure image is a Docker image that contains a SCONE-enabled executable and is additionally annotated with metadata that allows, among other things, the verification of the started program. Secure images are created by the Secure Docker client, but since the extensions are implemented in a compatible manner, a secure image behaves for the most part just like a standard Docker image. In particular, a secure Docker image can be distributed via an unmodified Docker infrastructure. Only when it comes to the activation of a secure image are additional steps necessary, which are conducted by the secure client.

The metadata is easily attached to the image via image labels, a Docker feature that allows the addition of custom metadata to images. It will include the location of the SCONE-enabled executable within the image, its build log, its SGX enclave hash, the location of the file system protection file (FSPF), as well as a cryptographic signature of this information. With this metadata a far-reaching verification of the executable, its runnable instances, and chosen files of the secure image is possible, and a seamless integration can be achieved.
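The label-based metadata and its signature can be sketched as follows. The label keys, the field values, and the use of an HMAC in place of the publisher's real signature are all our own illustration of the scheme, not a specified format.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"secure-image-publisher-key"  # illustrative stand-in key

def make_labels(executable_path, build_log_digest, enclave_hash, fspf_path):
    """Attach the verification metadata plus a signature over it (label
    names are hypothetical)."""
    meta = {
        "scone.executable": executable_path,
        "scone.build_log": build_log_digest,
        "scone.enclave_hash": enclave_hash,
        "scone.fspf": fspf_path,
    }
    canonical = json.dumps(meta, sort_keys=True).encode()
    meta["scone.signature"] = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return meta

def verify_labels(labels):
    """Recompute the signature over everything except the signature itself."""
    meta = {k: v for k, v in labels.items() if k != "scone.signature"}
    canonical = json.dumps(meta, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(labels.get("scone.signature", ""), expected)

labels = make_labels("/app/service", "build-log-digest", "sha256:enclave", "/fspf/fs.fspf")
print(verify_labels(labels))  # -> True for untampered metadata
```

Because the signature covers all fields jointly, swapping in a different enclave hash or FSPF invalidates the whole label set.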


The FSPF is stored within the secure image. It enumerates the user-chosen protected and unprotected subdirectories of the image and contains the protection data of the files that are part of a protected file system branch. Encrypted files have individual encryption keys stored within the FSPF, simplifying the extension of existing secure images. To protect the contained keys, the FSPF itself is encrypted.

Secure Docker Client

The Secure Docker client is the endpoint of the Secure Docker system that is used to interact with the system. Since it terminates Secure Docker's protection, it is a trusted part of it. Its responsibilities span the creation of secure images, the invocation of secure images, and the handling of the secured program's log messages. To implement these actions, it depends upon the original Docker client to interact with Docker as well as the enclave runtime.

Since a secure image is for the most part just a standard Docker image, the creation procedure is basically equivalent with regard to the Docker-related actions. However, the SCONE-enabled executable and the additional metadata have to be added to the image. Furthermore, the publisher may decide to protect files of certain directories by authentication or encryption. In this case the Secure Docker client takes care of the necessary operations, i.e., the calculation of authentication tags and the encryption of file content, as well as the creation and inclusion of the FSPF.

When a secure container has to be started, the Secure Docker client attests the started program and afterwards securely provisions the necessary secrets to it. To do so it relies on the enclave hash included in the metadata of the secure image and Intel's remote attestation functionality for SGX enclaves. By verifying the included signature it can, furthermore, ensure that the metadata was not tampered with and that the image originates from a trusted source.
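The attest-then-provision order can be reduced to a very small sketch: secrets are released only if the reported enclave measurement equals the expected hash from the image metadata. Real SGX attestation involves a quote verified with Intel's attestation service and an encrypted channel; the function below is a deliberately simplified illustration with hypothetical names.

```python
import hashlib

def attest_and_provision(reported_measurement, expected_enclave_hash, secrets):
    """Release secrets to the enclave only after its measurement matches
    the expected enclave hash (simplified stand-in for SGX attestation)."""
    if reported_measurement != expected_enclave_hash:
        raise PermissionError("attestation failed: enclave measurement mismatch")
    return dict(secrets)  # provisioned only after successful attestation

expected = hashlib.sha256(b"enclave code pages").hexdigest()
secrets = {"config_key": "k1", "args": ["--port", "8080"]}
print(attest_and_provision(expected, expected, secrets)["config_key"])  # -> k1
```

The point of the ordering is that a tampered enclave never sees the command line arguments, environment variables, or keys at all.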

Secure Build

Docker's build activity includes all actions leading to the deliverable Docker image. This typically means taking a so-called base image and extending it by adding files, executing commands within its context, and setting environment variables and other metadata such as the author's name, the default command, and the user. Secure build extends the original Docker build step to enable a trustworthy verification of the executable's compilation even when executed on an untrusted system, detection of unauthorized modifications of Docker images, and a confidentiality-preserving distribution of files within the image.

The first modification in secure build concerns the compilation of the executable. On the one hand, the resulting program has to be executed within an SGX enclave. The SCONE compilation tool chain will automatically add the necessary instructions. Furthermore, functions interacting with the Secure Docker client (see below) to conduct remote attestation, secret provisioning, secure terminal input and output, and file encryption and authentication have to be added to the original program. On the other hand, we plan to protect the compilation procedure itself by making its inputs and computations verifiable even in the presence of powerful adversaries that control the build system, such as a rootkit or a malicious developer. Therefore, the compilation is executed within an enclave, referred to as the build enclave (see Figure 4.7), which is attested by one or multiple trusted build verifiers. The hashes of the build enclave, source code files, and produced binary are recorded in a build log, which is signed by the verifiers. Since the build


log covers all artifacts that influence the trustworthiness of the built component, a challenger can decide on the trustworthiness of a component by scrutinizing the listed artifacts.

Figure 4.7: Verifiable Build
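The signed build log can be sketched as a record of hashes over every artifact entering and leaving the build. An HMAC stands in for the verifier's real signature, and the structure and names are our own illustration of the scheme described above.

```python
import hashlib
import hmac
import json

VERIFIER_KEY = b"trusted-build-verifier-key"  # illustrative stand-in key

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def sign_build_log(build_enclave: bytes, sources: dict, binary: bytes):
    """Record hashes of the build enclave, sources, and binary; sign the log."""
    log = {
        "build_enclave": h(build_enclave),
        "sources": {name: h(code) for name, code in sorted(sources.items())},
        "binary": h(binary),
    }
    canonical = json.dumps(log, sort_keys=True).encode()
    return log, hmac.new(VERIFIER_KEY, canonical, hashlib.sha256).hexdigest()

def challenge(log, signature, trusted_source_hashes):
    """Challenger side: check the signature, then audit every listed artifact."""
    canonical = json.dumps(log, sort_keys=True).encode()
    expected = hmac.new(VERIFIER_KEY, canonical, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # log was tampered with
    return set(log["sources"].values()) <= trusted_source_hashes

log, sig = sign_build_log(b"builder", {"main.c": b"int main(){}"}, b"\x7fELF...")
print(challenge(log, sig, {h(b"int main(){}")}))  # -> True: all sources audited
```

A challenger that trusts the verifier key and recognizes every listed source hash can thereby trust the binary hash, even though the build ran on an untrusted machine.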

The second altered activity is the creation of the Docker image. There are two goals involved here. First, we want to maintain the original Docker distribution workflow in which the image is published by a publisher and used by a customer or consumer. To enable the consumer to use the image in a secure fashion, it has to include the necessary cryptographic protection details, which are added in the creation process. Second, we cannot assume that the image remains unaltered or its secrets remain confidential after it has been transmitted into a foreign environment. Therefore, Secure Docker calculates authentication tags of user-chosen files distributed with the image and moreover encrypts their content if it should remain confidential. The authentication tags and file encryption keys are securely stored in the file system protection file (FSPF), which is shipped inside of the image. Later the enclave runtime will use this file to transparently ensure the integrity of files when they are read by the program.

Secure Run

The run activity creates a Docker container from a Docker image and starts the program delivered within the image. Secure Docker – not trusting the execution environment – additionally has to ensure that the correct software was invoked, i.e., check that it was not altered before its activation. Since SGX only protects the confidentiality of initialized enclaves, necessary secrets, such as the command line arguments or environment variables, have to be provisioned only after the system has established trust in the activated component. Finally, Secure Docker provides protected access to files delivered with the Docker image. Figure 4.8 depicts the process of starting a microservice in a secure container.

In secure run the original run procedure is conducted to start the software component in the foreign environment. However, it is restricted in the sense that no actual startup parameters such as command line arguments or environment variables are transmitted. Instead, after the enclave is started, the Secure Docker startup routine takes control and carries out remote attestation with the Secure Docker client. Only after the enclave is successfully attested does the client provide the command line arguments, environment variables, and further secrets, and control is given to the program.

A running software component typically generates log messages to inform about its state and specific events. Docker collects these messages, stores them in a logging service, and is able to deliver them to the client. For Secure Docker we have to assume these log messages contain confidential information that has to be protected. Therefore, a runtime routine is compiled into the software component that transparently


Figure 4.8: Secure Run

encrypts all of its console output. The necessary encryption key is exchanged in the secret provisioning phase such that the Secure Docker client is able to decrypt the messages of the component.

Furthermore, Secure Docker provides protected access to the file system of the untrusted environment. This is implemented in runtime routines and uses the protection data previously generated in the secure build step.

4.4.3 Management and lifecycle of secure containers

An overview of the envisioned SecureCloud architecture with the main participants and components and the information flow between them is shown in Figure 4.9. In our architecture we differentiate between three main parties:

• the data owner, who wants to process sensitive data in the cloud,

• the cloud provider, who offers resources (mainly in terms of hardware) for the computation, and

• the software publisher, who develops and provides the appropriate applications.

The data owner may be some company that wants to outsource some big data processing to the cloud. Of course the company wants to do this without losing control over its data. Thus, simply extending the domain of trust is apparently not a viable option. On the other hand, to utilize information technology some extent of trust is always needed. From a trust anchor, be it the developers or some other entity, one has to derive trust in the hardware and software used. On the one hand this covers the fundamental trust that an application running on local hardware does the right thing. Now, however, this trust needs to be extended towards a correctly implemented infrastructure – even though the requirements might be higher than before.

The software publisher is responsible for providing a secure image together with a hash usable for software attestation as well as a configuration template for the configuration of the service. All three parts have to be signed, so that the authenticity of the publisher and the integrity of the image can be verified by the data owner. We envision the utilization of the Secure Docker client for the whole process.

For our SecureCloud architecture we assume that the data owner trusts at least one software publisher. The decision to trust an application can originate from the fact that the software publisher and the


Figure 4.9: Participants and components of the SecureCloud architecture

data owner belong to the same company, from some auditing of the software, or from the recommendation of a trusted third party. Ultimately, the SecureCloud solution does not rely on trust in the cloud provider. The SecureCloud architecture will prevent data corruption or data leakage by a maliciously acting provider.

Before the first actual application can be executed, the data owner has to set up the SecureCloud infrastructure. In this bootstrapping process all essential services will be started. To protect all sensitive code and data of these services, they will be executed in SGX enclaves, which in the process will be initialized and attested. The initiator of the bootstrapping has to be some authoritative entity which serves as the trust anchor for the bootstrapped infrastructure. Not only the person that performs this process has to be trusted, but also the hardware and software which is used for this purpose. The essential infrastructure, which will run at the end of this process, consists of an Application Management Service and a CA on the local side, and a Resource Management Service, a Coordination Service, and an Identity Service on the cloud side.

The Application Management Service is the local interface for managing applications in the cloud. For that it provides a frontend for operators to control the infrastructure. It communicates with special microservices on the cloud provider side and provides them with all the information they need to start the actual application. For this purpose it also prompts the local CA to issue new certificates if they are needed to set up the secure communication channels between the microservices of the application. The Application Management Service also has the task of preventing a malicious microservice, or a microservice with a harmful configuration, from starting as part of an application. This task is very security-critical, since a bad microservice could be exploited to reveal sensitive data like certificates or key material. To prevent


this, the list of startable microservices should be restricted. One solution could be a list of trusted publisher signatures.

The actual user of the SecureCloud infrastructure is the Operator. For that matter the Operator could be a group of administrators which is responsible for the management of the infrastructure, such as starting and stopping applications. With the help of the Application Management Service they create an application plan that specifies which microservices are needed and how they are connected. This includes the definition of all the information required to configure the microservices. The Application Management Service then initiates the realization of the plan by sending the plan and all necessary information to the Coordination Service and simultaneously initiates the generation of certificates and keys for the involved microservice types by the CA.
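To make the notion of an application plan concrete, a minimal example is sketched below. The structure, the field names, and the example services are entirely our own assumption; the deliverable does not specify a plan format.

```python
# Hypothetical application plan: which microservices are needed, how
# they are connected, and their configuration (illustrative structure).
application_plan = {
    "application": "smart-metering",
    "microservices": [
        {"name": "ingest",  "image": "sha256:img-ingest",  "replicas": 2},
        {"name": "billing", "image": "sha256:img-billing", "replicas": 1},
    ],
    "connections": [
        {"from": "ingest", "to": "billing", "channel": "event-bus"},
    ],
    "config": {"billing": {"tariff": "standard"}},
}

print(sorted(m["name"] for m in application_plan["microservices"]))
```

Such a plan would be handed to the Coordination Service for deployment, while the per-service entries under "config" would reach the microservices only via the Secure Configuration Service.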

Although the Operator controls the infrastructure, he is not fully trusted. The SecureCloud architecture will prevent him from accessing any (secret) key material, either directly or through a bad configuration of an application. Therefore each certificate and key generation will take place in an enclave, and the related secrets will only be transmitted over secure channels to other enclaves.

The Coordination Service is responsible for realizing the application plan. For this it works together with the Resource Management Service to deploy the microservices of the application and at the same time sends all configuration information to the Secure Configuration Service. The Coordination Service also monitors all running microservices. If it detects slow or crashed services, it restarts them with the help of the Resource Management Service.

The Resource Management Service actually deploys the microservices. It starts a microservice with the help of the Secure Docker client, which also performs a remote attestation to ensure that the correct code is executed in the enclave, and provides the microservice with a config key which the microservice later needs to decrypt its own configuration.

The Identity Service is responsible for access control. It implements the policies in the application plan, which define which microservice can access which other microservice, by distributing certificates accordingly.
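The access-control idea can be illustrated with a toy allow-relation derived from the plan's connections. Note that in SecureCloud the Identity Service enforces this relation by only issuing the matching certificates, rather than by evaluating a table at runtime; the data structure below is purely illustrative.

```python
# Toy allow-relation derived from the application plan's policies.
# The real Identity Service enforces this by certificate distribution;
# here we only model the resulting set of permitted connections.
allowed = {
    ("ingest", "analyze"),
    ("analyze", "store"),
}

def may_connect(client: str, server: str) -> bool:
    """True iff the plan's policy permits client -> server traffic."""
    return (client, server) in allowed

print(may_connect("ingest", "analyze"))  # permitted by the plan -> True
print(may_connect("ingest", "store"))    # not permitted -> False
```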

One part of the Identity Service is the Secure Configuration Service, which configures the microservices after successful attestation by the Resource Management Service. It takes the provided configuration information and certificates and generates the microservice-specific configuration. The resulting configuration data is encrypted with the config key and sent to the microservice. Only the correct microservice, which was attested and provided with the config key, can decrypt and apply the configuration.
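The encrypt-under-the-config-key step can be sketched as follows. This is a stdlib-only illustration with a hand-rolled authenticated-encryption construction (SHA-256 keystream plus HMAC); a real implementation would use a vetted AEAD scheme such as AES-GCM, and the SecureCloud design does not prescribe this particular construction.

```python
import hashlib
import hmac
import os

# Sketch of the Secure Configuration Service step: encrypt the generated
# per-microservice configuration under the config key so that only the
# attested enclave (which received that key during deployment) can
# decrypt it. Illustration only -- use a vetted AEAD in practice.

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode as a toy stream cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_config(config_key: bytes, plaintext: bytes) -> bytes:
    enc_key = hashlib.sha256(b"enc" + config_key).digest()
    mac_key = hashlib.sha256(b"mac" + config_key).digest()
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt_config(config_key: bytes, blob: bytes) -> bytes:
    enc_key = hashlib.sha256(b"enc" + config_key).digest()
    mac_key = hashlib.sha256(b"mac" + config_key).digest()
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("configuration was tampered with")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))

config_key = os.urandom(32)  # provisioned to the enclave after attestation
blob = encrypt_config(config_key, b'{"port": 8080}')
print(decrypt_config(config_key, blob))
# -> b'{"port": 8080}'
```

Deriving separate encryption and MAC keys from the single config key mirrors the text's model: one secret provisioned after attestation suffices for both confidentiality and integrity of the configuration.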

Each microservice should be able to identify itself. For this purpose, each microservice is provided with a certificate and a private key via its secure configuration. Each certificate is unique for a given type of microservice. A type is derived from the application, the secure image on which the microservice is based, and the purpose of the microservice within the application. Replicas of a microservice have the same certificate, which allows a seamless failover in case a microservice crashes. The necessary keys and certificates are generated once by the local CA and are then managed by the Identity Service.
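The type derivation can be sketched as a hash over the three inputs the text names. The derivation function itself is hypothetical; the point is only that all replicas of one type map to the same identifier and can therefore share one certificate.

```python
import hashlib

# Sketch: derive a stable microservice-type identifier from application,
# secure image, and purpose. Hypothetical derivation -- the actual
# SecureCloud type definition may differ.

def microservice_type_id(application: str, image: str, purpose: str) -> str:
    material = "|".join([application, image, purpose]).encode()
    return hashlib.sha256(material).hexdigest()[:16]

a = microservice_type_id("fraud-detection", "secure/analyze:1.0", "analyzer")
b = microservice_type_id("fraud-detection", "secure/analyze:1.0", "analyzer")
c = microservice_type_id("fraud-detection", "secure/analyze:1.0", "aggregator")

print(a == b)  # replicas of the same type share one identity -> True
print(a == c)  # a different purpose yields a different type -> False
```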

As described above, we envision key distribution via a central Secure Configuration Service, although the key exchange could also be realized in a more decentralized way: each microservice could generate a public/private key pair on its own and then have the public key certified by a central CA. Software attestation could ensure that the key belongs to the correct microservice type. The advantage of our central Secure Configuration Service is that fewer communication steps are needed and no accesses to the CA from the remote side are required. Hence this simplifies the communication and hopefully shrinks the attack surface.

4.5 Practicability / Evaluation

Figure 4.10: Throughput versus latency for Apache and Redis. (a) Apache: latency (s) vs. throughput (k.req/s) for Glibc, SCONE-async, and SCONE-sync. (b) Redis: latency (ms) vs. throughput (k.op/s) for Glibc + Stunnel, shielded SCONE-async, and shielded SCONE-sync.

Figure 4.11: CPU utilization for Apache and Redis. (a) Apache: CPU utilization (%) vs. throughput (k.req/s) for Glibc, SCONE-async, and SCONE-sync. (b) Redis: CPU utilization (%) vs. throughput (k.op/s) for Glibc + Stunnel, shielded SCONE-async, and shielded SCONE-sync.

Our evaluation with SCONE yields promising results: Apache and Redis show acceptable overhead when executed within an enclave.

All experiments were run on an Intel Xeon E3-1270 v5 CPU with 4 cores at 3.6 GHz, 8 hyper-threads (2 per core), and 8 MB cache. The server has 64 GB of memory and runs Ubuntu 14.04.4 LTS with Linux kernel version 4.2. We disable dynamic frequency scaling to reduce interference. The workload generators run on a machine with two 14-core Intel Xeon E5-2683 v3 CPUs at 2 GHz with 112 GB of RAM and Ubuntu 15.10. Each machine has a 10 Gb Ethernet NIC connected to a dedicated switch. We compare the performance of three versions of each application: (i) one built with the GNU C library (glibc); (ii) one built with a variant of the musl C library adapted to run inside SGX enclaves that uses synchronous system calls (SCONE-sync); and (iii) one built with the same musl C library variant but with asynchronous system calls (SCONE-async).


For Redis, which does not support encryption, we use Stunnel to encrypt its communication in the glibc version. When reporting CPU utilization, the application's glibc version includes the utilization due to the Stunnel processes. In SCONE, the network shield subsumes the functionality of Stunnel.

Apache is a highly configurable and mature web server, originally designed to spawn a process for each connection. We configure Apache to use a single process with a pool of 28 threads and use the HTTP benchmarking tool wrk2 (https://github.com/giltene/wrk2) to fetch a web page. We increase the number of concurrent clients and the frequency at which they retrieve the page until the response times start to degrade. Since Apache supports application-level encryption in the form of HTTPS, we do not use Stunnel or SCONE's network shield. Figure 4.10a shows that all three versions deliver comparable performance up to about 40,000 requests per second, at which point the latency of SCONE-sync increases dramatically. SCONE-async and glibc maintain similar performance up to about 50,000 requests per second. The maximum achievable throughput of SCONE-async is slightly higher than that of glibc. SCONE-async achieves a higher throughput than SCONE-sync due to the asynchronous system call interface.

Redis is a distributed in-memory key/value store and represents a networked, I/O-intensive service. Typical workloads with many concurrent operations exhibit a high system call frequency. Persistence in Redis is achieved by forking and writing the state to stable storage in the background. Fundamentally, forking is difficult to implement for enclave applications and is not supported by SCONE. Hence, we deploy Redis solely as an in-memory store. We use workloads A to D from the YCSB benchmark suite [10]. In these workloads, both the application code and data fit into the EPC, so the SGX driver does not need to page in any EPC pages. We only present results for workload A (50% reads and 50% updates); the other workloads exhibit similar behaviour. We deploy 100 clients and increase their request frequency until reaching maximum throughput. Figure 4.10b shows that Redis with glibc achieves a throughput of 189,000 operations per second. At this point, as shown in Figure 4.11b, Redis, which is single-threaded, becomes CPU-bound with an overall CPU utilization of 400% (4 hyper-threads): 1 hyper-thread is used by Redis, and 3 hyper-threads are used by Stunnel. SCONE-sync cannot scale beyond 40,000 operations per second (21% of glibc), also due to Redis's single application thread. By design, SCONE-sync performs encryption as part of the network shield within the application thread. Hence, it cannot balance the encryption overhead across multiple hyper-threads, as Stunnel does, and its utilization peaks at 100%. SCONE-async reaches a maximum throughput of 116,000 operations per second (61% of glibc). In addition to the single application thread, multiple OS threads execute system calls inside the SCONE kernel module. Ultimately, SCONE-async is also limited by the single Redis application thread, which is why CPU utilization peaks at 200% under maximum throughput. The performance of SCONE-async is better than that of SCONE-sync since SCONE-async has a higher single-thread system call throughput. However, since SCONE-async does not assign TLS termination to a separate thread either, it cannot reach the throughput of glibc.
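The synchronous/asynchronous distinction can be illustrated with a toy model in plain Python threads. This is not SCONE's actual mechanism: in the synchronous variant the application thread performs each expensive "system call" itself, while in the asynchronous variant it only enqueues requests that worker threads service on its behalf.

```python
import queue
import threading
import time

# Toy model of an asynchronous system call interface (not SCONE's real
# implementation): the application thread enqueues syscall requests and
# worker threads outside the "enclave" execute them, instead of the
# application thread blocking on every call itself.

def slow_syscall(req):
    time.sleep(0.001)  # stand-in for an expensive enclave exit
    return req * 2

def run_sync(n):
    """Synchronous model: the application thread does every call itself."""
    return [slow_syscall(i) for i in range(n)]

def run_async(n, workers=4):
    """Asynchronous model: requests are serviced by worker threads."""
    requests, results = queue.Queue(), [None] * n

    def worker():
        while True:
            item = requests.get()
            if item is None:  # sentinel: shut down
                return
            idx, req = item
            results[idx] = slow_syscall(req)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for i in range(n):  # the "application thread" never blocks here
        requests.put((i, i))
    for _ in threads:   # one sentinel per worker
        requests.put(None)
    for t in threads:
        t.join()
    return results

assert run_sync(20) == run_async(20)
print("async model produces the same results via concurrent workers")
```

In this toy model the asynchronous variant can overlap many slow calls, just as SCONE-async overlaps system calls with application work; and as in the measurements above, a single application thread remains the ultimate bottleneck.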


5 Summary & Conclusions

Starting with a set of use cases and their requirements, an initial version of the SecureCloud platform was described. Grounded on hardware-based security mechanisms (namely Intel SGX enclaves), the main building blocks of the SecureCloud platform are secure Docker containers running microservices which communicate using a secure event bus. The whole SecureCloud platform is comprised of various services – which are themselves securely executed within enclaves – to develop, deploy, orchestrate, run, and manage distributed applications that can be securely executed within untrusted cloud environments. Preliminary evaluations – mostly covering practicability and performance aspects – support the soundness of our approach.

Based on the foundations laid out in this deliverable, subsequent work will refine the envisioned SecureCloud platform. This will be driven on the one hand by the lessons learned from prototyping the described use cases and on the other hand by the planned next steps according to the work plan, which include work on secure distributed data management and storage as well as distributed scheduling mechanisms. The upcoming deliverable D1.2 will report on this extended version of the architecture.


Bibliography

[1] I. Anati, S. Gueron, S. Johnson, and V. Scarlata. Innovative Technology for CPU Based Attestation and Sealing. In Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, volume 13, 2013.

[2] A. Arasu, K. Eguro, M. Joglekar, R. Kaushik, D. Kossmann, and R. Ramamurthy. Transaction Processing on Confidential Data Using Cipherbase. In 2015 IEEE 31st International Conference on Data Engineering, pages 435–446, April 2015.

[3] ARM. Security Technology: Building a Secure System Using TrustZone Technology. ARM Technical White Paper, 2009.

[4] S. Bajaj and R. Sion. TrustedDB: A Trusted Hardware-Based Database with Privacy and Data Confidentiality. IEEE Transactions on Knowledge and Data Engineering, 26(3):752–765, March 2014.

[5] A. Baumann, M. Peinado, and G. Hunt. Shielding Applications from an Untrusted Cloud with Haven. ACM Trans. Comput. Syst., 33(3):8:1–8:26, Aug. 2015.

[6] A. Carroll, M. Juarez, J. Polk, and T. Leininger. Microsoft Palladium: A Business Overview. Microsoft Content Security Business Unit, pages 1–9, 2002.

[7] Chef Software, Inc. Chef, 2016.

[8] X. Chen, T. Garfinkel, E. C. Lewis, P. Subrahmanyam, C. A. Waldspurger, D. Boneh, J. Dwoskin, and D. R. Ports. Overshadow: A Virtualization-based Approach to Retrofitting Protection in Commodity Operating Systems. SIGPLAN Not., 43(3):2–13, Mar. 2008.

[9] S. Chhabra, B. Rogers, Y. Solihin, and M. Prvulovic. SecureME: A Hardware-software Approach to Full System Security. In Proceedings of the International Conference on Supercomputing, ICS '11, pages 108–119. ACM, 2011.

[10] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, pages 143–154. ACM, 2010.

[11] J. Criswell, N. Dautenhahn, and V. Adve. Virtual Ghost: Protecting Applications from Hostile Operating Systems. SIGPLAN Not., 49(4):81–96, Feb. 2014.

[12] Docker Inc. Docker Swarm, 2016.

[13] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An Updated Performance Comparison of Virtual Machines and Linux Containers. In Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on, pages 171–172, March 2015.

[14] B. Fitzpatrick. Distributed Caching with Memcached. Linux Journal, 2004(124):5, Aug. 2004.

[15] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh. Terra: A Virtual Machine-based Platform for Trusted Computing. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, SOSP '03, pages 193–206. ACM, 2003.

[16] S. Graber and S. Hallyn. LXC Linux Containers, 2014.

[17] HAProxy. HAProxy, 2016.

[18] O. S. Hofmann, S. Kim, A. M. Dunn, M. Z. Lee, and E. Witchel. InkTag: Secure Applications on an Untrusted Operating System. SIGPLAN Not., 48(4):265–278, Mar. 2013.

[19] S. Jin, J. Ahn, S. Cha, and J. Huh. Architectural Support for Secure Virtualization Under a Vulnerable Hypervisor. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44, pages 272–283. ACM, 2011.

[20] Kubernetes. Kubernetes, 2016.

[21] D. Lie, C. Thekkath, M. Mitchell, P. Lincoln, D. Boneh, J. Mitchell, and M. Horowitz. Architectural Support for Copy and Tamper Resistant Software. SIGPLAN Not., 35(11):168–177, Nov. 2000.

[22] J. M. McCune, Y. Li, N. Qu, Z. Zhou, A. Datta, V. Gligor, and A. Perrig. TrustVisor: Efficient TCB Reduction and Attestation. In 2010 IEEE Symposium on Security and Privacy, pages 143–158, May 2010.

[23] J. M. McCune, B. J. Parno, A. Perrig, M. K. Reiter, and H. Isozaki. Flicker: An Execution Infrastructure for TCB Minimization. SIGOPS Oper. Syst. Rev., 42(4):315–328, Apr. 2008.

[24] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar. Innovative Instructions and Software Model for Isolated Execution. In Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, HASP '13, pages 10:1–10:1. ACM, 2013.

[25] D. Merkel. Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux Journal, 2014(239), Mar. 2014.

[26] M. Mohamed, S. Yangui, S. Moalla, and S. Tata. Web Service Micro-Container for Service-based Applications in Cloud Environments. In 20th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pages 61–66, June 2011.

[27] E. Mykletun and G. Tsudik. Incorporating a Secure Coprocessor in the Database-as-a-Service Model. In Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA '05), Jan. 2005.

[28] N. Brown. Linux Kernel Overlay Filesystem Documentation, 2016.

[29] OpenStack Foundation. OpenStack Open Source Cloud Computing Software, 2016.

[30] E. Owusu, J. Guajardo, J. McCune, J. Newsome, A. Perrig, and A. Vasudevan. OASIS: On Achieving a Sanctuary for Integrity and Secrecy on Untrusted Platforms. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pages 13–24. ACM, 2013.

[31] M. Peinado, Y. Chen, P. England, and J. Manferdelli. NGSCB: A Trusted Open System, pages 86–97. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.

[32] B. C. Pierce. Types and Programming Languages. The MIT Press, 2002.

[33] Redis. Redis, 2016.

[34] W. Reese. Nginx: The High-performance Web Server and Reverse Proxy. Linux Journal, 2008(173), Sept. 2008.

[35] F. Schuster, M. Costa, C. Fournet, C. Gkantsidis, M. Peinado, G. Mainar-Ruiz, and M. Russinovich. VC3: Trustworthy Data Analytics in the Cloud Using SGX. In 2015 IEEE Symposium on Security and Privacy, pages 38–54, May 2015.

[36] J. Stubbs, W. Moreira, and R. Dooley. Distributed Systems of Microservices Using Docker and Serfnode. In 7th International Workshop on Science Gateways (IWSG), pages 34–39, June 2015.

[37] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas. AEGIS: Architecture for Tamper-evident and Tamper-resistant Processing. In Proceedings of the 17th Annual International Conference on Supercomputing, ICS '03, pages 160–171. ACM, 2003.

[38] R. Ta-Min, L. Litty, and D. Lie. Splitting Interfaces: Making Trust Between Applications and Operating Systems Configurable. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI '06, pages 279–292, Berkeley, CA, USA, 2006. USENIX Association.

[39] The Apache Software Foundation. Apache Hadoop, 2016.

[40] J. Thones. Microservices. IEEE Software, 32(1):116–116, Jan. 2015.

[41] Trusted Computing Group. TCG Specification Architecture Overview, Revision 1.4. Technical report, Aug. 2007.

[42] Trusted Computing Group. Trusted Computing Group — Open Standards for Security Technology, 2016.

[43] F. Zhang, J. Chen, H. Chen, and B. Zang. CloudVisor: Retrofitting Protection of Virtual Machines in Multi-tenant Cloud with Nested Virtualization. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 203–216. ACM, 2011.