Cloud Dependability: How Close Are We?
Challenges and Research Issues Towards Dependable Clouds
Marc Lacoste, Thierry Coupaye
Orange Labs
International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’11), Grenoble, France - October 12th, 2011
Technological vision: Cloud computing is a model for enabling on-demand network access to a shared pool of virtualized computing resources (networks, servers, storage, applications, devices/mobiles and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (self-service model through API or web portals)
Market vision (XaaS): same + pay-per-use (or pay-as-you-go) billing models
5 characteristics
1. On Demand Self-Service
2. Broad Network Access
3. Virtualized Resource Pooling
4. Rapid Elasticity
5. Measured Service
3 delivery models (markets)
1. Cloud Software as a Service (SaaS)
2. Cloud Platform as a Service (PaaS)
3. Cloud Infrastructure as a Service (IaaS)
4 deployment models
1. Private cloud
2. Public cloud
3. Hybrid cloud
4. Community Cloud
3 pivotal technologies
1. Virtualization
2. Autonomics (automation)
3. Grid Computing (job scheduling)
* Adapted from NIST definition
Technologies, Visions, Usages, Markets
Server Virtualization, Storage Virtualization, Network Virtualization
Strong isolation throughout the life-cycle of personal information.
Many tough questions:
Need-to-know enforcement, secure data storage, data retention and destruction, legal implications…
Today’s PETs are not enough!
Barrier #5: Privacy
Data protection: Guarantee data security in a shared multi-tenant environment.
Barrier #6: Traceability
Being unable to locate the data adds to the loss of control over IT resources.
Legal, political, trust issues: Compliance, data hosted abroad exposed to foreign governments. Inability to prove that data comes from a trusted source.
A largely uncharted area.
Barrier #7: Legal issues
Multiple conflicting jurisdictions over the cloud data flows.
Cloud providers: have trouble providing assurance of compliance with regulations.
Customers: have trouble understanding the rights and obligations of each party.
Importance of security SLAs.
Trust enablers: Prove to third parties that the cloud infrastructure is trustworthy.
Source: Gartner. Analyzing the Risk Dimensions of Cloud and SaaS Computing, 2010.
Barrier #8: Transparency
Prove security hygiene of provider infrastructure to third parties.
– Open, dynamic, virtualized environments: VMs can move between physical servers, so latency and hardware reliability change as they move
– Scalability and multi-tenancy: thousands of applications (each generally made of multiple VMs) hosted on a single platform, resources actually shared by multiple applications/users
– Layered infrastructures (SaaS/PaaS/IaaS) with multiple administrative roles/domains with limited control offered to application developers
– Pay-per-use: you have to pay for the resources you use (VMs)!
Where are we on cloud resilience?
A huge background on distributed systems dependability, including in grid and autonomic computing (self-repair), but few works specifically targeting cloud reliability
– e.g. 0 papers in IEEE Cloud 2009, 1 paper in IEEE Cloud 2010, 1 paper in IEEE Cloud 2011
Some illustrative work-in-progress
– Server failures characterization with objective of characterizing the complete data center hardware (server, storage, network) reliability model [Microsoft Research, ACM Symposium on Cloud Computing’2010]
– 3-replicas ring with scatter placement algorithm of backups of cloud management nodes [U. Maryland, IBM Research, IEEE IPDPS’2009]
– Crash and timing fault tolerance (not Byzantine faults) through strong replica consistency of applicative nodes (semi-active and semi-passive replication) [U. Cleveland, UC Santa Barbara, IEEE Cloud’2010]
– Byzantine fault tolerance through 3f+1 active replication in community (“voluntary”) cloud [U. Hong Kong, NUDT, IEEE Cloud’2011]
– Adaptive fault tolerance (forward and backward recovery) in real time cloud through runtime continuous reliability assessment of nodes and replication with variants (VMs) [INRIA, IEEE World Congress on Services’2011]
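The 3f+1 figure quoted for the Byzantine item above is the classical bound for Byzantine fault-tolerant replication: tolerating f arbitrarily faulty replicas takes at least 3f+1 replicas, with decisions backed by 2f+1 matching replies. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not taken from the cited papers.

```python
def bft_replicas(f: int) -> int:
    """Minimum number of active replicas needed to tolerate f Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def bft_quorum(f: int) -> int:
    """Matching replies needed so that any two quorums share a correct replica."""
    return 2 * f + 1


def max_faults(n: int) -> int:
    """Largest f that n replicas can tolerate under the 3f+1 bound."""
    return (n - 1) // 3


# Print f, replicas, and quorum size for a few fault budgets.
for f in range(1, 4):
    print(f, bft_replicas(f), bft_quorum(f))
```

Assuming one replica per VM, tolerating a single Byzantine fault already means running four VMs, which matters under pay-per-use billing.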
Virtual Appliance Management Platform (VAMP)
VAMP is a PaaS Application Lifecycle Management (ALM) platform under development inside Orange Labs.
VAMP targets the construction (e.g. VM image generation), deployment and management of distributed applications in the cloud.
VAMP is based on an architectural description (component-based with the Fractal component model) of applications that is used throughout the complete lifecycle of applications (including ‘model@runtime’)
[Figure labels: Control Plane (VAMP); Applicative Plane (legacy application); Reification and control; Components]
VAMP Runtime Architecture
[Figure labels: VAMP manager; deployment manager; configurator agents; VM0, VM1, VM2; Control Plane (VAMP); Applicative Plane (legacy application); Message Oriented Middleware (MOM)]
VAMP manager: creates and repairs Deployment Managers. The VAMP manager may be inside or outside the IaaS.
Deployment Manager (1/application): creates applicative VMs and recreates VMs when they fail.
Configurator Agents: self-configure the application.
Reliable Self-configuration in VAMP (work in progress)
Self-configuration protocol
– At VM creation, each component C (managed by a Configurator inside a VM):
  – Announces itself to the configurator network;
  – Configures locally the applicative elements;
  – Exports its server interfaces to its “client components”, which can then bind to C (‘bind’ signals);
  – Once all C client interfaces are bound, C starts and notifies its start to its “client components” (‘start’ signals).
– Gradually (“epidemically”), the complete application is started.
– When a VM failure is detected, the faulty VM is replaced by a new VM, which is simply introduced into the self-configuration protocol as if it were in its initial deployment phase (except that ‘start’ signals are interpreted as ‘restart’ by already running components).
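The protocol steps above can be sketched as a small simulation. This is an illustrative toy, not VAMP code: the class and method names are hypothetical, a `deque` stands in for the message-oriented middleware, and the sketch assumes that a component waits for the ‘start’ signal of every server it is bound to before starting (the slide only states the binding condition).

```python
from collections import deque


class Component:
    """One applicative component, managed by a configurator inside its VM."""

    def __init__(self, name, requires=()):
        self.name = name
        self.requires = set(requires)  # servers its client interfaces bind to
        self.bound = set()             # servers whose interfaces are bound
        self.ready = set()             # servers that have signalled 'start'
        self.running = False
        self.restarts = 0              # 'start' signals received while running


class Application:
    """Toy model: the deque stands in for the message-oriented middleware."""

    def __init__(self):
        self.components = {}
        self.mom = deque()  # pending (signal, server, client) messages

    def deploy(self, comp):
        """A new or replacement VM enters the protocol from its initial phase."""
        self.components[comp.name] = comp        # announce itself (modelled as registration)
        for other in self.components.values():
            if comp.name in other.requires:      # export interfaces to clients
                self.mom.append(("bind", comp.name, other.name))
            if other.name in comp.requires:      # assumption: known servers re-export
                self.mom.append(("bind", other.name, comp.name))
                if other.running:
                    self.mom.append(("start", other.name, comp.name))
        self._try_start(comp)

    def _try_start(self, comp):
        if comp.running:
            return
        if comp.requires <= comp.bound and comp.requires <= comp.ready:
            comp.running = True
            for other in self.components.values():
                if comp.name in other.requires:  # notify client components
                    self.mom.append(("start", comp.name, other.name))

    def run(self):
        """Deliver messages until no signal is pending."""
        while self.mom:
            signal, server, client_name = self.mom.popleft()
            client = self.components[client_name]
            if signal == "bind":
                client.bound.add(server)
            elif client.running:
                client.restarts += 1             # 'start' read as 'restart'
            else:
                client.ready.add(server)
            self._try_start(client)

    def replace_failed(self, name):
        """Repair: a fresh instance replays the initial deployment phase."""
        self.deploy(Component(name, self.components[name].requires))


app = Application()
for name, needs in [("web", ["app"]), ("app", ["db"]), ("db", [])]:
    app.deploy(Component(name, needs))
app.run()                    # start-up spreads db -> app -> web
app.replace_failed("app")    # simulate a crash of the VM hosting "app"
app.run()
```

Deployment order does not matter in this sketch: each component starts as soon as its own conditions are met, with no global synchronization step, and the repair path reuses `deploy` unchanged.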
Synthesis on VAMP
A decentralized approach based on asynchronous and reliable communications.
– Each VM evolves in parallel at its own pace.
– No need for global synchronization between VMs.
Replication (primary-backup scheme, passive replication) of DMs completes the self-configuration protocol for complete application reliability.
The approach works for VM crash, transient network failures, and for stateless components.
The self-configuration protocol is also a (basic) self-repair protocol: a running application is seen as being in a continuous deployment process.
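The primary-backup (passive) replication of Deployment Managers mentioned above can be illustrated as follows. The slides do not describe how DM state is checkpointed, so this is only a sketch under assumptions: all names are hypothetical, the DM state is reduced to a dictionary of VM statuses, and the checkpoint is a synchronous copy.

```python
import copy


class DeploymentManager:
    def __init__(self, name):
        self.name = name
        self.vms = {}  # replicated state: VM id -> status

    def apply(self, vm_id, status):
        self.vms[vm_id] = status


class PrimaryBackup:
    """Passive replication: only the primary executes; the backup gets checkpoints."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def request(self, vm_id, status):
        self.primary.apply(vm_id, status)
        # Checkpoint the new state to the backup before acknowledging.
        self.backup.vms = copy.deepcopy(self.primary.vms)
        return "ack"

    def fail_over(self, spare):
        """Promote the backup and enrol a fresh spare as the new backup."""
        self.primary, self.backup = self.backup, spare
        self.backup.vms = copy.deepcopy(self.primary.vms)


pair = PrimaryBackup(DeploymentManager("dm-a"), DeploymentManager("dm-b"))
pair.request("VM0", "running")
pair.request("VM1", "running")
pair.fail_over(DeploymentManager("dm-c"))  # "dm-a" is assumed to have crashed
```

In the architecture described earlier, the fail-over step would be driven by the VAMP manager, which creates and repairs Deployment Managers.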
Attack targets and countermeasures
[Figure labels: VMs; VNIC; vSwitch; Hypervisor; Hardware (CPU, MEMORY, NETWORK); PNIC; STORAGE]
MAC spoofing/snooping: Static address allocation.
IP attacks: Virtual firewalling.
VLAN hopping: Physical traffic segregation.
Hyperspacing: MMU, IOMMU.
Hyperthreading: No hyperthreading.
Buffer overflows: No eXecute bit.
Hyperjacking:
High attack surface: Certification, open, modular solutions.
Hypervisor integrity: TPM
Secrecy violations: Authentication, signature.
Integrity violations: Encryption of stored data (self-encrypting drives).
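As an illustration of the first countermeasure in the list above (static address allocation against MAC spoofing), the check a virtual switch could apply on each incoming frame might look like the sketch below. The port names, MAC addresses, and `accept_frame` function are hypothetical and do not correspond to any particular hypervisor API.

```python
# Hypothetical static allocation table: virtual port -> the only MAC it may use.
STATIC_MACS = {
    "vnic0": "52:54:00:00:00:01",
    "vnic1": "52:54:00:00:00:02",
}


def accept_frame(port: str, src_mac: str, table=STATIC_MACS) -> bool:
    """Accept a frame only if its source MAC is the one allocated to the port."""
    expected = table.get(port)
    return expected is not None and expected == src_mac.lower()
```

A VM attached to `vnic0` that forges the address of `vnic1` has its frames dropped, and frames from unknown ports are rejected by default.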
– Emergence of cloud networking. Issues: real-time virtual network reconfiguration; on-demand chaining of security services; security of network management interface.
Influence of VM migration: long distance migration (across data centers, across clouds). Migration of the security state consistently, securely, and efficiently (e.g., IP address space)?
Remediation mechanisms: Currently limited to threat containment, with no elimination.
Hypervisor protection: still little addressed.
– Isolation of device drivers: rich literature in the system community (virtualization, sandboxing, language techniques, new kernel architectures...) is applicable.
– Assurance: control flow integrity, attestation, compliance proofs, trusted computing.
Open Issues: Decision
Multi-lateral security policy management:
– Policy definition: despite advances, current policies are not sufficiently flexible for the cloud and the inter-cloud setting. Need for policies spanning organization boundaries, geographical borders, applicable in multiple contexts, with multiple actors.
– Automated policy aggregation and deployment.
Security adaptation strategy:
– Flexible policy negotiation: Some first frameworks (e.g., XENA), but a lot still to be done. Issues: interoperability, multiple conflicting stakeholder responsibilities, multiple jurisdictions. Promising solution: ontologies.
– Strategy representation: notion of policy continuum, DSLs.
Coordination of multiple self-protection loops: Stability of the result?
– Event correlation from monitoring components? Decision synchronization between reaction components?
– Authentication of communications: detection / decision; decision / reaction.
– Lack of security supervision architecture: for the cloud, and the inter-cloud setting.
Learning from past attacks to improve security and build defenses against future threats.
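One possible building block for authenticating communications between detection, decision, and reaction components is a message authentication code over each event. The sketch below uses Python's standard `hmac` module; the event format, function names, and pre-shared key are assumptions made for illustration, and key distribution is out of scope.

```python
import hashlib
import hmac
import json


def sign_event(key: bytes, event: dict) -> dict:
    """Detection side: serialize the event and attach an HMAC-SHA256 tag."""
    payload = json.dumps(event, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_event(key: bytes, message: dict):
    """Decision side: return the event if the tag is valid, otherwise None."""
    expected = hmac.new(key, message["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        return None
    return json.loads(message["payload"])


key = b"shared-secret-for-illustration-only"
msg = sign_event(key, {"source": "vm-monitor-3", "alert": "port-scan"})
event = verify_event(key, msg)
```

A MAC only authenticates the sender and protects integrity; it does not hide the event contents or prevent replay, which would need encryption and sequence numbers or timestamps on top.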
Open Issues: Resilience
Main roadblocks toward cloud resilience:
– Cloud layered architecture with different responsibilities: coordination and consistency of mechanisms offered by platforms versus mechanisms programmed by applications?
– Reliability of cloud management services themselves.
– Applicative models (stateless/stateful, strong/weak coupling) and fault-tolerance?
Self-stabilization and cloud resilience?
– Applicability of self-stabilization in cloud?
– At the hypervisor level? At VM and applicative level?
– Openness of cloud systems?
– Cost (w.r.t. performance) of self-stabilization in cloud?
Conclusion
In the cloud, security is clearly not an option:
– Migrating critical data and applications to the cloud is and remains risky.
– Check (and stick to) best practices!
The main challenge of cloud dependability remains trust:
– A lot of building blocks already there, but still a long way to go.
– Importance of SLAs to clarify responsibilities between customer and provider.
– Self-healing, self-protecting cloud architectures can help, both in specification and enforcement of such SLAs.
– Strong security, reliability, transparency, and accountability will be the key enablers to maintain solid and durable trust between a CSP and its customers.