Cloud Computing - zoo.cs.yale.edu/classes/cs426/2014/lec/cloud.pdf


Cloud Computing

Ennan Zhai, Computer Science at Yale University

ennan.zhai@yale.edu

About Final Project


• Important dates before the demo session:
  - Oct 31: Proposal v1.0
  - Nov 7: Source code v1.0
  - Nov 14: Proposal v2.0
  - Nov 21: Source code v2.0
  - Dec 1: Bakeoff version

About Final Project

• How to present?
  - ~2 min video and ~2 min Q&A

• What to submit?
  1) Source code
  2) Proposal
  3) README

• Others
  - Any programming language
  - Running on zoo machines
  - Individual or group (2-3 members)

About Final Project

• Topics:
  - Structured P2P system, e.g., Chord
  - Hybrid P2P system, e.g., BubbleStorm
  - Email client software
  - Decentralized read/write file system
  - Decentralized social network application
  - Sybil-resistant recommendation system
  - Byzantine fault tolerance (BFT) system
  - Accountability system, e.g., PeerReview
  - ...

About Final Project

• A few good final projects from past years:
  - A workable email client
  - A picture-based encryption tool
  - A privacy-preserving social network
  - ...

Questions?

Lecture Outline

• Cloud Computing

• Challenges in the Clouds

• A Concrete Cloud Reliability Case


What Is Cloud Computing?

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

--- according to NIST (National Institute of Standards and Technology)


Have You Used the Cloud?


Why Do We Like It?

• Why do users like it?
  - They do not care where it is; it is “just there”
  - Access from “any” platform

• Why do CS researchers like it?
  - High-performance computation for less money
  - Lots of hard and interesting challenges


Cloud vs. Distributed Systems

What Kinds of Clouds Exist Now?

• Three types of services:
  - Software as a Service (SaaS)
    Analogy: Restaurant. Prepares and serves the entire meal, does the dishes, etc.
  - Platform as a Service (PaaS)
    Analogy: Take-out food. Prepares the meal but does not serve it.
  - Infrastructure as a Service (IaaS)
    Analogy: Grocery store. Provides raw ingredients.
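The three models differ mainly in which layers of the Hardware / Middleware / Application stack (the layers drawn in the diagrams that follow) the cloud provider runs. A minimal Python sketch of that split, assuming the simplified three-layer stack from these slides; `PROVIDER_LAYERS` and `customer_manages` are illustrative names, not any provider's API:

```python
from enum import Enum


class Layer(Enum):
    HARDWARE = 1
    MIDDLEWARE = 2
    APPLICATION = 3


# Layers run by the cloud provider under each model (simplified stack).
PROVIDER_LAYERS = {
    "SaaS": {Layer.HARDWARE, Layer.MIDDLEWARE, Layer.APPLICATION},  # restaurant
    "PaaS": {Layer.HARDWARE, Layer.MIDDLEWARE},                     # take-out
    "IaaS": {Layer.HARDWARE},                                       # grocery store
}


def customer_manages(model: str) -> list[str]:
    """Return the layers left to the cloud's customer (e.g., an app provider)."""
    left = set(Layer) - PROVIDER_LAYERS[model]
    return [layer.name for layer in sorted(left, key=lambda l: l.value)]


for model in ("SaaS", "PaaS", "IaaS"):
    print(model, customer_manages(model))
```

Run as-is, it prints an empty list for SaaS, `['APPLICATION']` for PaaS, and `['MIDDLEWARE', 'APPLICATION']` for IaaS: the further down the stack the provider stops, the more the customer builds itself.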


Software as a Service (SaaS)


[Figure: the Cloud Provider (i.e., SaaS Provider) runs the whole stack (Hardware, Middleware, Application); the Customer uses the application]

• SaaS provider offers an entire application
  - Word processor, spreadsheet, CRM software, etc.
  - Customer pays the cloud provider and uses the service
  - Example: Google Apps, Salesforce.com, etc.


A Typical SaaS: Gmail


[Figure: the Gmail Provider runs Hardware, Middleware (BigTable, accessed through the BigTable APIs), and the Gmail Application; the Customer uses Gmail]

• Outsourcing your e-mail software:
  - Distributed, replicated message store in BigTable
  - Weak consistency model for some operations (e.g., reading a message)
  - Stronger consistency for others (e.g., sending a message)
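To make the weak/strong distinction concrete, here is a toy Python model of a replicated message store. It is a sketch under simplifying assumptions, not how Gmail or BigTable actually replicate data: three in-memory replicas with replica 0 as the primary; a strong write (modeling sending a message) updates every replica before returning; a weak write (modeling something like a read/unread flag) updates only the primary and propagates later; and reads are served by one replica, so they may be stale. `ReplicatedStore` and its methods are hypothetical names.

```python
import random


class ReplicatedStore:
    """Toy replicated key-value store; replica 0 acts as the primary."""

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = set()  # keys written weakly and not yet propagated

    def write_strong(self, key, value) -> None:
        # Update every replica before acknowledging.
        for replica in self.replicas:
            replica[key] = value
        self.pending.discard(key)

    def write_weak(self, key, value) -> None:
        # Acknowledge after updating only the primary; others catch up later.
        self.replicas[0][key] = value
        self.pending.add(key)

    def propagate(self) -> None:
        # Background step: copy the primary's latest values to all replicas.
        for key in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = self.replicas[0][key]
        self.pending.clear()

    def read(self, key, replica_id=None):
        # Served by a single replica, so it may return a stale value.
        if replica_id is None:
            replica_id = random.randrange(len(self.replicas))
        return self.replicas[replica_id].get(key)


store = ReplicatedStore()
store.write_strong("inbox/42", "Hello")             # every replica sees it
print(store.read("inbox/42", replica_id=2))         # Hello
store.write_weak("inbox/42:unread", False)          # primary only, for now
print(store.read("inbox/42:unread", replica_id=2))  # None (stale replica)
store.propagate()
print(store.read("inbox/42:unread", replica_id=2))  # False
```

The design choice being illustrated is the trade-off itself: the weak write returns after touching one replica, at the cost of a window in which other replicas serve stale data.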


Platform as a Service (PaaS)


[Figure: the Cloud Provider (i.e., PaaS Provider) runs Hardware and Middleware; the App Provider supplies the Application; the Customer uses the application]

• Cloud provides middleware/infrastructure
  - For example, Microsoft Common Language Runtime (CLR)
  - App provider pays the cloud for the platform
  - Customer pays the app provider for the service
  - Example: Windows Azure, Google App Engine, etc.

A Typical PaaS: Facebook


[Figure: the Facebook Provider runs Hardware (Facebook Clusters) and Middleware (Facebook APIs); the App Provider builds an Application (a Facebook Game) on top; the Customer uses the game]

• Facebook offers PaaS capabilities to app providers
  - Facebook APIs allow access to social network properties
  - App providers build their services (e.g., games) on top of Facebook
  - Facebook itself also uses PaaS provided within the company, e.g., log analysis for recommendations

Infrastructure as a Service (IaaS)


[Figure: the Cloud Provider (i.e., IaaS Provider) runs the Hardware; the App Provider installs its own Middleware and Applications on top; the Customer uses the application]

• Cloud provides raw computing resources
  - Virtual machines, blade servers, hard disks, etc.
  - App provider pays the cloud for the resources
  - Customer pays the app provider for the service
  - Example: Amazon Web Services, Rackspace Cloud, etc.

Typical IaaS: EC2 and S3


[Figure: Amazon runs the Hardware and offers EC2 and S3; the Netflix Provider builds its Middleware and the Netflix Application on top; the Customer uses Netflix]

• Netflix (the app) heavily depends on Amazon AWS:
  - Media files are stored in S3
  - Transcoding for target devices (e.g., iPad) runs on EC2
  - Streaming sessions are analyzed with Elastic MapReduce

Recall

• Three types of services:
  - Software as a Service (SaaS)
    Analogy: Restaurant. Prepares and serves the entire meal, does the dishes, etc.
  - Platform as a Service (PaaS)
    Analogy: Take-out food. Prepares the meal but does not serve it.
  - Infrastructure as a Service (IaaS)
    Analogy: Grocery store. Provides raw ingredients.


Zoo?

The Major Cloud Providers


• Amazon is the big player:
  - Infrastructure as a service (e.g., EC2)
  - Storage as a service (e.g., S3)

• But there are many others:
  - Microsoft Azure: similar services to Amazon's, with an emphasis on the .NET programming model
  - Google App Engine: offers a programming interface and Hadoop; Google also offers software as a service, e.g., Gmail and Google Docs
  - IBM, HP, Yahoo!: they seem to focus on enterprise-scale cloud apps


Lecture Outline

• Cloud Computing

• Challenges in the Clouds

• A Concrete Cloud Reliability Case


What Kinds of Challenges?

• Scalability

• Availability and reliability

• Security and privacy


Scalability


• What if one computer is not enough?
  - Buy a bigger (server-class) computer

• What if the biggest computer is not enough?
  - Buy many computers

[Figure: scaling from a PC to a Server to a Cluster]

Scalability

[Figure: anatomy of a rack]
• Network switches (connect nodes with each other and with other racks)
• Many nodes/blades (often identical)
• Storage device(s)

Scalability

• What if the cluster is too big to fit into a machine room?
  - Build a separate building for the cluster
  - The building can have lots of cooling and power
  - Result: a data center


[Figure: scaling from a PC to a Server to a Cluster to a Data center]

Google Data Center in Oregon

Data centers (size of a football field)


• A warehouse-sized computer
  - A single data center can easily contain 10,000 racks with 100 cores in each rack (1,000,000 cores total)


Google Data Center Locations

Google Data Centers in the USA

Google Data Centers in Europe

Google Data Centers Worldwide

Open Challenges


• Can you make data centers more scalable?
  - Scalable data center architecture (e.g., VL2 [SIGCOMM’09])
  - Can you design a more scalable data center network?

• Can you manage thousands of racks effectively?
  - Cloud monitoring systems (e.g., PlanetSeer [OSDI’04])


What Kinds of Challenges?

• Scalability

• Availability and reliability

• Security and privacy


Availability & Reliability

• Is the cloud always there when you need it?
  - Service outages
  - Connectivity outages

Recent Cloud Disasters


Top 10 Cloud Service Outages

Open Challenges


• Can you propose an approach to make the clouds more robust?
  - Fault-tolerant systems (e.g., F10 [NSDI’13])

• Can you build a system to find the root cause when a service becomes unavailable?
  - Diagnosis systems (e.g., Sherlock [SIGCOMM’07])
  - Accountable cloud (e.g., AVM [OSDI’10])


What Kinds of Challenges?

• Scalability

• Availability and reliability

• Security and privacy


Security & Privacy


• Compromised cloud accounts
  - A hacker does not need to break into your home to steal all your private data if he can break or guess your cloud password
  - Even worse, a hacker who cracks your Facebook account may get into your accounts everywhere online

• You do not know whether the cloud providers read your private data

Open Challenges


• Can you propose an approach to verify whether the cloud provider modifies your data?
  - Trusted cloud computing (e.g., Excalibur [USENIX Sec’12])

• Can you build a system to preserve the privacy of your data in the clouds?
  - Mandatory access control (MAC) for MapReduce (e.g., Airavat [NSDI’10])
  - Trusted storage (e.g., Depot [OSDI’10])
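One basic building block for the verification question is a keyed hash: the client computes an HMAC over each object before uploading it, stores the tag alongside the object, keeps only the secret key locally, and recomputes the tag on download. A minimal sketch with Python's standard `hmac` and `hashlib` modules; the `tag`/`verify` helpers and the dict standing in for the cloud store are hypothetical. It detects modification of an object, but not rollback to an older, correctly tagged version, and it is not the approach taken by Excalibur or Depot. (HMAC is a message authentication code, unrelated to the mandatory access control used by Airavat.)

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, name: str, data: bytes) -> bytes:
    """HMAC-SHA256 over the object name and contents (binds the tag to the name)."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    mac.update(name.encode("utf-8") + b"\x00")
    mac.update(data)
    return mac.digest()


def verify(key: bytes, name: str, data: bytes, stored_tag: bytes) -> bool:
    # compare_digest avoids leaking where the tags differ through timing.
    return hmac.compare_digest(tag(key, name, data), stored_tag)


key = secrets.token_bytes(32)  # stays on the client, never uploaded
cloud = {}                     # stand-in for a cloud object store: name -> (data, tag)
cloud["notes.txt"] = (b"meet at 5pm", tag(key, "notes.txt", b"meet at 5pm"))

data, t = cloud["notes.txt"]
print(verify(key, "notes.txt", data, t))  # True
cloud["notes.txt"] = (b"meet at 6pm", t)  # the provider modifies the data
data, t = cloud["notes.txt"]
print(verify(key, "notes.txt", data, t))  # False
```

Because the provider never sees the key, it cannot produce a valid tag for altered contents; it can only replay tags it has already stored.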


More Risks?

• EverClouds is a project of the DeDiS group (Bryan is PI):
  - Aims to solve tricky cloud security problems (e.g., timing channels)
  - Tries to make the clouds more reliable (e.g., failure detection)

• We already have some efforts:
  - SRA: A cloud structural reliability auditing system (submitted)
  - iRec: A cloud independence recommender system (HotDep’13)
  - P-SRA: A privacy-preserving structural-reliability auditor (CCSW’13)
  - Timing channel control with provider-enforced deterministic execution (CCSW’10)


Lecture Outline

• Cloud Computing

• Challenges in the Clouds

• A Concrete Cloud Reliability Case


Realistic Problem

Summary of the October 22, 2012 AWS Service Event in the US-East Region

We’d like to share more about the service event that occurred on Monday, October 22nd in the US-East Region. We have now completed the analysis of the events that affected AWS customers, and we want to describe what happened, our understanding of how customers were affected, and what we are doing to prevent a similar issue from occurring in the future.

The Primary Event and the Impact to Amazon Elastic Block Store (EBS) and Amazon Elastic Compute Cloud (EC2)

Correlated failures originating in EBS, due to bugs in one EBS server

Realistic Problem

[Figure: several groups of Elastic Compute Cloud (EC2) VMs (VM1, VM2, VM3, VM4, ...) whose storage depends on Elastic Block Store (EBS) servers (EBS Server1, EBS Server2, ...)]


System We Need

• #1: Dependency collection
• #2: Dependency representation
• #3: Efficient auditing


Dependency Data Collections

Our defined format:

Type       Dependency Expression
Network    <src="S" dst="D" route="x,y,z"/>
Hardware   <hw="H" type="T" dep="x"/>
Software   <pgm="S" hw="H" dep="x,y,z"/>

• Reuse existing data collection tools:
  - Convert the outputs to a uniform format
  - Three types of format: NET, HW, and SW
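A minimal Python sketch of reading records in the uniform format into dictionaries tagged NET, HW, or SW. It assumes one record per line, straight double-quoted attribute values, and that the record type can be inferred from the first attribute (`src`, `hw`, or `pgm`). `parse_record` is a hypothetical helper, not part of the authors' tool.

```python
import re

# attribute="value" pairs; list values are comma-separated
ATTR = re.compile(r'(\w+)="([^"]*)"')

# The first attribute identifies the record type (NET, HW, or SW).
KIND_BY_FIRST_KEY = {"src": "NET", "hw": "HW", "pgm": "SW"}

# Attributes whose values are comma-separated lists.
LIST_KEYS = {"route", "dep"}


def parse_record(line: str) -> dict:
    """Parse one record such as <src="S" dst="D" route="x,y,z"/> into a dict."""
    line = line.strip()
    if not (line.startswith("<") and line.endswith("/>")):
        raise ValueError(f"not a dependency record: {line!r}")
    pairs = ATTR.findall(line)
    if not pairs or pairs[0][0] not in KIND_BY_FIRST_KEY:
        raise ValueError(f"unknown record type: {line!r}")
    # Stored under "kind" because HW records already use a "type" attribute.
    record = {"kind": KIND_BY_FIRST_KEY[pairs[0][0]]}
    for key, value in pairs:
        record[key] = value.split(",") if key in LIST_KEYS else value
    return record


print(parse_record('<src="S1" dst="Internet" route="ToR1,Core1"/>'))
# {'kind': 'NET', 'src': 'S1', 'dst': 'Internet', 'route': ['ToR1', 'Core1']}
print(parse_record('<hw="H" type="T" dep="x"/>'))
# {'kind': 'HW', 'hw': 'H', 'type': 'T', 'dep': ['x']}
```

Keeping all three record types in one parser is what makes the uniform format useful: outputs from different collection tools end up as the same kind of dictionary.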

Dependency Data Collections

[Figure: dependency data gathered with existing tools (e.g., NSDMiner) and stored in DepDB]

Example network dependency records in the uniform format:
<src="S1" dst="Internet" route="ToR1,Core1"/>
<src="S1" dst="Internet" route="ToR1,Core2"/>
<src="S2" dst="Internet" route="ToR1,Core1"/>
<src="S2" dst="Internet" route="ToR1,Core2"/>
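Even before a fault graph is built, these four records reveal a correlated dependency. A device that appears on every route of a server is a single point of failure for that server's connectivity, and a device that is such a point for every server defeats the redundancy. A small Python sketch, assuming the records have already been reduced to (src, route) pairs; `on_every_route` is a hypothetical helper:

```python
from functools import reduce

# (src, route) pairs from the four records above (dst="Internet" for all).
RECORDS = [
    ("S1", ["ToR1", "Core1"]),
    ("S1", ["ToR1", "Core2"]),
    ("S2", ["ToR1", "Core1"]),
    ("S2", ["ToR1", "Core2"]),
]


def on_every_route(records):
    """Map each source to the devices that appear on all of its routes."""
    routes = {}
    for src, route in records:
        routes.setdefault(src, []).append(set(route))
    return {src: set.intersection(*sets) for src, sets in routes.items()}


critical = on_every_route(RECORDS)
shared = reduce(set.intersection, critical.values())
print(critical)  # {'S1': {'ToR1'}, 'S2': {'ToR1'}}
print(shared)    # {'ToR1'}
```

Both servers have two routes out, yet both depend on ToR1, so a single ToR1 failure disconnects the whole redundancy configuration.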

System We Need

• #1: Dependency collection
• #2: Dependency representation
• #3: Efficient auditing

Example Redundancy

[Figure: an example redundancy configuration whose components have software (SW), hardware (HW), and network (NET) dependencies]

Building Fault Graph Top-to-Bottom

Step1: Root Node
[Figure: the root node "Redundancy configuration fails", to be expanded over the example's SW, HW, and NET dependencies]

Step2: Server Nodes
[Figure: "Redundancy configuration fails" connected through an AND gate to "Server 1 fails" and "Server 2 fails"]
AND gate: the upper-layer node fails only if all of its sublayer nodes fail

Step3: Dependency Nodes
[Figure: each "Server i fails" node connected through an OR gate to its own "Net fails", "HW fails", and "SW fails" nodes]
OR gate: the upper-layer node fails if any one of its sublayer nodes fails

Step4: Hardware Dependency
[Figure: Server 1's "HW fails" node expanded into CPU1 and Disk1, and Server 2's into CPU2 and Disk2]

Step5: Network Dependency
[Figure: each "Net fails" node expanded into Path1 and Path2, which run over the switches ToR1, Core1, and Core2]

Step6: Software Dependency
[Figure: each "SW fails" node expanded into software components such as Riak and Query and their libraries (e.g., libc6)]

System We Need

• #1: Dependency collections
• #2: Dependency representation
• #3: Efficient auditing

Efficient Auditing

• Two algorithms balancing cost and accuracy:
  - Minimal fault set algorithm
  - Failure sampling algorithm

Minimal Fault Set Algorithm

• Traditional algorithm in safety engineering:
  - Exponential complexity (NP-hard)

• We are the first to apply it in the cloud area:
  - Analyzing a fat tree of size 30,528 takes ~40 hours

• We propose an efficient failure sampling algorithm.
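To make the cost concrete, here is a brute-force Python sketch of what a minimal fault set is: a set of basic events that fails the root while no proper subset does. It enumerates subsets by increasing size, so it is exponential in the number of basic events; it illustrates the definition and is not the classic safety-engineering algorithm itself. The example graph (two servers that each fail if their own hardware or a shared Switch1 fails, with an AND at the root) is an assumption matching the failure sampling example, and `root_fails` and `minimal_fault_sets` are hypothetical helpers.

```python
from itertools import combinations

BASIC = ["Server1's HW", "Server2's HW", "Switch1"]


def root_fails(failed):
    # Assumed example graph: each server fails if its HW or the shared Switch1
    # fails (OR); the redundancy configuration fails only if both servers fail (AND).
    s1 = "Server1's HW" in failed or "Switch1" in failed
    s2 = "Server2's HW" in failed or "Switch1" in failed
    return s1 and s2


def minimal_fault_sets(events, fails):
    """Check subsets by increasing size; keep those that fail the root and
    contain no smaller fault set already found. Exponential in len(events)."""
    minimal = []
    for k in range(1, len(events) + 1):
        for combo in combinations(events, k):
            candidate = frozenset(combo)
            if fails(candidate) and not any(m <= candidate for m in minimal):
                minimal.append(candidate)
    return minimal


result = minimal_fault_sets(BASIC, root_fails)
print(result)
```

With three basic events this checks 7 subsets; with n events it checks up to 2^n - 1, which is why exact analysis of a large topology becomes impractical.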


Failure Sampling Algorithm

[Figure: example fault graph. "Redundancy configuration fails" has children "Server 1 fails" and "Server 2 fails"; "Server 1 fails" has children "Server1’s HW fails" and "Switch1 fails"; "Server 2 fails" has children "Server2’s HW fails" and "Switch1 fails".]

• Each basic event (Server1’s HW, Server2’s HW, Switch1) is randomly assigned 1 (fails) or 0 (does not fail).
• The values propagate up through the gates to decide whether the root, "Redundancy configuration fails", is 1 or 0.
• Whenever the root fails, the set of failed basic events is recorded in the list of fault sets.

The 1st Sampling Round

Server1’s HW and Server2’s HW fail (✘) while Switch1 does not (✔). Both servers fail, so the root fails.

Fault Sets
{Server1’s HW, Server2’s HW}

The 2nd Sampling Round

Only one server’s hardware fails, and Switch1 does not. One server stays up, so the root does not fail and no new fault set is recorded.

Fault Sets
{Server1’s HW, Server2’s HW}

The 3rd Sampling Round

Switch1 fails while both servers’ hardware survives. Because both servers depend on Switch1, both fail and the root fails.

Fault Sets
{Server1’s HW, Server2’s HW}
{Switch1}

After Many (e.g., 10^7) Rounds

Fault Sets
{Server1’s HW, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
... ...
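The sampling rounds can be sketched in Python as follows. This is a minimal illustration under stated assumptions: every basic event fails independently with a hypothetical probability `p` (0.5 here, a fair "1 or 0"), the graph is the two-server, shared-Switch1 example, and `failure_sampling` and `root_fails` are hypothetical names.

```python
import random

BASIC = ["Server1's HW", "Switch1", "Server2's HW"]


def root_fails(failed):
    # Assumed example graph: each server fails if its HW or the shared Switch1
    # fails; the redundancy configuration fails only if both servers fail.
    s1 = "Server1's HW" in failed or "Switch1" in failed
    s2 = "Server2's HW" in failed or "Switch1" in failed
    return s1 and s2


def failure_sampling(events, fails, rounds, p=0.5, seed=None):
    """Each round, mark every basic event as failed (1) with probability p,
    evaluate the root, and record the failed set whenever the root fails."""
    rng = random.Random(seed)
    fault_sets = []
    for _ in range(rounds):
        failed = frozenset(e for e in events if rng.random() < p)
        if fails(failed):
            fault_sets.append(failed)
    return fault_sets


fault_sets = failure_sampling(BASIC, root_fails, rounds=10_000, seed=1)
print(len(fault_sets), "rounds failed the root")
```

Unlike the minimal fault set algorithm, sampling records whatever failed in that round, so non-minimal sets such as {Switch1, Server2’s HW} appear and the same set can be recorded many times. Its cost grows with the number of rounds rather than exponentially with the number of basic events, at the price of possibly missing rare fault sets.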

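Size-Based Ranking

Fault sets are ranked by size: a smaller set means fewer simultaneous failures are enough to break the redundancy, so it ranks higher.

Fault Sets
{Switch1}
{Switch1}
{Switch1, Server2’s HW}
{Switch1, Server2’s HW}
{Server1’s HW, Server2’s HW}
... ...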
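A minimal Python sketch of size-based ranking, applied to the fault sets listed for the many-rounds example. The frequency tie-breaker is a hypothetical addition that the slides do not specify: this sketch folds repeated sets into counts and, among sets of equal size, lists the more frequently sampled one first. `rank_by_size` is a hypothetical helper.

```python
from collections import Counter

# Fault sets as listed after many sampling rounds (the trailing "..." omitted).
sampled = [
    frozenset({"Server1's HW", "Server2's HW"}),
    frozenset({"Switch1"}),
    frozenset({"Switch1", "Server2's HW"}),
    frozenset({"Switch1"}),
    frozenset({"Switch1", "Server2's HW"}),
]


def rank_by_size(fault_sets):
    """Deduplicate, then order by size (smaller = fewer failures needed),
    breaking ties by sampling frequency (more frequent first)."""
    counts = Counter(fault_sets)
    return sorted(counts.items(), key=lambda item: (len(item[0]), -item[1]))


ranked = rank_by_size(sampled)
for fault_set, count in ranked:
    print(sorted(fault_set), count)
```

On this input the order matches the ranked list: {Switch1} first, then {Switch1, Server2’s HW}, then {Server1’s HW, Server2’s HW}.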

Thanks!

Questions?
