Aug 11, 2020


Heading Off Correlated Failures through Independence-as-a-Service

Ennan Zhai¹, Ruichuan Chen², David Isaac Wolinsky¹, Bryan Ford¹

¹ Yale University  ² Bell Labs/Alcatel-Lucent

Background

• Cloud services ensure reliability by redundancy:
  - Amazon S3 replicates data on multiple racks
  - iCloud rents EC2 and Azure redundantly

Unexpected common dependencies!

Service Outage Losses

Data Center Outages Generate Big Losses

Downtime in a data center can cost an average of $505,500 per incident, according to a Ponemon Institute study.

(Analytics Slideshow: 2010 Data Center Operational Trends Report)

What is correlated failure?

[Figure: Rack1, Rack2, and Rack3 with Switch1, Switch2, and Switch3, holding a Primary and two Backup replicas; later builds of the slide add an Agg Switch.]

Realistic Example

Summary of the October 22, 2012 AWS Service Event in the US-East Region

We’d like to share more about the service event that occurred on Monday, October 22nd in the US-East Region. We have now completed the analysis of the events that affected AWS customers, and we want to describe what happened, our understanding of how customers were affected, and what we are doing to prevent a similar issue from occurring in the future.

The Primary Event and the Impact to Amazon Elastic Block Store (EBS) and Amazon Elastic Compute Cloud (EC2)

Correlated failures resulting from EBS due to bugs in one EBS server

[Figure: Elastic Compute Cloud (EC2) VMs (VM1 to VM4) over Elastic Block Store (EBS), with EBS Server1 and EBS Server2.]

Even Worse

[Figure: a Video App using Cloud Provider A and Cloud Provider B. Third-party infrastructure components: ISP Router A, ISP Router B, ISP Router C, and a Power Source.]

Cloud providers do not usually share information about their dependencies.

Existing Efforts

• Cloud providers allocate or tolerate failures via:
  - diagnosis systems;
  - fault-tolerant systems.

• These approaches solve the problem after an outage occurs.

• We want to prevent correlated failures before an outage occurs.

Independence-as-a-Service (INDaaS)

INDaaS Workflow

[Figure: Service Provider Alice, INDaaS, Dependency Data Source1, and Dependency Data Source2. Alice has a given two-way redundancy configuration.]

Independence of this two-way redundancy?

• Step1: Specification submission
• Step2: Dependency data collection (from each dependency data source)
• Step3: Independence evaluation

Relative Independence is 0.3

Multiple Cloud Providers

[Figure: Service Provider Alice, INDaaS, Data Source1 (Cloud1), and Data Source2 (Cloud2), with arrows labeled Step1 to Step5.]

• Data Source1 (Cloud1) and Data Source2 (Cloud2) are each unwilling to share the dependency data.

Private Independence Evaluation

• Step3: Private auditing
• Each data source: "Only know my information"
• Service Provider Alice: "Only the relative independence"

Technical Challenges

[Figure: the INDaaS workflow (Step1 to Step5) with Dependency Data Source1 and Dependency Data Source2.]

• #1: Dependency collections
  - Solution: Reusing existing tools

• #2: Dependency representation
  - Solution: Fault graphs

• #3: Efficient auditing
  - Solution: Failure sampling algorithm

• #4: Private independence audit
  - Solution: Private Jaccard similarity
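The slides name Private Jaccard similarity as the solution to challenge #4 but do not show the protocol here. As a rough illustration only, the sketch below computes the plain (non-private) Jaccard similarity |A ∩ B| / |A ∪ B| of two sets. It assumes each data source's dependencies can be summarized as a set of component names; the function name, the two example sets, and their assignment to clouds are hypothetical.

```python
from typing import Set


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical dependency sets for two data sources (illustrative only).
cloud1_deps = {"ISP Router A", "ISP Router B", "Power Source"}
cloud2_deps = {"ISP Router B", "ISP Router C", "Power Source"}

similarity = jaccard_similarity(cloud1_deps, cloud2_deps)
print(f"Jaccard similarity: {similarity:.2f}")  # 2 shared of 4 total -> 0.50
```

A higher similarity means more shared components. This sketch compares the sets in the clear, which is exactly what a private variant would have to avoid.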

RoadMap (next: #1 Dependency collections, Solution: Reusing existing tools)

Dependency Data Collections

• Reuse existing data collection tools:
  - Convert the outputs to a uniform format.
  - Three types of format: NET, HW and SW.

Our defined format:

Type       Dependency Expression
Network    <src="S" dst="D" route="x,y,z"/>
Hardware   <hw="H" type="T" dep="x"/>
Software   <pgm="S" hw="H" dep="x,y,z"/>

Please see our paper for more details.
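A minimal sketch of reading records in the three formats above might look like the following. The parser, the `Dependency` record, and the sample values are hypothetical. It also assumes that `route` and `dep` hold comma-separated component names, which the slide suggests but does not spell out.

```python
import re
from dataclasses import dataclass

# Matches key="value" pairs inside one record such as <hw="H" type="T" dep="x"/>.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass(frozen=True)
class Dependency:
    kind: str          # "NET", "HW", or "SW"
    subject: str       # the component that has the dependencies
    depends_on: tuple  # the components it depends on


def parse_record(line: str) -> Dependency:
    """Parse one record in the uniform format (hypothetical parser)."""
    attrs = dict(ATTR_RE.findall(line))
    if "src" in attrs:   # Network: <src="S" dst="D" route="x,y,z"/>
        subject = f'{attrs["src"]}->{attrs["dst"]}'
        return Dependency("NET", subject, tuple(attrs["route"].split(",")))
    if "pgm" in attrs:   # Software: <pgm="S" hw="H" dep="x,y,z"/>
        return Dependency("SW", attrs["pgm"], tuple(attrs["dep"].split(",")))
    if "hw" in attrs:    # Hardware: <hw="H" type="T" dep="x"/>
        return Dependency("HW", attrs["hw"], tuple(attrs["dep"].split(",")))
    raise ValueError(f"unrecognized record: {line!r}")


# Sample records with made-up values.
records = [
    '<src="S1" dst="Internet" route="ToR1,Core1"/>',
    '<hw="S1" type="CPU" dep="CPU1"/>',
    '<pgm="Riak" hw="S1" dep="libc6"/>',
]
for record in records:
    print(parse_record(record))
```

The software check comes before the hardware check because a software record also carries an `hw` attribute.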

RoadMap (next: #2 Dependency representation, Solution: Fault graphs)

Example Redundancy

[Figure: an example redundancy deployment whose dependencies are grouped into SW, HW, and NET.]

Building Fault Graph Top-to-Bottom

• Step1: Root Node. The root event is "Redundancy configuration fails".
• Step2: Server Nodes. "Server 1 fails" and "Server 2 fails" sit below the root, joined by an AND gate: if all the sublayer nodes fail, the upper layer node fails.
• Step3: Dependency Nodes. Each server node has "Net fails", "HW fails", and "SW fails" below it, joined by an OR gate: if one of the sublayer nodes fails, the upper layer node fails.
• Step4: Hardware Dependency. The HW nodes expand into CPU1 and Disk1, and CPU2 and Disk2.
• Step5: Network Dependency. The Net nodes expand into Path1 and Path2, over ToR1, Core1, and Core2.
• Step6: Software Dependency. The SW nodes expand into Riak and Query, over library nodes (extracted as "libc6 libccllibsvnl").
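The AND/OR gate semantics described on these slides can be sketched as a small tree evaluator. This is only an illustration under stated assumptions: the `Node` class and the leaf names (`Net1`, `HW1`, `SW1`, ...) are hypothetical, and the graph stops at the Step3 layer rather than expanding hardware, network, and software dependencies.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    gate: str = "BASIC"  # "AND", "OR", or "BASIC" (a leaf event)
    children: List["Node"] = field(default_factory=list)

    def fails(self, failed: Set[str]) -> bool:
        """Return True if this node fails given the set of failed basic events."""
        if self.gate == "BASIC":
            return self.name in failed
        results = [child.fails(failed) for child in self.children]
        # AND gate: all sublayer nodes must fail. OR gate: one is enough.
        return all(results) if self.gate == "AND" else any(results)


def server(n: int) -> Node:
    # A server fails if its network, hardware, or software fails (OR gate).
    return Node(f"Server {n} fails", "OR",
                [Node(f"Net{n}"), Node(f"HW{n}"), Node(f"SW{n}")])


# The redundancy configuration fails only if both servers fail (AND gate).
root = Node("Redundancy configuration fails", "AND", [server(1), server(2)])

print(root.fails({"HW1"}))         # False: only Server 1 is down
print(root.fails({"HW1", "SW2"}))  # True: both servers are down
```

Because the leaves here are distinct per server, no single failure brings down the root. A shared leaf, such as a common switch under both servers, would change that.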

RoadMap (next: #3 Efficient auditing, Solution: Failure sampling algorithm)

Efficient Auditing

• Two algorithms balancing cost and accuracy:
  - Minimal fault set algorithm
  - Failure sampling algorithm

Minimal Fault Set Algorithm

• Traditional algorithm in safety engineering
  - Exponential complexity (NP-hard)

• We are the first to apply it in the cloud area:
  - Analyzing a fat tree with 30,528 [...] takes ~40 hours

• We propose an efficient failure sampling algorithm.
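To make the exponential cost concrete, here is a brute-force sketch that enumerates subsets of basic events and keeps the minimal ones that bring down the root. It is not the authors' algorithm. The toy fault graph (two servers, each with its own hardware, behind one shared switch) and all names are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

EVENTS = ["HW1", "HW2", "Switch1"]  # hypothetical basic events


def root_fails(failed: FrozenSet[str]) -> bool:
    # Assumed toy fault graph: each server fails if its own hardware or the
    # shared switch fails (OR); the redundancy fails if both servers fail (AND).
    server1 = "HW1" in failed or "Switch1" in failed
    server2 = "HW2" in failed or "Switch1" in failed
    return server1 and server2


def minimal_fault_sets(
    events: List[str], fails: Callable[[FrozenSet[str]], bool]
) -> List[FrozenSet[str]]:
    """Try subsets by increasing size; keep failing sets with no smaller failing subset."""
    minimal: List[FrozenSet[str]] = []
    for size in range(1, len(events) + 1):
        for combo in combinations(events, size):
            candidate = frozenset(combo)
            if any(m <= candidate for m in minimal):
                continue  # a smaller fault set is already contained in this one
            if fails(candidate):
                minimal.append(candidate)
    return minimal


for fault_set in minimal_fault_sets(EVENTS, root_fails):
    print(sorted(fault_set))
```

The loop visits up to 2^n subsets for n basic events, which is why this kind of exact enumeration becomes impractical on large graphs.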

Failure Sampling Algorithm

[Figure: fault graph. Root: "Redundancy configuration fails". Below it: "Server 1 fails" and "Server 2 fails". Basic events: "Server1’s HW fails", "Switch1 fails", "Server2’s HW fails".]

• Each basic event is assigned 1 or 0.
• Given that assignment, is the root 1 or 0?
• Results are collected in Fault Sets.
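The sampling idea can be sketched as a small Monte Carlo loop: assign 1 or 0 to every basic event, evaluate the root, and record the failed events whenever the root fails. The gate layout (Switch1 shared by both servers), the fair coin flip per event, and the function names are assumptions for illustration; the slides do not specify them.

```python
import random
from typing import Dict, FrozenSet, Set

BASIC_EVENTS = ["Server1's HW", "Switch1", "Server2's HW"]


def root_fails(state: Dict[str, int]) -> bool:
    # Assumed gates: a server fails if its own HW or Switch1 fails (OR);
    # the redundancy configuration fails only if both servers fail (AND).
    server1 = state["Server1's HW"] or state["Switch1"]
    server2 = state["Server2's HW"] or state["Switch1"]
    return bool(server1 and server2)


def sample_fault_sets(rounds: int, seed: int = 0) -> Set[FrozenSet[str]]:
    """Run `rounds` random trials and collect the fault sets that fail the root."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    fault_sets: Set[FrozenSet[str]] = set()
    for _ in range(rounds):
        # Assign 1 (fails) or 0 (works) to every basic event.
        state = {event: rng.randint(0, 1) for event in BASIC_EVENTS}
        if root_fails(state):
            fault_sets.add(frozenset(e for e, bit in state.items() if bit))
    return fault_sets


for fault_set in sorted(sample_fault_sets(rounds=100), key=sorted):
    print(sorted(fault_set))
```

The collected sets are not guaranteed to be minimal; a sampled set such as all three events failing contains smaller fault sets inside it.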

The 1st Sampling Round

• Marks on the basic events: ✔✘ ✘
• Marks on the server nodes: ✘ ✘
• Fault Sets: {Server1’s HW, Server2’s HW}

The 2nd Sampling Round

• Marks on the basic events: ✔ ✘✔
• Marks on the server nodes: ✔ ✘
• Fault Sets: {Server1’s HW, Server2’s HW}

Page 91: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Fault Sets

{Server1’s HW, Server2’s HW}

The 3rd Sampling Round

Redundancy configuration fails

+" +"

Switch1 fails

Server 2 failsServer 1 fails

Server2’s HW failsServer1’s HW fails

Page 92: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

✔✘✔Fault Sets

{Server1’s HW, Server2’s HW}

The 3rd Sampling Round

Redundancy configuration fails

+" +"

Switch1 fails

Server 2 failsServer 1 fails

Server2’s HW failsServer1’s HW fails

Page 93: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

✔✘✔Fault Sets

{Server1’s HW, Server2’s HW}

The 3rd Sampling Round

Redundancy configuration fails

+" +"

Switch1 fails

Server 2 failsServer 1 fails

Server2’s HW failsServer1’s HW fails

✘ ✘

Page 94: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

✔✘✔Fault Sets

{Server1’s HW, Server2’s HW}

{Switch1}

The 3rd Sampling Round

Redundancy configuration fails

+" +"

Switch1 fails

Server 2 failsServer 1 fails

Server2’s HW failsServer1’s HW fails

✘ ✘

Page 95: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Fault Sets{Server1’s HW, Server2’s HW}

{Switch1}

{Switch1, Server2’s HW}

{Switch1}

{Switch1, Server2’s HW}

... ...

After Many (e.g., 10^7) Rounds
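The sampling rounds walked through on the preceding slides can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary encoding of the fault graph, the gate placement (an AND gate over the two servers, an OR gate under each server), the 0.5 coin-flip probability, and all function names are assumptions made for the example.

```python
import random

# Assumed encoding of the slide's fault graph: the redundancy configuration
# fails only if both servers fail; a server fails if the shared switch or
# its own hardware fails.
TOP = "Redundancy configuration fails"
FAULT_GRAPH = {
    TOP: ("AND", ["Server1 fails", "Server2 fails"]),
    "Server1 fails": ("OR", ["Switch1", "Server1's HW"]),
    "Server2 fails": ("OR", ["Switch1", "Server2's HW"]),
}
BASIC_EVENTS = ["Switch1", "Server1's HW", "Server2's HW"]


def fails(node, failed):
    """Evaluate whether `node` fails given the set of failed basic events."""
    if node not in FAULT_GRAPH:          # leaf: a basic event
        return node in failed
    gate, children = FAULT_GRAPH[node]
    results = [fails(child, failed) for child in children]
    return all(results) if gate == "AND" else any(results)


def sample_fault_sets(rounds, p_fail=0.5, seed=0):
    """Run `rounds` sampling rounds and collect every sampled fault set.

    Each round assigns 1 (fail) or 0 (ok) to every basic event at random;
    if the top event fails, the set of failed basic events is recorded.
    """
    rng = random.Random(seed)
    found = set()
    for _ in range(rounds):
        failed = frozenset(e for e in BASIC_EVENTS if rng.random() < p_fail)
        if fails(TOP, failed):
            found.add(failed)
    return found


if __name__ == "__main__":
    for fault_set in sorted(sample_fault_sets(1000), key=len):
        print(sorted(fault_set))
```

The recorded sets are not necessarily minimal (for example, {Switch1, Server2's HW} contains {Switch1}), which is consistent with the fault sets shown on the slide.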

Page 96: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Fault Sets{Server1’s HW, Server2’s HW}

{Switch1}

{Switch1, Server2’s HW}

{Switch1}

{Switch1, Server2’s HW}

... ...

Size-Based Ranking

Page 97: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Fault Sets{Switch1}

{Switch1}

{Switch1, Server2’s HW}

{Switch1, Server2’s HW}

{Server1’s HW, Server2’s HW}

... ...

Size-Based Ranking
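As a rough sketch of size-based ranking, the sampled fault sets can simply be ordered from smallest to largest, so that the failures requiring the fewest simultaneous faults come first. The function name and the sample data below are illustrative assumptions; the data mirrors the fault sets listed on the slide.

```python
def rank_by_size(fault_sets):
    """Order fault sets from smallest to largest; ties keep their input order."""
    return sorted(fault_sets, key=len)


# Fault sets in the order they were sampled on the earlier slide.
sampled = [
    {"Server1's HW", "Server2's HW"},
    {"Switch1"},
    {"Switch1", "Server2's HW"},
    {"Switch1"},
    {"Switch1", "Server2's HW"},
]

ranked = rank_by_size(sampled)

if __name__ == "__main__":
    for fault_set in ranked:
        print(len(fault_set), sorted(fault_set))
```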

Page 98: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

• Several scoring equations are available as options:
  - summation of sizes
  - weighted average of sizes

Independence Evaluation

Fault Sets{Switch1}

{Switch1}

{Switch1, Server2’s HW}

{Switch1, Server2’s HW}

{Server1’s HW, Server2’s HW}

... ...
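The two scoring options named on this slide could look roughly like the sketch below. The slides do not give the exact equations, so the `top_n` cutoff, the choice of weights, and both function names are assumptions for illustration only.

```python
def size_sum(ranked_fault_sets, top_n=None):
    """Sum the sizes of the first `top_n` ranked fault sets (all if None)."""
    return sum(len(s) for s in ranked_fault_sets[:top_n])


def weighted_average_size(ranked_fault_sets, weights):
    """Weighted average of fault-set sizes; `weights` are caller-supplied."""
    if len(weights) != len(ranked_fault_sets):
        raise ValueError("need exactly one weight per fault set")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * len(s) for w, s in zip(weights, ranked_fault_sets)) / total


# Ranked fault sets as listed on the slide.
ranked_sets = [
    {"Switch1"},
    {"Switch1"},
    {"Switch1", "Server2's HW"},
    {"Switch1", "Server2's HW"},
    {"Server1's HW", "Server2's HW"},
]

if __name__ == "__main__":
    print(size_sum(ranked_sets))
    print(weighted_average_size(ranked_sets, [1, 1, 1, 1, 1]))
```

Under either option, small fault sets pull the score down, which presumably reflects that fewer simultaneous faults are needed to defeat the redundancy.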

Page 99: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

RoadMap

• #1: Dependency collections
  - Solution: Reusing existing tools

• #2: Dependency representation
  - Solution: Fault graphs

• #3: Efficient auditing
  - Solution: Failure sampling algorithm

• #4: Private independence audit
  - Solution: Private Jaccard similarity

[Diagram: INDaaS workflow, Step 1 through Step 5, fed by Dependency Data Source 1 and Dependency Data Source 2]

Page 101: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

RoadMap

• #1: Dependency collections
  - Solution: Reusing existing tools

• #2: Dependency representation
  - Solution: Fault graphs

• #3: Efficient auditing
  - Solution: Failure sampling algorithm

• #4: Private independence audit
  - Solution: Private Jaccard similarity

[Diagram: INDaaS workflow, Step 1 through Step 5, fed by Data Source 1 (Cloud 1) and Data Source 2 (Cloud 2)]

Page 102: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS AgentService Provider

Page 103: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 104: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 105: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

Trusted Third-Party

ISP A Power B ISP B Power CPower A

Service Provider

Page 106: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Service Provider

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

Trusted Third-Party

Cloud providers are reluctant to share this information!

ISP A Power B ISP B Power CPower A

Page 107: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

Secure Multiparty Computation (SMPC)

SMPC

ISP A Power B ISP B Power CPower A

Service Provider

Page 108: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

Secure Multiparty Computation (SMPC)

SMPC

SMPC is hard to scale! [Xiao et al. CCSW’13]

ISP A Power B ISP B Power CPower A

Page 109: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 110: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 111: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Evaluating independence by the dataset similarity between clouds

INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 112: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 113: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A: ISP A, Power A, Power B

Cloud B: ISP B, Power A, Power B

Cloud C: ISP B, Power C

ISP A Power B ISP B Power CPower A

INDaaS AgentService Provider

Page 114: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

INDaaS Agent

Cloud A Cloud B Cloud C

App Provider Auditor

ISP APower APower B

ISP BPower APower B

ISP BPower C

Using Jaccard similarity to evaluate the independence of each redundancy configuration.

ISP A Power B ISP B Power CPower A

Page 115: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

INDaaS Agent

Cloud A Cloud B Cloud C

App Provider

ISP APower APower B

ISP BPower APower B

ISP BPower C

$$J(S_1, S_2, \ldots, S_n) = \frac{|S_1 \cap S_2 \cap \cdots \cap S_n|}{|S_1 \cup S_2 \cup \cdots \cup S_n|}$$

ISP A Power B ISP B Power CPower A

Page 116: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS AgentService Provider

Page 117: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

=2=4

Service Provider

Page 118: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

|A ∩ B| = 2, |A ∪ B| = 4

J = 2/4

Service Provider

Page 119: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

INDaaS Agent

Deployment SimCloud A&B 0.5

ISP A Power B ISP B Power CPower A

=2=4

J = 2/4

Service Provider

Page 120: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

INDaaS Agent

Deployment SimCloud A&B 0.5Cloud B&C 0.25

ISP A Power B ISP B Power CPower A

|B ∩ C| = 1, |B ∪ C| = 4

J = 1/4

Service Provider

Page 121: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0

=0 =5 J = 0/5,,

Service Provider

Page 122: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

Deployment   Sim
Cloud A&B    0.5
Cloud B&C    0.25
Cloud A&C    0

|A ∩ C| = 0, |A ∪ C| = 5

J = 0/5

0 means fully independent

Service Provider

Page 123: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0

=0 =5 J = 0/5,,

Service Provider

Page 124: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5

=0 =5 J = 0/5,,

Service Provider

Page 125: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

Deployment   Sim
Cloud A&C    0
Cloud B&C    0.25
Cloud A&B    0.5

ISP A Power B ISP B Power CPower A

Service Provider
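The ranking built up on these slides can be reproduced with a short sketch. It assumes the dependency sets shown under each cloud (Cloud A: ISP A, Power A, Power B; Cloud B: ISP B, Power A, Power B; Cloud C: ISP B, Power C) and computes the Jaccard similarity in the clear; the function and variable names are illustrative, not part of INDaaS.

```python
from itertools import combinations


def jaccard(*sets):
    """Jaccard similarity of n sets: |intersection| / |union|."""
    union = set().union(*sets)
    if not union:
        return 0.0
    intersection = set.intersection(*(set(s) for s in sets))
    return len(intersection) / len(union)


# Dependency sets as read from the slides.
DEPENDENCIES = {
    "Cloud A": {"ISP A", "Power A", "Power B"},
    "Cloud B": {"ISP B", "Power A", "Power B"},
    "Cloud C": {"ISP B", "Power C"},
}


def rank_deployments(deps, k=2):
    """Score every k-cloud deployment and sort from most to least independent."""
    scored = [
        (" & ".join(names), jaccard(*(deps[name] for name in names)))
        for names in combinations(sorted(deps), k)
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for deployment, similarity in rank_deployments(DEPENDENCIES):
        print(f"{deployment}: {similarity}")
```

A lower similarity means fewer shared dependencies, so the first entry of the ranking is the deployment the slides identify as fully independent.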

Page 126: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

$$J(S_1, S_2, \ldots, S_n) = \frac{|S_1 \cap S_2 \cap \cdots \cap S_n|}{|S_1 \cup S_2 \cup \cdots \cup S_n|}$$

P-SOP [Vaidya et al. JCS05]

• We apply Private Set Operation Protocol (P-SOP):
  - Private set intersection cardinality
  - Private set union cardinality

Page 127: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

P-SOP [Vaidya et al. JCS05]

• Allow k parties to compute both intersection and union cardinalities without learning other information.

Page 128: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

P-SOP [Vaidya et al. JCS05]

• Allow k parties to compute both intersection and union cardinalities without learning other information.

Page 129: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

P-SOP

P-SOP [Vaidya et al. JCS05]

• Allow k parties to compute both intersection and union cardinalities without learning other information.

Page 130: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

P-SOP

Each party learns: |intersection| = 1, |union| = 7

P-SOP [Vaidya et al. JCS05]

• Allow k parties to compute both intersection and union cardinalities without learning other information.

Page 131: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

Protocol

But I do not know which elements are overlapping/union.

P-SOP

But I do not know which elements are overlapping/union.

But I do not know which elements are overlapping/union.

P-SOP [Vaidya et al. JCS05]

• Allow k parties to compute both intersection and union cardinalities without learning other information.

Page 132: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

Keys: Kd, Ke, Kf

P-SOP [Vaidya et al. JCS05]

Page 134: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

Keys: Kd, Ke, Kf

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

Page 135: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Party datasets: {11, 3, 10}, {1, 5, 20, 3}, {7, 3}

Keys: Kd, Ke, Kf

Ef(Ee(Ed(3))), Ef(Ee(Ed(10))), Ef(Ee(Ed(11)))

Ee(Ed(Ef(3))), Ee(Ed(Ef(7)))

Ed(Ef(Ee(3))), Ed(Ef(Ee(5))), Ed(Ef(Ee(1))), Ed(Ef(Ee(20)))

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

Page 136: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Kd

Kf

Ke

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

113

10

Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))

37

Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))

15

203

Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))

Ed(Ef(Ee(20)))

Page 137: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Kd

Kf

Ke

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

113

10

Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))

37

Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))

15

203

Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))

Ed(Ef(Ee(20)))

Page 138: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Kd

Kf

Ke

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

113

10

Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))

37

Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))

15

203

Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))

Ed(Ef(Ee(20)))

Ef(Ee(Ed(3)))

Page 139: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Kd

Kf

Ke

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

11Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))

3Ee(Ed(Ef(3)))

Ee(Ed(Ef(7)))

1Ed(Ef(Ee(3)))

Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))

Ed(Ef(Ee(20)))

Ef(Ee(Ed(3)))

Page 140: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Kd

Kf

Ke

P-SOP [Vaidya et al. JCS05]

Each party maintains a commutative encryption key

Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))

Distinct ciphertexts (union): Ef(Ee(Ed(3))), Ef(Ee(Ed(10))), Ef(Ee(Ed(11))), Ee(Ed(Ef(7))), Ed(Ef(Ee(5))), Ed(Ef(Ee(1))), Ed(Ef(Ee(20)))

Union cardinality: 7
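The counting idea behind these slides can be illustrated with a toy commutative cipher. The sketch below uses modular exponentiation over a prime field, for which E_x(E_y(m)) = E_y(E_x(m)) holds; the slides do not say which cipher P-SOP uses, so this choice is an assumption, as are the modulus, the hashing step, the assignment of keys to parties, and every function name. It also skips the message passing and shuffling between parties, and it is not secure as written.

```python
import hashlib
import math
import random

P = 2**127 - 1  # a Mersenne prime; toy modulus chosen only for illustration


def keygen(rng):
    """Pick an exponent coprime to P - 1 so encryption is a bijection."""
    while True:
        key = rng.randrange(3, P - 1)
        if math.gcd(key, P - 1) == 1:
            return key


def to_group(element):
    """Hash an element to an integer in [2, P - 1]."""
    digest = hashlib.sha256(str(element).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def enc(key, value):
    """Commutative encryption: enc(a, enc(b, m)) == enc(b, enc(a, m))."""
    return pow(value, key, P)


def encrypt_all(datasets, keys):
    """Encrypt every dataset under all keys, each in a different key order."""
    encrypted = []
    for i, data in enumerate(datasets):
        order = keys[i:] + keys[:i]      # the owner's key is applied first
        ciphertexts = set()
        for element in data:
            value = to_group(element)
            for key in order:
                value = enc(key, value)
            ciphertexts.add(value)
        encrypted.append(ciphertexts)
    return encrypted


def cardinalities(encrypted):
    """Return (|intersection|, |union|) computed over ciphertexts only."""
    return len(set.intersection(*encrypted)), len(set.union(*encrypted))


# random.Random is used for reproducibility; it is not a secure key source.
rng = random.Random(42)
keys = [keygen(rng) for _ in range(3)]
datasets = [{3, 10, 11}, {1, 3, 5, 20}, {3, 7}]
inter_size, union_size = cardinalities(encrypt_all(datasets, keys))

if __name__ == "__main__":
    print(inter_size, union_size, inter_size / union_size)
```

Because equal plaintexts yield equal ciphertexts once every key has been applied, the parties can count overlaps without seeing which elements overlap, and the Jaccard similarity follows from the two counts.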

Page 141: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Private Independence Evaluation

Page 142: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 143: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 144: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

ISP A Power B ISP B Power CPower A

Service Provider

Page 145: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS AgentService Provider

Page 146: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

P-SOP

Service Provider

Page 147: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

=2

ISP A Power B ISP B Power CPower A

INDaaS AgentService Provider

Page 148: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

=2=4

ISP A Power B ISP B Power CPower A

INDaaS AgentService Provider

Page 149: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP BPower C

=2=4

J = 2/4

ISP A Power B ISP B Power CPower A

INDaaS Agent

ISP APower APower B

ISP BPower APower B

Service Provider

Page 150: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

=2=4

J = 2/4

ISP A Power B ISP B Power CPower A

INDaaS Agent

Deployment SimCloud A&B 0.5

Service Provider

Page 151: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

P-SOP

Deployment SimCloud A&B 0.5

Service Provider

Page 152: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP A Power B ISP B Power CPower A

INDaaS Agent

ISP BPower APower B

ISP BPower C

=1=4

J = 1/4

Deployment SimCloud A&B 0.5

Service Provider

Page 153: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP A Power B ISP B Power CPower A

INDaaS Agent

ISP BPower APower B

ISP BPower C

=1=4

J = 1/4

Deployment SimCloud A&B 0.5Cloud B&C 0.25

Service Provider

Page 154: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

P-SOP

Deployment SimCloud A&B 0.5Cloud B&C 0.25

Service Provider

Page 155: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

=0 =5 J = 0/5,,

Deployment SimCloud A&B 0.5Cloud B&C 0.25

Service Provider

Page 156: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

=0 =5 J = 0/5,,

Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0

Service Provider

Page 157: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

=0 =5 J = 0/5,,

Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0

Service Provider

Page 158: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

ISP APower APower B

ISP BPower C

ISP A Power B ISP B Power CPower A

INDaaS Agent

=0 =5 J = 0/5,,

Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5

Service Provider

Page 159: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS Agent

Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5

ISP A Power B ISP B Power CPower A

Service Provider

Page 160: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Cloud A Cloud B Cloud C

INDaaS AgentDeployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5

ISP A Power B ISP B Power CPower A

Service Provider

Page 161: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

RoadMap

• #1: Dependency collections
  - Solution: Reusing existing tools

• #2: Dependency representation
  - Solution: Fault graphs

• #3: Efficient auditing
  - Solution: Failure sampling algorithm

• #4: Private independence audit
  - Solution: Private Jaccard similarity

[Diagram: INDaaS Agent workflow, Step 1 through Step 5, fed by Dependency Data Source 1 and Dependency Data Source 2]

Page 162: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Evaluation

Page 163: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Evaluation

• Three realistic case studies
• Tradeoff between auditing algorithms
• Overhead of P-SOP protocol

Page 165: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Three Case Studies

• Common network dependency

• Common hardware dependency

• Common software dependency

Please see our paper for more details

Page 166: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Evaluation

• Three realistic case studies
• Tradeoff between auditing algorithms
• Overhead of P-SOP protocol

Page 167: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

• We evaluate the efficiency/accuracy tradeoff.
• We generate topologies based on the fat-tree model.
• We also bu

Tradeoff Evaluation

Page 168: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

                      Topology A   Topology B   Topology C
# of Core Routers             64          144          576
# of Agg Switches            128          288        1,152
# of ToR Switches            128          288        1,152
# of Servers               1,024        3,456       27,648
Total # of devices         1,344        4,176       30,528

Tradeoff Evaluation
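The device counts in the table match the standard k-ary fat-tree formulas for k = 16, 24, and 48. The slides do not state these k values, so this is an inference from the numbers. A small sketch of the formulas, with an illustrative function name:

```python
def fat_tree_counts(k):
    """Device counts for a standard k-ary fat tree (k must be even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    counts = {
        "core": (k // 2) ** 2,   # (k/2)^2 core routers
        "agg": k * k // 2,       # k pods, each with k/2 aggregation switches
        "tor": k * k // 2,       # k pods, each with k/2 ToR (edge) switches
        "servers": k ** 3 // 4,  # k/2 servers per ToR switch
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, k in (("Topology A", 16), ("Topology B", 24), ("Topology C", 48)):
        print(name, fat_tree_counts(k))
```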

Page 170: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Topology C: 30,528 Devices

[Figure: % critical fault sets detected (y-axis, 30 to 100) vs. computational time in minutes (x-axis, 1 to 2048). Series: Minimal Fault Set Algorithm; Failure Sampling Algorithm, with annotated points at 10^4 and 10^6 sampling rounds.]

Page 172: Heading Off Correlated Failures through Independence-as-a ... · Heading Off Correlated Failures through Independence-as-a-Service Ennan Zhai1 Ruichuan Chen2, David Isaac Wolinsky1,

Evaluation

• Three realistic case studies
• Tradeoff between auditing algorithms
• Overhead of the P-SOP protocol


What do we evaluate?

• Kissner and Song (KS) protocol as the comparison baseline
• Bandwidth overhead of P-SOP and KS
• Computational overhead of P-SOP and KS


Bandwidth Overhead

[Figure: total traffic sent in MB (y-axis, 0 to 200) vs. number of elements in each provider's dataset (x-axis, log scale, 1000 to 100000). Curves: KS (2), KS (3), KS (4), P-SOP (2), P-SOP (3), P-SOP (4). Annotation on the plot: ~80MB.]


Computational Overhead

[Figure: computational time in seconds (y-axis, log scale, 1 to 1e+06) vs. number of elements in each provider's dataset (x-axis, log scale, 1000 to 100000). Curves: KS (2), KS (3), KS (4), P-SOP (2), P-SOP (3), P-SOP (4). Annotation on the plot: ~10^3 sec.]


Conclusions

• INDaaS is a first step towards reliable clouds:
  - Dependency collection
  - Dependency representation
  - Efficient auditing
  - Private independence auditing

• We evaluated INDaaS with three realistic case studies and large-scale simulations.


Thanks, questions?

• INDaaS: Heading off correlated failures in clouds

• Find out more at: http://dedis.cs.yale.edu/cloud/

• We will be at the poster session tonight.