Storage Fabric CS6453
Transcript
Page 1:

Storage Fabric CS6453

Page 2:

Summary

Last week: NVRAM is going to change the way we think about storage.

Today: Challenges of storage layers (SSDs, HDs) built to serve massive amounts of data.

Slowdowns in HDs and SSDs.

Enforcing policies for IO operations in Cloud architectures.

Page 3:

Background: Storage for Big Data

One disk is not enough to handle massive amounts of data.

Last time: Efficient datacenter networks using a large number of cheap commodity switches.

Solution: Efficient IO performance using a large number of commodity storage devices.

Page 4:

Background: RAID 0

Achieves Nx performance, where N is the number of disks.

Is this for free?

When N becomes large, the probability of a disk failure becomes large as well.

RAID 0 does not tolerate failures.

Page 5:

Background: RAID 1

Achieves (K-1)-fault tolerance with Kx disks.

Is this for free?

It requires Kx more disks (e.g., to tolerate 1 failure you need 2x more disks than RAID 0).

RAID 1 does not utilize resources efficiently.

Page 6:

Background: Erasure Code

Achieves K-fault tolerance with N+K disks.

Efficient utilization of disks (not as great as RAID 0).

Fault tolerance (not as great as RAID 1).

Is this for free?

Reconstruction cost: the number of disks that must be read to recover data after failure(s).

RAID 6 has a reconstruction cost of 3.
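To make the reconstruction-cost idea concrete, here is a minimal Python sketch (not from the slides) of single-parity striping: rebuilding one lost block requires reading every surviving block in the stripe, so the reconstruction cost equals the number of remaining disks.

def xor_blocks(blocks):
    # XOR a list of equal-length blocks together.
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

data = [b"aaaa", b"bbbb", b"cccc"]   # 3 data disks
parity = xor_blocks(data)            # 1 parity disk

# The disk holding data[1] fails: rebuild it from all 3 surviving blocks,
# so the reconstruction cost here is 3 reads.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]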

Page 7:

Modern Erasure Code Techniques

Erasure Coding in Windows Azure Storage [Huang, 2012]

Exploit Point: Prob[1 failure] ≫ Prob[2 or more failures]

Solution: Construct an erasure code that has a low reconstruction cost for 1 failure.

1.33x storage overhead (relatively low).

Tolerate up to 3 failures in 16 storage devices.

Reconstruction cost of 6 for 1 failure and 12 for 2+ failures.
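A rough sketch (layout and values are mine, following the local-reconstruction idea in [Huang, 2012]) of why one failure is cheap: data fragments are split into local groups, each with its own XOR parity, so a single lost fragment is rebuilt from its local group only, while multi-failure cases fall back to a wider reconstruction. The global parities used for multi-failure recovery are omitted here.

def xor_frags(frags):
    # XOR a list of equal-length fragments together.
    out = bytes(len(frags[0]))
    for f in frags:
        out = bytes(x ^ y for x, y in zip(out, f))
    return out

# 12 data fragments split into two local groups of 6 (toy data).
frags = [bytes([i]) * 4 for i in range(12)]
groups = [frags[:6], frags[6:]]
local_parity = [xor_frags(g) for g in groups]

# Fragment 2 (group 0) is lost: read its 5 local siblings plus the local
# parity, i.e. 6 reads instead of 12.
survivors = [f for i, f in enumerate(groups[0]) if i != 2] + [local_parity[0]]
assert xor_frags(survivors) == frags[2]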

Page 8:

The Tail at Store: Problem

We have seen how failures are handled with reconstruction. What about slowdowns in HDs (or SSDs)?

A slowdown of a disk (with no failures) might have a significant impact on overall performance.

Questions:

Do HDs or SSDs exhibit transient slowdowns?

Are slowdowns of disks frequent enough to affect the overall performance?

What causes slowdowns?

How do we deal with slowdowns?

Page 9:

The Tail at Store: Study

[Diagram: a RAID group with data drives D … D and parity drives P, Q]

                         Disk          SSD
#RAID groups             38,029        572
#Data drives per group   3-26          3-22
#Data drives             458,482       4,069
Total drive hours        857,183,442   7,481,055
Total RAID hours         72,046,373    1,072,690

Page 10:

The Tail at Store: Slowdowns?

Hourly average I/O latency per drive: L

Slowdown: S = L / L_median

Tail: T = S_max

Slow disks: S ≥ 2

S ≥ 2 at the 99.8th percentile

S ≥ 1.5 at the 99.3rd percentile

T ≥ 2 at the 97.8th percentile

T ≥ 1.5 at the 95.2nd percentile

SSDs exhibit even more slowdowns.

[Figure: CDF of slowdown (Disk); x-axis: slowdown 1x-8x, y-axis: 0.9-1.0]
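A small sketch (function and variable names are mine) of how the slowdown and tail metrics above could be computed from hourly per-drive latencies:

import statistics

def slowdowns(hourly_latency_ms):
    # S = L / L_median for each hourly average latency L of a drive.
    median = statistics.median(hourly_latency_ms)
    return [latency / median for latency in hourly_latency_ms]

def tail(slowdowns_this_hour):
    # T = the maximum slowdown observed in a given hour
    # (e.g., across the drives of one RAID group).
    return max(slowdowns_this_hour)

# Hypothetical hourly latencies (ms) for one drive: mostly ~5 ms, one slow hour.
latencies = [5.1, 4.9, 5.0, 5.2, 11.0, 5.0]
s = slowdowns(latencies)
slow_hours = [hour for hour, s_i in enumerate(s) if s_i >= 2]   # "slow drive" hours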

Page 11:

The Tail at Store: Duration?

Slowdowns are transient.

40% of HD slowdowns last ≥ 2 hours.

12% of HD slowdowns last ≥ 10 hours.

Many slowdowns happen in consecutive hours (i.e., they last longer).

[Figure: CDF of slowdown interval (hours) for Disk and SSD; x-axis: 1-256 hours, y-axis: 0-1]

Page 12:

The Tail at Store: Correlation between slowdowns on the same drive?

90% of disk slowdowns occur within 24 hours of another slowdown of the same disk.

More than 80% of SSD slowdowns occur within 24 hours of another slowdown of the same SSD.

Slowdowns on the same drive happen relatively close to each other in time.

[Figure: CDF of slowdown inter-arrival period (hours) for Disk and SSD; x-axis: 0-35 hours, y-axis: 0.5-1.0]

Page 13:

The Tail at Store: Causes?

Rate imbalance: RI = IORate / IORate_median

Rate imbalance does not seem to be the main cause of slowdowns for slow disks.

[Figure: CDF of rate imbalance (RI) within hours where S ≥ 2, for Disk and SSD; x-axis: 0.5x-4x, y-axis: 0-1]

Page 14:

The Tail at Store: Causes?

Size imbalance: SI = IOSize / IOSize_median

Size imbalance does not seem to be the main cause of slowdowns for slow disks.

[Figure: CDF of size imbalance within hours where S ≥ 2, for Disk and SSD; x-axis: 0.5x-4x, y-axis: 0-1]

Page 15:

The Tail at Store: Causes?

Drive age shows some correlation with slowdowns, but the correlation is not strong.

[Figure: CDF of slowdown vs. drive age (Disk); x-axis: slowdown 1x-5x, y-axis: 0.95-1.0]

Page 16:

The Tail at Store: Causes?

No correlation of slowdowns with the time of day (0:00-24:00).

No explicit drive events around slow hours.

Unplugging disks and plugging them back in does not particularly help.

SSD slowdown behavior differs significantly across vendors.

Page 17:

The Tail at Store: Solutions

Create Tail-Tolerant RAIDs.

Treat slow disks as failed disks.

Reactive

Detect slow disks: those that take much longer to answer (more than 2x other disks).

If a disk is slow, reconstruct the answer from the other disks using RAID redundancy.

In the best case, latency is around 3x that of a read from an average disk.

Proactive

Always issue an additional read using RAID redundancy.

Take the fastest answer.

Uses much more I/O bandwidth.

Adaptive

A combination of both approaches, taking the findings into account (see the sketch after this list).

Use the reactive approach until a slowdown is detected.

After that, use the proactive approach, since slowdowns are repetitive and last many hours.
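A hedged sketch of that adaptive strategy (the names, the thread-pool mechanism, and the 2x threshold are illustrative assumptions, not the paper's implementation): reads are served normally until a drive exceeds twice its typical latency, after which the redundant read is issued proactively.

import concurrent.futures

pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def adaptive_read(block_id, read_primary, read_reconstructed,
                  typical_latency_s, proactive=False):
    # Returns (data, proactive) so the caller can stay in proactive mode
    # once a slowdown has been seen.
    primary = pool.submit(read_primary, block_id)
    if proactive:
        # Proactive: also issue the reconstructed read and take whichever
        # answer arrives first (costs extra I/O bandwidth).
        backup = pool.submit(read_reconstructed, block_id)
        done, _ = concurrent.futures.wait(
            [primary, backup], return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result(), True
    try:
        # Reactive: give the home drive up to 2x its typical latency.
        return primary.result(timeout=2 * typical_latency_s), False
    except concurrent.futures.TimeoutError:
        # Slow drive: reconstruct from the other drives now, and switch to
        # the proactive path, since slowdowns repeat and last many hours.
        return read_reconstructed(block_id), True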

Page 18:

The Tail at Store: Conclusions

More research on possible causes of disk and SSD slowdowns is required.

Tail-tolerant RAIDs are needed to reduce the overhead from slowdowns.

Since reconstruction of data is the way to deal with slowdowns, and since

Prob[1 slowdown] ≫ Prob[2 or more slowdowns],

the Azure paper [Huang, 2012] becomes more relevant.

Page 19:

Background: Cloud Storage

General Purpose Applications

Separate VM-VM connections from VM-Storage connections.

Storage is virtualized

Many layers from application to actual storage

Resources are shared across multiple tenants

Page 20:

IOFlow: Problem

No support for end-to-end policies (e.g., a minimum IO bandwidth from application to storage).

Applications have no way of expressing their storage policies.

Shared infrastructure, where aggressive applications tend to get more IO bandwidth.

Page 21:

IOFlow: Challenges

No existing enforcement mechanism for controlling IO rates.

Aggregate performance policies

Non-performance policies

Admission control

Dynamic enforcement

Support for unmodified applications and VMs

Page 22:

IOFlow: Do it like SDNs

Page 23:

IOFlow: Supported policies

<VM, Destination> -> Bandwidth (static, compute side)

<VM, Destination> -> Min Bandwidth (dynamic, compute side)

<VM, Destination> -> Sanitize (static, compute or storage side)

<VM, Destination> -> Priority Level (static, compute and storage side)

<Set of VMs, Set of Destinations> -> Bandwidth (dynamic, compute side)

Page 24:

Example 1: Interface

Policies:

<VM1,Server X> -> B1

<VM2,Server X> -> B2

Controller to SMBc of physical server containing VM1 and VM2

createQueueRule(<VM1,Server X>,Q1)

createQueueRule(<VM2,Server X>,Q2)

createQueueRule(<*,*>,Q0)

configureQueueService(Q1, <B1, low, S>), where S is the size of the queue

configureQueueService(Q2, <B2, low, S>)

configureQueueService(Q0, <C-B1-B2, low, S>), where C is the Capacity of Server X.
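Read as a script, the interaction above might look like the following sketch; createQueueRule and configureQueueService are the calls named on the slide, while the Python stubs, the capacity value, and the queue size are purely illustrative assumptions.

# Hypothetical stand-ins for the IOFlow control calls named on the slide.
def createQueueRule(flow, queue):
    print("createQueueRule(%s -> %s)" % (flow, queue))

def configureQueueService(queue, settings):
    print("configureQueueService(%s, %s)" % (queue, settings))

B1, B2 = 800, 800   # policy bandwidths for VM1 and VM2 (Mbps, illustrative)
C = 6400            # capacity of Server X (Mbps, illustrative)
S = 64              # queue size (illustrative)

# Route each flow to its own queue, with a catch-all queue Q0 for the rest.
createQueueRule(("VM1", "Server X"), "Q1")
createQueueRule(("VM2", "Server X"), "Q2")
createQueueRule(("*", "*"), "Q0")

# Give Q1 and Q2 their policy bandwidths; Q0 gets whatever capacity is left.
configureQueueService("Q1", (B1, "low", S))
configureQueueService("Q2", (B2, "low", S))
configureQueueService("Q0", (C - B1 - B2, "low", S))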

Page 25:

Example 2: Max-Min Fairness

Policies:

<VM1-VM3,Server X> -> 900 Mbps

Demand:

VM1 -> 600 Mbps

VM2 -> 400 Mbps

VM3 -> 200 Mbps

Result:

VM1 -> 350 Mbps

VM2 -> 350 Mbps

VM3 -> 200 Mbps
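The split above is standard max-min fair sharing; a short sketch (mine, not IOFlow code) that reproduces the 350/350/200 allocation from the 900 Mbps aggregate policy:

def max_min_share(capacity, demands):
    # Water-filling: fully satisfy demands below the fair share, then split
    # the remaining capacity evenly among the rest.
    alloc = {vm: 0.0 for vm in demands}
    unsatisfied = dict(demands)
    remaining = capacity
    while unsatisfied and remaining > 0:
        share = remaining / len(unsatisfied)
        small = {vm: d for vm, d in unsatisfied.items() if d <= share}
        if not small:
            for vm in unsatisfied:
                alloc[vm] += share
            break
        for vm, demand in small.items():
            alloc[vm] += demand
            remaining -= demand
            del unsatisfied[vm]
    return alloc

print(max_min_share(900, {"VM1": 600, "VM2": 400, "VM3": 200}))
# {'VM1': 350.0, 'VM2': 350.0, 'VM3': 200.0}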

Page 26:

IOFlow: Evaluation of Policy Enforcement

Windows-based IO stack

10 hypervisors with 12 VMs each (120 VMs total)

4 tenants using 30 VMs each (3 VMs per hypervisor for each tenant)

1 Storage Server

6.4 Gbps IO Bandwidth

1 Controller

1s interval between dynamic enforcements of policies

Page 27:

IOFlow: Evaluation of Policy Enforcement

Tenant     Policy
Index      {VM 1-30, X}    -> Min 800 Mbps
Data       {VM 31-60, X}   -> Min 800 Mbps
Message    {VM 61-90, X}   -> Min 2500 Mbps
Log        {VM 91-120, X}  -> Min 1500 Mbps

Page 28:

IOFlow: Evaluation of Policy Enforcement

Page 29:

IOFlow: Evaluation of Overhead

Page 30:

IOFlow: Conclusions

Contributions

First Software Defined Storage approach.

Fine-grained control over IO operations in the Cloud.

Limitations

Network or other resources might be the bottleneck.

Needs to consider placing VMs close to their data (spatial locality).

Flat Datacenter Storage [Nightingale, 2012] provides solutions to this problem.

Guaranteed latencies cannot be expressed by the current policies.

Best-effort approach by setting priority levels.

Page 31:

Specialized Storage Architectures

HDFS [Shvachko, 2009] and GFS [Ghemawat, 2003] work well for Hadoop MapReduce applications.

Facebook’s Photo Storage [Beaver, 2010] exploits workload characteristics to design and implement a better storage system.