Page 1

2013 Storage Developer Conference. © Microsoft Corporation. All Rights Reserved.

LRC Erasure Coding

in Windows Storage Spaces

Cheng Huang

Microsoft Corporation

Joint work with Parikshit Gopalan, Erik Hortsch, Jin Li, Karan Mehra, Shiv Rajpal, Surendra Verma, Sergey Yekhanin

Page 2

Outline

Storage Spaces Overview

Resiliency and Availability Mechanics

LRC Erasure Coding

Cost and Performance Benefits

Page 3

Windows Storage Spaces Overview

Page 4

Storage Spaces Overview

Storage Spaces: storage virtualization platform in Windows 8 and Windows Server 2012

Greatly enhanced in Windows Server 2012 R2 and Windows 8.1

Flexible, resilient, scalable and highly available storage for both consumers and enterprises

Page 5

Home Example

Storage Pool: a collection of physical drives

Storage Space: virtual drive created from free space in a storage pool

Page 6

Thin Provision

Thin Provision: actual capacity is not consumed by the space until used

Page 7

Flexibility

Multiple spaces from the same pool

Each space chooses its own resiliency scheme

Page 8

Mirror vs. Parity Resiliency

[Figure: mirror (replication) stores a=2 and b=3 twice; parity (erasure coding) stores a=2, b=3 and a+b=5; a lost value is reconstructed in each case]

                    mirror   parity
storage             2x       1.5x
reconstruction IO   1        2
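The comparison on this slide can be reproduced with a small sketch. It is a toy model only, using the slide's own values (a=2, b=3) and plain integer addition for the parity; the dictionary keys and helper names are made up for illustration.

```python
# Toy model of the slide's example: two data values, a = 2 and b = 3.
a, b = 2, 3

# Mirror (replication): every value is stored twice.
mirror = {"a": a, "a_copy": a, "b": b, "b_copy": b}
# Parity (erasure coding): the two values plus one parity value.
parity = {"a": a, "b": b, "p": a + b}

def overhead(stored, data_count=2):
    """Number of stored values divided by the number of data values."""
    return len(stored) / data_count

def rebuild_a_from_mirror(stored):
    """Recover a lost 'a' with a single read of its copy."""
    reads = [stored["a_copy"]]
    return reads[0], len(reads)

def rebuild_a_from_parity(stored):
    """Recover a lost 'a' with two reads: the parity and the surviving b."""
    reads = [stored["p"], stored["b"]]
    return reads[0] - reads[1], len(reads)

print(overhead(mirror), rebuild_a_from_mirror(mirror))  # 2.0 (2, 1)
print(overhead(parity), rebuild_a_from_parity(parity))  # 1.5 (2, 2)
```

Mirror pays with capacity (2x) and parity pays with reconstruction reads (2 instead of 1), which is the trade-off the rest of the talk builds on.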

Page 9

Enterprise Example

From Single Server to Cluster of Servers with Multiple JBOD Enclosures

Page 10

Clustered Storage Spaces

Use Storage Spaces together with Failover Clustering feature in Windows Server

Create storage pool across multiple JBOD enclosures

Read/Write storage space from any server in the cluster

Automatic failover during failures of hard drive, JBOD enclosure and server

Page 11

Resiliency & Availability Mechanics

Page 12

Capacity Allocation

Storage Space allocates physical capacity in “slabs”

slab size = 256MB

Mirror Space

Each slab is mirrored on 2 separate drives

Parity Space

Slabs across multiple drives form erasure coding groups

Page 13

Parallel Failure Rebuild

Page 14

Parallel Failure Rebuild

[Figure: slabs 1 to 15, two copies each, spread across drive 1 to drive 6]

rebuild uses all remaining drives as both source and destination
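A minimal sketch of why the rebuild is parallel, under stated assumptions: the slab layout below is hypothetical (one mirrored slab per pair of drives, not the layout from the slide's figure), and the destination-picking rule is an illustrative heuristic, not the Storage Spaces allocator.

```python
from collections import Counter
from itertools import combinations

DRIVES = [1, 2, 3, 4, 5, 6]

# Hypothetical layout: 15 mirrored slabs, one per pair of drives, so every
# drive holds 5 slab copies and shares exactly one slab with each other drive.
placement = {slab: set(pair)
             for slab, pair in enumerate(combinations(DRIVES, 2), start=1)}

def rebuild(placement, failed):
    """Re-mirror every slab that lost a copy on the failed drive.

    Returns a list of (slab, source_drive, destination_drive) copy jobs.
    """
    survivors = [d for d in DRIVES if d != failed]
    writes = Counter()  # new copies written so far, per destination drive
    jobs = []
    for slab in sorted(placement):
        drives = placement[slab]
        if failed not in drives:
            continue
        (source,) = drives - {failed}
        # Illustrative heuristic: least-written surviving drive other than
        # the source, ties broken by distance from the source drive number.
        dest = min((d for d in survivors if d != source),
                   key=lambda d: (writes[d], (d - source) % len(DRIVES)))
        writes[dest] += 1
        placement[slab] = {source, dest}
        jobs.append((slab, source, dest))
    return jobs

jobs = rebuild(placement, failed=1)
for slab, source, dest in jobs:
    print(f"slab {slab}: copy drive {source} -> drive {dest}")
```

With this layout, a failure of drive 1 produces five copy jobs whose sources and destinations each cover all five surviving drives, so no single drive becomes the rebuild bottleneck.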

Page 15

Journaling

Space is mutable → slabs can be overwritten

Integrity of Space against power loss or drive failure is protected by journaling

Journal is mirrored

2-way or 3-way based on resiliency scheme

Incoming writes are journaled before being applied to target slabs

SSD as journal is most effective in absorbing random writes

Page 16

Write IO Example

On critical path

Incoming write sent to mirrored journal

Flush sent to journal

Write completed and acked

In background

De-stage from journal to target slabs
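The split between the critical path and the background work can be sketched as follows. This is an in-memory toy, assuming a 2-way mirrored journal; the class and method names are hypothetical, and a real implementation would issue a device flush where the comment indicates.

```python
class JournaledSpace:
    """Toy write path: journal first, de-stage later (all names hypothetical)."""

    def __init__(self, journal_copies=2):
        # Mirrored journal: one in-memory list per journal copy.
        self.journals = [[] for _ in range(journal_copies)]
        self.slabs = {}  # offset -> data, standing in for the target slabs

    def write(self, offset, data):
        # Critical path: append to every journal copy (a real system would
        # flush the journal devices here), then acknowledge the write.
        for journal in self.journals:
            journal.append((offset, data))
        return "acked"

    def destage(self):
        # Background: apply journaled writes to the target slabs in order,
        # then trim the journal copies.
        for offset, data in self.journals[0]:
            self.slabs[offset] = data
        for journal in self.journals:
            journal.clear()

    def read(self, offset):
        # The newest journaled write wins over what is on the slabs.
        for logged_offset, data in reversed(self.journals[0]):
            if logged_offset == offset:
                return data
        return self.slabs.get(offset)

space = JournaledSpace()
print(space.write(0, "new data"))  # acked before any slab is touched
print(space.read(0))               # served from the journal
space.destage()
print(space.slabs)                 # the slab now holds the data
```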

Page 17

Write IO Example

De-stage overwrite IO in Parity Space

overwrite IO changes a=2 → a’=5

before: a=2, b=3, parity a+b=5

after: a’=5, b=3, parity a’+b=8

Read

new data (a’=5) from journal

old data (a=2) from disk

old parity (a+b=5) from disk

Calculate new parity

(a’-a) + (a+b) = 8

Flush new parity to journal

Flush new data and parity to disk
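The de-stage steps above can be traced in a few lines. This sketch keeps the slide's integer arithmetic (an actual parity implementation would typically use XOR or Galois-field arithmetic instead), and the `disk` and `journal` dictionaries are stand-ins invented for the example.

```python
def destage_overwrite(disk, journal, key, parity_key="p"):
    """Apply one journaled overwrite to a parity group held in `disk`."""
    new_data = journal[key]        # read new data from the journal
    old_data = disk[key]           # read old data from disk
    old_parity = disk[parity_key]  # read old parity from disk
    # New parity = delta of the changed value + old parity; b is never read.
    new_parity = (new_data - old_data) + old_parity
    journal[parity_key] = new_parity  # journal the new parity first
    disk[key] = new_data              # then write data and parity to disk
    disk[parity_key] = new_parity
    return new_parity

disk = {"a": 2, "b": 3, "p": 5}
journal = {"a": 5}
print(destage_overwrite(disk, journal, "a"))  # 8
print(disk)                                   # {'a': 5, 'b': 3, 'p': 8}
```

Journaling the new parity before touching the disk means that a power loss between the two disk writes can be repaired by replaying the journal, so data and parity never stay inconsistent.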

Page 18

Handling Data Corruption

Storage Spaces is even more powerful in handling data corruption together with ReFS

ReFS keeps a checksum for every block and scrubs data at rest in the background

Storage Spaces automatically repairs slabs when data corruption is detected

Page 19

LRC Erasure Coding

Page 20

Classic Erasure Codes

Reed-Solomon (RS) codes most widely used

basis of RAID

Example – RAID6 4+2

2 parities calculated from 4 data blocks

tolerates up to 2 failures

a b c d p q

Page 21

Why New Erasure Codes?

Classic erasure codes were designed and optimized for communication, not storage.

Naively applying classic erasure codes in storage systems is okay, but misses enormous opportunities!

Page 22

Opportunity I – Space Saving

Storage systems are often hierarchical, bringing multi-level durability requirements

Consider a Storage Pool with 6 JBODs

How to tolerate failures of 1 JBOD + 1 HDD?

Note: no need to tolerate 2 JBOD failures

Page 23

Opportunity I – Space Saving

How to tolerate failures of 1 JBOD + 1 HDD?

RAID6 4+2 is an option, but it tolerates 2 JBOD failures

Excessive durability → storage space waste!

New erasure codes designed for multi-level durability requirements can reduce storage space

Page 24

Opportunity II – Performance Gain

Failures do happen, but storage systems continue

to operate

Missing data need to be reconstructed to

serve read IO targeting missing data

bring resiliency back to desired level

a b c d p q

Page 25

Opportunity II – Performance Gain

Reconstruction bears IO cost

In classic erasure codes, reconstruction cost is the same regardless of the number of failures

RAID6 4+2: reconstruction of 1 and 2 failures both cost 4 IOs

In storage systems, a single failure is far more common than multiple failures

New erasure codes optimized for single failure can reduce reconstruction cost for the common case

Page 26

LRC Erasure Coding

LRC: erasure codes optimized for storage

designed targeting multi-level durability requirements

space saving over classic erasure codes

optimized for single failure reconstruction

performance gain over classic erasure codes

introduces parity locality

LRC stands for Local Reconstruction Codes

Page 27

LRC Erasure Coding Example

x1 x2 x3 x4 px
y1 y2 y3 y4 py q
z1 z2 z3 z4 pz

[Figure legend: local parity (px, py, pz), global parity (q)]

LRC specified by # of data, local parity and global parity

LRC12+3+1: 12 data, 3 local parities and 1 global parity

Page 28

LRC Erasure Coding Example

x1 x2 x3 x4 px
y1 y2 y3 y4 py q
z1 z2 z3 z4 pz

[Figure legend: local parity (px, py, pz), global parity (q)]

LRC specified by # of data, local parity and global parity

LRC12+3+1: 12 data, 3 local parities and 1 global parity

local reconstruction: x1 = px – (x2 + x3 + x4)
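The local reconstruction formula can be checked with a toy stripe. The sketch assumes plain integer sums for the local parities, in the spirit of the earlier a+b=5 example, and the data values are arbitrary; real codes use Galois-field arithmetic.

```python
# Toy LRC12+3+1 stripe: 12 data values in three local groups of four.
groups = {
    "x": [11, 12, 13, 14],
    "y": [21, 22, 23, 24],
    "z": [31, 32, 33, 34],
}
# One local parity per group (px, py, pz), here simply the group sum.
local_parity = {name: sum(values) for name, values in groups.items()}

def reconstruct_locally(group, missing_index):
    """Rebuild one lost data fragment from its own group only."""
    survivors = [v for i, v in enumerate(groups[group]) if i != missing_index]
    reads = survivors + [local_parity[group]]
    value = local_parity[group] - sum(survivors)
    return value, len(reads)

# x1 = px - (x2 + x3 + x4): 4 reads, and the y and z groups are untouched.
print(reconstruct_locally("x", 0))  # (11, 4)
```

A single lost fragment costs 4 reads here even though the stripe holds 12 data fragments, because only the fragment's own group is involved.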

Page 29

Cost and Performance Benefits

Page 30

Space Saving over RAID

x1 x2 x3 x4 px
y1 y2 y3 y4 py q
z1 z2 z3 z4 pz

[Figure legend: local parity (px, py, pz), global parity (q), JBOD enclosure]

storage overhead: 1.33x (LRC12+3+1) < 1.5x (RAID6 4+2)

But, does LRC indeed tolerate failures of 1 JBOD + 1 HDD?

Page 31

Space Saving over RAID

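x1 x2 x3 x4 px
y1 y2 y3 y4 py q
z1 z2 z3 z4 pz

[Figure legend: local parity (px, py, pz), global parity (q), JBOD enclosure]

Example failure: the JBOD holding x3, y3 and z3, plus the HDD holding x4

y3 and z3 are reconstructed using local parities py and pz

x3 and x4 are then reconstructed using px and global parity q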
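The two-step recovery described here can be walked through numerically. This is a sketch under loud assumptions: it uses ordinary integers instead of a Galois field, the global-parity coefficients are hypothetical, and it checks only this one failure pattern (x3, y3, z3 and x4 lost), not every 1 JBOD + 1 HDD combination.

```python
data = {
    "x": [11, 12, 13, 14],
    "y": [21, 22, 23, 24],
    "z": [31, 32, 33, 34],
}
# Hypothetical global-parity coefficients, one distinct integer per fragment.
coeff = {"x": [1, 2, 3, 4], "y": [5, 6, 7, 8], "z": [9, 10, 11, 12]}

local = {g: sum(v) for g, v in data.items()}  # px, py, pz
q = sum(c * d for g in data for c, d in zip(coeff[g], data[g]))  # global parity

# Failure: the fragments x3, y3, z3 (index 2) plus x4 (index 3) are lost.
lost = {("x", 2), ("y", 2), ("z", 2), ("x", 3)}
known = {(g, i): v for g, vals in data.items()
         for i, v in enumerate(vals) if (g, i) not in lost}

def known_sum(group):
    """Sum of the fragments of one group that are currently readable."""
    return sum(v for (g, _), v in known.items() if g == group)

# Step 1: y3 and z3 each come back from their local parity alone.
for g in ("y", "z"):
    known[(g, 2)] = local[g] - known_sum(g)

# Step 2: x3 and x4 are two unknowns, solved from two equations.
s = local["x"] - known_sum("x")                              # x3 + x4
t = q - sum(coeff[g][i] * v for (g, i), v in known.items())  # c3*x3 + c4*x4
c3, c4 = coeff["x"][2], coeff["x"][3]
x4 = (t - c3 * s) // (c4 - c3)
x3 = s - x4
known[("x", 2)], known[("x", 3)] = x3, x4

recovered = {g: [known[(g, i)] for i in range(4)] for g in data}
print(recovered == data)  # True
```

The step that needs the global parity works only because the two unknowns carry different coefficients in q; with equal coefficients the two equations would be linearly dependent.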

Page 32

Performance Gain over RAID

x1 x2 x3 px
y1 y2 y3 py q

[Figure legend: local parity (px, py), global parity (q), JBOD enclosure]

LRC6+2+1: 6 data, 2 local parities and 1 global parity

storage overhead: 1.5x (LRC6+2+1) = 1.5x (RAID6 4+2)

reconstruction IO: 3 (LRC6+2+1) < 4 (RAID6 4+2)

Page 33

LRC vs. RAID Summary

LRC offers better trade-offs for storage

same storage overhead → fewer reconstruction IOs

same reconstruction IO → less storage overhead

                    RAID6 4+2   LRC12+3+1   LRC6+2+1
storage overhead    1.5x        1.33x       1.5x
reconstruction IO   4           4           3

tolerating failure of 1 JBOD + 1 HDD
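The numbers in this table follow from the code parameters. A short sketch, assuming that overhead is total fragments divided by data fragments and that a single lost data fragment is rebuilt from its local group when one exists (the function and parameter names are illustrative):

```python
def storage_overhead(data, local=0, global_=0):
    """Total fragments stored per data fragment."""
    return (data + local + global_) / data

def single_failure_reads(data, local=0):
    """Reads needed to rebuild one lost data fragment.

    With local groups: the rest of the group plus its local parity, which
    equals the group size (data / local). Without: `data` reads.
    """
    return data // local if local else data

codes = {
    "RAID6 4+2": dict(data=4, global_=2),
    "LRC12+3+1": dict(data=12, local=3, global_=1),
    "LRC6+2+1": dict(data=6, local=2, global_=1),
}
for name, c in codes.items():
    print(name, round(storage_overhead(**c), 2),
          single_failure_reads(c["data"], c.get("local", 0)))
```

This prints 1.5 and 4 for RAID6 4+2, 1.33 and 4 for LRC12+3+1, and 1.5 and 3 for LRC6+2+1, matching the table.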

Page 34

LRC vs. RAID Measurements

LRC offers better trade-offs for storage

same storage overhead → 27% more IOPS

same reconstruction IO → 11% less storage overhead

                             RAID6 4+2   LRC12+3+1   LRC6+2+1
storage overhead             1.5x        1.33x       1.5x
reconstruction IO            4           4           3
reconstruction read (IOPS)   1333        1328        1695

measurements from a 16-drive deployment
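The two headline percentages can be derived from the table. The sketch below assumes the 27% compares LRC6+2+1 with RAID6 4+2 on reconstruction-read IOPS, and the 11% compares the exact overheads 16/12 and 6/4.

```python
# Reconstruction-read IOPS from the table (RAID6 4+2 vs. LRC6+2+1).
raid_iops, lrc_iops = 1333, 1695
iops_gain = lrc_iops / raid_iops - 1

# Storage overhead as total fragments / data fragments.
raid_overhead, lrc_overhead = 6 / 4, 16 / 12
overhead_saving = 1 - lrc_overhead / raid_overhead

print(f"{iops_gain:.0%} more IOPS, {overhead_saving:.0%} less storage overhead")
```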

Page 35

Blazingly Fast Computation

Page 36

Summary

LRC: erasure codes optimized for storage

designed targeting multi-level durability requirements

optimized for single failure reconstruction

LRC offers better space and performance trade-offs than classic erasure codes (RAID)

Available now in Windows 8.1 and Windows Server 2012 R2