
StarFish : highly-available block storage

Feb 23, 2016

Transcript
Page 1: StarFish : highly-available block storage

StarFish: highly-available block storage
Eran Gabber, Jeff Fellin, Michael Flaster, Fengrui Gu, Bruce Hillyer, Wee Teck Ng, Banu Özden, Elizabeth Shriver

2003 USENIX Annual Technical Conference

Presenter: D00922019 林敬棋

Page 2: StarFish : highly-available block storage

Introduction
Important data need to be protected.
◦ Making replicas.
Replication on remote sites
◦ Reduce the amount of data lost in a failure.
◦ Decrease the time required to recover from a catastrophic site failure.

Page 3: StarFish : highly-available block storage

StarFish
A highly-available, geographically-dispersed block storage system.
◦ Does not require expensive dedicated communication lines to all replicas to achieve high availability.
◦ Achieves good performance even during recovery from a replica failure.
◦ Single-owner access semantics.

Page 4: StarFish : highly-available block storage

Architecture
StarFish consists of
◦ One Host Element (HE)
  Provides storage virtualization and a read cache.
◦ N Storage Elements (SEs)
  Q: write quorum size.
  Synchronous updates to a quorum of Q SEs, and asynchronous updates to the rest.
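The quorum rule above can be sketched with a small latency model. This is only an illustration under stated assumptions: the HE is taken to send each write to all N SEs at once and to acknowledge the host once Q of them have replied, and the function name `quorum_write_latency` and the delay values are hypothetical. HE processing time, bandwidth limits, and SE failures are ignored.

```python
def quorum_write_latency(se_delays_ms, q):
    """Return the time until the q-th fastest SE acknowledges a write.

    se_delays_ms: round-trip delay from the HE to each SE (hypothetical values).
    q: write quorum size; must satisfy 1 <= q <= len(se_delays_ms).
    """
    if not 1 <= q <= len(se_delays_ms):
        raise ValueError("q must be between 1 and the number of SEs")
    # All SEs receive the write at the same time, so the q-th smallest
    # delay is the moment the synchronous quorum is complete. The
    # remaining N - q SEs are updated asynchronously.
    return sorted(se_delays_ms)[q - 1]


delays = [0, 4, 8]  # illustrative delays in ms for three SEs
for q in (1, 2, 3):
    print(f"Q={q}: write acknowledged after {quorum_write_latency(delays, q)} ms")
```

In this model a larger Q makes each write wait for a slower SE, which is the performance cost of a larger quorum.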

Page 5: StarFish : highly-available block storage

Recommended Setup

N = 3, Q = 2

MAN: Metropolitan Area Network
WAN: Wide Area Network

Page 6: StarFish : highly-available block storage

Another Deployment

Page 7: StarFish : highly-available block storage

SE Recovery
Write log
◦ HE keeps a circular buffer of recent writes.
◦ Each SE maintains a circular buffer of recent writes on a log disk.
Three types of recovery
◦ Quick recovery
◦ Replay recovery
◦ Full recovery
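The slide names the three recovery types without defining them. The sketch below assumes one plausible reading: quick recovery replays missing writes from the HE's buffer, replay recovery takes them from another SE's log disk, and full recovery copies the whole volume when neither log reaches back far enough. The class `WriteLog`, the function `choose_recovery`, and the use of sequence numbers are all hypothetical and are not taken from the StarFish implementation.

```python
from collections import deque


class WriteLog:
    """Fixed-capacity circular buffer of (sequence_number, block, data) entries."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.entries = deque(maxlen=capacity)

    def append(self, seq, block, data):
        self.entries.append((seq, block, data))

    def covers(self, first_missing_seq):
        """True if every write from first_missing_seq onward is still buffered."""
        return bool(self.entries) and self.entries[0][0] <= first_missing_seq


def choose_recovery(he_log, peer_se_log, last_applied_seq):
    """Pick a recovery type for an SE that last applied write last_applied_seq."""
    first_missing = last_applied_seq + 1
    if he_log.covers(first_missing):
        return "quick"   # assumed: replay from the HE's in-memory buffer
    if peer_se_log.covers(first_missing):
        return "replay"  # assumed: replay from another SE's log disk
    return "full"        # assumed: copy the entire volume


he_log, se_log = WriteLog(capacity=3), WriteLog(capacity=10)
for seq in range(1, 9):
    he_log.append(seq, block=seq, data=b"x")
    se_log.append(seq, block=seq, data=b"x")
print(choose_recovery(he_log, se_log, last_applied_seq=6))
print(choose_recovery(he_log, se_log, last_applied_seq=2))
```

The longer an SE stays disconnected, the further back the missing writes lie and the more expensive the recovery path becomes.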

Page 8: StarFish : highly-available block storage

Availability and Reliability
Assume that the failure and recovery processes of the network links and SEs are i.i.d. Poisson processes with combined mean failure and recovery rates of λ and μ per second.
Similarly, the HE fails and recovers as Poisson processes with rates λhe and μhe.

Page 9: StarFish : highly-available block storage

Availability
The steady-state probability that at least Q SEs are available.
Derived from the standard machine repairman model.

$$A(Q,N) = \frac{1}{(1+\rho)^{N}} \sum_{i=0}^{N-Q} \binom{N}{i} \rho^{i}, \qquad 1 \le Q \le N$$

Here ρ = λ/μ.
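A minimal sketch of the availability computation, assuming the formula is read as a sum over the number i of unavailable SEs, from 0 to N - Q, of C(N, i) ρ^i, divided by (1 + ρ)^N. ρ is taken to be the ratio λ/μ, so a single SE is up with probability 1/(1 + ρ). The function name `availability` is hypothetical.

```python
from math import comb


def availability(q, n, rho):
    """Steady-state probability that at least q of n SEs are available.

    rho is assumed to be lambda / mu, the failure-to-recovery rate ratio.
    """
    if not 1 <= q <= n:
        raise ValueError("q must satisfy 1 <= q <= n")
    # Sum over i = number of unavailable SEs; at most n - q may be down.
    return sum(comb(n, i) * rho**i for i in range(n - q + 1)) / (1 + rho) ** n


rho = 1 / 99  # corresponds to a single SE that is up 99% of the time
print(f"A(2, 3) = {availability(2, 3, rho):.6f}")
```

With N = 3, Q = 2, and 99% SE availability this gives roughly 0.9997 for write availability, before read-only consistency is taken into account.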

Page 10: StarFish : highly-available block storage

Machine Repairman Model

Page 11: StarFish : highly-available block storage

Availability (cont.)

Page 12: StarFish : highly-available block storage

Availability (cont.)
X ★ 9: the number of 9s in an availability measure.
Achieves a much higher availability when N = 2Q + 1.
For fixed N, availability decreases with larger quorum size.
◦ Increasing quorum size trades off availability for reliability.
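To make the "number of 9s" measure concrete, the sketch below computes it as -log10(1 - A) for a fixed N and varying Q. It assumes each SE is independently available with probability `p_up` and that the system is available when at least Q SEs are up; the function names and the 99% figure are illustrative choices.

```python
from math import comb, log10


def availability(q, n, p_up):
    """Probability that at least q of n independent SEs are up."""
    return sum(
        comb(n, k) * p_up**k * (1 - p_up) ** (n - k) for k in range(q, n + 1)
    )


def nines(a):
    """Availability expressed as a real-valued count of 9s (requires a < 1)."""
    return -log10(1.0 - a)


p_up = 0.99  # illustrative per-SE availability
for q in (1, 2, 3):
    a = availability(q, 3, p_up)
    print(f"N=3, Q={q}: A={a:.6f}, about {nines(a):.2f} nines")
```

For fixed N = 3 the count of 9s falls as Q grows, which is the trade-off the slide describes.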

Page 13: StarFish : highly-available block storage

Reliability
The probability of no data loss.
Reliability increases with larger Q.
Two approaches
◦ Require Q > floor(N/2), with at least Q SEs available.
  Reduces availability and performance.
◦ Read-only consistency

Page 14: StarFish : highly-available block storage

Read-only Consistency
Available in read-only mode during failure.
◦ Read-only mode obviates the need for Q SEs to be available to handle updates.

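◦ Increases availability

$$A_{\mathrm{ReadOnly}}(Q,N) = \frac{\sum_{i=0}^{N-1} \binom{N}{i} \rho^{i}}{(1+\rho_{he})(1+\rho)^{N}} + \frac{\rho_{he} \sum_{i=0}^{Q-1} \binom{Q}{i} \rho^{i}}{(1+\rho_{he})(1+\rho)^{Q}}$$

$$A_{\mathrm{ReadOnly}}(Q,N) = \frac{A(1,N)}{1+\rho_{he}} + \frac{\rho_{he}\, A(1,Q)}{1+\rho_{he}}$$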
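The read-only availability can be explored numerically. The sketch assumes the formula combines two cases: with probability 1/(1 + ρhe) the HE is up and any one of the N SEs suffices, A(1, N); with probability ρhe/(1 + ρhe) the HE is down and at least one of the Q quorum SEs must be available, A(1, Q). The function names and the sample values of ρ and ρhe are hypothetical.

```python
from math import comb


def availability(q, n, rho):
    """Probability that at least q of n SEs are available (rho = lambda / mu)."""
    return sum(comb(n, i) * rho**i for i in range(n - q + 1)) / (1 + rho) ** n


def read_only_availability(q, n, rho, rho_he):
    """Availability with read-only consistency, under the reading stated above."""
    he_up = 1 / (1 + rho_he)
    he_down = rho_he / (1 + rho_he)
    return he_up * availability(1, n, rho) + he_down * availability(1, q, rho)


rho = 1 / 99  # illustrative: each SE is up 99% of the time
for rho_he in (0.0, 0.01, 0.1):
    row = [read_only_availability(q, 3, rho, rho_he) for q in (1, 2, 3)]
    print(f"rho_he={rho_he}: " + ", ".join(f"Q={q}: {a:.7f}" for q, a in zip((1, 2, 3), row)))
```

With ρhe = 0 every Q gives the same value, and for ρhe > 0 the step from Q = 1 to Q = 2 is much larger than the step from Q = 2 to Q = 3.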

Page 15: StarFish : highly-available block storage

Availability with Read-only Consistency

Page 16: StarFish : highly-available block storage

Observations
If ρhe = 0, availability is independent of Q.
◦ Can always recover from the HE.
If ρhe increases, availability increases with Q.
The largest increase occurs from Q = 1 to Q = 2, and is bounded by 3/16 when ρ = 1.
◦ Diminishing gain after Q = 2.
◦ Suggests Q = 2 in a practical system.

Page 17: StarFish : highly-available block storage

Implementation

Page 18: StarFish : highly-available block storage

Performance Measurements
Compared against a direct-attached RAID unit.

Page 19: StarFish : highly-available block storage

Settings
Different network delays
◦ 1, 2, 4, 8, 23, 36, 65 ms
Different bandwidth limitations
◦ 31, 51, 62, 93, 124 Mb/s
Benchmarks:
◦ Micro-benchmarks
  Read hit
  Read miss
  Write
◦ PostMark

Page 20: StarFish : highly-available block storage

Effects of network delays and HE cache size

Near SE delay: 4 ms; far SE delay: 8 ms
No cache misses if HE cache size = 400 MB

Page 21: StarFish : highly-available block storage

Observation
A large HE cache improves performance.
◦ HE can respond to more read requests without communicating with an SE.
  Does not change write requests.
◦ Especially beneficial when the local SE has significant delays.
With Q = 2 and a 400 MB cache, performance is not influenced by the delay to the local SE.
◦ Depends on the near SE.
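The effect of the HE read cache can be illustrated with a toy model. This is a sketch under assumptions the slides do not state: the cache is LRU, writes always go through to storage and also refresh the cached copy, and a plain dictionary stands in for the SEs. The class name `HostElementCache` and its counters are hypothetical.

```python
from collections import OrderedDict


class HostElementCache:
    """Toy HE read cache: reads may be served locally, writes always reach storage."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block number -> data, oldest first
        self.se_reads = 0            # reads that had to go to an SE
        self.se_writes = 0           # writes sent to the SEs

    def _store(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block, backing):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: no SE communication
            return self.blocks[block]
        self.se_reads += 1                   # miss: fetch from storage
        data = backing.get(block)
        self._store(block, data)
        return data

    def write(self, block, data, backing):
        self.se_writes += 1                  # the cache never absorbs a write
        backing[block] = data
        self._store(block, data)


storage = {0: b"a", 1: b"b"}  # stand-in for the SEs
cache = HostElementCache(capacity_blocks=2)
for block in (0, 1, 0, 1, 0):
    cache.read(block, storage)
cache.write(1, b"c", storage)
print(cache.se_reads, cache.se_writes)
```

When the working set fits in the cache, only the first read of each block reaches storage, while the write count is unaffected by cache size.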

Page 22: StarFish : highly-available block storage

Normal Operation and placement of the far SE

1-8: 1, 2, 4, 8 ms; 4-12: 4, 8, 12 ms; 23-65: 23, 36, 65 ms; 31-124: 31, 51, 62, 93, 124 Mb/s
Local SE delay: 0 ms

N = 3

Page 23: StarFish : highly-available block storage

Normal Operation and placement of the far SE (cont.)

N = 3, 8 threads

Page 24: StarFish : highly-available block storage

Normal Operation and placement of the far SE (cont.)

Page 25: StarFish : highly-available block storage

Observation
Performance is influenced mostly by two parameters
◦ Write quorum size
◦ Delay to the SE
StarFish can provide adequate performance when one of the SEs is placed in a remote location.
◦ At least 85% of the performance of a direct-attached RAID.

Page 26: StarFish : highly-available block storage

Recovery

Performance degrades more during full recovery.

Page 27: StarFish : highly-available block storage

Conclusion
The StarFish system reveals significant benefits from a third copy of the data at an intermediate distance.

A StarFish system with 3 replicas, a write quorum size of 2, and read-only consistency yields better than 99.9999% availability assuming individual Storage Element availability of 99%.