RAID - MIT OpenCourseWare

May 06, 2023

6.033 Spring 2018
Lecture #14

• Reliability via Replication
• General approach to building fault-tolerant systems
• Single-disk failures: RAID

6.033 | spring 2018 | Katrina LaCurts 1

How to Design Fault-tolerant Systems in Three Easy Steps

1. identify all possible faults

2. detect and contain the faults

3. handle the fault

quantifying reliability

dealing with disk failures

700,000 hours ≈ 80 years

© Seagate Technology LLC. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use.
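The conversion on this slide is plain arithmetic, sketched below with a 365-day year. The one-year failure probability is an extra illustration that rests on two assumptions the slide does not state: that the 700,000-hour figure is a mean time to failure, and that failures follow a constant-rate (exponential) model. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def hours_to_years(hours: float) -> float:
    """Convert a duration in hours to years."""
    return hours / HOURS_PER_YEAR


def prob_failure_within(hours: float, mttf_hours: float) -> float:
    """Probability of at least one failure within `hours`.

    Assumption (not from the slide): exponential failure model
    with constant failure rate 1 / mttf_hours.
    """
    return 1.0 - math.exp(-hours / mttf_hours)


rated_hours = 700_000
years = hours_to_years(rated_hours)
p_year = prob_failure_within(HOURS_PER_YEAR, rated_hours)

print(f"{rated_hours} hours is about {years:.1f} years")
print(f"chance of failure in the first year (assumed model): {p_year:.2%}")
```

Under these assumptions, an 80-year rating still leaves a chance of roughly one in a hundred that a given disk fails in any one year, which matters when a system contains many disks.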

RAID 1 (mirroring)

✓ can recover from single-disk failure
✗ requires 2N disks
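Mirroring can be sketched as a toy in-memory model: every write goes to both copies, and a read succeeds as long as one copy survives. The class name `MirroredPair` and its methods are hypothetical, chosen for illustration only.

```python
class MirroredPair:
    """Toy RAID 1: each sector is stored on two disks."""

    def __init__(self, num_sectors: int, sector_size: int = 4):
        blank = bytes(sector_size)
        # Two independent copies of the same sectors.
        self.disks = [[blank] * num_sectors, [blank] * num_sectors]
        self.failed = [False, False]

    def write(self, sector: int, data: bytes) -> None:
        # Every write goes to every working disk.
        for d, disk in enumerate(self.disks):
            if not self.failed[d]:
                disk[sector] = data

    def read(self, sector: int) -> bytes:
        # Any surviving copy can serve the read.
        for d, disk in enumerate(self.disks):
            if not self.failed[d]:
                return disk[sector]
        raise IOError("both copies lost")

    def fail(self, d: int) -> None:
        # Simulate losing disk d and everything stored on it.
        self.failed[d] = True
        self.disks[d] = None


pair = MirroredPair(num_sectors=4)
pair.write(2, b"abcd")
pair.fail(0)
print(pair.read(2))  # still readable from the surviving disk
```

The cost is visible in the constructor: storing N disks' worth of data this way takes 2N disks.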

RAID 4 (dedicated parity disk)

[Figure: data disks and a parity disk]

Sector i of the parity disk is the xor of sector i from all data disks.

✓ can recover from single-disk failure
✓ requires N+1 disks (not 2N)
✓ performance benefits if you stripe a single file across multiple data disks
✗ all writes hit the parity disk
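The parity rule can be sketched in a few lines: compute each parity sector as the bytewise xor of the matching data sectors, then rebuild a lost disk by xoring the survivors with the parity. The disk contents and the helper names `xor_sectors` and `write_sector` are made up for illustration.

```python
from functools import reduce


def xor_sectors(sectors):
    """Bytewise xor of a list of equal-length sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


# Three data disks with two sectors each (arbitrary example contents).
data_disks = [
    [b"\x01\x02", b"\x10\x20"],
    [b"\x04\x08", b"\x30\x40"],
    [b"\x0f\xf0", b"\x55\xaa"],
]
num_sectors = 2

# Sector i of the parity disk is the xor of sector i from all data disks.
parity_disk = [
    xor_sectors([disk[i] for disk in data_disks]) for i in range(num_sectors)
]

# Simulate losing data disk 1 and rebuilding it from the rest.
lost = 1
survivors = [disk for d, disk in enumerate(data_disks) if d != lost]
rebuilt = [
    xor_sectors([disk[i] for disk in survivors] + [parity_disk[i]])
    for i in range(num_sectors)
]
print(rebuilt == data_disks[lost])


def write_sector(d, i, new_data):
    """Update one data sector and patch the parity sector to match."""
    old_data = data_disks[d][i]
    data_disks[d][i] = new_data
    # Xoring out the old data and xoring in the new keeps parity consistent.
    parity_disk[i] = xor_sectors([parity_disk[i], old_data, new_data])
```

`write_sector` shows why the dedicated parity disk becomes a bottleneck: whichever data disk a write targets, `parity_disk` is updated as well.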

RAID 5 (spread out the parity)

✓ can recover from single-disk failure
✓ requires N+1 disks (not 2N)
✓ performance benefits if you stripe a single file across multiple data disks
✓ writes are spread across disks
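One way to picture "spreading out the parity" is to rotate the parity sector to a different disk on each stripe. The round-robin rule below is an assumption made for illustration; the slide does not specify a layout, and real arrays use a variety of rotation schemes. The function names are hypothetical.

```python
from collections import Counter


def parity_disk_for(stripe: int, num_disks: int) -> int:
    """Disk holding the parity sector of a stripe.

    Assumption: simple round-robin rotation, for illustration only.
    """
    return stripe % num_disks


def layout(num_stripes: int, num_disks: int):
    """Return one row per stripe, marking parity (P) and data (D) sectors."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk_for(stripe, num_disks)
        rows.append(["P" if d == p else "D" for d in range(num_disks)])
    return rows


for row in layout(num_stripes=4, num_disks=4):
    print(" ".join(row))

# Count how many parity sectors land on each disk over 8 stripes.
parity_load = Counter(parity_disk_for(s, 4) for s in range(8))
print(dict(parity_load))
```

With this rotation every disk holds the same share of parity sectors, so parity updates no longer queue up on a single disk as they do in RAID 4.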

• Systems have faults. We have to take them into account and build reliable, fault-tolerant systems. Reliability always comes at a cost: there are tradeoffs between reliability and monetary cost, reliability and simplicity, etc.

• Our main tool for improving reliability is redundancy. One form of redundancy is replication, which can be used to combat many things including disk failures (important, because disk failures mean lost data).

• RAID replicates data across disks in a smart way: RAID 5 protects against single-disk failures while maintaining good performance.

MIT OpenCourseWare https://ocw.mit.edu

6.033 Computer System Engineering
Spring 2018

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
