DISK FAILURES PROF. T.Y.LIN CS-257 Presenter: Shailesh Benake(104)


Index: 13.4 Disk Failures

13.4.1 Intermittent Failures
13.4.2 Checksums
13.4.3 Stable Storage
13.4.4 Error-Handling Capabilities of Stable Storage
13.4.5 Recovery from Disk Crashes
13.4.6 Mirroring as a Redundancy Technique
13.4.7 Parity Blocks
13.4.8 An Improvement: RAID 5
13.4.9 Coping With Multiple Disk Crashes

Intermittent Failures

An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller.

The controller must be able to tell whether a sector is good or bad.

To check that a write was performed correctly, the sector is read back after writing.

Whether a sector is good or bad is therefore determined by the read operation.

Checksums

Each sector has some additional bits, called the checksum.

The checksum is set depending on the values of the data bits stored in that sector.

The probability of mistaking a bad sector for a good one is lower if we use checksums.

A simple checksum is a single parity bit: if the data bits contain an odd number of 1’s, add a parity bit of 1; if they contain an even number of 1’s, add a parity bit of 0.

So the total number of 1’s, parity bit included, is always even.

Intermittent failure: parity check
Media decay and write failure: stable storage
Disk crash: RAID

Example:

1. Sequence: 01101000 -> odd number of 1’s, parity bit: 1 -> 011010001

2. Sequence: 11101110 -> even number of 1’s, parity bit: 0 -> 111011100
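The parity rule can be sketched in a few lines of Python. This is only an illustration, assuming a sector is represented as a string of '0' and '1' characters; the function names are hypothetical.

```python
def parity_bit(bits: str) -> str:
    """Return '1' if bits holds an odd number of 1's, else '0'."""
    return "1" if bits.count("1") % 2 == 1 else "0"


def add_parity(bits: str) -> str:
    """Append the parity bit so the total number of 1's is even."""
    return bits + parity_bit(bits)


def passes_parity_check(sector: str) -> bool:
    """A sector (data bits plus parity bit) passes if its 1's are even in number."""
    return sector.count("1") % 2 == 0


stored = add_parity("01101000")          # three 1's, so the parity bit is 1
print(stored, passes_parity_check(stored))
```

A single flipped bit makes the count of 1’s odd, so the check fails; two flipped bits would go unnoticed, which is why a one-bit checksum only reduces the chance of accepting a bad sector.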

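Stable Storage

Stable storage is used to correct errors, not just detect them.

Sectors are paired, and each pair represents one sector-contents X, with left and right copies Xl and Xr respectively.

The parity bits of the left and right copies are checked; if a copy cannot be written or read correctly, a spare sector is substituted for Xl or Xr, and the operation is retried until a good value is returned.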

Error-Handling Capabilities of Stable Storage

Failures: If one of Xl and Xr fails, X can be read from the other. Only if both fail is X unreadable, and the probability of that is very small.

Write failure: Suppose a power outage occurs during a write.

1. While writing Xl: Xr remains good, and X (its old value) can be read from Xr.

2. After writing Xl: X can be read from Xl, since Xr may or may not have the correct copy of X.
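The two power-outage cases can be simulated with a small model. This is a sketch under simplifying assumptions: each copy is a bit string with one parity bit, a garbled copy is modeled as None, and the class name and the `outage` parameter are hypothetical.

```python
def with_parity(bits):
    """Append a parity bit so the total number of 1's is even."""
    return bits + ("1" if bits.count("1") % 2 else "0")


class StableSector:
    """One value X kept as two copies, Xl and Xr (toy model)."""

    def __init__(self, bits):
        self.xl = with_parity(bits)
        self.xr = with_parity(bits)

    def write(self, bits, outage=None):
        """Write Xl first, then Xr; `outage` simulates a power failure."""
        if outage == "during_xl":
            self.xl = None            # Xl is garbled, Xr still holds the old X
            return
        self.xl = with_parity(bits)
        if outage == "after_xl":
            return                    # Xr was never updated
        self.xr = with_parity(bits)

    def read(self):
        """Try Xl, then Xr; return the data bits of the first good copy."""
        for copy in (self.xl, self.xr):
            if copy is not None and copy.count("1") % 2 == 0:
                return copy[:-1]
        raise IOError("both copies of X are unreadable")


s = StableSector("0110")
s.write("1111", outage="during_xl")
print(s.read())   # old value, recovered from Xr
```

In this model an outage during the write of Xl leaves the old value readable, and an outage after it leaves the new value readable, so X is never lost to a single interrupted write.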

Recovery from Disk Crashes: Ways to Recover the Data

The most serious mode of failure for disks is the “head crash”, where data is permanently destroyed.

To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.

Each scheme starts with one or more disks that hold the data, called data disks, and adds one or more disks that hold information completely determined by the contents of the data disks, called redundant disks.

Mirroring as a Redundancy Technique

The mirroring scheme is referred to as RAID level 1 protection against data loss.

In this scheme we mirror each disk.

One of the disks is called the data disk and the other the redundant disk.

In this case the only way data can be lost is if there is a second disk crash while the first crash is being repaired.

Parity Blocks

The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.

In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks.

That is, the jth bits of the ith blocks of all the disks, data and redundant together, must have an even number of 1’s, and the redundant disk’s bit is chosen to make this condition true.

Parity Block – Reading

Reading a block from a data disk is the same as reading a block from any disk.

Alternatively, we could read the corresponding block from each of the other disks and compute the block of the disk we want to read by taking the modulo-2 sum.

disk 2: 10101010
disk 3: 00111000
disk 4: 01100010

If we take the modulo-2 sum of the bits in each column, we get:

disk 1: 11110000
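The column-wise modulo-2 sum is the same as a bitwise XOR of the blocks. A minimal sketch, assuming blocks are given as equal-length bit strings (the helper name `mod2_sum` is hypothetical):

```python
from functools import reduce


def mod2_sum(blocks):
    """Column-wise modulo-2 sum (XOR) of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(block, 2) for block in blocks))
    return format(total, f"0{width}b")


# Blocks of disks 2, 3 and 4 from the example above.
print(mod2_sum(["10101010", "00111000", "01100010"]))
```

With the three blocks from the slide this reproduces the block of disk 1.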

Parity Block – Writing

When we write a new block of a data disk, we need to change the corresponding block of the redundant disk as well.

One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum together with the new block, and write the result to the redundant disk. With n data disks, this approach requires n-1 reads of data blocks, one write of the data block, and one write of the redundant disk block.

Total = n+1 disk I/Os

A better approach requires only four disk I/Os:

1. Read the old value of the data block being changed.
2. Read the corresponding block of the redundant disk.
3. Write the new data block.
4. Recalculate and write the block of the redundant disk.
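The recalculation in step 4 needs only the three blocks already in hand: the new redundant block is the modulo-2 sum of the old redundant block, the old data block, and the new data block. The sketch below assumes bit-string blocks; the new value 11001100 for disk 2 is a made-up example, and the function names are hypothetical.

```python
def xor_bits(a, b):
    """Bitwise modulo-2 sum of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def updated_parity(old_data, new_data, old_parity):
    """New redundant block after one data block changes (the four-I/O approach)."""
    changed_positions = xor_bits(old_data, new_data)   # 1 where the data bit flipped
    return xor_bits(old_parity, changed_positions)     # flip the same parity bits


# Disk 2 changes from 10101010 to 11001100 (hypothetical new value);
# disk 4 is the redundant disk holding 01100010.
print(updated_parity("10101010", "11001100", "01100010"))
```

The result matches what a full recomputation over disks 1, 2 and 3 would give, without reading the unchanged data disks.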

Parity Blocks – Failure Recovery

If any one of the data disks crashes, we just have to compute the modulo-2 sum of the corresponding blocks of the other disks to recover it.

Suppose that disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like:

disk 1: 11110000
disk 2: ????????
disk 3: 00111000
disk 4: 01100010

If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is: 10101010
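Rebuilding a whole replacement disk repeats this calculation for every block position. A minimal sketch, assuming each surviving disk is a list of equal-length `bytes` blocks (a hypothetical in-memory stand-in for real disks):

```python
def rebuild_disk(surviving_disks):
    """Recompute every block of a failed disk from the surviving disks.

    surviving_disks: list of disks, each a list of equal-length bytes blocks,
    including the redundant disk.
    """
    rebuilt = []
    for blocks in zip(*surviving_disks):        # the i-th block of every disk
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for k, byte in enumerate(block):
                out[k] ^= byte                   # modulo-2 sum, bit by bit
        rebuilt.append(bytes(out))
    return rebuilt


disk1 = [bytes([0b11110000])]
disk3 = [bytes([0b00111000])]
disk4 = [bytes([0b01100010])]                    # redundant disk
recovered = rebuild_disk([disk1, disk3, disk4])
print(format(recovered[0][0], "08b"))
```

The same routine works whichever single disk is lost, including the redundant disk itself.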

An Improvement: RAID 5

RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.

Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk.

However we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.
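One way to picture the rotation: let the redundant role move to the next disk with each block position. The assignment rule below (block index modulo the number of disks) is just one possible choice, assumed here for illustration, and the function name is hypothetical.

```python
from collections import Counter


def redundant_disk(block_index, num_disks):
    """Disk that holds the parity for a given block position (assumed rule)."""
    return block_index % num_disks


num_disks = 4
assignment = [redundant_disk(i, num_disks) for i in range(8)]
print(assignment)
print(Counter(assignment))    # redundant role per disk over 8 block positions
```

Under this rule each disk is the redundant disk for the same share of block positions, so the redundant-block writes are spread across all disks instead of landing on one.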

Thank You
