Storage Architecture CE202 December 2, 2003 David Pease
Page 1: Ce202 Storage

Storage Architecture

CE202

December 2, 2003

David Pease

Page 2: Ce202 Storage

Hierarchy of Storage

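[Figure: storage hierarchy pyramid. From top to bottom: Cache, RAM, Disk, Optical, Tape. Toward the top: smaller capacity, higher cost, faster speed. Toward the bottom: larger capacity, lower cost, slower speed.]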

Page 3: Ce202 Storage

Storage System Components

I/O Context:

• Application
• I/O Library
• File System
• Device Driver
• Host Bus Adapter
• Interconnect
• Storage Controller
• Devices

Page 4: Ce202 Storage

Disks

Page 5: Ce202 Storage

Disk Drives

• “Workhorse” of modern storage systems
• Capacity increasing, raw price dropping
  – can buy 1TB for only $1000!
  – bandwidth not keeping pace
  – reliability is actually decreasing
    • massive systems can mean even lower availability
• Majority of cost of ownership in administration, not purchase price
  – backup, configuration, failure recovery
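One way to see why lagging bandwidth matters: the time to read an entire drive once is its capacity divided by its sustained transfer rate. The sketch below is illustrative only. The 50 MB/s rate is an assumed value, not a number from the slides, and `full_scan_seconds` is a hypothetical helper name.

```python
def full_scan_seconds(capacity_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to read every byte of a drive once at a sustained transfer rate."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return capacity_bytes / bandwidth_bytes_per_s


TB = 10**12
MB = 10**6
ASSUMED_BANDWIDTH = 50 * MB  # hypothetical sustained rate, not from the slides

seconds = full_scan_seconds(1 * TB, ASSUMED_BANDWIDTH)
print(f"{seconds / 3600:.1f} hours to scan 1 TB")
```

If capacity doubles while bandwidth stays flat, the scan time doubles too, which makes backup and rebuild windows longer.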

Page 6: Ce202 Storage

Disk Architecture

[Figure: disk drive diagram labeling the platters, spindle, tracks, sectors, cylinder, direction of rotation, and arms with read/write heads.]

Page 7: Ce202 Storage

Disk Storage Density

Page 8: Ce202 Storage

Disk Capacity Growth

Page 9: Ce202 Storage

IBM Disk Storage Roadmap

Page 10: Ce202 Storage

Storage Costs

Page 11: Ce202 Storage

RAID

• Redundant Arrays of Inexpensive Disks
• Two orthogonal concepts:
  – data striping for performance
  – redundancy for reliability
• Striped arrays can increase performance, but at the cost of reliability (next page)
  – redundancy can give arrays better reliability than an individual disk
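Data striping can be sketched as a round-robin mapping from logical block numbers to disks. This is a minimal illustration assuming a stripe unit of one block and no redundancy. The function name `locate_block` is hypothetical.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# Consecutive logical blocks land on different disks, so a large
# sequential transfer can keep all disks busy at once.
for lb in range(8):
    disk, offset = locate_block(lb, 4)
    print(f"logical block {lb} -> disk {disk}, offset {offset}")
```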

Page 12: Ce202 Storage

Reliability of Striped Array

[Figure: system reliability (y-axis, 0 to 1) versus number of disks in the array (x-axis, 1 to 16).]
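The relationship in this chart can be sketched under two assumptions: disks fail independently, and a non-redundant striped array is lost if any one disk fails. Array reliability is then the per-disk reliability raised to the number of disks. The per-disk value 0.95 below is an assumed figure for illustration, not the one used in the slide.

```python
def striped_array_reliability(disk_reliability: float, num_disks: int) -> float:
    """Probability that all disks survive, assuming independent failures."""
    if not 0.0 <= disk_reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return disk_reliability ** num_disks


ASSUMED_DISK_RELIABILITY = 0.95  # hypothetical value, not from the slides

for n in (1, 2, 4, 8, 16):
    r = striped_array_reliability(ASSUMED_DISK_RELIABILITY, n)
    print(f"{n:2d} disks: {r:.3f}")
```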

Page 13: Ce202 Storage

One-month Trace of Hardware Failures

Trace collected from the Internet Archive (March 2003) (thanks Kelly Gottlib)
-- Over 100 terabytes of compressed data
-- 30 disk failures out of total 70 hardware problems

[Pie chart: disk failure 42%, others 26%, disk subsystem 10%, disk error 10%, power supply 6%, FS error 6%]

Page 14: Ce202 Storage

RAID Levels

Level  Description                            Additional Disks  Failures Tolerated
0      Non-redundant striping                 0                 0
1      Mirrored                               n                 1
2      Memory-style ECC                       1+lg n            1
3      Bit-Interleaved Parity                 1                 1
4      Block-Interleaved Parity               1                 1
5      Block-Interleaved, Distributed Parity  1                 1
6      P+Q Redundancy                         2                 2
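The "additional disks" column can be turned into a small calculator for the share of raw capacity that holds user data. This is a sketch, not a definitive model: n is taken to be the number of data disks, and rounding lg n up to a whole disk for level 2 is an assumption. The function names are hypothetical.

```python
import math


def additional_disks(level: int, n: int) -> int:
    """Redundant disks needed beyond n data disks, following the table."""
    if n < 1:
        raise ValueError("need at least one data disk")
    if level == 0:
        return 0
    if level == 1:
        return n
    if level == 2:
        # Assumption: round lg n up to a whole number of disks.
        return 1 + math.ceil(math.log2(n))
    if level in (3, 4, 5):
        return 1
    if level == 6:
        return 2
    raise ValueError(f"unknown RAID level {level}")


def storage_efficiency(level: int, n: int) -> float:
    """Fraction of total disks that hold user data."""
    return n / (n + additional_disks(level, n))


for level in range(7):
    print(f"RAID {level}: {storage_efficiency(level, 8):.2f} of raw capacity usable")
```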

Page 15: Ce202 Storage

RAID Levels

[Figure: disk layout diagrams for RAID levels 0 through 6.]

Page 16: Ce202 Storage

RAID: 4x Small Write Penalty

[Figure: a small data write to a parity-protected array, with steps numbered 1 to 5 and an xor operation.]
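The parity update behind the small write penalty can be sketched as follows. A small write costs four disk I/Os: read the old data, read the old parity, write the new data, write the new parity. The new parity is the old parity XOR the old data XOR the new data. Block contents and function names here are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_after_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity = old parity XOR old data XOR new data."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# Hypothetical stripe of three 2-byte data blocks plus one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = reduce(xor_blocks, stripe)

new_block = b"\xaa\x55"
parity = parity_after_small_write(stripe[1], parity, new_block)
stripe[1] = new_block

# The incrementally updated parity matches a full recomputation.
print(parity == reduce(xor_blocks, stripe))
```

Only the target data disk and the parity disk are touched, so the other data disks stay free for other requests.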

Page 17: Ce202 Storage

Log-Structured File Systems

• Based on assumption that disk traffic will become dominated by writes
• Always writes disk data sequentially, into next available location on disk
  – no seeks on write
• Eliminates problem of 4x write penalty
  – all writes are “new”, no need to read old data or parity
• However, almost no examples in industry file systems
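The append-only idea can be sketched with an in-memory log and an index that points at the newest copy of each block. This is a toy model, not the design from the Rosenblum and Ousterhout paper: it omits segments, cleaning, and crash recovery, and the class and method names are hypothetical.

```python
class LogStructuredStore:
    """Toy log-structured store: every write appends, nothing is overwritten."""

    def __init__(self) -> None:
        self.log: list[tuple[str, bytes]] = []  # append-only sequence of writes
        self.index: dict[str, int] = {}         # block id -> newest log position

    def write(self, block_id: str, data: bytes) -> int:
        """Append the block at the next free log position and return it."""
        self.log.append((block_id, data))
        position = len(self.log) - 1
        self.index[block_id] = position
        return position

    def read(self, block_id: str) -> bytes:
        """Return the most recently written copy of the block."""
        return self.log[self.index[block_id]][1]


fs = LogStructuredStore()
fs.write("inode-7", b"version 1")
fs.write("inode-9", b"other file")
fs.write("inode-7", b"version 2")  # an update is a new append, not an overwrite

print(fs.read("inode-7"))  # newest copy
print(len(fs.log))         # the stale copy still occupies log space
```

The stale copy left behind by the update is why a real log-structured file system needs a cleaner to reclaim space.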

Page 18: Ce202 Storage

Tape

Page 19: Ce202 Storage

Tape Media

• Inherently sequential
  – long time to first byte
  – no random I/O
• Subject to mechanical stress
  – number of read-write cycles lower than disk
• Problems as an archival medium:
  – readers go away after some years
    • most rapidly in recent years
  – tapes (with data) remain in a salt mine

Page 20: Ce202 Storage

Tape Media

• Density will always trail that of disk
  – Tape stretches, more difficult to get higher density
• Alignment also an issue
  – once it’s past the head, it’s gone
  – more conservative techniques required
• Bottom line: mechanical engineering issues for tape are the difficult ones

Page 21: Ce202 Storage

Optical

• CD, CD-R/RW, DVD, DVD-R/RW
  – Capacities:
    • CD: ~700MB (huge 20 years ago!)
    • DVD:
      – single sided, single layer: 5GB
      – single sided, double layer: 9GB
      – double sided, single layer: 10GB
      – double sided, double layer: 18GB
• Size of cell limited by wavelength of light
  – current lasers are red
  – blue lasers are under development, then UV, ...

Page 22: Ce202 Storage

Optical

• Magneto-optical (HAMR)
  – heat from laser makes changing direction of magnetization easier (so cell is smaller)

Page 23: Ce202 Storage

MEMS

• MicroElectroMechanical Systems
  – 6-10 times faster than disk
  – cost and capacity issues

Page 24: Ce202 Storage

Magnetic RAM (MRAM)

• Stores each bit in a magnetic cell rather than a capacitor or flip-flop
  – data is persistent
• Can be read and written very quickly
  – Read and write times 0.5 – 10 µs or less
  – Individual bits are writeable (no block erase)
• Density & cost comparable to DRAM
  – may require density/speed tradeoffs
  – denser MRAM may have to run slower because of heat dissipation on writes

Page 25: Ce202 Storage

Magnetic RAM (MRAM)

• Several companies have announced partnerships to produce products ~2003
• Ideas for use of MRAM in storage:
  – Persistent cache
    • Hot data in MRAM, cold data to disk
    • No need to flush write cache to avoid data loss
  – HeRMES
    • all metadata in MRAM
    • enough file data in MRAM to hide disk latency for first access to a file

Page 26: Ce202 Storage

Peripheral Buses

• SCSI
• IDE/ATA
• HIPPI (High Performance Parallel Intf.)
• IEEE 1394 (FireWire)
• FibreChannel (FCP)
• IP (e.g., iSCSI)
• InfiniBand
• Serial ATA

Page 27: Ce202 Storage

Peripheral Buses

• Parallel
  – SCSI, most printers, IBM Channels
  – 1 or more bytes per clock
  – Skew problems at high speeds
• Serial
  – FC, RS232, IEEE1394 (FireWire)
  – 1 bit per clock, self clocking
  – can be run at much higher speeds than parallel bus

Page 28: Ce202 Storage

Networked Storage

• Storage attached by general-purpose or dedicated network (e.g., FibreChannel)
• Motivations:
  – homogeneous and heterogeneous file sharing
  – centralized administration
  – better resource utilization (shared storage resources, pooling)
• Dedicated Networks:
  – Fibre-Channel: FCP (SCSI over FC)
  – iSCSI: SCSI over IP
  – InfiniBand

Page 29: Ce202 Storage

Networked Storage

• Can mean many things:
  – NAS (Network-Attached Storage): file server appliances serving NFS and/or CIFS (for example, Network Appliance)
  – NASD (Network-Attached Secure Disk): intelligent, network-attached drives w/ security features (also, Network-Attached Storage Device)
  – SAN (Storage Area Network): network for attaching disks and computers, usually dedicated only to storage operations
• OBSD (Object-Based Storage Device): similar to NASD

Page 30: Ce202 Storage

A SAN File System

[Figure: SAN file system diagram. Clients running Win2K, AIX, Solaris, and Linux, each with an IFS w/cache, exchange data over the SAN and meta-data with a group of meta-data servers over the control network (IP). Other labels: storage management server (HSM & backup), security assists, and NFS, CIFS, FTP, HTTP.]

Page 31: Ce202 Storage

Additional Reading

• Hennessy & Patterson: Chapter 6

• Chen, Lee, Gibson, Katz, & Patterson: RAID: high performance, reliable secondary storage. ACM Computing Surveys 26, June 1994, 145-185

• Rosenblum & Ousterhout: The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, Feb. 1992, 26-52

• Gibson, Nagle, et al.: A cost-effective, high-bandwidth storage architecture. Proceedings of the Eighth Conference on Architectural Support for Programming Languages and Operating Systems, 1998

• http://www.almaden.ibm.com/cs/storagesystems/stortank/