Storage Architecture
CE202
December 2, 2003
David Pease
Hierarchy of Storage
[Figure: storage hierarchy of Cache, RAM, Disk, Optical, and Tape; moving from Cache toward Tape, capacity gets larger, cost gets lower, and speed gets slower.]
I/O Context
• Application
• I/O Library
• File System
• Device Driver
• Host Bus Adapter
• Interconnect
• Storage Controller
• Devices
Storage System Components
Disks
Disk Drives
• “Workhorse” of modern storage systems
• Capacity increasing, raw price dropping
  – can buy 1TB for only $1000!
  – bandwidth not keeping pace
  – reliability is actually decreasing
    • massive systems can mean even lower availability
• Majority of the cost of ownership is in administration, not purchase price
  – backup, configuration, failure recovery
Disk Architecture
[Figure: disk drive anatomy showing platters on a spindle, the direction of rotation, arms with read/write heads, and a track, a sector, and a cylinder.]
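The track, sector, and cylinder terms map naturally onto the classic cylinder-head-sector (CHS) addressing scheme. The Python sketch below converts between CHS and a linear logical block address (LBA) for an assumed, fixed geometry. The `DiskGeometry` class and the 4-head, 63-sector example are hypothetical, and modern drives use zoned recording and expose only LBAs, so this is an idealized model rather than how any particular drive lays out data.

```python
from dataclasses import dataclass

# Hypothetical geometry: parameter names and example values are
# illustrative assumptions, not figures from the lecture.
@dataclass(frozen=True)
class DiskGeometry:
    heads: int              # recording surfaces, one read/write head each
    sectors_per_track: int  # sectors on every track (no zoning assumed)

    def chs_to_lba(self, cylinder: int, head: int, sector: int) -> int:
        """Map (cylinder, head, sector) to a logical block address.

        Cylinders and heads are numbered from 0, sectors from 1,
        following the classic CHS convention.
        """
        if not 0 <= head < self.heads:
            raise ValueError("head out of range")
        if not 1 <= sector <= self.sectors_per_track:
            raise ValueError("sector out of range")
        # All tracks of one cylinder come before the next cylinder, so a
        # sequential transfer fills a cylinder before the arm must move.
        return (cylinder * self.heads + head) * self.sectors_per_track + (sector - 1)

    def lba_to_chs(self, lba: int) -> tuple[int, int, int]:
        """Inverse mapping from LBA back to (cylinder, head, sector)."""
        track, sector_index = divmod(lba, self.sectors_per_track)
        cylinder, head = divmod(track, self.heads)
        return cylinder, head, sector_index + 1

geom = DiskGeometry(heads=4, sectors_per_track=63)
print(geom.chs_to_lba(1, 0, 1))  # first sector of cylinder 1
print(geom.lba_to_chs(252))
```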
Disk Storage Density
Disk Capacity Growth
IBM Disk Storage Roadmap
Storage Costs
RAID
• Redundant Arrays of Inexpensive Disks
• Two orthogonal concepts:
  – data striping for performance
  – redundancy for reliability
• Striped arrays can increase performance, but at the cost of reliability (see Reliability of Striped Array)
  – redundancy can give arrays better reliability than an individual disk
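To make striping concrete, here is a minimal sketch of RAID 0 address mapping in Python, assuming a fixed stripe unit (a number of consecutive blocks placed on one disk) dealt round-robin across the disks. `stripe_location` and its parameters are illustrative names, not taken from any particular RAID implementation.

```python
def stripe_location(logical_block: int, num_disks: int, stripe_unit: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a RAID 0 array.

    stripe_unit is the number of consecutive blocks written to one disk
    before moving to the next (an assumed, configurable parameter).
    """
    unit_index, offset_in_unit = divmod(logical_block, stripe_unit)
    # Stripe units are assigned to disks round-robin.
    disk = unit_index % num_disks
    # Each full pass over all disks advances every disk by one stripe unit.
    row = unit_index // num_disks
    return disk, row * stripe_unit + offset_in_unit

# A 16-block sequential read over 4 disks with 2-block stripe units
# touches every disk, so the disks can transfer in parallel.
disks_touched = {stripe_location(b, 4, 2)[0] for b in range(16)}
print(sorted(disks_touched))
```

Nothing in this mapping stores redundant information, so losing any one disk loses part of every large file; that is the reliability cost the slide mentions.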
Reliability of Striped Array
[Figure: system reliability (0 to 1) versus number of disks (1 to 16) in a striped array.]
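A simple model behind this kind of curve, assuming disks fail independently: a non-redundant striped array survives only if every disk survives, so with n disks that each survive a given period with probability r, the array survives with probability r^n. With constant (exponential) failure rates, the rates add, so the array MTTF is the disk MTTF divided by n. The per-disk value 0.9 below is a hypothetical input, not data from the slide.

```python
def striped_reliability(disk_reliability: float, num_disks: int) -> float:
    """Probability that a non-redundant striped array survives a period.

    Assumes independent failures; losing any single disk loses data.
    """
    if not 0.0 <= disk_reliability <= 1.0:
        raise ValueError("reliability must be a probability")
    return disk_reliability ** num_disks

def striped_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """Array MTTF assuming exponentially distributed disk lifetimes."""
    # Failure rates add across disks, so MTTF shrinks by a factor of n.
    return disk_mttf_hours / num_disks

# Hypothetical per-disk reliability of 0.9 over some fixed period.
for n in (1, 2, 4, 8, 16):
    print(n, round(striped_reliability(0.9, n), 3))
```

Under these assumptions a 16-disk stripe survives the same period with probability 0.9^16, about 0.19.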
One-month Trace of Hardware Failures
Trace collected from the Internet Archive (March 2003) (thanks Kelly Gottlib)
– Over 100 terabytes of compressed data
– 30 disk failures out of 70 total hardware problems
Hardware problems by category:
  disk failure     42%
  others           26%
  disk error       10%
  disk subsystem   10%
  FS error          6%
  power supply      6%
RAID Levels
Level  Description                            Additional Disks  Failures Tolerated
0      Non-redundant striping                 0                 0
1      Mirrored                               n                 1
2      Memory-style ECC                       1 + lg n          1
3      Bit-Interleaved Parity                 1                 1
4      Block-Interleaved Parity               1                 1
5      Block-Interleaved, Distributed Parity  1                 1
6      P+Q Redundancy                         2                 2
(n = number of data disks)
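The redundancy column translates directly into usable capacity: with n data disks and c additional disks, the fraction of raw capacity holding user data is n / (n + c). A small Python sketch, assuming the per-level disk counts from the RAID levels table; `check_disks` and `storage_efficiency` are hypothetical helper names, and rounding lg n up for RAID 2 is an assumption made here to get a whole number of disks.

```python
import math

def check_disks(level: int, n: int) -> int:
    """Additional (redundant) disks for n data disks, per the RAID levels table."""
    if level == 0:
        return 0
    if level == 1:
        return n
    if level == 2:
        # The table gives 1 + lg n; rounding lg n up is an assumption here.
        return 1 + math.ceil(math.log2(n))
    if level in (3, 4, 5):
        return 1
    if level == 6:
        return 2
    raise ValueError(f"unknown RAID level {level}")

def storage_efficiency(level: int, n: int) -> float:
    """Fraction of raw capacity that holds user data."""
    return n / (n + check_disks(level, n))

for level in range(7):
    print(f"RAID {level}: {storage_efficiency(level, 8):.0%} usable with 8 data disks")
```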
RAID Levels
[Figure: data and redundancy layouts for RAID levels 0 through 6.]
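RAID: 4x Small Write Penalty
[Figure: a small data write to a parity array, shown as numbered steps 1 to 5 with an XOR; the old data and old parity must be read and the new data and new parity written, so one logical write costs four disk I/Os.]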
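The four I/Os come from read-modify-write parity maintenance: new parity = old parity XOR old data XOR new data. The Python sketch below models a single stripe in memory and counts the disk I/Os. `ParityStripe` and `xor_blocks` are hypothetical names, and a real controller would also have to handle failures, caching, and concurrent updates.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("blocks must have equal length")
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

class ParityStripe:
    """One stripe of a parity array: data blocks plus one parity block."""

    def __init__(self, data_blocks: list[bytes]):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.io_count = 0

    def small_write(self, index: int, new_data: bytes) -> None:
        """Update one data block with read-modify-write."""
        old_data = self.data[index]   # I/O 1: read old data
        old_parity = self.parity      # I/O 2: read old parity
        self.io_count += 2
        # XOR cancels the old data out of the parity and folds the new data in.
        new_parity = xor_blocks(old_parity, old_data, new_data)
        self.data[index] = new_data   # I/O 3: write new data
        self.parity = new_parity      # I/O 4: write new parity
        self.io_count += 2

stripe = ParityStripe([b"\x01\x02", b"\x10\x20", b"\xaa\xbb"])
stripe.small_write(1, b"\xff\x00")
assert stripe.parity == xor_blocks(*stripe.data)
print(stripe.io_count)
```

The assertion checks the invariant that makes the shortcut safe: after the update, parity still equals the XOR of all data blocks in the stripe, without reading the other data blocks.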
Log-Structured File Systems
• Based on the assumption that disk traffic will become dominated by writes
• Always writes data sequentially, into the next available location on disk
  – no seeks on write
• Eliminates the 4x small write penalty
  – all writes are “new”: no need to read old data or parity
• However, there are almost no examples among industry file systems
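The core LFS idea can be sketched as an append-only log plus a map from each logical block to the log position of its newest version. This Python sketch is a toy under stated assumptions: `LogStructuredStore` and `block_map` are hypothetical names, everything lives in memory, and it omits the segment cleaner that a real LFS (as described by Rosenblum & Ousterhout) needs to reclaim space held by stale versions.

```python
class LogStructuredStore:
    """Toy log-structured block store (hypothetical, in memory)."""

    def __init__(self) -> None:
        self.log: list[tuple[int, bytes]] = []  # (logical block, data) in write order
        self.block_map: dict[int, int] = {}     # logical block -> newest log position

    def write(self, block: int, data: bytes) -> int:
        # No read of old data or parity: the new version is simply appended.
        position = len(self.log)
        self.log.append((block, data))
        self.block_map[block] = position
        return position

    def read(self, block: int) -> bytes:
        return self.log[self.block_map[block]][1]

    def live_fraction(self) -> float:
        """Fraction of log entries that still hold current data."""
        return len(self.block_map) / len(self.log) if self.log else 1.0

lfs = LogStructuredStore()
lfs.write(7, b"v1")
lfs.write(3, b"a")
lfs.write(7, b"v2")  # an overwrite goes to a new location at the log's end
print(lfs.read(7), lfs.live_fraction())
```

The `live_fraction` value shows why cleaning matters: every overwrite leaves a dead copy behind in the log.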
Tape
Tape Media
• Inherently sequential
  – long time to first byte
  – no random I/O
• Subject to mechanical stress
  – number of read-write cycles is lower than for disk
• Problems as an archival medium:
  – drives that can read a given format go away after some years
    • most rapidly in recent years
  – tapes (with data) remain in a salt mine
• Density will always trail that of disk
  – tape stretches, making higher density more difficult to achieve
• Alignment is also an issue
  – once data is past the head, it’s gone
  – more conservative techniques are required
• Bottom line: for tape, the mechanical engineering issues are the difficult ones
Optical
• CD, CD-R/RW, DVD, DVD-R/RW
  – Capacities:
    • CD: ~700MB (huge 20 years ago!)
    • DVD:
      – single sided, single layer: ~4.7GB
      – single sided, double layer: ~8.5GB
      – double sided, single layer: ~9.4GB
      – double sided, double layer: ~17GB
• Size of cell limited by wavelength of light
  – current lasers are red
  – blue lasers are under development, then UV, ...
• Magneto-optical (HAMR)
  – heat from the laser makes it easier to change the direction of magnetization (so cells can be smaller)
MEMS
• MicroElectroMechanical Systems
  – 6-10 times faster than disk
  – cost and capacity issues
Magnetic RAM (MRAM)
• Stores each bit in a magnetic cell rather than a capacitor or flip-flop
  – data is persistent
• Can be read and written very quickly
  – read and write times of 0.5 to 10 µs or less
  – individual bits are writeable (no block erase)
• Density and cost comparable to DRAM
  – may require density/speed tradeoffs
  – denser MRAM may have to run slower because of heat dissipation on writes
• Several companies have announced partnerships to produce products (around 2003)
• Ideas for use of MRAM in storage:
  – Persistent cache
    • hot data in MRAM, cold data to disk
    • no need to flush the write cache to avoid data loss
  – HeRMES
    • all metadata in MRAM
    • enough file data in MRAM to hide disk latency for the first access to a file
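The persistent-cache idea can be sketched as a two-tier store in which the fast tier stands in for MRAM. This is a hypothetical Python illustration: the `TieredStore` class, the LRU replacement policy, and the dict standing in for the disk are assumptions chosen for brevity, not a description of HeRMES or any product.

```python
from collections import OrderedDict

class TieredStore:
    """Hot blocks stay in a small persistent (MRAM-like) tier; cold blocks go to disk.

    Hypothetical sketch: LRU policy, capacity, and the dict 'disk' are
    illustrative assumptions.
    """

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast tier must hold at least one block")
        self.fast_capacity = fast_capacity
        self.fast: "OrderedDict[int, bytes]" = OrderedDict()  # oldest first
        self.disk: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # The fast tier is assumed persistent, so a write is durable once it
        # lands here; no flush to disk is needed to avoid data loss.
        self.fast[block] = data
        self.fast.move_to_end(block)
        self._evict_cold()

    def read(self, block: int) -> bytes:
        if block in self.fast:
            self.fast.move_to_end(block)  # mark as most recently used
            return self.fast[block]
        data = self.disk[block]
        self.fast[block] = data  # promote cold data on access
        self._evict_cold()
        return data

    def _evict_cold(self) -> None:
        # Demote least recently used blocks until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            block, data = self.fast.popitem(last=False)
            self.disk[block] = data

store = TieredStore(fast_capacity=2)
store.write(1, b"a")
store.write(2, b"b")
store.write(3, b"c")  # block 1 is now the coldest and moves to disk
```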
Peripheral Buses
• SCSI
• IDE/ATA
• HIPPI (High Performance Parallel Interface)
• IEEE 1394 (FireWire)
• FibreChannel (FCP)
• IP (e.g., iSCSI)
• InfiniBand
• Serial ATA
• Parallel
  – SCSI, most printers, IBM Channels
  – 1 or more bytes per clock
  – skew problems at high speeds
• Serial
  – FC, RS232, IEEE 1394 (FireWire)
  – 1 bit per clock, self-clocking
  – can be run at much higher speeds than a parallel bus
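A self-clocking serial link has no separate clock wire: the receiver derives timing from transitions in the data signal itself. Manchester encoding is one classic self-clocking line code, shown here in Python only to illustrate the idea; it is not necessarily the code any of the buses above uses (Fibre Channel, for instance, uses 8b/10b coding). The function names are hypothetical.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Encode each bit as two half-bit signal levels with a mid-bit transition.

    Uses the IEEE 802.3 convention: 0 -> high then low, 1 -> low then high.
    """
    signal: list[int] = []
    for bit in bits:
        signal.extend((0, 1) if bit else (1, 0))
    return signal

def manchester_decode(signal: list[int]) -> list[int]:
    """Recover bits; every bit period must contain a transition."""
    if len(signal) % 2:
        raise ValueError("signal must contain whole bit periods")
    bits: list[int] = []
    for first, second in zip(signal[0::2], signal[1::2]):
        if first == second:
            # A missing mid-bit transition means the receiver lost alignment.
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits

data = [0, 0, 0, 0, 1, 1]
line = manchester_encode(data)
# Even a run of identical bits produces a transition in every bit period,
# which is what lets the receiver recover the clock from the data itself.
transitions = sum(a != b for a, b in zip(line, line[1:]))
print(transitions, manchester_decode(line) == data)
```

The cost of this particular code is that the line must switch up to twice per bit; denser codes such as 8b/10b trade some of that overhead for more complex encoding.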
Networked Storage
• Storage attached by a general-purpose or dedicated network (e.g., FibreChannel)
• Motivations:
  – homogeneous and heterogeneous file sharing
  – centralized administration
  – better resource utilization (shared storage resources, pooling)
• Dedicated networks:
  – FibreChannel: FCP (SCSI over FC)
  – iSCSI: SCSI over IP
  – InfiniBand
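With FCP and iSCSI the host still issues ordinary SCSI commands; only the transport underneath changes. As an illustration, this Python sketch builds the 10-byte command descriptor block (CDB) for a SCSI READ(10) command, which such transports carry inside their own frames or PDUs. The framing itself is not shown, and `build_read10_cdb` is a hypothetical helper name.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code

def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) command descriptor block.

    Layout: opcode, flags byte, 4-byte big-endian LBA, group number,
    2-byte big-endian transfer length (in blocks), control byte.
    Flags, group, and control are left at zero in this sketch.
    """
    if not 0 <= lba < 2**32:
        raise ValueError("READ(10) addresses at most 2**32 blocks")
    if not 0 <= num_blocks < 2**16:
        raise ValueError("transfer length must fit in 16 bits")
    # '>' selects big-endian with no padding: 1+1+4+1+2+1 = 10 bytes.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)

cdb = build_read10_cdb(0x12345678, 8)
print(cdb.hex())
```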
• Networked storage can mean many things:
  – NAS (Network-Attached Storage): file server appliances serving NFS and/or CIFS (for example, Network Appliance)
  – NASD (Network-Attached Secure Disk): intelligent, network-attached drives with security features (also, Network-Attached Storage Device)
  – SAN (Storage Area Network): network for attaching disks and computers, usually dedicated only to storage operations
  – OBSD (Object-Based Storage Device): similar to NASD
A SAN File System
[Figure: Win2K, AIX, Solaris, and Linux clients, each running IFS with a cache, reach shared storage over a SAN for data and exchange meta-data with a cluster of meta-data servers over an IP control network; also shown are a storage management server (HSM & backup), NFS, CIFS, FTP, and HTTP access, and security assists.]
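The division of labor in the SAN file system figure, with meta-data handled by servers on the control network and file data moving directly between clients and storage, can be sketched as follows. Everything here is hypothetical: `MetadataServer`, `SanClient`, and the dict-based disks are illustrative stand-ins, and locking, cache coherence, and security are omitted.

```python
from dataclasses import dataclass, field

# All class and method names are hypothetical illustrations.

@dataclass
class MetadataServer:
    """Holds file layout meta-data; reached over the IP control network."""
    layouts: dict[str, list[tuple[str, int]]]  # path -> [(disk id, block), ...]
    lookups: int = 0

    def get_layout(self, path: str) -> list[tuple[str, int]]:
        self.lookups += 1
        return self.layouts[path]

@dataclass
class SanClient:
    """Client file system that caches meta-data and reads data over the SAN."""
    mds: MetadataServer
    san_disks: dict[str, dict[int, bytes]]  # disk id -> block -> data
    layout_cache: dict[str, list[tuple[str, int]]] = field(default_factory=dict)

    def read_file(self, path: str) -> bytes:
        layout = self.layout_cache.get(path)
        if layout is None:
            # Meta-data path: ask the server once, then cache the answer.
            layout = self.mds.get_layout(path)
            self.layout_cache[path] = layout
        # Data path: read blocks straight from shared disks, bypassing the server.
        return b"".join(self.san_disks[disk][block] for disk, block in layout)

mds = MetadataServer(layouts={"/a": [("d0", 5), ("d1", 2)]})
client = SanClient(mds, san_disks={"d0": {5: b"hel"}, "d1": {2: b"lo"}})
client.read_file("/a")
client.read_file("/a")
print(mds.lookups)
```

The second `read_file` call is served from the client's layout cache, so the meta-data server sees only one lookup while all file data still comes directly from the shared disks.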
Additional Reading
• Hennessy & Patterson: Chapter 6
• Chen, Lee, Gibson, Katz, & Patterson: RAID: high-performance, reliable secondary storage. ACM Computing Surveys 26, June 1994, 145-185
• Rosenblum & Ousterhout: The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, Feb. 1992, 26-52
• Gibson, Nagle, et al.: A cost-effective, high-bandwidth storage architecture. Proceedings of the Eighth Conference on Architectural Support for Programming Languages and Operating Systems, 1998
• http://www.almaden.ibm.com/cs/storagesystems/stortank/