Bridging the Information Gap in Storage Protocol Stacks
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison
http://www.cs.wisc.edu/wind/
State of Affairs
• [Figure: the file system manages the namespace, files, metadata, layout, and liveness; the storage system manages parallelism and redundancy, with an information gap between the two layers]
Problem
• The information gap may cause problems
  – Poor performance: partial stripe write operations
  – Duplicated functionality: logging in both the file system and the storage system
  – Reduced functionality: the storage system lacks knowledge of files
• Time to re-examine the division of labor
Our Approach
• Enhance the storage interface
  – Expose performance and failure information
• Use the information to provide new functionality
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
• [Figure: Informed LFS (I·LFS) layered above Exposed RAID (ERAID)]
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
• Conclusion
ERAID Overview
• Goals
  – Backwards compatibility: block-based interface with a linear, concatenated address space
  – Expose information to the file system above, allowing it to utilize semantic knowledge
ERAID Regions
• Region: a contiguous portion of the address space
• Regions can be added to expand the address space
• Region composition (see the sketch below)
  – RAID: one region for all disks
  – Exposed: a separate region for each disk
  – Hybrid: a mix of the two
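A minimal sketch of how an ERAID region table might be represented, under the description above: each region maps a contiguous run of the linear block address space onto a set of disks. The struct and field names are illustrative assumptions, not the actual ERAID code.

/* Hypothetical region table: each region covers a contiguous run of the
 * linear address space and is backed by one or more disks. */
#include <stdint.h>

enum region_kind { REGION_RAID0, REGION_RAID1, REGION_EXPOSED };

struct eraid_region {
    uint64_t start_blk;      /* first block of the region in the linear space */
    uint64_t nblks;          /* length of the region in blocks */
    enum region_kind kind;   /* striped/mirrored over all disks, or one exposed disk */
    int      ndisks;         /* number of disks backing this region */
    int      disks[8];       /* indices of the backing disks */
};

struct eraid {
    int                 nregions;
    struct eraid_region regions[64];  /* grows as new regions are added */
};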
ERAID Performance Information
• Exposed on a per-region basis
• Throughput and queue length (see the query sketch below)
• Reveals
  – Static disk heterogeneity
  – Dynamic performance and load fluctuations
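A sketch of the per-region performance information ERAID could export to the file system. The struct and the query function are assumptions made for illustration; the real interface may differ.

/* Hypothetical per-region performance snapshot and query call. */
#include <stdint.h>

struct eraid;                  /* defined by the storage layer */

struct eraid_perf {
    double   throughput_mbps;  /* recently observed throughput of the region */
    uint32_t queue_len;        /* outstanding requests on the region's disks */
};

/* Return a snapshot of region r's current performance. */
int eraid_get_perf(struct eraid *e, int r, struct eraid_perf *out);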
ERAID Failure Information
• Exposed on a per-region basis
• Number of tolerable failures
• Regions may have different failure characteristics
• Reveals dynamic failures to the file system above (see the sketch below)
• [Figure: an ERAID volume with a RAID1 region and a failed disk marked X]
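A matching sketch of per-region failure information. As above, the names are illustrative assumptions, not the actual interface.

/* Hypothetical per-region failure characteristics and current state. */
struct eraid;                  /* defined by the storage layer */

struct eraid_fail {
    int tolerable_failures;    /* e.g. 1 for a mirrored region, 0 for an exposed disk */
    int failed_disks;          /* disks currently failed within the region */
};

/* Return region r's failure characteristics and current failure state. */
int eraid_get_fail(struct eraid *e, int r, struct eraid_fail *out);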
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
• Conclusion
I·LFS Overview
• Modified NetBSD LFS
  – All data and metadata are written to a log
  – The log is a collection of segments
  – A segment table describes each segment (a simplified sketch follows)
  – A cleaner process produces empty segments
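A simplified sketch of what a per-segment bookkeeping entry might contain, loosely in the spirit of an LFS segment usage table. The fields are assumptions for illustration, not NetBSD's actual on-disk format.

/* Illustrative per-segment entry in the segment table. */
#include <stdint.h>

struct seg_entry {
    uint32_t live_bytes;   /* bytes of live data remaining in the segment */
    uint32_t flags;        /* e.g. dirty/active/empty markers */
    uint64_t last_mod;     /* time of the last write to the segment */
};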
I·LFS Overview
• Goals
  – Improve performance, functionality, and manageability
  – Minimize system complexity
• Exploits ERAID information to provide
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
I·LFS Experimental Platform
• NetBSD 1.5
• 1 GHz Intel Pentium III Xeon
• 128 MB RAM
• Four fast disks
  – Seagate Cheetah 36XL, 21.6 MB/s
• Four slow disks
  – Seagate Barracuda 4XL, 7.5 MB/s
I·LFS Baseline Performance
I·LFS On-line Expansion
• Goal: expand storage incrementally
  – Capacity
  – Performance
• Ideal: instant disk addition
  – Minimize downtime
  – Simplify administration
• I·LFS supports on-line addition of new disks
I·LFS On-line Expansion Details
• ERAID: an expandable address space
• Expansion is equivalent to adding empty segments
• Start with an oversized segment table
• Activate the new portion of the segment table (sketched below)
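A minimal sketch of on-line expansion under the description above: the segment table is allocated oversized, and adding an ERAID region simply activates the slice of entries covering the new address space. The names are illustrative, not the real I·LFS code, and the table is assumed large enough to hold the new entries.

#include <stdint.h>

#define MAX_SEGS (1 << 20)     /* oversized table: far more entries than initially in use */

struct seg_entry { uint32_t live_bytes; uint32_t flags; };

struct ilfs {
    struct seg_entry segtab[MAX_SEGS];
    uint32_t         nsegs_active;     /* segments currently backed by disks */
};

/* Called when ERAID reports a new region of new_blks blocks. */
void ilfs_expand(struct ilfs *fs, uint64_t new_blks, uint64_t blks_per_seg)
{
    uint32_t new_segs = (uint32_t)(new_blks / blks_per_seg);

    /* The new entries were preallocated; mark them empty and usable. */
    for (uint32_t i = fs->nsegs_active; i < fs->nsegs_active + new_segs; i++) {
        fs->segtab[i].live_bytes = 0;
        fs->segtab[i].flags = 0;       /* empty, available to the allocator */
    }
    fs->nsegs_active += new_segs;
}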
I·LFS On-line Expansion Experiment
• I·LFS takes immediate advantage of each extra disk
I·LFS Dynamic Parallelism
• Goal: perform well on heterogeneous storage
  – Static performance differences
  – Dynamic performance fluctuations
• Ideal: maximize the throughput of the storage system
• I·LFS writes data proportionate to performance
I·LFS Dynamic Parallelism Details
• ERAID: dynamic performance information
• Most file system routines are not changed
  – Aware of only the ERAID linear address space
• Segment selection routine (sketched below)
  – Aware of ERAID regions and performance
  – Chooses the next segment based on current performance
• Minimizes changes to the file system
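A minimal sketch of a performance-aware segment selection routine, assuming ERAID exports per-region throughput as in the eraid_get_perf() sketch earlier. The helper functions are assumptions for illustration only.

struct eraid;
struct eraid_perf { double throughput_mbps; unsigned queue_len; };
int eraid_get_perf(struct eraid *e, int r, struct eraid_perf *out);

int  nregions(struct eraid *e);                     /* assumed helpers */
int  region_has_free_segment(struct eraid *e, int r);
long next_free_segment_in(struct eraid *e, int r);

/* Pick the next segment from the region that currently looks fastest. */
long ilfs_select_segment(struct eraid *e)
{
    int    best = -1;
    double best_tput = -1.0;

    for (int r = 0; r < nregions(e); r++) {
        struct eraid_perf p;
        if (!region_has_free_segment(e, r))
            continue;
        if (eraid_get_perf(e, r, &p) == 0 && p.throughput_mbps > best_tput) {
            best_tput = p.throughput_mbps;
            best = r;
        }
    }
    return best < 0 ? -1 : next_free_segment_in(e, best);
}

Because fast regions are chosen more often, data ends up spread roughly in proportion to each region's current performance, while the rest of the file system continues to see only the linear address space.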
I·LFS Static Parallelism Experiment
• I·LFS provides the full throughput of the system
• Simple striping runs at the rate of the slowest disk
I·LFS Dynamic Parallelism Experiment
• I·LFS adjusts to the performance fluctuation
I·LFS Flexible Redundancy
• Goal: offer new redundancy options to users
• Ideal: a range of redundancy mechanisms and granularities
• I·LFS provides mirrored, per-file redundancy
I·LFS Flexible Redundancy Details
• ERAID: region failure characteristics
• Use separate files for redundancy (see the sketch below)
  – Even inode N for original files
  – Odd inode N+1 for redundant files
  – Original and redundant data in different sets of regions
• Flexible data placement within the regions
• Use recursive vnode operations for redundant files
  – Leverage existing routines to reduce complexity
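A minimal sketch of the even/odd inode pairing described above, with a recursive write that reuses the existing write path for the redundant file. The function names and vnode details are assumptions for illustration, not the actual I·LFS code.

#include <sys/types.h>

/* Original files use even inode numbers; the mirror lives at N + 1. */
static ino_t mirror_inode(ino_t orig)
{
    return orig + 1;           /* assumes orig is even by allocation policy */
}

struct vnode;
int ilfs_write(struct vnode *vp, const void *buf, size_t len, off_t off);
struct vnode *ilfs_get_vnode(ino_t ino);    /* assumed lookup helper */
ino_t vnode_ino(struct vnode *vp);

/* Write to the original file, then recursively apply the same operation to
 * the redundant file so existing routines are reused. */
int ilfs_write_redundant(struct vnode *vp, const void *buf, size_t len, off_t off)
{
    int error = ilfs_write(vp, buf, len, off);
    if (error)
        return error;
    return ilfs_write(ilfs_get_vnode(mirror_inode(vnode_ino(vp))), buf, len, off);
}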
I·LFS Flexible Redundancy Experiment
• I·LFS provides a throughput and reliability tradeoff
I·LFS Lazy Redundancy
• Goal: avoid the replication performance penalty
• Ideal: replicate data immediately before a failure
• I·LFS offers redundancy with delayed replication
• Avoids the penalty for redundant, short-lived files
I·LFS Lazy Redundancy
• ERAID: region failure characteristics
• Segments needing replication are flagged
• Cleaner acts as the replicator (sketched below)
  – Locates flagged segments
  – Checks data liveness and lifetime
  – Generates redundant copies of files
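A minimal sketch of the cleaner acting as a lazy replicator: it scans segments flagged as needing replication, lets very recent data age (it may soon be deleted), and copies live file data into the redundant files. All names, flags, and the age threshold are illustrative assumptions rather than the real I·LFS code.

#include <stdint.h>

#define SEG_NEEDS_COPY 0x1

struct seg_entry { uint32_t flags; uint64_t last_mod; };

struct ilfs {
    struct seg_entry *segtab;
    uint32_t          nsegs;
};

int      seg_has_live_data(struct ilfs *fs, uint32_t seg);   /* assumed helpers */
void     replicate_live_files(struct ilfs *fs, uint32_t seg);
uint64_t now(void);

void ilfs_lazy_replicate(struct ilfs *fs, uint64_t min_age)
{
    for (uint32_t s = 0; s < fs->nsegs; s++) {
        struct seg_entry *se = &fs->segtab[s];

        if (!(se->flags & SEG_NEEDS_COPY))
            continue;
        /* Short-lived data may soon be deleted; let it age first. */
        if (now() - se->last_mod < min_age)
            continue;
        if (seg_has_live_data(fs, s))
            replicate_live_files(fs, s);   /* copy into the mirror files */
        se->flags &= ~SEG_NEEDS_COPY;
    }
}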
I·LFS Lazy Redundancy Experiment
• I·LFS avoids performance penalty for short-lived files
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
• Conclusion
Comparison with Traditional Systems
• On-line expansion
  – Yes, but capacity only, not performance
• Dynamic parallelism
  – Yes, but with duplicated functionality
• Flexible redundancy
  – No, the storage system is not aware of file composition
• Lazy redundancy
  – No, the storage system is not aware of file deletions
Conclusion
• Introduced ERAID and I·LFS
• Extra information enables new functionality
  – Difficult or impossible in traditional systems
• Minimal complexity
  – 19% increase in code size
• Time to re-examine the division of labor
Questions?
• Full paper available on the WiND publications page
  – http://www.cs.wisc.edu/wind/
Extra Slides
Storage Failure
Crossed-pointer Problem