Bridging the Information Gap in Storage Protocol Stacks
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison
http://www.cs.wisc.edu/wind/
State of Affairs
• [Figure: the file system manages the namespace, files, metadata, layout, and liveness; the storage system manages parallelism and redundancy, with an information gap between the two layers]
Problem
• The information gap may cause problems
  – Poor performance: partial stripe write operations
  – Duplicated functionality: logging in both the file system and the storage system
  – Reduced functionality: the storage system lacks knowledge of files
• Time to re-examine the division of labor
Our Approach
• Enhance the storage interface
  – Expose performance and failure information
• Use the information to provide new functionality
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
• [Figure: Informed LFS (I·LFS) layered above Exposed RAID (ERAID)]
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
• Conclusion
ERAID Overview
• Goals
  – Backwards compatibility: block-based interface with a linear, concatenated address space
  – Expose information to the file system above, allowing it to utilize semantic knowledge
ERAID Regions
• Region: a contiguous portion of the address space
• Regions can be added to expand the address space
• Region composition (see the sketch below)
  – RAID: one region for all disks
  – Exposed: a separate region for each disk
  – Hybrid: a mix of the two
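A minimal sketch of how an ERAID region table might be represented, under the description above: each region maps a contiguous run of the linear block address space onto a set of disks. The struct and field names are illustrative assumptions, not the actual ERAID code.

/* Hypothetical region table: each region covers a contiguous run of the
 * linear address space and is backed by one or more disks. */
#include <stdint.h>

enum region_kind { REGION_RAID0, REGION_RAID1, REGION_EXPOSED };

struct eraid_region {
    uint64_t start_blk;      /* first block of the region in the linear space */
    uint64_t nblks;          /* length of the region in blocks */
    enum region_kind kind;   /* striped/mirrored over all disks, or one exposed disk */
    int      ndisks;         /* number of disks backing this region */
    int      disks[8];       /* indices of the backing disks */
};

struct eraid {
    int                 nregions;
    struct eraid_region regions[64];  /* grows as new regions are added */
};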
ERAID Performance Information
• Exposed on a per-region basis
• Throughput and queue length (see the query sketch below)
• Reveals
  – Static disk heterogeneity
  – Dynamic performance and load fluctuations
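A sketch of the per-region performance information ERAID could export to the file system. The struct and the query function are assumptions made for illustration; the real interface may differ.

/* Hypothetical per-region performance snapshot and query call. */
#include <stdint.h>

struct eraid;                  /* defined by the storage layer */

struct eraid_perf {
    double   throughput_mbps;  /* recently observed throughput of the region */
    uint32_t queue_len;        /* outstanding requests on the region's disks */
};

/* Return a snapshot of region r's current performance. */
int eraid_get_perf(struct eraid *e, int r, struct eraid_perf *out);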
ERAID Failure Information
• Exposed on a per-region basis
• Number of tolerable failures
• Regions may have different failure characteristics
• Reveals dynamic failures to the file system above (see the sketch below)
• [Figure: an ERAID volume with a RAID1 region and a failed disk marked X]
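A matching sketch of per-region failure information. As above, the names are illustrative assumptions, not the actual interface.

/* Hypothetical per-region failure characteristics and current state. */
struct eraid;                  /* defined by the storage layer */

struct eraid_fail {
    int tolerable_failures;    /* e.g. 1 for a mirrored region, 0 for an exposed disk */
    int failed_disks;          /* disks currently failed within the region */
};

/* Return region r's failure characteristics and current failure state. */
int eraid_get_fail(struct eraid *e, int r, struct eraid_fail *out);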
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
• Conclusion
I·LFS Overview
• Modified NetBSD LFS
  – All data and metadata are written to a log
  – The log is a collection of segments
  – A segment table describes each segment (a simplified sketch follows)
  – A cleaner process produces empty segments
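A simplified sketch of what a per-segment bookkeeping entry might contain, loosely in the spirit of an LFS segment usage table. The fields are assumptions for illustration, not NetBSD's actual on-disk format.

/* Illustrative per-segment entry in the segment table. */
#include <stdint.h>

struct seg_entry {
    uint32_t live_bytes;   /* bytes of live data remaining in the segment */
    uint32_t flags;        /* e.g. dirty/active/empty markers */
    uint64_t last_mod;     /* time of the last write to the segment */
};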
I·LFS Overview
• Goals
  – Improve performance, functionality, and manageability
  – Minimize system complexity
• Exploits ERAID information to provide
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
I·LFS Experimental Platform
• NetBSD 1.5
• 1 GHz Intel Pentium III Xeon
• 128 MB RAM
• Four fast disks
  – Seagate Cheetah 36XL, 21.6 MB/s
• Four slow disks
  – Seagate Barracuda 4XL, 7.5 MB/s
I·LFS Baseline Performance
I·LFS On-line Expansion
• Goal: expand storage incrementally
  – Capacity
  – Performance
• Ideal: instant disk addition
  – Minimize downtime
  – Simplify administration
• I·LFS supports on-line addition of new disks
I·LFS On-line Expansion Details
• ERAID: an expandable address space
• Expansion is equivalent to adding empty segments
• Start with an oversized segment table
• Activate the new portion of the segment table (sketched below)
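A minimal sketch of on-line expansion under the description above: the segment table is allocated oversized, and adding an ERAID region simply activates the slice of entries covering the new address space. The names are illustrative, not the real I·LFS code, and the table is assumed large enough to hold the new entries.

#include <stdint.h>

#define MAX_SEGS (1 << 20)     /* oversized table: far more entries than initially in use */

struct seg_entry { uint32_t live_bytes; uint32_t flags; };

struct ilfs {
    struct seg_entry segtab[MAX_SEGS];
    uint32_t         nsegs_active;     /* segments currently backed by disks */
};

/* Called when ERAID reports a new region of new_blks blocks. */
void ilfs_expand(struct ilfs *fs, uint64_t new_blks, uint64_t blks_per_seg)
{
    uint32_t new_segs = (uint32_t)(new_blks / blks_per_seg);

    /* The new entries were preallocated; mark them empty and usable. */
    for (uint32_t i = fs->nsegs_active; i < fs->nsegs_active + new_segs; i++) {
        fs->segtab[i].live_bytes = 0;
        fs->segtab[i].flags = 0;       /* empty, available to the allocator */
    }
    fs->nsegs_active += new_segs;
}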
I·LFS On-line Expansion Experiment
• I·LFS takes immediate advantage of each extra disk
I·LFS Dynamic Parallelism
• Goal: perform well on heterogeneous storage
  – Static performance differences
  – Dynamic performance fluctuations
• Ideal: maximize the throughput of the storage system
• I·LFS writes data proportionate to performance
I·LFS Dynamic Parallelism Details
• ERAID: dynamic performance information
• Most file system routines are not changed
  – Aware of only the ERAID linear address space
• Segment selection routine (sketched below)
  – Aware of ERAID regions and performance
  – Chooses the next segment based on current performance
• Minimizes changes to the file system
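A minimal sketch of a performance-aware segment selection routine, assuming ERAID exports per-region throughput as in the eraid_get_perf() sketch earlier. The helper functions are assumptions for illustration only.

struct eraid;
struct eraid_perf { double throughput_mbps; unsigned queue_len; };
int eraid_get_perf(struct eraid *e, int r, struct eraid_perf *out);

int  nregions(struct eraid *e);                     /* assumed helpers */
int  region_has_free_segment(struct eraid *e, int r);
long next_free_segment_in(struct eraid *e, int r);

/* Pick the next segment from the region that currently looks fastest. */
long ilfs_select_segment(struct eraid *e)
{
    int    best = -1;
    double best_tput = -1.0;

    for (int r = 0; r < nregions(e); r++) {
        struct eraid_perf p;
        if (!region_has_free_segment(e, r))
            continue;
        if (eraid_get_perf(e, r, &p) == 0 && p.throughput_mbps > best_tput) {
            best_tput = p.throughput_mbps;
            best = r;
        }
    }
    return best < 0 ? -1 : next_free_segment_in(e, best);
}

Because fast regions are chosen more often, data ends up spread roughly in proportion to each region's current performance, while the rest of the file system continues to see only the linear address space.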
I·LFS Static Parallelism Experiment
• I·LFS provides the full throughput of the system
• Simple striping runs at the rate of the slowest disk
I·LFS Dynamic Parallelism Experiment
• I·LFS adjusts to the performance fluctuation
I·LFS Flexible Redundancy
• Goal: offer new redundancy options to users
• Ideal: a range of redundancy mechanisms and granularities
• I·LFS provides mirrored, per-file redundancy
I·LFS Flexible Redundancy Details
• ERAID: region failure characteristics
• Use separate files for redundancy (see the sketch below)
  – Even inode N for original files
  – Odd inode N+1 for redundant files
  – Original and redundant data in different sets of regions
• Flexible data placement within the regions
• Use recursive vnode operations for redundant files
  – Leverage existing routines to reduce complexity
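A minimal sketch of the even/odd inode pairing described above, with a recursive write that reuses the existing write path for the redundant file. The function names and vnode details are assumptions for illustration, not the actual I·LFS code.

#include <sys/types.h>

/* Original files use even inode numbers; the mirror lives at N + 1. */
static ino_t mirror_inode(ino_t orig)
{
    return orig + 1;           /* assumes orig is even by allocation policy */
}

struct vnode;
int ilfs_write(struct vnode *vp, const void *buf, size_t len, off_t off);
struct vnode *ilfs_get_vnode(ino_t ino);    /* assumed lookup helper */
ino_t vnode_ino(struct vnode *vp);

/* Write to the original file, then recursively apply the same operation to
 * the redundant file so existing routines are reused. */
int ilfs_write_redundant(struct vnode *vp, const void *buf, size_t len, off_t off)
{
    int error = ilfs_write(vp, buf, len, off);
    if (error)
        return error;
    return ilfs_write(ilfs_get_vnode(mirror_inode(vnode_ino(vp))), buf, len, off);
}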
I·LFS Flexible Redundancy Experiment
• I·LFS provides a throughput and reliability tradeoff
I·LFS Lazy Redundancy
• Goal: avoid the replication performance penalty
• Ideal: replicate data immediately before a failure
• I·LFS offers redundancy with delayed replication
• Avoids the penalty for redundant, short-lived files
I·LFS Lazy Redundancy
• ERAID: region failure characteristics
• Segments needing replication are flagged
• Cleaner acts as the replicator (sketched below)
  – Locates flagged segments
  – Checks data liveness and lifetime
  – Generates redundant copies of files
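A minimal sketch of the cleaner acting as a lazy replicator: it scans segments flagged as needing replication, lets very recent data age (it may soon be deleted), and copies live file data into the redundant files. All names, flags, and the age threshold are illustrative assumptions rather than the real I·LFS code.

#include <stdint.h>

#define SEG_NEEDS_COPY 0x1

struct seg_entry { uint32_t flags; uint64_t last_mod; };

struct ilfs {
    struct seg_entry *segtab;
    uint32_t          nsegs;
};

int      seg_has_live_data(struct ilfs *fs, uint32_t seg);   /* assumed helpers */
void     replicate_live_files(struct ilfs *fs, uint32_t seg);
uint64_t now(void);

void ilfs_lazy_replicate(struct ilfs *fs, uint64_t min_age)
{
    for (uint32_t s = 0; s < fs->nsegs; s++) {
        struct seg_entry *se = &fs->segtab[s];

        if (!(se->flags & SEG_NEEDS_COPY))
            continue;
        /* Short-lived data may soon be deleted; let it age first. */
        if (now() - se->last_mod < min_age)
            continue;
        if (seg_has_live_data(fs, s))
            replicate_live_files(fs, s);   /* copy into the mirror files */
        se->flags &= ~SEG_NEEDS_COPY;
    }
}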
I·LFS Lazy Redundancy Experiment
• I·LFS avoids performance penalty for short-lived files
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
• Conclusion
Comparison with Traditional Systems
• On-line expansion
  – Yes, but capacity only, not performance
• Dynamic parallelism
  – Yes, but with duplicated functionality
• Flexible redundancy
  – No, the storage system is not aware of file composition
• Lazy redundancy
  – No, the storage system is not aware of file deletions
Conclusion
• Introduced ERAID and I·LFS
• Extra information enables new functionality
  – Difficult or impossible in traditional systems
• Minimal complexity
  – 19% increase in code size
• Time to re-examine the division of labor
Questions?
• Full paper available on the WiND publications page
  – http://www.cs.wisc.edu/wind/
Extra Slides
Storage Failure
Crossed-pointer Problem