Hippodrome: Automatic Global Storage Adaptation
Eric Anderson, Mustafa Uysal, Michael Hobbs, Guillermo Alvarez, Mahesh Kallahalla, Kim Keeton, Arif Merchant, Erik Riedel, Susan Spence, Ram Swaminathan, Simon Towers, Alistair Veitch, John Wilkes; HP Labs Storage Systems Program
[Figure: the Hippodrome loop: Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration]
Hippodrome: Why?
• Computer systems very complex
• System administrators very expensive
• Let the computer handle it
• Optimize the system for the workload as it changes
• Determine when to add/remove hardware
• Two parts to this talk:
  • Description of a framework for managing a large I/O-centric system
  • Experimental results showing when it works and when it doesn't
Hippodrome: Lessons
• Global system adaptation possible by use of four parts of the loop:
  • Solver: Finds new "optimal" configuration
  • Models: Predicts the performance of a configuration
  • Analysis: Generates summary of a workload
  • Migration: Moves current configuration to new one
• "Goodness" dependent on accuracy of models
• Rate of adaptation dependent on "over-commit" available in the system
• A gradually increasing workload can always be "good" if enough headroom exists
Hippodrome: Our System
• Targeted at applications running on large storage systems
• Solver chooses an appropriate configuration for the array and a mapping of application-level storage units onto the array
• Experiments use synthetic applications for ease of understanding "good" behaviour
• Applications run on an N-class server and access an HP FC-60 disk array via switched fibre channel
Hippodrome: Four Parts Needed for Adaptation
• Analysis: Generates summary of a workload
• Models: Predicts the performance of a configuration
• Solver: Finds new "optimal" configuration
• Migration: Moves current configuration to new one
• Solver and Models both part of "Design New Configuration" step
[Figure: the Hippodrome loop: Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration]
Hippodrome: Analysis, Models, Solver, Migration
• Trace the I/Os generated, then run them through analysis tools to create a "workload" file
• Two parts generated from analysis:
• "stores:" a logically contiguous fixed-size block of storage. Usually implemented as a logical volume
• "streams:" an access pattern to a particular store. Currently defined as average request rate, average request size, run count, on/off time, overlap fraction
• In our experiments, some additional per-stream values are also calculated to ease understanding the behaviour of the system
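To make the analysis output concrete, here is a minimal sketch of the stores/streams summary as Python data structures. The field names follow the attributes listed above; the class names, types, and example values are my own assumptions, not the actual Hippodrome workload format.

```python
from dataclasses import dataclass

@dataclass
class Store:
    """A logically contiguous, fixed-size block of storage (e.g. a logical volume)."""
    name: str
    capacity_mb: int

@dataclass
class Stream:
    """An access pattern against one store, as summarized by the analysis step."""
    store: str               # name of the store this stream accesses
    request_rate: float      # average requests per second
    request_size_kb: float   # average request size
    run_count: float         # average run length (1 = no sequentiality)
    on_time_s: float         # average length of an active period
    off_time_s: float        # average length of an idle period
    overlap_fraction: float  # fraction of on-time overlapping other streams

# Example: one 256 MB store driven by always-on 4k random reads at 500 req/s,
# roughly matching the constant-1 experiments described later.
workload = ([Store("store0", 256)],
            [Stream("store0", 500.0, 4.0, 1.0, float("inf"), 0.0, 1.0)])
```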
Hippodrome: Analysis, Models, Solver, Migration
• Two inputs to the models:
  • Device configuration: Logical Units (LUNs) with disk type, number of disks, RAID level, stripe size; array controller associated with each LUN
  • Workload configuration: list of stores on each LUN, and therefore the streams accessing that LUN and using the associated controller
• Output is utilization of each component (disk, controller, SCSI bus, etc.)
• In our experiments, the models were calibrated to a 6-disk R5 LUN for 4k and 256k random I/Os, at an accuracy above 98%, as the general models are still being developed
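A drastically simplified, hypothetical version of such a model, using only the calibration figure quoted later in the talk (one 6-disk R5 LUN sustains roughly 625 random 4k reads/s). The real models also cover controllers, buses, request sizes, and sequentiality; this sketch checks per-LUN utilization only.

```python
# Hypothetical, drastically simplified performance model: predict per-LUN
# utilization from the streams mapped onto it. The real models also cover
# controllers, SCSI buses, request sizes, and sequentiality.

LUN_MAX_4K_READS = 625.0  # ~625 random 4k reads/s per 6-disk R5 LUN (calibrated)

def lun_utilization(stream_rates):
    """Utilization of one LUN given the request rates of the streams on it."""
    return sum(stream_rates) / LUN_MAX_4K_READS

def config_valid(lun_to_rates, headroom=0.0):
    """Valid if no LUN exceeds (100% - headroom) predicted utilization."""
    return all(lun_utilization(rates) <= 1.0 - headroom
               for rates in lun_to_rates.values())

# Two LUNs: one near saturation (600 req/s), one lightly loaded (100 req/s).
print(config_valid({0: [300.0, 300.0], 1: [100.0]}))                # True
print(config_valid({0: [300.0, 300.0], 1: [100.0]}, headroom=0.1))  # False
```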
Hippodrome: Analysis, Models, Solver, Migration
• Two inputs to the solver:
  • The workload (streams and stores)
  • Description of "valid" configurations (what devices to use, what RAID levels to use, etc.)
• Output of the solver is a configuration:
  • Array descriptions (LUNs, disks, controllers, etc.)
  • The mapping of stores onto LUNs
• Solver uses the models to predict whether a configuration is valid (i.e., no component is over 100% utilized)
• In our experiments, the solver was pinned to using 6-disk R5 LUNs to match the models and to eliminate the need to migrate between RAID types
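As an illustration of the solver's job, the sketch below does greedy first-fit-decreasing packing of stores onto fixed 6-disk R5 LUNs, using the simplified model above as the validity check. The real solver searches over devices, RAID levels, and stripe sizes; this toy version fixes all of that, as the experiments did.

```python
# Toy solver sketch: first-fit-decreasing assignment of stores to LUNs so that
# no LUN's predicted utilization exceeds 100% (minus optional headroom). The
# real solver also chooses devices, RAID levels, and stripe sizes.

LUN_MAX_4K_READS = 625.0

def solve(store_rates, headroom=0.0):
    """Map each store (name -> aggregate request rate) to a LUN index."""
    limit = LUN_MAX_4K_READS * (1.0 - headroom)
    mapping, lun_loads = {}, []
    for store, rate in sorted(store_rates.items(), key=lambda kv: -kv[1]):
        for lun, load in enumerate(lun_loads):
            if load + rate <= limit:       # model says this LUN still fits
                lun_loads[lun] += rate
                mapping[store] = lun
                break
        else:                              # open a new 6-disk R5 LUN
            mapping[store] = len(lun_loads)
            lun_loads.append(rate)
    return mapping, lun_loads

# 10 stores at 200 req/s each (2000 req/s total) pack into 4 LUNs,
# matching the constant-1 result below: ceil(2000 / 625) = 4.
mapping, loads = solve({f"store{i}": 200.0 for i in range(10)})
print(len(loads), loads)  # 4 [600.0, 600.0, 600.0, 200.0]
```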
Hippodrome: Analysis, Models, Solver, Migration
• Takes as input the new "desired" configuration
• Migrates the system to the new configuration, preserving the data and access to the data during the migration
• In our experiments, the synthetic application does not care about the data, and so we simply destroy the old configuration and create the new one to do a "migration"
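A minimal sketch of that experimental shortcut, with an entirely made-up volume-manager interface (a real migration would copy each store's data to its new location while keeping it accessible):

```python
# Sketch of the destroy-and-recreate "migration" used in the experiments.
# FakeLVM and its methods are illustrative stand-ins, not a real API.

class FakeLVM:
    def destroy_volume(self, name):
        print("destroy", name)
    def create_volume(self, name, on_lun):
        print("create", name, "on LUN", on_lun)

def migrate_destructively(old_mapping, new_mapping, lvm):
    for store in old_mapping:               # tear down the old layout
        lvm.destroy_volume(store)
    for store, lun in new_mapping.items():  # build the new one
        lvm.create_volume(store, on_lun=lun)

migrate_destructively({"store0": 0}, {"store0": 1, "store1": 1}, FakeLVM())
```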
Hippodrome: Experimental overview
• Each experiment is a series of iterations around the loop. Each iteration is called a "step"
• Each step will provide three values:
  • Deviation from target rate: "goodness" metric 1
  • Average I/O response time: "goodness" metric 2
  • Number of LUNs used
[Figure: the Hippodrome loop: Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration]
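For concreteness, the three per-step values could be computed as in this sketch; the exact formulas (in particular how deviation from the target rate is normalized) are my assumptions, since the talk does not define them precisely.

```python
# Sketch of the three per-step "goodness" values; the exact formulas are
# assumed, not taken from the talk.

def step_metrics(target_rate, achieved_rate, response_times_s, luns_used):
    deviation = (target_rate - achieved_rate) / target_rate  # 0 when on target
    avg_response_s = sum(response_times_s) / len(response_times_s)
    return deviation, avg_response_s, luns_used

print(step_metrics(2000.0, 1900.0, [0.010, 0.020, 0.030], 4))
# approximately (0.05, 0.02, 4)
```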
Experiment Grouping
• Multiple variants of each "application":
• constant-1: streams always on, I/O rate constant
• constant-2: stream groups anti-correlated, I/O rate constant when active
• scaling-1: one store running as fast as possible
• scaling-2: like constant-1, but streams are enabled in different steps; once enabled, a stream will stay on
• scaling-3: like constant-1, but stream I/O rate increases as step number increases
• All experiments show global adaptation possible
Hippodrome: Experiments Demonstrate Lessons
• "Goodness" dependent on accuracy of models• constant-1, constant-2; we show how to "break" the
loop.• Rate of adaptation dependent on "over-commit"
available in the system• constant-1, constant-2; we show how fast the system
converges• A gradually increasing workload can always be
"good" if enough headroom exists• scaling-2, scaling-3; we show that the application
always runs at its target rate
Hippodrome: Experimental Hardware/Software
• Array for experiments is an HP FC-60
  • 2 controllers, 6 trays
  • 1 Ultra SCSI bus per tray (40 MB/s)
  • 4 Seagate 18 GB, 10k RPM disks used per tray = 24 total
  • 4 six-disk R5 LUNs at 16k stripe size
  • 1 LUN can do ~625 random 4k reads/second
• Host for experiments is an HP N-Class
  • 1 440 MHz CPU, 1 GB memory, HP-UX 11.00
  • 2 100 MB/s fibre channel cards used
• Locally developed synthetic application (Buttress)
• Host and array connected through a Brocade switch
Hippodrome: Common Experiment Parameters
• Will vary # stores, # streams, target request rate
• Some parameters usually the same:
  • Phasing: all streams on at the same time
  • Store capacity: 256 MB
  • Max. # I/Os outstanding per stream: 4
  • Headroom: 0%
• Some parameters constant for all experiments:
  • Request type: 4k read
  • Request offset: uniformly random across the store, aligned to a 1k boundary
  • Run count: 1 (no sequentiality in requests)
  • Arrival process: open, Poisson
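These parameters translate directly into a request generator: an open Poisson arrival process has exponentially distributed inter-arrival times, and each request is a 4k read at a uniformly random, 1k-aligned offset within the store. A minimal sketch of such a generator (my construction, not the Buttress code):

```python
# Minimal sketch of the request pattern implied by these parameters: open
# Poisson arrivals, 4k reads, uniformly random 1k-aligned offsets within a
# 256 MB store. This is my construction, not the actual Buttress generator.
import random

STORE_CAPACITY_KB = 256 * 1024  # 256 MB store
REQUEST_SIZE_KB = 4

def generate_requests(rate_per_s, duration_s, seed=0):
    rng = random.Random(seed)
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # Poisson process: exponential gaps
        if t >= duration_s:
            break
        # offsets are in KB units, so they are 1k-aligned by construction
        offset_kb = rng.randrange(0, STORE_CAPACITY_KB - REQUEST_SIZE_KB + 1)
        requests.append((t, offset_kb, REQUEST_SIZE_KB, "read"))
    return requests

reqs = generate_requests(rate_per_s=500.0, duration_s=1.0)
print(len(reqs))  # roughly 500 requests in one second
```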
Hippodrome: Constant-1 experiments
• Important result is shape of the graphs:
• Deviation from target rate converges to 0
• Response time gets (much) better
• # LUNs used (in the end) matches the required request rate
• Comments:
• Variants 0-3 have a total request rate of 2000 req/s, which requires 4 LUNs
• Variants 4-6 experiment with filling a LUN to start
• Variant 4 converged even though the LUN was full at the start
• Variant 5 converged because of the 10% headroom
• Variant 6 never converges; models predict the LUN is only 95% utilized
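The shape of these results follows from simple arithmetic, sketched below (my formalization of the talk's description): the workload needs ceil(total rate / per-LUN rate) LUNs, and since a device is never over-committed past 200%, the configuration can grow by at most one LUN per step.

```python
# Worked sketch of the convergence arithmetic: ceil(2000 / 625) = 4 LUNs are
# needed, and the LUN count grows by at most one per step because a device
# is never over-committed past 200%. My formalization, not the actual code.
import math

PER_LUN_RATE = 625.0  # ~625 random 4k reads/s per LUN

def luns_needed(total_rate):
    return math.ceil(total_rate / PER_LUN_RATE)

def convergence_path(total_rate, start_luns=1):
    path, luns = [start_luns], start_luns
    while luns < luns_needed(total_rate):
        luns += 1             # at most +1 LUN per step (200% over-commit cap)
        path.append(luns)
    return path

print(convergence_path(2000.0))  # [1, 2, 3, 4]: three steps to add 3 LUNs
```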
Constant-1: Response Time
• Response times get an order of magnitude better
• Variant 6 stays at the bad (0.15 second) average response time
Constant-1: Number of LUNs
• Lines offset slightly so different variants can be seen
• Goes up by 1 LUN each step; can't over-commit a device to 200%
• Variants 4, 5 have a total request rate < 3*625, so only use 3 LUNs
• Variant 6 stays at 1 LUN, as would be predicted by the other results
Hippodrome: Constant workload review
• Given a constant workload, the loop converges to the "correct" system in most cases
• "Goodness" dependent on accuracy of models
• We "break" the loop either through not enough headroom or bad models
• Rate of adaptation dependent on "over-commit" available in the system
• In general, it increases by 1 LUN per iteration
• With a workload that has idle time, it converges faster
• Now look at workloads that change
Hippodrome: Scaling-2 experiments
• Scaling-2 is intended to simulate adding additional weeks in a data warehouse, additional file systems, etc.
• We turn on streams as the step number increases
• Store capacity 64 MB, max. 4 I/Os outstanding per stream
• Comments:
  • Always "correct!"; the rate of increase is small enough
  • Response time shows points where we added work
  • LUN count increases as necessary
Parameter                 | Variant 0                          | Variant 1                  | Variant 2                       | Variant 3
# Stores                  | 60                                 | 120                        | 90                              | 60
Target Request Rate       | 36                                 | 18                         | 24                              | 36
Stream Enablement Pattern | 10*(1+step/2): 10 every other step | 10*(1+step): 10 every step | 10 for 2 steps, same for 1 step | 10 for 2 steps, same for 2 steps
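The enablement formulas in the table can be read as schedules mapping the step number to the count of active streams; the sketch below encodes variants 0 and 1 under my reading of the table entries.

```python
# Sketch of the stream-enablement schedules for variants 0 and 1, under my
# reading of the table: enabled streams as a function of step number, capped
# at the variant's total number of stores.

def enabled_streams_variant0(step, total=60):
    return min(total, 10 * (1 + step // 2))  # 10*(1+step/2): 10 every other step

def enabled_streams_variant1(step, total=120):
    return min(total, 10 * (1 + step))       # 10*(1+step): 10 every step

print([enabled_streams_variant0(s) for s in range(8)])
# [10, 10, 20, 20, 30, 30, 40, 40]
print([enabled_streams_variant1(s) for s in range(8)])
# [10, 20, 30, 40, 50, 60, 70, 80]
```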
Scaling-2: Deviation from Target Rate
• Error bars are the same size as before; scale is much smaller
• Amazingly, always within 95% confidence interval of correct
• Slightly above 0 deviation because of measurement methodology
Scaling-2: Response Time
• Scale is much smaller than for constant workloads (max. of 0.055 s vs. 1s)
• Now we can see when we add work and when we remain constant
• Height of peaks shows how close to 100% the previous step was
• Slight trend upward; more total I/Os and more capacity actively used
Scaling-2: Response Time – Variant 0 only
• Now we can see when we add work and when we remain constant
• Height of peaks shows how close to 100% the previous step was
Scaling-2: Number of LUNs
• Gradual increase in # LUNs
• Exact switch point dependent on specific increase pattern
• Changes close together as increase patterns are similar
Hippodrome: Scaling workload review
• Handled an order-of-magnitude increase in workload without serious slowdowns
• Number of LUNs up by a factor of 4
• Points where work was added are visible as the response time jumps and then settles
• Question: what other scaling up patterns are useful?
• One other group planned is different streams scaling at different rates
Hippodrome: Future Work
• Shifting workloads (transaction processing in the day, decision support at night)
• Cyclic workloads (system is told about the different shift positions)
• More complete models, migration of actual data
• More complex synthetic workloads
• Simple "application" (TPC-B?)
• Complex application (Retail Data Warehouse)
• Support for global bounds on system size/cost
Hippodrome: Four Parts Needed for Adaptation
• Analysis: Generates summary of a workload
• Models: Predicts the performance of a configuration
• Solver: Finds new "optimal" configuration
• Migration: Moves current configuration to new one
• Solver and Models both part of "Design New Configuration" step
[Figure: the Hippodrome loop: Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration]
Hippodrome: Lessons
• Global system adaptation possible by use of four parts of the loop:
  • Solver: Finds new "optimal" configuration
  • Models: Predicts the performance of a configuration
  • Analysis: Generates summary of a workload
  • Migration: Moves current configuration to new one
• "Goodness" dependent on accuracy of models
• Rate of adaptation dependent on "over-commit" available in the system
• A gradually increasing workload can always be "good" if enough headroom exists
Hippodrome: Automatic Global Storage Adaptation
• Questions?
• Joint work with: Eric Anderson, Mustafa Uysal, Michael Hobbs, Guillermo Alvarez, Mahesh Kallahalla, Kim Keeton, Arif Merchant, Erik Riedel, Susan Spence, Ram Swaminathan, Simon Towers, Alistair Veitch, John Wilkes; HP Labs Storage Systems Program
Hippodrome: Constant-2 experiments
• Phasing is a very important workload property
• Divide streams into groups (1..n); group start times are offset, then each group follows a constant on/off pattern (see the sketch below)
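A minimal sketch of that phasing scheme (my construction): streams are split into n groups following the same on/off square wave, with group start times offset so the groups are anti-correlated.

```python
# Minimal sketch (my construction) of constant-2 phasing: n groups share one
# on/off square wave, offset so that exactly one group is active at a time.

def group_is_on(group, t, n_groups=2, period_s=60.0):
    """True if `group` (0..n_groups-1) is active at time t."""
    phase = (t / period_s + group / n_groups) % 1.0
    return phase < 1.0 / n_groups  # each group is on for 1/n of the period

# With 2 groups, the groups are anti-correlated: one on while the other is off.
for t in (0.0, 15.0, 30.0, 45.0):
    print(t, [group_is_on(g, t) for g in (0, 1)])
```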