
Dec 19, 2015


Hippodrome: Automatic Global Storage Adaptation

Eric Anderson, Mustafa Uysal, Michael Hobbs, Guillermo Alvarez, Mahesh Kallahalla, Kim Keeton, Arif Merchant, Erik Riedel, Susan Spence, Ram Swaminathan, Simon Towers, Alistair Veitch, John Wilkes; HP Labs Storage Systems Program

[Figure: the adaptation loop. Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration → back to Execute Application]


Hippodrome: Why?

• Computer systems very complex

• System administrators very expensive

• Let the computer handle it

• Optimize the system for the workload as it changes

• Determine when to add/remove hardware

• Two parts to talk

• Description of framework for managing a large I/O centric system

• Experimental results showing when it works and when it doesn’t.


Hippodrome: Lessons

• Global system adaptation possible by use of four parts of the loop:

• Solver: Finds new "optimal" configuration

• Models: Predicts the performance of a configuration

• Analysis: Generates summary of a workload

• Migration: Moves current configuration to new one

• "Goodness" dependent on accuracy of models

• Rate of adaptation dependent on "over-commit" available in the system

• A gradually increasing workload can always be "good" if enough headroom exists


Hippodrome: Our System

• Targeted at applications running on large storage systems

• Solver chooses appropriate configuration for array and mapping of application-level storage units onto the array

• Experiments use synthetic applications for ease of understanding "good" behaviour

• Applications run on an N-class server and access an HP FC-60 disk array via switched fibre channel


Hippodrome: Four Parts Needed for Adaptation

• Analysis: Generates summary of a workload

• Models: Predicts the performance of a configuration

• Solver: Finds new "optimal" configuration

• Migration: Moves current configuration to new one

• Solver and Models both part of "Design New Configuration" step

[Figure: the adaptation loop. Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration → back to Execute Application]
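The four parts can be wired together as a simple control loop. The following is a minimal sketch, not the Hippodrome implementation: the function name `adaptation_loop` and its callable parameters are hypothetical, and each stage is passed in as a plain function so the skeleton stays independent of any real analysis tool, model, solver, or migration engine.

```python
from typing import Callable, TypeVar

Config = TypeVar("Config")
Trace = TypeVar("Trace")
Workload = TypeVar("Workload")


def adaptation_loop(
    initial: Config,
    execute: Callable[[Config], Trace],
    analyze: Callable[[Trace], Workload],
    design: Callable[[Workload], Config],
    migrate: Callable[[Config, Config], Config],
    steps: int,
) -> list[Config]:
    """Run `steps` iterations and return the configuration used in each one."""
    history: list[Config] = []
    current = initial
    for _ in range(steps):
        history.append(current)
        trace = execute(current)              # Execute Application
        workload = analyze(trace)             # Analyze Workload
        proposed = design(workload)           # Design New Configuration (solver + models)
        current = migrate(current, proposed)  # Migrate to Configuration
    return history


# Toy stand-ins (all hypothetical): a configuration is just a LUN count,
# and the "solver" asks for one more LUN every step.
print(adaptation_loop(1, lambda c: c, lambda t: t, lambda w: w + 1,
                      lambda old, new: new, steps=3))
```

Passing the stages as callables keeps the sketch honest about what it does not know: the slides describe what each part produces, not how it is built.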


Hippodrome: Analysis, Models, Solver, Migration

• Trace the I/Os generated, then run the trace through analysis tools to create a "workload" file.

• Two parts generated from analysis:

• "stores:" a logically contiguous fixed-size block of storage. Usually implemented as a logical volume

• "streams:" an access pattern to a particular store. Currently defined as average request rate, average request size, run count, on/off time, overlap fraction

• In our experiments, some additional per-stream values are also calculated to ease understanding the behaviour of the system
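One way to picture the analysis output is as two record types. This is an illustrative sketch only: the class and field names are assumptions chosen to mirror the attributes listed above, not the format of the actual "workload" file, and the overlap fraction is simplified to a single number per stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    """A logically contiguous, fixed-size block of storage (e.g. a logical volume)."""
    name: str
    capacity_mb: int


@dataclass(frozen=True)
class Stream:
    """An access pattern to one store, summarised by averages."""
    store: Store
    request_rate: float      # average requests per second
    request_size_kb: float   # average request size
    run_count: float         # 1 means no sequentiality in the requests
    on_time_s: float         # length of each active period
    off_time_s: float        # length of each idle period
    overlap_fraction: float  # simplified here to one value per stream

    @property
    def duty_cycle(self) -> float:
        """Fraction of time the stream is active."""
        return self.on_time_s / (self.on_time_s + self.off_time_s)


# Example values are made up for illustration.
store = Store(name="store0", capacity_mb=256)
stream = Stream(store, request_rate=20.0, request_size_kb=4.0, run_count=1.0,
                on_time_s=1.0, off_time_s=0.0, overlap_fraction=1.0)
print(stream.duty_cycle)
```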


Hippodrome: Analysis, Models, Solver, Migration

• Two inputs to models:

• Device configuration: Logical Units (LUNs) with disk type, number of disks, RAID level, stripe size; array controller associated with each LUN

• Workload configuration: List of stores on each LUN and therefore the streams accessing that LUN and using the associated controller

• Output is utilization of each component (disk, controller, SCSI bus, etc.)

• In our experiments, the models are calibrated to a 6-disk R5 LUN for 4k and 256k random I/Os, with accuracy above 98%; the general models are still being developed.
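For the single calibrated case (random 4k reads on one 6-disk R5 LUN), a utilization model can be as small as a ratio. The sketch below is an assumption-laden stand-in, not the actual models: it covers only LUN utilization, ignores controllers and buses, and takes the roughly 625 reads/second per-LUN figure quoted on the hardware slide as the capacity. The names `predict_utilization` and `is_valid` are hypothetical.

```python
LUN_MAX_4K_READS_PER_S = 625.0  # approximate per-LUN figure from the hardware slide


def predict_utilization(assignment: dict[str, list[float]],
                        max_rate: float = LUN_MAX_4K_READS_PER_S) -> dict[str, float]:
    """Map each LUN to its predicted utilization.

    `assignment` maps a LUN name to the request rates (requests/second)
    of the streams whose stores are placed on that LUN.
    """
    return {lun: sum(rates) / max_rate for lun, rates in assignment.items()}


def is_valid(utilization: dict[str, float]) -> bool:
    """A configuration is valid when no component is over 100% utilized."""
    return all(u <= 1.0 for u in utilization.values())


# Made-up placement: 30 streams at 20 req/s on one LUN, 20 streams at 40 req/s on another.
u = predict_utilization({"lun0": [20.0] * 30, "lun1": [40.0] * 20})
print(u, is_valid(u))
```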


Hippodrome: Analysis, Models, Solver, Migration

• Two inputs to solver:

• The workload (streams and stores)

• Description of "valid" configurations (what devices to use, what raid levels to use, etc.)

• Output of solver is a configuration:

• Array descriptions (LUNs, disks, controllers, etc.)

• The mapping of stores onto LUNs

• Solver uses models to predict if a configuration is valid (i.e., no component is over 100% utilized)

• In our experiments, solver pinned to using 6-disk R5 LUNs to match the models and to eliminate the need to migrate between RAID types.
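To make the solver's job concrete, here is a first-fit-decreasing packing of stores onto identical LUNs. This is a swapped-in textbook heuristic, not the Hippodrome solver: it considers only request rate, treats headroom as a fraction of each LUN's rate budget held in reserve (an assumption), and cannot stripe one store across LUNs. The name `first_fit_solver` and its parameters are hypothetical.

```python
def first_fit_solver(store_rates: dict[str, float],
                     lun_max_rate: float = 625.0,
                     headroom: float = 0.0) -> list[list[str]]:
    """Assign stores to LUNs so no LUN exceeds its rate budget.

    Returns a list of LUNs, each a list of the store names placed on it.
    """
    budget = lun_max_rate * (1.0 - headroom)
    luns: list[list[str]] = []
    loads: list[float] = []
    # Place the most demanding stores first (first-fit decreasing).
    for name, rate in sorted(store_rates.items(), key=lambda kv: kv[1], reverse=True):
        if rate > budget:
            raise ValueError(f"{name} needs striping, which this sketch does not model")
        for i, load in enumerate(loads):
            if load + rate <= budget:
                luns[i].append(name)
                loads[i] += rate
                break
        else:
            # No existing LUN has room, so open a new one.
            luns.append([name])
            loads.append(rate)
    return luns


# 100 stores at 20 req/s each, as in constant-1 variant 0.
placement = first_fit_solver({f"store{i}": 20.0 for i in range(100)})
print(len(placement), [len(lun) for lun in placement])
```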


Hippodrome: Analysis, Models, Solver, Migration

• Takes as input new "desired" configuration

• Migrates the system to the new configuration preserving the data and access to the data during the migration

• In our experiments, the synthetic application does not care about the data, and so we simply destroy the old configuration and create the new one to do a "migration"


Hippodrome: Experimental overview

• Each experiment is a series of iterations around the loop. Each iteration is called a "step"

• Each step will provide three values:

• Deviation from target rate: "goodness" metric 1

• Average I/O response time: "goodness" metric 2

• Number of LUNs used

[Figure: the adaptation loop. Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration → back to Execute Application]


Experiment Grouping

• Multiple variants of each "application":

• constant-1: streams always on, I/O rate constant

• constant-2: stream groups anti-correlated, I/O rate constant when active

• scaling-1: one store running as fast as possible

• scaling-2: like constant-1, but streams are enabled in different steps; once enabled, a stream will stay on

• scaling-3: like constant-1, but stream I/O rate increases as step number increases

• All experiments show global adaptation possible


Hippodrome: Experiments Demonstrate Lessons

• "Goodness" dependent on accuracy of models

• constant-1, constant-2; we show how to "break" the loop.

• Rate of adaptation dependent on "over-commit" available in the system

• constant-1, constant-2; we show how fast the system converges

• A gradually increasing workload can always be "good" if enough headroom exists

• scaling-2, scaling-3; we show that the application always runs at its target rate


Hippodrome: Experimental Hardware/Software

• Array for experiments is HP FC-60

• 2 controllers, 6 trays

• 1 Ultra SCSI bus/tray (40MB/s)

• 4 Seagate 18GB, 10k RPM disks used/tray = 24 total

• 4 6-disk R5 LUNs at 16k stripe size

• 1 LUN can do ~625 random 4k reads/second

• Host for experiments is HP N-Class

• 1 440 MHz CPU, 1 GB memory, HP-UX 11.00

• 2 100 MB/s fibre channel cards used

• Locally developed synthetic application (Buttress)

• Host and array connected through Brocade switch


Hippodrome: Common Experiment Parameters

• Will vary # stores, # streams, target request rate

• Some parameters usually the same:

• Phasing: all streams on at the same time

• Store capacity: 256 MB

• Max. # I/Os outstanding/stream: 4

• Headroom: 0%

• Some parameters constant for all experiments:

• Request type: 4k read

• Request offset: uniformly random across store, aligned to 1k boundary

• Run count: 1 (no sequentiality in requests)

• Arrival process: open, Poisson
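The constant parameters above are enough to sketch one stream's request schedule. This is an illustration using Python's standard `random` module, not Buttress: the function name `synth_requests` and its parameters are hypothetical. An open arrival process means arrival times do not depend on when earlier requests complete, and exponentially distributed gaps between arrivals give a Poisson process.

```python
import random


def synth_requests(rate_per_s: float, store_bytes: int, duration_s: float,
                   request_bytes: int = 4096, align: int = 1024,
                   seed: int = 0) -> list[tuple[float, int, int]]:
    """Return (arrival_time_s, offset_bytes, size_bytes) tuples for one stream."""
    rng = random.Random(seed)
    # Number of aligned offsets at which a whole request still fits in the store.
    slots = (store_bytes - request_bytes) // align + 1
    requests = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_s)      # exponential inter-arrival gap
        if t >= duration_s:
            break
        offset = rng.randrange(slots) * align  # uniform across the store, 1k-aligned
        requests.append((t, offset, request_bytes))
    return requests


# A 256 MB store read at a target of 20 requests/second for 10 seconds.
reqs = synth_requests(rate_per_s=20.0, store_bytes=256 * 2**20, duration_s=10.0)
print(len(reqs), reqs[:2])
```

Because each offset is drawn independently, consecutive requests are not sequential, which corresponds to a run count of 1.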


Hippodrome: Constant-1 experiments

• Important result is shape of the graphs:

• Deviation from target rate converges to 0

• Response time gets (much) better

• # luns used (in the end) matches required request rate

• Comments:

• Variants 0-3 have a total request rate (RR) of 2000, which needs 4 LUNs

• Variants 4-6 experiment with filling a LUN to start

• Variants 5,6 differ only in the headroom

Parameter            Variant 0  Variant 1  Variant 2  Variant 3  Variant 4  Variant 5  Variant 6
# Stores             100        200        50         50         84         21         21
Target Request Rate  20         10         40         40         20         80         80
Max Outstanding      4          4          4          16         4          4          4
Store Size           256MB      256MB      256MB      256MB      1GB        4GB        4GB
Headroom             0%         0%         0%         0%         0%         10%        0%
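The LUN counts follow from simple arithmetic on the table. The sketch below assumes the aggregate request rate is stores × per-store target rate and that one LUN sustains about 625 random 4k reads/second (the figure from the hardware slide); `luns_needed` is a hypothetical helper name.

```python
import math

LUN_MAX_RATE = 625  # approximate random 4k reads/second for one LUN

# variant -> (number of stores, target request rate per store), from the table
VARIANTS = {0: (100, 20), 1: (200, 10), 2: (50, 40), 3: (50, 40),
            4: (84, 20), 5: (21, 80), 6: (21, 80)}


def luns_needed(stores: int, rate_per_store: int,
                lun_max_rate: int = LUN_MAX_RATE) -> int:
    """Smallest number of LUNs whose combined rate covers the aggregate demand."""
    return math.ceil(stores * rate_per_store / lun_max_rate)


for variant, (stores, rate) in VARIANTS.items():
    print(variant, stores * rate, luns_needed(stores, rate))
```

Variants 0-3 come to 2000 requests/second and 4 LUNs; variants 4-6 come to 1680 requests/second and 3 LUNs. This is only the demand side: whether the loop actually reaches that count depends on the models and headroom.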


Constant-1: Deviation from Target Rate

• Variants 0-5 converge to 95% CI of 0

• Variant 4 converged even though the LUN was full at the start

• Variant 5 converged because of the 10% headroom

• Variant 6 never converges; models predict the LUN is only 95% utilized


Constant-1: Response Time

• Response times get an order of magnitude better

• Variant 6 stays at the bad (0.15 second) average response time


Constant-1: Number of LUNs

• Lines offset slightly so different variants can be seen

• Goes up by 1 LUN each step; can't over-commit a device to 200%

• Variants 4,5 have a total request rate < 3*625, so they only use 3 LUNs

• Variant 6 stays at 1 LUN, as would be predicted by other results
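A toy model shows why the LUN count climbs one step at a time and why an optimistic model stalls the loop. Everything here is an assumption made for illustration: the analysis is taken to see the lesser of the target rate and the current capacity scaled by an `observed_fraction`, and the design step then asks for just enough LUNs for that observed rate. A fraction slightly above 1 stands for a little over-commit; 0.95 stands for a model that believes a saturated LUN is only 95% utilized.

```python
import math


def simulate_luns(target_rate: float, observed_fraction: float,
                  lun_max_rate: float = 625.0, start_luns: int = 1,
                  steps: int = 6) -> list[int]:
    """Return the LUN count in use at each step of the toy loop."""
    luns = start_luns
    history = []
    for _ in range(steps):
        history.append(luns)
        # The workload cannot be observed running much faster than the devices allow.
        observed = min(target_rate, luns * lun_max_rate * observed_fraction)
        # Design for the observed rate; this toy never shrinks the system.
        luns = max(luns, math.ceil(observed / lun_max_rate))
    return history


print(simulate_luns(2000, 1.05))  # grows by one LUN per step, then holds at 4
print(simulate_luns(1680, 0.95))  # never grows: the LUN looks under-utilized
```

A larger `observed_fraction` lets the toy skip counts, which is the sense in which the rate of adaptation depends on the available over-commit.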


Hippodrome: Constant workload review

• Given a constant workload, the loop converges to the "correct" system in most cases

• "Goodness" dependent on accuracy of models

• We "break" the loop either through not enough headroom or bad models

• Rate of adaptation dependent on "over-commit" available in the system

• In general, it increases by 1 LUN per iteration

• With a workload with idle time, it converges faster

• Now look at workloads that change


Hippodrome: Scaling-2 experiments

• Scaling-2 intended to simulate adding in additional weeks in a data warehouse, additional file systems, etc.

• We turn on streams as the step number increases

• Store capacity 64 MB, max. outstanding 4

• Comments:

• Always "correct!"; rate of increase is small enough

• Response time shows points where we added work

• Number of LUNs increases as necessary

Parameter            Variant 0            Variant 1      Variant 2        Variant 3
# Stores             60                   120            90               60
Target Request Rate  36                   18             24               36
Stream Enablement    10*(1+step/2)        10*(1+step)    10 for 2 steps   10 for 2 steps
Pattern              10 every other step  10 every step  same for 1 step  same for 2 steps
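The enablement formulas for variants 0 and 1 can be evaluated directly. This sketch makes three assumptions the slide does not state: `step/2` is integer division (consistent with "10 every other step"), the number of enabled streams is capped at the number of stores, and one LUN sustains about 625 requests/second. Variants 2 and 3 are left out because their patterns are not spelled out precisely enough to encode. The function names are hypothetical.

```python
import math

# variant -> (number of stores, target request rate per stream), from the table
SCALING2 = {0: (60, 36), 1: (120, 18)}


def enabled_streams(variant: int, step: int) -> int:
    """Streams switched on at a given step, capped at the number of stores."""
    stores, _ = SCALING2[variant]
    raw = 10 * (1 + step // 2) if variant == 0 else 10 * (1 + step)
    return min(raw, stores)


def aggregate_rate(variant: int, step: int) -> int:
    """Total target request rate across all enabled streams."""
    _, rate = SCALING2[variant]
    return enabled_streams(variant, step) * rate


for step in range(0, 12, 2):
    rate = aggregate_rate(0, step)
    print(step, enabled_streams(0, step), rate, math.ceil(rate / 625))
```

Under these assumptions variant 0 grows from 10 to 60 streams, and its demand grows from 1 LUN's worth to 4.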


Scaling-2: Deviation from Target Rate

• Error bars are the same size as before; scale is much smaller

• Amazingly, always within 95% confidence interval of correct

• Slightly above 0 deviation because of measurement methodology


Scaling-2: Response Time

• Scale is much smaller than for constant workloads (max. of 0.055 s vs. 1s)

• Now we can see when we add work and when we remain constant

• Height of peaks shows how close to 100% the previous step was

• Slight trend upward; more total I/Os and more capacity actively used


Scaling-2: Response Time – Variant 0 only

• Now we can see when we add work and when we remain constant

• Height of peaks shows how close to 100% the previous step was


Scaling-2: Number of LUNs

• Gradual increase in # luns

• Exact switch point dependent on specific increase pattern

• Changes close together as increase patterns are similar


Hippodrome: Scaling workload review

• Handled order of magnitude increase in workload without having serious slowdowns

• Number of luns up by factor of 4

• Could see points of additional work in response time jumping and then settling

• Question: what other scaling up patterns are useful?

• One other group planned is different streams scaling at different rates


Hippodrome: Future Work

• Shifting workloads (transaction processing in the day, decision support at night)

• Cyclic workloads (system is told about the different shift positions)

• More complete models, migration of actual data

• More complex synthetic workloads

• Simple "application" (TPC-B?)

• Complex application (Retail Data Warehouse)

• Support for global bounds on system size/cost


Hippodrome: Four Parts Needed for Adaptation

• Analysis: Generates summary of a workload

• Models: Predicts the performance of a configuration

• Solver: Finds new "optimal" configuration

• Migration: Moves current configuration to new one

• Solver and Models both part of "Design New Configuration" step

[Figure: the adaptation loop. Execute Application → Analyze Workload → Design New Configuration → Migrate to Configuration → back to Execute Application]


Hippodrome: Lessons

• Global system adaptation possible by use of four parts of the loop:

• Solver: Finds new "optimal" configuration

• Models: Predicts the performance of a configuration

• Analysis: Generates summary of a workload

• Migration: Moves current configuration to new one

• "Goodness" dependent on accuracy of models

• Rate of adaptation dependent on "over-commit" available in the system

• A gradually increasing workload can always be "good" if enough headroom exists


Hippodrome: Automatic Global Storage Adaptation

• Questions?

• Joint work with: Eric Anderson, Mustafa Uysal, Michael Hobbs, Guillermo Alvarez, Mahesh Kallahalla, Kim Keeton, Arif Merchant, Erik Riedel, Susan Spence, Ram Swaminathan, Simon Towers, Alistair Veitch, John Wilkes; HP Labs Storage Systems Program


Hippodrome: Constant-2 experiments

• Phasing is a very important workload property

• Divide streams into groups (1..n), group start times offset, then constant on/off pattern

• Max. outstanding/stream: 32; Target rate: 40

• Comments:

• Variant 1 shows faster adaptation because of idle time

• Variant 4/5 show what happens when analysis step is wrong

• Scaling # groups proved uninteresting

Parameter               Variant 0  Variant 1  Variant 2  Variant 3  Variant 4  Variant 5
# Stores                200        200        100        100        100        100
On/Off Time             1.0/1.0    0.75/3.25  2.0/2.0    1.0/1.0    0.5/0.5    0.5/0.5
Number of Groups        4          4          2          2          2          2
Start Delay Multiplier  1          1          2          1          0          0.5
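The difference between variants 4 and 5 can be seen by counting how many groups are active at once. The slides do not define the start delay multiplier, so this sketch assumes group g starts g × multiplier time units after group 0, in the same units as the on/off times; that reading is a guess chosen because it reproduces the overlap described for these two variants. Times are converted to integer milliseconds to avoid floating-point edge cases, and the function names are hypothetical.

```python
def active_groups(t_ms: int, on_ms: int, off_ms: int,
                  groups: int, delay_ms: int) -> int:
    """Number of stream groups that are in their 'on' phase at time t_ms."""
    period = on_ms + off_ms
    count = 0
    for g in range(groups):
        since_start = t_ms - g * delay_ms   # assumed: group g starts g * delay late
        if since_start >= 0 and since_start % period < on_ms:
            count += 1
    return count


def peak_concurrency(on: float, off: float, groups: int, delay_multiplier: float,
                     horizon: int = 20, tick_ms: int = 50) -> int:
    """Largest number of simultaneously active groups seen over the horizon."""
    on_ms, off_ms, delay_ms = (round(x * 1000) for x in (on, off, delay_multiplier))
    return max(active_groups(t, on_ms, off_ms, groups, delay_ms)
               for t in range(0, horizon * 1000, tick_ms))


print(peak_concurrency(0.5, 0.5, groups=2, delay_multiplier=0))    # variant 4
print(peak_concurrency(0.5, 0.5, groups=2, delay_multiplier=0.5))  # variant 5
```

Both variants have the same per-group duty cycle (on for half of each period), so a summary built only from averages cannot tell them apart, yet in one the groups run together and in the other they alternate.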


Constant-2: Deviation from Target Rate

• All except variant 4 converge; 4 appears the same as 5 in the analysis, but the two groups overlap in 4 and are anti-correlated in 5

• Variant 1 converges faster than the others; this is because the idle time between groups running allows the system to drain requests


Constant-2: Response Time

• Similar results to previous slide:

• Variant 4 does not get to a good response time

• Variant 1 converges faster than others.


Constant-2: Number of LUNs

• Now we see why variant 1 converges faster: it gets to 4 LUNs in only 2 steps rather than 3, because of the idle time.

• Otherwise, behaviour is the same as for constant-1, which is to be expected as in the aggregate, constant-2 is the same as constant-1


Hippodrome: Scaling-1 experiments

• Scaling-1 intended to simulate something like a disk copy that will run as fast as the disks will go (it's disk bound, not cpu bound)

• Worked for 3 iterations of the loop (even striped the store across multiple LUNs), then wanted 5 LUNs, which were not available

• Future work: handling a global bound on the size of the storage system (for example, you can't spend more than $100,000)


Hippodrome: Scaling-3 experiments

• Scaling-3 intended to simulate adding work over constant data set (e.g. more queries to DB)

• We increase target request rate as step increases

• Store capacity 64 MB, max. outstanding 4, max. RR 36

• Comments:

• Always "correct!"; rate of increase is small enough

• Response time shows points where we added work

• LUNs increases as necessary

• Initial deviations garbage due to low request rate

Parameter            Variant 0   Variant 1
# Stores             60          60
Target Request Rate  3*(step+1)  6*(1+step/2)
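The two rate formulas imply where the demand crosses each multiple of one LUN's capacity. The sketch below is demand-side arithmetic only, not a measured result. It assumes `step/2` is integer division, the per-stream rate is capped at the stated maximum of 36, all 60 streams are always on, and one LUN sustains about 625 requests/second. The function names are hypothetical.

```python
import math


def target_rate(variant: int, step: int, max_rate: int = 36) -> int:
    """Per-stream target request rate at a given step, capped at max_rate."""
    raw = 3 * (step + 1) if variant == 0 else 6 * (1 + step // 2)
    return min(raw, max_rate)


def lun_switch_steps(variant: int, stores: int = 60,
                     lun_max_rate: int = 625, steps: int = 12) -> list[int]:
    """Steps at which the number of LUNs implied by the demand changes."""
    switches = []
    previous = None
    for step in range(steps):
        luns = math.ceil(stores * target_rate(variant, step) / lun_max_rate)
        if previous is not None and luns != previous:
            switches.append(step)
        previous = luns
    return switches


print(lun_switch_steps(0))
print(lun_switch_steps(1))
```

The switch points differ between the two variants because variant 0 raises the rate every step while variant 1 raises it every other step.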


Scaling-3: Deviation from target rate

• Ignore the graph before about step 4: request rates are too low, so the analysis sees bursts and calculates rates over them

• Always supports target request rate


Scaling-3: Response Time

• Variant 1 shows up-down pattern of changing then stabilizing workload

• Always doing pretty well, big drops for variant 0 as lun count increased


Scaling-3: Number of LUNs

• Increases gradually, exact switch over dependent on variant specifics