CS 136, Advanced Architecture Storage Performance Measurement


Dec 23, 2015



Justin Daniel
  • Slide 1
  • CS 136, Advanced Architecture Storage Performance Measurement
  • Slide 2
  • Outline: I/O Benchmarks, Performance, and Dependability; Introduction to Queueing Theory
  • Slide 3
  • I/O Performance: Response time = Queue Time + Device Service Time. Metrics: response time vs. throughput. [Figure: response time (ms) rises sharply as throughput approaches 100% of total bandwidth.]
  • Slide 4
  • I/O Benchmarks: For better or worse, benchmarks shape a field. Processor benchmarks classically aimed at response time for a fixed-size problem; I/O benchmarks typically measure throughput, possibly with an upper limit on response times (or on 90% of response times). Transaction Processing (TP, or On-Line TP = OLTP): if a bank computer fails while a customer withdraws money, a TP system guarantees the account is debited if the customer gets the cash and left unchanged if not. Airline reservation systems and banks use TP; atomic transactions make this work. The classic metric is Transactions Per Second (TPS).
  • Slide 5
  • I/O Benchmarks: Transaction Processing. The early 1980s saw great interest in OLTP, with demand expected for high TPS (e.g., ATMs, credit cards); Tandem's success implied the medium-range OLTP market was expanding. Each vendor picked its own conditions for TPS claims and reported only CPU times, with widely different I/O. Conflicting claims led to disbelief in all benchmarks: chaos. In 1984 Jim Gray (Tandem) distributed a paper to Tandem plus 19 people in other companies proposing a standard benchmark. Published as "A measure of transaction processing power," Datamation, 1985, by "Anonymous et al.": to indicate that this was the effort of a large group, and to avoid delays in the legal departments at each author's firm. Led to the Transaction Processing Council in 1988.
  • Slide 6
  • I/O Benchmarks: TP1 by Anon. et al. (debit/credit). Scalability: number of accounts, branches, tellers, and history are all a function of throughput:
      TPS      Number of ATMs   Account-file size
      10       1,000            0.1 GB
      100      10,000           1.0 GB
      1,000    100,000          10.0 GB
      10,000   1,000,000        100.0 GB
    Each input TPS => 100,000 account records, 10 branches, 100 ATMs. Accounts must grow, since a customer is unlikely to use the bank more often just because they have a faster computer! Response time: 95% of transactions take ≤ 1 second. Report price (initial purchase price + 5-year maintenance = cost of ownership). Hire an auditor to certify the results.
  • Slide 7
  • Unusual Characteristics of TPC: Price is included in the benchmarks (cost of HW, SW, and a 5-year maintenance agreement), so price-performance is reported as well as performance. The data set must scale up as throughput increases, trying to model real systems, where demand on the system and the size of the data stored in it increase together. Benchmark results are audited, ensuring only fair results are submitted. Throughput is the performance metric, but response times are limited (e.g., TPC-C: 90% of transaction response times < 5 seconds). An independent organization maintains the benchmarks: it ballots on changes, holds meetings to settle disputes, etc.
  • Slide 8
  • TPC Benchmark History/Status
  • Slide 9
  • I/O Benchmarks via SPEC: SFS 3.0 is an attempt by NFS companies to agree on a standard benchmark. Run on multiple clients and networks (to prevent bottlenecks), with the same caching policy in all clients. Reads: 85% full-block and 15% partial-block; writes: 50% full-block and 50% partial-block. Average response time: 40 ms. Scaling: for every 100 NFS ops/sec, increase capacity by 1 GB. Results: a plot of server load (throughput) vs. response time and number of users. Assumes 1 user => 10 NFS ops/sec. The 3.0 is for NFS 3.0. SPEC also added the SPECMail (mail server) and SPECWeb (web server) benchmarks.
  • Slide 10
  • 2005 Example SPEC SFS Result: NetApp FAS3050c NFS servers. 2.8 GHz Pentium Xeons, 2 GB DRAM per processor, 1 GB non-volatile memory per system. 4 FDDI nets; 32 NFS daemons, 24 GB file size. 168 Fibre Channel disks: 72 GB, 15000 RPM, 2 or 4 FC controllers.
  • Slide 11
  • Availability Benchmark Methodology. Goal: quantify the variation in QoS metrics as events occur that affect system availability. Leverage existing performance benchmarks to generate fair workloads and to measure and trace quality-of-service metrics. Use fault injection to compromise the system: hardware faults (disk, memory, network, power), software faults (corrupt input, driver error returns), maintenance events (repairs, SW/HW upgrades). Examine single-fault and multi-fault workloads: the availability analogues of performance micro- and macro-benchmarks.
  • Slide 12
  • Example single-fault result: compares Linux and Solaris reconstruction. Linux: minimal performance impact but a longer window of vulnerability to a second fault. Solaris: large performance impact but restores redundancy fast. [Figure: side-by-side Linux and Solaris traces.]
  • Slide 13
  • Reconstruction policy (2). Linux favors performance over data availability: automatically-initiated reconstruction using idle bandwidth; virtually no performance impact on the application; very long window of vulnerability (>1 hr for a 3 GB RAID). Solaris favors data availability over application performance: automatically-initiated reconstruction at high bandwidth; as much as a 34% drop in application performance; short window of vulnerability (10 minutes for 3 GB). Windows favors neither! Manually-initiated reconstruction at moderate bandwidth; as much as an 18% application performance drop; somewhat short window of vulnerability (23 min for 3 GB).
  • Slide 14
  • Introduction to Queueing Theory. More interested in the long-term steady state than in startup: Arrivals = Departures. Little's Law: mean number of tasks in system = arrival rate × mean response time. Observed by many; Little was first to prove it. Makes sense: a large number of customers means long waits. Applies to any system in equilibrium, as long as the black box is not creating or destroying tasks.
  • Slide 15
  • Deriving Little's Law. Define arr(t) = # arrivals in the interval (0,t), and dep(t) = # departures in (0,t). Clearly, N(t) = # in system at time t = arr(t) − dep(t). The area between the two curves = spent(t) = total time spent in the system by all customers (measured in customer-seconds).
  • Slide 16
  • Deriving Little's Law (cont'd). Define the average arrival rate during interval t, in customers/second, as λ_t = arr(t)/t. Define T_t as system time per customer, averaged over all customers in (0,t); since spent(t) = accumulated customer-seconds, divide by the arrivals up to that point to get T_t = spent(t)/arr(t). The mean number of tasks in the system over (0,t) is accumulated customer-seconds divided by seconds: Mean_tasks_t = spent(t)/t. The above three equations give us Mean_tasks_t = λ_t × T_t. Assuming the limits of λ_t and T_t exist, the limit of Mean_tasks_t also exists and gives Little's result: mean tasks in system = arrival rate × mean time in system.
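The derivation above can be checked numerically: simulate any FIFO single-server queue, measure spent(t), arr(t)/t, and spent(t)/arr(t) over one observation window, and the three quantities satisfy Little's relation by construction. A minimal sketch (function name and rates are illustrative choices, not from the slides):

```python
import random

def simulate_fifo_queue(n=10_000, lam=40.0, mean_service=0.02, seed=1):
    """Simulate a FIFO single-server queue; return (mean_tasks, rate, mean_time)."""
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    for _ in range(n):
        t += rng.expovariate(lam)          # exponential interarrival times
        arrivals.append(t)
    dep_prev = 0.0
    spent = 0.0                            # accumulated customer-seconds in system
    for arr in arrivals:
        start = max(arr, dep_prev)         # wait if the server is still busy
        dep_prev = start + rng.expovariate(1.0 / mean_service)
        spent += dep_prev - arr            # this customer's time in system
    horizon = dep_prev                     # observe over (0, horizon)
    rate = n / horizon                     # lambda_t = arr(t)/t
    mean_time = spent / n                  # T_t = spent(t)/arr(t)
    mean_tasks = spent / horizon           # Mean_tasks_t = spent(t)/t
    return mean_tasks, rate, mean_time

mean_tasks, rate, mean_time = simulate_fifo_queue()
# Little's result holds over the window: Mean_tasks_t = lambda_t * T_t
assert abs(mean_tasks - rate * mean_time) < 1e-9
```

Note that the relation is an algebraic identity over any finite window; the limiting argument in the slide is only needed to turn it into a steady-state statement.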
  • Slide 17
  • A Little Queuing Theory: Notation.
      Time_server: average time to service a task; average service rate = 1/Time_server (traditionally μ)
      Time_queue: average time per task in the queue
      Time_system: average time per task in the system = Time_queue + Time_server
      Arrival_rate: average number of arriving tasks/sec (traditionally λ)
      Length_server: average number of tasks in service
      Length_queue: average length of the queue
      Length_system: average number of tasks in the system = Length_queue + Length_server
    Little's Law: Length_server = Arrival_rate × Time_server.
  • Slide 18
  • Server Utilization. For a single server, service rate = 1/Time_server. Server utilization must be between 0 and 1, since the system is in equilibrium (arrivals = departures); it is often called traffic intensity, traditionally ρ. Server utilization = mean number of tasks in service = Arrival_rate × Time_server. What is the disk utilization if we get 50 I/O requests per second for a disk and the average disk service time is 10 ms (0.01 sec)? Server utilization = 50/sec × 0.01 sec = 0.5; on average the server is busy 50% of the time.
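The utilization example is just Little's Law applied to the server alone; a trivial sketch with the slide's numbers (function name is my own):

```python
def server_utilization(arrival_rate, time_server):
    """Utilization (traffic intensity, rho) = Arrival_rate x Time_server."""
    return arrival_rate * time_server

# Slide example: 50 I/O requests/sec, 10 ms (0.01 sec) average service time
rho = server_utilization(50, 0.010)
assert abs(rho - 0.5) < 1e-12   # server busy 50% of the time
```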
  • Slide 19
  • Time in Queue vs. Length of Queue. We assume a First In, First Out (FIFO) queue. What is the relationship of time in queue (Time_queue) to the mean number of tasks in the queue (Length_queue)? Time_queue = Length_queue × Time_server + the mean time to complete service of the task in progress when the new task arrives, if the server is busy. A new task can arrive at any instant; how do we predict that last part? To predict performance, we need to know something about the distribution of events.
  • Slide 20
  • I/O Request Distributions. I/O request arrivals can be modeled by a random variable: multiple processes generate independent I/O requests, and disk seeks and rotational delays are probabilistic. What distribution should the model use? True distributions are complicated: self-similar (fractal), Zipf. We often ignore that and use Poisson, which is highly tractable for analysis and intuitively appealing (independence of arrival times).
  • Slide 21
  • The Poisson Distribution. The probability of exactly k arrivals in (0,t) is P_k(t) = (λt)^k e^{−λt} / k!, where λ is the arrival-rate parameter. A more useful formulation is the interarrival-time distribution: CDF A(t) = P[next arrival takes time ≤ t] = 1 − e^{−λt}; pdf a(t) = λe^{−λt}. Also known as the exponential distribution; mean = standard deviation = 1/λ. The Poisson process is memoryless: assume P[arrival within 1 second] at time t_0 = x; then P[arrival within 1 second] at time t_1 > t_0 is also x. I.e., there is no memory that time has passed.
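The memoryless property can be verified directly from the exponential CDF: the probability of an arrival within the next second is the same no matter how long we have already waited. A small sketch (the rate λ = 2/sec is an arbitrary choice for illustration):

```python
import math

def p_arrival_within(s, lam):
    """A(s) = P[next arrival takes time <= s] = 1 - e^{-lam*s}."""
    return 1.0 - math.exp(-lam * s)

def p_arrival_within_given_waited(s, waited, lam):
    """P[arrival within s more seconds | already waited `waited` seconds]."""
    # P[waited < X <= waited + s] / P[X > waited]
    num = math.exp(-lam * waited) - math.exp(-lam * (waited + s))
    den = math.exp(-lam * waited)
    return num / den

lam = 2.0   # illustrative arrival rate, arrivals/second
for waited in (0.0, 0.5, 3.0):
    # Same answer regardless of how long we have already waited
    assert abs(p_arrival_within_given_waited(1.0, waited, lam)
               - p_arrival_within(1.0, lam)) < 1e-12
```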
  • Slide 22
  • Kendall's Notation. A queueing system is notated A/S/s/c, where A encodes the interarrival distribution and S encodes the service-time distribution. Both A and S can be M (Memoryless, Markov, or exponential), D (deterministic), E_r (r-stage Erlang), G (general), or others; s is the number of servers, and c is the capacity of the queue, if non-infinite. Examples: D/D/1 is arrivals on a clock tick, fixed service times, one server. M/M/m is memoryless arrivals, memoryless service, multiple servers (a good model of a bank). M/M/m/m is the case where customers go away rather than wait in line. G/G/1 is what a disk drive is really like (but mostly intractable to analyze).
  • Slide 23
  • M/M/1 Queuing Model. Assumptions: system is in equilibrium; exponential interarrival and service times; unlimited source of customers (infinite-population model); FIFO queue. The book also derives M/M/m. Most important results: let arrival rate λ = 1/average interarrival time and service rate μ = 1/average service time, and define utilization ρ = λ/μ. Then the average number in the system = ρ/(1−ρ), and the time in the system = (1/μ)/(1−ρ).
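The M/M/1 results can be packaged as a small helper; here is a minimal sketch (the function name is my own), checked against the slide-25 example of 40 I/Os/sec with 20 ms mean service time:

```python
def mm1_metrics(arrival_rate, service_time):
    """Steady-state M/M/1 metrics; valid only while utilization < 1."""
    rho = arrival_rate * service_time         # utilization = lambda/mu
    assert rho < 1, "queue is unstable at or above 100% utilization"
    n_system = rho / (1 - rho)                # average number in system
    t_system = service_time / (1 - rho)       # time in system = (1/mu)/(1-rho)
    t_queue = t_system - service_time         # time waiting in queue
    return rho, n_system, t_queue, t_system

# Slide-25 numbers: lambda = 40/sec, Time_server = 20 ms
rho, n, tq, ts = mm1_metrics(40, 0.020)
assert abs(rho - 0.8) < 1e-6     # 80% utilization
assert abs(tq - 0.080) < 1e-6    # 80 ms in queue
assert abs(ts - 0.100) < 1e-6    # 100 ms in system
```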
  • Slide 24
  • Explosion of Load with Utilization
  • Slide 25
  • Example M/M/1 Analysis. Assume 40 disk I/Os per second, exponential interarrival time, and exponential service time with mean 20 ms. λ = 40, Time_server = 1/μ = 0.02 sec. Server utilization ρ = Arrival_rate × Time_server = λ/μ = 40 × 0.02 = 0.8 = 80%. Time_queue = Time_server × ρ/(1−ρ) = 20 ms × 0.8/(1−0.8) = 20 × 4 = 80 ms. Time_system = Time_queue + Time_server = 80 + 20 ms = 100 ms.
  • Slide 26
  • How Much Better With a 2X Faster Disk? Average service time is now 10 ms. Arrival rate = 40/sec, Time_server = 0.01 sec. Now server utilization ρ = Arrival_rate × Time_server = 40 × 0.01 = 0.4 = 40%. Time_queue = Time_server × ρ/(1−ρ) = 10 ms × 0.4/(1−0.4) = 10 × 2/3 = 6.7 ms. Time_system = Time_queue + Time_server = 6.7 + 10 ms = 16.7 ms. 6X faster response time with a 2X faster disk!
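The two analyses can be put side by side: because utilization falls from 80% to 40%, the queueing term collapses, and response time improves by far more than the raw 2X in service time. A sketch (function name is my own):

```python
def mm1_time_in_system(arrival_rate, service_time):
    """M/M/1 time in system = Time_server / (1 - utilization)."""
    rho = arrival_rate * service_time
    return service_time / (1.0 - rho)

slow = mm1_time_in_system(40, 0.020)   # 20 ms disk at 80% utilization
fast = mm1_time_in_system(40, 0.010)   # 10 ms disk at 40% utilization
assert abs(slow - 0.100) < 1e-6        # 100 ms response time
assert abs(fast - 0.0167) < 1e-4       # ~16.7 ms response time
assert abs(slow / fast - 6.0) < 0.01   # ~6X better, from a 2X faster disk
```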
  • Slide 27
  • Value of Queueing Theory in Practice. Quick lesson: don't try for 100% utilization. But how far should you back off? Theory allows designers to estimate the impact of faster hardware on utilization and to find the knee of the response curve, and thus to find the impact of HW changes on response time. It works surprisingly well.
  • Slide 28
  • Crosscutting Issues: Buses to Point-to-Point Links & Switches
      Standard             Width   Length  Clock rate     MB/s   Max devices
      (Parallel) ATA       8b      0.5 m   133 MHz        133    2
      Serial ATA           2b      2 m     3 GHz          300    ?
      (Parallel) SCSI      16b     12 m    80 MHz (DDR)   320    15
      Serial Attach SCSI   1b      10 m    --             375    16,256
      PCI                  32/64   0.5 m   33 / 66 MHz    533    ?
      PCI Express          2b      0.5 m   3 GHz          250    ?
    Number of bits and BW are per direction: 2X for both directions (not shown). Since serial links use fewer wires, vendors commonly increase BW via versions with 2X-12X the number of wires and BW, but timing problems arise.
  • Slide 29
  • Storage Example: Internet Archive. Goal: make a historical record of the Internet. The Internet Archive began in 1996; its Wayback Machine interface performs "time travel" to show what a web page looked like in the past. It contains over a petabyte (10^15 bytes) and grows by 20 terabytes (10^12 bytes) of new data per month. Besides storing the historical record, the same hardware crawls the Web to get new snapshots.
  • Slide 30
  • Internet Archive Cluster. 1U storage node: PetaBox GB2000 from Capricorn Technologies. Each has 4 500-GB Parallel ATA (PATA) drives, 512 MB of DDR266 DRAM, Gbit Ethernet, and a 1 GHz C3 processor from VIA (80x86); a node dissipates 80 watts. 40 GB2000s in a standard VME rack give 80 TB of raw storage capacity; the 40 nodes are connected with a 48-port Ethernet switch, and a rack dissipates about 3 KW. 1 petabyte = 12 racks.
  • Slide 31
  • Estimated Cost. VIA processor, 512 MB of DDR266 DRAM, ATA disk controller, power supply, fans, and enclosure = $500. 7200 RPM 500-GB PATA drive = $375 (in 2006). 48-port 10/100/1000 Ethernet switch and all cables for a rack = $3000. Total cost: $84,500 for an 80-TB rack; the 160 disks are 60% of the total.
  • Slide 32
  • Estimated Performance. 7200 RPM drive: average seek time = 8.5 ms, transfer bandwidth = 50 MB/second; the PATA link can handle 133 MB/second; ATA controller overhead is 0.1 ms per I/O. The VIA processor is 1000 MIPS; the OS needs 50K CPU instructions per disk I/O, and the network stack uses 100K instructions per data block. Average I/O size: 16 KB for archive fetches, 50 KB when crawling the Web. Disks are the limit: 75 I/Os/sec per disk, thus 300/sec per node and 12,000/sec per rack. About 200-600 MB/sec of bandwidth per rack; the switch must handle 1.6-3.8 Gbit/s over 40 Gbit/s of links.
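The per-disk I/O rate and per-rack bandwidth follow from the component numbers above (seek + average rotational delay + transfer + controller overhead per I/O). A back-of-envelope sketch, with constants taken from the slide and helper names of my own:

```python
SEEK = 0.0085                # average seek time, seconds
ROTATE = 0.5 * 60 / 7200     # average rotational delay: half a revolution at 7200 RPM
CONTROLLER = 0.0001          # ATA controller overhead per I/O, seconds
XFER_BW = 50e6               # 50 MB/s media transfer rate

def disk_ios_per_sec(io_bytes):
    """I/Os per second for one drive at a given request size."""
    return 1.0 / (SEEK + ROTATE + CONTROLLER + io_bytes / XFER_BW)

DISKS_PER_RACK = 4 * 40                      # 4 drives/node, 40 nodes/rack
fetch_iops = disk_ios_per_sec(16 * 1024)     # archive fetches (16 KB)
crawl_iops = disk_ios_per_sec(50 * 1024)     # web crawling (50 KB)
rack_iops = fetch_iops * DISKS_PER_RACK
rack_bw_low = fetch_iops * DISKS_PER_RACK * 16 * 1024 / 1e6    # MB/s
rack_bw_high = crawl_iops * DISKS_PER_RACK * 50 * 1024 / 1e6   # MB/s

assert 70 < fetch_iops < 80           # ~75 I/Os/sec per disk
assert 11_000 < rack_iops < 13_000    # ~12,000 I/Os/sec per rack
assert 150 < rack_bw_low < rack_bw_high < 650   # the slide's 200-600 MB/s range
```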
  • Slide 33
  • Estimated Reliability. CPU/memory/enclosure MTTF is 1,000,000 hours (×40); disk MTTF 125,000 hours (×160); PATA controller MTTF 500,000 hours (×40); PATA cable MTTF 1,000,000 hours (×40); Ethernet switch MTTF 500,000 hours (×1); power supply MTTF 200,000 hours (×40); fan MTTF 200,000 hours (×40). MTTF for the system works out to 531 hours (≈3 weeks). 70% of failures in time are disks; 20% of failures in time are fans or power supplies.
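Assuming independent failures, the component failure rates simply add, and the system MTTF is the reciprocal of the total rate. A sketch with the slide's counts and MTTFs lands in the same ballpark as the slide's 531 hours (my arithmetic gives a slightly higher figure, so treat the exact number as approximate):

```python
# (count, MTTF in hours) for each component class, from the slide
components = {
    "CPU/memory/enclosure": (40, 1_000_000),
    "disk":                 (160,  125_000),
    "PATA controller":      (40,    500_000),
    "PATA cable":           (40,  1_000_000),
    "Ethernet switch":      (1,     500_000),
    "power supply":         (40,    200_000),
    "fan":                  (40,    200_000),
}

# System failure rate = sum over components of count / MTTF
rate = sum(n / mttf for n, mttf in components.values())
mttf_system = 1.0 / rate                 # roughly 3 weeks
disk_share = (160 / 125_000) / rate      # fraction of failures due to disks

assert 500 < mttf_system < 560           # ~531 hours per the slide
assert 0.65 < disk_share < 0.75          # disks dominate (~70% of failures)
```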
  • Slide 34
  • Summary. Little's Law: Length_system = rate × Time_system (mean number of customers = arrival rate × mean time in system). Appreciation for the relationship of latency and utilization: Time_system = Time_server + Time_queue, where Time_queue = Time_server × ρ/(1−ρ). Clusters for storage as well as computation. RAID: reliability matters, not just performance.