CS136 2 Outline
I/O Benchmarks, Performance, and Dependability
Introduction to Queueing Theory
CS136 3 I/O Performance
Response time = Queue time + Device service time
Metrics: Response time vs. throughput
[Figure: response time (ms) vs. throughput (% of total bandwidth); tasks flow Proc -> Queue -> IOC -> Device]
CS136 4 I/O Benchmarks
For better or worse, benchmarks shape a field
Processor benchmarks classically aimed at response time for a fixed-size problem
I/O benchmarks typically measure throughput, possibly with an upper limit on response times (or on 90% of response times)
Transaction Processing (TP) (or On-Line TP = OLTP)
If the bank computer fails when a customer withdraws money, a TP system guarantees the account is debited if the customer gets the cash, and left unchanged if not
Airline reservation systems & banks use TP
Atomic transactions make this work
Classic metric is Transactions Per Second (TPS)
CS136 5 I/O Benchmarks: Transaction Processing
Early 1980s: great interest in OLTP
Expecting demand for high TPS (e.g., ATMs, credit cards)
Tandem's success implied medium-range OLTP was expanding
Each vendor picked its own conditions for TPS claims and reported only CPU times, with widely different I/O
Conflicting claims led to disbelief in all benchmarks => chaos
1984: Jim Gray (Tandem) distributed a paper to Tandem + 19 people at other companies proposing a standard benchmark
Published as "A measure of transaction processing power," Datamation, 1985, by "Anonymous et al."
To indicate that this was the effort of a large group
To avoid delays in the legal departments at each author's firm
Led to the Transaction Processing Council in 1988 (www.tpc.org)
CS136 6 I/O Benchmarks: TP1 by Anon et al.
Debit/Credit
Scalability: # of accounts, branches, tellers, and history all a function of throughput

TPS      Number of ATMs    Account-file size
10       1,000             0.1 GB
100      10,000            1.0 GB
1,000    100,000           10.0 GB
10,000   1,000,000         100.0 GB

Each input TPS => 100,000 account records, 10 branches, 100 ATMs
Accounts must grow, since a customer is unlikely to use the bank more often just because they have a faster computer!
Response time: 95% of transactions take <= 1 second
Report price (initial purchase price + 5-year maintenance = cost of ownership)
Hire an auditor to certify results
CS136 7 Unusual Characteristics of TPC
Price included in benchmarks
Cost of HW, SW, and 5-year maintenance agreement
Price-performance as well as performance
Data set must scale up as throughput increases
Trying to model real systems: demand on the system and size of the data stored in it increase together
Benchmark results are audited
Ensures only fair results are submitted
Throughput is the performance metric, but response times are limited
E.g., TPC-C: 90% of transaction response times < 5 seconds
Independent organization maintains the benchmarks
Ballots on changes, holds meetings to settle disputes...
CS136 8 TPC Benchmark History/Status
CS136 9 I/O Benchmarks via SPEC
SFS 3.0: Attempt by NFS companies to agree on a standard benchmark
Run on multiple clients & networks (to prevent bottlenecks)
Same caching policy in all clients
Reads: 85% full-block & 15% partial-block
Writes: 50% full-block & 50% partial-block
Average response time: 40 ms
Scaling: for every 100 NFS ops/sec, increase capacity by 1 GB
Results: plot of server load (throughput) vs. response time & number of users
Assumes 1 user => 10 NFS ops/sec
"3.0" means for NFS version 3
Also added SPECMail (mail server) and SPECWeb (web server) benchmarks
CS136 10 2005 Example SPEC SFS Result: NetApp FAS3050c NFS servers
2.8 GHz Pentium Xeons, 2 GB DRAM per processor, 1 GB non-volatile memory per system
4 FDDI nets; 32 NFS daemons, 24 GB file size
168 Fibre Channel disks: 72 GB, 15,000 RPM, 2 or 4 FC controllers
CS136 11 Availability Benchmark Methodology
Goal: quantify variation in QoS metrics as events occur that affect system availability
Leverage existing performance benchmarks
To generate fair workloads
To measure & trace quality-of-service metrics
Use fault injection to compromise the system
Hardware faults (disk, memory, network, power)
Software faults (corrupt input, driver error returns)
Maintenance events (repairs, SW/HW upgrades)
Examine single-fault and multi-fault workloads
The availability analogues of performance micro- and macro-benchmarks
CS136 12 Example Single-Fault Result
Compares Linux and Solaris reconstruction
Linux: minimal performance impact, but longer window of vulnerability to a second fault
Solaris: large performance impact, but restores redundancy fast
[Figure: two panels of performance over time during reconstruction, labeled Linux and Solaris]
CS136 13 Reconstruction Policy (2)
Linux: favors performance over data availability
Automatically-initiated reconstruction, using idle bandwidth
Virtually no performance impact on the application
Very long window of vulnerability (>1 hr for a 3-GB RAID)
Solaris: favors data availability over application performance
Automatically-initiated reconstruction at high bandwidth
As much as a 34% drop in application performance
Short window of vulnerability (10 minutes for 3 GB)
Windows: favors neither!
Manually-initiated reconstruction at moderate bandwidth
As much as an 18% application performance drop
Somewhat short window of vulnerability (23 min for 3 GB)
CS136 14 Introduction to Queueing Theory
More interested in long-term steady state than in startup: Arrivals = Departures
Little's Law: Mean number of tasks in system = arrival rate x mean response time
Observed by many; Little was first to prove it
Makes sense: a large number of customers means long waits
Applies to any system in equilibrium, as long as the black box is not creating or destroying tasks
[Figure: black box with Arrivals entering and Departures leaving]
CS136 15 Deriving Little's Law
Define arr(t) = # arrivals in interval (0,t)
Define dep(t) = # departures in (0,t)
Clearly, N(t) = # in system at time t = arr(t) - dep(t)
Area between the arr(t) and dep(t) curves = spent(t) = total time spent in system by all customers (measured in customer-seconds)
[Figure: arr(t) and dep(t) step curves vs. time, with N(t) as the vertical gap between them]
CS136 16 Deriving Little's Law (cont'd)
Define the average arrival rate during interval t, in customers/second, as λ_t = arr(t)/t
Define T_t as system time per customer, averaged over all customers in (0,t)
Since spent(t) = accumulated customer-seconds, divide by arrivals up to that point to get T_t = spent(t)/arr(t)
Mean tasks in system over (0,t) is accumulated customer-seconds divided by seconds: Mean_tasks_t = spent(t)/t
The above three equations give us: Mean_tasks_t = λ_t x T_t
Assuming the limits of λ_t and T_t exist, the limit of Mean_tasks_t also exists and gives Little's result:
Mean tasks in system = arrival rate x mean time in system
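This accounting can be checked numerically. Below is a minimal Python sketch (the event trace is hypothetical) that computes spent(t), λ_t, and T_t over a short interval and confirms Mean_tasks_t = λ_t x T_t:

# Hypothetical trace of (arrival_time, departure_time) pairs, in seconds,
# observed over the interval (0, T); everyone who arrives also departs.
events = [(0.0, 2.0), (1.0, 2.5), (1.5, 4.0), (3.0, 5.0)]

T = max(dep for _, dep in events)                 # length of observation interval
spent = sum(dep - arr for arr, dep in events)     # spent(t): customer-seconds

arrival_rate = len(events) / T                    # lambda_t = arr(t) / t
mean_time_in_system = spent / len(events)         # T_t = spent(t) / arr(t)
mean_tasks = spent / T                            # Mean_tasks_t = spent(t) / t

# Little's Law: mean tasks in system = arrival rate x mean time in system
print(mean_tasks, arrival_rate * mean_time_in_system)   # both 1.6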
CS136 17 A Little Queuing Theory: Notation
Time_server: average time to service a task
Average service rate = 1 / Time_server (traditionally μ)
Time_queue: average time per task in the queue
Time_system: average time per task in the system = Time_queue + Time_server
Arrival rate: average number of arriving tasks per second (traditionally λ)
Length_server: average number of tasks in service
Length_queue: average length of the queue
Length_system: average number of tasks in system = Length_queue + Length_server
Little's Law: Length_server = Arrival rate x Time_server
[Figure: Proc -> queue -> IOC -> Device; the queue plus the server together form the system]
CS136 18 Server Utilization
For a single server, service rate = 1 / Time_server
Server utilization must be between 0 and 1, since the system is in equilibrium (arrivals = departures); often called traffic intensity, traditionally ρ
Server utilization = mean number of tasks in service = Arrival rate x Time_server
Example: what is the disk utilization if the disk gets 50 I/O requests per second and the average disk service time is 10 ms (0.01 sec)?
Server utilization = 50/sec x 0.01 sec = 0.5
Or: on average, the server is busy 50% of the time
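The same arithmetic in a few lines of Python (a trivial sketch, using the numbers from this slide):

arrival_rate = 50       # I/O requests per second
time_server = 0.01      # average disk service time in seconds (10 ms)

utilization = arrival_rate * time_server   # rho = arrival rate x Time_server
print(utilization)                         # 0.5 -> disk busy 50% of the time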
CS136 19 Time in Queue vs. Length of Queue
We assume a First In, First Out (FIFO) queue
What is the relationship of time in queue (Time_queue) to the mean number of tasks in queue (Length_queue)?
Time_queue = Length_queue x Time_server + mean time to finish serving the task already in progress when the new task arrives (if the server is busy)
A new task can arrive at any instant; how do we predict that last term?
To predict performance, we need to know something about the distribution of events
CS136 20 I/O Request Distributions
I/O request arrivals can be modeled by a random variable
Multiple processes generate independent I/O requests
Disk seeks and rotational delays are probabilistic
What distribution should the model use?
True distributions are complicated: self-similar (fractal), Zipf
We often ignore that and use Poisson
Highly tractable for analysis
Intuitively appealing (independence of arrival times)
CS136 21 The Poisson Distribution
Probability of exactly k arrivals in (0,t) is: P_k(t) = (λt)^k e^(-λt) / k!
λ is the arrival-rate parameter
More useful formulation is the Poisson interarrival-time distribution:
CDF A(t) = P[next arrival takes time <= t] = 1 - e^(-λt)
pdf a(t) = λ e^(-λt)
Also known as the exponential distribution
Mean = standard deviation = 1/λ
The Poisson arrival process is memoryless:
Assume P[arrival within 1 second] at time t_0 = x
Then P[arrival within 1 second] at time t_1 > t_0 is also x
I.e., there is no memory that time has passed
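The memoryless property can be checked empirically. A small Python sketch (λ = 2 arrivals/second is an assumed example value):

import random

lam = 2.0                                               # arrival-rate parameter (lambda)
samples = [random.expovariate(lam) for _ in range(200_000)]

# P[next arrival within 1 second], starting fresh
p_fresh = sum(1 for x in samples if x <= 1.0) / len(samples)

# P[next arrival within 1 more second], given we already waited 0.5 seconds
waited = [x - 0.5 for x in samples if x > 0.5]
p_after_wait = sum(1 for x in waited if x <= 1.0) / len(waited)

print(p_fresh, p_after_wait)    # both close to 1 - e**-2 = 0.8647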
CS136 22 Kendall's Notation
A queueing system is notated A/S/s/c, where:
A encodes the interarrival distribution
S encodes the service-time distribution
Both A and S can be M (Memoryless, Markov, or exponential), D (deterministic), E_r (r-stage Erlang), G (general), or others
s is the number of servers
c is the capacity of the queue, if not infinite
Examples:
D/D/1 is arrivals on a clock tick, fixed service times, one server
M/M/m is memoryless arrivals, memoryless service, multiple servers (a good model of a bank)
M/M/m/m is the case where customers go away rather than wait in line
G/G/1 is what a disk drive is really like (but mostly intractable to analyze)
CS136 23 M/M/1 Queuing Model
System is in equilibrium
Exponential interarrival and service times
Unlimited source of customers (infinite-population model)
FIFO queue
Book also derives M/M/m
Most important results:
Let arrival rate λ = 1/average interarrival time
Let service rate μ = 1/average service time
Define utilization ρ = λ/μ
Then average number in system = ρ/(1-ρ)
And time in system = (1/μ)/(1-ρ)
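These results are easy to package as a helper; the sketch below (the function name mm1 is made up for illustration) simply evaluates the formulas above:

def mm1(arrival_rate, service_time):
    """Return (utilization, mean number in system, mean time in system)."""
    mu = 1.0 / service_time               # service rate
    rho = arrival_rate / mu               # utilization; must be < 1 in equilibrium
    if rho >= 1.0:
        raise ValueError("not in equilibrium: rho >= 1")
    n_system = rho / (1.0 - rho)          # average number in system
    t_system = (1.0 / mu) / (1.0 - rho)   # average time in system
    return rho, n_system, t_system

print(mm1(40, 0.02))    # (0.8, 4.0, 0.1) -> 80% busy, 4 tasks, 100 ms in system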
CS136 24 Explosion of Load with Utilization
CS136 25 Example M/M/1 Analysis
Assume 40 disk I/Os per second
Exponential interarrival time
Exponential service time with mean 20 ms
λ = 40, Time_server = 1/μ = 0.02 sec
Server utilization ρ = Arrival rate x Time_server = λ/μ = 40 x 0.02 = 0.8 = 80%
Time_queue = Time_server x ρ/(1-ρ) = 20 ms x 0.8/(1-0.8) = 20 x 4 = 80 ms
Time_system = Time_queue + Time_server = 80 + 20 ms = 100 ms
CS136 26 How Much Better With 2X Faster Disk?
Average service time is now 10 ms
Arrival rate = 40/sec, Time_server = 0.01 sec
Now server utilization ρ = Arrival rate x Time_server = 40 x 0.01 = 0.4 = 40%
Time_queue = Time_server x ρ/(1-ρ) = 10 ms x 0.4/(1-0.4) = 10 x 2/3 = 6.7 ms
Time_system = Time_queue + Time_server = 6.7 + 10 ms = 16.7 ms
6X faster response time with a 2X faster disk!
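A short Python sketch of both calculations side by side (same 40 I/Os/sec; the two service times are the ones assumed on these slides):

arrival_rate = 40.0                       # I/Os per second

for service_time in (0.020, 0.010):       # 20 ms disk vs. 10 ms disk
    rho = arrival_rate * service_time                  # utilization
    t_queue = service_time * rho / (1.0 - rho)         # M/M/1 queueing time
    t_system = t_queue + service_time
    print(f"{service_time * 1000:.0f} ms disk: rho = {rho:.0%}, "
          f"queue = {t_queue * 1000:.1f} ms, system = {t_system * 1000:.1f} ms")

# 20 ms disk: rho = 80%, queue = 80.0 ms, system = 100.0 ms
# 10 ms disk: rho = 40%, queue = 6.7 ms, system = 16.7 ms   (~6X faster response)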
CS136 27 Value of Queueing Theory in Practice
Quick lesson: don't try for 100% utilization
But how far to back off?
Theory allows designers to:
Estimate the impact of faster hardware on utilization
Find the knee of the response curve
Thus find the impact of HW changes on response time
Works surprisingly well
CS136 28 Crosscutting Issues: Buses => Point-to-Point Links & Switches

Standard                Width    Length   Clock rate     MB/s   Max devices
(Parallel) ATA          8b       0.5 m    133 MHz        133    2
Serial ATA              2b       2 m      3 GHz          300    ?
(Parallel) SCSI         16b      12 m     80 MHz (DDR)   320    15
Serial Attached SCSI    1b       10 m     --             375    16,256
PCI                     32/64b   0.5 m    33 / 66 MHz    533    ?
PCI Express             2b       0.5 m    3 GHz          250    ?

Number of bits and bandwidth are per direction => 2X for both directions (not shown)
Since the serial standards use fewer wires, bandwidth is commonly increased via versions with 2X-12X the number of wires and bandwidth, but timing problems arise
CS136 29 Storage Example: Internet Archive
Goal of making a historical record of the Internet
The Internet Archive began in 1996
The Wayback Machine interface performs "time travel" to see what a web page looked like in the past
Contains over a petabyte (10^15 bytes)
Growing by 20 terabytes (10^12 bytes) of new data per month
Besides storing the historical record, the same hardware crawls the Web to get new snapshots
CS136 30 Internet Archive Cluster
1U storage node: PetaBox GB2000 from Capricorn Technologies
Has 4 x 500-GB Parallel ATA (PATA) drives, 512 MB of DDR266 DRAM, Gbit Ethernet, and a 1 GHz C3 processor from VIA (80x86)
Node dissipates 80 watts
40 GB2000s in a standard VME rack => 80 TB raw storage capacity
40 nodes connected with a 48-port Ethernet switch
Rack dissipates about 3 kW
1 petabyte = 12 racks
CS136 31 Estimated Cost
VIA processor, 512 MB of DDR266 DRAM, ATA disk controller, power supply, fans, and enclosure = $500
7200-RPM 500-GB PATA drive = $375 (in 2006)
48-port 10/100/1000 Ethernet switch and all cables for a rack = $3,000
Total cost ~$84,500 for an 80-TB rack
The 160 disks are ~60% of the total cost
CS136 32 Estimated Performance
7200-RPM drive: average seek time = 8.5 ms, transfer bandwidth 50 MB/sec
PATA link can handle 133 MB/sec
ATA controller overhead is 0.1 ms per I/O
VIA processor is 1000 MIPS
OS needs 50K CPU instructions per disk I/O
Network stack uses 100K instructions per data block
Average I/O size: 16 KB for archive fetches, 50 KB when crawling the Web
Disks are the limit: 75 I/Os/sec per disk, thus 300/sec per node and 12,000/sec per rack
About 200-600 MB/sec of bandwidth per rack
Switch must handle 1.6-3.8 Gb/s over 40 Gb/s of links
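The disk-limited throughput estimate works out as in this small Python sketch (all inputs are the slide's numbers):

ios_per_disk = 75          # I/Os per second per disk (seek + rotate + transfer)
disks_per_node = 4
nodes_per_rack = 40

ios_per_rack = ios_per_disk * disks_per_node * nodes_per_rack
print(ios_per_rack)        # 12,000 I/Os per second per rack

for io_size_kb, workload in ((16, "archive fetches"), (50, "web crawling")):
    mb_per_sec = ios_per_rack * io_size_kb / 1024
    print(f"{workload}: ~{mb_per_sec:.0f} MB/sec per rack")
# archive fetches: ~188 MB/sec; web crawling: ~586 MB/sec (the 200-600 MB/sec range above)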
CS136 33 Estimated Reliability
CPU/memory/enclosure MTTF is 1,000,000 hours (x 40)
Disk MTTF 125,000 hours (x 160)
PATA controller MTTF 500,000 hours (x 40)
PATA cable MTTF 1,000,000 hours (x 40)
Ethernet switch MTTF 500,000 hours (x 1)
Power supply MTTF 200,000 hours (x 40)
Fan MTTF 200,000 hours (x 40)
MTTF for the system works out to 531 hours (~3 weeks)
70% of failures in time are disks
20% of failures in time are fans or power supplies
CS136 34 Summary
Little's Law: Length_system = arrival rate x Time_system
(Mean number of customers = arrival rate x mean time in system)
Appreciation for the relationship of latency and utilization:
Time_system = Time_server + Time_queue
Time_queue = Time_server x ρ/(1-ρ)
Clusters for storage as well as computation
RAID: reliability matters, not performance
[Figure: Proc -> queue -> IOC -> Device; the queue plus the server together form the system]