1
Rules of Thumb in Data Engineering
Jim Gray
International Conference on Data Engineering
San Diego, CA, 4 March 2000
[email protected], http://research.Microsoft.com/~Gray/Talks/
2
Credits & Thank You!!
Prashant Shenoy, U. Mass Amherst: analysis of web caching rules. [email protected]
Terence Kelly, U. Michigan: lots of advice on fixing the paper. [email protected]
  Interesting work on caching at: http://ai.eecs.umich.edu/~tpkelly/papers/wcp.pdf
Dave Lomet, Paul Larson, Surajit Chaudhuri: how big should database pages be?
Remzi Arpaci-Dusseau, Kim Keeton, Erik Riedel: discussions about balanced systems and IO
Windsor Hsu, Alan Smith, & Honesty Young: also studied TPC-C and balanced systems (very nice work!) http://golem.cs.berkeley.edu/~windsorh/DBChar/
Anastassia Ailamaki, Kim Keeton: CPI measurements
Gordon Bell: discussions on balanced systems.
3
Whoops! And an apology…
The printed/published paper has MANY bugs! The conclusions are OK (sort of), but there are typos, flaws, errors, …
A revised version will be at http://research.microsoft.com/~Gray/ and in CoRR and the MS Research tech report archive by 15 March 2000.
Sorry!
4
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
5
Trends: Moore’s Law
Performance/Price doubles every 18 months: 100x per decade.
Progress in the next 18 months = ALL previous progress:
  New storage = sum of all old storage (ever)
  New processing = sum of all old processing
(E. coli doubles every 20 minutes!)
6
Trends: ops/s/$ Had Three Growth Phases
1890-1945: mechanical, then relay: 7-year doubling
1945-1985: tube, transistor, …: 2.3-year doubling
1985-2000: microprocessor: 1.0-year doubling
[Chart: ops per second per $, 1880-2000, on a log scale from 1.E-06 to 1.E+09, showing the three doubling rates]
7
Trends: Gilder’s Law: 3x bandwidth/year for 25 more years
Today: 10 Gbps per channel; 4 channels per fiber = 40 Gbps; 32 fibers/bundle = 1.2 Tbps/bundle
In lab: 3 Tbps/fiber (400x WDM). In theory: 25 Tbps per fiber (1 fiber = 25 Tbps).
1 Tbps = USA 1996 WAN bisection bandwidth
Aggregate bandwidth doubles every 8 months!
8
Trends: Magnetic Storage Densities
Amazing progress, but the ratios have changed:
  Capacity grows 60%/year
  Access speed grows 10x more slowly
[Chart: magnetic disk parameters (tpi, kbpi, MBps, Gbpsi) vs. time, 1984-2004, log scale]
9
Trends: Density Limits
The end is near!
  Products: 11 Gbpsi
  Lab: 35 Gbpsi
  “Limit”: 60 Gbpsi
But the limit keeps rising, and there are alternatives.
[Chart: bit density (b/µm² and Gb/in²) vs. time, 1990-2008, showing CD, DVD, and ODD densities, the wavelength limit, the superparamagnetic limit, and possible successors: NEMS, fluorescent, holographic, DNA?]
Figure adapted from Franco Vitaliano, “The NEW new media: the growing attraction of nonmagnetic storage”, Data Storage, Feb 2000, pp 21-32, www.datastorage.com
10
Trends: promises
NEMS (Nano Electro Mechanical Systems) (http://www.nanochip.com/), also Cornell, IBM, CMU, …
  250 Gbpsi by using a tunneling electron microscope
  Disk replacement
  Capacity: 180 GB now, 1.4 TB in 2 years
  Transfer rate: 100 MB/sec read and write
  Latency: 0.5 ms
  Power: 23 W active, 0.05 W standby
  10k$/TB now, 2k$/TB in 2002
11
Consequence of Moore’s law: need an address bit every 18 months
Moore’s law gives you 2x more in 18 months.
RAM: today we have 10 MB to 100 GB machines (24-36 bits of addressing);
  in 9 years we will need 6 more bits: 30-42 bit addressing (4 TB RAM).
Disks: today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses);
  in 9 years we will need 6 more bits: 40-53 bit file addresses (100 PB files).
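The address-bit arithmetic above is easy to check. A minimal sketch (the capacities and the 18-month doubling period are from this slide; the function names are mine):

```python
import math

def address_bits(capacity_bytes: int) -> int:
    # Bits needed to give every byte a distinct address.
    return math.ceil(math.log2(capacity_bytes))

def bits_after(years: float, bits_now: int, doubling_months: float = 18) -> int:
    # Moore's law: one extra address bit per capacity doubling.
    return bits_now + round(years * 12 / doubling_months)

print(address_bits(10 * 2**20))  # small machine today (10 MB): 24 bits
print(bits_after(9, 24))         # in 9 years: 30 bits (6 more)
```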
12
Architecture could change this
1-level store: the System/38 and AS/400 have a 1-level store that never re-uses an address; it needs 96-bit addressing today.
NUMAs and clusters: willing to buy a 100 M$ computer? Then add 6 more address bits.
Only 1-level store pushes us beyond 64 bits.
Still, these are “logical” addresses; 64-bit physical will last many years.
13
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
14
Storage Latency: How Far Away is the Data?
(access time in clock ticks, with a travel-time analogy)
  Registers: 1 (my head, 1 min)
  On-chip cache: 2 (this room)
  On-board cache: 10 (this room, 10 min)
  Memory: 100 (this hotel, 1.5 hr)
  Disk: 10^6 (Olympia, 2 years)
  Tape/optical robot: 10^9 (Pluto or Andromeda, 2,000 years)
15
Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Chart 1: Size vs Speed. Typical system capacity in bytes (10^3 to 10^15) vs. access time in seconds (10^-9 to 10^3) for cache, main memory, secondary (disc), and online, nearline, and offline tape.]
[Chart 2: Price vs Speed. $/MB (10^-6 to 10^2) vs. access time in seconds for the same levels.]
16
Disks: Today
  Disk is 8 GB to 80 GB, 10-30 MBps
  5k-15k rpm (6 ms to 2 ms rotational latency)
  12 ms to 7 ms seek
  7k$/IDE-TB, 20k$/SCSI-TB
For shared disks, most time is spent waiting in a queue for access to the arm/controller:
  wait, then seek, rotate, transfer.
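Queueing aside, a single IO's service time is just seek plus rotational latency plus transfer. A small sketch using mid-range values from this slide (the exact figures are assumptions):

```python
def io_time_ms(seek_ms: float, rotate_ms: float,
               transfer_kb: float, bandwidth_mbps: float) -> float:
    # One random IO: seek, plus average rotational latency,
    # plus the time to move transfer_kb at the media rate.
    transfer_ms = transfer_kb / 1024 / bandwidth_mbps * 1000
    return seek_ms + rotate_ms + transfer_ms

# 8 ms seek, 4 ms rotational latency, 20 MBps, 8 KB page (assumed).
t = io_time_ms(8, 4, 8, 20)
print(round(t, 2))  # 12.39 ms, i.e. roughly 80 random IOs/sec
```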
17
Standard Storage Metrics
Capacity:
  RAM: MB and $/MB: today at 512 MB and 3$/MB
  Disk: GB and $/GB: today at 40 GB and 20$/GB
  Tape: TB and $/TB: today at 40 GB and 10k$/TB (nearline)
Access time (latency):
  RAM: 100 ns
  Disk: 15 ms
  Tape: 30-second pick, 30-second position
Transfer rate:
  RAM: 1-10 GB/s
  Disk: 20-30 MB/s (arrays can go to 10 GB/s)
  Tape: 5-15 MB/s (arrays can go to 1 GB/s)
18
New Storage Metrics: Kaps, Maps, SCAN
Kaps: how many kilobyte objects served per second. The file server / transaction processing metric. This is the OLD metric.
Maps: how many megabyte objects served per second. The multimedia metric.
SCAN: how long to scan all the data. The data mining and utility metric.
And Kaps/$, Maps/$, TBscan/$.
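These metrics fall straight out of a drive's parameters. A sketch (the drive numbers are assumed, in the range this talk uses):

```python
def maps(seek_ms: float, rotate_ms: float, bandwidth_mbps: float) -> float:
    # Maps: 1 MB objects served per second: one seek + rotate,
    # then a 1 MB transfer at the media rate.
    ms_per_object = seek_ms + rotate_ms + 1000 / bandwidth_mbps
    return 1000 / ms_per_object

def scan_minutes(capacity_gb: float, bandwidth_mbps: float) -> float:
    # SCAN: minutes to read the whole disk sequentially.
    return capacity_gb * 1024 / bandwidth_mbps / 60

print(round(maps(8, 4, 30), 1))        # 22.1 MB objects/sec
print(round(scan_minutes(40, 20), 1))  # 40 GB at 20 MBps: 34.1 min
```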
22
Storage Ratios Changed
  10x better access time
  10x more bandwidth
  100x more capacity
  Data 25x cooler (1 Kaps per 20 MB vs 1 Kaps per 500 MB)
  4,000x lower media price
  20x to 100x lower disk price
  Scan takes 10x longer (3 min vs 45 min)
[Charts, 1980-2000: disk performance (seeks per second and bandwidth in MB/s, with capacity in GB) vs. time; disk accesses per second vs. time; storage price in megabytes per kilo-dollar vs. time]
DRAM/disk media price ratio changed:
  1970-1990: 100:1
  1990-1995: 10:1
  1995-1997: 50:1
  today: ~0.03$/MB disk vs 3$/MB DRAM, so 100:1
23
Data on Disk Can Move to RAM in 10 years
[Chart: storage price in megabytes per kilo-dollar vs. time, 1980-2000. The 100:1 DRAM:disk price ratio equals 10 years of price decline.]
24
More Kaps and Kaps/$, but…
[Chart: Kaps per disk and Kaps/$ vs. time, 1970-2000]
Disk accesses got much less expensive: better disks, cheaper disks!
But disk arms are expensive: the scarce resource.
A scan now takes 45 minutes (100 GB at 30 MB/s) vs 5 minutes in 1990.
25
Disk vs Tape
Disk: 40 GB, 20 MBps, 5 ms seek time, 3 ms rotate latency,
  7$/GB for the drive, 3$/GB for controllers/cabinet, 4 TB/rack, 1-hour scan
Tape: 40 GB, 10 MBps, 10-second pick time, 30-120 second seek time,
  2$/GB for media, 8$/GB for drive+library, 10 TB/rack, 1-week scan
The price advantage of tape is narrowing, and the performance advantage of disk is growing.
At 10k$/TB, disk is competitive with nearline tape.
Guesstimates for CERN: 200 TB on 3480 tapes; 2 columns = 50 GB; rack = 1 TB = 20 drives.
27
It’s Hard to Archive a Petabyte
It takes a LONG time to restore it: at 1 GBps it takes 12 days!
Store it in two (or more) places online (on disk?): a geo-plex.
Scrub it continuously (look for errors).
On failure, use the other copy until the failure is repaired; refresh the lost copy from the safe copy.
Can organize the two copies differently (e.g.: one by time, one by space).
28
The “Absurd” 10x (= 5-year) Disk
1 TB at 100 MB/s and 200 Kaps:
  2.5-hr scan time (poor sequential access)
  1 aps per 5 GB (VERY cold data)
It’s a tape!
29
How to cool disk data:
  Cache data in main memory (see the 5-minute rule later in this presentation)
  Fewer, larger transfers
  Larger pages (512 B -> 8 KB -> 256 KB)
  Sequential rather than random access: random 8 KB IO is 1.5 MBps; sequential IO is 30 MBps (a 20:1 ratio, and growing)
  RAID1 (mirroring) rather than RAID5 (parity)
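The page-size advice can be quantified: with a fixed seek + rotate cost per access, effective bandwidth scales with how much you move per arm movement. A sketch with assumed drive parameters:

```python
def effective_mbps(page_kb: float, seek_ms: float = 8,
                   rotate_ms: float = 4, media_mbps: float = 30) -> float:
    # Bandwidth actually delivered when every page costs a seek + rotate.
    transfer_ms = page_kb / 1024 / media_mbps * 1000
    total_s = (seek_ms + rotate_ms + transfer_ms) / 1000
    return (page_kb / 1024) / total_s

for kb in (8, 64, 256):
    print(kb, round(effective_mbps(kb), 2))
# Larger pages amortize the arm movement, closing the random/sequential gap.
```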
30
Stripes, Mirrors, Parity (RAID 0, 1, 5)
RAID 0: stripes. Bandwidth. (Disks hold blocks 0,3,6,… | 1,4,7,… | 2,5,8,…)
RAID 1: mirrors, shadows, … Fault tolerance; reads faster, writes 2x slower. (0,1,2,… | 0,1,2,…)
RAID 5: parity. Fault tolerance; reads faster, writes 4x or 6x slower. (0,2,P2,… | 1,P1,4,… | P0,3,5,…)
31
RAID 10 (stripes of mirrors) Wins: “wastes space, saves arms”
RAID 5 (6 disks, 1 volume):
  Performance: 675 reads/sec, 210 writes/sec
  A write is 4 logical IOs, 2 seeks + 1.7 rotations
  SAVES SPACE; performance degrades on failure
RAID 1 (6 disks, 3 pairs):
  Performance: 750 reads/sec, 300 writes/sec
  A write is 2 logical IOs, 2 seeks, 0.7 rotations
  SAVES ARMS; performance improves on failure
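The write rates follow from standard RAID accounting of physical IOs per logical write. A sketch; the per-disk IO rates here are back-solved to reproduce this slide's numbers, not measured:

```python
RAID5_WRITE_IOS = 4  # read old data, read old parity, write data, write parity
RAID1_WRITE_IOS = 2  # write both mirror copies

def writes_per_sec(n_disks: int, ios_per_disk_per_sec: float,
                   ios_per_logical_write: int) -> float:
    # Aggregate small-write rate when every arm is kept busy.
    return n_disks * ios_per_disk_per_sec / ios_per_logical_write

print(writes_per_sec(6, 140, RAID5_WRITE_IOS))  # 210.0 writes/sec (RAID 5)
print(writes_per_sec(6, 100, RAID1_WRITE_IOS))  # 300.0 writes/sec (RAID 1)
```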
33
Auto-Manage Storage
1980 rule of thumb: a DataAdmin per 10 GB, a SysAdmin per MIPS.
2000 rule of thumb: a DataAdmin per 5 TB, a SysAdmin per 100 clones (varies with the app).
Problem: 5 TB is 60k$ today, 10k$ in a few years.
Admin cost >> storage cost!
Challenge: automate ALL storage admin tasks.
34
Summarizing storage rules of thumb (1)
Moore’s law: 4x every 3 years, 100x more per decade;
  implies 2 bits of addressing every 3 years.
Storage capacities increase 100x/decade.
Storage costs drop 100x per decade.
Storage throughput increases 10x/decade.
Data cools 10x/decade.
Disk page sizes increase 5x per decade.
35
Summarizing storage rules of thumb (2)
RAM:disk and disk:tape cost ratios are 100:1 and 3:1.
So in 10 years disk data can move to RAM, since prices decline 100x per decade.
A person can administer a million dollars of disk storage: that is 1 TB to 100 TB today.
Disks are replacing tapes as backup devices.
You can’t backup/restore a petabyte quickly, so geoplex it.
Mirroring rather than parity, to save disk arms.
36
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
37
Standard Architecture (today)
[Diagram: CPUs on a system bus, bridging to PCI Bus 1 and PCI Bus 2]
38
Amdahl’s Balance Laws
Parallelism law: if a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.
Balanced system law: a system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps.
Memory law: alpha = 1: the MB/MIPS ratio (called alpha) in a balanced system is 1.
IO law: programs do one IO per 50,000 instructions.
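The first two laws are one-liners. A sketch (the function names are mine):

```python
def max_speedup(serial: float, parallel: float) -> float:
    # Amdahl's parallelism law: no amount of parallel hardware
    # beats (S + P) / S.
    return (serial + parallel) / serial

def balanced_io_mbps(mips: float) -> float:
    # Balanced system law: one bit of IO per instruction,
    # i.e. about 1 MBps of IO for every 8 MIPS.
    return mips / 8

print(max_speedup(1, 9))       # 10.0: at best 10x, however many CPUs
print(balanced_io_mbps(1000))  # 125.0 MBps of IO for a 1000 MIPS system
```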
39
Amdahl’s Laws Valid 35 Years Later?
The parallelism law is algebra, so SURE!
The balanced system laws? Look at TPC results (TPC-C, TPC-H) at http://www.tpc.org/
Some imagination needed:
  What’s an instruction? RISC, CISC, VLIW, … (CPI, clocks per instruction, varies from 1 to 3.)
  What’s an IO?
40
TPC systems
Normalize for CPI (clocks per instruction): TPC-C has about 7 instructions per byte of IO; TPC-H has 3.
TPC-H needs half as many disks (sequential vs random access).
Both use 9 GB 10 krpm disks (they need arms, not bytes).

                      MHz/cpu  CPI  mips  KB/IO  IO/s/disk  Disks  Disks/cpu  MB/s/cpu  Ins/IO-byte
  Amdahl                 1      1     1     6        -         -       -          -           8
  TPC-C (random)       550    2.1   262     8      100       397      50         40           7
  TPC-H (sequential)   550    1.2   458    64      100       176      22        141           3
41
TPC systems: What’s alpha (= MB/MIPS)?
Hard to say:
  Intel: 32-bit addressing (= 4 GB limit), known CPI.
  IBM, HP, Sun: 64 GB limit, unknown CPI.
Look at both; guess the CPI for IBM, HP, Sun.
Alpha is between 1 and 6:

                Mips                  Memory  Alpha
  Amdahl        1                     1       1
  tpcC Intel    8 x 262 = 2 Gips      4 GB    2
  tpcH Intel    8 x 458 = 4 Gips      4 GB    1
  tpcC IBM      24 cpus ?= 12 Gips    64 GB   6
  tpcH HP       32 cpus ?= 16 Gips    32 GB   2
43
Amdahl’s Balance Laws Revised
The laws are right, they just need “interpretation” (imagination?).
Balanced system law: a system needs 8 MIPS/MBpsIO, but the instruction rate must be measured on the workload:
  sequential workloads have low CPI (clocks per instruction); random workloads tend to have higher CPI.
Alpha (the MB/MIPS ratio) is rising from 1 to 6. This trend will likely continue.
One random IO per 50k instructions. Sequential IOs are larger: one sequential IO per 200k instructions.
44
PAP vs RAP: Peak Advertised Performance vs Real Application Performance
[Diagram: the data path from application data through the file system, system bus, PCI, and SCSI to the disks, annotated with advertised vs real rates:
  CPU: 550 MHz x 4 = 2 Bips advertised; at 1-3 CPI, 170-550 mips real
  System bus: 1600 MBps vs 500 MBps
  SCSI: 160 MBps vs 90 MBps
  PCI: 133 MBps vs 90 MBps, and 66 MBps vs 25 MBps]
45
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
47
Ubiquitous 10 GBps SANs in 5 years
1 Gbps Ethernet is a reality now (1 Gbps = 120 MBps); so are FiberChannel, MyriNet, GigaNet, ServerNet, ATM, … (5, 20, 40, 80 MBps links)
10 Gbps x4 WDM deployed now (OC192); 3 Tbps WDM working in the lab.
In 5 years, expect 10x. Wow!!
48
Networking
WANs are getting faster than LANs.
G8 = OC192 = 8 Gbps is “standard”.
Link bandwidth improves 4x per 3 years.
The speed of light is fixed (60 ms round trip in the US).
Software stacks have always been the problem:
  Time = SenderCPU + ReceiverCPU + bytes/bandwidth
The two CPU terms have been the problem.
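The time model makes the point concrete: at LAN distances the CPU terms dominate the wire term. A sketch with an assumed per-side software overhead:

```python
def send_time_us(payload_bytes: int, bandwidth_bps: float,
                 sender_cpu_us: float = 0, receiver_cpu_us: float = 0) -> float:
    # Time = SenderCPU + ReceiverCPU + bytes / bandwidth, in microseconds.
    wire_us = payload_bytes * 8 / bandwidth_bps * 1e6
    return sender_cpu_us + receiver_cpu_us + wire_us

# 1 KB over Gbps Ethernet is ~8 us on the wire...
print(round(send_time_us(1024, 1e9), 1))            # 8.2
# ...but an assumed ~100 us of software per side swamps it.
print(round(send_time_us(1024, 1e9, 100, 100), 1))  # 208.2
```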
49
The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/)
[Chart: time in µs to send 1 KB (sender CPU + receiver CPU + transmit), 100 Mbps Ethernet vs Gbps SAN, 0-250 µs scale]
Yesterday: 10 MBps (100 Mbps Ethernet); ~20 MBps TCP/IP saturates 2 CPUs; round-trip latency ~250 µs.
Now: wires are 10x faster (Myrinet, Gbps Ethernet, ServerNet, …), and fast user-level communication gives
  TCP/IP at ~100 MBps with 10% CPU, 15 µs round-trip latency, and 1.6 Gbps demoed on a WAN.
50
How much does wire-time cost? ($/MByte)

  Link                 Seat cost $/3y   Bandwidth B/s   $/MB      Time to send 1 MB
  Gbps Ethernet        2,000            1.0E+08         2.E-07    0.010 s
  100 Mbps Ethernet    700              1.0E+07         7.E-07    0.100 s
  OC12 (622 Mbps)      12,960,000       5.0E+07         3.E-03    0.020 s
  OC3                  3,132,000        3.0E+06         1.E-02    0.333 s
  T1                   28,800           1.0E+05         3.E-03    10 s
  DSL                  2,300            4.0E+04         6.E-04    25 s
  POTS                 1,180            5.0E+03         2.E-03    200 s
  Wireless             ?                2.0E+03         8.E-01    500 s
  (Seconds in 3 years: 94,608,000)
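The $/MB column is the link's rent per second times the seconds needed to move a megabyte. A sketch reproducing two rows:

```python
SECONDS_IN_3_YEARS = 94_608_000

def dollars_per_mb(seat_cost_3y: float, bandwidth_bytes_per_s: float) -> float:
    # Rent per second of the link, times seconds to move 1 MB.
    rent_per_s = seat_cost_3y / SECONDS_IN_3_YEARS
    s_per_mb = 1e6 / bandwidth_bytes_per_s
    return rent_per_s * s_per_mb

print(f"{dollars_per_mb(2000, 1e8):.1e}")  # Gbps Ethernet: ~2e-07 $/MB
print(f"{dollars_per_mb(1180, 5e3):.1e}")  # POTS: ~2e-03 $/MB
```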
52
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
53
The Five Minute Rule: Trade DRAM for Disk Accesses
Cost of an access: Drive_Cost / Accesses_per_second.
Cost of a DRAM page: $/MB / pages_per_MB.
Break-even has two terms, a technology term and an economic term:

  BreakEvenReferenceInterval = (PagesPerMBofDRAM / AccessesPerSecondPerDisk) x (DrivePricePerDisk / PricePerMBofDRAM)

Page sizes grew to compensate for the changing ratios.
Now at 5 minutes for random IO, 10 seconds for sequential.
54
The 5 Minute Rule Derived
Let T = time between references to the page.
Cost of a RAM page: RAM_$_Per_MB / PagesPerMB.
Cost of a disk access, amortized over T: (DiskPrice / AccessesPerSecond) / T.
Break-even:
  RAM_$_Per_MB / PagesPerMB = DiskPrice / (T x AccessesPerSecond)
  T = (DiskPrice x PagesPerMB) / (RAM_$_Per_MB x AccessesPerSecond)
55
Plugging in the Numbers
BreakEvenReferenceInterval = (PagesPerMB / AccessesPerSecond) x (DiskPrice / RAM_$_Per_MB)

              PPM/aps          Disk$/RAM$     Break-even
  Random      128/120 ~ 1      1000/3 ~ 300   5 minutes
  Sequential  1/30 ~ 0.03      ~300           10 seconds

The trend is toward longer times because disk$ are not changing much while RAM$ declines 100x/decade.
The 5-minute and 10-second rules.
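The table above is the break-even formula with numbers plugged in; a sketch:

```python
def break_even_s(pages_per_mb: float, accesses_per_s: float,
                 disk_price: float, ram_price_per_mb: float) -> float:
    # BreakEvenReferenceInterval =
    #   (PagesPerMB / AccessesPerSecond) * (DiskPrice / RAM_$_Per_MB)
    return (pages_per_mb / accesses_per_s) * (disk_price / ram_price_per_mb)

# Random 8 KB pages: 128 pages/MB, 120 accesses/s, 1000$ drive, 3$/MB DRAM.
print(round(break_even_s(128, 120, 1000, 3) / 60, 1))  # 5.9 min: "5 minutes"
# Sequential 1 MB pages at 30 per second:
print(round(break_even_s(1, 30, 1000, 3), 1))          # 11.1 s: "10 seconds"
```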
56
When to Cache Web Pages
Caching saves user time.
Caching saves wire time.
Caching costs storage.
Caching only works sometimes:
  new pages are a miss; stale pages are a miss.
57
The 10 Instruction Rule
Spend up to 10 instructions per second to save 1 byte.
Cost of an instruction: I = ProcessorCost / (MIPS x LifeTime)
Cost of a byte: B = RAM_$_Per_B / LifeTime
Break-even: N x I = B
  N = B/I = (RAM_$_Per_B x MIPS) / ProcessorCost
    ~ (3E-6 x 5E8) / 500 = 3 ins/B for Intel
    ~ (3E-6 x 3E8) / 10 = 10 ins/B for ARM
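The break-even algebra, with the Intel-class numbers from this slide (3$/MB DRAM, 500 MIPS, a 500$ processor):

```python
def break_even_ins_per_byte(ram_dollars_per_byte: float,
                            instructions_per_s: float,
                            processor_cost: float) -> float:
    # N x I = B with I = ProcessorCost / (ips x LifeTime) and
    # B = RAM_$_per_byte / LifeTime; the lifetimes cancel, leaving
    # N = (RAM_$_per_byte x ips) / ProcessorCost.
    return ram_dollars_per_byte * instructions_per_s / processor_cost

# 3$/MB DRAM = 3e-6 $/B; 500 MIPS = 5e8 ins/s; 500$ processor.
print(round(break_even_ins_per_byte(3e-6, 5e8, 500), 2))  # 3.0 instructions/byte
```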
58
Web Page Caching Saves People Time
Assume people cost 20$/hour (or 0.2$/hr???).
Assume a 20% hit rate in the browser, 40% in the proxy.
Assume 3-second server time.
Caching saves people time: 28$/year to 150$/year of people time, or 0.28¢ to 1.5$/year.

  Connection  Cache    R_remote (s)  R_local (s)  H (hit rate)  People savings (¢/page)
  LAN         proxy        3             0.3          0.4             0.6
  LAN         browser      3             0.1          0.2             0.3
  Modem       proxy        5             2            0.4             0.7
  Modem       browser      5             0.1          0.2             0.5
  Mobile      proxy       13            10            0.4             0.7
  Mobile      browser     13             0.1          0.2             1.4
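The savings column is just hit rate times time saved times the wage. A sketch reproducing two rows (20$/hour assumed, as above):

```python
def savings_cents_per_page(hit_rate: float, r_remote_s: float,
                           r_local_s: float,
                           dollars_per_hour: float = 20.0) -> float:
    # Expected people-time saved per page view, priced in cents.
    saved_s = hit_rate * (r_remote_s - r_local_s)
    return saved_s * dollars_per_hour / 3600 * 100

print(round(savings_cents_per_page(0.4, 3, 0.3), 1))   # LAN proxy: 0.6
print(round(savings_cents_per_page(0.2, 13, 0.1), 1))  # Mobile browser: 1.4
```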
59
Web Page Caching Saves Resources
Wire cost is a penny (wireless) to 100µ$ (LAN) per download.
Storage is 8µ$ per month.
Break-even (wire cost = storage rent): 4 to 7 months.
Add people cost: break-even is ~4 years; with “cheap people” (0.2$/hr), 6 to 8 months.

                 A: download    B: storage     Break-even    C: people cost    Break-even
                 $/10 KB        $/10 KB/mo     T = A/B       of download ($)   T = (A+C)/B
  Internet/LAN   1.E-04         8.E-06         18 months     0.02              15 years
  Modem          2.E-04         8.E-06         36 months     0.03              21 years
  Wireless       1.E-02         2.E-04         300 years     0.07              >999 years
60
Caching
Disk caching: the 5-minute rule for random IO, the 10-second rule for sequential IO.
Web page caching: if the page will be re-referenced within
  18 months (with free users) or 15 years (with valuable users),
then cache the page in the client/proxy.
Challenges: guessing which pages will be re-referenced; detecting stale pages (page velocity).
61
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb