Mar 26, 2015
Storage Performance for Data Warehousing
About Joe Chang
SQL Server Execution Plan Cost Model
True cost structure by system architecture
Decoding statblob (distribution statistics)
SQL Clone – statistics-only database
Tools: ExecStats – cross-reference index use by SQL-execution plan
Performance Monitoring,
Profiler/Trace aggregation
Storage
Organization Structure
In many large IT departments, DB and Storage are in separate groups
Storage usually has its own objectives: bring all storage into one big system under full management (read: control)
Storage as a Service, in the Cloud: one size fits all needs
Usually have zero DB knowledge
Of course we do high bandwidth, 600MB/sec good enough for you?
Data Warehouse Storage
OLTP – Throughput with Fast Response
DW – Flood the queues for maximum throughput
Do not use shared storage for data warehouse! Storage system vendors like to give the impression the SAN is a magical, immensely powerful box that can meet all your needs: just tell us how much capacity you need and don't worry about anything else. My advice: stay away from shared storage controlled by a different team.
Nominal and Net Bandwidth
PCI-E Gen 2 – 5 Gbit/sec signaling: x8 = 5GB/s nominal, 4GB/s net; x4 = 2GB/s net
SAS 6Gbit/s – 6 Gbit/s x4 port: 3GB/s nominal, 2.2GB/sec net?
Fibre Channel 8 Gbit/s nominal: 780MB/s point-to-point,
680MB/s from host to SAN to back-end loop
SAS RAID Controller, x8 PCI-E G2, 2 x4 6G
2.8GB/s
Depends on the controller, will change!
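The nominal-versus-net gap above mostly comes from the 8b/10b line encoding used by PCI-E Gen 2, 6G SAS, and 8Gb FC (8 data bits carried per 10 signal bits), with protocol overhead on top. A minimal sketch of the arithmetic, assuming 80% encoding efficiency:

```python
# Sketch: usable bandwidth from nominal signaling rate, assuming 8b/10b
# encoding (80% efficiency). Protocol overhead (framing, acks) reduces
# the SAS and FC figures further in practice.

def net_bandwidth_gbs(signal_gbit_per_lane, lanes, encoding=0.8):
    """Approximate usable GB/s for a multi-lane link."""
    return signal_gbit_per_lane * lanes * encoding / 8  # bits -> bytes

print(net_bandwidth_gbs(5, 8))    # PCI-E Gen 2 x8: ~4.0 GB/s net
print(net_bandwidth_gbs(6, 4))    # SAS 6G x4: ~2.4 GB/s before protocol overhead
print(net_bandwidth_gbs(8.5, 1))  # 8Gb FC (8.5 Gbaud): ~0.85 GB/s, ~780 MB/s observed
```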
Storage – SAS Direct-Attach
Many Fat Pipes
Very Many Disks
Balance by pipe bandwidth
Don’t forget fat network pipes
Option A: 24 disks in one enclosure for each x4 SAS port, two x4 SAS ports per controller
Option B: Split enclosure over 2 x4 SAS ports, 1 controller
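The trade-off between the two options is per-disk bandwidth. A sketch of the division, assuming roughly 1400 MB/s usable per x4 6G SAS port (half of the 2.8 GB/s controller figure above, an assumption rather than a vendor spec):

```python
# Sketch: per-disk bandwidth under each attach option, assuming
# ~1400 MB/s usable per x4 SAS port.
PORT_BW_MB = 1400

def per_disk_mb(disks_per_port):
    return PORT_BW_MB / disks_per_port

print(per_disk_mb(24))  # Option A: 24 disks per x4 port -> ~58 MB/s each
print(per_disk_mb(12))  # Option B: enclosure split over 2 ports -> ~117 MB/s each
```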
[Diagram: four RAID controllers in PCI-E x8 slots, each with two x4 SAS ports to disk enclosures; PCI-E x4 slots hosting another RAID controller (one SAS x4 port) and dual-port 10GbE NICs]
Storage – FC/SAN
PCI-E x8 Gen 2 Slot with quad-port 8Gb FC
If 8Gb quad-port is not supported, consider system with many x4 slots, or consider SAS!
SAN systems typically offer 3.5in 15-disk enclosures. Difficult to get high spindle count with density.
1-2 15-disk enclosures per 8Gb FC port, 20-30MB/s per disk?
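The per-disk figure follows from dividing usable port bandwidth by spindle count. A sketch, assuming ~700 MB/s usable per 8Gb FC port (in line with the net-bandwidth numbers earlier):

```python
# Sketch: per-disk sequential bandwidth behind one 8Gb FC port,
# assuming ~700 MB/s usable per port.
FC_PORT_MB = 700

def per_disk_fc_mb(disks_behind_port):
    return FC_PORT_MB / disks_behind_port

print(round(per_disk_fc_mb(15), 1))  # one 15-disk enclosure: ~46.7 MB/s per disk
print(round(per_disk_fc_mb(30), 1))  # two enclosures: ~23.3 MB/s per disk
```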
[Diagram: two quad-port 8Gb FC HBAs in PCI-E x8 slots and several dual-port 8Gb FC HBAs in PCI-E x4 slots, fanning out to SAN disk enclosures; dual-port 10GbE NICs in PCI-E x4 slots]
Storage – SSD / HDD Hybrid
Log: Single DB – HDD, unless rollbacks or T-log backups disrupt log writes. Multi DB – SSD, otherwise too many RAID1 pairs for logs
Storage enclosures typically hold 12 disks per channel, which can only support the bandwidth of a few SSDs. Use the remaining bays for extra storage with HDDs; no point expending valuable SSD space on backups and flat files
No RAID w/SSD?
[Diagram: four SAS controllers in PCI-E x8 slots, each x4 SAS port driving an enclosure populated with SSDs plus HDDs in the remaining bays; a RAID controller and dual-port 10GbE NICs in PCI-E x4 slots]
PDW
[Diagram: PDW appliance – Control, Management, Landing Zone, and Backup nodes plus Compute Nodes, linked by Gigabit Ethernet and InfiniBand; each compute node attached over Fibre Channel to a storage unit with dual SPs]
ODM X2-2
[Diagram: ODM X2-2 – database servers connected over InfiniBand to Exadata Storage cells populated with SSD]
SSD
Current: mostly 3Gbps SAS/SATA SSD
Some 6Gbps SATA SSD
Fusion IO – direct PCI-E Gen2 interface
320GB-1.2TB capacity, 200K IOPS, 1.5GB/s
No RAID? HDD is fundamentally a single point of failure
SSD could be built with redundant components
HP reports problems with SSD on RAID controllers, Fujitsu does not?
Big DW Storage – iSCSI
Are you nuts?
Well, maybe if you like frequent long coffee-cigarette breaks
Storage Configuration - Arrays
Shown: two 12-disk arrays per 24-disk enclosure
Options: between 6-16 disks per array
SAN systems may recommend R10 4+4 or R5 7+1
Very many spindles: comment on MetaLUN
Data Consumption Rate: Xeon
TPC-H Query 1, Lineitem scan; SF1 Lineitem is ~1GB (875MB in SQL Server 2008)
Data consumption rate is much higher for the current-generation Nehalem and Westmere processors than for the Core 2 referenced in the Microsoft FTDW document. TPC-H Q1 is more compute-intensive than the FTDW light query.
Processors    Arch      Cores  Q1 sec  SQL   Total MB/s  MB/s per core  GHz   Mem GB    SF
2 Xeon 5355   Conroe      8     85.4   5sp2    1,165.5       145.7      2.66     64    100
2 Xeon 5570   Nehalem     8     42.2   8sp1    2,073.5       259.2      2.93    144    100
2 Xeon 5680   Westmere   12     21.0   8r2     4,166.7       347.2      3.33    192    100
4 Xeon 7560   Neh.-EX    32     37.2   8r2     7,056.5       220.5      2.26    640    300
8 Xeon 7560   Neh.-EX    64    183.8   8r2    14,282         223.2      2.26    512   3000
Data Consumption Rate: Opteron
Expected Istanbul to have better performance per core than Shanghai due to HT Assist. Magny-Cours has much better performance per core (at 2.3GHz, versus 2.8GHz for Istanbul); or is this Win/SQL 2K8 R2?
TPC-H Query 1, Lineitem scan; SF1 Lineitem is ~1GB (875MB in SQL Server 2008)
Processors   Arch       Cores  Q1 sec  SQL   Total MB/s  MB/s per core  GHz  Mem GB    SF
4 Opt 8220                 8   309.7   5rtm     868.7        121.1      2.8    128    300
8 Opt 8360   Barcelona   32     91.4   8rtm   2,872.0         89.7      2.5    256    300
8 Opt 8384   Shanghai    32     72.5   8rtm   3,620.7        113.2      2.7    256    300
8 Opt 8439   Istanbul    48     49.0   8sp1   5,357.1        111.6      2.8    256    300
2 Opt 6176   Magny-C     24     20.2   8r2    4,331.7        180.5      2.3    192    100
4 Opt 6176   Magny-C     48     31.8   8r2    8,254.7        172.0      2.3    512    300
8 Opt 8439   Istanbul    48    166.9   8rtm   5,242.7        109.2      2.8    512   1000
Data Consumption Rate
TPC-H Query 1, Lineitem scan; SF1 Lineitem is ~1GB (875MB in SQL Server 2008)

Processors    Arch       Cores  Q1 sec  SQL   Total MB/s  MB/s per core  GHz   Mem GB    SF
2 Xeon 5355   Conroe       8     85.4   5sp2    1,165.5       145.7      2.66     64    100
2 Xeon 5570   Nehalem      8     42.2   8sp1    2,073.5       259.2      2.93    144    100
2 Xeon 5680   Westmere    12     21.0   8r2     4,166.7       347.2      3.33    192    100
2 Opt 6176    Magny-C     24     20.2   8r2     4,331.7       180.5      2.3     192    100
4 Opt 8220                 8    309.7   5rtm      868.7       121.1      2.8     128    300
8 Opt 8360    Barcelona   32     91.4   8rtm    2,872.0        89.7      2.5     256    300
8 Opt 8384    Shanghai    32     72.5   8rtm    3,620.7       113.2      2.7     256    300
8 Opt 8439    Istanbul    48     49.0   8sp1    5,357.1       111.6      2.8     256    300
4 Opt 6176    Magny-C     48     31.8   8r2     8,254.7       172.0      2.3     512    300
8 Xeon 7560   Neh.-EX     64    183.8   8r2    14,282         223.2      2.26    512   3000
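A quick sanity check on the SQL Server 2008 rows above: Total MB/s is just (SF × ~875 MB of Lineitem per scale factor) divided by the Query 1 time, per the table caption.

```python
# Sketch: reproducing the Total MB/s column for the 2k8 rows, assuming
# Lineitem is ~875 MB per scale factor in SQL Server 2008 (per the caption).
LINEITEM_MB_PER_SF = 875

def total_mbs(sf, q1_sec):
    return sf * LINEITEM_MB_PER_SF / q1_sec

print(round(total_mbs(100, 21.0), 1))  # 2 Xeon 5680 -> 4166.7
print(round(total_mbs(100, 42.2), 1))  # 2 Xeon 5570 -> 2073.5
print(round(total_mbs(300, 49.0), 1))  # 8 Opt 8439 -> 5357.1
```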
Storage Targets

Processors     Cores  PCI-E x8 - x4  SAS HBA  Storage Units - Disks  Actual BW  BW/Core  Target MB/s  Target Units - Disks
2 Xeon X5680    12       5 - 1          2            2 - 48            5 GB/s     350       4,200           4 - 96
4 Opt 6176      48       5 - 1          4            4 - 96           10 GB/s     175       8,400           8 - 192
4 Xeon X7560    32       6 - 4          6            6 - 144          15 GB/s     250       8,000          12 - 288
8 Xeon X7560    64       9 - 5         11†          10 - 240          26 GB/s     225      14,400          20 - 480

† 8-way: 9 controllers in x8 slots, 24 disks per x4 SAS port; 2 controllers in x4 slots, 12 disks
24 15K disks per enclosure: 12 disks per x4 SAS port requires 100MB/sec per disk, possible but not always practical; 24 disks per x4 SAS port requires 50MB/sec, more achievable in practice
2U disk enclosure, 24 x 73GB 15K 2.5in disks: $14K, $600 per disk
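The Target MB/s column is simply the per-core bandwidth target times total cores; a sketch of the scaling:

```python
# Sketch: target bandwidth = per-core consumption target x total cores,
# for the four systems in the Storage Targets table.
targets = {
    "2 Xeon X5680": (12, 350),
    "4 Opt 6176":   (48, 175),
    "4 Xeon X7560": (32, 250),
    "8 Xeon X7560": (64, 225),
}
for system, (cores, mb_per_core) in targets.items():
    print(system, cores * mb_per_core)  # total target MB/s per system
```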
Think: Shortest path to metal (iron-oxide)
Your Storage and the Optimizer
Assumptions: 2.8GB/sec per SAS 2 x4 adapter (could be 3.2GB/sec per PCI-E G2 x8). HDD: 400 IOPS per disk – big query key lookup, loop join at high queue depth, short-stroked, possibly skip-seek. SSD: 35,000 IOPS.
Model      Disks  Sequential IOPS  BW (KB/s)   "Random" IOPS  Seq-Rand IO ratio
Optimizer    -         1,350          10,800          320           4.22
SAS 2x4     24       350,000       2,800,000        9,600          36.5
SAS 2x4     48       350,000       2,800,000       19,200          18.2
FC 4G       30        45,000         360,000       12,000           3.75
SSD          8       350,000       2,800,000      280,000           1.25
The SQL Server Query Optimizer makes key lookup versus table scan decisions based on a 4.22 sequential-to-random IO ratio. A DW-configured storage system has an 18-36 ratio; 30 disks per 4G FC port about matches the QO model, and SSD is in the other direction.
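The ratios in the table reduce to sequential IOPS divided by random IOPS; a sketch comparing the optimizer's internal model to the configurations above:

```python
# Sketch: sequential-to-random IO ratio, as compared in the table above.
def seq_rand_ratio(seq_iops, rand_iops):
    return seq_iops / rand_iops

print(round(seq_rand_ratio(1350, 320), 2))       # optimizer cost model -> 4.22
print(round(seq_rand_ratio(350000, 9600), 1))    # SAS 2x4, 24 disks -> 36.5
print(round(seq_rand_ratio(350000, 280000), 2))  # SSD -> 1.25
```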
Data Consumption Rates
[Charts: data consumption rates for TPC-H Queries 1, 9, 18, 21 at SF100 (X5355 5sp2, X5570 8sp1, X5680 8R2, O6176 8R2) and at SF300 (Opteron DC 2.8G 128 5rtm, QC 2.5G 256 8rtm, QC 2.7G 256 8rtm, 6C 2.8G 256 8sp1, 12C 2.3G 512 8R2, X7560 R2 640)]
Fast Track Reference Architecture
My complaints:
Several expensive SAN systems (11 disks each), each must be configured independently, $1,500-2,000 amortized per disk
Too many 2-disk arrays, 2 LUNs per array, too many data files
Build indexes with MAXDOP 1 – is this brain dead?
Designed around 100MB/sec per disk; not all DW is single scan, or sequential
Scripting?
Fragmentation
Weak storage system: 1) Fragmentation could degrade IO performance. 2) Defragmenting a very large table on a weak storage system could render the database marginally to completely non-functional for a very long time.
Powerful storage system: 3) Fragmentation has very little impact. 4) Defragmenting has mild impact, and completes within the night-time window.
What is the correct conclusion?
[Diagram: storage stack layers – Table, File, Partition, LUN, Disk]
Operating System View of Storage
Operating System Disk View
Controller 1 Port 0: Disk 2 – Basic, 396GB, Online
Controller 1 Port 1: Disk 3 – Basic, 396GB, Online
Controller 2 Port 0: Disk 4 – Basic, 396GB, Online
Controller 2 Port 1: Disk 5 – Basic, 396GB, Online
Controller 3 Port 0: Disk 6 – Basic, 396GB, Online
Controller 3 Port 1: Disk 7 – Basic, 396GB, Online
Additional disks not shown; Disk 0 is the boot drive, Disk 1 the install source?
File Layout
Disks 2-7 are laid out identically; file N of each file group resides on disk N+1 (Files 1-6 across Disks 2-7):
Partition 0: File Group for the big Table – Files 1-6
Partition 1: File Group for all other (small) tables – Files 1-6
Partition 2: Tempdb – Files 1-6
Partition 4: Backup and Load – Files 1-6
Each File Group is distributed across all data disks
Log disks not shown; tempdb shares a common pool with data
File Groups and Files
Dedicated File Group for largest table
Never defragment
One file group for all other regular tables
Load file group? Rebuild indexes to a different file group
Partitioning - Pitfalls
Common Partitioning Strategy
Partition Scheme maps partitions to File Groups
What happens in a table scan? Read first from Part 1 then 2, then 3, … ?
SQL 2008 HF to read from each partition in parallel? What if partitions have disparate sizes?
Disk 2 / File Group 1 – Table Partition 1
Disk 3 / File Group 2 – Table Partition 2
Disk 4 / File Group 3 – Table Partition 3
Disk 5 / File Group 4 – Table Partition 4
Disk 6 / File Group 5 – Table Partition 5
Disk 7 / File Group 6 – Table Partition 6