Oracle Exadata Explained
UKOUG Technology and E-Business suite 2011
Frits Hoogland
Friday, June 22, 12
Who am I?
Frits Hoogland
– Working with Oracle products since 1996
– Working with VX Company since 2009
Interests
– Databases, Operating Systems, Application Servers
– Web techniques, TCP/IP, network security
– Technical security, performance
Twitter: @fritshoogland
Blog: http://fritshoogland.wordpress.com
Email: [email protected]
Oracle ACE Director
OakTable member
Agenda
Architecture
Specifications
Physical properties
The tale of scanning a table on Exadata
Conclusion
[Figure: rack layout showing database nodes dm01db01 to dm01db04 and storage cells dm01cel01 to dm01cel07]
Architecture
[Diagram: four database nodes (upper half) and seven storage nodes (lower half), connected through two Infiniband switches]
Architecture
Switched fabric communications link
– High throughput, low latency
Mellanox ConnectX QDR HCA
– QDR 4x: 40Gb/s each direction
– RDMA
– End-to-end latency 1.07 microseconds
Architecture
Upper half / database nodes
– Sun hardware, Intel architecture, 64 bit
– OEL5, 64 bit
  - TCP/IP, ssh, bash
– Oracle database 11.2.0.x
  - Listener, sqlplus, instance
  - A normal (RAC) database
  - PX, partitioning
– ASM
– Clusterware
Architecture
Lower half / storage nodes
– Sun hardware, Intel architecture, 64 bit
– OEL5, 64 bit
  - TCP/IP, ssh, bash
  - No modifications/installations permitted
– Storage server
  - Cell server
  - Dedicated storage for Oracle databases
  - Ability to offload database processing
Specifications
2008: V1
2009: V2
2010: X2-2 & X2-8
?
Specifications / V2
Database: Sun X4170 (1U)
– 2 Quad-core Intel Xeon E5540 'Nehalem'
– Infiniband QDR dual port card
– 72 GB DRAM
– 4x 1Gb NIC
– ILOM
Storage: Sun X4275 (2U)
– 2 Quad-core Intel Xeon E5540 'Nehalem'
– Infiniband QDR dual port card
– 24 GB DRAM
– 4x 1Gb NIC
– 4x 96GB Flash PCIe cards
– ILOM
Specifications / X2-2
Database: Sun X4170M2 (1U)
– 2 Six-core Intel Xeon X5670 'Westmere'
– Infiniband QDR dual port card
– 96 GB DRAM
– 4x 1Gb NIC + 2x 10Gb NIC
– ILOM
Storage: Sun X4270M2 (2U)
– 2 Six-core Intel Xeon L5640 'Westmere'
– Infiniband QDR dual port card
– 24 GB DRAM
– 4x 1Gb NIC
– 4x 96GB Flash PCIe cards
– ILOM
Specifications / X2-8
Database: Sun X4800 (5U)
– 8 Eight-core Intel Xeon X7560 'Nehalem-EX'
– 8 Infiniband QDR
– 1 TB DRAM
– 8x 1Gb NIC + 8x 10Gb NIC
– ILOM
Storage: Sun X4270M2 (2U)
– 2 Six-core Intel Xeon L5640 'Westmere'
– Infiniband QDR dual port card
– 24 GB DRAM
– 4x 1Gb NIC
– 4x 96GB Flash PCIe cards
– ILOM
ASM
Storage is accessed using ASM
– Database knows it's Exadata by looking at the path
– Normal or high redundancy
SQL> select path from v$asm_disk;
PATH
-------------------------------------
o/192.168.100.5/DATA_CD_00_dm01cel01
...
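The path in the query output has three slash-separated parts. As a minimal sketch, assuming the layout is o/<cell IP address>/<grid disk name> (the function name and dictionary keys below are made up for illustration), it could be taken apart like this:

```python
def parse_exadata_path(path):
    """Split an Exadata ASM disk path into cell address and grid disk name.

    Assumes the layout o/<cell ip>/<grid disk name>; the helper itself is
    hypothetical and not part of any Oracle tool.
    """
    prefix, cell_ip, grid_disk = path.split("/", 2)
    if prefix != "o":
        raise ValueError("not an Exadata (o/...) path: " + path)
    return {"cell_ip": cell_ip, "grid_disk": grid_disk}


# Path taken from the v$asm_disk output on the slide.
info = parse_exadata_path("o/192.168.100.5/DATA_CD_00_dm01cel01")
print(info["cell_ip"], info["grid_disk"])
```

The leading "o" is what distinguishes this path from an ordinary device path such as a /dev entry.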
ASM
How does ASM know how to find 'grid disks'?
[oracle@dm01db01 [+ASM1] ~]$ cat $ORACLE_HOME/gpnp/profiles/peer/profile.xml
...
<orcl:ASM-Profile id="asm" DiscoveryString="o/*/*" SPFile="+SYSTEMDG/dm01-cluster/asmparameterfile/registry.253.726594217"/>
...
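To illustrate how a wildcard discovery string such as o/*/* selects grid disk paths, here is a rough sketch that uses Python's shell-style pattern matching as a stand-in. It is an assumption that this approximates ASM's own matching rules, and the second candidate path is hypothetical:

```python
from fnmatch import fnmatchcase

discovery_string = "o/*/*"  # value from the ASM-Profile shown above

candidates = [
    "o/192.168.100.5/DATA_CD_00_dm01cel01",  # path from the v$asm_disk slide
    "/dev/sdb1",                             # hypothetical non-Exadata device
]

# Keep only the paths that the wildcard pattern accepts.
matches = [p for p in candidates if fnmatchcase(p, discovery_string)]
print(matches)
```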
Physical properties / disk
Disks in cells:
CellCLI> list physicaldisk detail
         name:                20:0
         deviceId:            21
         diskType:            HardDisk
         enclosureDeviceId:   20
         errMediaCount:       0
         errOtherCount:       0
         foreignState:        false
         id:                  E0LB2J
         luns:                0_0
         makeModel:           "SEAGATE ST360057SSUN600G"
         physicalFirmware:    0605
         physicalInsertTime:  2010-07-20T11:36:31+02:00
         physicalInterface:   sas
         physicalSerial:      E0LB2J
         physicalSize:        558.9109999993816G
         ...
Physical properties / disk
This means:
– Disk type: SEAGATE ST360057S
  - Seagate Cheetah 15K.7 disk with SAS interface
  - Read IOPS=186
– Theoretical throughput:
  - Single disk = 186MB/s (186x1MB)
  - Cell has 12 disks: 12x186 = 2.18GB/s
– Observed in real life:
  - Approximately 125MB/s per disk
  - Cell has 12 disks: 12x125 = 1.5GB/s
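The disk arithmetic above can be reproduced in a few lines. This is only a sketch of the slide's calculation; the constant names are made up, and the division by 1024 is an assumption about how the 2.18GB/s figure was obtained:

```python
DISKS_PER_CELL = 12
READ_IOPS = 186           # per disk, from the slide
IO_SIZE_MB = 1            # each IO is assumed to be 1MB
OBSERVED_MB_PER_DISK = 125

theoretical_disk_mb_s = READ_IOPS * IO_SIZE_MB                   # 186 MB/s
theoretical_cell_mb_s = DISKS_PER_CELL * theoretical_disk_mb_s   # 2232 MB/s
observed_cell_mb_s = DISKS_PER_CELL * OBSERVED_MB_PER_DISK       # 1500 MB/s

print(round(theoretical_cell_mb_s / 1024, 2))  # 2.18 (GB/s, 1GB = 1024MB)
print(observed_cell_mb_s)                      # 1500 (MB/s, the slide's 1.5GB/s)
```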
Physical properties / Flash
Typically flash cache stores single blocks
– In other words: blocks which the cell server considers worth caching
  - Controlfiles, fileheaders, small IOs (<128kB)
– These are mostly 8kB blocks
– 1MB blocks if CELL_FLASH_CACHE is set to KEEP
8kB block (flashcache) performance:
– 60K IOPS * 4 = 1.8 GB/s per cell
1MB block (flashdisk) performance:
– 1,092 IOPS * 4 = 4.3 GB/s per cell
– Oracle flash whitepaper says 3.6 GB/s
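The flash figures follow the same pattern as the disk figures. The sketch below assumes the factor 4 stands for the four flash PCIe cards per storage server listed in the specifications, and that 1GB = 1024MB; the variable names are illustrative only:

```python
FLASH_CARDS_PER_CELL = 4  # assumption: the "* 4" on the slide is the card count

# Flash cache: 60K IOPS of 8kB blocks per card, converted to MB/s.
cache_mb_s = 60_000 * 8 / 1024 * FLASH_CARDS_PER_CELL

# Flash disk: 1,092 IOPS of 1MB blocks per card.
flashdisk_mb_s = 1_092 * 1 * FLASH_CARDS_PER_CELL

print(round(cache_mb_s / 1024, 2))      # 1.83 GB/s per cell (slide: 1.8)
print(round(flashdisk_mb_s / 1024, 2))  # 4.27 GB/s per cell (slide: 4.3)
```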
Physical properties / Flash
Combining disk and flash a single cell can generate:
– Disk: 1.5 GB/s (theoretical 2.1GB/s)
– Flash: ~1.8 GB/s (cache)
– Flash: 3.5 GB/s (disk, theoretical: 4.3GB/s)
– 1.5+1.8 = 3.3 GB/s (with flash as cache)
– 1.5+3.5 = 5.0 GB/s (with flash as disk)
Physical properties / IB
Server
– PCIe 2.0 bus: 500MB/s per lane
  - Disk controller can use 8 lanes (8x500MB = 4GB/s)
  - Flash card can use 8 lanes
  - Infiniband card can use 8 lanes
Infiniband
– Theoretical bandwidth 40Gb/s (5GB/s) up & down
– Point-to-point bandwidth 2.5GB/s (one way)
  - Network traffic is bi-directional, so total bandwidth will exceed 2.5GB/s
  - See 'Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate Infiniband':
    www.hoti.org/archive/2008papers/2008_S3_3.pdf
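The bus and network numbers above are simple unit conversions. A minimal sketch, using only the figures from the slide (variable names are illustrative):

```python
PCIE2_MB_PER_LANE = 500
LANES = 8
pcie_x8_mb_s = PCIE2_MB_PER_LANE * LANES       # 4000 MB/s, the slide's 4GB/s

IB_GBIT_S = 40
ib_theoretical_gbyte_s = IB_GBIT_S / 8         # 5.0 GB/s (8 bits per byte)

IB_POINT_TO_POINT_GBYTE_S = 2.5                # one-way figure from the slide
share = IB_POINT_TO_POINT_GBYTE_S / ib_theoretical_gbyte_s

print(pcie_x8_mb_s, ib_theoretical_gbyte_s, share)
```

In this reading, the point-to-point figure is half of the theoretical Infiniband bandwidth.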
Physical properties / Bandwidth
[Diagram: four database nodes (upper half) and seven storage nodes (lower half), connected through two Infiniband switches]
Infiniband: 2.5 GB/s
Storage data generation capacity:
– Disk: 1.5 GB/s (2.1 GB/s)
– Flash: 1.8 GB/s (4.3 GB/s)
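One way to read the bandwidth picture is to compare what a single cell can generate with what one Infiniband link can carry. The sketch below does that with the observed figures; treating each cell as limited by a single 2.5 GB/s link is an assumption, not something the slides state:

```python
IB_LINK_GB_S = 2.5      # point-to-point Infiniband figure from the slides
DISK_GB_S = 1.5         # observed disk throughput per cell
FLASH_CACHE_GB_S = 1.8  # flash cache throughput per cell


def deliverable(generated_gb_s, link_gb_s=IB_LINK_GB_S):
    """Throughput that reaches the database node: the smaller of the two."""
    return min(generated_gb_s, link_gb_s)


disk_only = deliverable(DISK_GB_S)                         # disk is the limit
disk_plus_cache = deliverable(DISK_GB_S + FLASH_CACHE_GB_S)  # link is the limit

print(disk_only, disk_plus_cache)
```

Under these assumptions a cell that ships raw blocks from disk and flash together would be limited by the link rather than by storage.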
The tale of scanning a table on Exadata
Data is a sample DNA variations set
Table: CG_VAR
– Size: 133'425'004'544
– Extents: 2'228
– Blocks: 16'287'232
No indexes
No constraints
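The table figures are consistent with each other, which a short calculation shows. This sketch assumes the size is given in bytes; the variable names are illustrative:

```python
size_bytes = 133_425_004_544
blocks = 16_287_232

block_size = size_bytes // blocks      # bytes per block
remainder = size_bytes % blocks        # 0 if the numbers agree exactly
ios_of_1mb = size_bytes // (1024 * 1024)

print(block_size)   # 8192, i.e. 8kB blocks
print(ios_of_1mb)   # 127244 reads if every IO is exactly 1MB
```

The 1MB read count is close to the 127,238 IOs reported in the test results later in the deck.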
The tale of scanning a table on Exadata
Exadata has 4 unique features:
– IORM
– Storage indexes
– Smart scans
– Exadata Hybrid Columnar Compression (EHCC)
The tale of scanning a table on Exadata
Smart scan
– Disable smart scans:
SQL> alter session set cell_offload_processing=false;
– Enable smart scans:
SQL> alter session set cell_offload_processing=true;
– Smart scans are enabled by default on Exadata.
– A smart scan returns RESULT SETS instead of blocks
  - Most result sets are much smaller than the data in blocks
  - Depending on the query
The tale of scanning a table on Exadata
Smart scan
– Smart scans are considered for:
  - Full table scans
  - Fast full index scans
– Observed working of smart scan:
  - Foreground identifies cell servers needed for the object.
  - Foreground initiates a send and a receive channel to every cell server.
  - Foreground sends smart scan requests (enough to keep receiving).
  - Foreground receives data.
– This way there is no disk latency penalty.
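The observed flow (send enough requests to keep receiving) resembles a pipeline with a few requests in flight per cell. The following toy model illustrates that idea only; the function, the chunk numbering, and the max_outstanding limit are hypothetical and say nothing about Oracle's actual implementation:

```python
from collections import deque


def smart_scan(cells, chunks_per_cell, max_outstanding=2):
    """Toy model: keep up to max_outstanding requests in flight per cell."""
    pending = {c: deque(range(chunks_per_cell)) for c in cells}  # not yet sent
    in_flight = {c: deque() for c in cells}                      # sent, not received
    received = []
    while any(pending[c] or in_flight[c] for c in cells):
        for c in cells:
            # Send requests until enough are outstanding to keep receiving.
            while pending[c] and len(in_flight[c]) < max_outstanding:
                in_flight[c].append(pending[c].popleft())
            # Receive one result set from this cell, if one is in flight.
            if in_flight[c]:
                received.append((c, in_flight[c].popleft()))
    return received


result = smart_scan(["dm01cel01", "dm01cel02"], chunks_per_cell=3)
print(result)
```

Because the next request is already queued at the cell while a result is being received, the foreground does not have to wait for a full request round trip per chunk, which is the point the slide makes about latency.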
The tale of scanning a table on Exadata
Exadata Hybrid Columnar Compression (EHCC)
– Data is ordered in 'compression units'.
– Compression unit consists of multiple blocks.
– Two types of EHCC:
  - Query mode (less compression, makes queries go faster)
  - Archive mode (most compression, saves as much space as possible)
The tale of scanning a table on Exadata
In all cases the following SQL is executed:
SQL> select count(*) from cg_var;
The query DOP is altered using the 'parallel' hint
– Query looks like:
SQL> select /*+parallel(64)*/ count(*) from cg_var;
The tale of scanning a table on Exadata
Let's have a look at the tables involved!
Nr  Exadata features  Parallel  Disk type
1   -                 Serial    HDD
2   -                 Serial    FDD
3   -                 64        HDD
4   -                 64        FDD
5   SS                Serial    HDD
6   SS                Serial    FDD
7   SS                64        HDD
8   SS                64        FDD
9   SS + EHCC         64        FDD
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
1   -                 Serial    HDD
Response time: 695 seconds
– All 130GB is read and sent to upper layer
– 127,238 IOs
– Time profile:
  direct path read   652
  DB CPU              45
  total time         695
– Restriction: Disk (652/127238=0.005)
Oracle database 11.2.0.1 and Oracle cell 11.2.1.3.1
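The restriction figure on this slide is the average wait per IO. A small sketch of that calculation, plus the implied scan rate, assuming the times are in seconds and using the table size in bytes from the CG_VAR slide:

```python
direct_path_read_s = 652
ios = 127_238
response_time_s = 695
table_bytes = 133_425_004_544

per_io_ms = direct_path_read_s / ios * 1000          # average wait per IO
scan_mb_s = table_bytes / response_time_s / 1e6      # decimal megabytes per second

print(round(per_io_ms, 1))  # 5.1 ms per IO
print(round(scan_mb_s))     # 192 MB/s for the whole serial scan
```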
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
1b  -                 Serial    HDD
Response time: 153 seconds
– All 130GB is read and sent to upper layer
– 127,238 IOs
– Time profile:
  direct path read   108
  DB CPU              45
  total time         153
– Restriction: Disk
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
2   -                 Serial    FDD
Response time: 403 seconds
– All 130GB is read and sent to upper layer
– 127,238 IOs
– Time profile:
  direct path read   359
  DB CPU              44
  total time         403
– Restriction: Disk (359/127238=0.002)
Oracle database 11.2.0.1 and Oracle cell 11.2.1.3.1
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
2b  -                 Serial    FDD
Response time: 91 seconds
– All 130GB is read and sent to upper layer
– 127,238 IOs
– Time profile:
  direct path read    41
  DB CPU              50
  total time          91
– Restriction: Even between disk and CPU.
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
3   -                 64        HDD
Response time: 18 seconds
– All 130GB is read and sent to upper layer
– Time profile:
  direct path read   256
  DB CPU              16
  total time         272
– Restriction: Disk
  - 130GB / 4 nodes = 32.5GB per node; 32.5GB / 18s = 1.81GB/s per node
  - With 4Gb FC: 32.5GB / 0.4GB/s = 81.25s
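The restriction line on this slide packs two calculations together. The sketch below spells them out, assuming the 130GB is spread evenly over the four database nodes and that a 4Gb Fibre Channel link delivers about 0.4GB/s:

```python
TABLE_GB = 130
DB_NODES = 4
RESPONSE_TIME_S = 18
FC_4GB_GBYTE_S = 0.4   # assumed usable rate of a 4Gb FC link

gb_per_node = TABLE_GB / DB_NODES                  # 32.5 GB per node
gb_s_per_node = gb_per_node / RESPONSE_TIME_S      # rate each node received
fc_time_s = gb_per_node / FC_4GB_GBYTE_S           # same volume over 4Gb FC

print(gb_per_node, round(gb_s_per_node, 2), round(fc_time_s, 2))
```

In this reading, the same scan over a single 4Gb FC link per node would take more than four times as long.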
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
4   -                 64        FDD
Response time: 13 seconds
– All 130GB is read and sent to upper layer
– Time profile:
  direct path read   182
  DB CPU              18
  total time         200
– Restriction: Disk
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
5   SS                Serial    HDD
Response time: 45 seconds (70% reduction)
– All 130GB is read and 19 GB sent to upper layer
– Time profile:
  cell smart table scan    8
  DB CPU                  37
  total time              45
– Restriction: CPU
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
6   SS                Serial    FDD
Response time: 41 seconds (55% reduction)
– All 130GB is read and 19 GB sent to upper layer
– Time profile:
  cell smart table scan    3
  DB CPU                  38
  total time              41
– Restriction: CPU
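The reduction percentages on the serial smart scan slides appear to compare each run with the corresponding run without smart scan (1b and 2b). That baseline is an inference from the numbers, not something the slides state; a quick check:

```python
def reduction_pct(before_s, after_s):
    """Percentage by which the response time dropped."""
    return (before_s - after_s) / before_s * 100


serial_hdd = reduction_pct(153, 45)  # test 1b versus test 5
serial_fdd = reduction_pct(91, 41)   # test 2b versus test 6

print(round(serial_hdd, 1))  # 70.6, quoted as 70% on the slide
print(round(serial_fdd, 1))  # 54.9, quoted as 55% on the slide
```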
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
7   SS                64        HDD
Response time: 13 seconds (28% reduction)
– All 130GB is read and 19 GB sent to upper layer
– Time profile:
  cell smart table scan  168
  DB CPU                  14
  total time             182
– Restriction: Disk
  - 130GB / 7 cells = 18.6GB per cell; 18.6GB / 13s = 1.43GB/s per cell
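The per-cell rate on this slide can be set against the observed disk throughput of a cell from the physical properties slides. A sketch, assuming the data is spread evenly over the seven storage cells:

```python
TABLE_GB = 130
CELLS = 7
RESPONSE_TIME_S = 13
OBSERVED_CELL_DISK_GB_S = 1.5   # 12 disks x ~125MB/s, from the disk slide

gb_per_cell = TABLE_GB / CELLS
gb_s_per_cell = gb_per_cell / RESPONSE_TIME_S
disk_utilisation = gb_s_per_cell / OBSERVED_CELL_DISK_GB_S

print(round(gb_per_cell, 1), round(gb_s_per_cell, 2), round(disk_utilisation, 2))
```

Each cell is reading at close to its observed disk limit, which is consistent with the slide naming disk as the restriction.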
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
8   SS                64        FDD
Response time: 6 seconds (46% reduction)
– All 130GB is read and 19 GB sent to upper layer
– Time profile:
  cell smart table scan   65
  DB CPU                  14
  total time              79
– Restriction: Disk
  - 130GB/6=21.6GB/7=3.1GB/s
The tale of scanning a table on Exadata
Nr  Exadata features  Parallel  Disk type
9   SS + EHCC         64        FDD
Response time: 1 second
– EHCC Query compression: 130GB is reduced to 11GB
– All 11GB is read and 260MB sent to upper layer
– Time profile:
  cell smart table scan    4
  DB CPU                  10
  total time              12
– Restriction: CPU
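The effect of EHCC plus smart scan in this test can be expressed as two ratios. A rough sketch using the rounded figures from the slide (1GB is taken as 1000MB here, which is an assumption):

```python
UNCOMPRESSED_GB = 130
COMPRESSED_GB = 11
RETURNED_MB = 260

compression_ratio = UNCOMPRESSED_GB / COMPRESSED_GB           # how much smaller on disk
returned_fraction = RETURNED_MB / (COMPRESSED_GB * 1000)      # share sent to the database

print(round(compression_ratio, 1))        # 11.8
print(round(returned_fraction * 100, 1))  # 2.4 (percent of what was read)
```

Less data has to be read from flash, and only a small part of that crosses the Infiniband network, which fits the slide's remark that CPU is the remaining restriction.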
[Bar chart: response time in seconds (axis 0 to 700) per configuration]
           Serial HDD  Serial FDD  PX64 HDD  PX64 FDD
Old        695         403         19        16
New        153         91          18        13
SS         41          37          13        6
SS+EHCC                                      1
Conclusion
Exadata hardware can give a performance boost
Infiniband removes the 1Gb/s or FC bottleneck
Exadata features can give a huge performance boost
– Smart scans
– Exadata Hybrid Columnar Compression (EHCC)
– Storage indexes
Performance tuning is a delicate task