Scalable, Fault-Tolerant NAS for Oracle - The Next Generation Kevin Closson Chief Software Architect Oracle Platform Solutions, Polyserve Inc

Dec 29, 2015

Page 1

Scalable, Fault-Tolerant NAS for Oracle - The Next Generation

Kevin Closson

Chief Software Architect

Oracle Platform Solutions, Polyserve Inc

Page 2

The Un-“Show Stopper”

• NAS for Oracle is not “file serving”; let me explain…

• Think of the GbE NFS I/O paths from the Oracle servers to the NAS device as totally direct, with no VLAN-style indirection.

– In these terms, NFS over GbE is just a protocol, as is FCP over FibreChannel

– The proof is in the numbers.

• A single dual-socket/dual-core AMD server running Oracle10gR2 can push 273MB/s of large I/Os (scattered reads, direct path read/write, etc.) through triple-bonded GbE NICs!

• Compare that to the infrastructure and HW costs of 4Gb FCP (~450MB/s, but you need 2 cards for redundancy)

– OLTP over modern NFS with GbE is not a challenging I/O profile.

• However, not all NAS devices are created equal, by any means
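A quick back-of-the-envelope check of the 273MB/s figure. This is a sketch that assumes a raw GbE line rate of 1,000 Mbit/s (125 MB/s, decimal) per NIC and ignores Ethernet, IP, TCP and RPC framing overhead, so the true ceiling is somewhat lower than the one computed here. The helper names are hypothetical.

```python
# Back-of-the-envelope utilization of bonded GbE links.
# Assumption: 1 GbE = 1,000 Mbit/s = 125 MB/s (decimal) raw line rate per NIC,
# ignoring protocol overhead (Ethernet/IP/TCP/RPC framing).

GBE_MB_PER_SEC = 1000 / 8  # 125.0 MB/s per link, raw


def bonded_capacity_mb(links: int) -> float:
    """Raw aggregate capacity of `links` bonded GbE NICs, in MB/s."""
    return links * GBE_MB_PER_SEC


def utilization(observed_mb: float, links: int) -> float:
    """Fraction of the raw bonded capacity actually achieved."""
    return observed_mb / bonded_capacity_mb(links)


if __name__ == "__main__":
    observed = 273.0  # MB/s quoted on the slide for triple-bonded GbE
    print(f"raw capacity: {bonded_capacity_mb(3):.0f} MB/s")
    print(f"utilization:  {utilization(observed, 3):.0%}")
```

On these assumptions, 273MB/s is roughly 73% of the raw line rate of three bonded links.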

Page 3

Agenda

• Oracle on NAS

• NAS Architecture

• Proof of Concept Testing

• Special Characteristics

Page 4

Oracle on NAS

Page 5

Oracle on NAS

• Connectivity

– Fantasyland Dream Grid™ would be nearly impossible with a FibreChannel switched fabric; for instance:

• 128 nodes == 256 HBAs, plus 2 switches each with 256 ports just for the servers, and then you have to work out storage paths

• Simplicity

– NFS is simple. Anyone with a pulse can plug in Cat-5 and mount filesystems.

– MUCH MUCH MUCH MUCH MUCH simpler than:

• Raw partitions for ASM

• Raw or OCFS2 for CRS

• Oracle Home? Local Ext3 or UFS?

• What a mess

– Supports shared Oracle Home, and shared APPL_TOP too

– But not simpler than a Certified Third Party Cluster Filesystem; that is a different presentation

• Cost

– FC HBAs are always going to be more expensive than NICs

– Ports on enterprise-level FC switches are very expensive

Page 6

Oracle on NAS

• NFS Client Improvements

– Direct I/O

• open() with the O_DIRECT flag works with the Linux and Solaris NFS clients, and likely others

• Oracle Improvements

– init.ora: filesystemio_options=directIO

– No async I/O on NFS, but look at the numbers

– Oracle runtime checks mount options

• Caveat: it doesn’t always get it right, but at least it tries (OSDS)

– Don’t be surprised to see Oracle offer a platform-independent NFS client

• NFS V4 will have more improvements
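The runtime mount-option check can be pictured roughly as follows. This is a hypothetical sketch, not Oracle's actual logic: the `REQUIRED_OPTIONS` set is an assumption loosely modeled on the kind of options commonly documented for Linux NFS mounts holding Oracle datafiles (hard mounts, TCP, NFSv3, no attribute caching). Real requirements vary by platform, Oracle release and file type, so check Oracle's documentation for your platform. The input string is assumed to look like the option field of a `/proc/mounts` entry.

```python
# Hypothetical sketch of a mount-option check for an NFS filesystem holding
# Oracle datafiles. REQUIRED_OPTIONS is an illustrative assumption, not Oracle's list.
from typing import Dict, List

REQUIRED_OPTIONS: Dict[str, str] = {
    "hard": "",      # flag-style option: must simply be present
    "proto": "tcp",
    "vers": "3",
    "actimeo": "0",  # disable attribute caching
}


def parse_options(opts: str) -> Dict[str, str]:
    """Turn 'rw,hard,vers=3' into {'rw': '', 'hard': '', 'vers': '3'}."""
    parsed: Dict[str, str] = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def missing_options(opts: str) -> List[str]:
    """Return required options that are absent or set to a different value."""
    parsed = parse_options(opts)
    problems = []
    for key, want in REQUIRED_OPTIONS.items():
        if parsed.get(key) != want:
            problems.append(f"{key}={want}" if want else key)
    return problems


if __name__ == "__main__":
    good = "rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,timeo=600,actimeo=0"
    bad = "rw,soft,rsize=32768,wsize=32768,vers=3"
    print(missing_options(good))  # []
    print(missing_options(bad))   # ['hard', 'proto=tcp', 'actimeo=0']
```

A real check would also need to know which file type lives on the mount, since the caveat above notes the runtime check does not always get it right.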

Page 7

NAS Architecture

Page 8

NAS Architecture

• Single-headed Filers

• Clustered Single-headed Filers

• Asymmetrical Multi-headed NAS

• Symmetrical Multi-headed NAS

Page 9

Single-headed Filer Architecture

Page 10

NAS Architecture: Single-headed Filer

Filesystems: /u01, /u02, /u03

GigE Network

Page 11

Oracle Servers Accessing a Single-headed Filer: I/O Bottleneck

Oracle Database Servers

Filesystems: /u01, /u02, /u03

A single one of these (database servers) has the same (or more) bus bandwidth as this (filer)!

I/O Bottleneck

Page 12

Oracle Servers Accessing a Single-headed Filer: Single Point of Failure

Oracle Database Servers

Filesystems: /u01, /u02, /u03

Single Point of Failure

Highly available through failover HA, Data Guard, RAC, etc.

Page 13

Clustered Single-headed Filers

Page 14

Architecture: Cluster of Single-headed Filers

Filesystems: /u01, /u02

Filesystems: /u03

Paths Active After Failover

Page 15

Oracle Servers Accessing a Cluster of Single-headed Filers

Filesystems: /u01, /u02

Filesystems: /u03

Paths Active After Failover

Oracle Database Servers

Page 16

Architecture: Cluster of Single-headed Filers

Filesystems: /u01, /u02

Filesystems: /u03

Paths Active After Failover

Oracle Database Servers

What if /u03 I/O saturates this Filer?

Page 17

Filer I/O Bottleneck. Resolution == Data Migration

Filesystems: /u01, /u02

Filesystems: /u03

Paths Active After Failover

Oracle Database Servers

Filesystems: /u04

Migrate some of the “hot” data to /u04

Page 18

Data Migration Remedies I/O Bottleneck

Filesystems: /u01, /u02

Filesystems: /u03

Paths Active After Failover

Oracle Database Servers

Filesystems: /u04

Migrate some of the “hot” data to /u04

NEW Single Point of Failure

Page 19

Summary: Single-headed Filers

• Cluster to mitigate S.P.O.F.

– Clustering is a pure afterthought with filers

– Failover times?

• Long, really really long.

– Transparent?

• Not in many cases.

• Migrate data to mitigate I/O bottlenecks

– What if the data “hot spot” moves with time? The Dog Chasing His Tail Syndrome

• Poor modularity

– Expanded by pairs for data availability

• What’s all this talk about CNS?

Page 20

Asymmetrical Multi-headed NAS Architecture

Page 21

Asymmetrical Multi-headed NAS Architecture

FibreChannel SAN

Three Active NAS Heads / Three for Failover and “Pools of Data”

Note: Some variants of this architecture support M:1 Active:Standby, but that doesn’t really change much.

Oracle Database Servers

SAN Gateway

Page 22

Asymmetrical NAS Gateway Architecture

• Really not much different than clusters of single-headed filers:

– 1 NAS head to 1 filesystem relationship

– Migrate data to mitigate I/O contention

– Failover not transparent

• But:

– More Modular

• Not necessary to scale up by pairs
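The 1 NAS head to 1 filesystem relationship puts a hard ceiling on any single filesystem. A minimal model, assuming every head delivers the same throughput and that in the symmetric case every head can serve every filesystem, as the later EFS-CG slides describe. The head names, the ownership map and the 123 MB/s per-head figure (borrowed from the single-head row of the later scalability table) are illustrative assumptions.

```python
# Minimal model: which NAS heads can serve a given filesystem, and what that
# implies for its throughput ceiling. Equal per-head throughput is assumed.
from typing import Dict, List

HEADS: List[str] = ["head1", "head2", "head3"]  # hypothetical head names
OWNER: Dict[str, str] = {"/u01": "head1", "/u02": "head2", "/u03": "head3"}


def serving_heads(fs: str, symmetric: bool) -> List[str]:
    """Asymmetric: only the owning head exports fs. Symmetric: every head can."""
    if symmetric:
        return list(HEADS)
    return [OWNER[fs]]


def fs_ceiling_mb(fs: str, symmetric: bool, per_head_mb: float = 123.0) -> float:
    """Upper bound on throughput to one filesystem, in MB/s."""
    return len(serving_heads(fs, symmetric)) * per_head_mb


if __name__ == "__main__":
    for mode in (False, True):
        label = "symmetric" if mode else "asymmetric"
        print(label, fs_ceiling_mb("/u03", mode))
```

In the asymmetric case the only way to raise the ceiling on a hot filesystem is to migrate part of its data to a filesystem owned by another head, which is the same treadmill described for clustered single-headed filers.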

Page 23

Symmetric Multi-headed NAS

Page 24

HP Enterprise File Services Clustered Gateway

Page 25

Symmetric vs Asymmetric

[Diagram: three NAS heads serving /Dir1/File1, /Dir2/File2 and /Dir3/File3 in each configuration; the symmetric configuration is labeled EFS-CG]

Page 26

Enterprise File Services Clustered Gateway Component Overview

• Cluster Volume Manager

– RAID 0

– Expand online

• Fully Distributed, Symmetric Cluster Filesystem

– The embedded filesystem is a fully distributed, symmetric cluster filesystem

• Virtual NFS Services

– Filesystems are presented through Virtual NFS Services

• Modular and Scalable

– Add NAS heads without interruption

– All filesystems can be presented for read/write through any/all NAS heads

Page 27

EFS-CG Clustered Volume Manager

• RAID 0

– LUNs are RAID 1, so this implements S.A.M.E. (Stripe And Mirror Everything)

• Expand online

– Add LUNs, grow volume

• Up to 16TB

– Single volume
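Striping the volume (RAID 0) across LUNs that are themselves mirrored (RAID 1) is what makes this S.A.M.E. Below is a sketch of the address arithmetic a RAID 0 layer performs, assuming a fixed stripe unit, equal-sized LUNs and round-robin placement. The 1 MB stripe unit and the four-LUN example are assumptions, not documented EFS-CG parameters; mirroring happens below this layer, inside each LUN.

```python
# Sketch of RAID 0 address mapping: volume byte offset -> (LUN index, LUN offset).
# Assumptions: fixed stripe unit, equal-sized LUNs, round-robin striping.
from typing import Tuple

STRIPE_UNIT = 1024 * 1024  # 1 MB stripe unit (assumed)


def map_offset(volume_offset: int, num_luns: int,
               stripe_unit: int = STRIPE_UNIT) -> Tuple[int, int]:
    """Return (lun_index, offset_within_lun) for a byte offset in the volume."""
    stripe_number = volume_offset // stripe_unit  # which stripe unit overall
    within_unit = volume_offset % stripe_unit     # offset inside that unit
    lun = stripe_number % num_luns                # round-robin across LUNs
    row = stripe_number // num_luns               # full stripes before this one
    return lun, row * stripe_unit + within_unit


if __name__ == "__main__":
    MB = 1024 * 1024
    for off in (0, 1 * MB, 4 * MB + 10, 5 * MB):
        print(off, map_offset(off, num_luns=4))
```

Growing the volume online by adding LUNs changes `num_luns`, so a real volume manager must either restripe existing data or track the layout per extent; this sketch does not model that.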

Page 28

The EFS-CG Filesystem

• All NAS devices have embedded operating systems and file systems, but the EFS-CG is:

– Fully symmetric

• Distributed Lock Manager

• No metadata server or lock server

– A general-purpose clustered file system

– Standard C Library and POSIX support

– Journaled, with online recovery

• Proprietary format, but uses standard Linux file system semantics and system calls, including flock() and fcntl() clusterwide

• Expand a single filesystem online up to 16TB; up to 254 filesystems in the current release
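Because the filesystem honours flock() and fcntl() clusterwide, ordinary POSIX-style locking code should need no cluster-specific changes. A minimal sketch using Python's standard `fcntl` module (Unix only): on a local filesystem it coordinates processes on one host, and the slide's claim is that the same calls coordinate across all nodes on the EFS-CG. The lock-file name is hypothetical.

```python
# Non-blocking exclusive lock with flock(), via Python's standard fcntl module.
import fcntl
import os
import tempfile


def lock_exclusive_nonblocking(path: str) -> int:
    """Open path and take an exclusive flock() without waiting. Returns the fd.

    Raises BlockingIOError if another open file description holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def unlock(fd: int) -> None:
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "efscg_demo.lock")  # hypothetical
    first = lock_exclusive_nonblocking(path)
    try:
        lock_exclusive_nonblocking(path)  # a second, independent open
    except BlockingIOError:
        print("second locker blocked, as expected")
    unlock(first)
```

flock() locks belong to the open file description, so even two opens in one process contend; fcntl() record locks (`fcntl.lockf`) are per process instead and would not conflict within a single process.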

Page 29

EFS-CG Filesystem Scalability

Page 30

Scalability. Single Filesystem Export Using x86 Xeon-based NAS Heads (Old Numbers)

[Chart: NAS I/O Throughput (via NFS). Megabytes per second (MB/s) vs. cluster size (NAS heads): 123, 246, 493, 739, 986, 1,084 and 1,196 MB/s at 1, 2, 4, 6, 8, 9 and 10 heads, with the approximate single-headed filer limit marked]

# Servers   Total (MB)   Time (s)   MB/s       Gbit/s   Scale Factor   Scaling Coefficient
1           16,384       133        123.19     0.96     1.00           100%
2           32,768       133        246.38     1.92     2.00           100%
4           65,536       133        492.75     3.85     4.00           100%
6           98,304       133        739.13     5.77     6.00           100%
8           131,072      133        985.50     7.70     8.00           100%
9           147,456      136        1,084.24   8.47     8.80           98%
10          163,840      137        1,195.91   9.34     9.71           97%

HP StorageWorks Clustered File System is optimized for both READ and WRITE performance.
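The derived columns of the table follow from the raw measurements. A sketch that recomputes them, assuming MB/s = total MB / seconds, Gbit/s with 1 Gbit = 1,024 Mbit (this convention reproduces the table's figures), scale factor = throughput relative to one head, and scaling coefficient = scale factor / heads. The function name is hypothetical.

```python
# Recompute the derived columns of the scalability table from raw measurements.
from typing import List, Tuple

MEASUREMENTS: List[Tuple[int, int, int]] = [  # (heads, total MB, seconds)
    (1, 16_384, 133), (2, 32_768, 133), (4, 65_536, 133), (6, 98_304, 133),
    (8, 131_072, 133), (9, 147_456, 136), (10, 163_840, 137),
]


def derive(rows: List[Tuple[int, int, int]]) -> List[Tuple[int, float, float, float, int]]:
    """Return (heads, MB/s, Gbit/s, scale factor, scaling coefficient %) per row."""
    base = rows[0][1] / rows[0][2]  # single-head MB/s
    out = []
    for heads, total_mb, secs in rows:
        mbps = total_mb / secs
        gbps = mbps * 8 / 1024      # 1 Gbit = 1,024 Mbit, matching the table
        factor = mbps / base
        coefficient = round(100 * factor / heads)
        out.append((heads, round(mbps, 2), round(gbps, 2), round(factor, 2), coefficient))
    return out


if __name__ == "__main__":
    for row in derive(MEASUREMENTS):
        print("%2d heads %9.2f MB/s %5.2f Gbit/s factor %5.2f %4d%%" % row)
```

The drop to 98% and 97% at 9 and 10 heads comes entirely from the longer elapsed times (136 s and 137 s) for a proportionally larger transfer.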

Page 31

Virtual NFS Services

• Specialized virtual host IP

• Filesystem groups are exported through VNFS

• VNFS failover and rehosting are 100% transparent to the NFS client

– Including active file descriptors, file locks (e.g., fcntl/flock), etc.
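A toy model of the rehosting decision: clients mount a virtual host, and when the NAS head carrying it fails, the service moves to a surviving head. This sketch only models placement. It assumes a simple least-loaded policy, uses the VNFS names from the next diagram with a hypothetical head assignment, and does not model how the EFS-CG preserves open file descriptors and locks, which is what makes the real failover transparent.

```python
# Toy model of VNFS rehosting: move services off a failed head, least-loaded first.
from typing import Dict, List


def rehost(placement: Dict[str, str], failed_head: str, heads: List[str]) -> Dict[str, str]:
    """Return a new placement with every VNFS on failed_head moved to a survivor."""
    survivors = [h for h in heads if h != failed_head]
    if not survivors:
        raise RuntimeError("no surviving NAS heads")
    new = dict(placement)     # leave the caller's placement untouched
    for vnfs in sorted(new):  # deterministic order
        if new[vnfs] == failed_head:
            load = {h: sum(1 for v in new.values() if v == h) for h in survivors}
            new[vnfs] = min(survivors, key=lambda h: (load[h], h))
    return new


if __name__ == "__main__":
    placement = {"vnfs1": "head1", "vnfs1b": "head1", "vnfs2b": "head2", "vnfs3b": "head3"}
    print(rehost(placement, "head1", ["head1", "head2", "head3"]))
```

Spreading the orphaned services across survivors, rather than piling them onto one head, keeps the symmetric design's load balance after a failure.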

Page 32

EFS-CG Filesystems and VNFS

Page 33

[Diagram: Oracle Database Servers accessing /u01, /u02, /u03 and /u04 through virtual NFS services (vnfs1, vnfs1b, vnfs2b, vnfs3b) hosted on the NAS heads of the Enterprise File Services Clustered Gateway]

Page 34

EFS-CG Management Console

Page 35

EFS-CG Proof of Concept

Page 36

EFS-CG Proof of Concept

• Goals

– Use Oracle10g (10.2.0.1) with a single high-performance filesystem for the RAC database and measure:

• Durability

• Scalability

• Virtual NFS functionality

Page 37

EFS-CG Proof of Concept

• The 4 filesystems presented by the EFS-CG were:

– /u01. This filesystem contained all Oracle executables (e.g., $ORACLE_HOME)

– /u02. This filesystem contained the Oracle10gR2 clusterware files (e.g., OCR, CSS) and some datafiles and External Tables for ETL testing

– /u03. This filesystem was lower-performance space used for miscellaneous tests such as backup disk-to-disk

– /u04. This filesystem resided on a high-performance volume that spanned two storage arrays. It contained the main benchmark database

Page 38

EFS-CG P.O.C. Parallel Tablespace Creation

• All datafiles created in a single exported filesystem

– Proof of multi-headed, single filesystem write scalability

Page 39

EFS-CG P.O.C. Parallel Tablespace Creation

Multi-headed EFS-CG Tablespace Creation Scalability

[Chart: MB/s. Single-head, single GigE path: 111 MB/s; multi-headed, dual GigE paths: 208 MB/s]

Page 40

EFS-CG P.O.C. Full Table Scan Performance

• All datafiles located in a single exported filesystem

– Proof of multi-headed, single filesystem sequential I/O scalability

Page 41

EFS-CG P.O.C. Parallel Query Scan Throughput

Multi-headed EFS-CG Full Table Scan Scalability

[Chart: MB/s. Single-head, single GigE path: 98 MB/s; multi-headed, dual GigE paths: 188 MB/s]
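Working the two P.O.C. charts (tablespace creation and full table scan) through the same arithmetic. A sketch assuming a raw GigE line rate of 125 MB/s per path and ignoring protocol overhead; the function name is hypothetical and the throughput pairs are read from the charts.

```python
# Speedup and per-path utilization for the single-path vs dual-path P.O.C. results.
from typing import Tuple

GIGE_MB = 125.0  # raw GbE line rate in MB/s (assumed; ignores protocol overhead)

RESULTS = {  # (single-head/single-path MB/s, multi-headed/dual-path MB/s)
    "tablespace creation": (111.0, 208.0),
    "full table scan": (98.0, 188.0),
}


def summarize(single: float, dual: float) -> Tuple[float, float, float]:
    """Return (speedup, single-path utilization, dual-path utilization)."""
    speedup = dual / single
    single_util = single / GIGE_MB
    dual_util = dual / (2 * GIGE_MB)
    return speedup, single_util, dual_util


if __name__ == "__main__":
    for name, (one, two) in RESULTS.items():
        s, u1, u2 = summarize(one, two)
        print(f"{name}: {s:.2f}x, single path {u1:.0%}, dual paths {u2:.0%} each")
```

On these assumptions, doubling the paths nearly doubles throughput to the single exported filesystem in both tests.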

Page 42

EFS-CG P.O.C. OLTP Testing

• OLTP Database based on an Order Entry Schema and workload

• Test areas

– Physical I/O Scalability under Oracle OLTP

– Long Duration Testing

Page 43

EFS-CG P.O.C. OLTP Workload Transaction Avg Cost

Oracle Statistics Average Per Transaction

SGA Logical Reads 33

SQL Executions 5

Physical I/O 6.9 *

Block Changes 8.5

User Calls 6

GCS/GES Messages Sent 12

* Averages with RAC can be deceiving; be aware of CR (consistent read) block sends

Page 44

EFS-CG P.O.C. OLTP Testing

10gR2 RAC Scalability on EFS-CG

[Chart: transactions per second vs. RHEL4-64 RAC servers: 650, 1,246, 1,773 and 2,276 at 1, 2, 3 and 4 servers]
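The chart's scaling can be expressed with the same scale-factor and coefficient definitions used in the earlier NAS throughput table. A sketch using the chart's transaction rates; the function name is hypothetical.

```python
# Scale factor and scaling coefficient for the RAC OLTP transaction rates.
from typing import Dict, Tuple

TPS: Dict[int, int] = {1: 650, 2: 1246, 3: 1773, 4: 2276}  # servers -> TPS (chart)


def scaling(tps: Dict[int, int]) -> Dict[int, Tuple[float, float]]:
    """Map servers -> (scale factor vs. 1 server, scale factor / servers)."""
    base = tps[1]
    return {n: (v / base, v / base / n) for n, v in sorted(tps.items())}


if __name__ == "__main__":
    for n, (factor, coefficient) in scaling(TPS).items():
        print(f"{n} servers: {factor:.2f}x, {coefficient:.0%}")
```

Unlike the raw NFS throughput test, these figures include RAC cache-fusion overhead (the GCS/GES messages in the per-transaction table), so a declining coefficient is expected and does not by itself indicate an I/O limit.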

Page 45

EFS-CG P.O.C. OLTP Testing. Physical I/O Operations

RAC OLTP I/O Scalability on EFS-CG

[Chart: random 4K I/O operations per second vs. RHEL4-64 RAC servers: 5,214, 8,831, 11,619 and 13,743 at 1, 2, 3 and 4 servers]

Page 46

EFS-CG Handles All OLTP I/O Types Sufficiently: No Logging Bottleneck

OLTP I/O by Type

[Chart: I/O operations per second by type: redo writes 893, datafile writes 5,593, datafile reads 8,150]

Page 47

Long Duration Stress Test

• Benchmarks do not prove durability

– Benchmarks are “sprints”

– Typically 30-60 minute measured runs (e.g., TPC-C)

• This long duration stress test was no benchmark by any means

– Ramp OLTP I/O up to roughly 10,000/sec

– Run non-stop until the aggregate I/O breaks through 10 billion physical transfers

– 10,000 physical I/O transfers per second for every second of nearly 12 days
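The "nearly 12 days" follows directly from the two targets. A one-function sketch; the name is hypothetical.

```python
# Duration needed to reach a total I/O count at a sustained rate.
SECONDS_PER_DAY = 86_400


def days_to_reach(total_ios: int, ios_per_sec: float) -> float:
    """Days of non-stop operation to accumulate total_ios at ios_per_sec."""
    return total_ios / ios_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # 10 billion transfers at 10,000 per second = 1,000,000 seconds.
    print(f"{days_to_reach(10_000_000_000, 10_000):.2f} days")
```

One million seconds is about 11.6 days, which is the "nearly 12 days" on the slide.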

Page 48

Long Duration Stress Test

Page 49

Long Duration Stress Test

Page 50

Long Duration Stress Test

Page 51

Page 52

Special Characteristics

Page 53

Special Characteristics

• The EFS-CG NAS Heads are Linux Servers

– Tasks can be executed directly within the EFS-CG NAS Heads at FCP speed:

• Compression

• ETL, data importing

• Backup

• etc.

Page 54

Example of EFS-CG Special Functionality

• A table is exported on one of the RAC nodes

• The export file is then compressed on the EFS-CG NAS head:

– CPU comes from the NAS head instead of the database servers

• The NAS heads are really just protocol engines. I/O DMAs are offloaded to the I/O subsystems, so there are plenty of spare cycles.

– Data movement at FCP rate instead of GigE

• Offloads the I/O fabric (NFS paths from servers to the EFS-CG)
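Run locally on a NAS head, the compression step reads and writes the export file over the FibreChannel-attached filesystem instead of across the GigE NFS paths. A minimal Python sketch using the standard `gzip` and `shutil` modules; the function name is hypothetical, and any compressor installed on the head's Linux system (such as the gzip command) would do the same job.

```python
# Compress a file next to itself on the NAS head's local view of the filesystem.
import gzip
import os
import shutil


def compress_in_place(path: str, level: int = 6) -> str:
    """Write path + '.gz' beside the original, then remove the original."""
    dest = path + ".gz"
    with open(path, "rb") as src, gzip.open(dest, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst, 1024 * 1024)  # stream in 1 MB chunks
    os.remove(path)
    return dest
```

For example, on the head one might call `compress_in_place("/u03/export/table.dmp")`, where the path is hypothetical; /u03 is the P.O.C. filesystem used for backup-style tests. Streaming in chunks keeps memory use flat regardless of export size.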

Page 55

Export a Table to NFS Mount

Page 56

Compress it on the NAS Head

Page 57

Questions and Answers

Page 58

Backup Slide

Page 59

EFS-CG Scales “Up” and “Out”

[Diagram: two EFS-CG NAS heads connected to the SAN through FibreChannel switches and to an Ethernet switch; 3 GbE NFS paths, which can be triple bonded, etc.]