
System Z Storage Options for Linux on z - zSeries Oracle Sig


© 2008 IBM Corporation

The Americas – ATS Solutions Center

System z Storage Options for Linux and Oracle (FCP/SCSI, FICON/ECKD, ASM, Storage Array Striping)

22nd Annual International zSeries Oracle SIG Redwood Shores, California – April 21, 2009

David Simpson – [email protected]

©IBM/Oracle 2009 17-Apr-09

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml: AS/400, DB2, e-business logo, ESCON, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/390, System Storage, System z9, VM/ESA, VSE/ESA, WebSphere, xSeries, z/OS, zSeries, z/VM.

The following are trademarks or registered trademarks of other companies

Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries. LINUX is a registered trademark of Linus Torvalds in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation. SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC. Intel is a registered trademark of Intel Corporation. * All other products may be trademarks or registered trademarks of their respective companies.

NOTES: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent Goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.

The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

Trademarks


I/O Architecture

Oracle Database 10g includes three standard storage options:
– File system
  • Network attached storage (NAS)
  • Storage area network (SAN)
  • Direct attached storage
– Raw partitions
– Automatic Storage Management (ASM)


I/O Modes

I/O can be written to disk in several ways by using different system calls:
– Synchronous I/O
– Asynchronous I/O
– Direct I/O

[Figure: processes, buffer cache, and disk file, with paths labeled Write, Flush, and Direct]
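The difference between a write that goes through the buffer cache and a synchronous write can be sketched with plain POSIX calls. This is only an illustration in Python, not how Oracle issues I/O: the function names and the 8 KB block size are assumptions, and direct I/O (O_DIRECT) is left out because it needs aligned buffers and is not supported on every file system.

```python
import os
import tempfile


def buffered_write(path: str, data: bytes) -> None:
    """Write through the OS buffer cache, then flush it explicitly."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # may return once the data is in the buffer cache
        os.fsync(fd)        # asks the OS to flush the cached data to disk
    finally:
        os.close(fd)


def synchronous_write(path: str, data: bytes) -> None:
    """Open with O_SYNC so each write returns only after it is on stable storage."""
    # getattr keeps the sketch importable on platforms without O_SYNC.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def demo() -> int:
    """Write one hypothetical 8 KB block each way; return total bytes on disk."""
    block = b"x" * 8192
    with tempfile.TemporaryDirectory() as tmp:
        buffered = os.path.join(tmp, "buffered.dat")
        synced = os.path.join(tmp, "sync.dat")
        buffered_write(buffered, block)
        synchronous_write(synced, block)
        return os.path.getsize(buffered) + os.path.getsize(synced)


if __name__ == "__main__":
    print(demo())
```

Both files end up with the same contents; what differs is when the write call is allowed to return.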


Direct I/O

Direct I/O is considered to be the high-performance solution.
– Direct reads and writes do not use the OS buffer cache.
– Direct reads and writes can move larger buffers than file system I/Os.
– Set filesystemio_options=setall for LVM file systems.

[Figure: a process reads from and writes to the disk file directly]


Bandwidth Versus Size

I/O performance depends on bandwidth.
– Number of disks, not size
– Number of controllers

[Figure: background process and disk controllers]


Should I Use RAID 1 or RAID 5?

RAID 1 (Mirroring)
– Recommended by Oracle
– Most demanding applications

Advantages
– Best redundancy
– Best performance
– Low recovery overhead

Disadvantages
– Requires higher capacity

RAID 5 (Parity)
– DSS and moderate OLTP

Advantages
– Requires less capacity

Disadvantages
– Less redundancy
– Less performance
– High recovery overhead
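The capacity trade-off between the two RAID levels can be put into numbers with a small sketch. The disk count and disk size below are hypothetical, and the figures of two back-end I/Os per small write for mirroring and four for single parity are the usual textbook values, not measurements from this presentation.

```python
def raid1_usable_gb(disks: int, disk_gb: float) -> float:
    """Mirroring keeps two copies, so half of the raw capacity is usable."""
    if disks < 2 or disks % 2 != 0:
        raise ValueError("RAID 1 needs an even number of disks (at least 2)")
    return disks * disk_gb / 2


def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """Single parity costs the equivalent of one disk per array."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


# Textbook back-end I/Os caused by one small random write:
# RAID 1 writes both mirrors; RAID 5 reads data and parity, then writes both.
BACKEND_IOS_PER_WRITE = {"raid1": 2, "raid5": 4}

if __name__ == "__main__":
    disks, disk_gb = 8, 100.0  # hypothetical array
    print("RAID 1 usable GB:", raid1_usable_gb(disks, disk_gb))
    print("RAID 5 usable GB:", raid5_usable_gb(disks, disk_gb))
```

For the hypothetical eight 100 GB disks, mirroring leaves 400 GB usable and single parity leaves 700 GB, which is the "requires higher capacity" versus "requires less capacity" point in concrete terms.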


Diagnostics

Indicators of I/O issues:
– Top waits are reads and writes plus:
  • Buffer busy waits
  • Write complete waits
  • DB file parallel writes
  • Enqueue waits

– The file I/O Statistics section shows high waits and AVG Buffer Wait time higher than average on certain files.

Note: On a well-performing system, the top events are likely to be CPU time, DB file scattered read, and DB file sequential read.


Important I/O Metrics for Oracle Databases

OLTP (small random I/O)
– Need high RPM and fast seek time
– Metric = IOPS and latency
– Disk bandwidth

DW/OLAP (large sequential I/O)
– Need large I/O channel
– Metric = MBPS
– Channel bandwidth
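The two metrics are linked by the I/O size, which is why a small-I/O workload can be busy in IOPS terms while moving little data. A minimal sketch of the conversion follows; the request rates and the 8 KB and 1024 KB I/O sizes are hypothetical inputs.

```python
def mbps_from_iops(iops: float, io_size_kb: float) -> float:
    """Throughput in MB/s produced by a given request rate and I/O size."""
    return iops * io_size_kb / 1024


def iops_from_mbps(mbps: float, io_size_kb: float) -> float:
    """Request rate needed to sustain a given throughput at a given I/O size."""
    return mbps * 1024 / io_size_kb


if __name__ == "__main__":
    # Hypothetical OLTP case: many small random I/Os, modest bandwidth.
    print(mbps_from_iops(5000, 8))
    # Hypothetical DW/OLAP case: few large I/Os, high bandwidth.
    print(iops_from_mbps(200, 1024))
```

In the first case the disks do a lot of seeking for about 39 MB/s; in the second the channel carries 200 MB/s with only 200 requests per second.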


Oracle Storage – Testing with ORION

ORION simulates Oracle reads and writes without having to create a database, which helps to isolate I/O issues. When a database is optimally configured, you can expect to get up to 95% of the throughput of ORION.

    ./orion_zlinux -run oltp -testname mytest -num_disks 2 -duration 30 -simulate raid0
    ORION VERSION 11.2.0.0.1
    Commandline: -run oltp -testname mytest -num_disks 2 -duration 30 -simulate raid0
    This maps to this test:
    Test: mytest
    Small IO size: 8 KB
    Large IO size: 1024 KB
    IO Types: Small Random IOs, Large Random IOs
    Simulated Array Type: RAID 0
    Stripe Depth: 1024 KB
    Write: 0%
    Cache Size: Not Entered
    Duration for each Data Point: 30 seconds
    Small Columns:, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40
    Large Columns:, 0
    Total Data Points: 22
    Name: /dev/dasdq1 Size: 2461679616
    Name: /dev/dasdr1 Size: 2461679616
    2 FILEs found.
    Maximum Small IOPS=5035 @ Small=40 and Large=0
    Minimum Small Latency=0.55 @ Small=2 and Large=0


Orion Testing Continued

    -run oltp -testname mytest -num_disks 2 -duration 30 -simulate raid0
    This maps to this test:
    Test: mytest
    Small IO size: 8 KB
    Large IO size: 1024 KB
    IO Types: Small Random IOs, Large Random IOs
    Simulated Array Type: RAID 0
    Stripe Depth: 1024 KB
    Write: 0%
    Cache Size: Not Entered
    Duration for each Data Point: 30 seconds
    Small Columns:, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40
    Large Columns:, 0
    Total Data Points: 22
    Name: /dev/sda1 Size: 10737401856
    Name: /dev/sdb1 Size: 10737401856
    2 FILEs found.
    Maximum Small IOPS=24945 @ Small=24 and Large=0
    Minimum Small Latency=0.60 @ Small=12 and Large=0

For this test, the IOPS (I/O operations per second) went from 5035 on ECKD to 24945 under FCP storage.
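To compare runs like these without reading the reports by eye, the summary line can be pulled out with a regular expression. The sketch below assumes only the "Maximum Small IOPS=" line format shown in the two reports; the function name is hypothetical and nothing here is part of ORION itself.

```python
import re

# Matches the summary line as it appears in the ORION output above.
SUMMARY = re.compile(r"Maximum Small IOPS=(\d+) @ Small=(\d+) and Large=(\d+)")


def max_small_iops(report: str) -> int:
    """Return the peak small-I/O rate reported in an ORION summary."""
    match = SUMMARY.search(report)
    if match is None:
        raise ValueError("no 'Maximum Small IOPS' line found")
    return int(match.group(1))


eckd_report = "Maximum Small IOPS=5035 @ Small=40 and Large=0"
fcp_report = "Maximum Small IOPS=24945 @ Small=24 and Large=0"

ratio = max_small_iops(fcp_report) / max_small_iops(eckd_report)

if __name__ == "__main__":
    print(f"FCP delivered {ratio:.2f}x the small-I/O rate of ECKD in this test")
```

For the two runs shown, the ratio comes to roughly 4.95. The two tests used different devices and device sizes, so the figure describes this test rather than the protocols in general.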


Oracle Storage


System z channel subsystem improves I/O performance

Channel Subsystem
– System Assist Processors (SAP)
– I/O Channels (RISC Processors)
– Shared I/O capability

Significant I/O Offload
– Address limit checking
– Channel path management
– I/O busy conditions
– Manage interrupts
– Dynamic path reconnect
– Transfer data
– Receive CU sense information
– Channel subsystem monitoring

This benefits all logical partitions including z/OS and Linux

[Figure: z/OS, z/VM, and Linux partitions on the PR/SM™ hypervisor, with I/O passing through the SAPs and CHPIDs of the channel subsystem]


ECKD Storage


FCP (SCSI) Storage


z10 FICON Express4

1, 2, 4 Gbps auto-negotiated
Designed to support up to 80% more I/O operations per second for 4K block sizes, compared to System z9
Up to 336 channels
LX 10 km, LX 4 km, SX
Concurrent repair of optics
– FC
  • Native FICON
  • Channel-To-Channel (CTC)
    – z/OS, z/VM, z/VSE, z/TPF, TPF, Linux on System z
– FCP (Fibre Channel Protocol)
  • Support of SCSI devices
    – z/VM, z/VSE, Linux on System z


FICON vs FCP Performance


FCP (SCSI) vs. FICON (ECKD)

Advantages of FCP:

- Currently FCP is a little faster for throughput

- FCP is handy for sites with existing distributed storage area networks (SANs)

Advantages of ECKD:

- Multipathing does not need to be set up, because it is handled at the hardware level and not in the software layer as with FCP multipath.

- Traditional storage methodology for System z; much of the documentation uses DASD in the cookbooks.

- FICON utilizes the System Assist Processors (SAP) on zSeries machines. Both FCP and FICON utilize InfiniBand I/O (z10).


Multipath Considerations


Assigning Storage – Guests in the Same LPAR

The first guest owns the disks in read/write mode, while the second Linux guest uses z/VM links to the minidisks.

For example, on ORALIN01 you would define:

MDISK 0350 3390 1 1669 LIORAD MWV

On ORALIN02:

LINK ORALIN01 0350 0350 MW

V: tells CP to use its virtual reserve/release support in the I/O operations for the minidisk. For example, MWV means the minidisk functions with write linkage using CP's virtual reserve/release.


Assigning Storage with Linux Guests in One LPAR


Assigning Shared Disk from z/VM between LPARs

It is best to dedicate the devices that will be shared to the individual Linux guests when using multiple LPARs.

Use Rdevice statements for the DASD devices shared between LPARs, such as the following:

Rdevice 2E13-2E1C Type DASD Shared Yes MDC OFF

Oracle manages the "locking" of writes to the disk. z/VM and Linux do not manage which guests can write to a device, so it's important to limit access to just the nodes that are in the Oracle cluster.

Ensure that minidisk cache is off at both the z/VM and the disk level, and when installing tools like Velocity, ensure that these tools do not inadvertently turn on minidisk caching.


Setting Up FCP Storage – the Triplet


FCP Setup Guidelines

Make sure to run mkinitrd and zipl.

ASM requires that you create at least one partition.

Oracle 10.2.0.2 has a bug when formatting OCR devices on multipathed devices. It is fixed in 10.2.0.3, but you must use a single path to get around the OCR format issue on an initial install.

Multipathing needs to be configured for redundancy.


Storage Array Striping


Oracle Automatic Storage Management (ASM)

A storage manager designed to manage Oracle Database files

• Cluster support with Oracle Clusterware
• File System
  – Even data distribution with optimal performance
  – Automatic file management using Oracle Managed Files (OMF)
• Volume Manager
  – 1MB/128KB striping
  – Flexible mirroring
  – Online disk reconfigurations and automatic rebalancing

[Figure: ASM disk group +DATA built from ASM disks, holding datafiles (file1 … filen), controlfile, and redolog1]


ASM Striping Granularity

[Figure: COARSE and FINE striping across 1 MB allocation units (AU)]
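How stripe size changes where a given byte lands can be shown with a simplified round-robin model. This is only a sketch under stated assumptions: it takes coarse striping as 1 MB and fine striping as 128 KB, the disk count is hypothetical, and the real ASM extent allocation is more involved than a plain modulo.

```python
from typing import Tuple

COARSE_BYTES = 1024 * 1024  # coarse stripe: one 1 MB allocation unit
FINE_BYTES = 128 * 1024     # fine stripe: 128 KB


def stripe_target(offset: int, stripe_bytes: int, num_disks: int) -> Tuple[int, int]:
    """Return (disk index, stripe number) for a byte offset in a file.

    Simplified model: stripes are dealt out to disks in round-robin order.
    """
    if offset < 0 or stripe_bytes <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; stripe size and disk count must be > 0")
    stripe_number = offset // stripe_bytes
    return stripe_number % num_disks, stripe_number


if __name__ == "__main__":
    offset = 3 * COARSE_BYTES  # 3 MB into the file, 4 hypothetical disks
    print("coarse:", stripe_target(offset, COARSE_BYTES, 4))
    print("fine:  ", stripe_target(offset, FINE_BYTES, 4))
```

With fine striping the same 3 MB of data is cut into 24 pieces instead of 3, so a small sequential stream touches every disk sooner.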


Automatic Storage Management (ASM)

Eliminates need for conventional file system and volume manager

ASM extends SAME (Stripe and Mirror Everything)

Improved performance, scalability, and reliability

ASM is integrated with Oracle Clusterware

Capacity on demand
– Add/drop disks online

Automatic I/O load balancing
– Stripes data across disks to balance load
– Best I/O throughput
– Automatic mirroring and striping

Easy to manage

Can only host datafiles, not binaries

Provisioning storage when you need it… and save money

[Figure: storage stack with and without ASM: tables, tablespace, files, file systems, logical volumes (LVs), disk groups, raw disks]

[Figure: Before ASM and With ASM: an Oracle DB instance and an ASM instance over a disk group of Disk 1, Disk 2, and Disk 3; "Conventional wisdom"]


Automatic Storage Management

A new feature introduced in Oracle Database 10g

Provides a vertical integration of the file system and volume manager for Oracle database files

Spreads database files across all available storage for optimal performance and resource utilization

Enables simple and non-intrusive resource allocation and provides automatic rebalancing

Works in single-instance and RAC databases


ASM Installation Best Practices

Install ASM in an ORACLE_HOME separate from the database ORACLE_HOME
– Provides higher availability and manageability
– Allows independent upgrades of the database and ASM
– De-installation of database software can be performed without impacting the ASM instance


ASM Instance Performance Diagnostics

    SELECT event, total_waits t_wait, total_timeouts t_timeout,
           time_waited t_waittm, average_wait a_waittm, wait_class
    FROM V$SYSTEM_EVENT
    WHERE wait_class <> 'Idle' AND time_waited > 0
    ORDER BY 4 DESC;

    EVENT                          WAIT   TOUT  WAITT AVG     CLASS
    ------------------------------ ------ ----- ----- ------- -------
    ASM mount : wait for heartbeat      1     1   439  438.85 Admin…
    kfk: async disk IO                578     0   377     .65 SystI/O
    log write(odd)                      7     3   296   42.33 Other
    rdbms ipc reply                    37     1   259    7.01 Other
    log write(even)                     8     2   197   24.58 Other
    SQL*Net message to client      139249     0   103       0 Network
    os thread startup                   9     0    79    8.77 Conc…
    buffer write wait                   1     0    60   60.31 Other
    DBFG waiting for reply             16     0     1     .04 Other


ASM Diskgroup Best Practices

The size of the Flash Recovery Area Diskgroup will depend on what is stored and how much is retained

The FRA size should be driven by recovery time objectives

To minimize search overhead, perform all required mount operations in a single mount command

If adding or removing multiple disks, make the change in a single rebalance operation.

This coalesces rebalance operations and reduces overhead

Use ASM External Redundancy when using high end storage arrays

With 10.2, you can use block devices. For example, use /dev/sda1 instead of /dev/raw/raw


ASMLib vs UDEV for Oracle Disk Permissions

ASMLib requires Linux RPMs to be installed, and these are tied to the Linux kernel version, so an ASMLib driver may not be available when working with a newer kernel version.

If you use ASMLib, disk access is managed by that service. If you do not, you must set up UDEV rules.

With Linux 2.6 and later kernels, UDEV can be used instead of ASMLib to ensure that disk devices remain consistent across reboots and that file permissions get set.

If you run Oracle RAC without ASMLib or UDEV rules, device ownership reverts to root:root on reboot, which prevents Oracle from starting up.

Performance is about the same between UDEV and ASMLib. ASMLib does reduce user-mode to kernel-mode context switches during periods of high I/O, and reduces file handle usage because of a single call to ASMLib.

Note: when installing Oracle Clusterware (CRS) to FCP multipath devices, you may face an OCR format issue. You must install on a single path and then convert to multipath, or apply a patch.
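For sites that choose UDEV, the rules that set ownership and permissions are short text lines, and a small helper can generate them consistently for a list of devices. This is a sketch only: the owner `oracle`, the group `dba`, the mode `0660`, and the device names are assumptions for illustration, and a production rule would normally match on a stable device identifier, not on a kernel name such as `sda1`.

```python
from typing import Iterable, List


def udev_rule(kernel_name: str, owner: str = "oracle",
              group: str = "dba", mode: str = "0660") -> str:
    """Build one udev rule line that sets owner, group, and mode for a device."""
    return (f'KERNEL=="{kernel_name}", OWNER="{owner}", '
            f'GROUP="{group}", MODE="{mode}"')


def udev_rules(kernel_names: Iterable[str]) -> List[str]:
    """Build one rule per device, keeping the order given."""
    return [udev_rule(name) for name in kernel_names]


if __name__ == "__main__":
    # Hypothetical devices; the output would go into a rules file
    # under /etc/udev/rules.d/ chosen by the administrator.
    for line in udev_rules(["sda1", "sdb1"]):
        print(line)
```

Generating the lines from one function keeps owner, group, and mode identical on every node of a cluster, which is the property that matters when the devices are shared.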