Bad Things Happen to Good People: How to Minimize the …
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
IBM, the IBM logo, ibm.com, Information Management, DB2, DB2 Connect, DB2 OLAP Server, pureScale, System Z, Cognos, solidDB, Informix, Optim, InfoSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Some questions to ask yourself when developing a recovery plan
– Does the database need to be recoverable?
– What are your RPO and RTO requirements?
  • Recovery Point Objective – how much data loss is acceptable, if any, if a major incident occurs?
  • Recovery Time Objective – what is an acceptable length of time to perform the recovery while the system is unavailable?
– How frequently do the backup operations need to be performed?
– How much storage space can be used for backups and archived logs?
– Where do you want the backups and archived logs to go?
– Will table space level backups be sufficient, or will full database backups be necessary?
– Do the backups need to be automated?
– Is high availability (HA) a consideration?
– Is off-site disaster recovery (DR) a consideration?
Use archive logging (not circular logging) for production environments
– Provides better recovery characteristics (lower RPO)
– Permits use of online backups (better availability) and table space level backups (more granular)
– Can set up two archive log paths for best protection
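As a minimal sketch of the dual archive path setup – the database name "mydb" and both paths are placeholders, not values from this presentation:

```
# Illustrative only: LOGARCHMETH1 is the primary archive destination;
# LOGARCHMETH2 archives each log file a second time to an independent
# location, protecting against loss of one archive path.
db2 update db cfg for mydb using LOGARCHMETH1 DISK:/db2arch/path1
db2 update db cfg for mydb using LOGARCHMETH2 DISK:/db2arch/path2
```

Ideally the two paths reside on separate physical devices or file systems.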
Include logs in backup images (default behavior)
– Allows restoring of an online backup if all you have is the backup image (e.g. disaster recovery)
Configure mirrored logging on separate file systems
– Protects against file system corruption or accidental deletion of log files
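A sketch of enabling log mirroring – the database name and path are assumptions for illustration:

```
# Illustrative: place the mirror log path on a different file system
# (and ideally different physical storage) from the primary log path.
db2 update db cfg for mydb using MIRRORLOGPATH /mirrorfs/db2logs
```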
For archive logging, consider using automated log file management
– Choose how long to retain recovery objects like backups and archived logs and when to automatically prune them
– See the NUM_DB_BACKUPS, REC_HIS_RETENTN, and AUTO_DEL_REC_OBJ database configuration parameters
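As an illustrative retention policy – the specific values and database name here are assumptions, not recommendations from this presentation:

```
# Illustrative: keep the two most recent full backups, retain recovery
# history for 30 days, and let DB2 automatically prune recovery objects
# (backup images and archived logs) that fall outside the policy.
db2 update db cfg for mydb using NUM_DB_BACKUPS 2
db2 update db cfg for mydb using REC_HIS_RETENTN 30
db2 update db cfg for mydb using AUTO_DEL_REC_OBJ ON
```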
Various methods and options are available for backing up and recovering the data in your databases
– Offline or online backups
– Database or table space backups
– Split mirror / flash copy backups
– Incremental backups
– Rebuild database from table space backup images
– Backup compression
As database size increases, consider using more frequent, online table space level backups
Built-in autonomics for the backup command provide optimal values for number of buffers, buffer size, and parallelism
– Values are calculated based on the amount of utility heap memory available, the number of processors available, and the database configuration
Automatic database backups simplify backup management by ensuring that a recent full backup of the database is performed
The need to perform a backup is based on one or more of the following criteria
– You have never created a full database backup
– The time elapsed since the last full backup is more than a specified number of hours
– The transaction log space consumed since the last backup is more than a specified number of 4 KB pages (in archive logging mode only)
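Since the log-space criterion is expressed in 4 KB pages, a threshold you think of in megabytes needs converting. A small sketch of the arithmetic (the 25 MB figure is just an example, not a recommended threshold):

```python
# Convert a log-space threshold from megabytes to 4 KB pages,
# the unit used by the automatic backup log-space criterion.
def mb_to_4k_pages(megabytes: int) -> int:
    return (megabytes * 1024 * 1024) // 4096

# A 25 MB log-space threshold corresponds to 6400 4 KB pages.
print(mb_to_4k_pages(25))  # → 6400
```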
Backups can be configured to be offline or online
Supports disk, tape, TSM, and vendor DLL media types
Feature is enabled/disabled by using the auto_db_backup and auto_maint database configuration parameters
Policies can be defined via SYSPROC.AUTOMAINT_SET_POLICY and SYSPROC.AUTOMAINT_SET_POLICYFILE stored procedures
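A minimal sketch of enabling the feature – the database name is a placeholder; the backup target, mode, and criteria would then be set through the policy stored procedures:

```
# Illustrative: automatic database backup requires the parent AUTO_MAINT
# switch to be ON as well as AUTO_DB_BACKUP itself.
db2 update db cfg for mydb using AUTO_MAINT ON
db2 update db cfg for mydb using AUTO_DB_BACKUP ON
```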
Incremental Backups
Incremental backups contain data changed since previous backups (depending on the type of backup – incremental or delta)
– In addition to data, each incremental backup image also contains all of the database metadata
Incremental (cumulative) backup image
– Contains all database data that has changed since the most recent, successful, full backup operation
– Cumulative because each image in a series of incremental backups taken over time will contain the contents of the previous incremental backup image

Delta backup image
– Contains a copy of all database data that has changed since the last successful backup (full, incremental, or delta) of the table space in question
Combinations of database and table space incremental backups are permitted, in both online and offline modes of operation
To restore a database or table space to a consistent state, the recovery process must begin with a consistent image of the object to be restored, followed by the application of the appropriate incremental backup images
TRACKMOD database configuration parameter must be set to YES
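A sketch of a typical incremental backup cycle – the database name and backup path are assumptions; note that after setting TRACKMOD a full backup is needed before incremental backups are allowed:

```
# Illustrative: enable modification tracking, then take a full backup
# followed by cumulative incremental and delta backups.
db2 update db cfg for mydb using TRACKMOD YES
db2 backup db mydb online to /backups                    # full
db2 backup db mydb online incremental to /backups        # cumulative
db2 backup db mydb online incremental delta to /backups  # delta
```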
Rebuild a database from table space backup images
– Means no longer having to take as many full database backups, which is becoming less feasible as databases grow in size
– Instead, take more frequent table space backups
In a recovery situation, if you need to bring a subset of table spaces online faster than others, you can do a partial database rebuild
– May also be used for
  • Creating a separate database for QA purposes
  • Data recovery purposes
Can choose which table spaces to restore as part of the rebuild
– All table spaces in the database at the time that the backup was taken
– All table spaces included in a selected table space backup
– A specific list of table spaces specified as part of the restore command
– All table spaces except those in the specific list provided
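A sketch of the two most common variants – database name, table space name, and timestamp are placeholders:

```
# Illustrative: rebuild the whole database from table space images...
db2 restore db mydb rebuild with all tablespaces in database taken at 20240101120000

# ...or do a partial rebuild with just a subset of table spaces
# (SYSCATSPACE is always included implicitly).
db2 restore db mydb rebuild with tablespace (USERSPACE1) taken at 20240101120000
```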
Using a backup image as the source allows you to copy a set of table spaces and SQL schemas from one database into another
A database schema must be transported in its entirety
– If a table space contains both the schema you want to transport as well as another schema, you must transport all data objects from both schemas
– These self-contained (from a table space perspective) sets of schemas that have no references to other database schemas are called transportable sets
Restore will do multiple operations under the covers
– Restore SYSCATSPACE and the specified table spaces from the backup image
– Roll them forward to a point of consistency
– Validate the schemas specified
– Transfer ownership of the specified table spaces (including containers) to the target database
– Recreate the schema in the target database
The database contains the following valid transportable sets:
– mydata1: schema1 + schema2
– mydata2 + myindex: schema3
– multidata1 + multiuser2 + multiindex1: schema4 + schema5
– Any combination of the above transportable sets

To move all table spaces:
restore db old_db
  tablespace ("mydata1","mydata2","myindex","multidata1","multiindex1","multiuser2")
  schema ("schema1","schema2","schema3","schema4","schema5")
  transport into new_db
Recover the contents of a dropped table using DB2's table space restore and rollforward operations
– When rolling forward through the drop of the table, the data is exported prior to the replay of the drop

Requires that the table space be enabled for dropped table recovery
– Enabled by default at table space creation time
When a table is dropped, an entry is made in the transaction log files as well as in the recovery history file
You can recover a dropped table by doing the following:
1. Identify the dropped table by invoking the LIST HISTORY DROPPED TABLE command
2. Restore a database- or table space-level backup image taken before the table was dropped
3. Create an export directory to which files containing the table data are to be written
4. Roll forward to a point in time after the table was dropped (or to end of logs) by using the RECOVER DROPPED TABLE parameter on the ROLLFORWARD DATABASE command
5. Re-create the table by using the CREATE TABLE statement from the recovery history file
6. Import the table data that was exported during the rollforward operation into the table
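The steps above can be sketched as the following command sequence – all names, timestamps, and paths are placeholders; the dropped-table ID comes from the LIST HISTORY output:

```
# Illustrative dropped table recovery sequence.
db2 list history dropped table all for db mydb
db2 restore db mydb tablespace (USERSPACE1) taken at 20240101120000
db2 rollforward db mydb to end of logs and stop tablespace (USERSPACE1) recover dropped table <dropped-table-ID> to /export/dir
# Re-create the table with the CREATE TABLE DDL recorded in the recovery
# history file, then import the exported data (the exact file path under
# the export directory is reported during the rollforward):
db2 import from /export/dir/NODE0000/data of del insert into myschema.mytable
```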
IBM DB2 Recovery Expert
Granular and Flexible Data Recovery
Faster
– Simplifies and optimizes database recovery by reducing disruption during the recovery process
  • DBAs can quickly restore or correct erroneous data
– Log Analysis enables organizations to monitor changes that allow for quick recovery

Smarter
– Provides intelligent analysis of DB2 and DB2 recovery assets to find the most efficient recovery path

Simpler
– Facilitates the process of rebuilding data assets to a specified point in time, often without taking operations offline
"AFS is establishing a disaster recovery policy with our Vision Application. DB2 Recovery Expert provides us with the functionality to roll back both databases to a point where the tables are consistent. This will help us meet 100% of our needs for this project. The product itself is awesome and the Web UI is very nice."
– Kirk B. Spadt, Principal Architect, Automated Financial Systems
Uses remote disk mirroring technology
– Maximum distance between sites is typically 100s of km (for synchronous; 1000s of km for asynchronous)
– For example: IBM Metro Mirror, EMC SRDF

Transactions run against primary site only, DR site is passive
– If primary site fails, database at DR site can be brought online

All data and logs must be mirrored to the DR site
– Synchronous replication guarantees no data loss
– Writes are synchronous and therefore ordered, but "consistency groups" are still needed
  • If there is a failure to update one volume, you don't want other volumes to get updated (leaving data inconsistent)
Single command called "TAKEOVER"
– Change the standby into a primary
– Switch the roles of a healthy primary-standby pair
– No db2start / restart database / rollforward etc.

Integrated TSA provides heartbeat monitoring and automated "TAKEOVER"
– Set up for you during DB2 installation
– Use a network tiebreaker to avoid split brain scenarios
– Configuration is available in this whitepaper

Automatic client reroute (ACR) provides transparent failover
– And will rerun the statement that was running when the failure occurred, as long as it's the first statement of a transaction with no data yet returned
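A sketch of the takeover command, run on the standby – the database name is a placeholder:

```
# Illustrative: graceful role switch of a healthy primary-standby pair...
db2 takeover hadr on db mydb
# ...or a forced takeover when the primary is down or unreachable:
db2 takeover hadr on db mydb by force
```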
Helps recover from application errors
– For example, accidental deletion of important table data
– Must be noticed before the time delay on the standby results in the change being replayed

Enabled via the new HADR_REPLAY_DELAY database configuration parameter
– Specifies a delay in seconds for applying changes on a standby
– A value of 0 means no time delay (the default)
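A sketch of configuring a one-hour replay delay on a standby – the database name and delay value are assumptions for illustration:

```
# Illustrative: hold off replaying changes on this standby for
# 3600 seconds (one hour), giving a window to catch application errors.
db2 update db cfg for mydb using HADR_REPLAY_DELAY 3600
```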
… and scalability
– Application transparency
– Scales to >100 members
– Leverages z/OS cluster technology
Highlights of pureScale enhancements in DB2 10.5
– Rich disaster recovery options, now including integrated HADR support
– Backup and restore between pureScale and non-pureScale environments
– Online fix pack updates
– Add members online for additional capacity
– Included in Advanced Workgroup and Advanced Enterprise editions
• Multiple DB2 members for scalable and available database environment
• Client application connects into any DB2 member to execute transactions
• Automatic workload balancing
• Shared storage for database data and transaction logs
• Cluster caching facilities (CF) provide centralized global locking and page cache management for highest levels of availability and scalability
• Duplexed, for no single point of failure
• High speed, low latency interconnect for efficient and scalable communication between members and CFs
• DB2 Cluster Services provides integrated failure detection, recovery automation and the clustered file system
[Diagram: DB2 pureScale cluster (instance) – clients connect to any of several members over a cluster interconnect; each member and CF runs DB2 Cluster Services (CS); members share storage for the database and transaction logs; duplexed primary and secondary cluster caching facilities (CFs) complete the cluster.]
Architected for extreme scale and availability
Scale with Ease
Scale up or out… without changing your applications
– Efficient coherency protocols designed to scale without application changes
– Applications automatically and transparently workload balanced across members
– Up to 128 members

Without impacting availability
– Members can be added while the cluster remains online

Without administrative complexity
– No data redistribution required
“DB2 pureScale is the only solution we found that provided near linear scalability... It scales 100 percent, which means when I add servers and resources to the cluster, I get 100 percent of the benefit. Before, we had to ‘oversize’ our servers, and used only 50 - 60 percent of the available capacity so we could scale them when we needed.”-- Robert M. Collins Jr. (Kent), Database Engineer, BNSF Railway Inc.
Online Recovery from Failures
The DB2 pureScale design point is to maximize availability during failure recovery processing
When a database member fails, only in-flight data remains locked until member recovery completes
– In-flight = data being updated on the failed member at the time it failed
Target time to availability of rows associated with in-flight updates on failed member in seconds
[Chart: % of data available vs. time (~seconds) following a database member failure – only data with in-flight updates is locked during recovery, so availability dips only briefly before returning to 100%.]
“We pulled cards, we powered off systems, we uninstalled devices, we did everything we could do to make the cluster go out of service, and we couldn’t make it happen.”-- Robert M. Collins Jr. (Kent), Database Engineer, BNSF Railway Inc.
Transactions routed away from member undergoing maintenance, so no application outages experienced. Workload balancing brings work back after maintenance finished
Cluster not running at new level until commit is performed
A "stretch" or geographically dispersed pureScale cluster spans two sites
– At distances of tens of km
– Active/active DR, where half of the cluster is at site A, the other half at site B
– Enables a level of DR support suitable for many types of disasters
– Supported for AIX (using InfiniBand) and Red Hat Linux (using 10 Gigabit Ethernet)
Both sites active and available for transactions during normal operation
On failures, client connections are automatically redirected to surviving members
– Applies to both individual members within sites and total site failure
Pros:
– Inexpensive local failover or DR solution
– Protection from software, server, storage, and site failures
– Simple to set up and monitor
– Failover time in the range of 30 sec
– Reporting on standby without increase in failover time

Cons:
– Two full copies of the database (a plus from a redundancy perspective)
– Only read transactions can run on the standby
HADR With Disk Mirroring to Remote DR Site

Pros:
– Very fast local failover with DR capability
– Protection from software, server, storage, and site failures
– Local failover time in the range of 30 seconds
[Diagram: HADR cluster with primary database and local standby (automatic client reroute between them); remote disk mirror technology copies the database (DB1a → DB1aa) to a disaster recovery site.]
Cons:
– Three full copies of the database (a plus from a redundancy perspective)
– More costly than HADR for just DR
HADR With Multiple Standbys (DB2 10)

Pros:
– Very fast local failover with DR capability
– Protection from software, server, storage, and site failures
– Allows for time delay on auxiliary standbys
– Local failover time in the range of 30 seconds
[Diagram: HADR cluster with primary database and local standby (automatic client reroute between them); a remote standby (DB1a → DB1aa) resides at the disaster recovery site.]
Cons:
– Three full copies of the database (a plus from a redundancy perspective)
– Super Async only for DR site
HADR with Replication – Best Practice for HA and DR

Pros:
– Protected from software, server, storage, and site failures
– Failover time is "instant"
– Standby can be full or subset and is fully accessible (read and/or write)
– Multiple standby servers
HADR Pairs with Replication
Delivers:
– Fast local failover
– Active / active DR
– Rolling patch upgrades
– Rolling version upgrades
– Online database on-disk modifications
– Schema modifications online/rolling
Can replace HADR at each site with pureScale for even better HA
DB2 pureScale Availability Option in DB2 10.5
DB2 10.5 supports HADR with pureScale
– Online recovery, protection from server, storage, and site failure
– Easy to set up and manage
– Any distance (ASYNC or SUPERASYNC)
When it comes to HA and DR, one size does not fit all
There are many availability options, each with their own advantages
– Server failover
– HADR
– Q Replication
– pureScale
– More likely a combination of several of the above
Choose the one that best suits your deployment
– Determine the right solution considering your RPO and RTO requirements