© 2005 EMC Corporation. All rights reserved. 1 1 Disaster Recovery of Technology Services : Issues Strategies Directions Presented by Dave Purdy 6-23-2005 Achieving Continuity of Operations (COOP)

Jan 16, 2016


Prosper Glenn
Transcript
Page 1:

Disaster Recovery of Technology Services:

Issues

Strategies

Directions

Presented by Dave Purdy, 6-23-2005

Achieving Continuity of Operations (COOP) 

Page 2:

Ever increasing need for COOP: People, Data, and Services Availability

Drivers/trends for improved recoverability and/or availability of Services:

– Current measure increasingly deemed inadequate
– Physical vs. Electronic transport of Data
– Melding of “DR” and “Operational Availability”
– Self Insurance for DR
– Public Safety/Service Availability vs. Cost

Maturity in understanding COOP issues:

– Recovery vs. Restart
– Identification of App/DB inter-dependencies
– DR vs. Operational Availability (HA)
– Breaking down the problem:
  • Information Availability
  • Application Availability

Page 3:

Production Availability and Disaster Recovery: Converging?

Planned occurrences: Competing workloads (87% of occurrences)
– Backup, reporting
– Data warehouse extracts
– Application and data restore

Unplanned occurrences: Failure (13% of occurrences)
– Database corruption
– Component failure
– Human error

Disaster: Natural or man-made (<1% of occurrences)
– Flood, fire, earthquake
– Contaminated building

[Figure labels: “DR”, “CA”, “HA”, Insurance, ROI]

Page 4:

Creating a context: Government moving up the Continuum

Differing levels of IT Architectural dependency with regard to Availability Strategies:

[Figure: continuum running from 100% Procedural (0% IT Architectural Redundancy) to 100% Automatic (100% IT Architectural Redundancy), 24 hrs x 7 days. Scale labels: Manual, Transparent, Failsafe, Low Failsafe, High Failsafe, Resources, Low security, High Security. Sectors shown: Non Critical Business, Small Industries; Essential Services (Government, Airlines, Hospital); Banks, Financial Services; Telecommunications; Food Manufacturer; Consumer Goods Manufacturing; Manufacturing; Retail (Low Volume, High Volume); Transportation, Logistics.]

Page 5:

Availability Drivers

Increased realization that critical services depend on IT availability

Pervasive requirements to protect people and data

Increasingly real-time nature of “transactions”

“Lost” transactions cannot be re-created

Increased recognition that traditional recovery from tape is no longer viable

New vision: merger of production and DR disciplines to focus on continuous availability

Public Service, Safety, and Inter-Agency dependencies driving criticality of COOP

Page 6:

Traditional Disaster Recovery: Tape

Tape Backup with Offsite Tape Storage
– RPO = 24+ hours or time of last backup stored offsite
– RTO = 24 - 96 hours or time required to restart operations

– Transport tapes to recovery site
– Set up systems to receive data
– Restore from tape
– Synchronize systems and DB for resumption

[Figure: RPO and RTO timelines, each scaled in Secs, Mins, Hrs, Days, Wks. RPO side: Tape Backup, Offsite Storage. RTO side: Retrieve Tape, Set Up Systems, Restore from Tape.]
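
The RPO and RTO ranges quoted for tape follow from simple arithmetic on the backup schedule and the recovery steps. A minimal Python sketch, assuming illustrative timestamps and step durations that are hypothetical and not taken from the presentation:

```python
from datetime import datetime, timedelta


def data_loss_window(last_offsite_backup: datetime, disaster: datetime) -> timedelta:
    """Everything written after the last offsite backup is lost (the achieved recovery point)."""
    return disaster - last_offsite_backup


def restart_time(step_hours: dict) -> timedelta:
    """Tape recovery steps run one after another, so their durations add up."""
    return timedelta(hours=sum(step_hours.values()))


# Hypothetical durations, in hours, for the four recovery steps listed above.
steps = {
    "transport tapes to recovery site": 8,
    "set up systems to receive data": 12,
    "restore from tape": 24,
    "synchronize systems and DB": 4,
}

# Hypothetical timestamps: nightly backup sent offsite at 02:00, disaster the next morning.
rpo = data_loss_window(datetime(2005, 6, 22, 2, 0), datetime(2005, 6, 23, 9, 0))
rto = restart_time(steps)
print("data loss window:", rpo)
print("time to restart:", rto)
```

With these assumed inputs both values land inside the ranges on the slide; shortening either one means changing the transport medium or the serial restore steps, not tuning them.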

Page 7:

Consistency = Usability: This is not a platform or application issue….

Getting All the Data at the Same Time

Across databases, applications, and platforms….

[Figure: Consistency Groups spanning Mainframe, Windows, and UNIX systems.]
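
The idea of capturing all the data at the same time can be illustrated with a toy model. The sketch below is an assumption-laden illustration in Python, not a description of any EMC product: the class `ConsistencyGroup` and its methods are hypothetical names, and a single lock stands in for whatever mechanism a real array uses to hold writes across volumes.

```python
import copy
import threading


class ConsistencyGroup:
    """Toy model: volumes on different platforms captured at one instant."""

    def __init__(self, volume_names):
        self._lock = threading.Lock()
        self._volumes = {name: {} for name in volume_names}

    def write(self, volume, key, value):
        # Every write takes the group lock, so none can land mid-snapshot.
        with self._lock:
            self._volumes[volume][key] = value

    def snapshot(self):
        # Hold all writes, copy every volume, then release:
        # the copy reflects a single point in time for the whole group.
        with self._lock:
            return copy.deepcopy(self._volumes)


group = ConsistencyGroup(["mainframe", "unix", "windows"])
group.write("mainframe", "order-1", "debit posted")
group.write("unix", "order-1", "inventory reserved")
image = group.snapshot()
group.write("windows", "order-1", "invoice sent")  # arrives after the snapshot
print(image)
```

Because the snapshot never contains a later write without the earlier ones it depends on, applications at the recovery site can restart from it rather than being restored and reconciled volume by volume.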

Page 8:

Patterns of DR Program Evolution:

[Figure: stages shown, on Local and Remote axes: Insourced “CA” to 2/3 sites (Active, Triangulate); Insourced DR & HA to 2nd Site (Passive, Active); Commercial Hotsite with Electronic Vaulting or Replication; Commercial Hotsite QuickShip; Offsite Vital Records.]

Key Learnings:

– Restore is very different than Restart
– Testing effectiveness and control: Subset vs. Full / Hotsite vs. Internal
– Application/Agency Inter-dependencies
– Traditional recovery and restore techniques being deemed inadequate
– Increased complexity (and benefit) in justifying “DR” versus “DR + HA” as 2nd site becomes more integrated with primary site

Page 9:

A Practical Approach to Unifying Requirements and IT Capabilities for Mutual Agreement…

[Figure: Customer Problem Area charted by Maximum Acceptable Data Loss (RPO) and Maximum Acceptable Downtime (RTO), on a scale of Zero, Sec., Mins, Hours, > 24 hrs. Technologies shown, each with LOCAL and REMOTE variants: TAPE BACKUP & RECOVERY; DISK DATA REPLICATION; SERVER CLUSTERING & VIRTUALIZATION. Additional label: Market Requirements.]

Page 10:

Availability Strategies:
– Disaster Recovery (DR)
– High Availability (HA)
– “Continuous Availability” (CA)

[Figure labels: Primary; Secondary; Secondary -or- Tertiary; Synch; Asynch; Network; In-Region; Out-Region; Commercial Hotsite (SunGard, IBM BRCS).]

Page 11:

Remote Replication Capability Continuum Summary

Asynchronous
– Seconds of data exposure
– No performance impact
– Unlimited distance

Asynchronous Point-in-Time
– Hours of data exposure
– No performance impact
– Unlimited distance

Synchronous
– No data exposure
– Limited distance

Triangulated Synch & Asynch
– Simultaneous Synchronous and Asynchronous
– Three site awareness

[Figure: Source-to-Target diagrams for each mode, labeled Unlimited Distance (both Asynchronous modes, with a Prod copy shown) or Limited Distance (Synchronous). Triangulated diagram: Primary site, 2nd Site, and Long-distance site, with links labeled Limited and Unlimited.]
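
The continuum above can be read as a small lookup table: each mode trades data exposure against distance. The following Python sketch encodes only the three two-site modes and the attributes stated on the slide; the names `ReplicationMode`, `EXPOSURE_RANK`, and `candidate_modes` are hypothetical, and the ranking of exposure levels is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationMode:
    name: str
    data_exposure: str        # "none", "seconds", or "hours", as stated on the slide
    unlimited_distance: bool  # False means limited distance


MODES = [
    ReplicationMode("Synchronous", "none", False),
    ReplicationMode("Asynchronous", "seconds", True),
    ReplicationMode("Asynchronous Point-in-Time", "hours", True),
]

# Assumed ordering of exposure levels, from least to most data at risk.
EXPOSURE_RANK = {"none": 0, "seconds": 1, "hours": 2}


def candidate_modes(max_exposure: str, needs_long_distance: bool) -> list:
    """Return the modes whose exposure and distance limits fit the requirement."""
    limit = EXPOSURE_RANK[max_exposure]
    return [
        m.name
        for m in MODES
        if EXPOSURE_RANK[m.data_exposure] <= limit
        and (m.unlimited_distance or not needs_long_distance)
    ]


print(candidate_modes("seconds", needs_long_distance=True))
print(candidate_modes("none", needs_long_distance=True))
```

The second call returns an empty list: none of the two-site modes offers zero data exposure over unlimited distance, which appears to be the gap the triangulated configuration is meant to close by combining a synchronous link with an asynchronous one.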

Page 12:

Best Practices for Achieving Business Continuity

Determine requirements / service levels
– System / application mapping

Validate ability to achieve service-level agreements
– Evaluate costs / tradeoffs of technologies to meet service levels

Create the right level of protection for your Agency’s (or Inter-Agency?) specific business and application requirements

Integrate it
– Across information storage platforms
– Across processing infrastructure (servers, networks, applications)
– Across data centers and geographic locations
– Integrate with Change Management

Page 13:

Page 14:

Business Continuity Planning: Lessons from the Nation’s Capitol in the Post-9/11 World

Mary Kaye Vavasour

eGov Services

Office of the Chief Technology Officer

District of Columbia

Page 15:

Recent History as Context

• 1996-1999 Y2K made continuity a priority

Internet made networks a focus and eGovernment a reality

• 2001 9-11 the unthinkable happened; security of data, network, and infrastructure became key to recovery

• 2002 Federal Patriot Act made Continuity planning a legal mandate

• 2003 Sarbanes-Oxley Act added more regulatory requirements

Hurricane Isabel caused regional power outages that lasted 4-7 days

Page 16:

Key Elements of Business Continuity Strategy

• High availability platform and procedures

• Proven Emergency Operations Process

– Detailed, service-based procedures

– Dedicated staff

– Regional coordination

– Frequent practice with planned events

• Focus on Continuity of Communications

– Public safety wireless network

– Public portal resiliency with specialized content

– High availability messaging platform

Page 17:

High Availability Platform; Centralized Process

• In-sourced, high availability Disaster Recovery
• Active-Active for availability
– Multiple servers behind hardware load balancers (millisecond fail-over)
– Separate web application and database tiers
– 95% of public web services covered (104 sites + main portal)
• Active-Passive for Disaster Recovery
– Two data centers
– Multiple types of replication
  • Cluster synch for dynamic portal content
  • MS/CRS for legacy applications and static pages
  • Database tier uses SQL and Oracle replication
• Future: tertiary site for continuous availability of portal
• Centralized failure recovery process run by senior staff
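
The Active-Active arrangement described above, with several servers behind a load balancer that fails over quickly, can be sketched in a few lines. This Python example is a simplified illustration under stated assumptions: the District uses hardware load balancers, whereas `LoadBalancer`, `mark_down`, `mark_up`, and the server names here are hypothetical, and a real health monitor would call `mark_down` when a heartbeat is missed.

```python
class LoadBalancer:
    """Round-robin over backends, skipping any that are marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        # Called by a (hypothetical) health monitor when a heartbeat is missed.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def route(self):
        # Try each backend at most once per request, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = LoadBalancer(["web-a", "web-b"])
before = [lb.route() for _ in range(2)]  # both servers share the load
lb.mark_down("web-a")                    # simulated failure
after = [lb.route() for _ in range(2)]   # traffic continues on the survivor
print(before, after)
```

Since both servers already carry traffic, losing one removes capacity but not service; that is the distinction from the Active-Passive pairing used between the two data centers.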

Page 18:

Comprehensive Emergency Operations Process

• Started with Y2K; focus on manual processes to back up automated systems
• Post 9/11: focus on continuity of services
• Dedicated staff = DC EMA + agency representatives + key service providers (utilities, suppliers, Federal public safety, regional emergency agency staff, etc.)
• Hardened site
• 14 Emergency Liaison Officers for key services
• Two-tiered operational structure (EOC and JIC)
• Clearly defined decision-making process and lines of authority
• Redundant communication channels with all levels of responders, and the public at large
• Frequent practice, using planned events

Page 19:

Specialized Content for Public Communication

• Public portal’s Emergency Center provides detailed content for emergency response plans

• Extensive use of GIS-based content
– Content tailored to individual’s location
– Facilitates location of shelters, evacuation routes, and major transportation services

• Specialized “Emergency Mode” will take over entire portal during catastrophic events

Page 20:

Page 21:

Focus on Continuity of Communications

• Public safety wireless network for voice and data

• Federal and regional voice interoperability

• 99% of District geography is covered

• Dedicated transmission towers, and mobile repeater systems

• Signals can penetrate thick building walls, metro system tunnels, underground locations

Page 22:

District of Columbia Office of the Chief Technology Officer – 1

Page 23:

Coverage Improvement With New Network

• Coverage Improvement With New MPD Network

District of Columbia Office of the Chief Technology Officer – 2

Page 24:

Public Portal Resiliency

• Active-Active failover, with load balancing on heartbeat for high availability

• Actual = 99.99999%

• Active-Passive disaster recovery between two local data centers

• Future Tertiary site for continuous availability

GOAL=Never Go Dark
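
To put the availability figure in concrete terms, a percentage can be converted into expected downtime per year. The Python sketch below is a worked calculation only; it assumes a 365-day year, and the function name `downtime_seconds_per_year` is hypothetical. Exact fractions are used so that the many nines are not lost to floating-point rounding.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_seconds_per_year(availability_percent: str) -> Fraction:
    """Convert an availability percentage (given as a string) to downtime per year."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * SECONDS_PER_YEAR


seven_nines = downtime_seconds_per_year("99.99999")
three_nines = downtime_seconds_per_year("99.9")
print(float(seven_nines), "seconds per year")        # a little over 3 seconds
print(float(three_nines) / 3600, "hours per year")   # for comparison
```

At the reported level the portal would be dark for only a few seconds in a year, which is consistent with the stated goal.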

Page 25:

High Availability Messaging Platform

• Completely fault tolerant email system enables government officials to communicate and share data during significant outages

• High volume synchronous data replication between primary and secondary data centers, using EMC’s CLARiiON MirrorView

• Homeland Security funding ($900k) made public safety agencies the priority focus during implementation:
– MPD
– FEMS
– DMH
– CFSA
– DOC

• Can fail over email accounts and the most recent data from 4 hours prior to the outage

Page 26:

Key Success Factors

• People

• Process

• Practice

Page 27:

Mary Kaye Vavasour

Program Manager

eGovernment Services

Office of the Chief Technology Officer

District of Columbia

[email protected]