Business Continuity & Disaster Recovery: Business Impact Analysis, RPO/RTO, Disaster Recovery Testing, Backups, Audit

Chapter 13

May 05, 2017


ueki77
Page 1: Chapter 13

Business Continuity & Disaster Recovery

Business Impact Analysis, RPO/RTO

Disaster Recovery: Testing, Backups, Audit

Page 2: Chapter 13

Imagine a system failure…
• Server failure
• Disk system failure
• Hacker break-in
• Denial of Service attack
• Extended power failure
• Snow storm
• Spyware
• Malevolent virus or worm
• Earthquake, tornado
• Employee error or revenge

How will this affect each business?

Page 3: Chapter 13

Event Damage Classification

• Negligible: No significant cost or damage
• Minor: A non-negligible event with no material or financial impact on the business
• Major: Impacts one or more departments and may impact outside clients
• Crisis: Has a major material or financial impact on the business

Minor, Major, and Crisis events should be documented and tracked to repair.
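The four damage classes form a severity ladder, and a short sketch can make that ordering explicit. The Python below is a minimal illustration, not a prescribed method. The input names are hypothetical. One rule is an assumption read from the definitions: any departmental impact counts as Major, unless the material or financial impact is major, which makes the event a Crisis.

```python
from enum import Enum


class EventClass(Enum):
    NEGLIGIBLE = "Negligible"
    MINOR = "Minor"
    MAJOR = "Major"
    CRISIS = "Crisis"


def classify_event(significant_cost_or_damage: bool,
                   departments_impacted: int,
                   major_material_or_financial_impact: bool) -> EventClass:
    """Assign a damage class from hypothetical inputs, checked from most severe down."""
    if major_material_or_financial_impact:
        return EventClass.CRISIS
    if departments_impacted >= 1:   # assumption: any departmental impact is Major
        return EventClass.MAJOR
    if significant_cost_or_damage:
        return EventClass.MINOR
    return EventClass.NEGLIGIBLE


def must_track(event_class: EventClass) -> bool:
    """Minor, Major, and Crisis events are documented and tracked to repair."""
    return event_class is not EventClass.NEGLIGIBLE


print(classify_event(True, 0, False).value)  # Minor
```

The checks run from most to least severe. That keeps the classes mutually exclusive, so every event lands in exactly one.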

Page 4: Chapter 13

Workbook: Disasters and Impact (assumes a university)

Problematic Event or Incident | Affected Business Process(es) | Impact Classification & Effect on Finances, Legal Liability, Human Life, Reputation
Fire | Classrooms, business departments | Crisis, at times Major; human life
Hacking attack | Registration, advising | Major; legal liability
Network unavailable | Registration, advising, classes, homework, education | Crisis
Social engineering/fraud | Registration | Major; legal liability
Server failure (disk/server) | Registration, advising, classes, homework, education | Major, at times Crisis

Page 5: Chapter 13

Recovery Time: Terms
• Interruption Window: time the organization can wait between the point of failure and service resumption
• Service Delivery Objective (SDO): level of service in Alternate Mode
• Maximum Tolerable Outage (MTO): maximum time in Alternate Mode

Figure: timeline running from Regular Service to an Interruption. The Interruption Window ends when the Disaster Recovery Plan is implemented. The organization then runs in Alternate Mode at the SDO for at most the Maximum Tolerable Outage. Regular Service returns once the Restoration Plan is implemented.
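The Interruption Window and the MTO can be checked against an actual or test outage. This sketch assumes the outage has two phases: a wait with no service, followed by a period in Alternate Mode. The `OutageTimeline` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutageTimeline:
    """Hypothetical outage record; all times in hours after the failure (t = 0)."""
    alternate_mode_start: float      # Disaster Recovery Plan implemented
    regular_service_resumed: float   # Restoration Plan implemented


def within_tolerances(t: OutageTimeline,
                      interruption_window_h: float,
                      max_tolerable_outage_h: float) -> dict:
    """Check an outage against the Interruption Window and the MTO."""
    # Time with no service at all: failure until Alternate Mode begins.
    wait = t.alternate_mode_start
    # Time spent running at the SDO in Alternate Mode.
    alternate = t.regular_service_resumed - t.alternate_mode_start
    return {
        "wait_h": wait,
        "alternate_mode_h": alternate,
        "interruption_window_ok": wait <= interruption_window_h,
        "mto_ok": alternate <= max_tolerable_outage_h,
    }


# Example: Alternate Mode after 3 h, regular service back after 51 h.
print(within_tolerances(OutageTimeline(3, 51), interruption_window_h=4,
                        max_tolerable_outage_h=72))
```

The two limits are checked separately. A fast switch to Alternate Mode does not excuse staying there longer than the MTO.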

Page 6: Chapter 13

Definitions

• Business Continuity: offer critical services in the event of disruption
• Disaster Recovery: survive interruption to computer information systems
• Alternate Process Mode: service offered by the backup system
• Disaster Recovery Plan (DRP): how to transition to Alternate Process Mode
• Restoration Plan: how to return to regular system mode

Page 7: Chapter 13

Classification of Services

• Critical $$$$: Cannot be performed manually. Tolerance to interruption is very low.
• Vital $$: Can be performed manually for a very short time.
• Sensitive $: Can be performed manually for a period of time, but may cost more in staff.
• Nonsensitive ¢: Can be performed manually for an extended period of time with little additional cost and minimal recovery effort.

Page 8: Chapter 13

Determine Criticality of Business Processes

Figure: priority hierarchy of business processes (1 = highest priority). Corporate divides into Sales (1), Shipping (2), and Engineering (3). Lower levels rank Web Service (1) and Sales Calls (2); Orders (1) and Inventory (2); and Products A (1), B (2), and C (3).

Page 9: Chapter 13

RPO and RTO

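Recovery Point Objective (RPO): How far back can you fail to? One week's worth of data?
Recovery Time Objective (RTO): How long can you operate without a system? Which services can last how long?

Figure: RPO and RTO drawn as scales of 1 hour, 1 day, and 1 week measured from the interruption. The RPO looks back to the last usable data; the RTO looks forward to service restoration.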
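The two questions map directly onto two time differences around the interruption. For RPO, it is the gap from the last good copy of the data to the failure. For RTO, it is the gap from the failure to restored service. Below is a minimal sketch using Python's standard `datetime` module. The function name and the incident times are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_recovery(last_good_copy: datetime, failure: datetime,
                      service_restored: datetime,
                      rpo: timedelta, rto: timedelta) -> dict:
    """Compare one real or test recovery with a service's RPO and RTO."""
    data_lost = failure - last_good_copy        # how far back we failed to
    downtime = service_restored - failure       # how long we ran without the system
    return {
        "data_lost": data_lost,
        "downtime": downtime,
        "rpo_met": data_lost <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: nightly copy at midnight, failure at 06:00,
# service back at 11:00; objectives of RPO = 1 day, RTO = 4 hours.
result = evaluate_recovery(datetime(2017, 5, 5, 0, 0), datetime(2017, 5, 5, 6, 0),
                           datetime(2017, 5, 5, 11, 0),
                           rpo=timedelta(days=1), rto=timedelta(hours=4))
print(result["rpo_met"], result["rto_met"])  # True False
```

In this example, losing 6 hours of data is within a 1-day RPO. Five hours of downtime, however, misses a 4-hour RTO.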

Page 10: Chapter 13

Recovery Point Objective

Figure: mirroring (RAID) and backup images as ways of meeting the RPO.

Orphan Data: data that is lost and never recovered.

The RPO influences the backup period.
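Since the RPO influences the backup period, a backup schedule can be checked against it. Take the largest gap between consecutive backups: a failure just before the next run would orphan that much data. This is a minimal sketch. It assumes every backup completes successfully. The schedule shown is hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_orphan_window(backup_times: list[datetime]) -> timedelta:
    """Largest gap between consecutive backups.

    A failure just before a backup runs orphans everything written since the
    previous backup, so the largest gap bounds the data that can be lost
    (assuming each backup completes successfully).
    """
    if len(backup_times) < 2:
        raise ValueError("need at least two backup times")
    ordered = sorted(backup_times)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def schedule_meets_rpo(backup_times: list[datetime], rpo: timedelta) -> bool:
    return worst_case_orphan_window(backup_times) <= rpo


# Hypothetical schedule: two backups a day, except a missed evening run.
day = datetime(2017, 5, 5)
runs = [day, day + timedelta(hours=12), day + timedelta(hours=24),
        day + timedelta(hours=48)]
print(worst_case_orphan_window(runs))                     # 1 day, 0:00:00
print(schedule_meets_rpo(runs, rpo=timedelta(hours=12)))  # False
```

A single missed run doubles the worst-case loss here. This is why the backup period is set from the RPO rather than from an average.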

Page 11: Chapter 13

Business Impact Analysis Summary
Workbook: partial BIA for a university

Service | Recovery Point Objective (Hours) | Recovery Time Objective (Hours) | Critical Resources (Computer, People, Peripherals) | Special Notes (Unusual Treatment at Specific Times, Unusual Risk Conditions)
Registration | 0 hours | 4 hours | SOLAR, network, Registrar | High priority during Nov-Jan, March-June, August
Personnel | 2 hours | 8 hours | PeopleSoft | Can operate manually for some time
Teaching | 1 day | 1 hour | D2L, network, faculty files | During school semester: high priority
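The rows of the partial university BIA can be held as simple records and queried. This sketch transcribes the three rows (treating 1 day as 24 hours). It assumes a recovery policy of restoring the shortest-RTO service first; the BIA itself does not state that policy.

```python
from dataclasses import dataclass


@dataclass
class BiaEntry:
    service: str
    rpo_hours: float
    rto_hours: float
    critical_resources: list[str]


# Rows transcribed from the partial university BIA (1 day = 24 hours).
BIA = [
    BiaEntry("Registration", 0, 4, ["SOLAR", "network", "Registrar"]),
    BiaEntry("Personnel", 2, 8, ["PeopleSoft"]),
    BiaEntry("Teaching", 24, 1, ["D2L", "network", "faculty files"]),
]


def recovery_order(entries: list[BiaEntry]) -> list[str]:
    """Assumed policy: restore the service with the shortest RTO first."""
    return [e.service for e in sorted(entries, key=lambda e: e.rto_hours)]


def tightest_rpo(entries: list[BiaEntry]) -> str:
    """Service that can tolerate the least data loss."""
    return min(entries, key=lambda e: e.rpo_hours).service


print(recovery_order(BIA))  # ['Teaching', 'Registration', 'Personnel']
print(tightest_rpo(BIA))    # Registration
```

The seasonal special notes are left out of this sketch. A fuller version might raise Registration's priority during its peak months.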

Page 12: Chapter 13

RAID – Data Mirroring

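Figure: RAID 0 (striping) splits ABCD across two disks as AB and CD. RAID 1 (mirroring) stores ABCD on each of two disks. Higher-level RAID combines striping and redundancy, storing AB, CD, and a parity block.

Redundant Array of Independent Disks (RAID)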
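The parity idea behind higher-level RAID can be shown with XOR. The parity block is the XOR of the data blocks, so XOR-ing the surviving blocks recreates any single lost one. This is a minimal sketch of a single-parity stripe, in the style of RAID 3/4. RAID 5 additionally rotates the parity block across disks, which is not shown. All function names are illustrative.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks; this is the parity calculation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, n_data_disks: int) -> list[bytes]:
    """Stripe data across n data disks (as in RAID 0) and add one parity block."""
    size = -(-len(data) // n_data_disks)             # ceiling division
    padded = data.ljust(size * n_data_disks, b"\0")  # pad the final block
    blocks = [padded[i * size:(i + 1) * size] for i in range(n_data_disks)]
    return blocks + [xor_blocks(blocks)]


def rebuild(disks: list[Optional[bytes]]) -> list[bytes]:
    """Recreate one missing block (data or parity) by XOR-ing the survivors."""
    missing = [i for i, d in enumerate(disks) if d is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost disk")
    survivors = [d for d in disks if d is not None]
    restored = list(disks)
    restored[missing[0]] = xor_blocks(survivors)
    return restored


disks = stripe_with_parity(b"ABCD", 2)   # [b"AB", b"CD", parity]
disks[0] = None                          # simulate losing the first disk
print(b"".join(rebuild(disks)[:2]))      # b'ABCD'
```

Losing two disks in the same stripe defeats single parity. This is the limit the `ValueError` guards against.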

Page 13: Chapter 13

Network Disaster Recovery

• Redundancy: includes routing protocols, fail-over, multiple paths
• Alternative Routing: more than one medium or more than one network provider
• Diverse Routing: multiple paths, one medium type
• Last-Mile Circuit Protection: e.g., local microwave and cable
• Long-Haul Network Diversity: redundant network providers
• Voice Recovery: voice communication backup
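The difference between alternative and diverse routing comes down to counting media and providers across the redundant paths. This sketch applies those definitions to a set of paths. The `Path` record and the carrier names are hypothetical. One case is not settled by the definitions: one medium but two providers. Here it is treated as alternative routing, following the "more than one network provider" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    provider: str   # hypothetical carrier name
    medium: str     # e.g. "cable", "microwave"


def routing_type(paths: list[Path]) -> str:
    """Label a set of redundant paths using the definitions above."""
    if len(paths) < 2:
        return "no redundancy"
    providers = {p.provider for p in paths}
    media = {p.medium for p in paths}
    if len(media) > 1 or len(providers) > 1:
        return "alternative routing"   # more than one medium or provider
    return "diverse routing"           # multiple paths, one medium, one provider


print(routing_type([Path("CarrierA", "cable"), Path("CarrierA", "microwave")]))
# alternative routing
```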

Page 14: Chapter 13

Disruption vs. Recovery Costs

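Figure: cost plotted against service downtime. The cost of disruption rises as downtime grows. The cost of alternative recovery strategies (hot site, warm site, cold site) falls as more downtime is tolerated. The chosen strategy sits near the point of minimum total cost.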
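The trade-off between disruption cost and recovery strategy cost can be sketched as minimizing their sum. This is a minimal illustration under strong assumptions. Disruption cost is taken as linear in downtime, whereas real curves usually rise faster. Every downtime and cost figure below is hypothetical.

```python
def total_cost(disruption_cost_per_hour: float, downtime_hours: float,
               strategy_cost: float) -> float:
    """Disruption cost (assumed linear in downtime) plus recovery strategy cost."""
    return disruption_cost_per_hour * downtime_hours + strategy_cost


def cheapest_strategy(strategies: dict[str, tuple[float, float]],
                      disruption_cost_per_hour: float) -> str:
    """strategies maps name -> (expected downtime in hours, strategy cost)."""
    return min(strategies,
               key=lambda name: total_cost(disruption_cost_per_hour,
                                           *strategies[name]))


# Hypothetical figures for illustration only.
OPTIONS = {"hot": (4, 500_000), "warm": (72, 150_000), "cold": (336, 40_000)}
for rate in (100, 1_000, 10_000):
    print(rate, cheapest_strategy(OPTIONS, rate))
# 100 cold, 1000 warm, 10000 hot
```

As the hourly cost of disruption grows, the minimum-cost choice moves from cold to warm to hot site.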

Page 15: Chapter 13

Alternative Recovery Strategies

• Hot Site: fully configured; ready to operate within hours
• Warm Site: ready to operate within days; has no main computer, or only a low-powered one, but does contain disks, network, and peripherals
• Cold Site: ready to operate within weeks; contains electrical wiring, air conditioning, and flooring
• Duplicate or Redundant Information Processing Facility: standby hot site within the organization
• Reciprocal Agreement: with another organization or division
• Mobile Site: fully or partially configured trailer that comes to your site, with microwave or satellite communications
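The readiness times for hot, warm, and cold sites can be compared against a service's RTO. The idea is to pick the least expensive site type that is still fast enough. The slide gives readiness only as "hours", "days", and "weeks", so the hour values below are assumptions. So is the cost ordering (cold cheapest, hot most expensive). Other options such as reciprocal agreements and mobile sites are left out.

```python
from typing import Optional

# Assumed worst-case readiness times; the slide says only hours/days/weeks.
SITE_READY_HOURS = {"cold": 3 * 7 * 24, "warm": 3 * 24, "hot": 12}
# Assumed ordering from least to most expensive to maintain.
SITES_BY_COST = ["cold", "warm", "hot"]


def least_costly_site(rto_hours: float) -> Optional[str]:
    """Cheapest site type that can be operating within the RTO, if any."""
    for site in SITES_BY_COST:
        if SITE_READY_HOURS[site] <= rto_hours:
            return site
    return None   # no site type meets the RTO under these assumptions


for rto in (4, 12, 100, 1000):
    print(rto, least_costly_site(rto))
```

Under these assumed figures, a 4-hour RTO is beyond even a hot site. A 100-hour RTO can be met by a warm site.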

Page 16: Chapter 13

What is Cloud Computing?

Figure: cloud computing diagram with a database, app server, web server, VPN server, laptop, and PC.

Page 17: Chapter 13

This would cost $200/month.

Introduction to Cloud

Figure: NIST Visual Model of Cloud Computing Definition (National Institute of Standards and Technology, www.cloudstandards.org)

Page 18: Chapter 13

Cloud Service Models
• Software (SaaS): provider runs its own applications on cloud infrastructure.
• Platform (PaaS): consumer provides apps; provider provides the system and development environment.
• Infrastructure (IaaS): provides customers access to processing, storage, networks, or other fundamental resources.

Page 19: Chapter 13

Cloud Deployment Models

• Private Cloud: dedicated to one organization
• Community Cloud: several organizations with shared concerns share computer facilities
• Public Cloud: available to the public or a large industry group
• Hybrid Cloud: two or more clouds (private, community, or public) remain distinct but are bound together by standardized or proprietary technology

Page 20: Chapter 13

Disaster Recovery

Disaster Recovery Testing

Page 21: Chapter 13

An Incident Occurs…

• Call the Security Officer (SO) or a committee member
• SO follows the pre-established protocol
• Security officer declares a disaster
• Emergency Response Team: human life is the first concern
• Phone tree notifies relevant participants
• IT follows the Disaster Recovery Plan
• Public relations interfaces with the media (everyone else stays quiet)
• Management and legal counsel act
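A phone tree is a tree of who calls whom. Walking it level by level shows who is reached in each round of calls. This is a minimal breadth-first sketch. The people and reporting lines in `TREE` are hypothetical, and a real phone tree would also need fallbacks for people who cannot be reached.

```python
def call_rounds(tree: dict[str, list[str]], root: str) -> list[list[str]]:
    """Group people by the round in which they are reached.

    Everyone in round k is called by someone in round k - 1, so the number
    of rounds shows how many calling steps it takes to reach the whole team.
    """
    rounds, frontier, seen = [], [root], {root}
    while frontier:
        rounds.append(frontier)
        next_round = []
        for caller in frontier:
            for contact in tree.get(caller, []):
                if contact not in seen:      # skip anyone already notified
                    seen.add(contact)
                    next_round.append(contact)
        frontier = next_round
    return rounds


# Hypothetical phone tree for illustration only.
TREE = {
    "Security Officer": ["IT Lead", "PR Lead", "Legal Counsel"],
    "IT Lead": ["Network Admin", "DBA"],
    "PR Lead": ["Media Desk"],
}
for i, people in enumerate(call_rounds(TREE, "Security Officer")):
    print(i, people)
```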

Page 22: Chapter 13

Concerns for a BCP/DR Plan

• Evacuation plan: people's lives always take first priority
• Disaster declaration: who, how, for what?
• Responsibility: who covers the necessary disaster recovery functions?
• Procedures for disaster recovery
• Procedures for Alternate Mode operation
• Resource allocation: during recovery and continued operation
• Copies of the plan should be kept off-site

Page 23: Chapter 13

Disaster Recovery Responsibilities

General Business
• First responder: evacuation, fire, health…
• Damage assessment
• Emergency management
• Legal affairs
• Transportation/relocation/coordination (people, equipment)
• Supplies
• Salvage
• Training

IT-Specific Functions
• Software
• Application
• Emergency operations
• Network recovery
• Hardware
• Database/data entry
• Information security

Page 24: Chapter 13

BCP Documents (focus: IT or business; phase: event, recovery, or business continuity)
• Disaster Recovery Plan: procedures to recover at an alternate site
• Business Recovery Plan: recover the business after a disaster
• IT Contingency Plan: recovers a major application or system
• Occupant Emergency Plan: protect life and assets during a physical threat
• Cyber Incident Response Plan: respond to a malicious cyber incident
• Crisis Communication Plan: provide status reports to the public and personnel
• Business Continuity Plan
• Continuity of Operations Plan: longer-duration outages

Page 25: Chapter 13

Workbook: Business Continuity Overview

Classification (Critical or Vital) | Business Process | Incident or Problematic Event(s) | Procedure for Handling (Section 5)
Vital | Registration | Computer failure | If total failure, forward requests to UW-System; otherwise, use the 1-week-old database for read purposes only
Critical | Teaching | Computer failure | Faculty DB Recovery Procedure

Page 26: Chapter 13

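MTBF = MTTF + MTTR
• Mean Time to Failure (MTTF)
• Mean Time to Repair (MTTR)
• Mean Time Between Failures (MTBF)

Measure of availability:
• Availability = MTTF / MTBF = MTTF / (MTTF + MTTR)
• 5 9s = 99.999% of time working ≈ 5¼ minutes (about 5.26 minutes) of failure per year.

Figure: timeline alternating "works" and "repair" periods, e.g., 84 days working followed by 1 day of repair.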
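The availability figures follow from MTBF = MTTF + MTTR: the fraction of time working is MTTF / MTBF. This sketch computes that fraction and turns an availability into minutes of failure per year. It assumes a 365-day year, ignoring leap years. The 84-day/1-day example uses the works/repair timeline.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600; leap years ignored


def availability(mttf: float, mttr: float) -> float:
    """Fraction of time working: MTTF / MTBF, where MTBF = MTTF + MTTR."""
    return mttf / (mttf + mttr)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of failure per year at a given availability."""
    return (1 - avail) * MINUTES_PER_YEAR


print(round(availability(mttf=84, mttr=1), 4))       # 0.9882 (84 working days, 1 repair day)
print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26
```

Even 84 days of uptime per day of repair is under 99% availability. Five nines leaves only about five minutes of failure in a year.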

Page 27: Chapter 13

Disaster Recovery Test Execution

Always tested in this order:
1. Desk-Based Evaluation/Paper Test: a group steps through a paper procedure and mentally performs each step.
2. Preparedness Test: part of the full test is performed; different parts are tested regularly.
3. Full Operational Test: simulation of a full disaster.

Page 28: Chapter 13

Testing Objectives

Main objective: existing plans will result in successful recovery of infrastructure & business processes

Also can:
• Identify gaps or errors
• Verify assumptions
• Test timelines
• Train and coordinate staff

Page 29: Chapter 13

Testing Procedures

• Tests start simple and become more challenging as they progress
• Include an independent third party (e.g., an auditor) to observe the test
• Retain documentation for audit reviews

Test cycle: Develop test objectives → Execute test → Evaluate test → Develop recommendations to improve test effectiveness → Follow up to ensure recommendations are implemented

Page 30: Chapter 13

Test Stages
• PreTest: set the stage; set up equipment; prepare staff
• Test: actual test
• PostTest: cleanup; return resources; calculate metrics (time required, % success rate in processing, ratio of successful transactions in Alternate Mode vs. normal mode); delete test data; evaluate the plan; implement improvements

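The PostTest metrics are simple arithmetic on the test log. This sketch computes all three. The `RecoveryTestRun` record is hypothetical. So is the baseline: it assumes `normal_mode_successes` is the number of successful transactions over a comparable period of normal operation.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTestRun:
    """Hypothetical record of one test; times in hours from test start."""
    started_h: float
    finished_h: float
    transactions_attempted: int
    transactions_succeeded: int


def post_test_metrics(run: RecoveryTestRun, normal_mode_successes: int) -> dict:
    """The three PostTest metrics: time, success rate, Alternate vs. normal ratio.

    normal_mode_successes is assumed to be the successful transactions for a
    comparable period in normal operation.
    """
    return {
        "time_required_h": run.finished_h - run.started_h,
        "success_rate_pct": 100 * run.transactions_succeeded / run.transactions_attempted,
        "alternate_vs_normal": run.transactions_succeeded / normal_mode_successes,
    }


m = post_test_metrics(RecoveryTestRun(0, 6.5, 1000, 950), normal_mode_successes=1200)
print(m["time_required_h"], m["success_rate_pct"], round(m["alternate_vs_normal"], 3))
# 6.5 95.0 0.792
```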

Page 31: Chapter 13

Gap Analysis

Comparing Current Level with Desired Level
• Which processes need to be improved?
• Where is staff or equipment lacking?
• Where does additional coordination need to occur?

Page 32: Chapter 13

Insurance (categories: IPF & Equipment; Data & Media; Employee Damage)
• Business Interruption: loss of profit due to IS interruption
• Valuable Papers & Records: covers the cash value of lost or damaged papers and records
• Fidelity Coverage: loss from dishonest employees
• Extra Expense: extra cost of operation following IPF damage
• Media Reconstruction: cost of reproducing media
• Errors & Omissions: liability for errors resulting in loss to a client
• IS Equipment & Facilities: loss of IPF and equipment due to damage
• Media Transportation: loss of data during transport

IPF = Information Processing Facility

Page 33: Chapter 13

Auditing BCP
Includes:
• Is the BIA complete, with RPO/RTO defined for all services?
• Is the BCP in line with business goals, effective, and current?
• Is it clear who does what in the BCP and DRP?
• Is everyone trained, competent, and happy with their jobs?
• Is the DRP detailed, maintained, and tested?
• Are the BCP and DRP consistent in their recovery coverage?
• Are the people listed in the BCP/phone tree current, and do they have a copy of the BC manual?
• Are the backup/recovery procedures being followed?
• Does the hot site have correct copies of all software?
• Is the backup site maintained to expectations, and are the expectations effective?
• Was the DRP test documented well, and was the DRP updated?

Page 34: Chapter 13

Summary of BC Security Controls

• RAID
• Backups: incremental backup, differential backup
• Networks: diverse routing, alternative routing
• Alternative site: hot site, warm site, cold site, reciprocal agreement, mobile site
• Testing: checklist, structured walkthrough, simulation, parallel, full interruption
• Insurance
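Incremental and differential backups differ in what a restore needs. An incremental holds the changes since the previous backup of any kind, so every incremental since the last full backup must be applied. A differential holds all changes since the last full backup, so only the newest one is needed. This sketch works out the restore chain for a schedule. The day numbers are hypothetical.

```python
def restore_chain(backups: list[tuple[int, str]], target_day: int) -> list[int]:
    """Days whose backup sets must be restored, in order, to reach target_day.

    backups holds (day, kind) pairs, kind being "full", "differential", or
    "incremental". A differential holds all changes since the last full
    backup; an incremental holds changes since the previous backup of any kind.
    """
    usable = sorted(b for b in backups if b[0] <= target_day)
    fulls = [day for day, kind in usable if kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target day")
    chain = [fulls[-1]]
    # Only the newest differential after the full backup is needed.
    diffs = [day for day, kind in usable if kind == "differential" and day > chain[0]]
    if diffs:
        chain.append(diffs[-1])
    # Every incremental after the last set already in the chain is needed, in order.
    chain += [day for day, kind in usable if kind == "incremental" and day > chain[-1]]
    return chain


weekly_incremental = [(0, "full"), (1, "incremental"), (2, "incremental"), (3, "incremental")]
weekly_differential = [(0, "full"), (1, "differential"), (2, "differential"), (3, "differential")]
print(restore_chain(weekly_incremental, 3))   # [0, 1, 2, 3]
print(restore_chain(weekly_differential, 3))  # [0, 3]
```

The trade-off shows in the output. Incrementals are smaller to take but need a longer chain at restore time. Differentials grow each day but restore in two steps.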

Page 35: Chapter 13

Step 1: Define Threats Resulting in Business Disruption

Key questions:
• Which business processes are of strategic importance?
• What disasters could occur?
• What impact would they have on the organization financially? Legally? On human life? On reputation?

Impact Classification
• Negligible: No significant cost or damage
• Minor: A non-negligible event with no material or financial impact on the business
• Major: Impacts one or more departments and may impact outside clients
• Crisis: Has a major material or financial impact on the business

Page 36: Chapter 13

Step 1: Define Threats Resulting in Business Disruption

Problematic Event or Incident | Affected Business Process(es) | Impact Classification & Effect on Finances, Legal Liability, Human Life, Reputation
Fire | |
Hacking incident | |
Network unavailable (e.g., ISP problem) | |
Social engineering, fraud | |
Server failure (e.g., disk) | |
Power failure | |

Page 37: Chapter 13

Step 2: Define Recovery Objectives

Figure: Recovery Point Objective and Recovery Time Objective measured from the interruption on scales of 1 hour, 1 day, and 1 week.

Business Process | Recovery Time Objective (Hours) | Recovery Point Objective (Hours) | Critical Resources (Computer, People, Peripherals) | Special Notes (Unusual Treatment at Specific Times, Unusual Risk Conditions)
(blank rows to fill in)

Page 38: Chapter 13

Business Continuity

Step 3: Attaining Recovery Point Objective (RPO)

Step 4: Attaining Recovery Time Objective (RTO)

Classification (Critical or Vital) | Business Process | Problem Event(s) or Incident | Procedure for Handling (Section 5)
(blank rows to fill in)

Page 39: Chapter 13

Criticality Classification

• Critical: Cannot be performed manually. Tolerance to interruption is very low.
• Vital: Can be performed manually for a very short time.
• Sensitive: Can be performed manually for a period of time, but may cost more in staff.
• Non-sensitive: Can be performed manually for an extended period of time with little additional cost and minimal recovery effort.