The Risky Data Center
Panelists: Don Byrne, Jack Pyne, Rich Banta
• Introduction and Concepts
• Standards Overview
• Applying the Concepts
Aug 17, 2015
Common Data Center Risks
• Unlicensed software
• Home-grown code in critical path
• Single carriers/utility providers (no diversity)
• No policy/guidance for controlling BYOD
• Rogue wireless access points
• Local purchasing leading to a lack of configuration control
• Inaccurate change management tracking
• Out-of-date documentation
• Changing compliance requirements with rules/standards/laws
• Unnoticed facility flaws (e.g., internal wooden frames)
• ‘Sandbox’ projects using actual client data for testing
• No data governance software
What does this have to do with risk management?
Risk management faces different issues:
• Avoiding, mitigating, or accepting risk
• What is the risk?
• Assuring agencies, clients, and stakeholders that you have managed the risk appropriately
• Confidence
• Communication
Putting Risk Management in Action - Reliability-Centered Maintenance (RCM)
• Developed by the FAA and the airlines in the 1960s
• Adopted by the US military in the 1970s
• Adopted by the nuclear power industry in the 1980s
• Disney uses it in their theme parks
Putting Risk Management in Action - RCM
• Business-case oriented
• Formalized in SAE JA1011
• Certification is available from Naval Air Systems Command (NAVAIR) and others
• Risk assessment and management on steroids, all the way down to the equipment component level
Putting Risk Management in Action - RCM
FMECA: Failure Mode, Effects, and Criticality Analysis
• Bottom-up
• Inductive analytical method performed at the functional or piece-part level
• Includes criticality analysis
• Charts the probability of failure modes against the severity of their consequences

Criticality Factor: 1-5 (where 1 is least critical and 5 is ultra critical)

Component | Failure Potential (in 12-month period) | Criticality Factor | Priority | Comments
Ventilator Fan -- unit 30-b1 | 99% | 5 | 49.5 |
Filter Gasket -- g-205 | 98% | 4 | 39.2 | Needs monthly replacement
UPS -- unit c25 | 60% | 5 | 30 |
Generator -- unit g-5 | 35% | 5 | 17.5 | 4 years old
HVAC Drain pump -- unit p-304 | 45% | 3 | 13.5 |
Generator -- unit g-4 | 20% | 5 | 10 | 2 years old
Ventilator Fan -- unit 30-b2 | 30% | 2 | 6 |
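
The slide does not say how the Priority column is computed. The values in the table are consistent with failure potential (in percent) multiplied by the criticality factor and divided by 10, so the minimal sketch below assumes that formula. The `FailureMode` class and `rank` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    failure_pct: int   # failure potential over a 12-month period, in percent
    criticality: int   # 1 (least critical) to 5 (ultra critical)

    @property
    def priority(self) -> float:
        # Assumed formula, inferred from the table values (not stated in the slides).
        return self.failure_pct * self.criticality / 10


# Rows copied from the FMECA table above.
ROWS = [
    FailureMode("Ventilator Fan -- unit 30-b1", 99, 5),
    FailureMode("Filter Gasket -- g-205", 98, 4),
    FailureMode("UPS -- unit c25", 60, 5),
    FailureMode("Generator -- unit g-5", 35, 5),
    FailureMode("HVAC Drain pump -- unit p-304", 45, 3),
    FailureMode("Generator -- unit g-4", 20, 5),
    FailureMode("Ventilator Fan -- unit 30-b2", 30, 2),
]


def rank(rows):
    """Return failure modes ordered from highest to lowest priority."""
    return sorted(rows, key=lambda fm: fm.priority, reverse=True)


if __name__ == "__main__":
    for fm in rank(ROWS):
        print(f"{fm.component:32} {fm.failure_pct:3d}%  {fm.criticality}  {fm.priority:5.1f}")
```

Under this assumption, ranking by priority reproduces the row order shown in the table, with the ventilator fan in unit 30-b1 at the top.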
Putting Risk Management in Action - RCM
FMECA: Failure Mode, Effects, and Criticality Analysis
FMECAs are reviewed, refreshed, and maintained at least annually, and the collected data feeds an ongoing, dynamic failure probability analysis model.
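
One simple way to enforce the annual review cycle is to flag any FMECA whose last review is more than a year old. The sketch below is illustrative only: the function name, the 365-day interval, and the review dates are all assumptions, not data from the presentation.

```python
from datetime import date, timedelta
from typing import List, Mapping

# Assumed interval: "at least annually" is treated here as 365 days.
REVIEW_INTERVAL = timedelta(days=365)


def overdue_reviews(last_reviewed: Mapping[str, date], today: date) -> List[str]:
    """Return the components whose FMECA review is older than the interval."""
    return sorted(
        name
        for name, reviewed_on in last_reviewed.items()
        if today - reviewed_on > REVIEW_INTERVAL
    )


# Hypothetical review dates, for illustration only.
EXAMPLE_REVIEWS = {
    "UPS -- unit c25": date(2014, 6, 1),
    "Generator -- unit g-5": date(2015, 3, 1),
    "Generator -- unit g-4": date(2014, 8, 1),
}

if __name__ == "__main__":
    for name in overdue_reviews(EXAMPLE_REVIEWS, today=date(2015, 8, 17)):
        print(f"FMECA review overdue: {name}")
```

Passing `today` explicitly keeps the check deterministic, which makes it easy to run as part of a scheduled report.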
Putting Risk Management in Action - RCM
When evaluating and purchasing data center infrastructure equipment (generators, UPS systems, HVAC gear, etc.), demand copies of the FMECAs from the manufacturer.
Putting Risk Management in Action - RCM
• Increasingly interface directly with corporate/enterprise risk managers.
• They are becoming increasingly conversant in RCM, failure probability analysis, and the value both bring to risk assessment and risk management.
Rich Banta – Co-owner, Lifeline Data Centers, Indianapolis
Rich is responsible for compliance and certifications, data center operations, information technology, and client concierge services. He has an extensive background in server and network management, large-scale wide-area networks, storage, business continuity, and monitoring.
He was formerly the Chief Technology Officer of a major health care system. Rich is hands-on every day in the data centers.
Certifications
• CISA – Certified Information Systems Auditor
• CRISC – Certified in Risk and Information Systems Control
• CDCE – Certified Data Center Expert
• CDCDP – Certified Data Center Design Professional
• CTDC – Certified TIA-942 Design Consultant
• CTIA – Certified TIA-942 Auditor
• CFCP – Certified FISMA Compliance Practitioner