How Risk Management Affects Data Centre Design: A Practical Approach to Complex Issues
My Three Fundamental Beliefs
• Facilitative Leadership
– All of our experience and expertise is of value
– The diversity of our experience and expertise should not be exclusive or divisive, but rather used to help reach healthy consensus
• There are no magic bullets
• Our customers don’t care about cabling
In the News!
• Welding mishap blamed for Amazon data center construction fire
• Cabling experts suggest FAA fire is the tip of the sabotage iceberg
• Destroyed in 60 Seconds: Riser Closets Offer Easy Target for Disgruntled Building Tenants to do Damage – Blog Article
Downtime Costs!
Technology researcher Infonetics Research, now part of IHS, Inc. (NYSE: IHS), recently conducted in-depth surveys with 205 medium and large businesses in North America and discovered that companies are losing as much as $100 million per year to downtime related to information and communication technology (ICT).
According to the survey, the most common causes of ICT downtime are failures of equipment, software and third‐party services; power outages; and human error. Infonetics’ respondent organizations said they experience an average of two outages and four degradations per month, with each event lasting around six hours.
“Fixing the downtime issue is the smallest cost component," adds Machowinski. "The real cost is the toll downtime takes on employee productivity and company revenue, illustrating the criticality of ICT infrastructure in the day‐to‐day operations of an organization."
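As a rough back-of-the-envelope check on the survey averages above, the sketch below multiplies events per month by event duration. Treating outages and degradations as equally long (about six hours each) and non-overlapping is an assumption, and the function name is illustrative only.

```python
def monthly_event_hours(outages: int, degradations: int, hours_per_event: float) -> float:
    """Total hours per month spent in an outage or a degraded state."""
    return (outages + degradations) * hours_per_event


# Survey averages: 2 outages and 4 degradations per month, about 6 hours each.
hours_per_month = monthly_event_hours(outages=2, degradations=4, hours_per_event=6)
hours_per_year = hours_per_month * 12

print(hours_per_month, hours_per_year)  # 36 432
```

On these averages, a respondent organization spends roughly 36 hours a month in some form of outage or degradation.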
Overview
• Data centre growth is exceeding the market
• Shift to managed and cloud-based services
• ‘Big Data’ is here to stay and is only getting bigger
• Risk management should be part of every DC design and operations review, but:
– Are we focused too much on cataclysmic events, nefarious acts, and acts of nature?
– What about the innocent, accidental or evolutionary events that pose threats to the DC?
Agenda
• Introductions
• Market Drivers and Industry Guidance
• Risk Assessment Methods
• Types of Risk
• Group Activity and Review – Identifying Risks
• Supplementary Tools & Concepts
• Group Activity & Review – Making a Plan
INTRODUCTIONS
Who we are and what we do
Presenter
Henry Franc RCDD OSP CDCDP
Solutions Specialist t. 416.476.1336
e. [email protected] Member
Member, BICSI Standards Committee
TIA Engineering Committee Participant (Premises, Copper & Fiber)
Chair TIA TR42.3 (Pathways & Spaces)
Vice Chair TIA TR42.10 (Sustainable Information Communications Technology)
Member, Standards Council of Canada (SMC/JTC1/SC25)
Past Chair TR42.1 (Commercial Buildings), TR42.4 (OSP) & Editor ANSI/TIA 758
Past Member CSA T104 Standards Committee (disbanded)
INDUSTRY GUIDANCE
Market Drivers and Trends
Lots of Buzz
The World We Live In …
The Internet of Everything …
• Internet of Information: 60T web pages (Google Index, October 2014)
• Internet of People: 1.3B Facebook active users (June 2014)
• Internet of Mobility: Mobile devices account for 44% of all IP traffic (2013)
… Connected to 50B Things by 2020
Cisco Connections Counter, October 2014
2015 Tech Trends
The Merging of the Real and the Virtual Worlds
1 Computing Everywhere
2 Internet of Things
3 3D Printing
Intelligence Everywhere
4 Advanced, Pervasive and Invisible Analytics
5 Context‐Rich Systems
6 Smart Machines
IT for the Digital Business
7 Cloud/Client Architecture
8 Software‐Defined Infrastructure & Applications
9 Web‐Scale IT
10 Risk‐Based Security & Self‐Protection
Source: Gartner, 2014
Data Characteristics
Volume
Source
Location
Flow
Frequency
Diversity
Multiple “Layer 0” Technologies Required to Support Future Needs
Two Distinct Markets
LAN
• Conventional and slow-paced
• Standard driven using ‘commodity’ technologies
• Installed base responding to current needs
• Modest growth tied to
– New construction
– IP Convergence
– Power-over-cabling
– WLAN
• New technology: POLAN
• Upgrade path: Cat5e/6 to Cat6A
Data Center
• Dynamic and fast-paced
• Rapidly transforming (and departing from LAN)
• Robust growth fuelled by
– More data
– Greater bandwidth
– Improved efficiency
• Cloud providers are changing the rules!
– Hotbed for new ideas, new technologies, new topologies
• Cloud vs. Enterprise segmentation leading to multiple roadmaps and customization
Evolving ICT Model
Data Center Options
• What’s in it, where is it, how is it managed, who owns it?
Data Center Investment Market Drivers
Segment: Enterprise Owned | Multi-Tenant | Cloud
Infrastructure Buyer: Enterprise | Building: Provider; Cabling: Enterprise | Provider
Investment: Large Capital | Capital / Operational | Operational
Agility / Scalability: Low | Med | High
Performance: “Best Compliant” | “Min Compliant” | “Purpose Built”
Technology (Cloud, Mobility, Performance): Standard driven | Pick & Choose | Bleeding Edge
Monthly bill for gas, electric, water, data…
HSA
Cloud, Multi-Tenant, Enterprise Owned
25G
Optical Shuffle
White Boxes
Leaf‐Spine
Stimulates Innovation
Data Center Market Segments
Cloud: $1.1B TAM, +31% CAGR
Key Attributes:
• Operating expense model for clients
• Enabler of accelerating mobile / cloud usage
• Competitive advantage lies in DC performance: driving custom switches and servers (white box)
Enterprise Owned: $3.4B TAM, -4% CAGR
Key Attributes:
• Capital investment model for new builds
• Cloud applications and virtualization driving segment decline
• Full spectrum of scale and technologies reside in enterprise owned DCs
Multi-Tenant: $1.2B TAM, +2% CAGR
Key Attributes:
• IT owned by enterprise client, while space and power are leased
• Multi-tenant providers seeking differentiation are expanding into hosting services
Sources: Dell Oro, Gartner, 451, Cisco Networking Report, Facebook VOC
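To see what the growth rates above imply, a compound annual growth rate can be applied as TAM × (1 + CAGR)^years. The sketch below is illustrative only: the slide does not state the base year or how long the rates are expected to hold, so the five-year horizon and the function name are assumptions.

```python
def project_tam(tam_billion: float, cagr: float, years: int) -> float:
    """Project a total addressable market forward at a constant compound annual growth rate."""
    return tam_billion * (1 + cagr) ** years


# Figures from the slide: (TAM in $B, CAGR as a fraction).
segments = {
    "Cloud": (1.1, 0.31),
    "Enterprise Owned": (3.4, -0.04),
    "Multi-Tenant": (1.2, 0.02),
}

YEARS = 5  # assumed horizon, not given in the source
for name, (tam, cagr) in segments.items():
    print(f"{name}: ${project_tam(tam, cagr, YEARS):.2f}B after {YEARS} years")
```

If the rates held constant, the cloud segment would overtake the enterprise-owned segment within that five-year window.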
Cloud Server Shipments Will Exceed Enterprise Server Shipments by 2018
Technology Trends
[Figure: LAN and DC technologies charted over 2015 to 2020. Chart labels: R&D, Early Deployment, Growth Potential, Commercial Maturity (Low to High). Technologies shown: Passive Optical LAN, HDBaseT 2.0, Power over Cabling, 802.11ac Wave 2, 10GBASE-T, 40GBASE-T, Category 8, 40G/100G, HSA, 40G BiDi, 40G-SR4, 100G-SR10, 25G/50G/100G, Ultra Wideband MMF, Silicon Photonics.]
Continuity and Availability
• It’s all about availability
– Which includes resiliency, redundancy, and recovery
• Many methods
– Uptime Tiers
  • Tier 1 – Single path ~ 99.67%
  • Tier 2 – Single path with redundant components ~ 99.75%
  • Tier 3 – Concurrently maintainable and operable ~ 99.98%
  • Tier 4 – Fault tolerant ~ 99.99%
  Note: these are guidelines and targets, not guarantees. Also, there are no half steps for Uptime (e.g., Tier III.5) or mixed tiers.
– BICSI, TIA, Others
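Availability percentages are easier to compare once converted to expected downtime. The sketch below, assuming a non-leap year of 8,760 hours and using the approximate targets quoted above, shows the conversion; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Convert an availability target (in percent) to expected downtime hours per year."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Approximate targets as quoted on the slide.
tier_targets = {"Tier 1": 99.67, "Tier 2": 99.75, "Tier 3": 99.98, "Tier 4": 99.99}

for tier, pct in tier_targets.items():
    print(f"{tier}: {pct}% -> about {annual_downtime_hours(pct):.1f} hours of downtime per year")
```

This works out to roughly 28.9, 21.9, 1.8 and 0.9 hours per year respectively, which makes the gap between Tier 2 and Tier 3 much more visible than the percentages do.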
Redundancy
• There are multiple ways of achieving, expressing and measuring redundancy
– N – basic requirement, no redundancy
– N+1 – provides one additional unit/module/path/system in addition to the basic requirement
– N+2 – provides two additional units/modules/paths/systems in addition to the basic requirement
– 2N – provides two complete ‘basic’ requirements
– 2(N+1) – provides two complete (N+1) units
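The redundancy notations above reduce to simple arithmetic on the basic requirement N. As a minimal sketch, assuming a hypothetical load that needs four units (for example, four cooling units), the counts work out as follows; the function name is illustrative.

```python
def units_required(n: int, scheme: str) -> int:
    """Number of units/modules/paths/systems to provision for a given redundancy scheme."""
    schemes = {
        "N": n,                # basic requirement, no redundancy
        "N+1": n + 1,          # one spare
        "N+2": n + 2,          # two spares
        "2N": 2 * n,           # two complete basic sets
        "2(N+1)": 2 * (n + 1), # two complete N+1 sets
    }
    return schemes[scheme]


# Hypothetical example: the load needs 4 units to run (N = 4).
for scheme in ("N", "N+1", "N+2", "2N", "2(N+1)"):
    print(scheme, units_required(4, scheme))
```

For N = 4 this gives 4, 5, 6, 8 and 10 units, which is a quick way to see how cost scales with each step up in redundancy.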
Redundancy – Not Clear Cut
• Sometimes it’s easier to see with an example: a birthday party for my daughter and 4 guests
Redundancy – Multiple Systems
• But that was only one system … we have many
One Size Doesn’t Fit All
• Most systems are similar but use different terms
– BICSI uses Facility Availability Classes (F0-F4 for different classes of availability)
– TIA uses the TEAM (Telecommunications, Electrical, Architectural, Mechanical)
– Ratings (TIA) / Classes (BICSI) / Tiers (Uptime) build upon each other
  • Type 1 – Basic requirements
  • Type 2 – All Type 1 requirements plus ‘some’ redundant components / systems and additional requirements
  • Type 3 – All Type 1 & 2 requirements plus duplicate services / systems and additional requirements
  • Type 4 – All Type 1, 2 & 3 requirements plus redundant components / systems / services and additional requirements
Note: Services do not all have to be at the same class (for TIA), e.g. T1E2A1M2
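A mixed rating such as T1E2A1M2 is easy to misread, so it can help to unpack it per subsystem. The sketch below is one way to do that; the function names are illustrative, and treating the lowest subsystem rating as the facility's limiting rating is an assumption here rather than something the slides state.

```python
import re

SUBSYSTEMS = ("Telecommunications", "Electrical", "Architectural", "Mechanical")
RATING_PATTERN = re.compile(r"T([1-4])E([1-4])A([1-4])M([1-4])")


def parse_team_rating(code: str) -> dict:
    """Split a rating string such as 'T1E2A1M2' into per-subsystem ratings."""
    match = RATING_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid TEAM rating: {code!r}")
    return dict(zip(SUBSYSTEMS, (int(level) for level in match.groups())))


def limiting_rating(code: str) -> int:
    """Assumption: the weakest subsystem limits the facility as a whole."""
    return min(parse_team_rating(code).values())


print(parse_team_rating("T1E2A1M2"))
print(limiting_rating("T1E2A1M2"))
```

Here the electrical and mechanical systems are rated 2, but telecommunications and architectural remain at 1.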
Let’s Take a Break
Methods for RISK ASSESSMENT
What is it?
What Not to Do … Other than Keep Calm
Basic Premise and Concept
• Risk management is a ‘process’
Control
• Control cannot be achieved without an effective, all-encompassing security program
• In isolation, tools, processes and countermeasures are not enough
Components of Risk Management
• A multi-step process
• Plan for the known and unknown
• Regular review should be mandatory
• Should adapt to changing requirements (assets), threats and vulnerabilities
• Some risks can’t be seen
• Some threats can’t be avoided
• Sometimes countermeasures will fail
• What happens during an outage?
Situational Analysis
• A thorough analysis should be done, and there are many tools
• A common tool is the SWOT analysis
• It changes based on perspective: owner, provider, partner, etc.
Threat Evaluation
• Threats
– Probability
– Impact: scale, recovery and operational
• Basic Evaluation
– Red (Critical) needs attention
– Amber (Warning) may require attention
– Green (‘OK’) no attention required
• Once evaluated, priorities can be set
– Controls designed/implemented
– Countermeasures prepared
Risk matrix: each cell is the product of the Probability rating and the Impact rating
(both axes: Very High = 5, High = 4, Medium = 3, Low = 2, Very Low = 1; columns run from rating 1 to rating 5)
Rating 5: 5 10 15 20 25
Rating 4: 4 8 12 16 20
Rating 3: 3 6 9 12 15
Rating 2: 2 4 6 8 10
Rating 1: 1 2 3 4 5
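The 5×5 matrix and the Red/Amber/Green evaluation can be combined into a small scoring routine. This is a sketch only: the score is Probability × Impact as in the matrix, but the slides do not say where the colour boundaries fall, so the thresholds `amber_from` and `red_from` are hypothetical defaults to be tuned per organization.

```python
def risk_score(probability: int, impact: int) -> int:
    """Score a threat as probability x impact, each rated 1 (Very Low) to 5 (Very High)."""
    for rating in (probability, impact):
        if rating not in range(1, 6):
            raise ValueError("ratings must be between 1 (Very Low) and 5 (Very High)")
    return probability * impact


def rag_status(score: int, amber_from: int = 5, red_from: int = 15) -> str:
    """Map a score to Red/Amber/Green. The thresholds are assumed, not taken from the slides."""
    if score >= red_from:
        return "Red"    # critical, needs attention
    if score >= amber_from:
        return "Amber"  # warning, may require attention
    return "Green"      # OK, no attention required


# Reproduce the matrix, highest rating on the top row.
for p in range(5, 0, -1):
    print([risk_score(p, i) for i in range(1, 6)])

print(rag_status(risk_score(4, 5)))  # Red
print(rag_status(risk_score(2, 3)))  # Amber
print(rag_status(risk_score(1, 4)))  # Green
```

Sorting threats by score, highest first, gives a starting order for designing controls and preparing countermeasures.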
Many Shades of RISK
Vulnerabilities & Risks
• There are many types of vulnerabilities that have risk attached
– Facilities (TEAM)
– Systems & software
– People
– Processes
– Products (can also be viewed as a subset)
TEAM - Telecommunications
• Telecommunications guidance
– Standards compliance
– Diversity & redundancy
  • What’s the difference?
  • What about recovery and continuity?
– Cabling & pathways
– Power supplies, fan trays, uplinks etc.
– Administration and management
TEAM - Electrical
• Electrical guidance
– Maintenance
– Monitoring/analysis
– Points of failure
– Utilities
– UPS
– PDU
– Transfer switch(es)
– Grounding (protection)
– Emergency Power Off (EPO)
– Batteries
– Standby Generation
– Fuel Considerations
– Loadbank & Testing
– Topology
TEAM - Architectural
• Architectural guidance
– Site Selection (floods, airports, proximity to services etc.)
– Access (parking, roadways etc.)
– Type of construction (structural, tenancy, roofing, doors, windows etc.)
– Organization of spaces (administration, entry, loading dock, washrooms etc.)
– Special considerations (security, fuel/generator/batteries, monitoring, bullet resistance etc.)
TEAM - Mechanical
• Mechanical guidance
– Redundancy
– Pipe routing
– Drains
– Air pressure (+/-)
– Cooling systems
– Heat rejection
– HVAC controls
– Fuel oil system requirements
– Fire suppression
– Smoke detection
– Water leak detection
Published Guidance
• Most published guidance is about failure, acts of nature and/or nefarious acts
– What about change, accidental, innocent, evolutionary, business and other risks?
– People make mistakes
– Processes have gaps
– Products wear, fail, or can be improperly used
Regulatory Compliance
• Payment Card Industry Data Security Standards
• Federal Information Security Management Act (FISMA)
• Statement on Standards for Attestation Engagements (SSAE)
Let’s Take a Break
RISK ASSESSMENT
Group Activity
Supplementary TOOLS & CONCEPTS
The Ones You Don’t See Coming
• The Sunscreen song:
“Don't worry about the future; or worry, but know that worrying is as effective as trying to solve an algebra equation by chewing bubblegum. The real troubles in your life are apt to be things that never crossed your worried mind. The kind that blindside you at 4pm on some idle Tuesday.”
• A recent survey of DC & IT operations professionals had interesting results:
– On average, 2 downtime events per respondent in the 2-year study period
– 62% of IT executives believed unplanned outages don’t happen frequently (41% of rank-and-file staff agreed with this assessment)
– 75% of senior-level respondents believed they fully support efforts to prevent and manage unplanned outages (31% of supervisory and lower staff agreed with the statement)
Examples TEAM - Telecommunications
• Events
– Patching the wrong port
– Accidental removal of patching
– Dirty fiber
– Design mismatch
• Countermeasures
– DCIM, AIM, asset management
– Traceable cords
– Tabs & locks
– Design review
– Labelling and administration
Examples TEAM - Electrical
• Events
– Phase balancing
– Load sharing
– Plug removal
– Panel access
– Breaker trip, fuse removal
– EPO
• Countermeasures
– Labelling and administration
– Metering, monitoring, DCIM
– Locks and clips
– Design review (BIM)
– Finger guards
– AHJ coordination/covers
Examples TEAM - Architectural
• Events
– Door height/width
– Loading docks
– Access/security bypass
– Location (flood plain)
– Address
• Countermeasures
– Procedural checklists
– Shipment/material verification
– Security audit
– Monitoring and surveillance
Examples TEAM - Mechanical
• Events
– Airflow blockage
– Air bypass
– Venturi effect
– Aisle alignment
– Containment ‘work-arounds’
• Countermeasures
– Cross-team collaboration and coordination
– Work flow management
– Commissioning
– Design review
– Monitoring and alarms
– DCIM
Examples Systems and Software
• Events
– Decreasing budgets (loss/length)
– Connector wars
– Polarity
– Software
– Airflow mismatch
• Countermeasures
– Factory terminations
– Solution flexibility and common footprint
– Common-sense design
– Workflow process
– Connectivity management (DCIM)
Examples People
• Events
– Language barriers (acronyms, terms, understanding etc.)
– Lack of skilled/qualified resources
– Human nature (multi-tasking, assuming, shortcuts)
– Honest mistakes
– Tools
• Countermeasures
– Training & equipment
– Skills mapping
– Established & documented workflow
– Monitoring and surveillance
– Dispatch management and alarms
– Asset and connectivity management (DCIM)
Examples Processes
• Events
– Wrong disconnect
– Unscheduled disconnect
– Load balancing
– Stop work / shutdowns
– Scheduling conflicts, multi-disciplinary conflict
• Countermeasures
– Workflow documentation
– Audit and review
– Monitoring and alarms
– Dispatch management
– DCIM, asset and connectivity management
Examples Products
• Events
– Product mismatch (Category/Class/Connector)
– Polarity and pinning
– Product availability
– Explosion of SKUs
– Wear and tear
• Countermeasures
– Solution commonality
– Product rationalization
– Standards based design
– Modular design
– Common building blocks
– Spares & ERKs (Emergency restoration kits)
Create a Project Charter
• char·ter /ˈCHärdər/ (noun; plural: charters): a written grant by a country's legislative or sovereign power, by which an institution such as a company, college, or city is created and its rights and privileges defined.
• Essentially a definition
– Concept
– Goals, objectives and constraints
– Specifications
– Measures of success
• Aligns and focuses the team
Dealing with Issues (FSNP)
• Understand:
– Tribalism
– Motives
– Point of view
• Manage Conflict
– Compete, collaborate, compromise, avoid, accommodate
• Forming (F): Learning about each other
• Storming (S): Challenging each other
• Norming (N): Working with each other
• Performing (P): Working as one
Growth and Evolution in the DC
• Green Grid
– Expects organizations to progress through the model
– Encourages organizations to move up when feasible, taking into account business and facility constraints
• Not just regarding efficiency
– Changes in organization, business model
– Technology capabilities and requirements
RACI Matrix
• What is a RACI Matrix?
“A responsibility assignment matrix (RAM), also known as RACI matrix /ˈreɪsiː/ or ARCI matrix or linear responsibility chart (LRC), describes the participation by various roles in completing tasks or deliverables for a project or business process.” - Wikipedia
The RACI Triangle
[Figure: a triangle with a single Accountable (A) at the apex, then a few Responsible (R), more Consulted (C), and the most Informed (I) at the base]
How to Use a RACI Matrix
• Mapping overall risk management RAM
– Sub-sets of the procedures
– Good for what-if scenarios
• Any component of the RACI matrix may have no, one or more levels of expectation of a particular function in the matrix
• Ensures accountability through all steps and roles
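A RACI matrix can be kept as a simple task-to-role mapping and checked automatically for accountability gaps. The sketch below is illustrative: the task and role names are hypothetical, and the rule it enforces (exactly one Accountable and at least one Responsible per task) is a common convention rather than a requirement stated in these slides, so relax it where a task legitimately has no entry.

```python
from collections import Counter

# Hypothetical matrix: task -> {role: one of "R", "A", "C", "I"}.
raci = {
    "Unscheduled disconnect review": {
        "DC manager": "A",
        "Network technician": "R",
        "Facilities": "C",
        "Help desk": "I",
    },
    "Patch panel relabelling": {
        "Network technician": "R",
        "DC manager": "C",
    },
}


def raci_problems(matrix: dict) -> list:
    """List tasks that break the assumed rule: exactly one A and at least one R."""
    problems = []
    for task, roles in matrix.items():
        counts = Counter(roles.values())
        if counts["A"] != 1:
            problems.append(f"{task}: expected exactly one Accountable, found {counts['A']}")
        if counts["R"] < 1:
            problems.append(f"{task}: no Responsible role assigned")
    return problems


for problem in raci_problems(raci):
    print(problem)
```

Running the check against a what-if scenario (for example, removing a role that is leaving the team) shows which tasks would lose their accountable owner.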
Objectives
• Primary objectives of a risk management plan:
– Identification of potential for negative events and their causes
– Reduction of negative events (e.g., errors, outages, loss of data etc.)
– Recovery from negative events
– Limit impacts of negative events
– Review events to evolve the risk management plan
MAKING A PLAN
Group Activity & Review
Sidebar – New In Standards
• New DRAFT Addendum to ANSI/TIA-598-D.1 in ballot stage, additional colours for fiber
– Because of MPO(16) & MPO(32)
– Lime, Tan, Olive, Magenta
– For 17-32 black tracer
– For >32 different tracers
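One way to read the draft colour scheme is as a lookup from fiber position to colour. The sketch below is a tentative illustration: positions 1 to 12 use the established TIA-598 colour sequence, positions 13 to 16 use the four new colours listed above, and positions 17 to 32 are assumed to repeat the 16 base colours with a black tracer. The slide does not say how the black fiber itself is distinguished in that range or which tracers apply above 32, so those cases are left open.

```python
# Positions 1-12: established TIA-598 sequence. Positions 13-16: draft additions from the slide.
BASE_COLOURS = [
    "Blue", "Orange", "Green", "Brown", "Slate", "White",
    "Red", "Black", "Yellow", "Violet", "Rose", "Aqua",
    "Lime", "Tan", "Olive", "Magenta",
]


def fiber_colour(position: int) -> str:
    """Return the colour for a fiber position (1-based), per the draft scheme as read here."""
    if position < 1:
        raise ValueError("fiber positions start at 1")
    if position <= 16:
        return BASE_COLOURS[position - 1]
    if position <= 32:
        # Assumption: the 16 base colours repeat, each with a black tracer.
        return f"{BASE_COLOURS[position - 17]} with black tracer"
    raise ValueError("tracers for positions above 32 are not specified in the source")


for position in (1, 12, 13, 16, 17, 32):
    print(position, fiber_colour(position))
```

Because the addendum was still in ballot when presented, check the published standard before relying on any of these assignments.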
Thank You!