Myths and realities about designing high availability data centers
Post on 07-Aug-2015
1
Myths and realities about designing high availability data centers
Tier III and Tier IV: What do you need to know?
Steven Shapiro, P.E., ATD
Mission Critical Practice Lead
2
Data Center World – Certified Vendor Neutral
Each presenter is required to certify that their presentation will be vendor-neutral.
As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.
3
Agenda
• Tier definitions
• Nines
• Tier III/IV issues – one line diagram
• Factors affecting performance
• Reliability and availability
• Causes of critical failures
• Key takeaways
• Questions
4
Tier Definitions
5
Things that are not tier-dependent
• Site location
• Facility construction
• Quality of equipment
• Facility commissioning
• Age of site
• Operations and maintenance program
• Personnel training
• Level of personnel coverage
Tier Definitions
6
• Align business mission and facility performance expectation
• Benchmark against the industry
• Assist in developing business case for capital expenditures
Tier Requirements
User must define tier requirements for a facility
7
Five 9’s Refers To Availability
• Availability (A) is the long-term average percentage of time that a component or system is in service and satisfactorily performing its intended function.
• Five nines (99.999%) availability means:
About 5.26 minutes of downtime each year
About 1.75 hours of downtime every 20 years
• Availability does not specify how often an outage occurs
“Nines”
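The downtime implied by a given number of nines follows from simple arithmetic. The sketch below is illustrative only: it assumes a 365-day year (525,600 minutes), and the function name is not from the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year, no leap days


def downtime_minutes_per_year(availability_percent):
    """Average minutes of downtime per year implied by an availability figure."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)
print(f"{five_nines:.2f} minutes per year")              # about 5.26
print(f"{five_nines * 20 / 60:.2f} hours per 20 years")  # about 1.75
```

These are long-term averages: the same figure could come from one long outage or many short ones.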
8
Tier Requirements
                            Tier I    Tier II   Tier III              Tier IV
Number of Delivery Paths    1         1         1 Active, 1 Passive   2 Active
Redundancy                  N         N+1       N+1                   2N Minimum
Compartmentalization        No        No        No                    Yes
Concurrent Maintainability  No        No        Yes                   Yes
Fault Tolerance             No        No        No                    Yes
Availability (%)            99.671    99.749    99.982                99.995
Downtime in Hr/Yr           28.8      22        1.6                   0.4
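The availability and downtime rows describe the same quantity, so one can be derived from the other. A minimal sketch, assuming an 8,760-hour year; the dictionary and function names are illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 8,760-hour year

# Downtime figures copied from the "Downtime in Hr/Yr" row of the table.
downtime_hours = {"Tier I": 28.8, "Tier II": 22.0, "Tier III": 1.6, "Tier IV": 0.4}


def availability_percent(hours_down_per_year):
    """Availability (%) implied by average annual downtime in hours."""
    return 100.0 * (1.0 - hours_down_per_year / HOURS_PER_YEAR)


implied = {tier: round(availability_percent(h), 3) for tier, h in downtime_hours.items()}
for tier, pct in implied.items():
    print(f"{tier}: {pct:.3f}%")
```

Under this assumption, 0.4 hours of downtime per year corresponds to roughly 99.995% availability.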
9
• Tier I: $10,000 US/kW of useable UPS Power Output
• Tier II: $11,000 US/kW of useable UPS Power Output
• Tier III: $20,000 US/kW of useable UPS Power Output
• Tier IV: $22,000 US/kW of useable UPS Power Output
• Plus $225 US/SF of computer room
Based on a 15,000 SF white space, +/- 30%
Data Center Costs
From The Uptime Institute
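The cost figures above combine a per-kW rate with a per-square-foot adder. The sketch below shows the arithmetic for a hypothetical facility. The function name and the 1,000 kW load are assumptions for illustration, and applying the +/- 30% band to the total is one interpretation of the slide.

```python
# Per-kW rates and per-square-foot adder as listed on the slide (US dollars).
COST_PER_KW = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COST_PER_SF = 225
TOLERANCE = 0.30  # the slide quotes the figures as +/- 30%


def estimate_cost(tier, ups_kw, computer_room_sf):
    """Return (low, nominal, high) cost estimate in US dollars."""
    nominal = COST_PER_KW[tier] * ups_kw + COST_PER_SF * computer_room_sf
    return nominal * (1 - TOLERANCE), nominal, nominal * (1 + TOLERANCE)


# Hypothetical facility: 1,000 kW of usable UPS output, 15,000 SF of white space.
low, nominal, high = estimate_cost("III", 1_000, 15_000)
print(f"Tier III: ${nominal:,.0f} (range ${low:,.0f} to ${high:,.0f})")
```

For this hypothetical case the nominal estimate is $23,375,000: $20,000,000 for the UPS-rated capacity plus $3,375,000 for the computer room area.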
10
One Line Diagram
[Diagram labels: 2N Utility; N+2 Gens; 2N Gen Distribution; 2N UPS; 2N Distribution; Mechanical UPS]
11
2N Utility
Not a tier requirement
12
Generator Count and Distribution
• 2N generators not a tier requirement
• Some sort of 2N distribution is a Tier III and IV requirement
13
• UPS can be configured in many ways
• N = the number of modules installed just meets the load (Tier I and II)
• N+1 = the number of modules needed to meet the load plus 1 additional module (Tier III)
Multi-Module UPS System Configuration
14
• UPS can be configured in many ways
• 2N systems = twice the number of systems required to meet the load (Tier IV)
• 2(N+1) systems = twice the number of N+1 systems required to meet the load (Tier IV)
Multi-Module UPS System Configuration
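The four configurations differ only in how many modules are installed relative to N. A minimal sketch of the module counts, where the load and module rating are hypothetical values chosen for illustration:

```python
import math


def ups_module_counts(load_kw, module_kw):
    """Modules required under each redundancy scheme named on the slides."""
    n = math.ceil(load_kw / module_kw)  # N: just enough modules to carry the load
    return {
        "N": n,
        "N+1": n + 1,
        "2N": 2 * n,
        "2(N+1)": 2 * (n + 1),
    }


# Hypothetical example: 900 kW critical load served by 250 kW modules.
counts = ups_module_counts(900, 250)
print(counts)  # {'N': 4, 'N+1': 5, '2N': 8, '2(N+1)': 10}
```

The count alone does not capture how the modules are split into independent systems, which is what distinguishes 2N from a single larger N+x system.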
15
UPS Systems With External Maintenance Bypass
16
• Mechanical UPS is required to keep data center HVAC systems operational until generator plant supports load
• May run CRAC units, secondary or primary pumps, etc.
• Sized to match cooling load for data center and battery time of data center UPS
Mechanical UPS
17
Certain things can be overdone.
How Much Redundancy Is Enough?
18
The Cost of Reliability
[Chart: Cost ($) versus Reliability; reliability axis marks: 99.0, 99.9, 99.99, 99.999, 99.9999]
19
• Location
• Design
• Redundancy level
• Construction
• Quality of equipment
• Thoroughness of commissioning program
• Age
• Operations & maintenance program
• Personnel training
• Level of coverage
Factors Affecting Performance But Not Tier Level
Lurking vulnerabilities
20
• Document Management
• Maintenance Programs (CMMS)
• Commissioning
• Vendor Management
• Change Management
• Standard and Emergency Operating Procedures
• Training
• Staffing
Factors Affecting Performance But Not Tier Level
21
• Harmonics Analysis
• EMF Studies
• Short Circuit Studies
• Coordination Studies
• CFD Modeling
[Figure labels: Cold Aisle; Hot Aisle; IT Equipment; Computer Room Air Conditioning Units]
Factors Affecting Performance But Not Tier Level
22
• Probability of failure/reliability
• Availability
• MTTF
• MTTR
• Susceptibility to natural disasters
• Fault tolerance
• Single points of failure
• Maintainability
• Operational readiness
• Maintenance program
Reliability Considerations
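MTTF and MTTR relate to availability through the standard steady-state formula A = MTTF / (MTTF + MTTR). The sketch below uses two hypothetical systems (the figures are invented for illustration) to show why availability alone does not specify how often an outage occurs.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def steady_state_availability(mttf_hours, mttr_hours):
    """Long-run fraction of time in service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def outages_per_year(mttf_hours, mttr_hours):
    """Average number of failure-and-repair cycles per year."""
    return HOURS_PER_YEAR / (mttf_hours + mttr_hours)


# Hypothetical systems: (MTTF hours, MTTR hours)
systems = {"frequent, short outages": (9_999, 1), "rare, long outages": (99_990, 10)}

for name, (mttf, mttr) in systems.items():
    a = steady_state_availability(mttf, mttr)
    rate = outages_per_year(mttf, mttr)
    print(f"{name}: availability {a:.4%}, about {rate:.3f} outages per year")
```

Both systems come out at 99.99% availability, yet one fails ten times as often as the other.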
23
Single Utility Feeder, Parallel Redundant UPS and Generators, Single-Corded IT Rack
24
2N UPS, N+1 Generators, ASTSs and Dual-Corded IT Rack
25
Two Utility Feeders, 2(N+1) UPS, 2(N+1) Generators, ASTSs, Dual-Corded IT Rack
26
Distributed Redundant UPS, N+2 Generators, Two Utility Feeders, ASTSs and Dual-Corded IT Rack
27
Reliability Considerations
28
• 2(N+1) (system + system) with dual utility feeders is the most reliable topology
• There is no significant reliability improvement in using a 2(N+1) UPS configuration over 2N
• Distributed redundant configuration is less reliable than 2N
• Improvement if a second utility feeder is provided
• N+2 and/or 2N generator systems are marginally more reliable than N+1
Reliability Considerations
29
Reliability Considerations
Emergency Diesel Generators
Study performed by Idaho National Engineering Laboratory at nuclear power plants, February 1996
[Chart categories: Fail to start; Fail after ½ hour; Fail after 8 hours; Fail after 24 hours]
30
• A hybrid configuration may be most effective
• STSs on the secondary side of the PDU transformer yield a 2-to-1 reliability improvement over 480 V STSs
• Dual cord has higher impact than the use of STSs
• Ultimate reliability: STS + Dual Cord
• Assess the condition of the mechanical plant in conjunction with the electrical system
• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)
Reliability Considerations
31
• Segregate the system into independent blocks
• Eliminate common source components to minimize fault propagation (e.g., LBS, hot-tie, manual bus ties)
• Move single points of failure as close to the load as possible
• Always maintain two independent sources of power to the critical load
• Optimize the design of monitoring and controls circuits
• Keep it simple and minimize human intervention
Fundamentals of High Availability Design
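A simple reliability block calculation suggests why two independent sources help and why a shared component limits the benefit. This is a rough sketch: it assumes failures are statistically independent, and the availability figures are hypothetical, not taken from the presentation.

```python
def parallel(*availabilities):
    """Redundant paths: the group is down only when every path is down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def series(*availabilities):
    """Chained components: the chain is up only when every component is up."""
    all_up = 1.0
    for a in availabilities:
        all_up *= a
    return all_up


PATH = 0.999     # hypothetical availability of one complete power path
SHARED = 0.9999  # hypothetical availability of a component common to both paths

independent = parallel(PATH, PATH)
with_shared = series(SHARED, parallel(PATH, PATH))

print(f"Two independent paths:        {independent:.6f}")
print(f"Two paths behind shared item: {with_shared:.6f}")
```

In this model the shared component caps the result below its own availability, however reliable the two paths are.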
32
Causes of Critical Failures
[Pie chart: share of critical failures by cause]
Values shown: 28%, 20%, 18%, 13%, 10%, 4%, 4%, 3%
Legend: Equipment failure; System design; Human error; Equipment design; Installation error; Commissioning or test deficiency; Maintenance oversight; Natural disaster
33
• Typically a combination of factors
• External event (power failure)
• Equipment failure
• Human factor
• Latent failures
• Root cause not always easy to ascertain
• Most major failures occur during change of state events
• Loss of utilities
• System transfers during maintenance activities
• More maintenance does not necessarily mean higher availability
Causes of Critical Failures
34
• What reliability level do you really need based on your business case?
• Do you want concurrent maintainability?
• Do you want fault tolerance?
• Minimize single points of failure within systems
• Ensure adequacy of operations, maintenance and testing programs
• Review/develop SOPs and EOPs
• Review/develop existing documentation
• Review/develop training practices
Key Takeaways
35
Steven Shapiro, PE, ATD
Mission Critical Practice Lead
(914) 420-3213
sshapiro@morrisonhershfield.com
http://www.linkedin.com/in/stevenshapirope
Twitter: @stevenshapirope
Questions?
References:
Uptime Institute White Paper: Tier Myths and Misconceptions
Uptime Institute White Paper: Data Center Site Infrastructure Tier Standard - Topology