
Myths and realities about designing high availability data centers

Aug 07, 2015

Page 1: Myths and realities about designing high availability data centers

1

Myths and realities about designing high availability data centers

Tier III and Tier IV: What do you need to know?

Steven Shapiro, P.E., ATD

Mission Critical Practice Lead

Page 2: Myths and realities about designing high availability data centers

2

Data Center World – Certified Vendor Neutral

Each presenter is required to certify that their presentation will be vendor-neutral.

As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.

Page 3: Myths and realities about designing high availability data centers

3

Agenda

• Tier definitions

• Nines

• Tier III/IV issues – one line diagram

• Factors affecting performance

• Reliability and availability

• Causes of critical failures

• Key takeaways

• Questions

Page 4: Myths and realities about designing high availability data centers

4

Tier Definitions

Page 5: Myths and realities about designing high availability data centers

5

Tier Definitions

Things that are not tier-dependent

• Site location

• Facility construction

• Quality of equipment

• Facility commissioning

• Age of site

• Operations and maintenance program

• Personnel training

• Level of personnel coverage

Page 6: Myths and realities about designing high availability data centers

6

Tier Requirements

User must define tier requirements for a facility

• Align business mission and facility performance expectation

• Benchmark against the industry

• Assist in developing business case for capital expenditures

Page 7: Myths and realities about designing high availability data centers

7

“Nines”

Five 9’s Refers to Availability

• Availability (A) is the long-term average percentage of time that a component or system is in service and satisfactorily performing its intended function.

• Five nines (99.999%) availability means:

    • About 5.3 minutes of downtime each year

    • About 1.75 hours of downtime every 20 years

• Availability does not specify how often an outage occurs
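The downtime implied by a given number of nines follows from simple arithmetic. The Python sketch below is illustrative only; the function name and the 365-day year (leap years ignored) are assumptions, not part of the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Long-term average downtime per year implied by an availability figure."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        per_year = downtime_minutes_per_year(pct)
        hours_per_20_years = per_year * 20 / 60
        print(f"{pct}% -> {per_year:.1f} min/yr, {hours_per_20_years:.2f} h per 20 yr")
```

The result is an average only: the same yearly total could come from one long outage or many short ones, which is why availability alone says nothing about outage frequency.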

Page 8: Myths and realities about designing high availability data centers

8

Tier Requirements

                            Tier I    Tier II   Tier III              Tier IV
Number of Delivery Paths    1         1         1 Active, 1 Passive   2 Active
Redundancy                  N         N+1       N+1                   2N Minimum
Compartmentalization        No        No        No                    Yes
Concurrent Maintainability  No        No        Yes                   Yes
Fault Tolerance             No        No        No                    Yes
Availability (%)            99.671    99.749    99.982                99.995
Downtime (hr/yr)            28.8      22        1.6                   0.4

Page 9: Myths and realities about designing high availability data centers

9

Data Center Costs

From The Uptime Institute

• Tier I: $10,000 US/kW of usable UPS power output

• Tier II: $11,000 US/kW of usable UPS power output

• Tier III: $20,000 US/kW of usable UPS power output

• Tier IV: $22,000 US/kW of usable UPS power output

• Plus $225 US/SF of computer room

Based on a 15,000 SF white space, +/- 30%
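As a rough illustration of how these figures combine, the Python sketch below adds the per-kW cost to the per-square-foot cost. The function name, the 1,000 kW load, and the choice to apply the +/- 30% band to the total are assumptions made for the example, not statements from the slide.

```python
# Per-kW figures from the slide (US$ per kW of usable UPS power output).
COST_PER_KW_USD = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COST_PER_SF_USD = 225   # US$ per square foot of computer room
TOLERANCE = 0.30        # +/- 30%, applied here to the total (assumption)


def estimate_cost_usd(tier, ups_kw, computer_room_sf):
    """Return (low, nominal, high) cost estimates in US$."""
    nominal = COST_PER_KW_USD[tier] * ups_kw + COST_PER_SF_USD * computer_room_sf
    return nominal * (1 - TOLERANCE), nominal, nominal * (1 + TOLERANCE)


if __name__ == "__main__":
    # Hypothetical facility: 1,000 kW of UPS output, 15,000 SF of white space.
    for tier in ("I", "II", "III", "IV"):
        low, nominal, high = estimate_cost_usd(tier, 1_000, 15_000)
        print(f"Tier {tier}: ${nominal:,.0f} (range ${low:,.0f} to ${high:,.0f})")
```

For the hypothetical Tier III case the nominal figure is 20,000 x 1,000 + 225 x 15,000 = $23,375,000.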

Page 10: Myths and realities about designing high availability data centers

10

One Line Diagram

[Diagram labels: 2N Utility, N+2 Gens, 2N Gen Distribution, 2N UPS, 2N Distribution, Mechanical UPS]

Page 11: Myths and realities about designing high availability data centers

11

2N Utility

Not a tier requirement

Page 12: Myths and realities about designing high availability data centers

12

Generator Count and Distribution

• 2N generators not a tier requirement

• Some sort of 2N distribution is a Tier III and IV requirement

Page 13: Myths and realities about designing high availability data centers

13

Multi-Module UPS System Configuration

• UPS can be configured in many ways

• N = number of modules installed meets the load – Tier I and II

• N+1 = number of modules to meet the load plus 1 additional module – Tier III

Page 14: Myths and realities about designing high availability data centers

14

Multi-Module UPS System Configuration

• UPS can be configured in many ways

• 2N systems = twice the number of systems required to meet the load – Tier IV

• 2(N+1) systems = twice the number of N+1 systems required to meet the load – Tier IV
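The module counts behind these configurations can be worked out from the load and the module rating. The Python sketch below is a minimal illustration; the function names and the example load and module sizes are hypothetical, and real sizing would also account for power factor, derating, and growth.

```python
import math


def modules_for_load(load_kw, module_kw):
    """N: the smallest number of modules whose combined rating meets the load."""
    return math.ceil(load_kw / module_kw)


def installed_modules(load_kw, module_kw):
    """Total modules installed under each redundancy scheme."""
    n = modules_for_load(load_kw, module_kw)
    return {
        "N": n,                  # just enough modules to carry the load
        "N+1": n + 1,            # one spare module in a single system
        "2N": 2 * n,             # two systems, each able to carry the load
        "2(N+1)": 2 * (n + 1),   # two systems, each with its own spare
    }


if __name__ == "__main__":
    # Hypothetical example: 900 kW critical load, 300 kW modules.
    for scheme, count in installed_modules(900, 300).items():
        print(f"{scheme}: {count} modules")
```

With these example numbers N is 3, so the installed counts are 3, 4, 6, and 8 modules respectively.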

Page 15: Myths and realities about designing high availability data centers

15

UPS Systems With External Maintenance Bypass

Page 16: Myths and realities about designing high availability data centers

16

Mechanical UPS

• Mechanical UPS is required to keep data center HVAC systems operational until generator plant supports load

• May run CRAC units, secondary or primary pumps, etc.

• Sized to match cooling load for data center and battery time of data center UPS

Page 17: Myths and realities about designing high availability data centers

17

How Much Redundancy Is Enough?

Certain things can be overdone.

Page 18: Myths and realities about designing high availability data centers

18

The Cost of Reliability

[Chart axes: Reliability (99.0, 99.9, 99.99, 99.999, 99.9999) and Cost $]

Page 19: Myths and realities about designing high availability data centers

19

Factors Affecting Performance But Not Tier Level

Lurking vulnerabilities

• Location

• Design

• Redundancy level

• Construction

• Quality of equipment

• Thoroughness of commissioning program

• Age

• Operations & maintenance program

• Personnel training

• Level of coverage

Page 20: Myths and realities about designing high availability data centers

20

Factors Affecting Performance But Not Tier Level

• Document Management

• Maintenance Programs (CMMS)

• Commissioning

• Vendor Management

• Change Management

• Standard and Emergency Operating Procedures

• Training

• Staffing

Page 21: Myths and realities about designing high availability data centers

21

Factors Affecting Performance But Not Tier Level

• Harmonics Analysis

• EMF Studies

• Short Circuit Studies

• Coordination Studies

• CFD Modeling

[Figure labels: Cold Aisle, Hot Aisle, IT Equipment, Computer Room Air Conditioning Units]

Page 22: Myths and realities about designing high availability data centers

22

Reliability Considerations

• Probability of failure/reliability

• Availability

• MTTF

• MTTR

• Susceptibility to natural disasters

• Fault tolerance

• Single points of failure

• Maintainability

• Operational readiness

• Maintenance program
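Availability, MTTF, and MTTR are linked by the standard steady-state relation A = MTTF / (MTTF + MTTR). The Python sketch below illustrates it; the function name and the hour figures are hypothetical and are not taken from the presentation.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is in service."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Two hypothetical systems with identical availability but very
    # different outage patterns.
    rare_long = availability(mttf_hours=9_990, mttr_hours=10)      # rare, long outages
    frequent_short = availability(mttf_hours=999, mttr_hours=1)    # frequent, short outages
    print(f"rare/long outages:      {rare_long:.4f}")
    print(f"frequent/short outages: {frequent_short:.4f}")
```

Both cases give the same availability, which shows why MTTF and MTTR need to be examined separately rather than relying on the availability figure alone.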

Page 23: Myths and realities about designing high availability data centers

23

Single Utility Feeder, Parallel Redundant UPS and Generators, Single-Corded IT Rack

Page 24: Myths and realities about designing high availability data centers

24

2N UPS, N+1 Generators, ASTSs and Dual-Corded IT Rack

Page 25: Myths and realities about designing high availability data centers

25

Two Utility Feeders, 2(N+1) UPS, 2(N+1) Generators, ASTSs, Dual Corded IT Rack

Page 26: Myths and realities about designing high availability data centers

26

Distributed Redundant UPS, N+2 Generators, Two Utility Feeders, ASTSs and Dual Corded IT Rack

Page 27: Myths and realities about designing high availability data centers

27

Reliability Considerations

Page 28: Myths and realities about designing high availability data centers

28

Reliability Considerations

• 2(N+1) (system + system) with dual utility feeders is the most reliable topology

• There is no significant reliability improvement in using a 2(N+1) UPS configuration over 2N

• Distributed redundant configuration is less reliable than 2N

• Reliability improves if a second utility feeder is provided

• N+2 and/or 2N generator systems are marginally more reliable than N+1
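The diminishing return from extra generators can be illustrated with a simple k-out-of-n calculation. This Python sketch is not the analysis behind the slide: it assumes independent, identical units and a hypothetical per-unit success probability, and it ignores common-cause failures (fuel, controls, paralleling gear) that often dominate in practice.

```python
from math import comb


def k_of_n_success(n_required, n_installed, p_unit):
    """Probability that at least n_required of n_installed independent units work."""
    q_unit = 1.0 - p_unit
    return sum(
        comb(n_installed, k) * p_unit**k * q_unit**(n_installed - k)
        for k in range(n_required, n_installed + 1)
    )


if __name__ == "__main__":
    n_required = 3   # hypothetical: three generators needed to carry the load
    p_unit = 0.98    # hypothetical probability that one generator starts and runs
    for label, spares in (("N", 0), ("N+1", 1), ("N+2", 2)):
        p = k_of_n_success(n_required, n_required + spares, p_unit)
        print(f"{label}: {p:.6f}")
```

Under these assumptions the first spare adds far more absolute reliability than the second, which is consistent with the slide's point that N+2 is only marginally better than N+1.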

Page 29: Myths and realities about designing high availability data centers

29

Reliability Considerations

Emergency Diesel Generators

Study Performed by Idaho National Engineering Laboratory – February 1996 at Nuclear Power Plants

[Chart categories: Fail to start, Fail after ½ hour, Fail after 8 hours, Fail after 24 hours]

Page 30: Myths and realities about designing high availability data centers

30

Reliability Considerations

• A hybrid configuration may be most effective

• STSs on the secondary side of the PDU transformer yield a 2-to-1 reliability improvement over 480 V STSs

• Dual cord has higher impact than the use of STSs

• Ultimate reliability: STS + Dual Cord

• Assess the condition of the mechanical plant in conjunction with the electrical system

• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)
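The last point can be illustrated with the usual series and parallel reliability formulas. The Python sketch below is a simplified model assuming independent subsystems; the function names and the reliability values are hypothetical.

```python
from math import prod


def series(reliabilities):
    """All subsystems must work: the product can never exceed the weakest one."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one independent path must work (for example, two power cords)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    electrical, mechanical = 0.99, 0.999   # hypothetical subsystem reliabilities
    facility = series([electrical, mechanical])
    print(f"facility (series):        {facility:.5f}")
    print(f"weakest subsystem:        {min(electrical, mechanical):.5f}")
    print(f"two independent 0.99 paths: {parallel([0.99, 0.99]):.4f}")
```

In the series case the facility figure sits just below the weaker subsystem, so improving the stronger subsystem changes very little.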

Page 31: Myths and realities about designing high availability data centers

31

Fundamentals of High Availability Design

• Segregate the system into independent blocks

• Eliminate common source components to minimize fault propagation (e.g., LBS, hot-tie, manual bus ties)

• Move single points of failure as close to the load as possible

• Always maintain two independent sources of power to the critical load

• Optimize the design of monitoring and control circuits

• Keep it simple and minimize human intervention

Page 32: Myths and realities about designing high availability data centers

32

Causes of Critical Failures

[Pie chart. Slice values as listed on the slide: 28%, 20%, 18%, 13%, 10%, 4%, 4%, 3%. Legend as listed on the slide: Equipment failure, System design, Human error, Equipment design, Installation error, Commissioning or test deficiency, Maintenance oversight, Natural disaster]

Page 33: Myths and realities about designing high availability data centers

33

Causes of Critical Failures

• Typically a combination of factors

    • External event (power failure)

    • Equipment failure

    • Human factor

    • Latent failures

• Root cause not always easy to ascertain

• Most major failures occur during change of state events

    • Loss of utilities

    • System transfers during maintenance activities

• More maintenance does not necessarily mean higher availability

Page 34: Myths and realities about designing high availability data centers

34

Key Takeaways

• What reliability level do you really need based on your business case?

• Do you want concurrent maintainability?

• Do you want fault tolerance?

• Minimize single points of failure within systems

• Ensure adequacy of operations, maintenance and testing programs

• Review/develop SOPs and EOPs

• Review/develop existing documentation

• Review/develop training practices

Page 35: Myths and realities about designing high availability data centers

35

Questions?

Steven Shapiro, PE, ATD

Mission Critical Practice Lead

(914) 420-3213

[email protected]

http://www.linkedin.com/in/stevenshapirope

Twitter: @stevenshapirope

References:

• Uptime Institute White Paper: Tier Myths and Misconceptions

• Uptime Institute White Paper: Data Center Site Infrastructure Tier Standard - Topology