Robust Design of Air Cooled Server Cabinets
Nathan Rolander, Jeff Rambo, Yogendra Joshi, Farrokh Mistree
ASME InterPACK Conference
19 July 2005
Systems Realization Laboratory
Microelectronics & Emerging Technologies Thermal Laboratory
Support for this work provided by the members of
CEETHERM
05/01/23
Background: What is a data center?
10,000-500,000 sq. ft. facilities filled with cabinets which house data processing equipment, servers, switches, etc.
Tens to hundreds of MW power consumption for computing equipment and associated cooling hardware
Trend towards very high power density servers (30 kW/cabinet) requiring stringent thermal management
* Image: B. Tschudi, Lawrence Berkeley Laboratories
Introduction & Motivation
Up to 40% of data center operating costs can be cooling related*
Cooling challenges are compounded by a lifecycle mismatch:
  New computer equipment introduced every ~2 years
  Center infrastructure overhauled every ~25 years
* Source: W. Tschudi, Lawrence Berkeley Laboratories
How do we efficiently integrate high powered equipment into an existing cabinet infrastructure
while maximizing operational stability?
Cabinet Design Challenges
Flow complexity
  The turbulent CFD models required to analyze the air flow distribution in cabinets are impractical to use in iterative optimization algorithms
Operational stability
  Variations in data center operating conditions, coupled with model inaccuracies, mean that computed “optimal” solutions do not translate to efficient or feasible physical solutions
Multiple design objectives
  Efficient thermal management, cooling cost minimization, & operational stability are conflicting goals
Approach Overview
Integration of three constructs to tackle cabinet design challenges:
Challenge >> Construct
  Flow complexity >> POD* based turbulent modeling
  Operational stability >> Robust design principles
  Multiple objectives >> The compromise DSP**
Integration >> Thermally efficient & robust server cabinet design approach
* Proper Orthogonal Decomposition
** Decision Support Problem
Introduction to the POD
Modal expansion of basis functions, φ:
  $u(x,t) = \sum_{i=1}^{\infty} a_i(t)\,\varphi_i(x)$
Fit optimal linear subspace through a series of system observations, u.
Maximize the projection of the basis functions onto the observations (constrained variational calculus problem):
  $\max\{\langle |(u,\varphi)|^2 \rangle \;|\; \|\varphi\|^2 = 1\}$
  $\int C(x,x')\,\varphi(x')\,dx' = \lambda\,\varphi(x)$
  < , > denotes ensemble averaging
  ( , ) denotes L2 inner product
Solution steps:
  Assemble observations covariance matrix
  Take cross-correlation tensor of covariance matrix
  Take eigen-decomposition of the cross-correlation tensor
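The POD steps (assemble the observation covariance matrix, take its eigen-decomposition, expand in the resulting modes) can be sketched with the method of snapshots. This is a minimal sketch and not the authors' implementation: the function name `pod_modes`, the synthetic snapshot data, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def pod_modes(snapshots, num_modes):
    """Return (modes, eigenvalues) for a (n_points, n_obs) snapshot matrix."""
    n_obs = snapshots.shape[1]
    # Assemble the ensemble-averaged observation covariance matrix.
    corr = snapshots.T @ snapshots / n_obs
    # Eigen-decomposition; eigh returns ascending eigenvalues for symmetric input.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:num_modes]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Map eigenvectors back to physical space and normalize each mode to unit norm.
    modes = snapshots @ eigvecs
    modes /= np.linalg.norm(modes, axis=0)
    return modes, eigvals

# Hypothetical data: 9 observations of a 200-point field built from 2 shapes.
x = np.linspace(0.0, 1.0, 200)
v_in = np.arange(0.0, 2.25, 0.25)
snapshots = (np.outer(np.sin(np.pi * x), v_in)
             + np.outer(np.sin(2 * np.pi * x), v_in**2))

modes, eigvals = pod_modes(snapshots, num_modes=2)
coeffs = modes.T @ snapshots        # a_i: projection of each observation on each mode
reconstruction = modes @ coeffs     # u ~ sum_i a_i * phi_i
max_error = np.max(np.abs(reconstruction - snapshots))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the synthetic field is built from only two shapes, two modes reproduce the observations to round-off; real CFD observations would need more modes and would carry a truncation error.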
POD Based Turbulent Flow Modeling
Vector-valued eigenvectors, called POD modes, form the empirical basis of an m-dimensional subspace
Superposition of modes is used to reconstruct any solution within the range of observations, with ~10% error*
Flux matching procedure applied at boundaries >> areas of known flow conditions, resulting in the minimization problem:
  $\min\{\| G - \sum_{i=1}^{p} a_i F(\varphi_i) \|\}, \quad p \le m$
  G is the flux goal
  F(.) is the contribution to boundary flux from the POD modes
  a is the POD mode weight coefficient
Values of a_i found using the method of least squares
Resulting model has ~O(10^5) reduction in DoF*
* see: Rambo HT2005-72143 paper for complete analysis
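The least-squares step of the flux matching procedure can be illustrated as follows. This is a sketch under stated assumptions: the boundary flux contributions F(φ_i) are treated as precomputed columns of a matrix, the numbers are made up, and the helper name `flux_match` is hypothetical.

```python
import numpy as np

def flux_match(mode_fluxes, flux_goal):
    """Least-squares weights a minimizing ||G - sum_i a_i F(phi_i)||."""
    weights, *_ = np.linalg.lstsq(mode_fluxes, flux_goal, rcond=None)
    residual = np.linalg.norm(flux_goal - mode_fluxes @ weights)
    return weights, residual

# Hypothetical numbers: column i holds F(phi_i), the flux of mode i through
# each of 3 boundaries with known flow conditions; p = 2 modes are retained.
mode_fluxes = np.array([[1.0, 0.2],
                        [0.5, -0.4],
                        [0.1, 0.9]])
# Flux goal G, constructed here so that an exact match exists.
flux_goal = 0.95 * mode_fluxes[:, 0] + 0.10 * mode_fluxes[:, 1]

weights, residual = flux_match(mode_fluxes, flux_goal)
print("a =", np.round(weights, 3), "residual =", f"{residual:.1e}")
```

With more boundaries than retained modes the system is overdetermined, which is why a least-squares solve is used rather than a direct inversion.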
Robust Design Principles
Determine superior solutions through minimizing the effects of variation, without eliminating their causes.
  Type I – minimizing variations in performance caused by variations in noise factors (uncontrollable parameters)
  Type II – minimizing variations in performance caused by variations in control factors (design variables)
A common implementation of Type I robust design is Taguchi Parameter Design
Robust Design Application
Goals:
[Figure: objective function Y vs. design variable X, marking the optimal solution and the robust solution together with the deviation at each]
Optimization minimizes the objective function:
  $y = f(x)$
Solution insensitivity is obtained by minimizing curvature:
  $\sigma_y^2 = \sum_{i=1}^{n} \left(\frac{\partial f}{\partial x_i}\right)^2 \sigma_{x_i}^2$
  n is the no. of control variables
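A first-order estimate of how control-variable variation propagates to the response can be sketched as below. The response function, the standard deviations, and the helper name `response_variance` are hypothetical; the sensitivities are approximated by central differences, and the estimate assumes the variations are small and independent.

```python
def response_variance(f, x, sigma_x, step=1e-6):
    """First-order estimate: sum_i (df/dx_i)^2 * sigma_xi^2."""
    variance = 0.0
    for i, (xi, si) in enumerate(zip(x, sigma_x)):
        forward = list(x)
        forward[i] = xi + step
        backward = list(x)
        backward[i] = xi - step
        slope = (f(forward) - f(backward)) / (2.0 * step)  # central difference
        variance += slope**2 * si**2
    return variance

# Hypothetical response with two control variables.
def f(x):
    return 3.0 * x[0] + x[1] ** 2

sigma_x = [0.1, 0.05]
var_steep = response_variance(f, [1.0, 2.0], sigma_x)  # steeper in x[1]
var_flat = response_variance(f, [1.0, 0.5], sigma_x)   # flatter in x[1]
print(f"steep design: {var_steep:.4f}, flat design: {var_flat:.4f}")
```

The design point with the smaller slopes gives the smaller response variance, which is the sense in which a robust solution is less sensitive than a purely optimal one.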
Robust Design Application
Constraints:
[Figure: feasible design space in the plane of design variables X1 and X2, showing the constraint boundary and the optimal and robust solutions with their bounds; bound width 2ΔX2 marked]
Variability bounds in control parameters must be accounted for to avoid infeasible solutions:
  $E(g_j(x)) + \Delta g_j \le 0, \quad j = 1,\dots,p$
  $\Delta g_j = \sum_{i=1}^{n} \frac{\partial g_j}{\partial x_i}\,\Delta x_i, \quad j = 1,\dots,p$
  E(.) is the mean function
  p is the no. of constraints
  n is the no. of control variables
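The effect of the variability bounds on feasibility can be sketched as below. Several choices here are assumptions and not taken from the slides: the mean E(g) is approximated by g at the nominal design, absolute values of the sensitivities are used to obtain a worst-case margin, and the constraint function and the helper name `robust_constraint` are hypothetical.

```python
def robust_constraint(g, x, delta_x, step=1e-6):
    """Return g(x) + delta_g, with delta_g = sum_i |dg/dx_i| * delta_x_i."""
    delta_g = 0.0
    for i, (xi, dxi) in enumerate(zip(x, delta_x)):
        forward = list(x)
        forward[i] = xi + step
        backward = list(x)
        backward[i] = xi - step
        slope = (g(forward) - g(backward)) / (2.0 * step)  # central difference
        delta_g += abs(slope) * dxi                         # worst-case assumption
    return g(x) + delta_g

# Hypothetical constraint g(x) <= 0, i.e. x0 + x1 must not exceed 10.
def g(x):
    return x[0] + x[1] - 10.0

delta_x = [0.1, 0.1]
near_boundary = robust_constraint(g, [6.0, 3.9], delta_x)  # nominally feasible
backed_off = robust_constraint(g, [6.0, 3.5], delta_x)
print(f"near boundary: {near_boundary:+.3f}, backed off: {backed_off:+.3f}")
```

The first design satisfies g(x) <= 0 at its nominal values but violates the constraint once the variability margin is added; the second stays feasible with the margin included.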
The Compromise DSP Mathematics
Hybrid of Mathematical Programming and Goal Programming optimization routines:
Mathematical Programming
  $\min f(x)$  (objective function)
  $\text{s.t. } g_i(x) \le 0$  (inequality constraints)
  $h_i(x) = 0$  (equality constraints)
Goal Programming
  $A_i(x) + d_i^- - d_i^+ = G_i$  (goal function)
  G is a goal
  $d_i^-, d_i^+$ is under/over achievement of goal
Minimization of Archimedean Deviation Function
  $f(x) = Z = \sum_{i=1}^{m} W_i (d_i^- + d_i^+)$
  W is the goal weight vector
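A toy compromise DSP can make the role of the deviation variables concrete. This is only a sketch: the single design variable, the two goal functions, the weights, the constraint, and the brute-force grid search are all assumptions chosen for illustration and do not represent the solver used in this work.

```python
def deviations(achieved, goal):
    """Return (d_minus, d_plus) such that achieved + d_minus - d_plus == goal."""
    return max(0.0, goal - achieved), max(0.0, achieved - goal)

def archimedean_z(x, goal_functions, goals, weights):
    """Z = sum_i W_i * (d_i^- + d_i^+)."""
    z = 0.0
    for a_i, g_i, w_i in zip(goal_functions, goals, weights):
        d_minus, d_plus = deviations(a_i(x), g_i)
        z += w_i * (d_minus + d_plus)
    return z

# Hypothetical problem: one design variable x in [0, 1], two conflicting goals.
goal_functions = [lambda x: x, lambda x: 1.0 - x]
goals = [0.2, 0.2]
weights = [0.7, 0.3]

def constraint(x):
    return 0.3 - x  # feasible when constraint(x) <= 0

candidates = [i / 100 for i in range(101)]
feasible = [x for x in candidates if constraint(x) <= 1e-12]
best_x = min(feasible, key=lambda x: archimedean_z(x, goal_functions, goals, weights))
best_z = archimedean_z(best_x, goal_functions, goals, weights)
print(f"best x = {best_x:.2f}, Z = {best_z:.3f}")
```

Neither goal is met exactly; the weights decide which goal the compromise favors, and the constraint remains a hard limit on the search.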
Problem Geometry
Enclosed cabinet containing 10 servers
Cooling air supplied from under floor plenum
[Figure: Cabinet Profile, showing Servers 1-10 grouped into Sections a, b, c, cold supply air entering at Vin, hot exhaust air, hot/cold airflow, and dimensions H, W, Lc on x-z axes]
[Figure: Server Profile, showing the fan model and isoflux blocks Qa,b,c, with dimensions Ls, Hs on x-z axes]
Dimensions:
  H = 2 m
  W = 0.9 m
  Ls = 0.6 m
  Hs = 2U ~ 0.09 m
Cabinet Modeling
9 Observations of Vin = 0:0.25:2 m/s for POD
k-ε turbulence model for RANS implemented in commercial CFD software
Finite difference energy equation solver used for thermal solution, using POD computed flow field
1 iteration ~ 12 sec
[Figure: results at Vin = 0.95 m/s, axes x and y]
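The range notation 0:0.25:2 for the observation velocities can be written out explicitly. This small sketch only enumerates the observation set and checks that the design range of Vin, taken here as [0, 1] m/s from the control variable bounds, lies inside it; the variable names are hypothetical.

```python
# Range 0:0.25:2 written out explicitly (start, step, inclusive end), in m/s.
step = 0.25
observation_velocities = [i * step for i in range(9)]
print(observation_velocities)

# Design range of the inlet velocity control variable, in m/s.
design_range = (0.0, 1.0)
inside = (min(observation_velocities) <= design_range[0]
          and design_range[1] <= max(observation_velocities))
print("design range covered by observations:", inside)
```

Keeping the design range inside the observed range matters because the POD model reconstructs solutions within the range of the observations.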
Design Variables & Objectives
Objective:
  Minimize ΔTmax & cooling energy for a given cabinet heat generation rate Qtotal
Control Variables:
  Inlet air velocity, Vin [0, 1] m/s
  Section a chip power, Qa [0, 200] W
  Section b chip power, Qb [0, 200] W
  Section c chip power, Qc [0, 200] W
Server Cabinet Model
[Figure: cabinet profile with Servers 1-10, Sections a, b, c, inlet velocity Vin, cold supply air, hot exhaust air, and dimensions H, W, Lc]
Response:
  Chip temperatures (°C)
Goals:
  Minimize inlet air velocity
  Minimize chip temperatures
  Minimize chip temperature variation
Constraints:
  Meet target cabinet power Qtotal
  All chip temperatures < 85 °C
Iterate: control variables >> server cabinet model >> response
Results
Baseline vs. Maximum efficient power dissipation
[Figure: two cabinet profiles (Servers 1-10, Sections a, b, c, inlet velocity Vin, hot/cold airflow) comparing the baseline and the maximum efficient power dissipation configurations]
Without server power re-distribution, increasing flow of cooling air alone is ineffective
Results
Inlet air velocity vs. Total cabinet power level
Cooling air is re-distributed to different cabinet sections depending upon supply rate >> server cooling efficiency
[Figure: two cabinet profiles (Servers 1-10, Sections a, b, c, inlet velocity Vin, hot/cold airflow) showing the airflow distribution at different inlet air velocities]
Results
Maximum chip temperature and bounds
Maximum chip temperature constraint met as variation in response changes with varying power & flow rates
Conclusions
How do we efficiently integrate high powered equipment into an existing cabinet infrastructure
while maximizing operational stability?
For the typical enclosed cabinet modeled, over 50% more power than baseline can be reliably dissipated through efficient configuration
Robust solutions account for variability in internal & external operating conditions, as well as a degree of modeling assumptions & inaccuracies
Server cabinet configuration design can be accomplished without center level re-design
Questions?
Thank you for attending!
Final Validation
Comparison of results obtained using robust design and compact model to FLUENT
Total Cabinet Power (W)    Mean Chip Temp. Difference (°C)
1600                       3
2100                       9
2400                       3
Robust vs. Optimal Configuration
Pareto Frontier
[Figure: four panels plotted against inlet air velocity (0.8 to 1 m/s): (y) mean chip temperature (°C); (a), (b), (c) chip heat flux (W); feasible space marked on three panels]
Effects of Robust Solution
Optimal >> Robust: Temperature Variation
[Figure: Optimal vs. Robust Server Temperature Variability; sum of temperature variability (axis 180 to 320) vs. optimal >> robust weighting value (0.2 to 1); Qtotal = 2000 W]
System Model
Server Cabinet System
Control Factors, x
  Inlet air velocity, Vin [0, 1] m/s
  Section a chip power, Qa [0, 200] W
  Section b chip power, Qb [0, 200] W
  Section c chip power, Qc [0, 200] W
Noise Factors, Z
  Inlet air temperature, Tin = 25 °C
Signal Factors, M
  Inlet air velocity (minimize)
  Chip temperatures (minimize)
  Cabinet power (nominalize)
Response, y
  Inlet air velocity (m/s)
  Chip temperatures (°C)
  Total cabinet power (W)
Design Objective Specification
System Design Objectives >> Goals
  Minimize flow rate of cooling air supplied to cabinet
  Minimize server chip temperatures
  Minimize sensitivity of configuration to changes in cabinet operating conditions
System Design Specifications >> Constraints
  All server chips must operate at under 85 °C
  Total cabinet power must meet target value