Energy Efficient data centers: A holistic approach and best practice at LRZ
CERN / ERF / ESS Workshop: Energy for Sustainable Science at Research Infrastructures
DESY Hamburg, October 29th, 2015
Arndt Bode, Chairman of the Board, Leibniz-Rechenzentrum of the Bavarian Academy of Sciences and Humanities and Technische Universität München

LRZ TCO

Normalized TCO: [chart not reproduced]

Estimated Worldwide Data Center Power Consumption [chart not reproduced; surviving labels: "2012-2013: 6%", "Europe"]

Source: DataCenterDynamics Focus, Volume 3, Issue 33, Jan/Feb 2014


SIMOPEK Project Coverage

Open Access 4 Pillar Framework Paper: http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-013-0244-6

Data Center (Goal: Reduce Total Cost of Ownership)

External Influences/Constraints: Neighboring Buildings, Utility Providers

Pillar 1 Building Infrastructure: Improve PUE (Power Usage Effectiveness)
Pillar 2 HPC System Hardware: Reduce Hardware Power Consumption
Pillar 3 HPC System Software: Optimize Resource Usage, Tune System
Pillar 4 HPC Applications: Optimize Performance

Global Optimization Strategy

Projects and tools placed on the framework:
SIMOPEK Advanced Adsorption Cooling
CooLMUC
SuperMUC
PowerDAM V.1.0
SIMOPEK Data Collection using PowerDAM V.2.0
SIMOPEK Power Consumption Modeling, Simulation & Optimization using MYNTS


SuperMUC – Phase 1/2


SuperMUC General Configuration – Phase 1

Compute nodes:
● 18 Thin node islands (each >8000 cores): SB-EP, 16 cores/node, 2 GB/core
● 1 Fat node island (8200 cores), also used as Migration System: WM-EX, 40 cores/node, 6.4 GB/core

Interconnect: non blocking, pruned tree (4:1)

Storage:
● NAS: 80 Gbit/s
● $HOME: 1.5 PB / 10 GB/s
● Snapshots/Replica: 1.5 PB (separate fire section)
● GPFS for $WORK and $SCRATCH: 10 PB, 200 GB/s
● Archive and Backup: ~ 30 PB
● Disaster Recovery Site

Also shown: I/O nodes, Visualization, Internet


SuperMUC Phase 2

● Lenovo NeXtScale Water Cool (WCT)
  Cooling liquid temperatures 30°C – 45°C
  Compressor-free cooling 365 days per year

● 72 Racks
  48 Compute
  9 Infiniband
  6 In-Row Cooler
  9 Management + Storage

Foto: Torsten Bloth, Lenovo


SuperMUC Phase 2

● 3072 Compute Nodes
  Lenovo NeXtScale nx360M5 WCT
  2 x Intel E5-2697v3 2.6GHz 14c
  64GB Main memory
  Mellanox Connect-IB Single Port HCA
  Diskless
  Direct water cooling

Fotos: Torsten Bloth, Lenovo


SuperMUC Phase 2

● 197 TByte Main Memory (24576 8GB DIMMs)
● 6144 Processors (4.09 m² CMOS)
● +7.5 PByte Storage
● +3.6 PFlop/s Peak Performance
● Mellanox Infiniband FDR14 Fat Tree, 4295 optical cables, 58.3 km
● 122 m² (1/4 of Phase 1)
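The headline figures on this slide can be cross-checked against the per-node configuration (3072 nodes, 2 x 14-core processors at 2.6 GHz, 64 GB per node). The sketch below does that arithmetic. One value is an assumption that does not appear on the slides: 16 double-precision floating-point operations per core per cycle, which is the usual figure for this processor generation with AVX2 fused multiply-add at the nominal clock.

```python
# Cross-check of the SuperMUC Phase 2 totals from the per-node configuration.
NODES = 3072
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 14
CLOCK_GHZ = 2.6
MEM_PER_NODE_GB = 64
DIMM_GB = 8
# Assumption (not stated on the slides): 16 double-precision flops
# per core per cycle at the nominal clock frequency.
FLOPS_PER_CYCLE = 16

processors = NODES * SOCKETS_PER_NODE
cores = processors * CORES_PER_SOCKET
memory_gb = NODES * MEM_PER_NODE_GB
dimms = memory_gb // DIMM_GB
# cores * GHz * flops/cycle gives GFlop/s; divide by 1e6 for PFlop/s.
peak_pflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6

print(f"processors: {processors}")
print(f"cores: {cores}")
print(f"memory: {memory_gb / 1000:.1f} TB in {dimms} DIMMs")
print(f"peak: {peak_pflops:.2f} PFlop/s")
```

Under that assumption the result rounds to the 3.6 PFlop/s quoted for the system, and the processor, DIMM and memory counts match the slide.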


SuperMUC Phase 2

Savings in Energy:
● New Intel-Processor-Technology
● Direct Cooling
  10% Savings compared to air cooling
  ~25% Savings by compressor-free cooling
● Energy-aware Scheduling
  + 6% Savings

~40% better energy efficiency

Fotos: Torsten Bloth, Lenovo


"Extreme Scale-out"

28 days later

Friendly-User Phase of the upcoming SuperMUC Phase 2 (3.6 PFlop/s peak, 2.8 PFlop/s Linpack, 86016 cores)

Available: 63.4 million core-h
Used: 43.8 million core-h
41 Scientists from 14 Institutes
14 Applications running on full system
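As a quick reading aid, the two ratios implied by these figures can be computed directly: the share of the available core-hours that was actually used, and the Linpack result as a fraction of peak. This is plain arithmetic on the numbers above; the variable names are illustrative only.

```python
# Ratios implied by the friendly-user phase figures on this slide.
available_core_h = 63.4e6
used_core_h = 43.8e6
peak_pflops = 3.6
linpack_pflops = 2.8

utilization = used_core_h / available_core_h
linpack_efficiency = linpack_pflops / peak_pflops

print(f"core-hour utilization: {utilization:.1%}")
print(f"Linpack efficiency:    {linpack_efficiency:.1%}")
```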


14 Applications 2015

Software           Application
BQCD               Quantum Chromodynamics
SeisSol            Seismology
GPI-2 / GASPI      Global Address Space Library
Seven-League Hydro Astrophysics
ILBDC              Lattice Boltzmann
Iphigenie          Molecular Dynamics
FLASH              Astro CFD
Gadget             Cosmology
PSC                Plasma Physics
waLBerla           Lattice Boltzmann
Musubi             Lattice Boltzmann
CIAO               CFD, Combustion
Vertex3D           Stellar Astrophysics
LS1-Mardyn         Material Science


Results

• Largest Cosmology Simulation so far (10% of the visible universe)
• Largest pseudo-spectral simulation of interstellar turbulence (10,000^3 Cells)
• Factor 100 better resolution for molecular spectra
• 2 Applications with sustained PFlop/s Performance (SeisSol and LS1-Mardyn) for more than 20 hours
• Strong scaling of a seismic reconstruction problem using GPI-2 (from 16 hours to 55 seconds)
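The strong-scaling result in the last bullet corresponds to a speedup of roughly three orders of magnitude. A short calculation makes the factor explicit. The slide does not give the node counts of the two runs, so parallel efficiency cannot be derived from it and is not attempted here.

```python
# Speedup implied by "from 16 hours to 55 seconds".
runtime_before_s = 16 * 3600  # 16 hours in seconds
runtime_after_s = 55

speedup = runtime_before_s / runtime_after_s
print(f"speedup: about {speedup:.0f}x")
```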


Integration of all IT-Systems into LRZ Dual-Cube Dark Center

Infrastructure and Servers for HPC, Back-up and Archiving, Munich Network, Visualization and "General IT-Services"

Advantages of LRZ-Dual-Cube Dark Center:
• RAS through isolation, redundancy and IT-control
• Efficient fire extinguishing based on argon
• Reliable energy provision by layered UPS concepts based on flywheels (12), array of batteries, diesel engine
• Energy efficiency by layered cooling concept: cold air, chilled water at different temperatures, direct cooling based on warm water
• … and intelligent technologies: free cooling, reuse of waste heat (adsorption machines, building climate), use of ground water / geothermal heat, fine-grained and intelligent monitoring, tools for optimization ("automatic DVFS") and user information
• Flexibility supported by functional specialization and separation of floors of the Dark Center
• Synergies for end-users by tight interaction of HPC, Big Data, Networking and Visualization


View of control part for direct water cooling infrastructure


New Generation of HPC Data Centers Use a Mix of Different Cooling Technologies

[Figure: schematic of the LRZ cooling infrastructure by floor (3.OG, 2.OG, 1.OG, EG: upper floors 3 to 1 and ground floor). Surviving labels: Water Conditioning; Compute; HPC; HRR; Disks & Tape Libraries; Cold Water Distribution; Servers; Interconnect; Chillers; Storage; NSR; Cooling Towers; Vapor Cooling Tower; Well; Core Servers I & Network; Core Servers II & Network; Precision Cooling Towers; DAR; USV (UPS); UGWKZ; USV dyn.; Electr; Free cooling winter; Chiller Cooling Towers; Hot Water Cooling stat]

Cooling capacity LRZ (new construction):
- Vapor cooling: 2 MW
- Well water: 600 kW
- Chillers: 3.2 MW
- Evaporative cooling towers: 8 MW


Power Profile of LRZ (6. – 10.1.2014)


Energy Efficiency of SuperMUC (6. – 10.1.2014)


Predicting the Power Consumption of Strong and Weak Scaling HPC Applications, Hayk Shoukourian

System Average Power Consumption View

[Figure: System Average Power Consumption over Time. An average power consumption constraint is introduced during the maintenance of cooling towers. Below it: Job Queue J with slots 1 2 3 4 5 at Time T.]

? Can the job J be scheduled and the power consumption constraint preserved?

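#!/bin/bash
# Job J: a parallel batch job requesting 270 nodes
#@ job_type = parallel
#@ node = 270
# …
echo -n "Starting job J"
# Launch 270 MPI processes of the application
mpiexec -n 270 ./myJobJ
echo -n "Job J finished"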
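The question posed on this slide, whether job J can start without violating the power constraint, can be sketched as a simple admission check. This is a hedged illustration, not the scheduler logic used at LRZ: the `Job` class, the function names, the greedy queue walk, and every number in the usage example are hypothetical. It assumes that a predicted average power consumption (APC) is available for each queued job and that system power is additive across jobs.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    predicted_apc_kw: float  # predicted average power consumption (hypothetical input)


def fits_power_cap(job: Job, running_power_kw: float, cap_kw: float) -> bool:
    """True if starting the job keeps the system at or below the power cap."""
    return running_power_kw + job.predicted_apc_kw <= cap_kw


def admit(queue: list[Job], running_power_kw: float, cap_kw: float) -> list[str]:
    """Walk the queue in order and admit every job that still fits under the cap."""
    admitted = []
    for job in queue:
        if fits_power_cap(job, running_power_kw, cap_kw):
            admitted.append(job.name)
            running_power_kw += job.predicted_apc_kw
    return admitted


# Usage with made-up numbers: 900 kW currently drawn, 1000 kW cap.
queue = [Job("J", 270, 60.0), Job("K", 512, 120.0), Job("L", 64, 15.0)]
print(admit(queue, running_power_kw=900.0, cap_kw=1000.0))
```

The check is only as good as the per-job power prediction that feeds it.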


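Adaptive Energy and Power Consumption Prediction (AEPCP) Process & Model

AEPCP process (arrows labeled (1) to (4) in the diagram):
• Inputs: Application Energy Tag, Number of Nodes
• A2EP2 obtains the available application EtS/APC history data from PowerDAM (EtS: energy-to-solution, APC: average power consumption)
• Output: Predicted EtS/APC of the Application for a Given Number of Nodes

AEPCP model (general form, arrows labeled (1) to (4) in the diagram):
• Inputs: Application Identifier, Number of Resources
• Predictor obtains the available history data from the Monitoring Tool
• Output: Predicted EtS/APC of the Application for a Given Number of Resources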


Adaptive Application Energy and Power Predictor (A2EP2)

A2EP2 Workflow

Compute node number in history?
• Yes (Y1): Average all the available EtS/APC history data for that node number and report the averaged value.
• No (N1): Take the available EtS/APC application history data, (N2) determine the predictor function, (N3) predict for the given number of compute nodes and report the predicted value.

A2EP2 predictor-function estimation scenarios

[Figure: scenarios (I) to (V); curve labels: spline/polynomial, linear function]

$$
\%RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i^{measured} - x_i^{predicted}\right)^2}\;\cdot\;\frac{100 \cdot n}{\sum_{i=1}^{n} x_i^{measured}}
$$
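The workflow and the error measure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the A2EP2 implementation: the history layout (a dict from node count to measured values), the function names, and the sample numbers are hypothetical. Only the "linear function" scenario is shown, as an ordinary least-squares line; the spline/polynomial scenario and the rule for choosing between scenarios are not given on the slide and are left out.

```python
from math import sqrt
from statistics import mean


def pct_rmse(measured, predicted):
    """%RMSE: root-mean-square error as a percentage of the mean measured value."""
    n = len(measured)
    rmse = sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return rmse * 100 * n / sum(measured)


def fit_linear(xs, ys):
    """Least-squares line through (xs, ys); stand-in for the predictor function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need history for at least two distinct node counts")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def predict(history, nodes):
    """history maps node count -> list of measured EtS or APC values."""
    if nodes in history:
        # Node count seen before: report the average of its history.
        return mean(history[nodes])
    # Otherwise fit a predictor function to the per-node-count averages.
    xs = sorted(history)
    ys = [mean(history[x]) for x in xs]
    return fit_linear(xs, ys)(nodes)


# Usage with made-up history values.
history = {10: [100.0], 20: [200.0, 200.0], 30: [300.0]}
print(predict(history, 20))  # node count in history: averaged value
print(predict(history, 40))  # node count not in history: fitted value
print(pct_rmse([100.0, 200.0], [110.0, 190.0]))
```

Averaging repeated measurements before fitting keeps node counts with many runs from dominating the fit; that is a design choice of this sketch, not something the slide specifies.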


Power Data Aggregation Monitor (PowerDAM)


Differences In Node Power Draw - SuperMUC


Energy efficiency in the (HPC-) Data Center

LRZ: "holistic" approach to streamline the four "pillars"; work in progress:

• building / infrastructure: ( )
• energy-efficient system hardware:
• system monitoring / analysis / control:
• efficient application algorithms: ( )