HPC Research Project CINECA - Unibo
Dr. Daniele Cesarini, HPC Software Engineer and Analyst, CINECA
School of Engineering, University of Bologna, 19 February 2020

HPC Research Project CINECA - Unibo · 2020-05-06 · Research Projects and Thesis 1. Big Data & Deep Learning for HPC: a) Explore AI solution for application acceleration on GPU.

May 29, 2020



CINECA Italian National Supercomputing Center


High-Performance Computing (Supercomputing)

Science, Engineering, Business


CINECA HPC Systems

Marconi
• CPU partition:
  • Nodes: 2,188
  • CPU: Intel Xeon Platinum 8160 @ 2.10 GHz
  • Cores: 105,024
  • DRAM: 410 TB
  • Network: Omni-Path 100 Gbit/s
• GPU partition:
  • Nodes: 1,000
  • GPU: 4,000 NVIDIA Volta Tesla
  • Cores: 44,000
  • DRAM: 250 TB
  • Network: InfiniBand 200 Gbit/s

Galileo
• Nodes: 1,022
• CPU: Intel Xeon E5-2697 v4 @ 2.30 GHz
• Cores: 36,792
• DRAM: 128 TB
• Network: Omni-Path 100 Gbit/s
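The listed core counts are consistent with the node counts under two assumptions: the nodes are dual-socket, and the Xeon Platinum 8160 has 24 cores while the Xeon E5-2697 v4 has 18. The GPU total assumes four GPUs per node.

```latex
\begin{aligned}
\text{Marconi, CPU partition:} &\quad 2188 \times 2 \times 24 = 105{,}024 \ \text{cores}\\
\text{Galileo:} &\quad 1022 \times 2 \times 18 = 36{,}792 \ \text{cores}\\
\text{Marconi, GPU partition:} &\quad 1000 \times 4 = 4{,}000 \ \text{GPUs}
\end{aligned}
```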


CINECA – Data Center Power Consumption

Data Center at Casalecchio di Reno (BO)
• Data hall: 1,000 m²
• Racks: ≈100
• HPC nodes: ≈4,000
• Electrical power committed: 5.0 MW (30,000 MWh/year)
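The slide does not say that 30,000 MWh/year is the annual energy actually drawn. If it is read that way, it corresponds to an average draw of about two thirds of the committed 5.0 MW. At full committed power all year, consumption would be 43,800 MWh:

```latex
\begin{aligned}
E_{\max} &= 5.0\ \text{MW} \times 8760\ \text{h/year} = 43{,}800\ \text{MWh/year}\\
\bar{P} &= \frac{30{,}000\ \text{MWh/year}}{8760\ \text{h/year}} \approx 3.42\ \text{MW} \approx 0.68 \times 5.0\ \text{MW}
\end{aligned}
```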


EuroHPC hosting @ Bologna Science Park

ECMWF

Cooling equipment: 3 MW (2020) -> 5 MW (2023)

Computer rooms: 10 MW (2020) -> 20 MW (2023)


Energy Saving

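[Figure: power (MW) over time for an MPI application alternating computation (APP) and MPI communication (Comm) phases, with core frequency levels F_min and F_max; annotated energy saving: -15%.]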
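The figure contrasts frequency levels F_min and F_max across computation (APP) and communication (Comm) phases. Assume the depicted policy lowers the core frequency to F_min while a process waits in MPI communication and restores F_max for computation. Under that assumption, a minimal Python sketch of the policy follows. `set_freq_khz` is a hypothetical hook (on Linux it might be backed by cpufreq) that is injected so the sketch runs anywhere. This is not the COUNTDOWN implementation itself.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def scaled_during_comm(set_freq_khz: Callable[[int], None],
                       f_min_khz: int, f_max_khz: int) -> Iterator[None]:
    """Hold the core at f_min_khz for the enclosed communication phase,
    then restore f_max_khz for the next application phase."""
    set_freq_khz(f_min_khz)          # entering MPI communication
    try:
        yield
    finally:
        set_freq_khz(f_max_khz)      # back to computation, even if the call raised


calls = []                            # stand-in for a real frequency driver
with scaled_during_comm(calls.append, 1_200_000, 2_100_000):
    pass                              # an MPI collective would run here
print(calls)                          # [1200000, 2100000]
```

Switching on every call is the naive form of the idea. For very short communication calls the switching overhead could outweigh the saving, so a practical runtime would plausibly scale only phases that last long enough.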


A new trend - Datacentre Automation

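[Figure: datacentre automation stack. Heterogeneous sensors (CRAC, PDU, cluster, environment) connect through a common interface to a scalable monitoring framework. Performance analysis, machine learning, and data visualization build on it to drive resource management, energy efficiency, and job scheduling through reactive and proactive feedbacks.]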


Anomaly Detection

[Figure: monitoring and anomaly-detection pipeline. On the target facility, sensor publishers (Sens_pub) send live data over MQTT to brokers (Broker1 ... BrokerM). MQTT2Kairos stores the data in KairosDB, backed by a Cassandra NoSQL cluster (node1 ... nodeM), as long-term storage. Admin applications (Grafana, Apache Spark, Python, Matlab) query that storage. An ML model (autoencoder) is trained on the stored data and performs online inference / anomaly detection on live data.]

[EAAI19] Borghesi et al., «A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems»

[AICAS18] Borghesi et al., «Online Anomaly Detection in HPC Systems»

(1) How can the lack of faulty data in production systems be overcome?

(2) Is it possible to train anomaly detection models without domain knowledge?
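A semisupervised autoencoder approach, as the title of [EAAI19] suggests, can address question (1): it is trained only on data from normal operation and flags samples that it reconstructs poorly, so no faulty data is needed for training. The sketch below swaps the deep autoencoder for a linear one, whose optimal weights span the top principal components, so NumPy can fit it in closed form. The class name, the latent size `k`, and the 99th-percentile threshold are illustrative assumptions, not the papers' settings.

```python
import numpy as np


class LinearAutoencoder:
    """Linear autoencoder fitted in closed form: the optimal linear encoder/decoder
    spans the top-k principal components of the (normal-only) training data."""

    def __init__(self, k: int):
        self.k = k

    def fit(self, x_normal: np.ndarray, quantile: float = 0.99) -> "LinearAutoencoder":
        # Per-sensor standardization, learned from normal data only.
        self.mean_ = x_normal.mean(axis=0)
        self.std_ = x_normal.std(axis=0) + 1e-12
        z = (x_normal - self.mean_) / self.std_
        # Right singular vectors are the principal directions; keep the top k.
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        self.w_ = vt[: self.k].T                      # shape (n_features, k)
        # Anomaly threshold: a high quantile of the training reconstruction error.
        self.threshold_ = np.quantile(self.score(x_normal), quantile)
        return self

    def score(self, x: np.ndarray) -> np.ndarray:
        z = (x - self.mean_) / self.std_
        recon = (z @ self.w_) @ self.w_.T             # encode, then decode
        return np.mean((z - recon) ** 2, axis=1)      # per-sample reconstruction error

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.score(x) > self.threshold_        # True = suspected anomaly
```

After fitting on telemetry from healthy operation, `predict` returns a boolean mask over a new batch of samples. A sample is flagged when it breaks the correlations between sensors that the model learned.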


European Processor Initiative (EPI)


Research Projects and Thesis

1. Energy-Efficient Runtime (COUNTDOWN):
a) Implementing power management strategies in the OpenMP runtime and on ARM+GPU systems.

Strong requirements:
• Good knowledge of the C language

Topics, tools, and languages used in the project/thesis:
• CMake, Git, and Python
• C++/Fortran languages and compilers (GNU, Intel, LLVM, etc.)
• HPC environments, parallel and distributed programming models (OpenMP, MPI, etc.)
• Performance evaluation and modeling
• Power management systems for high-performance processors (DVFS, RAPL, P/C states, etc.)

2. Porting and performance evaluation of HPC applications on high-performance ARM and RISC-V systems:
a) Numerical weather prediction (NWP)
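RAPL appears among the power-management interfaces in project 1 above. The minimal Python sketch below samples package energy through the Linux powercap sysfs files (`energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0`) and turns the readings into average power. The function names are illustrative assumptions. On recent kernels `energy_uj` may be readable only by root.

```python
import time
from pathlib import Path

# Package-0 RAPL domain exposed by the Linux powercap framework (Intel CPUs).
RAPL_PKG0 = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(path: Path) -> int:
    """Read one powercap counter, in microjoules."""
    return int(path.read_text().strip())


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy between two counter readings, tolerating a single wraparound."""
    if after >= before:
        return after - before
    return after + max_range_uj - before


def average_power_w(domain: Path = RAPL_PKG0, interval_s: float = 1.0) -> float:
    """Average package power in watts over roughly interval_s seconds."""
    max_range = read_uj(domain / "max_energy_range_uj")
    e0, t0 = read_uj(domain / "energy_uj"), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = read_uj(domain / "energy_uj"), time.monotonic()
    return energy_delta_uj(e0, e1, max_range) / 1e6 / (t1 - t0)
```

On a node that exposes RAPL, `average_power_w()` returns the package-0 power averaged over one second. The wraparound handling matters for long intervals, because the counter resets after `max_energy_range_uj`.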


Research Projects and Thesis

1. Big Data & Deep Learning for HPC:
a) Exploring AI solutions for application acceleration on GPUs
b) AI/DL on big data for datacenter automation

Strong requirements:
• Background in ML, DL, AI, and/or control theory

Topics, tools, and languages used in the project/thesis:
• Python, C, C++, Bash, operating systems, Cassandra DB, KairosDB
• Power management systems for high-performance processors (DVFS, RAPL, P/C states, etc.)

2. Deep Learning and edge AI:
a) Deep reinforcement learning for power management
b) HW acceleration for on-chip anomaly detection
c) Edge AI for enhanced security
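KairosDB, listed among the tools above, is the store that the MQTT2Kairos stage of the monitoring pipeline feeds. The sketch below converts MQTT samples into the JSON datapoints KairosDB accepts on `POST /api/v1/datapoints`, where timestamps are in milliseconds and each metric needs at least one tag. The topic layout `node/<hostname>/plugin/<plugin>/metric/<metric>` and the `value;timestamp` payload are hypothetical, since the slides do not specify the real formats.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def mqtt_to_kairos(topic: str, payload: str) -> Dict[str, Any]:
    """Map one MQTT sample to a KairosDB datapoint.

    Hypothetical conventions (not taken from the slides):
      topic   = "node/<hostname>/plugin/<plugin>/metric/<metric>"
      payload = "<value>;<unix time in seconds>"
    """
    parts = topic.split("/")
    if len(parts) != 6 or parts[0::2] != ["node", "plugin", "metric"]:
        raise ValueError(f"unexpected topic layout: {topic!r}")
    hostname, plugin, metric = parts[1::2]
    value, timestamp_s = payload.split(";")
    return {
        "name": metric,
        "timestamp": int(round(float(timestamp_s) * 1000)),  # KairosDB uses milliseconds
        "value": float(value),
        "tags": {"node": hostname, "plugin": plugin},         # KairosDB requires >= 1 tag
    }


def to_request_body(samples: Iterable[Tuple[str, str]]) -> str:
    """JSON array suitable as the body of POST /api/v1/datapoints."""
    return json.dumps([mqtt_to_kairos(t, p) for t, p in samples])
```

Batching several samples into one request body, rather than sending one HTTP request per MQTT message, is one plausible way to keep ingestion cheap when many sensors publish at high rates.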


Work Location & Contact

Work Location: Energy-Efficient Embedded Systems Laboratory (EEES Lab)

Viale Carlo Pepoli 3/1, Bologna, Italy

Unibo Supervisor: Prof. Andrea

CINECA Co-Supervisor: Dr. Daniele Cesarini