Data Analytics and High Performance Computing - a Convergence?

Digitization and Data Analytics - ScaDS… · Update September 2016 (Brooklyn) Slide 16 Data Analytics and High Performance Computing - a convergence? Wolfgang E. Nagel Update March

May 28, 2020

Data Analytics and High Performance Computing - a Convergence?

Slide 2

Data Analytics and High Performance Computing - a convergence? Wolfgang E. Nagel

Outline

— Motivation: Big Data, Data Analytics, and Machine Learning

— Introduction to ScaDS Dresden/Leipzig

— Infrastructure for Data Analytics and Machine Learning

  – Services at HPC

  – Performance Aspects

— Outlook – Future Perspectives

Slide 3

Nice Example ...

Slide 4

Nice Example: Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance (Todd W. Schneider)

— The New York City Taxi & Limousine Commission has released a staggeringly detailed historical dataset covering over 1.1 billion individual taxi trips in the city from January 2009 through June 2015

— http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/

— Maps show every taxi pickup in New York City from 2009–2015

— Brighter regions indicate more taxi activity.

— Green tinted regions represent activity by green boro taxis, which can only pick up passengers in upper Manhattan and the outer boroughs

Taxi pickups
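In these maps, brightness stands for the number of pickups in a small area. One way to produce such an image is to bin pickup coordinates into a fine grid and scale the counts. The Python sketch below illustrates that idea; it is not Schneider's actual code. The input format (plain `(lat, lon)` pairs), the cell size `cell_deg`, and the log scaling are illustrative assumptions.

```python
import math
from collections import Counter

def bin_pickups(points, cell_deg=0.001):
    """Count pickups per grid cell.

    points: iterable of (lat, lon) pairs (assumed input format).
    cell_deg: cell edge length in degrees (illustrative value).
    Returns a Counter mapping (row, col) cell indices to pickup counts.
    """
    counts = Counter()
    for lat, lon in points:
        # floor() maps every coordinate inside one cell to the same integer index
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

def brightness(counts):
    """Scale counts to (0, 1] on a log scale so busy cells do not wash out quiet ones."""
    peak = max(counts.values())
    return {cell: math.log1p(n) / math.log1p(peak) for cell, n in counts.items()}
```

The log scale is a design choice: with linear scaling, a handful of Midtown cells would dominate and most of the outer boroughs would render almost black.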

Slide 5

Taxi Pickups

Slide 6

Taxi Dropoffs

Slide 7

Uber vs. Taxi Pickups in Brooklyn

— Between June 2014 and June 2015, the number of Uber pickups in Brooklyn grew by 525%

— As of June 2015, Uber accounts for more than twice as many pickups in Brooklyn as yellow taxis

— Uber is rapidly approaching the popularity of green taxis
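Growth figures such as "grew by 525%" are relative changes: the June 2015 count is 1 + 5.25 = 6.25 times the June 2014 count. A small helper for the percentage figures quoted on these slides is sketched below. The function names are hypothetical, and no counts from the source are used.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new in percent (525.0 means 'grew by 525%')."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (new - old) / old * 100.0

def grown_value(old: float, pct: float) -> float:
    """Inverse of percent_change: the value reached after growing `old` by `pct` percent."""
    return old * (1.0 + pct / 100.0)
```

For example, `percent_change(100_000, 625_000)` gives 525.0, and a decline such as the 9% drop in Manhattan taxi pickups shows up as a negative value.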

Slide 8

Brooklyn Monthly Taxi Pickups

— Introduction of the green boro taxi program in August 2013 dramatically increased the amount of taxi activity in the outer boroughs

— From 2009–2013, a period during which migration from Manhattan to Brooklyn generally increased, yellow taxis nearly doubled the number of pickups they made in Brooklyn

— Green taxis quickly overtook yellow taxis, so that as of June 2015, green taxis accounted for 70% of Brooklyn’s 850,000 monthly taxi pickups

Slide 9

Manhattan Monthly Taxi Pickups

— Manhattan, not surprisingly, accounts for by far the largest number of taxi pickups of any borough

— In any given month, around 85% of all NYC taxi pickups occur in Manhattan, and most of those are made by yellow taxis

— Even though green taxis are allowed to operate in upper Manhattan, they account for barely a fraction of yellow taxi activity

— Uber has grown dramatically in Manhattan, notching a 275% increase in pickups from June 2014 to June 2015, while taxi pickups declined by 9% over the same period

— Uber made 1.4 million Manhattan pickups in June 2015

Slide 10

Travel Time Midtown to JFK / LaGuardia

Slide 11

Snowfall vs. Rain (Based on NYC)

Slide 12

NYC Late Night Taxi Index

— We can use the taxi data to draw some inferences about what parts of the city are popular for going out late at night by looking at the percentage of each census tract’s taxi pickups that occur between 10 PM and 5 AM—the time period I’ve deemed “late night.”

— According to the late night taxi index, if you’re looking for a neighborhood with vibrant nightlife, try Williamsburg, Greenpoint, or Bushwick in Brooklyn

— The census tract with the highest late night taxi index is in East Williamsburg, where 76% of taxi pickups occur between 10 PM and 5 AM
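The late night taxi index as defined above is a per-tract share: late-night pickups divided by all pickups. The Python sketch below implements that definition, assuming each pickup is available as a `(tract_id, pickup_datetime)` pair; the input format and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def is_late_night(ts: datetime) -> bool:
    """True for pickups from 10 PM up to, but not including, 5 AM."""
    return ts.hour >= 22 or ts.hour < 5

def late_night_index(pickups):
    """pickups: iterable of (tract_id, pickup_datetime) pairs (assumed format).

    Returns {tract_id: percentage of that tract's pickups that occur late at night}.
    """
    total = defaultdict(int)
    late = defaultdict(int)
    for tract, ts in pickups:
        total[tract] += 1
        if is_late_night(ts):
            late[tract] += 1
    return {tract: 100.0 * late[tract] / n for tract, n in total.items()}
```

Note that the window wraps around midnight, so it is tested with `or` rather than as a single hour range.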

Slide 13

Investment Bankers

— We can isolate all taxi trips that dropped off in the driveway of Goldman Sachs’s headquarters at 200 West Street to get a sense of where Goldman Sachs employees (at least the ones who take taxis) come from in the mornings, and when they arrive. Here’s a histogram of weekday drop-off times at 200 West Street

— The cabs start dropping off around 5 AM, peak hours are 7–9 AM, and drop-offs taper off in the afternoon

— Presumably most of the post-morning drop-offs are visitors rather than employees

— If we restrict to drop-offs before 10 AM, the median drop-off time is 7:59 AM, and 25% of drop-offs happen before 7:08 AM
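The two statistics above come from a filter followed by quantiles. The Python sketch below computes the hourly histogram and then the median and 25th percentile of drop-offs before a cutoff. It assumes drop-off times at one address are already available as `datetime.time` values. The helper names are hypothetical, and the `"inclusive"` quantile method is one choice among several, so results can differ slightly from other tools.

```python
import statistics
from collections import Counter
from datetime import time

def to_minutes(t: time) -> int:
    """Minutes after midnight."""
    return t.hour * 60 + t.minute

def to_clock(m: float) -> str:
    """Format minutes after midnight as H:MM."""
    m = round(m)
    return f"{m // 60}:{m % 60:02d}"

def hourly_histogram(dropoffs):
    """Number of drop-offs per hour of the day."""
    return Counter(t.hour for t in dropoffs)

def morning_stats(dropoffs, cutoff=time(10, 0)):
    """(median, 25th percentile) of drop-offs strictly before `cutoff`, as H:MM strings.

    Needs at least two drop-offs before the cutoff.
    """
    mins = [to_minutes(t) for t in dropoffs if t < cutoff]
    q1, median, _q3 = statistics.quantiles(mins, n=4, method="inclusive")
    return to_clock(median), to_clock(q1)
```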

Slide 14

Cash or Credit

Slide 15

Update September 2016 (Brooklyn)

Slide 16

Update March 2018

Slide 17

What else could you compare?

— Lyft’s market share gain in each neighborhood of the city compared to the neighborhood’s voting patterns in the 2016 presidential election

— The data shows that, on average, Lyft gained more market share from Uber in neighborhoods that voted more heavily for Hillary Clinton

— Todd W. Schneider’s guess is that liberal voters were in fact more likely to switch from Uber to Lyft
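A statement such as "on average, Lyft gained more market share where Clinton's vote share was higher" can be quantified with a correlation coefficient across neighborhoods. The sketch below computes Pearson's r for two equal-length lists, for example Clinton vote share and Lyft share gain per neighborhood. It is an illustration, not the analysis Schneider ran. Since Python 3.10, `statistics.correlation` computes the same quantity. A positive r only says the two quantities move together, which is why the explanation above remains a guess.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```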

Slide 18

A View on Some Different Data

Slide 19

Published: 1896

Arbeiten aus dem Kaiserlichen Gesundheitsamte, 10. Band (Works from the Imperial Health Office, Volume 10)

Slide 21

Number of Cholera Deaths 1892/1893 in Hamburg and Altona

Slide 23

Digitization and Data Analytics

Digitization and data analytics turn data into information, suggest conclusions, and provide decision support.

Open questions:

What sort of data?

Which methods from data analytics and machine learning are appropriate?

How can humans be supported in coping with these amounts of data?

What are the requirements concerning storage?

Which architecture is adequate?

How much processing power is needed?

Slide 24

Sorts of Data

— Logistics

— Traffic

— Science

— Industrial environments

— Weather

— Finance

— Text

— Business

— Social networks

— ...

Lots of data in many different forms! Big Data?

Slide 25

How large is the amount of data?

Source: IDC’s Digital Universe study, sponsored by EMC, 2014

"Big" is not a fixed scale!

What is "large"? 1 ZB = 10^21 B

B → kB → MB → GB → TB → PB → EB → ZB (each step ×1000)

1 ZB = 10^9 TB
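The prefix ladder on this slide uses decimal (SI) steps of 1000. Binary prefixes such as KiB or TiB use 1024 and would give different numbers. The following minimal converter follows the slide's decimal convention. The unit list and function name are our own.

```python
# Decimal (SI) prefixes as on the slide; each step is a factor of 1000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def convert(value, src: str, dst: str):
    """Convert `value` from unit `src` to unit `dst` using decimal prefixes."""
    steps = UNITS.index(src) - UNITS.index(dst)
    # Scaling up multiplies by an exact integer power of 1000, so integer inputs stay exact.
    return value * 1000 ** steps if steps >= 0 else value / 1000 ** (-steps)
```

For example, `convert(1, "ZB", "B")` gives 10**21 and `convert(1, "ZB", "TB")` gives 10**9, matching the figures above.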

Slide 26

Big Data Definition(s)

Volume: Data at Rest

Terabytes to exabytes of existing data to process

Velocity: Data in Motion

Streaming data, milliseconds to seconds to respond

Variety: Data in Many Forms

Structured, unstructured, text, multimedia

Veracity: Data in Doubt

Uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations

More important: extracting new content from the data

Slide 27

Where is Big Data coming from?

Event Analysis

Sensor Data

Mobile Revolution

Slide 28

User experience in the sciences

Support of complete workflows

Storage

Computing

User interface

Slide 29

Science point of view: Data life cycle management

Data(flow) perspective | Systems perspective

Slide 30

Data analytics processing pipeline

Data Collection → Extraction/Cleaning/Annotation → Data Integration/Aggregation → Analysis/Modeling → Interpretation ⇒ Value

Challenges across all stages: Volume, Veracity, Velocity, Variety, … Privacy, Human Interaction

Often ¾ of the total effort goes into getting the pipeline running. Get the human in the loop!
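The pipeline stages named above can be sketched as plain functions applied in order, each reporting how many records it passes on. That makes it easy to see where data is lost, for example during cleaning. Everything below is hypothetical: the toy taxi-trip records, the field names, and the deliberately trivial "model". It only illustrates the stage structure, not any ScaDS tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[list], list]]

def run_pipeline(records: list, stages: List[Stage]) -> list:
    """Run (name, function) stages in order and report how many records each stage yields."""
    for name, stage in stages:
        records = stage(records)
        print(f"{name:<32}{len(records):>4} records")
    return records

def extract_clean(rows):
    # Veracity: drop records whose fare is missing or not positive.
    return [r for r in rows if r.get("fare") is not None and r["fare"] > 0]

def integrate_aggregate(rows):
    # Aggregate: total fare per borough.
    totals = {}
    for r in rows:
        totals[r["borough"]] = totals.get(r["borough"], 0.0) + r["fare"]
    return [{"borough": b, "fare_total": t} for b, t in totals.items()]

def analyze_model(rows):
    # Deliberately trivial "model": rank boroughs by total fare.
    return sorted(rows, key=lambda r: r["fare_total"], reverse=True)

def interpret(rows):
    # Turn results into something a human can read and judge.
    return [f"{r['borough']}: {r['fare_total']:.2f}" for r in rows]

# Data Collection: a fixed in-memory sample stands in for real data sources.
raw = [
    {"borough": "Manhattan", "fare": 12.5},
    {"borough": "Brooklyn", "fare": 9.0},
    {"borough": "Manhattan", "fare": None},
    {"borough": "Brooklyn", "fare": -3.0},
    {"borough": "Manhattan", "fare": 7.5},
]

result = run_pipeline(raw, [
    ("Extraction/Cleaning/Annotation", extract_clean),
    ("Data Integration/Aggregation", integrate_aggregate),
    ("Analysis/Modeling", analyze_model),
    ("Interpretation", interpret),
])
```

Even in this toy, two of five records are discarded in the cleaning stage. Real pipelines spend much of their effort on exactly such stages.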

Slide 31

Machine Learning and Big Data

Success of machine learning methods depends heavily on the quality and quantity of data

Many machine learning methods were known before "Big Data"

Some machine learning methods are only successfully applicable with a certain amount of training data

Today, training data are available due to

digitization

powerful hardware (artificial training data)

Example: Deep Learning

Success is based on large amounts of training data

Data only available due to thorough digitization

Processing of large amounts of data only possible due to hardware evolution

Slide 32

Where is Big Data coming from?

Scientific Data

Simulation and scientific applications produce large amounts of data

Climate models: combine many external data (geoinformation, measurements, detailed models, ...)

High-energy physics: many measured values in a short time

Source: CERN

Slide 33

Requirements from the users’ perspective

— Data must be managed, annotated and curated to extract their potential

— Many research communities do not have the necessary tools to transform ever-growing data into scientific knowledge

In science: not just the “big players” – Long Tail of Science

Large collaborations (e.g. @CERN)

DNA sequencing

Engineering

Transportation

And many more!

Slide 34

Real-world example: Platinum genome assemblies – The Bat1K project

Collaboration with MPI-CBG: Gene Myers, Martin Pippel

Goal:

— Catalog the unique genetic endowment and diversity present in all living bats

— In order to:

understand the molecular basis of their unique adaptations

link genotype with phenotype

uncover their evolutionary history

better understand, promote, and conserve bats

Slide 35

Multiple DNA sequencing technologies (avg. length → application)

— PacBio long reads: 20–40 kb → contigs

— 10x Genomics read clouds: 50–200 kb

— Bionano optical maps: 150–400 kb → CMAP (multi-Mb)

— Hi-C read pairs: full chromosomes

1. Genome assembly based on noisy long reads

2. Scaffolding: order and orient contigs by using multiple sequencing technologies with increasing long-range information

Slide 36

Assembly pipeline: runtime

1. Read Patching: detect and correct sequencing artifacts within PacBio reads, e.g. chimeras, missed adapters, low-quality read segments

2. Genome Assembly: calculate local alignments between patched reads, followed by several overlap scrubbing phases and generation of an overlap graph. Contigs are generated by touring the overlap graph.

3. Error Correction: correct base errors and haplotype phasing by using PacBio reads and 10x read clouds

4. Scaffolding: order and orient contigs into chromosomes by using Bionano optical maps and long-range Hi-C read pairs

Mapping complex workflows is not trivial: overall performance has to be kept up even though the workflow mixes some very long and some very short tasks
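To make "generation of an overlap graph" and "touring the overlap graph" concrete, here is a toy Python sketch on short, error-free reads. It is a strong simplification, not the Bat1K pipeline. Overlaps are exact suffix/prefix matches, whereas real long-read assemblers compute local alignments on noisy reads and scrub the overlaps. The tour is a simple greedy walk. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Longest proper suffix of `a` that equals a prefix of `b` (0 if shorter than min_len).

    Exact matching only; real long-read assemblers align noisy reads instead.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def overlap_graph(reads, min_len=3):
    """Edges (k, i, j): read i overlaps read j by k characters."""
    edges = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, min_len)
                if k:
                    edges.append((k, i, j))
    return edges

def tour_contigs(reads, edges):
    """Greedy tour: take the longest overlaps first; each read gets at most one
    successor and one predecessor, and no edge may close a cycle."""
    succ, has_pred = {}, set()
    for k, i, j in sorted(edges, reverse=True):
        if i in succ or j in has_pred:
            continue
        node = j  # would i -> j close a cycle? follow j's chain and look for i
        while node in succ and node != i:
            node = succ[node][0]
        if node == i:
            continue
        succ[i] = (j, k)
        has_pred.add(j)
    contigs = []
    for start in range(len(reads)):
        if start in has_pred:
            continue  # not the beginning of a chain
        seq, node = reads[start], start
        while node in succ:
            nxt, k = succ[node]
            seq += reads[nxt][k:]  # append only the non-overlapping part
            node = nxt
        contigs.append(seq)
    return contigs
```

For the reads `["CCGGAATAC", "ATTAGACC", "ACCTGCCG"]`, the graph has the edges ATTAGACC → ACCTGCCG and ACCTGCCG → CCGGAATAC, and the tour yields the single contig `ATTAGACCTGCCGGAATAC`.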

Slide 37

Disruptive Changes due to Hardware Innovation (1994–2018)

Phone of 1994 (Nokia 2110)

Monochrome display (4x13 chars)

20 physical keys

Microcontroller for user interface

125 phone book entries

(SMS)

→ Smartphone of 2018 (Galaxy S9)

→ Super AMOLED display, 2960 x 1440 pixels

→ 5.8-inch touchscreen

→ Octa-core, 2.7 GHz + 1.7 GHz

→ 4 GB RAM, 64 GB storage

→ Permanent Internet connectivity

→ 8.0 MP front, 12.0 MP rear camera

→ WiFi, Bluetooth, NFC, location (GPS, GLONASS, BeiDou, Galileo)

→ Compass, barometer, accelerometer, gyroscope

Slide 38

Past Hardware Innovation

Today’s phones

see (multiple cameras, ambient light sensor)

hear (multiple microphones)

feel (touchscreen, accelerometer)

are aware of their position, orientation, and 6-axis movement in 3D space (GPS, compass, barometer, gyroscope, accelerometer)

are permanently connected to the Internet (including Cloud services and other devices)

cannot taste or smell (yet)

Slide 39

What else can we expect?

— faster CPUs and GPUs

— faster network connectivity

— better auto-connection with more devices

— better cameras

— 3D displays and cameras

— wireless charging

— lower power consumption

More evolution, less revolution!

Past Hardware Innovation

Slide 40

— How to turn transistors into performance?

Processor Power

Slide 41

Processor Power

— Power consumption per processor (socket) has reached the limit of what is manageable

— Hardly any frequency increase in recent years

⇒ Almost no more IPC improvement

— Transistor count still increasing

⇒ More parallelism
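If more transistors mainly turn into more cores, how much they help depends on how much of a program actually runs in parallel. A standard back-of-the-envelope model for this is Amdahl's law, which the slide does not mention and which we add here as an aid. If a fraction p of the single-core runtime scales perfectly over n cores, the speedup is at most 1 / ((1 − p) + p / n).

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the runtime scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

if __name__ == "__main__":
    # Speedup bound for a code that is 95% parallel, on growing core counts.
    for cores in (2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.95, cores), 1))
```

With 95% parallel code, the bound never exceeds 1 / 0.05 = 20, no matter how many cores are added. This is one reason why exploiting "more parallelism" also means reducing serial parts of applications and workflows.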

Slide 42

HPC Hardware

HPC systems

— High complexity of large-scale systems

— Multiple hardware options

‘heavy’ nodes: large RAM and fast CPUs

Accelerators (hybrid systems)

Fast interconnect

Commodity systems

— ‘Isolated’ systems, but interconnected (internet/network)

— Already some level of parallelism (will continue in the future)

Supercomputer: 100,000+ cores (distributed memory + network)

Cluster: 1000+ cores (distributed memory + network)

Server: ~12–24+ cores (shared memory)

Notebook: 2–8 cores (shared memory)

Mobile: 2–8 cores (shared memory)

Slide 43

HPC – Top 500 (http://www.top500.org/)

— Exponential growth

— Increase in parallelism

— Different architectures (chart categories: Vector, SMP, Accelerator, Cluster, Commodity)

— Next level: 1 ExaFlop/s (10^18 floating-point operations per second)

— Currently, energy consumption is the next hurdle

Slide 44

HPC and Data Analytics

The “Big Data paradigm”: Why isn't everything in HPC, or in parallel computing …

Architecture

— Data-driven applications are not easily mapped onto HPC architectures

Data Sources

— HPC applications are also a source of Big Data, e.g. large-scale simulations

— Streaming applications were not well covered by HPC architectures in the past

— Limiting factors are not always of a technical nature

Storage

— Intermediate and temporary storage needs to be organized by users

— Lack of sophisticated middleware for data management and organization

ScaDS Dresden/Leipzig

Slide 46

Motivation

Domain perspective (Domain Scientist):

— Specifics of data/information: formats, content, error handling

— Combine theory-driven models with experimental data (e.g. simulation vs. experiment)

— Knowledge often not well formalized (“in the experts’ heads”)

— Little or no HPC background

HPC perspective (HPC Expertise):

— Adaptation of workloads to larger infrastructures

— Optimization of workloads / (parallel) applications for the provided infrastructure

— Support for using hardware and software layers (parallel programming, filesystems, communication), but not for content

— Little or no domain knowledge

Slide 47

Motivation

Slide 48

Motivation

— Most important: bring experts together to investigate requirements of data-intensive applications and derive solutions

— Connect experts and application domain scientists

Domain Scientist | HPC Expertise | Service Center

Slide 49

Motivation: ScaDS Dresden/Leipzig

— Competence Center for collaborative Big Data driven research

— Established 2014 in Saxony (TU Dresden, U. Leipzig, MPI-CBG, IÖR, HZDR, UFZ)

Domain Scientist | HPC Expertise | Service Center

Slide 50

National Big Data Competence Center and Associated Partners

Focal point for new research activities

Specialists from computer & domain sciences

Collaborative big data research

Slide 51

Still growing network of interested parties: Contacts to industry and academia

Slide 52

Success Story ScaDS Dresden/Leipzig

National & international outreach & visibility: 200 keynotes/talks worldwide, 4 successful summer schools, 30 proven experts in guest program, 3 successful Big Data in Industry workshops

Many project acquisitions: > €11 million (Exploids, BIGGR, TIQ-Graph, KOBRA, MASI, GERDIE, EMUDIG4.0, ...)

Strong scientific output and competence (>200 publications), e.g. Big Graph Analytics, Sierra Platinum, CTS, data-intensive workflows for HPC, settlement recognition in historic maps, Interactive Multi-Scale Visualization, ...

Service Center for Big Data with high impact: numerous interdisciplinary big data application projects and industry collaborations & transfer to industry

Successful training and education program “Big-Data-Schwerpunkt” (Big Data focus): lectures/seminars/trainings/PhD seminars; hundreds of graduates with Big Data expertise (Master); >10 PhDs in Big Data close to finishing

Slide 53

Currently running phase 2 of the center with an evolved research program

— Connect to many application areas

— Service Center as integrative component

— Strong network of internationally recognized experts (PIs)

Slide 54

Scalable and Secure Data Platforms

Scalable Architectures (Prof. Dr. Wolfgang E. Nagel)

— Integration of hardware features into application layer

— Agile provisioning of Big Data environments for analytics

— Performance investigations of community and general frameworks

Hardware-based Data Security (Prof. Dr. Martin Bogdan)

— Architecture and implementation of a verification system for efficient key exchange in secure communication (e.g. IoT applications)

www.scads.de

Slide 55

Big Data Integration and Analytics

— Information extraction from partially structured data (Prof. Dr. Wolfgang Lehner)

— Graph-based and privacy-preserving data integration (Prof. Dr. Erhard Rahm)

— Analytics of dynamic graph data (Prof. Dr. Erhard Rahm)

— Intelligent text analysis (Dr. Martin Potthast)

— Information extraction on super genomes (Prof. Dr. Peter Stadler)

— Data analytics for process models (Prof. Dr. Bogdan Franczyk)

Big Data Integration | Data Analytics

Slide 56

Visual Analytics

Scalable Visual Analytics (Prof. Dr. Gerik Scheuermann)

— Integration of annotated data in super genome browser and connection to visual analysis

— Modular extension for new data types

Immersive Visual Interaction (Prof. Dr. Stefan Gumhold, Prof. Dr. Raimund Dachselt)

— Methods for cross-scale and ensemble visualization

— Methods development for data interaction on immersive large-scale displays and in virtual reality

— Interactive visual analysis using large scale

ZIH and HPC / Data Analytics Infrastructure at TU Dresden

Slide 58

— Central Scientific Unit at TU Dresden

— Running the computing and communication infrastructure for the university

— Development of algorithms and methods: cooperation with users from all departments

— Providing infrastructure and qualified service for scientists all over Saxony

— Dresden CUDA Center for Excellence

— Dresden Intel® Parallel Computing Center (IPCC)

— Competence center for "Parallel Computing and Software Tools"

— Competence center for Big Data – ScaDS Dresden/Leipzig

Center for Information Services and HPC (ZIH)

Slide 59

— Research topics

Scalable software tools to support the optimization of applications for HPC systems

Data intensive computing and data life cycle

Performance and energy efficiency analysis for innovative computer architectures

Distributed computing and cloud computing

Data analysis, methods and modelling in life sciences

Parallel programming, algorithms and methods

— Adoption and preparation of new concepts, methods, and techniques

— Teaching and Education

Areas of Expertise

Slide 60

Data Center and High Performance Computing for Saxony

Data center design innovation

— Innovative cooling

— Energy efficiency

— Reliability

Open for research collaborations

New hardware (€10 million) for machine learning and Big Data

Slide 61

— HPC system design expertise

— Focusing on data-intensive tasks for more than 15 years

— HRSK-I

Two machines: one for HPC (capability computing), one for throughput computing

Lots of tape drives to move data in and out (SGI CXFS), almost 2 GB/s to tape in 2006

— HRSK-II

Island concept with HPC and throughput islands

High I/O bandwidth

HDD+SSD file system, 100 GB/s to disk and lots of IOPS

— And now: HPC-DA, new hardware (€10 million) for machine learning and Big Data

Data Intensive Computing at ZIH

Slide 62

Data Center and High Performance Computing for Saxony

Bull Cluster (Taurus)

Petaflop cluster Bull, bullx DLC B720/R400

— ~44,000 Intel cores

— 256 Nvidia Tesla K80 GPUs

— 44 Nvidia Tesla K20 GPUs

— 136 TB RAM, >5 PB scratch file system

Extensions for Machine Learning (€10 million extension)

— 22 IBM Power9 nodes (44 CPU cores each), 6 Nvidia V100 per node

— NVLink between GPUs and CPUs with 100 GB/s bidirectional bandwidth

— 612 x86-64 nodes within Taurus (Data Analytics Island) have a high-bandwidth connection to the NVMe-based storage component with up to 1.5 TB/s bandwidth

Slide 63

HPC-DA extension towards extremely fast I/O

— Redesigned one compute island of HRSK-II

— Strong focus on highest bandwidth and low latency

— 612 CPU compute nodes

— 22 new machine learning nodes (IBM AC922)

Each: 2x Power9 CPUs, 6x NVIDIA V100 GPUs, NVLink

Being extended to 32 nodes (192 V100); acceptance in preparation

— 90 NVMe storage nodes (2 PB PCIe, 2 TB/s)

Each node with 8 × 3.2 TB PCIe x4 NVMe cards

Dual-link EDR IB, NVMe over Fabrics

— 10 PB object storage with 50 GB/s bandwidth
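The NVMe storage figures above can be cross-checked with simple arithmetic. A minimal sketch, assuming decimal units (1 PB = 1000 TB) and a nominal 100 Gbit/s per EDR InfiniBand link; it compares the raw card capacity with the quoted 2 PB and the per-node share of the aggregate 2 TB/s with the line rate of a dual-link node.

```python
# Cross-check of the HPC-DA NVMe storage figures (decimal units assumed).

def raw_capacity_pb(nodes: int, cards_per_node: int, card_tb: float) -> float:
    """Raw NVMe capacity in PB, before any file-system or redundancy overhead."""
    return nodes * cards_per_node * card_tb / 1000

def per_node_share_gbps(aggregate_tbps: float, nodes: int) -> float:
    """Average bandwidth each storage node must deliver, in GB/s."""
    return aggregate_tbps * 1000 / nodes

def ib_line_rate_gbps(links: int, gbit_per_link: float = 100.0) -> float:
    """Nominal network bandwidth per node in GB/s (assumes 100 Gbit/s EDR links)."""
    return links * gbit_per_link / 8

capacity = raw_capacity_pb(90, 8, 3.2)   # 90 nodes x 8 cards x 3.2 TB
share = per_node_share_gbps(2.0, 90)     # 2 TB/s spread over 90 nodes
network = ib_line_rate_gbps(2)           # dual-link EDR per node

print(f"raw capacity:          {capacity:.2f} PB")
print(f"share of 2 TB/s:       {share:.1f} GB/s per node")
print(f"dual-link EDR nominal: {network:.1f} GB/s per node")
```

Under these assumptions the raw capacity comes to about 2.3 PB, a little above the quoted 2 PB, and the roughly 22 GB/s each node must deliver fits within the roughly 25 GB/s of two EDR links.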

HPC-DA Extensions 2018/19

Figure: block diagram of the new Data Analytics Island. Components: 612 CPU nodes (24-core Haswell), island switch, 90 NVMe storage nodes (2 PB PCIe NVMe), 22 IBM AC922 ML nodes (2 Power9 CPUs, 6 NVIDIA V100 each), core switches, other compute nodes, and 10 PB object storage (50 GB/s). Link bandwidths shown: 2 TB/s, 1.5 TB/s, 0.4 TB/s, and 500 GB/s.

Slide 64

— The installation is open to scientists from all over Germany whose HPC and Big Data use cases can benefit from HPC-DA

— Overall more than 35,000 cores, additionally:

2 petabytes of flash memory (bandwidth of about 2 terabytes/s)

Object storage of 10 petabytes

22 IBM Power9 nodes, each with six Nvidia V100 GPUs, closely connected to the fast storage systems

— Scalable virtual research environments tailored to user requirements

— Project proposals can be submitted:

https://tu-dresden.de/zih/hochleistungsrechnen/zugang/hpc-da

HPC and Data Analytics (HPC-DA)

Slide 65

— Reading through 1 PB of data: 500 seconds

— Moving data to Dresden via DFN: 36 TB/hour

— Archiving 1 PB: 6 hours

Data Center and High Performance Computing for Saxony
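Each figure on this slide follows from time = volume / bandwidth. A minimal sketch, assuming decimal units; the 2 TB/s and 50 GB/s rates are the NVMe and object-storage bandwidths quoted for HPC-DA, and treating the archive rate as the object-storage bandwidth is an assumption, not something the slide states.

```python
TB = 1e12  # bytes, decimal units assumed
PB = 1e15

def transfer_seconds(volume_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move a data volume at a sustained bandwidth."""
    return volume_bytes / bandwidth_bytes_per_s

read_s = transfer_seconds(1 * PB, 2 * TB)            # NVMe tier at ~2 TB/s
archive_h = transfer_seconds(1 * PB, 50e9) / 3600    # object storage at 50 GB/s (assumed)
dfn_rate = 36 * TB / 3600                            # 36 TB/hour expressed in bytes/s
dfn_days = transfer_seconds(1 * PB, dfn_rate) / 86400

print(f"read 1 PB from NVMe:      {read_s:.0f} s")
print(f"archive 1 PB at 50 GB/s:  {archive_h:.1f} h")
print(f"DFN: {dfn_rate / 1e9:.0f} GB/s, 1 PB in about {dfn_days:.1f} days")
```

Reading 1 PB at 2 TB/s gives exactly 500 seconds, 50 GB/s gives roughly 5.6 hours (consistent with the rounded 6 hours), and 36 TB/hour corresponds to 10 GB/s (80 Gbit/s), so moving a full petabyte over DFN would take a little over a day.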

Slide 66

What do Users do on our Machines?

Slide 67

Data and HPC – Convergence Patterns at ZIH

Figure: architecture diagram of the convergence patterns. Labels: Virtual Research Environments; Federation; Abstraction, Services; Memory Virtualization; Compute Virtualization; classical HPC; partitions HPC, HPC, HTC, NVRAM, ML; Lustre, Memory; Flink, YARN; workload types Simulation, Analysis, Throughput, Streams, Data. Legend: HPC-DA hardware, HPC-DA software, HRSK-II.

Slide 68

Data and HPC – Convergence Patterns at ZIH

Figure: the same convergence architecture diagram, highlighting "Classic HPC".

Slide 69

Data and HPC – Convergence Patterns at ZIH

Figure: the same convergence architecture diagram, annotated "Add system features via virtualization layer".

Slide 70

Data and HPC – Convergence Patterns at ZIH

Figure: the same convergence architecture diagram, annotated "Enhance software stack up to completely unique software settings".

Slide 71

— Provisioning of required environments (Hadoop, Spark, Flink, ML frameworks, …)

— Big Data sessions created on demand

— Run directly as an analytics service at the HPC site

— Adaptable to other frameworks/applications

Provision of data analytics@HPC
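One common way to create such a session on demand is to start a standalone Spark cluster inside a batch allocation: the first allocated node runs the master, and every node runs a worker that registers with it. The sketch below only builds the launch commands from a host list. The `spark-class` script, the `org.apache.spark.deploy.master.Master`/`Worker` classes, and port 7077 are standard Spark standalone components; the host names, the memory size, and the overall approach are illustrative assumptions, not a description of ZIH's actual tooling.

```python
from dataclasses import dataclass, field

SPARK_MASTER_PORT = 7077  # Spark standalone default master port

@dataclass
class SessionPlan:
    master_host: str
    master_url: str
    commands: dict = field(default_factory=dict)  # host -> list of argv lists

def plan_spark_session(hosts, cores_per_worker, memory_per_worker="32g"):
    """Build launch commands for a standalone Spark cluster on allocated hosts.

    The first host runs the master; every host (including the first) runs
    one worker. memory_per_worker is a hypothetical default.
    """
    if not hosts:
        raise ValueError("need at least one allocated host")
    master = hosts[0]
    url = f"spark://{master}:{SPARK_MASTER_PORT}"
    plan = SessionPlan(master_host=master, master_url=url)
    plan.commands[master] = [[
        "spark-class", "org.apache.spark.deploy.master.Master",
        "--host", master, "--port", str(SPARK_MASTER_PORT),
    ]]
    for host in hosts:
        worker = [
            "spark-class", "org.apache.spark.deploy.worker.Worker",
            "--cores", str(cores_per_worker),
            "--memory", memory_per_worker,
            url,  # workers register with the master URL
        ]
        plan.commands.setdefault(host, []).append(worker)
    return plan

# Hypothetical two-node allocation of 24-core Haswell nodes.
plan = plan_spark_session(["node1", "node2"], cores_per_worker=24)
for host, cmds in plan.commands.items():
    for argv in cmds:
        print(host, " ".join(argv))
```

A job script would then launch each command on its host (for example with the scheduler's own remote-launch tool) and submit applications against `plan.master_url`; the session disappears when the allocation ends.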

Slide 72

Beyond usual HPC job scheduling

— Running jobs and workflows on this infrastructure is complex

— Provide SW environments and templates for primary use cases

Complex scheduling @ heterogeneous hardware

Slide 73

NVMe leases

— Provision NVMe devices according to the optimal access model (parallel FS, read-only FS, HDFS, database, raw block devices, …); several templates to start from

— Either on the NVMe host or mounted to compute nodes via NVMe over Fabrics

— Flexible NVMe device assignment, not tied to one compute node as with burst buffers

— Users allocate NVMe devices exclusively for their working data set as an “NVMe lease” over medium-term periods (days to weeks)

Provisioning of NVMe nodes

Evacuate and restore

— NVMe leases may be evacuated to the object storage and restored later
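The lease life cycle described here (exclusive allocation for days to weeks, evacuation to object storage, later restore) can be modelled as a small state machine. A minimal sketch: the class names, method names, and device identifiers are hypothetical and only illustrate the policy on the slide. The actual data copy to and from object storage is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

class LeaseState(Enum):
    ACTIVE = "active"        # devices exclusively assigned to the owner
    EVACUATED = "evacuated"  # data parked in object storage, devices freed

@dataclass
class NvmeLease:
    owner: str
    devices: list
    expires: datetime
    state: LeaseState = LeaseState.ACTIVE

class LeasePool:
    """Toy allocator for exclusive, medium-term NVMe leases (hypothetical)."""

    def __init__(self, devices):
        self.free = list(devices)

    def _take(self, count):
        if count > len(self.free):
            raise RuntimeError("not enough free NVMe devices")
        taken, self.free = self.free[:count], self.free[count:]
        return taken

    def lease(self, owner, count, days, now):
        """Exclusively assign `count` devices to `owner` for `days` days."""
        return NvmeLease(owner, self._take(count), now + timedelta(days=days))

    def evacuate(self, lease):
        """Free the devices; a real system would first copy data to object storage."""
        if lease.state is not LeaseState.ACTIVE:
            raise RuntimeError("lease is not active")
        self.free.extend(lease.devices)
        lease.devices = []
        lease.state = LeaseState.EVACUATED

    def restore(self, lease, count):
        """Re-assign devices; restored data may land on different devices."""
        if lease.state is not LeaseState.EVACUATED:
            raise RuntimeError("lease is not evacuated")
        lease.devices = self._take(count)
        lease.state = LeaseState.ACTIVE

pool = LeasePool([f"nvme{i}" for i in range(8)])
lease = pool.lease("alice", count=4, days=14, now=datetime(2019, 6, 1))
pool.evacuate(lease)
pool.restore(lease, count=4)
print(lease.state, lease.devices, lease.expires)
```

Because devices are drawn from a shared pool rather than bound to one compute node, a restored lease is free to land on other devices, which mirrors the "not tied to one compute node" point above.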

Performance Investigations

Slide 75

— Stage-in and Stage-out of complete research environments

— Analysis Tools

BYO (bring your own)

— Our own projects

ADA-FS, HP-DLF, NextGenIO

Vampir, ProPE, Score-P

ScaDS Dresden/Leipzig

Versatile Support

Slide 76

cuBLAS on V100 (first shot on Taurus) – V100 GPU performance of 6.4 TFlops sustained (fixed clock frequency)

Linear Algebra Performance Insights on Power9 and NVIDIA V100

Slide 77

cuBLAS on V100 (First shot on Taurus) – Copying data and its impact on performance

Linear Algebra Performance Insights on Power9 and NVIDIA V100
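Why copies hurt most for small matrices follows from counting: an n × n DGEMM performs about 2n³ floating-point operations but must move three n × n matrices of 8-byte doubles over the host link (A and B in, C out). A minimal model, assuming copies and compute do not overlap; the 7.0 TFlops compute rate is the sustained value reported on these slides, while about 12 GB/s for PCIe 3.0 x16 and 50 GB/s per direction for NVLink (half the quoted 100 GB/s bidirectional) are assumptions, not measurements from the talk.

```python
def dgemm_flops(n: int) -> float:
    """Floating-point operations of an n x n x n GEMM (one multiply + one add per term)."""
    return 2.0 * n ** 3

def effective_tflops(n: int, compute_tflops: float, link_gbps: float,
                     bytes_per_elem: int = 8) -> float:
    """Rate including host<->device copies of A, B (in) and C (out),
    assuming copy and compute are serialized (no overlap)."""
    t_compute = dgemm_flops(n) / (compute_tflops * 1e12)
    t_copy = 3 * n * n * bytes_per_elem / (link_gbps * 1e9)
    return dgemm_flops(n) / (t_compute + t_copy) / 1e12

links = (("PCIe 3.0 x16, ~12 GB/s (assumed)", 12.0),
         ("NVLink, ~50 GB/s per direction (assumed)", 50.0))
for name, bandwidth in links:
    for n in (2048, 8192, 32768):
        print(f"{name:42s} n={n:6d}: {effective_tflops(n, 7.0, bandwidth):.2f} TFlops")
```

In this model the copy-to-compute time ratio is 12R/(nB) for compute rate R and link bandwidth B (in consistent units), so the copy overhead shrinks as 1/n and is smaller on the faster host link. Real measurements also depend on whether the benchmark overlaps transfers with computation.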

Slide 78

cuBLAS on V100 (best results on Taurus) – sustained performance of 7.0 TFlops, ~5.7 TFlops including data copy

Linear Algebra Performance Insights on Power9 and NVIDIA V100

Slide 79

cuBLAS on V100 (Best results on Taurus) – Close-up reveals performance pattern (every 64 operands)

Linear Algebra Performance Insights on Power9 and NVIDIA V100

Slide 80

cuBLAS on V100 (Best results on Taurus) – Repetitive pattern also visible for the data copy case

Linear Algebra Performance Insights on Power9 and NVIDIA V100

Slide 81

cuBLAS on V100 with varying precision – Tensor Cores: matrix dimensions should be multiples of 8

Linear Algebra Performance Insights on Power9 and NVIDIA V100

Slide 82

cuBLAS on V100 with varying precision – fallback to non-Tensor-Core math if a dimension is not a multiple of 8

Linear Algebra Performance Insights on Power9 and NVIDIA V100
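A common workaround for this fallback is to zero-pad every GEMM dimension up to the next multiple of 8. Zero rows and columns do not change the product, and the result is cropped back afterwards. A minimal sketch of the padding arithmetic, showing how little extra work it costs for large matrices; whether padding actually pays off depends on the cuBLAS/CUDA version and should be measured.

```python
def round_up(x: int, multiple: int = 8) -> int:
    """Smallest multiple of `multiple` that is >= x."""
    return ((x + multiple - 1) // multiple) * multiple

def padded_gemm_overhead(m: int, n: int, k: int, multiple: int = 8) -> float:
    """Fraction of extra FLOPs from padding an (m x k) @ (k x n) GEMM."""
    mp, np_, kp = (round_up(d, multiple) for d in (m, n, k))
    return (mp * np_ * kp) / (m * n * k) - 1.0

def pad_matrix(a, rows: int, cols: int):
    """Zero-pad a list-of-lists matrix to rows x cols (zeros leave A @ B unchanged)."""
    out = [row + [0.0] * (cols - len(row)) for row in a]
    out += [[0.0] * cols for _ in range(rows - len(a))]
    return out

for dim in (1001, 5000, 28001):
    print(dim, "->", round_up(dim),
          f"extra work {padded_gemm_overhead(dim, dim, dim):.3%}")
```

For n = 1001 the padded problem (1008) costs about 2% more arithmetic, and for sizes that are already multiples of 8 the overhead is zero, which is small compared with the throughput gap between Tensor Core and non-Tensor-Core paths reported on the following slides.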

Slide 83

cuBLAS on V100 with varying precision – fallback to non-Tensor-Core math if a dimension is not a multiple of 8

Linear Algebra Performance Insights on Power9 and NVIDIA V100

Early results on Power9 and V100

Slide 85

Taurus vs. Power9 + V100 (no Tensor Cores)

Performance Insights on Power9 and NVIDIA V100

Slide 86

Performance differences

V100 + Power9

— ~7.8 TF sustained per GPU (double precision)

— ~6.5 TF sustained with copy

V100 + Taurus

— 7.0 TF

— ~5.7 TF with copy

Similarities

— Performance jumps for problem sizes > 5,500 (?)

Performance Insights on Power9 and NVIDIA V100

Slide 87

hgemm on Power9 (+Tensor Cores) and CUDA 9.2 – peak: ~90 TFlops, with memory transfer: ~60 TFlops

Performance Insights on Power9 and NVIDIA V100

Slide 88

hgemm on Power9 with Tensor Cores and CUDA 10.0

Performance Insights on Power9 and NVIDIA V100

CUDA 10 required to exceed 100 TFlops. Performance jump at problem size 28,000 (?)

Slide 89

hgemm on Power9 with Tensor Cores and CUDA 10.0

Performance Insights on Power9 and NVIDIA V100

CUDA 10 required to exceed 100 TFlops. Performance jump at problem size 28,000 (?)

Outlook – Future Perspective

Slide 91

Machine learning at scale is only successful

— if there is enough data to learn from

— if the data is understood

— if the data is of high quality

— Our topics are essential for ML/AI

Outreach to support Big Data and Machine Learning communities in Germany

Germany-wide offer: HPC-DA infrastructure and expertise

ScaDS Dresden/Leipzig is an important part of the Big Data / AI community in Germany

Big Data and Machine Learning in Germany

Slide 92

Long-term continuation of ScaDS Dresden/Leipzig as a Germany-wide center for data analytics and artificial intelligence

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

Data-centric research

(Big Data) AI algorithms

Knowledge representation

ScaDS Dresden/Leipzig: Big Data Competence & Service Center

ScaDS.AI Dresden/Leipzig

Slide 93

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

ScaDS Dresden/Leipzig: Big Data Competence & Service Center

Knowledge

AI Foundations

Applied AI

Basic methodical research for AI

Formal methods for content description and semantics

Application into domain fields

Slide 94

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

Knowledge

AI Foundations

Applied AI

Machine Learning for Graph Data

Neuro-inspired AI Methods

Privacy-Preserving Machine Learning

AI-Driven 3D Reconstruction

Explanations for Trusted AI

ScaDS Dresden/Leipzig: Big Data Competence & Service Center

Slide 95

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

ScaDS Dresden/Leipzig: Big Data Competence & Service Center

Knowledge

AI Foundations

Applied AI

Knowledge-aware computing

Knowledge Graphs for AI

Scalable Training Data Acquisition

Conversational AI

Slide 96

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

ScaDS Dresden/Leipzig: Big Data Competence & Service Center

Knowledge

AI Foundations

Applied AI

AI for Security

Data Science for Biomedical Applications

Solving social problems by AI

Hyperspectral Imaging

Slide 97

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

ScaDS Dresden/Leipzig: Big Data Competence & Service Center

Knowledge

AI Foundations

Applied AI

Slide 98

Extensions towards center for AI – ScaDS.AI Dresden/Leipzig

The proposed extension has been approved by external reviewers; starting in Q4/2019, the center will be extended into a Big Data / AI competence center!

Slide 99

Thank You!

Acknowledgements: Holger Brunst Robert Dietrich René Jäkel Michael Kluge Andreas Knüpfer Ulf Markwardt Hartmut Mix Eric Peukert Erhard Rahm Robert Schöne Sunna Torge … and many more