

July 2020

INFINIBAND IN-NETWORK COMPUTING TECHNOLOGY


THE NEW SCIENTIFIC COMPUTING WORLD

NETWORK

EDGE

APPLIANCE

SUPERCOMPUTER

STORAGE


HDR 200G INFINIBAND ACCELERATES NEXT GENERATION HPC AND AI SUPERCOMPUTERS (EXAMPLES)

8K HDR nodes, Dragonfly+ topology

9 PetaFLOPS, 3K HDR nodes, Dragonfly+ topology

7.5 PetaFLOPS, 2K HDR nodes, Dragonfly+ topology

35.5 PetaFLOPS, 2K HDR nodes, Fat-Tree topology

23 PetaFLOPS, 5.6K HDR nodes, Dragonfly+ topology

HPC/AI cloud, HDR InfiniBand

HDR supercomputers

23.5 PetaFLOPS, 8K HDR nodes, Fat-Tree topology

27.6 PetaFLOPS, 3K HDR nodes, Fat-Tree topology

16 PetaFLOPS, 3K HDR nodes, Dragonfly+ topology
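As a rough sizing aid for the node counts above: a nonblocking fat-tree built from r-port switches supports at most 2 * (r/2)^L hosts with L switch levels. A minimal sketch (the radix-40 figure matches the HDR switch described later in this deck; the helper name is ours):

```python
def fat_tree_max_nodes(radix: int, levels: int) -> int:
    """Maximum hosts in a nonblocking fat-tree built from switches
    with `radix` ports: 2 * (radix/2) ** levels."""
    return 2 * (radix // 2) ** levels

print(fat_tree_max_nodes(40, 2))  # 800 hosts in a two-level tree
print(fat_tree_max_nodes(40, 3))  # 16000 hosts in a three-level tree
```

Dragonfly+ trades some of this strict nonblocking guarantee for fewer long cables at large scale, which is why both topologies appear in the list.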


INFINIBAND NETWORK TECHNOLOGY FUNDAMENTALS

Smart end-points (GPU, CPU, DPU)

Architected to scale

Centralized management

Standard


INFINIBAND ACCELERATED SUPERCOMPUTING

SHARP AI Technology: AI acceleration engines, 2.5X higher AI performance

UFM Cyber AI: data-center cyber intelligence and analytics

Speed of Light: 200Gb/s data throughput, RDMA and GPUDirect RDMA, 3X better (lower) latency

SHIELD AI Technology: self-healing network, 1000X faster recovery time
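The recovery-time claim rests on healing link failures locally in the switch rather than waiting for a centralized route recomputation. A hypothetical sketch of that idea (not the real SHIELD firmware logic; the class and port numbers are ours):

```python
class Switch:
    """Each switch keeps an alternate next-hop per destination, so a
    link failure is healed locally instead of waiting for the subnet
    manager to recompute routes."""
    def __init__(self, routes):
        self.routes = dict(routes)   # dest -> (primary, alternate) port
        self.failed_ports = set()

    def next_hop(self, dest):
        primary, alternate = self.routes[dest]
        # Local, immediate decision: no round trip to a central manager.
        return alternate if primary in self.failed_ports else primary

sw = Switch({"hostA": (1, 7)})
assert sw.next_hop("hostA") == 1  # healthy: use the primary port
sw.failed_ports.add(1)            # link on port 1 goes down
assert sw.next_hop("hostA") == 7  # traffic heals onto the alternate port
```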


THE NEW DATA CENTER

Faster data speeds and In-Network Computing enable higher performance and scale

CPU-Centric (Onload)

Must wait for the data

Creates performance bottlenecks

Security limitations

Data-Centric (Offload, In-Network Computing)

Analyze data as it moves

Higher performance and scale

Secured supercomputing
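The contrast can be put in arithmetic terms with a toy model (all numbers illustrative, not measurements): in an onload design the CPU spends a fraction of its cycles on network protocol processing, while offloading returns those cycles to the application.

```python
# Toy model, illustrative only: fraction of CPU cycles left for the
# application when protocol processing runs on the CPU (onload) versus
# on the adapter/switch (offload).
def app_fraction(network_overhead, offloaded):
    return 1.0 if offloaded else 1.0 - network_overhead

onload = app_fraction(0.30, offloaded=False)   # 0.7 of the CPU remains
offload = app_fraction(0.30, offloaded=True)   # all of the CPU remains
```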


THE SMARTEST INTERCONNECT

Application: High Performance Computing, Data Analysis, Deep Learning, Cyber Security

Communication: In-Network Computing; NVMe, Containers, OpenStack; storage and other resource disaggregation

Network: full network transport offload; RDMA and GPUDirect RDMA; SHIELD (Self-Healing Network); enhanced adaptive routing and congestion control

Connectivity: ultimate software-defined network; Multi-Host and Socket-Direct technology; enhanced and flexible topologies


SCALABLE HIERARCHICAL AGGREGATION AND REDUCTION PROTOCOL (SHARP)

In-network tree-based aggregation mechanism

Multiple simultaneous outstanding operations

For HPC (MPI/SHMEM) and distributed machine-learning applications

Scalable high-performance collective offload

Barrier, Reduce, All-Reduce, Broadcast and more

Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND

Integer and floating-point, 16/32/64 bits

[Diagram: SHARP aggregation tree. Hosts send data up through switches, which aggregate it in the network; the aggregated result is returned down the tree to the hosts.]
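The aggregation tree can be sketched in code. This is a minimal host-side simulation of the idea, not NVIDIA's switch implementation; the node encoding and helper names are ours:

```python
def aggregate(node, op):
    """node is ('host', value) or ('switch', [children]); each switch
    reduces its children with `op`, so only one value per subtree
    travels up the tree."""
    kind, payload = node
    if kind == "host":
        return payload
    result = None
    for child in payload:
        value = aggregate(child, op)
        result = value if result is None else op(result, value)
    return result

tree = ("switch", [
    ("switch", [("host", 3), ("host", 1)]),
    ("switch", [("host", 7), ("host", 5)]),
])
assert aggregate(tree, lambda a, b: a + b) == 16  # Sum
assert aggregate(tree, min) == 1                  # Min
assert aggregate(tree, max) == 7                  # Max
```

Broadcasting the final value back down the same tree completes an All-Reduce.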


SHARP ALLREDUCE PERFORMANCE ADVANTAGES

Providing flat latency, 7X higher performance
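One way to see why the latency curve stays flat: a host-based ring allreduce needs 2(N-1) sequential steps, while an in-network tree is traversed once, with a depth that grows only logarithmically in the node count. A toy model (the per-step latency and radix are assumed values, not measurements):

```python
from math import ceil, log

HOP_US = 0.3  # assumed per-step latency in microseconds (illustrative)

def ring_allreduce_latency(n_hosts):
    # Host-based ring allreduce: (N-1) reduce-scatter + (N-1) allgather steps.
    return 2 * (n_hosts - 1) * HOP_US

def sharp_allreduce_latency(n_hosts, radix=40):
    # One traversal up and down an aggregation tree whose depth grows
    # only logarithmically with the number of hosts.
    levels = max(1, ceil(log(n_hosts, radix // 2)))
    return 2 * levels * HOP_US

ring_allreduce_latency(1000)   # grows linearly with N
sharp_allreduce_latency(1000)  # nearly flat in N
```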


SHARP PERFORMANCE ADVANTAGE OVER ROCE

4X higher performance


INFINIBAND SHARP AI PERFORMANCE ADVANTAGE

2.5X higher performance


INFINIBAND ACCELERATED AI PLATFORMS

NVIDIA DGX A100 SuperPOD: world's most advanced AI system

AIST: the AI Bridging Cloud Infrastructure

Microsoft Azure: 200 Gigabit HDR InfiniBand boosts Microsoft Azure high-performance computing cloud instances

Continental: Advanced Driver Assistance Systems (ADAS)


HDR INFINIBAND


HDR 200G INFINIBAND SOLUTIONS (NOW)

Transceivers

Active Optical and Copper Cables

40 HDR (200Gb/s) InfiniBand Ports

80 HDR100 InfiniBand Ports

Modular switch: 800 HDR (1600 HDR100) ports

200Gb/s Adapter

PCIe Gen4

Drivers, Management, Frameworks and Accelerations

UFM, UCX, MPI, SHMEM/PGAS, UPC

System on Chip and SmartNIC

Programmable adapter, Smart Offloads


MELLANOX SKYWAY™ INFINIBAND TO ETHERNET GATEWAY

100G EDR / 200G HDR InfiniBand to 100G and 200G Ethernet gateway

400G NDR / 800G XDR InfiniBand speeds ready

Eight EDR/HDR100/HDR InfiniBand ports to eight 100/200G Ethernet ports

Max throughput of 1.6 Terabit per second

High availability and load balancing

Mellanox Gateway operating system

Scalable and efficient
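The headline bandwidth follows directly from the port count, as a quick worked check:

```python
# Eight 200G Ethernet ports on the gateway.
ports, gbps_per_port = 8, 200
total_tbps = ports * gbps_per_port / 1000
assert total_tbps == 1.6  # 1.6 Terabit per second aggregate throughput
```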


METROX®-2

Seamlessly connects InfiniBand data centers up to 40 kilometers apart

Scalability and load balancing across data-centers

Maintains compute service in case of data-center failures

Standard HDR and EDR InfiniBand end-to-end

Advanced In-Network Computing

Extending InfiniBand to 40km Reach
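At 40 km, propagation delay alone becomes significant. A back-of-envelope check (the fiber refractive index of roughly 1.5 is an assumption, giving about two-thirds of the vacuum speed of light):

```python
C_FIBER_M_PER_S = 2.0e8   # light in fiber, roughly 2/3 of c (index ~1.5)
distance_m = 40_000       # 40 km between the data centers
one_way_us = distance_m / C_FIBER_M_PER_S * 1e6
# About 200 microseconds of one-way propagation delay, before any
# switching or protocol overhead; inter-site traffic must tolerate this.
```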


UFM INFINIBAND CYBER INTELLIGENCE AND ANALYTICS PLATFORM


REVOLUTIONIZING SUPERCOMPUTING

AI-powered InfiniBand cyber intelligence and analytics platform

Management and Orchestration

Predictive and Preventive Maintenance

Telemetry and monitoring

Cyber-security and anomaly detection

Integration of real-time telemetry with AI algorithms to secure supercomputers and enable predictive maintenance for OPEX optimization
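As an illustration only (this deck does not describe UFM's actual models), anomaly detection over port telemetry can be as simple as a z-score test against a counter's own recent history; the function name, data, and threshold here are all ours:

```python
from statistics import mean, pstdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag a reading that deviates from its own recent history by
    more than `threshold` standard deviations (a simple z-score)."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

errors_per_minute = [2, 3, 2, 4, 3, 2, 3, 3]   # a port's recent counter deltas
assert not is_anomalous(errors_per_minute, 4)  # within normal variation
assert is_anomalous(errors_per_minute, 40)     # flag for investigation
```

Real deployments would combine many counters and learned baselines, but the shape of the problem is the same: compare live telemetry against expected behavior and alert on deviation.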


UFM PLATFORMS PORTFOLIO

UFM Cyber-AI: cyber intelligence and analytics (includes UFM Enterprise)

UFM Enterprise: management, monitoring and orchestration (includes UFM Telemetry)

UFM Telemetry: real-time monitoring


SUMMARY

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

INFINIBAND DELIVERS HIGHEST PERFORMANCE AND ROI

200G end-to-end, extremely low latency, high message rate, RDMA and GPUDirect

Advanced adaptive routing, congestion control and quality of service for highest network efficiency

In-Network Computing engines for accelerating applications performance and scalability

Self Healing Network with SHIELD for highest network resiliency

Standard, with backward and forward compatibility, protecting data-center investments

[Diagram: compute servers connect over an extremely low-latency InfiniBand high-speed network with advanced In-Network Computing to NVMe storage; a high-speed InfiniBand-to-Ethernet gateway bridges to Ethernet-attached NVMe storage; long-haul InfiniBand links remote sites.]
