July 2020 INFINIBAND IN-NETWORK COMPUTING TECHNOLOGY

Sep 21, 2020


dariahiddleston
Transcript
Page 1

July 2020

INFINIBAND IN-NETWORK COMPUTING TECHNOLOGY

Page 2

THE NEW SCIENTIFIC COMPUTING WORLD

NETWORK

EDGE

APPLIANCE

SUPERCOMPUTER

STORAGE

Page 3

HDR 200G INFINIBAND ACCELERATES NEXT GENERATION HPC AND AI SUPERCOMPUTERS (EXAMPLES)

8K HDR Nodes, Dragonfly+ Topology

9 PetaFLOPS, 3K HDR Nodes, Dragonfly+ Topology

7.5 PetaFLOPS, 2K HDR Nodes, Dragonfly+ Topology

35.5 PetaFLOPS, 2K HDR Nodes, Fat-Tree Topology

23 PetaFLOPS, 5.6K HDR Nodes, Dragonfly+ Topology

HPC/AI Cloud, HDR InfiniBand

HDR Supercomputers

23.5 PetaFLOPS, 8K HDR Nodes, Fat-Tree Topology

27.6 PetaFLOPS, 3K HDR Nodes, Fat-Tree Topology

3K HDR Nodes, 16 PetaFLOPS, Dragonfly+ Topology

Page 4

INFINIBAND NETWORK TECHNOLOGY FUNDAMENTALS

GPU

CPU

DPU

Smart End-Point

Architected to Scale

Centralized Management

Standard

Page 5

INFINIBAND ACCELERATED SUPERCOMPUTING

SHARP AI Technology

AI Acceleration Engines

2.5X Higher AI Performance

UFM Cyber AI

Data Center Cyber Intelligence and Analytics

Speed of Light

200Gb/s Data Throughput

RDMA and GPUDirect RDMA

3X Better (Lower) Latency

SHIELD AI Technology

Self Healing Network

1000X Faster Recovery Time

Page 6

THE NEW DATA CENTER

Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale

CPU-Centric (Onload)

Must Wait for the Data

Creates Performance Bottlenecks

Security Limitations

[Diagram: an Onload Network connecting four CPU/GPU nodes]

Data-Centric (Offload)

Analyze Data as it Moves!

Higher Performance and Scale

Secured Supercomputing

[Diagram: an In-Network Computing fabric connecting four CPU/GPU nodes]

Page 7

THE SMARTEST INTERCONNECT

Network

Communication

Application

High Performance Computing

Data Analysis

Deep Learning

Cyber Security

In-Network Computing

NVMe, Containers, OpenStack

Storage / Other Resource Disaggregation

Full Network Transport Offload

RDMA and GPU-Direct RDMA

SHIELD (Self-Healing Network)

Enhanced Adaptive Routing and Congestion Control

Connectivity

Ultimate Software Defined Network

Multi-Host and Socket-Direct Technology

Enhanced and Flexible Topologies

Page 8

SCALABLE HIERARCHICAL AGGREGATION AND REDUCTION PROTOCOL (SHARP)

Page 9

SCALABLE HIERARCHICAL AGGREGATION AND REDUCTION PROTOCOL (SHARP)

In-network Tree based aggregation mechanism

Multiple simultaneous outstanding operations

For HPC (MPI / SHMEM) and Distributed Machine Learning applications

Scalable High Performance Collective Offload

Barrier, Reduce, All-Reduce, Broadcast and more

Sum, Min, Max, Min-Loc, Max-Loc, OR, XOR, AND

Integer and Floating-Point, 16/32/64 bits
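The tree-based aggregation described on this slide can be sketched as a small simulation. This is not the SHARP API or its wire protocol; it only illustrates the idea that each switch combines the partial results of its children, so a single value travels up each link, and the final result is handed back to every host. The nested-list topology, the `OPS` table, and the function names are all assumptions made for illustration.

```python
from functools import reduce
import operator

# Reduction operators named on the slide, mapped to plain Python functions.
# Min-Loc and Max-Loc are left out to keep the sketch short.
OPS = {
    "sum": operator.add,
    "min": min,
    "max": max,
    "or": operator.or_,
    "xor": operator.xor,
    "and": operator.and_,
}


def aggregate(node, op):
    """Reduce everything below `node`.

    A host is a plain number (its contribution); a switch is a list of
    children (hosts or other switches). Each switch combines the partial
    results of its children and forwards one value upward.
    """
    if not isinstance(node, list):  # leaf: a host's data
        return node
    partials = [aggregate(child, op) for child in node]
    return reduce(OPS[op], partials)


def count_hosts(node):
    """Count the hosts (leaves) below `node`."""
    if not isinstance(node, list):
        return 1
    return sum(count_hosts(child) for child in node)


def allreduce(tree, op):
    """Aggregate up the tree, then return the result once per host."""
    result = aggregate(tree, op)
    return [result] * count_hosts(tree)


# Hypothetical topology: one root switch, two leaf switches, five hosts.
tree = [[1, 2, 3], [4, 5]]
print(allreduce(tree, "sum"))  # [15, 15, 15, 15, 15]
print(allreduce(tree, "max"))  # [5, 5, 5, 5, 5]
```

In this model the number of combining steps on the critical path equals the depth of the switch tree rather than the number of hosts, which is one way to read the scalability claim on the slide.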

[Diagram: five hosts send data up a tree of three switches; the switches aggregate the data and return the aggregated result to the hosts]

Page 10

SHARP ALLREDUCE PERFORMANCE ADVANTAGES

Providing Flat Latency, 7X Higher Performance

Page 11

SHARP PERFORMANCE ADVANTAGE OVER ROCE

4X Higher Performance

Page 12

INFINIBAND SHARP AI PERFORMANCE ADVANTAGE

2.5X Higher Performance

Page 13

INFINIBAND ACCELERATED AI PLATFORMS

NVIDIA DGX A100 SuperPOD: World’s most Advanced AI System

AIST: The AI Bridging Cloud Infrastructure

Microsoft Azure: 200 Gigabit HDR InfiniBand Boosts Microsoft Azure High-Performance Computing Cloud Instances

Continental: Advanced Driver Assistance Systems (ADAS)

Page 14

HDR INFINIBAND

Page 15

HDR 200G INFINIBAND SOLUTIONS (NOW)

Transceivers

Active Optical and Copper Cables

40 HDR (200Gb/s) InfiniBand Ports

80 HDR100 InfiniBand Ports

Modular Switch - 800 HDR (1600 HDR100) Ports

200Gb/s Adapter

PCIe Gen4

Drivers, Management, Frameworks and Accelerations

UFM, UCX, MPI, SHMEM/PGAS, UPC

System on Chip and SmartNIC

Programmable adapter, Smart Offloads
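The port counts on this slide follow from simple arithmetic, sketched below. The 200 Gb/s HDR rate is from the slide; the 100 Gb/s HDR100 rate, the helper names, and the two-level fat-tree formula (a textbook result for non-blocking fat trees, not something the deck states) are assumptions added for illustration.

```python
HDR_GBPS = 200     # HDR port speed, from the slide
HDR100_GBPS = 100  # HDR100 port speed, assumed to be half an HDR port


def split_ports(hdr_ports):
    """HDR100 port count when every HDR port is split in two."""
    return hdr_ports * HDR_GBPS // HDR100_GBPS


def aggregate_tbps(ports, gbps):
    """Sum of one-directional port bandwidth, in Tb/s."""
    return ports * gbps / 1000


def two_level_fat_tree_hosts(radix):
    """End-points in a non-blocking two-level fat tree of `radix`-port switches.

    Each leaf switch uses half its ports for hosts and half for uplinks,
    and at most `radix` leaf switches can attach to each spine switch.
    """
    return radix * radix // 2


print(split_ports(40))               # 80 HDR100 ports on the 40-port switch
print(split_ports(800))              # 1600 HDR100 ports on the modular switch
print(aggregate_tbps(40, HDR_GBPS))  # 8.0 Tb/s per direction
print(two_level_fat_tree_hosts(40))  # 800
```

The last figure happens to equal the modular switch's HDR port count; the slides do not say how that chassis is built internally, so treat the match as an observation only.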

Page 16

MELLANOX SKYWAY™ INFINIBAND TO ETHERNET GATEWAY

100G EDR / 200G HDR InfiniBand to 100G and 200G Ethernet gateway

400G NDR / 800G XDR InfiniBand speeds ready

Eight EDR/HDR100/HDR InfiniBand ports to eight 100/200G Ethernet ports

Max throughput of 1.6 Terabits per second

High availability and load balancing

Mellanox Gateway operating system

Scalable and efficient
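The 1.6 Tb/s figure is consistent with eight port pairs running at 200 Gb/s each. The sketch below assumes that each InfiniBand-to-Ethernet pair is limited by its slower side; that bottleneck model, the `IB_GBPS` table, and the function name are illustrative assumptions, not a description of how the gateway is implemented.

```python
# Nominal InfiniBand port speeds in Gb/s for the generations named on the slide.
IB_GBPS = {"EDR": 100, "HDR100": 100, "HDR": 200}


def gateway_throughput_tbps(ib_type, eth_gbps, port_pairs=8):
    """Aggregate one-directional throughput in Tb/s.

    Assumes each InfiniBand/Ethernet port pair runs at the speed of its
    slower side.
    """
    per_pair_gbps = min(IB_GBPS[ib_type], eth_gbps)
    return port_pairs * per_pair_gbps / 1000


print(gateway_throughput_tbps("HDR", 200))  # 1.6
print(gateway_throughput_tbps("EDR", 200))  # 0.8
```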

Page 17

METROX®-2

Seamlessly connects InfiniBand data centers up to 40 kilometers apart

Scalability and load balancing across data centers

Continuous compute service in case of data center failures

Standard HDR and EDR InfiniBand end-to-end

Advanced In-Network Computing

Extending InfiniBand to 40km Reach
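To put the 40 km reach in perspective, the sketch below estimates the propagation delay that the fiber itself adds, independent of any equipment. The group index of 1.468 is an assumed typical value for single-mode fiber, and the 200 Gb/s rate used for the in-flight estimate is the HDR speed quoted elsewhere in the deck; neither number comes from this slide.

```python
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_GROUP_INDEX = 1.468        # assumption: typical single-mode fiber


def one_way_delay_us(distance_km, index=FIBER_GROUP_INDEX):
    """Propagation delay over fiber, in microseconds."""
    return distance_km * index / C_VACUUM_KM_PER_S * 1e6


def bytes_in_flight(distance_km, link_gbps):
    """Data needed to keep the link full for one round trip, in bytes."""
    rtt_s = 2 * one_way_delay_us(distance_km) / 1e6
    return link_gbps * 1e9 * rtt_s / 8


delay = one_way_delay_us(40)
print(f"one-way delay over 40 km: {delay:.0f} us")
print(f"round trip: {2 * delay:.0f} us")
print(f"in flight at 200 Gb/s: {bytes_in_flight(40, 200) / 1e6:.1f} MB")
```

Under these assumptions the one-way delay is a little under 200 microseconds, far above in-rack latencies, so several megabytes must be in flight to keep a long-haul link busy.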

Page 18

UFM INFINIBAND CYBER INTELLIGENCE AND ANALYTICS PLATFORM

Page 19

REVOLUTIONIZING SUPERCOMPUTING

AI-Powered InfiniBand Cyber Intelligence and Analytics Platform

Management and Orchestration

Predictive and Preventive Maintenance

Telemetry and Monitoring

Cyber-security and Anomaly Detection

Integration of Real-Time Telemetry with AI Algorithms to Secure Supercomputers, and Enable Predictive Maintenance for OPEX Optimizations

Page 20

UFM PLATFORMS PORTFOLIO

UFM Cyber-AI: Cyber Intelligence and Analytics

(UFM Cyber-AI includes UFM Enterprise)

UFM Enterprise: Management, Monitoring & Orchestration

(UFM Enterprise includes UFM Telemetry)

UFM Telemetry: Real-Time Monitoring

Page 22

SUMMARY

Page 23

INFINIBAND DELIVERS HIGHEST PERFORMANCE AND ROI

200G end-to-end, extremely low latency, high message rate, RDMA and GPUDirect

Advanced adaptive routing, congestion control and quality of service for highest network efficiency

In-Network Computing engines for accelerating applications performance and scalability

Self Healing Network with SHIELD for highest network resiliency

Standard - backward and forward compatibility – protecting datacenter investments

[Diagram: compute servers and NVMe / storage on an InfiniBand high-speed network (advanced In-Network Computing, extremely low latency), connected to Ethernet NVMe / storage through a high-speed InfiniBand-to-Ethernet gateway and to remote sites over long-haul InfiniBand]
