
ABCI: World’s First Large-Scale Open AI Infrastructure (2019/06/16)

Oct 11, 2020


Information: https://abci.ai/
Contact: [email protected]

ABCI Hardware

■ World top-level compute and data processing capability
■ Open, Public, and Dedicated infrastructure for AI & Big Data Algorithms, Software, and Applications
■ Open Innovation Platform to accelerate joint academic-industry R&D for AI

Peak Performance:
• 550 PFlops (FP16)
• 37.2 PFlops (FP64)

Effective Performance:
• 19.88 PFlops (#7 in Top500)
• 14.423 GFlops/W (#4 in Green500)
• 508.85 TFlops (#5 in HPCG)
• ImageNet training: 75 seconds

Average PUE: < 1.1 (Estimated)


DC Facilities

■ Single floor, cost-effective building
■ Hard concrete floor: 2 t/m² weight tolerance for racks and cooling pods
■ Cooling capacity: 3.2 MW
  • 70 kW/rack: 60 kW water + 10 kW air
  • Warm water (32°C) free cooling
■ Power capacity: 3.25 MW
  • ABCI uses 2.3 MW max

[Figure: Cooling Pod]
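As a rough illustration of what the power figures above imply, the following sketch applies the standard PUE definition (total facility power divided by IT equipment power). It assumes, which the poster does not state, that the 2.3 MW maximum is IT load only, and it treats the estimated PUE bound of 1.1 as an equality. The function name is illustrative.

```python
def facility_power_mw(it_power_mw: float, pue: float) -> float:
    """Total facility power implied by PUE = total power / IT power."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_mw * pue


ABCI_MAX_POWER_MW = 2.3    # from the poster; ASSUMED here to be IT load only
PUE_UPPER_BOUND = 1.1      # poster gives "< 1.1 (Estimated)"
POWER_CAPACITY_MW = 3.25   # from the poster

total_mw = facility_power_mw(ABCI_MAX_POWER_MW, PUE_UPPER_BOUND)
overhead_mw = total_mw - ABCI_MAX_POWER_MW  # cooling and other non-IT load

print(f"total facility power < {total_mw:.2f} MW")
print(f"non-IT overhead      < {overhead_mw:.2f} MW")
print("within power capacity:", total_mw <= POWER_CAPACITY_MW)
```

Under these assumptions the implied total stays below about 2.53 MW, which is inside the stated 3.25 MW power capacity.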

Overview

Gateway and Firewall
• Nexus 3232C x2
• FortiGate 1500D x2
• FortiAnalyzer 400E x1
• 100Gbps SINET5

Interconnect (InfiniBand EDR)
• Mellanox CS7500 x2
• Mellanox SB7890 x229

Service Network (10GbE)

Interactive Nodes x4

Management and Gateway Nodes x15

High-Performance Computing System
• 550 PFlops (FP16), 37.2 PFlops (FP64)
• 476 TiB Mem, 1.74 PB NVMe SSD

Computing Nodes (w/GPU) x1088
• GPU: NVIDIA Tesla V100 SXM2 x4
• CPU: Intel Xeon Gold 6148 x2
• Memory: 384GiB
• Local Storage: 1.6TB NVMe SSD
• Interconnect: InfiniBand EDR x2

Multi-platform Nodes (w/o GPU) x10
• Intel Xeon Gold 6132 (2.6GHz/14cores) x2
• 768GiB Memory, 3.8TB NVMe SSD

Large-scale Storage Systems

22 PB GPFS
• DDN SFA14K (w/ SS8462 Enclosure x10) x3
• 12TB 7.2Krpm NL-SAS HDD x2400
• 3.84TB SAS SSD x216

1 PB Lustre
• DDN SFA14KX (w/ SS9012 Enclosure x10) x1
• 7.68TB SAS SSD x185 for data
• 960GB SAS SSD x13 for metadata

17 PB Scality RING Object Storage
• HPE Apollo 4510 Gen10 x24
• 12TB SATA HDD x1440
• 3.2TB SSD x24
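The system-wide totals quoted for the High-Performance Computing System can be cross-checked against the per-node specification. The sketch below does that arithmetic for the 1088 GPU computing nodes only. The per-GPU memory size of 16 GiB is an assumption that does not appear on the poster, and counting GPU memory toward the memory total is likewise an assumption.

```python
NODES = 1088            # Computing Nodes (w/GPU), from the poster
GPUS_PER_NODE = 4       # NVIDIA Tesla V100 SXM2 x4
HOST_MEM_GIB = 384      # host memory per node
NVME_TB = 1.6           # local NVMe SSD per node
GPU_MEM_GIB = 16        # ASSUMPTION: 16 GiB per V100, not stated on the poster

total_gpus = NODES * GPUS_PER_NODE

# Host memory alone, then host plus GPU memory, converted from GiB to TiB.
host_mem_tib = NODES * HOST_MEM_GIB / 1024
total_mem_tib = NODES * (HOST_MEM_GIB + GPUS_PER_NODE * GPU_MEM_GIB) / 1024

# Local NVMe capacity, converted from TB to PB (decimal units).
nvme_pb = NODES * NVME_TB / 1000

print(f"GPUs:              {total_gpus}")
print(f"Host memory:       {host_mem_tib:.0f} TiB")
print(f"Host + GPU memory: {total_mem_tib:.0f} TiB")
print(f"Local NVMe:        {nvme_pb:.2f} PB")
```

With these inputs the totals come to 4352 GPUs, 476 TiB of combined memory, and about 1.74 PB of NVMe, which agrees with the poster's 476 TiB and 1.74 PB if the ten multi-platform nodes are left out.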

Services for Satisfying User Needs

■ Provide various types of resources by dividing each Compute Node on demand
  • Full node, 1 GPU, 4 GPUs, and no-GPU instances
  • Each instance type has a different number of GPUs and CPU cores, and a different amount of memory and NVMe SSD
■ In addition to traditional HPC libraries, users can deploy various DL frameworks via pip
■ Run any NGC container image using Singularity

[Figure: a Compute Node (CPU0, CPU1, GPU0 to GPU3) divided into the instance types C.small, G.small, G.large, and C.large]
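The idea of choosing an instance type by GPU requirement can be sketched as a small lookup. This is an illustration only: the mapping of the labels G.small and G.large to 1 and 4 GPUs is inferred from the bullet list above rather than stated on the poster, CPU core and memory sizes are omitted because the poster does not give them, and `InstanceType` and `pick_instance` are hypothetical names, not part of any ABCI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    gpus: int


# ASSUMPTION: GPU counts per label are inferred from the poster's
# "1 GPU, 4 GPUs, no GPU" list; the full-node type is not named there
# and is left out of this sketch.
INSTANCE_TYPES = [
    InstanceType("C.small", 0),
    InstanceType("C.large", 0),
    InstanceType("G.small", 1),
    InstanceType("G.large", 4),
]


def pick_instance(gpus_needed: int) -> InstanceType:
    """Return the listed type with the fewest GPUs that still satisfies the request."""
    candidates = [t for t in INSTANCE_TYPES if t.gpus >= gpus_needed]
    if not candidates:
        raise ValueError(f"no instance type offers {gpus_needed} GPUs")
    # min() keeps the first of equally small candidates, so a CPU-only
    # request resolves to C.small rather than C.large.
    return min(candidates, key=lambda t: t.gpus)


for need in (0, 1, 2):
    print(need, "->", pick_instance(need).name)
```

A request for two GPUs resolves to the 4-GPU type here because no 2-GPU type appears on the poster.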

DL Performance on ABCI

[Figure: ImageNet / ResNet-50 (Relative speedup & Accuracy). Axes: relative speedup (0 to 1600) and accuracy (0.74 to 0.765). Entries: MSRA (2015), Facebook (2017), Google Brain (2017), Preferred Networks (2017), Tencent (2018), Sony + ABCI (2018), Google (2018), Google (2018), Sony + ABCI (2019), Fujitsu Lab + ABCI (2019). Annotated training times: 224 sec (Tesla V100 x2176), 2.2 min (TPU v3 x1024), 1.8 min (TPU v3 x1024), 122 sec (Tesla V100 x3456), 75 sec (Tesla V100 x2048). Callouts: ABCI Grand Challenge #2, ABCI Grand Challenge #3]

■ World’s Highest Speed in ImageNet-1k Training
  • The current world record is by Fujitsu Lab and ABCI: 75.08% accuracy in 74.7 seconds
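To make the training-time annotations easier to compare, the sketch below computes each run's speedup relative to the slowest annotated run, along with its total device-seconds. The chart's own baseline for "relative speedup" is not recoverable from the poster, so the 224-second run is used as a stand-in baseline here. The last entry uses the 74.7 seconds quoted for the record rather than the rounded 75 seconds.

```python
# (devices as annotated on the chart, device count, training time in seconds)
RUNS = [
    ("Tesla V100 x2176", 2176, 224.0),
    ("TPU v3 x1024", 1024, 2.2 * 60),
    ("TPU v3 x1024", 1024, 1.8 * 60),
    ("Tesla V100 x3456", 3456, 122.0),
    ("Tesla V100 x2048", 2048, 74.7),
]


def summarize(runs):
    """Return (label, speedup vs. slowest run, device-seconds) for each run."""
    baseline = max(seconds for _, _, seconds in runs)  # slowest run, an assumed baseline
    return [
        (label, baseline / seconds, devices * seconds)
        for label, devices, seconds in runs
    ]


summary = summarize(RUNS)
for label, speedup, device_seconds in summary:
    print(f"{label:18s} speedup {speedup:5.2f}x  device-seconds {device_seconds:10.0f}")
```

Device-seconds give a crude view of resource cost per run; they do not account for differences between accelerator types or achieved accuracy.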

Links to related papers