Timothy Lanfear
ENDURING DIFFERENTIATION
2
WHERE ARE WE?
3
LIFE AFTER DENNARD SCALING
[Chart: 40 Years of Microprocessor Trend Data. Horizontal axis: year, 1980 to 2020. Vertical axis: logarithmic, 10^2 to 10^7. Series: transistors (thousands) and single-threaded performance, annotated 1.5X per year and 1.1X per year.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
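The two annotated growth rates look close on a log plot but diverge quickly when compounded. A minimal sketch of that arithmetic, assuming a ten-year horizon (the horizon and the helper name `compound_growth` are illustrative choices, not taken from the slide):

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total speedup after `years` of growth at `rate_per_year` (1.5 means 1.5X)."""
    return rate_per_year ** years


DECADE = 10  # assumed horizon, chosen only for illustration

fast = compound_growth(1.5, DECADE)  # the "1.5X per year" annotation
slow = compound_growth(1.1, DECADE)  # the "1.1X per year" annotation

print(f"1.5X per year over {DECADE} years: {fast:.1f}X")  # 57.7X
print(f"1.1X per year over {DECADE} years: {slow:.1f}X")  # 2.6X
```

Under these assumptions, a decade at 1.5X per year yields roughly 58X, while a decade at 1.1X per year yields under 3X.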
4
[Chart: AMBER Performance (ns/day). Vertical axis: 0 to 40. GPUs: K20 (2013), K40 (2014), K80 (2015), P100 (2016). Software: AMBER 12 with CUDA 4, AMBER 14 with CUDA 5, AMBER 14 with CUDA 6, AMBER 16 with CUDA 8.]
[Chart: GoogleNet Performance (i/s). Vertical axis: 0 to 12000. Systems: 8x K80 (2014), 8x Maxwell (2015), DGX-1 (2016), DGX-1V (2017). Software: cuDNN 2 with CUDA 6; cuDNN 4 with CUDA 7; cuDNN 6 with CUDA 8 and NCCL 1.6; cuDNN 7 with CUDA 9 and NCCL 2.]
GPU-ACCELERATED PERFORMANCE
5
Delivered value grows over time
10X Perf in 8 Years
6X Perf in 6 Years
TESLA VALUE: $15-20M COST SAVINGS DELIVERED
Life Sciences (AMBER)
Oil & Gas (RTM)
AMBER performance: nanoseconds per day delivered on one server with GPUs and CPUs
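The headline figures "10X Perf in 8 Years" and "6X Perf in 6 Years" can be converted to an equivalent per-year rate, which makes them comparable with the 1.5X and 1.1X per-year single-threaded rates quoted earlier. A minimal sketch, assuming growth was spread evenly over the period (the slide does not say so, and `annualized_rate` is an illustrative helper):

```python
def annualized_rate(total_speedup: float, years: float) -> float:
    """Per-year growth factor that compounds to `total_speedup` over `years`.

    Assumes smooth, even growth, which is a simplification.
    """
    return total_speedup ** (1.0 / years)


rate_10x_8y = annualized_rate(10.0, 8)  # "10X Perf in 8 Years"
rate_6x_6y = annualized_rate(6.0, 6)    # "6X Perf in 6 Years"

print(f"10X in 8 years is about {rate_10x_8y:.2f}X per year")  # 1.33X
print(f"6X in 6 years is about {rate_6x_6y:.2f}X per year")    # 1.35X
```

Both work out to roughly 1.3X per year under this assumption, between the two single-threaded rates.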
TESLA PLATFORM ADVANTAGE
6
GPU-ACCELERATED EFFICIENCY
On Track To Meet Exascale Goal
[Chart: Top GPU Systems in Green500 Lists and NVIDIA Projections for V100. Vertical axis: GFLOPS per Watt, 0 to 35.]
3.2 Eurotech Aurora (K20)
4.4 Tsubame-KFC (K20X)
5.3 TiTech W780I (K80)
9.5 Saturn V (P100)
14.1 TSUBAME 3.0 (P100)
V100 (projection)
35 GF/W Exascale Goal
13/13 Greenest Supercomputers Powered by Tesla P100
TSUBAME 3.0
Kukai
AIST AI Cloud
RAIDEN GPU subsystem
Piz Daint
Wilkes-2
GOSAT-2 (RCF2)
DGX Saturn V
Reedbush-H
JADE
Facebook Cluster
Cedar
DAVIDE
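The GFLOPS-per-watt figures on this slide translate directly into the power a hypothetical one-exaFLOPS machine would draw. A rough sketch, assuming the quoted efficiency held unchanged at full scale and ignoring cooling and other facility overheads (both assumptions are ours, as is the helper name `megawatts_for_exaflop`):

```python
EXAFLOPS_IN_GFLOPS = 1e9  # 1 exaFLOPS = 10^18 FLOPS = 10^9 GFLOPS


def megawatts_for_exaflop(gflops_per_watt: float) -> float:
    """Power in MW needed to sustain 1 exaFLOPS at the given efficiency.

    Idealized: assumes efficiency is constant at scale, with no overheads.
    """
    watts = EXAFLOPS_IN_GFLOPS / gflops_per_watt
    return watts / 1e6


# Efficiency values as listed on the slide (GFLOPS per watt).
systems = {
    "Eurotech Aurora (K20)": 3.2,
    "Saturn V (P100)": 9.5,
    "TSUBAME 3.0 (P100)": 14.1,
    "Exascale Goal": 35.0,
}

for name, efficiency in systems.items():
    print(f"{name}: {megawatts_for_exaflop(efficiency):.1f} MW")
```

At the 35 GF/W goal this comes to about 28.6 MW, compared with about 70.9 MW at the 14.1 GF/W listed for TSUBAME 3.0.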
7
HOW ARE WE DOING THIS?
• What are the most important dimensions of our differentiation?
• Why are GPUs so much more efficient than CPUs?
• How can we continue scaling performance/efficiency as Moore’s Law fades?