GPU Accelerated Computing with OpenPOWER
John Ashley
Senior IBM Developer Relationship Manager
NVIDIA
Join the conversation at #OpenPOWERSummit
NVIDIA Vision for Tesla
Architectural Cadence
Adoption
Capability
Porting
Enable faster science, deeper learning, bigger insights…
...for your customers!
Architectural Cadence
Accelerated Computing Roadmap
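[Chart: relative performance by year, 2008 to 2016; vertical axis from 0 to 32]
GPU architectures in the order shown, each with its headline feature:
Tesla: CUDA
Fermi: FP64
Kepler: Dynamic Parallelism
Maxwell: DX12
Pascal: Unified Memory, 3D Memory, NVLink
Volta: NVLink2
NEXT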
Adoption
From HPC to Enterprise Data Centers
Government
Supercomputing
Finance
Higher Ed
Oil & Gas
Consumer Web
Air Force Research Laboratory
Naval Research Laboratory
Tokyo Institute of Technology
Tesla Accelerated Computing Platform
Development | Data Center Infrastructure
GPU Accelerators (GPU Boost…)
Interconnect (GPU Direct, NVLink…)
System Management (NVML…)
Compiler Solutions (LLVM…)
Profile and Debug (CUDA Debugging API…)
Development Tools
Programming Languages
Infrastructure Management
Communication
System Solutions / Software Solutions
Libraries (cuBLAS…)
Tesla: Platform for Accelerated Datacenters
Rapid Adoption
Rapid Adoption of Accelerators
[Chart: Hundreds of GPU Accelerated Apps] 2011: 113, 2012: 206, 2013: 242, 2014: 287
[Chart: % of HPC Customers with Accelerators, 2010 to 2013; vertical axis from 0% to 100%]
[Chart: NVIDIA GPU is Accelerator of Choice] NVIDIA GPU: 85%, Intel Phi: 4%, Others: 11%
Sources: Intersect360 Research HPC User Site Census: Systems, July 2013; IDC HPC End-User MSC Study, 2013
Capability
Deeply Integrated Heterogeneous Computing
POWER CPU: For Serial Tasks
GPU Accelerator: For Parallel Tasks
IBM POWER CPU: Most Powerful Serial Processor
NVIDIA NVLink: Fastest CPU-GPU Interconnect (80-200 GB/s)
NVIDIA Volta GPU: Most Powerful Parallel Processor
Heterogeneous Computing: 5x Higher Energy Efficiency
Innovative
NVLINK delivers 5-12X Bandwidth of PCI-E
Stacked Memory delivers 4X Bandwidth of GDDR5
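As a rough check on the bandwidth claims above, the following Python sketch compares ideal transfer times over each link. The 80-200 GB/s NVLink range comes from the slides; the 16 GB/s PCIe baseline (roughly PCIe 3.0 x16 in one direction) and the 12 GB working set are assumptions made for illustration, and real transfers will see lower effective bandwidth.

```python
# Bandwidth figures in GB/s. The NVLink range is taken from the slides;
# the PCIe value is an assumption (nominal PCIe 3.0 x16, one direction).
PCIE_GBPS = 16.0               # assumed baseline, not stated in the slides
NVLINK_GBPS = (80.0, 200.0)    # low and high end of the quoted range


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes at bandwidth_gbps GB/s."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gbps


def speedup_over_pcie(bandwidth_gbps: float, pcie_gbps: float = PCIE_GBPS) -> float:
    """Ratio of a link's bandwidth to the assumed PCIe baseline."""
    return bandwidth_gbps / pcie_gbps


if __name__ == "__main__":
    size_gb = 12.0  # hypothetical working set, chosen only for illustration
    print(f"PCIe   {PCIE_GBPS:6.1f} GB/s: {transfer_seconds(size_gb, PCIE_GBPS):.3f} s")
    for bw in NVLINK_GBPS:
        print(f"NVLink {bw:6.1f} GB/s: {transfer_seconds(size_gb, bw):.3f} s "
              f"({speedup_over_pcie(bw):.1f}x)")
```

Under these assumptions the ratios come out to 5.0x and 12.5x, in line with the quoted 5-12X range.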
CORAL: Built for Grand Scientific Challenges
Fusion Energy: Role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
Combustion: Combustion simulations to enable the next-gen diesel/biofuels to burn more efficiently.
Climate Change: Study climate change adaptation and mitigation scenarios; realistically represent detailed features.
Nuclear Energy: Unprecedented high-fidelity radiation transport calculations for nuclear energy applications.
Biofuels: Search for renewable and more efficient energy sources.
Astrophysics: Radiation transport – critical to astrophysics, laser fusion, atmospheric dynamics, and medical imaging.
Porting
How hard is it really?
"The combination of POWER8 CPUs & NVIDIA Tesla accelerators is amazing. It is the highest performance we have ever seen in individual cores, and the close integration with accelerators is outstanding for heterogeneous parallelization. Thanks to the little-endian chip and standard CUDA environment it took us less than 24 hours to port and accelerate GROMACS."
- Erik Lindahl, Professor of Biophysics at the Science for Life Laboratory, Stockholm University & KTH
http://devblogs.nvidia.com/parallelforall/porting-gpu-accelerated-applications-power8-systems/
CUDA 7 available from Registered Developer Website already…
Placeholder for commercial software firm’s quote in clearance…
Bottom Line
NVLINK delivers 5-12X Bandwidth of PCI-E
POWER CPU: For Serial Tasks
GPU Accelerator: For Parallel Tasks
GPUs Transforming the Data Center
Stacked Memory delivers 4X Bandwidth of GDDR5 – 1 TB/s
Leverage $300M of US Federal R&D dollars to improve your product.
Leverage IBM, NVIDIA R&D and Marketing to grow your business.
Provide incredibly effective solutions to customers.
Contact {jashley,bradd}@nvidia.com to discuss how GPU-accelerated OpenPOWER systems can help you deliver better science or faster discovery.