ORNL is managed by UT-Battelle for the US Department of Energy
What does Titan tell us about preparing for exascale supercomputers?
Jack Wells
Director of Science
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
HPC Day
16 April 2015
West Virginia University
Morgantown, WV
2
Outline
• U.S. DOE Leadership Computing Program
• Hardware Trends: Increasing Parallelism
• The Titan Project: Accelerating Leadership
Computing
– Application Readiness & Lessons Learned
– Science on Titan
– Industrial Partnerships
• The Summit Project: ORNL’s pre-exascale
supercomputer
– Application Readiness
3
DOE’s Office of Science
Computation User Facilities
• DOE is a leader in open High-Performance Computing
• Provide the world’s most powerful computational tools for open science
• Access is free to researchers who publish
• Boost US competitiveness
• Attract the best and brightest researchers
NERSC: Edison is 2.57 PF
OLCF: Titan is 27 PF
ALCF: Mira is 10 PF
4
Origin of Leadership Computing Facility
Department of Energy High-End
Computing Revitalization Act of
2004 (Public Law 108-423):
The Secretary of Energy, acting
through the Office of Science, shall
• Establish and operate
Leadership Systems Facilities
• Provide access [to Leadership Systems
Facilities] on a competitive, merit-
reviewed basis to researchers in U.S.
industry, institutions of higher education,
national laboratories and other Federal
agencies
5
What is the Leadership Computing
Facility (LCF)?
• Collaborative DOE Office of Science user-facility program at ORNL and ANL
• Mission: Provide the computational and data resources required to solve the most challenging problems.
• 2-centers/2-architectures to address diverse and growing computational needs of the scientific community
• Highly competitive user allocation programs (INCITE, ALCC).
• Projects receive 10x to 100x more resources than at other generally available centers.
• LCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).
Power consumption of 2.3 PF (Peak) Jaguar: 7 megawatts, equivalent to that of a small city (5,000 homes)
Power is THE problem
15
Using traditional CPUs
is not economically feasible
20 PF+ system: 30 megawatts (30,000 homes)
16
Why GPUs? Hierarchical Parallelism
High performance and power efficiency
on path to exascale
• Expose more parallelism through code
refactoring and source code directives
– Doubles CPU performance of many codes
• Use right type of processor for each task
• Data locality: Keep data near processing
– GPU has high bandwidth to local memory
for rapid access
– GPU has large internal cache
• Explicit data management: Explicitly
manage data movement between CPU
and GPU memories
CPU: optimized for sequential multitasking
GPU accelerator: optimized for many simultaneous tasks
– 10× the performance per socket
– 5× more energy-efficient systems
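The bullets above can be made concrete with a small sketch. The following is a minimal, hypothetical CUDA example, not taken from any of the CAAR codes: the kernel, array names, and the simple axpy-style update are invented for illustration. It shows the same loop in its original CPU form and in a restructured form that exposes hierarchical parallelism (a grid of thread blocks, each with many threads) with explicit data movement between CPU and GPU memories; a directive-based port (e.g., OpenACC) would express the same restructuring with pragmas instead of a hand-written kernel.

// Illustrative sketch only: a simple update loop ported to a CUDA kernel.
// Compile with something like: nvcc sketch.cu -o sketch (file name is an assumption)
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Original CPU-style loop: iterations are already independent, which is the
// property the port has to expose to the GPU.
static void update_cpu(const float* x, float* y, float a, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Restructured version: hierarchical parallelism, one thread per element,
// threads grouped into blocks, blocks spanning the whole array.
__global__ void update_gpu(const float* x, float* y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const float a = 3.0f;
    std::vector<float> x(n, 1.0f), y(n, 2.0f), y_ref(y);

    // CPU reference result for comparison.
    update_cpu(x.data(), y_ref.data(), a, n);

    // Explicit data management: the programmer stages data into GPU memory,
    // the step the slide calls out, so the kernel reads fast local memory.
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;                       // threads per block
    const int blocks = (n + threads - 1) / threads; // blocks covering the array
    update_gpu<<<blocks, threads>>>(d_x, d_y, a, n);
    cudaDeviceSynchronize();

    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("GPU y[0] = %.1f, CPU reference y[0] = %.1f\n", y[0], y_ref[0]);

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}

Even in this toy example the loop body is unchanged; the porting effort lies in the surrounding restructuring (laying data out in flat arrays, sizing thread blocks, staging transfers), which is the pattern the CAAR lessons-learned slide later quantifies.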
17
ORNL’s “Titan” Hybrid System:
Cray XK7 with AMD Opteron + NVIDIA Tesla processors
• 200 cabinets
• 4,352 ft² (404 m²)
• 8.9 MW peak power
SYSTEM SPECIFICATIONS:
• 27.1 PF peak performance (24.5 PF GPU + 2.6 PF CPU)
• 17.59 PF sustained performance (LINPACK)
• 18,688 compute nodes, each with:
– 16-core AMD Opteron CPU
– NVIDIA Tesla K20X GPU
– 32 + 6 GB memory
• 710 TB total system memory
• 32 PB parallel file system (Lustre)
• Cray Gemini 3D torus interconnect
• 512 service and I/O nodes
Throwing away 90% of available performance if not using GPUs
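As a quick check against the specification above: the GPUs account for 24.5 PF of Titan’s 27.1 PF peak, i.e., roughly 90% of the machine’s available performance.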
18
Center for Accelerated Application
Readiness (CAAR)
• We created CAAR as part of the Titan project to help prepare applications for accelerated architectures
• Goals:
– Work with code teams to develop and implement strategies for exposing hierarchical parallelism for our users’ applications
– Maintain code portability across modern architectures
– Learn from and share our results
• We selected six applications from across different science domains and algorithmic motifs
19
Application Readiness for Titan
WL-LSMS: Illuminating the role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
S3D: Understanding turbulent combustion through direct numerical simulation with complex chemistry.
NRDF: Radiation transport – important in astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging – computed on AMR grids.
CAM-SE: Answering questions about specific climate change adaptation and mitigation scenarios; realistically representing features like precipitation patterns/statistics and tropical storms.
Denovo: Discrete ordinates radiation transport calculations that can be used in a variety of nuclear energy and technology applications.
LAMMPS: A molecular dynamics simulation of organic polymers for applications in organic photovoltaic heterojunctions, de-wetting phenomena, and biosensors.
20
Application Power Efficiency of the Cray XK7
WL-LSMS for CPU-only and Accelerated Computing
• Runtime is 8.6X faster for the accelerated code
• Energy consumed is 7.3X less
o GPU-accelerated code consumed 3,500 kW-hr
o CPU-only code consumed 25,700 kW-hr
Power consumption traces for identical WL-LSMS runs
with 1024 Fe atoms on 18,561 Titan nodes (99% of Titan)
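A quick consistency check on these figures: 25,700 kW-hr divided by 3,500 kW-hr is about 7.3, matching the stated energy saving. The saving is smaller than the 8.6X speedup because the accelerated run draws more power while the GPUs are busy, roughly 8.6 / 7.3 ≈ 1.2 times the average power of the CPU-only run.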
21
CAAR Lessons Learned
• Porting each code took roughly one to three person-years
– Takes work, but an unavoidable step required for exascale
– Also pays off for other systems—the ported codes often run significantly faster CPU-only (Denovo 2X, CAM-SE >1.7X)
• An estimated 70-80% of developer time is spent in code restructuring, regardless of whether using CUDA, OpenCL, OpenACC, …
• Each code team must make its own choice of CUDA vs. OpenCL vs. OpenACC based on the specific case; the conclusion may differ for each code
• Science codes are under active development; porting to the GPU can mean pursuing a “moving target,” which is challenging to manage
• More available flops on the node should lead us to think about the new science opportunities enabled, e.g., more degrees of freedom (DOF) per grid cell
• We may need to look in unconventional places to find the additional ~30X thread parallelism that exascale may require, e.g., parallelism in time
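For scale (a back-of-the-envelope reading of the slide’s own numbers, not a figure from the talk): an exaflop machine delivers about 1,000 PF, or roughly 37 times Titan’s 27 PF peak, so if that gap is closed mostly by adding on-node concurrency rather than more nodes or higher clock rates, another factor of ~30X in thread parallelism is about what is needed.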
22
2014 Strategic Science Accomplishments
from Titan
Habib and collaborators used the HACC code on Titan’s CPU–GPU system to conduct today’s largest cosmological structure simulation at resolutions needed for modern-day galactic surveys.
K. Heitmann, et al. 2014. arXiv:1411.3396
Salman Habib, Argonne National Laboratory
Cosmology
Chen and collaborators for the first time performed direct numerical simulation of a jet flame burning dimethyl ether (DME) at new turbulence scales over space and time.
A. Bhagatwala, et al. 2014. Proc. Combust. Inst. 35.
Jacqueline Chen, Sandia National Laboratories
Combustion
Paul Kent and collaborators performed the first ab initio simulation of a cuprate. They were also the first team to validate quantum Monte Carlo simulations for high-temperature superconductor simulations.
K. Foyevtsova, et al. 2014. Phys. Rev. X 4
Paul Kent, ORNL
Superconducting Materials
Researchers at Procter & Gamble (P&G) and Temple University delivered a comprehensive picture in full atomistic detail of the molecular properties that drive skin barrier disruption.
M. Paloncyova, et al. 2014. Langmuir 30
C. M. MacDermaid, et al. 2014. J. Chem. Phys. 141
Michael Klein, Temple University
Molecular Science
Chang and collaborators used the XGC1 code on Titan to obtain fundamental understanding of the divertor heat-load width physics and its dependence on the plasma current in present-day tokamak devices.
C. S. Chang, et al. 2014. Proceedings of the 25th Fusion Energy Conference, IAEA, October 13–18, 2014.
C. S. Chang, PPPL
Fusion
23
2014 High Impact Science at the OLCF
Palaeoclimate evidence has suggested that the El Niño Southern Oscillation (ENSO), the most important driver of interannual climate variability, has swung from periods of intense variability to relative quiescence.
Reference: Zhengyu Liu, Nature, Vol. 515, 2014
Climate Science
Coarse-grained protein models on leadership computers allow the study of single protein properties, DNA–RNA complexes, amyloid fibril formation, and protein suspensions in a crowded environment. Reference: Fabio Sterpone, Chemical Society Reviews, Vol. 43, 2014
Biological Sciences
Using OLCF resources, researchers were able to quantify the shape evolution of a platinum nanoparticle by tracking the propagation of different facets. Reference: Hong-Gang Liao, Science, Vol. 345, 2014
Materials Sciences
Using a general circulation model (GCM), the team has shown that allowing Northern Hemisphere ice sheet meltwater to flow into the ocean generates a bipolar seesaw effect, with the Southern Hemisphere warming at the expense of the Northern Hemisphere. Reference: Christo Buizert, Science, Vol. 345, 2014
Climate Science
Theory, modeling, and simulation can accelerate the process of materials understanding and design by providing atomic-level understanding of the underlying physicochemical phenomena. Reference: Bobby Sumpter, Accounts of Chemical Research, Vol. 47, 2014
Materials Sciences
First-principles many-body methods are used to study the spin dynamics and the superconducting pairing symmetry of a large number of iron-based compounds, showing that high-temperature superconductors have both dispersive high-energy and strong low-energy commensurate or nearly commensurate spin excitations. Reference: Z. P. Yin, Nature Physics, Vol. 10, 2014
Materials Sciences
24
Industrial Partnerships:
Accelerating Competitiveness through Computational Sciences
[Bar chart: Number of OLCF Industry Projects Operational per Calendar Year, 2005–2014]
25
• Human skin barrier: Demonstrated that small molecules can have a large and varying impact on skin permeability depending on their molecular characteristics, which is important for product efficacy and safety
• Global flood maps: Developed fluvial and pluvial high-resolution global flood maps to enable insurance firms to better price risk and reduce loss of life and property
• Engine cycle-to-cycle variation: Developing a novel approach that uses massively parallel, multiple simultaneous combustion-cycle simulations to address cycle-to-cycle variations in spark-ignition engines
• Fuel-efficient jet engines: Conducting first-of-a-kind, high-fidelity LES computations of flow in turbomachinery components for more fuel-efficient, next-generation jet engines
• Wind turbine resilience: First-time simulation of ice formation within million-molecule water droplets
• Welding software
Our Science requires that we advance computational capability 1000x
over the next decade.
What are the Challenges?
30
What is CORAL? (Partnership for 2017 Systems)
• CORAL is a Collaboration of Oak Ridge, Argonne, and Lawrence Livermore Labs to acquire three systems for delivery in 2017.
• DOE’s Office of Science (DOE/SC) and National Nuclear Security Administration (NNSA) signed an MOU agreeing to collaborate on HPC research and acquisitions
• Collaboration grouping of DOE labs was done based on common acquisition timings. Collaboration is a win-win for all parties.
– It reduces the number of RFPs vendors have to respond to
– It improves the number and quality of proposals
– It allows pooling of R&D funds
– It strengthens the alliance between SC/NNSA on road to exascale
– It encourages sharing technical expertise between Labs
31
Accelerating Future DOE Leadership
Systems (“CORAL”)
“Summit” System “Sierra” System
5X – 10X Higher Application Performance
IBM POWER CPUs, NVIDIA Tesla GPUs, Mellanox EDR 100Gb/s InfiniBand
Paving The Road to Exascale Performance
32
2017 OLCF Leadership System
Hybrid CPU/GPU architecture
Vendor: IBM (Prime) / NVIDIA™ / Mellanox Technologies®
At least 5X Titan’s Application Performance
Approximately 3,400 nodes, each with:
• Multiple IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA Volta architecture
• CPUs and GPUs completely connected with high-speed NVLink
• Large coherent memory: over 512 GB (HBM + DDR4)
– all directly addressable from the CPUs and GPUs
• An additional 800 GB of NVRAM, which can be configured as either a burst buffer or as extended memory
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.