5/20/2018 Talk Final
1/42
Rethinking Productivity and Performance
for the Exascale Era
Allen D. Malony
Keynote Talk
7th Workshop on Productivity and Performance (PROPER)*
* Supported by the Virtual Institute High Productivity Supercomputing (VI-HPS)
Rethinking Productivity and Performance in the Exascale Era (PROPER 2014)
Abstract
The push to exascale systems is forcing the parallel computing community to
rethink fundamental notions of productivity and performance. The rapidly
growing degree of parallelism brought on by manycore processors is just one
aspect of an evolving landscape of architectural, system, and software features
that is increasing the complexity of the application development and
optimization process. It is becoming more apparent that, to address the
complexity concerns unfolding in the exascale space, we must think of
productivity and performance in a more connected way, and of the technology
that supports them as more open, integrated, and intelligent. This talk will
discuss directions for parallel performance research and tools that target the
scalability, optimization, and programmability challenges of next-generation
HPC platforms, with high productivity as an essential outcome.
Outline
! Productivity
! Scientific Productivity
! HPC Productivity Factors and Landscape
! Exascale productivity crisis
! Rethinking productivity and performance
! Directions
" Performance knowledge engineering
" Integration and synthesis (for autotuning)
" Dynamic introspection and adaptation
! Conclusions
Productivity: a Computing Metric of Merit*
! Rich measure of quality of the computing experience
" Captures key factors that determine overall impact
" Greater productivity, better computing experience
! Productivity is strongly related to ease of use
" Less effort for same result in same time
! Expands our notion of computing effectiveness
" Focuses attention on important effectiveness contributors
" Exposes relationships between
!program development and program execution
!time to develop/maintain/ with time to solution
! Productivity unifies usability and performance
" Expresses tradeoff between
!programmability and delivered performance
* Courtesy of Thomas Sterling, Indiana University
HPC is about Scientific Productivity
! Scientific productivity is a quality measure of the process of achieving science results, incorporating:
" Software productivity: development effort, time, maintenance, support
" Execution-time productivity: efficiency, time, cost to run scientific workloads
" Workflow and analysis productivity: experiment design, results analysis, validation, hypothesis testing
" End-to-end productivity: from science questions to scientific discovery (i.e., value of scientific insights)
! Productivity costs
" Human resource in development and re-engineering
" Machine and energy resources in runtime (performance)
" Utility and correctness of computational results
HPC Productivity Factors
Courtesy of Thomas Sterling, Indiana U.
HPC Productivity Factor Elaboration
Courtesy of Thomas Sterling, Indiana U.
HPC Productivity in Terascale / Petascale Eras
! Productivity in the terascale era focused on evolving vector codes to distributed memory parallelism
" MPI was the major advancement
! Trends in the petascale era rode Moore's law and focused on scalability via large-scale clusters and increasing cores
" Mixed-mode programming with threading (MPI+OpenMP)
" Accelerators for high-throughput (GPUs, manycore)
! Productivity through scalable, evolutionary improvement
" Familiar programming paradigms and environments
" More algorithm sophistication and early runtime coupling
" Development of more robust application libraries
! Performance tools followed along a similar path
" Focused on scalability, robustness, automation,
Performance Trends for Exascale Architectures
! Increasing flops because of core counts, not clock speed
! Multi-level, massive concurrency
! Declining memory per core
! Tradeoffs in performance, power, and resilience
" Multi-objective
" Hard energy constraints
! Hybrid systems alternatives
Courtesy of P. Kogge, Notre Dame, J. Shalf, LBNL
[Figure: trend plot with series Cores, Energy per FLOP, Clock Speed, Rmax]
Growing Crisis in Scientific Productivity at Exascale
! HPC is increasingly important in science domains, but HPC application development has not advanced as in other fields
! Exascale factors will further affect productivity problems
" Disruptive changes in extreme-scale architectures
" New frontiers in modeling, simulation, and analysis of complex multiscale and multiphysics phenomena
! Effects
" Significant lag between extreme-scale hardware and algorithmic innovations and their effective use in applications
" Poor support for code coupling of independent components
" Lack of agile yet rigorous software engineering practices for HPC that are both performant and maintainable
" Failure to consider the entire lifecycle of large scientific software efforts, leading to fragile, complicated codes
Exascale Computing Productivity Attention
! DARPA High Productivity Computing Systems (HPCS)
http://en.wikipedia.org/wiki/High_Productivity_Computing_Systems
! Extreme-Scale Scientific Application Software Productivity: Harnessing the Full Capacity of Extreme-Scale Computing, white paper, September 9, 2013.
http://www.orau.gov/swproductivity2014/ExtremeScaleScientificApplicationSoftwareProductivity2013.pdf
! Software Productivity for Extreme Scale Science, DOE ASCR Workshop, January 13-14, 2014.
http://www.orau.gov/swproductivity2014/
! Exascale Computing Systems Productivity, DOE ASCR Workshop, June 3-4, 2014.
http://www.orau.gov/ecsproductivity2014/
! ACS Productivity Workshop, DOE Office of Science, July 2014, Indiana University.
What is Exascale Computing Productivity?
! Exascale computing productivity is the effective and efficient use of all exascale resources (hardware, application software, runtime, people, processes, energy) in the production of new scientific insights
! Goal
" Productivity awareness embedded in all exascale lifecycle activities from R&D through deployment to operation and production of scientific insights
" Increase efficiency of overall exascale ecosystem during research and development by identifying, removing, and ameliorating productivity and performance bottlenecks
! Sounds good, so what is the problem?
Exascale Breaks HPC Programming Paradigm
! Programming model must reflect underlying machine model
! Argues for a more data centric approach in future (not easy)
Difficult Mapping Problems with Exascale
! Multiple levels of mapping to exascale hardware
" Mapping decisions affect performance portability
[Figure: mapping levels. Numerical method; Programming model; Compiler/runtime]
Courtesy of J. Shalf, LBNL, VECPAR 2014 keynote
Multiphysics and Multiscale Science Frontiers
! Single physics simulations are maturing
! Coupling of simulations across different types of physics (multiphysics) and different time and length scales (multiscale) is becoming increasingly important for improving model fidelity
! Parameter studies and design optimization that employ ensembles of simulations extend science beyond point solutions and explore problem space
! Uncertainty quantification and sensitivity analysis
! Compounded complexities of code interactions and collaboration across algorithms and applications
Assumptions of Uniformity are No Longer Valid
! Assumptions of uniformity are no longer valid
" Heterogeneous compute engines
" Fine-grained power management affects homogeneity
" Non-uniformities in process technology cause variation
" Fault resilience introduces inhomogeneity
! Current bulk synchronous model is deficient
" Current focus on removing sources of performance variation (jitter) will be increasingly impractical
" Huge costs in power/complexity/performance to extend the life of a purely bulk synchronous model
! Embrace performance heterogeneity
" Assume asynchronous execution model as the norm
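A minimal sketch of why jitter hurts a bulk synchronous model, assuming each rank's step time is a fixed base plus uniformly distributed noise. The noise model, the function names, and the constants are illustrative assumptions, not taken from the talk. Every rank waits at the barrier for the slowest one, so the step time is the maximum over ranks, and that maximum drifts toward the worst case as the rank count grows.

```python
import random


def bsp_step_time(num_ranks, base=1.0, jitter=0.1, rng=random):
    """Time of one bulk synchronous step: all ranks wait for the slowest."""
    return max(base + rng.uniform(0.0, jitter) for _ in range(num_ranks))


def mean_step_time(num_ranks, steps=200, seed=0):
    """Average step time over several steps, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    total = sum(bsp_step_time(num_ranks, rng=rng) for _ in range(steps))
    return total / steps


if __name__ == "__main__":
    # Hypothetical rank counts; the useful work per rank is identical (base=1.0).
    for ranks in (1, 16, 256, 4096):
        print(ranks, round(mean_step_time(ranks), 4))
```

With one rank the mean sits near the middle of the jitter range; with thousands of ranks nearly every step pays the full jitter, even though no rank does more work.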
Exascale Productivity Improvement End-to-End
Courtesy of Thomas Ndousse-Fetter
Performance Technology Implications
! Performance technology has evolved to serve the dominant architectures and programming models
" Observability era (1991-1998)
!instrumentation, measurement, analysis
" Diagnosis era (1998-2007)
!identifying performance inefficiencies
" Complexity era (2008-2012)
!scale, memory hierarchy, network, multicore, GPU
! 20+ years of reasonably stable parallel execution models
" 1st-person application focus
" Productivity somewhat decoupled from performance
!not directly necessary to know application semantics
Exascale Era and Performance Directions
! Exascale era (2012-??) is fundamentally different
! Productivity and performance are more intimately coupled in exascale environments
" High productivity high performance applications
" Applications must be mapped to new exascale systems
" Performance awareness is necessary at all levels
" 1st-person (application) + 3rd-person (system resources) performance views necessary
! Directions
" Performance knowledge engineering
" Integration and synthesis (for autotuning)
" Dynamic introspection and adaptation
Extreme Performance (Knowledge) Engineering
Capture all available sources of performance data and metadata (collectively knowledge) that can be used to reason about application performance expectations
[Figure: performance knowledge engineering diagram. Labels: Extreme-scale Application; Extreme-scale System; Programming Language Model; Execution Models; Computation Model; System Model; Performance Models (Empirical, Relative, Symbolic); Measurement / Analysis; Simulation; Data Mining; Expert System; Performance Knowledge Base; Performance Expectations; Optimization; Insight]
Case Study: MPAS Ocean
! Use multiscale methods for accurate, efficient, scale-aware earth system models
! MPAS-O uses a variable resolution irregular mesh of hexagonal grid cells
! Cells assigned to MPI processes
" Grouped as blocks
" Each cell has 1-40 vertical layers
! MPAS-O has demonstrated scaling limits
! Look at increasing concurrency
! Focus on role of partitioning
http://mpas-dev.github.io
MPAS-O Load Imbalance
! Stencil codes introduce load imbalances due to halo/ghost cells
! Performance measurement in isolation is insufficient to explain source of imbalance and attribute to performance factors
! Need to combine:
" Performance measurements
" Application metadata
" Correlations between measurements and metadata
" Visualization of performance data in context of application domain
Stencil Properties and Metadata Collection
! Capture stencil properties in metadata
" nCellsSolve = cells in partition
" nCells = nCellsSolve + nCellsHalo
! Correlate computation balance with metadata fields
! Original partitioning based on balancing nCellsSolve on MPI processes
! Really need to balance with respect to nCells
" This depends on the partitioner, so it is not known in advance!
! Apply knowledge to Hindsight partitioner
" Create initial Metis partition
" Assign nCells weights on graph and iterate to optimize
[Plot: Metadata Correlated with dv Timer. Series: nCellsSolve, nCells. Axis ticks: 800 to 1500 and 1.50E+07 to 4.00E+07]
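A minimal sketch of the point above: a partition can balance the owned cells exactly and still be imbalanced once halo cells are counted. It assumes a square grid with 4-neighbor stencils and a single halo layer (MPAS-O itself uses a hexagonal mesh), and all function names are hypothetical.

```python
from collections import Counter


def grid_neighbors(rows, cols):
    """4-neighbor adjacency for a rows x cols grid; cells numbered row-major."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            adj[r * cols + c] = [rr * cols + cc for rr, cc in candidates
                                 if 0 <= rr < rows and 0 <= cc < cols]
    return adj


def cell_counts(adj, owner):
    """Return {block: (n_solve, n_halo, n_total)} for a cell -> block map."""
    solve = Counter(owner.values())
    halo = {block: set() for block in solve}
    for cell, nbrs in adj.items():
        for nbr in nbrs:
            if owner[nbr] != owner[cell]:
                halo[owner[cell]].add(nbr)  # neighbor owned elsewhere: halo cell
    return {b: (solve[b], len(halo[b]), solve[b] + len(halo[b])) for b in solve}


def imbalance(values):
    """Max over mean; 1.0 means perfectly balanced."""
    values = list(values)
    return max(values) / (sum(values) / len(values))


if __name__ == "__main__":
    rows, cols = 4, 8
    adj = grid_neighbors(rows, cols)
    # Four vertical strips, two columns each: 8 owned cells per block.
    owner = {cell: (cell % cols) // 2 for cell in adj}
    counts = cell_counts(adj, owner)
    print("solve imbalance:", imbalance(c[0] for c in counts.values()))
    print("total imbalance:", imbalance(c[2] for c in counts.values()))
```

The interior strips carry halos on two sides and the edge strips on one, so the total cell count is uneven even though the owned count is identical. A refinement pass would reweight the graph by total cells and repartition.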
Hindsight Results (Original versus Refined)
240 processors on NERSC Edison (Original -> Refined)
Total cells (explicit + halo) per block: min 473 -> 535, max 846 -> 771
Number of neighbors per block: min 1 -> 2, max 7 -> 9
Time spent computing: min 8.3e7 -> 9.8e7, max 2.5e8 -> 2.4e8
Time spent in MPI_Wait (also reduced time in halo routines): min 2.7e7 -> 9.0e6, max 1.9e8 -> 1.5e8
Tools/Technology Integration and Synthesis
[Figure: SUPER tools and technologies. Areas: Performance Modeling; Autotuning; Performance Optimization; Reliability; Resilience; Energy; Code analysis; End-to-end Integration. Tools / Technologies: TAU, mpiP, PBound, Active Harmony, CHiLL, Roofline, PAPI, PEBIL, PSiNtracer, GPTL, RCRToolkit, ROSE, Orio. Annotations: center of mass for performance engineering; SciDAC applications; prior research funding]
SUPER is a 5-year SciDAC ...
Autotuning for Performance Productivity
! Increasing level of parallelism and heterogeneity in hardware exposes a complex performance landscape
! Cannot expect humans or stand-alone tools to be able to achieve desired performance objectives in the future
" Lack of knowledge and automation (humans)
" Lack of scope and openness (isolated tools)
! Autotuning is promising, but faces two challenges
" Integration of performance measurement/analysis tools with code analysis/transformation tools
" Synthesis of information from collective sources and preservation of knowledge for shared use
Tools Integration with SUPER Autotuning
! SUPER is applying autotuning to optimize HPC applications
" Active Harmony autotuning system (Hollingsworth, UMD)
!software architecture for optimization and adaptation
" CHiLL compiler framework (Hall, Utah)
!CPU and GPU code transformations for optimization
" Orio annotation-based autotuning (Norris, ANL)
!code transformation with optimization (CUDA, OpenCL)
! Integrate performance tools (TAU) with these frameworks
" Use to gather performance data for autotuning/specialization
" Store performance data with metadata for each experiment variant in performance database (TAUdb)
" Use machine learning and data mining to increase the level of automation of autotuning and specialization
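A minimal sketch of storing each experiment variant's measurement together with its tuning metadata, so that later analysis can query across variants. It uses Python's built-in sqlite3 as a stand-in; the table layout, the function names, and the timings are hypothetical and do not reflect the actual TAUdb schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a small trial store. Hypothetical schema, not TAUdb's."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS trial ("
        " id INTEGER PRIMARY KEY,"
        " kernel TEXT NOT NULL,"
        " params TEXT NOT NULL,"   # JSON-encoded tuning parameters (metadata)
        " time_ms REAL NOT NULL)"  # measured execution time
    )
    return db


def record_trial(db, kernel, params, time_ms):
    """Store one variant's measurement alongside the parameters that produced it."""
    db.execute(
        "INSERT INTO trial (kernel, params, time_ms) VALUES (?, ?, ?)",
        (kernel, json.dumps(params, sort_keys=True), time_ms),
    )


def best_trial(db, kernel):
    """Return (params, time_ms) of the fastest recorded variant, or None."""
    row = db.execute(
        "SELECT params, time_ms FROM trial WHERE kernel = ?"
        " ORDER BY time_ms LIMIT 1",
        (kernel,),
    ).fetchone()
    return (json.loads(row[0]), row[1]) if row else None


if __name__ == "__main__":
    db = open_store()
    # Made-up timings for illustration only.
    record_trial(db, "kernel_a", {"unroll": 1, "threads": 64}, 14.0)
    record_trial(db, "kernel_a", {"unroll": 4, "threads": 128}, 9.5)
    record_trial(db, "kernel_a", {"unroll": 8, "threads": 256}, 11.2)
    print(best_trial(db, "kernel_a"))
```

Keeping the parameters next to the measurement is what lets a later learning step relate tuning choices to outcomes instead of only ranking raw timings.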
Orio Empirical-Based Autotuning Process
[Figure: Orio empirical autotuning process. Labels: Code with DSL Annotations (C, Fortran); DSL Parser; Sequence of (Nested) Annotated Regions; Code Transformations; Transformed Code; Code Generator; Empirical Performance Evaluation; Search Engine; Tuning Specification; best performing version; Optimized Code (CUDA, OpenCL)]
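The empirical loop in the figure can be sketched as follows: enumerate the variants described by a tuning specification, evaluate each one, and keep the best performing version. This is an illustrative exhaustive search, not Orio's implementation. The parameter names and the synthetic cost function are assumptions; a real evaluator would generate, compile, and time each variant.

```python
import itertools


def enumerate_variants(space):
    """Yield every parameter combination in the tuning space as a dict."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def autotune(space, evaluate):
    """Evaluate every variant and return (best_params, best_cost, history)."""
    best_params, best_cost = None, float("inf")
    history = []
    for params in enumerate_variants(space):
        cost = evaluate(params)  # stand-in for empirical performance evaluation
        history.append((params, cost))
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost, history


def toy_cost(params):
    """Synthetic cost with a known optimum at unroll_inner=4, work_items=128."""
    return abs(params["unroll_inner"] - 4) + abs(params["work_items"] - 128) / 64


if __name__ == "__main__":
    space = {"unroll_inner": [1, 2, 4, 8], "work_items": [32, 64, 128, 256]}
    best, cost, history = autotune(space, toy_cost)
    print(best, cost, len(history))
```

Exhaustive enumeration is only feasible for small spaces; a search engine replaces `enumerate_variants` with a strategy that visits far fewer points.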
Orio and TAU Integration
[Figure: Orio and TAU integration. Labels: Orio Code Generator; Experiment (CUDA); Links at Runtime; CUPTI callback measurement library; Writes; TAU Profiles; TAU Metadata Entries (Transformations, Execution Time); Uploaded; TAUdb. Stages: Measurement (metric profiling, metadata, TAUdb storage); Autotuning analysis (machine learning, optimization search, specialization)]
Autotuning Radiation Transport Code
! Solid fuel ignition solver
! Our goal is to replace two functions, which represent a large proportion of overall execution time, and offload them to the GPU
! The implementations take too long to run to exhaustively enumerate the search space
" Use a Nelder-Mead search
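A minimal sketch of a Nelder-Mead simplex search, the kind of derivative-free method that avoids enumerating the whole space. It is a basic textbook variant written for illustration, not the search engine used in the experiments. The continuous toy objective stands in for "run this variant and time it", and evaluations are cached because each real trial is expensive.

```python
def nelder_mead(cost, start, step=1.0, max_iter=200, tol=1e-8):
    """Minimize cost(point); returns (best_point, best_cost, evaluations)."""
    cache = {}

    def f(point):
        key = tuple(point)
        if key not in cache:  # each distinct point is evaluated only once
            cache[key] = cost(point)
        return cache[key]

    n = len(start)
    simplex = [list(start)]
    for i in range(n):  # initial simplex: start plus one step along each axis
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            """Point at centroid + t * (centroid - worst)."""
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0], f(simplex[0]), len(cache)


def toy_cost(point):
    """Synthetic objective with its minimum at (3, -1)."""
    return (point[0] - 3.0) ** 2 + (point[1] + 1.0) ** 2


if __name__ == "__main__":
    point, value, trials = nelder_mead(toy_cost, [0.0, 0.0])
    print(point, value, trials)
```

Real tuning parameters are usually discrete, so a practical search would map simplex coordinates onto indices of the allowed values; that mapping is omitted here.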
FormJacobian (different sizes, platforms)
[Plots: tuning trajectories for FormJacobian (FJ) at problem sizes 64x64x64, 75x75x75, 100x100x100, and 128x128x128. x-axis: experiment trial; y-axis: generated code execution time (milliseconds), ticks from 1,210 to 1,450. Series: Radeon 7970, Xeon Phi, GTX 480, Tesla C2075, Tesla K20c]
Factor Analysis
! Parameter values for the best performing ex14FF and ex14FJ kernels across the architectures
" Workgroups, Workitemspergroup, Unrollinner, Compilerflags, Sizehint, Vechint
TAU and Autotuning in SUPER
[Figure: TAU and autotuning workflows in SUPER.
CHiLL + Active Harmony workflow labels: ROSE-based Code Generation Tool; Outlined Function; CHiLL Recipes; Search Driver (brute force or Active Harmony); code variant; Code Variants; Selective Instrumentation File (specifying parameters to capture); tau_instrumentor; Instrumented Variant; Wrapper Function; execute; Parameterized Performance Profile; TAUdb (PerfDMF); profile data and metadata; WEKA decision tree induction algorithm; parameters from TAUdb.
Orio workflow labels: Orio Code Generator; Experiment; Links at Runtime; CUPTI callback measurement library; Writes; TAU Profiles; TAU Metadata Entries (Transformations, Execution Time); Uploaded; TAUdb.
Other labels: CHiLL, Orio, ROSE, Geant4, MPAS-O, CESM, PerfExplorer]
Synthesis for Automated Performance Tuning
! Integration of performance measurement and analysis with autotuning frameworks is important
! However, there is an opportunity for synthesis of 2 types
" Incorporate broader information from all programming, development, performance, optimization, systems tools and technologies in the autotuning process
" Preserve performance knowledge and enable higher-order understanding (learning) of the relationship between performance factors across tuning dimensions
! Need a unified architecture for information synthesis and performance knowledge preservation
A New Performance Observability
! Key exascale parallel performance abstraction
" Inherent state of exascale execution is dynamic
" Embodies non-stationarity of performance
" Constantly shaped by the adaptation of resources to meet computational needs and optimize objectives
! Requires a fundamentally different performance observability paradigm
" Designed to support introspective adaptation
" Reflects computation to execution model mapping
" Aware of multiple (performance) objectives
" In-situ analysis of performance state and objectives
Introspection, In Situ Analysis, and Feedback
Courtesy Martin Schulz, LLNL, Piper Project
XPRESS Project (DOE X-Stack)
! Design and development of exascale software stack to support the ParalleX execution model
" Highly concurrent
" Asynchronous
" Message driven
" Global address space
! OpenX
" XPI programming API
" HPX runtime system
" RIOS interface to OS
" APEX performance system
! Team
" Universities: IU, LSU, UH, UNC/RENCI, UO
" Laboratories: SNL, LBNL, ORNL
Integrated Software Stack for ParalleX
! OpenX
" XPI programming API
" HPX runtime system
" RIOS interface to OS
" APEX performance system
! APEX
" OS (LXK) tracks system-level resources
" Runtime (HPX) tracks threads, queues, parcels, remote ops, memory, concurrency
" XPI allows language-level performance semantics to be measured
[Figure: OpenX software stack. Labels: Legacy Applications (MPI, OpenMP); New Model Applications (XPI); Metaprogramming Framework; Domain Specific Active Library; Domain Specific Language; Compiler; Runtime System Instances (AGAS: name space, processor; LCO: dataflow, futures, synchronization; Lightweight Threads: context manager; Parcels: message driven computation); PRIME MEDIUM Interface / Control; Operating System Instances (task recognition, address space control, memory bank control, OS thread, instrumentation, network drivers); Distributed Framework Operating System; Hardware Architecture: 10^6 nodes x 10^3 cores / node + integration network]
Argo DOE ExaOSR Project
! Exascale OS and runtime research project
" Labs: ANL, LLNL, PNL
" Universities: BU, UC, UIUC, UO, UTK
! Philosophy
" Whole-system view
!dynamic user environment (functionality, dynamism, flexibility)
!first-class managed resources (performance, power, ...)
!hierarchical response to faults
" Massive concurrency support
! Key ideas
" Hierarchical (control, communication, goals, data resolution)
" Embedded (performance, power, ...) feedback and response
" Global system support
Argo Global Information Backplane
! BEACON: event/action/control notification
! Exposé: performance observability system
Argo Global Optimization View
u: generalized control signals
v: settings for next level
m: power/resilience models & mechanisms
g: power, performance goals
f: feedback from BEACON and EXPOSÉ
e: error between goals and feedback
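A minimal sketch of the feedback view above for a single level, assuming a simple integral-style update of the control signal from the error. The controller form, the gain, and the toy power model are illustrative assumptions; the slide does not specify how u is computed from e.

```python
def control_loop(goal, plant, gain, steps, u0=0.0):
    """Drive feedback f toward goal g by adjusting control signal u."""
    u = u0
    trace = []
    for _ in range(steps):
        f = plant(u)      # f: feedback measured under the current setting
        e = goal - f      # e: error between goal and feedback
        u = u + gain * e  # u: control signal, nudged in proportion to the error
        trace.append((u, f, e))
    return u, trace


def toy_power_model(u):
    """Assumed model m: 50 W idle power plus 20 W per unit of control setting."""
    return 50.0 + 20.0 * u


if __name__ == "__main__":
    # Hypothetical power goal of 150 W; the loop should settle near u = 5.
    u, trace = control_loop(goal=150.0, plant=toy_power_model, gain=0.02, steps=60)
    print(round(u, 6), round(toy_power_model(u), 6))
```

In a hierarchical arrangement, the settled output of one level would serve as the settings (v) handed to the next level down, each with its own goals and feedback.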
Conclusions
! Exascale brings fundamentally new challenges (plus opportunities!) to performance and to productivity
" ExaFLOPs will require new technology innovations
" New science will require more sophisticated methods
! Performance and productivity are intimately coupled
" Achieving scientific productivity requires performance
" Performance cannot be considered an afterthought
" Performance depends on integration and synthesis
!with application development environment
!throughout the exascale software stack
! FLOPs versus Brains