Page 1
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.
Sandia’s ARM-centric Co-Design Strategy
Introduction to the NNSA/ASC Vanguard Project
James A. Ang, Ron Brightwell, Simon D. Hammond, K. Scott Hemmert, Robert J. Hoekstra, James H. Laros III, Kevin Pedretti and Arun F. Rodrigues
Center for Computing Research
Sandia National Laboratories
Albuquerque, NM 87185-1319 USA
ARM Research Summit, Going ARM Workshop
Robinson College, Cambridge, UK
September 11-13, 2017
SAND2017-9713 C
Page 2
Sandia ARM R&D Overview
• Sandia's ASC Advanced Architecture Testbed project
  • For co-design, early access to new architectural features is critical
  • Positive ROI for the pain associated with early hardware & immature software
  • ARM testbed experiences – with Cavium’s permission to share pre-production measurements of relative performance
• Architectural Simulators
  • Structural Simulation Toolkit (SST), open architectural simulation framework
  • Open Source, Open Development, catalyst for collaboration that can be proprietary
  • Links to Sandia/DOE's experience with application drivers
• Sandia is responsible for the NNSA/ASC's first large-scale ARM platform
  • Overview of key requirements targeting 2019
Page 3
Sandia’s NNSA/ASC ARM Platforms
[Figure: timeline of Sandia's ARM platforms, starting Sept 2011 and marking "TODAY". Legend entries: Retired, 2015, 2017, 2019.]
Timeline labels: APM/HPE Xgene-1; Cavium/HP Labs ThunderX1; ARM Next-Gen (shown twice); Exascale Platform; Small Pre-Vanguard early access system – FY18; Install FY19 Target; ThunderX3 (Triton).
Photo captions: Cavium ThunderX1; Pre-production Cavium ThunderX2.
Page 4
Memory Bandwidth Intensive Kernels
[Figure, panel 1: STREAM Triad. x-axis: OpenMP Threads (0 to 96); y-axis: Speedup vs. Pre-Prod TX1 (0 to 12). Series: ThunderX1 - Pre-Production; ThunderX1 - Final; ThunderX2 Pre-Production (Wave 1).]
[Figure, panel 2: MiniFE CG Solve/Kernels. Categories: CG Solve, WAXPY, DOT, MATVEC; y-axis: Speedup vs. Pre-Prod TX1 (0 to 4). Series: ThunderX1 - Pre-Production; ThunderX1 Final Silicon; ThunderX2 - Pre-Production (Wave 1).]
Benchmarked on maximum number of cores (node-to-node comparison)
https://github.com/Mantevo/miniFE (Sandia National Labs)
Significant improvement in memory bandwidth (~4X)
Individual cores more powerful (~2.5X)
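The kernels named on this slide are short to state. The following is a minimal pure-Python sketch, not the STREAM or miniFE source: Triad computes a[i] = b[i] + s*c[i], and a conjugate-gradient (CG) solve is assembled from WAXPY-style vector updates, DOT products and a sparse MATVEC. All function names, the CSR storage layout and the small 1D Laplacian test matrix are assumptions chosen for illustration.

```python
def triad(b, c, scalar):
    """STREAM-style Triad: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def waxpy(alpha, x, beta, y):
    """Vector update w = alpha*x + beta*y."""
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A*x, with A stored in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += vals[k] * x[col_idx[k]]
        y.append(s)
    return y


def cg_solve(row_ptr, col_idx, vals, b, tol=1e-10, max_iter=100):
    """Unpreconditioned CG for a symmetric positive definite matrix."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A*x with x = 0
    p = list(r)          # initial search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(row_ptr, col_idx, vals, p)
        alpha = rr / dot(p, Ap)
        x = waxpy(1.0, x, alpha, p)
        r = waxpy(1.0, r, -alpha, Ap)
        rr_new = dot(r, r)
        p = waxpy(1.0, r, rr_new / rr, p)
        rr = rr_new
    return x


def laplacian_1d_csr(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1) in CSR form."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals
```

Each of these kernels performs only one or two floating-point operations per value loaded, which is why their performance tends to track memory bandwidth rather than peak compute rate.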
Page 5
Compute Intensive Kernels
[Figure, panel 1: LULESH OpenMP (Figure of Merit). y-axis: Speedup vs. Pre-Prod TX1 (0 to 3). Series: ThunderX1 - Pre-Production; ThunderX1 - Final; ThunderX2 Pre-Production (Wave 1).]
[Figure, panel 2: PENNANT (Hydro Solve Time). y-axis: Speedup vs. Pre-Prod TX1 (0 to 3). Series: ThunderX1 - Pre-Production; ThunderX1 - Final; ThunderX2 - Pre-Production (Wave 1).]
Speedup annotations on the charts: ~2.7X and ~2.2X
https://github.com/lanl/PENNANT (Los Alamos Nat. Lab)
https://codesign.llnl.gov/lulesh.php (Lawrence Livermore Nat. Lab)
Page 6
Full Engineering Applications on ARM
• On pre-production TX2, Sandia has been actively working on libraries and packages for its ARM systems
  • Math Libraries/Kernels (Trilinos)
  • I/O Libraries (Exodus Mesh)
  • YAML
• Ported Sandia’s open-source Trinity acceptance Nalu application to ARM
  • Representative of some SIERRA engineering applications
  • Complex mesh handling, load balancing, solvers, etc.
https://github.com/NaluCFD
Page 7
Architectural Simulation Framework: SST
• Use Supercomputers to Design Supercomputers
• Parallel Discrete-Event Simulator Framework
  • Flexible framework allows a multitude of custom simulators
  • Demonstrated scaling to over 512 processors
• Comes with many built-in simulation models
  • Processors, Memory, Network
• Open API
  • Easily extensible with new models
  • Modular framework
  • (Non-Viral) Open-source core
• Time-scale independent core
  • Handles Micro-, Meso-, Macro-scale simulations
• “Best of Breed” – Bring together work from Labs, Industry, Academia
[Figure: SST Core diagram with two SST::Component blocks. Core labels: Configuration, Partitioning, Link, Event, Instantiation, Time Coordination, Parallel Communication.]
http://sst-simulator.org/
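To make the component, link and event vocabulary on this slide concrete, here is a minimal sequential discrete-event sketch in plain Python. It is not the SST API and does not attempt partitioning or parallel communication; the class names `Simulator`, `Link` and `PingPong`, and the integer time units, are hypothetical and chosen only for illustration.

```python
import heapq
import itertools


class Simulator:
    """Tiny sequential discrete-event core: a time-ordered event queue."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def schedule(self, delay, handler, payload):
        """Deliver payload to handler at time now + delay."""
        entry = (self.now + delay, next(self._seq), handler, payload)
        heapq.heappush(self._queue, entry)

    def run(self):
        """Process events in timestamp order until none remain."""
        while self._queue:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(payload)


class Link:
    """One-way connection to a receiving handler with a fixed latency."""

    def __init__(self, sim, latency, handler):
        self.sim = sim
        self.latency = latency
        self.handler = handler

    def send(self, payload):
        self.sim.schedule(self.latency, self.handler, payload)


class PingPong:
    """Hypothetical component: forwards a decreasing counter to its peer."""

    def __init__(self, sim, name, log):
        self.sim = sim
        self.name = name
        self.log = log
        self.out = None  # outgoing Link, set after both components exist

    def handle(self, remaining):
        self.log.append((self.sim.now, self.name, remaining))
        if remaining > 0:
            self.out.send(remaining - 1)


def run_ping_pong(bounces=3, latency=10):
    """Wire two components together and return the ordered event log."""
    sim, log = Simulator(), []
    a, b = PingPong(sim, "a", log), PingPong(sim, "b", log)
    a.out = Link(sim, latency, b.handle)
    b.out = Link(sim, latency, a.handle)
    sim.schedule(0, a.handle, bounces)
    sim.run()
    return log
```

Because components interact only through links with a known latency, a parallel framework can place components on different ranks and use that latency to bound how far each rank may advance before synchronizing; the sketch above leaves that part out.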
Page 8
Example SST Element Libraries
Detailed Memory Models
• memHierarchy - Cache and Memory
• cassini - Cache prefetchers
• DRAMSim - DDR
• NVDIMMSim - Emerging Memories
Dynamic Trace-based Processor Model
• ariel - PIN-based Tracing
Cycle-based Processor Model
• m5C - Gem5 integration layer
High-level Program Communication Models
• ember - State-machine Message generation
• firefly - Communication Protocols
• hermes - MPI-like interface
Cycle-based Network Model
• merlin - Network router model and NIC
High-level System Workflow Model
• scheduler - Job-scheduler simulation models
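As a rough illustration of what a cache model paired with a prefetcher does, the sketch below implements a toy fully associative LRU cache with an optional next-line prefetcher. It is not code from memHierarchy or cassini; the class name `ToyCache`, its parameters and the streaming access pattern are all assumptions made for this example.

```python
from collections import OrderedDict


class ToyCache:
    """Fully associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, capacity_lines, line_bytes=64, next_line_prefetch=False):
        self.capacity_lines = capacity_lines
        self.line_bytes = line_bytes
        self.next_line_prefetch = next_line_prefetch
        self.lines = OrderedDict()  # keys are line numbers, oldest first
        self.hits = 0
        self.misses = 0

    def _fill(self, line):
        """Insert a line as most recently used, evicting LRU lines if full."""
        self.lines[line] = True
        self.lines.move_to_end(line)
        while len(self.lines) > self.capacity_lines:
            self.lines.popitem(last=False)

    def access(self, addr):
        """Record a demand access to byte address addr."""
        line = addr // self.line_bytes
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)
        else:
            self.misses += 1
            self._fill(line)
        # Next-line prefetch: bring in the following line if it is absent.
        if self.next_line_prefetch and (line + 1) not in self.lines:
            self._fill(line + 1)


def stream_misses(num_doubles, prefetch):
    """Count misses when reading num_doubles consecutive 8-byte values."""
    cache = ToyCache(capacity_lines=4, next_line_prefetch=prefetch)
    for i in range(num_doubles):
        cache.access(8 * i)
    return cache.misses
```

For a unit-stride read of 64 doubles (eight 64-byte lines), this toy model takes one miss per line without prefetching and only the initial miss with next-line prefetching enabled.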
Page 9
SST Use Cases
[Figure: example use-case diagrams. One shows an NVM-based DIMM: NVM chips grouped into ranks and banks, an NVM internal controller (write buffer, requests tracker, request buffer, scheduler, wear leveler, power manager) and a memory bus. Others show networked nodes built from processors (P), cache ($), scratch, fast and slow memory, with levels labeled L0 to L3.]
• Emerging NV Memory Technologies
• Photonic Network Topology & Routing
• Multi-Level Memory (HBM+DDR+NV)
• Disaggregated Memory
• Communication / Interconnect Modeling
Page 10
The NNSA/ASC Vanguard Project
• Expand the HPC ecosystem by developing the ARM-based system necessary to enable a credible ARM-based Advanced Technology System offering for ASC post-2022
• Mature ARM HPC ecosystem
  – Software stack and toolchain (OS, compilers, scalable MPI, runtime, system management, development tools, I/O, …)
  – On-package high-bandwidth memory
  – Advanced HPC interconnect
• Influence Supplier/Integrator community to accelerate and optimize ARM technologies for leadership HPC, and foster HPC community confidence
• Leverage the open ARM architecture platform to enable innovations in processor, interconnect, memory, specialized acceleration and packaging technologies that focus on performance and scalability of our multi-physics production codes and could impact 2025 systems
Page 11
The Vanguard Project Approach
• Bridge the gap between what we can accomplish with the Testbed project and fielding an Advanced Technology System platform
• Vanguard is a project, not a single platform
• Accelerate maturity of emerging technologies for ASC program
[Figure: progression from Test Beds to Vanguard to ATS/Capability Platforms, with arrows labeled "Greater Stability, Larger Scale" and "Higher Risk, Greater Architectural Choices".]
Page 12
Vanguard Approach (cont.)
• Appropriate scale systems
  • Must be large enough to serve as proof of concept for future Advanced Technology Systems
  • Sufficient scale to interest production multi-physics application teams
• Appropriate investment
  • Gain and maintain vendor and collaborator attention
• Appropriate level of risk
  • Goals target mission workloads but not turn-key production mission support
Page 13
Software Environment Plan - Overview
• Goal: Accelerate maturity of ARM ecosystem for ASC computing mission
• Need an integrated software stack for the 2019 ARM Prototype to enable application development and optimization
  • Programming environment (compilers, math libs, tools, MPI, OMP, …)
  • Low-level OS (optimized Linux, I/O, network, containers + VMs, PowerAPI, …)
  • Job scheduling and management (WLM Slurm?), app launcher, user tools, …
  • System management (boot, monitoring, image mgt., rapid re-provisioning, …)
• Focus Areas
  • Integration and robustness, overall user experience
  • Address known weaknesses: compilers, libs, and tools
  • Increase modularization and openness of system software stack; seek “plugin” capability for externally developed components.
Page 14
Software Environment Collaboration Roles
• System Vendor
  • Deliver and support core elements of the software environment necessary for a viable integrated system (part of system contract)
• Tri-lab team
  • Integrate system into our computing environment
  • Identify and resolve SW issues in collaboration with system vendor
  • Contribute tools and other capabilities to fill gaps and improve the overall computing environment
• Other Eco-system Vendors/Stakeholders
  • Linaro
  • OpenHPC
  • ARM/Allinea
  • Others
Page 15
Drivers for ARM Software Stack
1. Focus on NNSA/ASC’s unique requirements
  • Many programs are already exploring HPC on ARM (e.g., Mont-Blanc, Post-K)
  • Need to demonstrate full multi-physics applications running at scale on ARM
2. Build an integrated team
  • Integrate work across tri-labs and vendors, managed as single project
  • Focus on strengths of each laboratory, support technology deployment
3. Provide a robust “traditional” HPC software stack
  • Meet user expectations, provide familiar environment
  • ARM is just an ISA; much of system software stack should “just work”
4. Engage with users
  • Enable low-effort issue reporting, rapidly resolve, get and provide feedback
  • Build community through talks, training, hackathons, workshops, etc.
5. Look forward
  • Improve support for real-world workflows and data-centric computing
  • Seek to reduce friction points for users when moving between platforms
  • Leverage software technology from US DOE Exascale Computing Project
Page 16
Technology Development Areas
• 2019 ARM platform is not a production system – increased latitude for experimentation and R&D activities
  • Need to address options for scheduling and types of access to the system to provide for full scale projects, alternate environments, etc.
• Plan to pursue several technology development areas:
  • On-package high-bandwidth memory
  • Advanced HPC Interconnect
  • Compilers, math libraries, tools, task-based runtimes, ...
  • Vendor framework that OpenHPC components get plugged into
  • OpenHPC framework that vendor components get plugged into
  • Experimental OS/R stacks
  • Lightweight kernels (Hobbes/Kitten, Intel mOS, RIKEN McKernel)
  • KVM support for cyber “simulate the Internet” frameworks
  • Support for HPC + Analytics (including AI and ML) workflows and compositions
  • More interactive IaaS-style usage models
Page 17
Required Vendor Support
• Modular system software environment architected to facilitate site modifications and additions.
• Documented APIs for integrating new software components with vendor stack (e.g. RAS APIs, Network, I/O, Health of subsystems, …)
• Ability to partition the system and manage multiple OS images
  • Carve out dedicated resources for R&D activities
  • Boot new or experimental software environments
• Buildable source – High Priority
  • Ability for labs to debug and fix system software issues
    • For example, be able to debug MPI, PMI, Lustre, Burst Buffers, app load, …
  • Ability to rebuild vendor Linux kernels with modified config options
    • Requires vendor to provide source for all “binary-only” kernel drivers and all non-standard Linux kernel patches
  • Ability to build and evaluate experimental OS/R stacks
Page 18
Recap: Goals of NNSA/ASC Vanguard Project
• Key Requirements for this 2019 Platform
  • System prototype for future leadership class DOE platform
  • Competitive HPC 64-bit ARM processor technology
  • Integrated On-package Memory
  • First of a kind, Advanced Interconnect technology
  • Large focus on maturing the ARM software ecosystem
• Opportunities for holistic co-design of innovative test hardware
  • Logic in NIC to improve interconnection network performance
  • Logic in/near memory to support sparse linear algebra acceleration
  • Other advanced processor architectures
Page 19
Acknowledgements
• NNSA/ASC Program: Mark Anderson, Douglas Wade, Thuc Hoang
• Advanced Architecture Testbeds Operations Team: John Noe, Jim Brandt, Ann Gentile, Victor Kuhns, Nate Gauntt, Mike Aguilar
• SST Developers: K. Scott Hemmert, Branden Moore, Gwen Voskuilen, Amro Awad, Clay Hughes, Jeremy Wilke
• SST SQE team: Jon Wilson, Aaron Levine, John Van Dyke
• Tri-lab Vanguard team: Jim Laros, Kevin Pedretti, Si Hammond, Rob Hoekstra, Ron Brightwell, Ken Alvin, Trent D’Hooge, James Foraker, Rob Neely, Becky Springmeyer, Bronis de Supinski, Mike Lang, Pat McCormick
• Cavium: Gopal Hegde, Rishi Chugh, Larry Wikelius, David Hass, Giri Chukkapalli
• ARM: Nigel Paver, Eric Van Hensbergen, Geraint North
• HPE: Mike Vildibill, Nic Dube, Kelly Pracht
• Cray: Duncan Roweth, Dan Ernst, Larry Kaplan, Heidi Poxon
Page 21
Current NNSA Architecture Strategy
[Figure: platform timeline by Fiscal Year, ‘13 through ‘23. Phase legend: Procure & Deploy, Use, Retire. Marker: System Delivery.]
Advanced Technology Systems (ATS)
• Sequoia (LLNL)
• ATS 1 – Trinity (LANL/SNL)
• ATS 2 – Sierra (LLNL)
• ATS 3 – Crossroads (LANL/SNL)
• ATS 4 – (LLNL)
• ATS 5 – (LANL/SNL)
Commodity Technology Systems (CTS)
• Tri-lab Linux Capacity Cluster II (TLCC II)
• CTS 1
• CTS 2