The Memory is the Computer Rob Schreiber HP Labs DOE Salishan Conference, 2014
Dec 18, 2015
Let’s Build an Exascale Machine
And make it useful.
– Adequate memory capacity
– No disks (except for archival store)
– 20 MW (good thing)
– More than a loosely-connected cluster
– It won’t always work!
How will we get that much parallelism?
• Big problems
• More than one problem
• Pipelines of problems
– Preprocess, mesh gen
– Solve, resolve, UQ, optimize
– Postprocess and visualize
• Solving the same problem twice!
And we will need a lot of memory
• Many problems on the machine at one time
• Performance costs memory
• No disks!
How slow is a millisecond?
• From 1960 to the present: clock cycle time has improved about 6000X (2 microseconds to 0.3 nanoseconds).
• Disk rotational latency: 11X (IBM 1405, 1961: 23 RPS. Today: 250 RPS.)
• Seek: 100X (IBM 1405: 600 ms. Today: 6 ms.)
The millisecond is not a reasonable unit in the exascale era.
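The imbalance can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted on the slide; the variable names are illustrative and not part of the talk.

```python
# Figures quoted on the slide (variable names are illustrative).
clock_1960_s = 2e-6      # 2 microsecond cycle time
clock_now_s = 0.3e-9     # 0.3 nanosecond cycle time
rps_1961, rps_now = 23, 250          # disk revolutions per second
seek_1961_ms, seek_now_ms = 600, 6   # disk seek time

clock_gain = clock_1960_s / clock_now_s
latency_gain = rps_now / rps_1961
seek_gain = seek_1961_ms / seek_now_ms

# Clock cycles that elapse while the processor waits for one seek.
cycles_per_seek_1961 = (seek_1961_ms * 1e-3) / clock_1960_s
cycles_per_seek_now = (seek_now_ms * 1e-3) / clock_now_s

print(f"clock gain   ~{clock_gain:.0f}X")
print(f"latency gain ~{latency_gain:.0f}X")
print(f"seek gain    ~{seek_gain:.0f}X")
print(f"cycles per seek: {cycles_per_seek_1961:.0f} then, {cycles_per_seek_now:.0f} now")
```

Measured in clock cycles, a seek cost about 300,000 cycles in 1961 and costs about 20 million today, which is the sense in which the millisecond has become unreasonable.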
Things are becoming unbalanced !!!
Argonne National Laboratory's plans for leading-edge supercomputing. Thanks, Rick!
              2012       2016         2020          2024
Peak FLOP/s   10-20 PF   100-200 PF   500-2000 PF   2000-4000 PF
Memory        0.5-1 PB   5-10 PB      32-64 PB      50-100 PB
Flops/Bytes   20         20           16-32         40
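The Flops/Bytes row follows from the two rows above it. A minimal sketch of that calculation, assuming the low end of each peak range is paired with the low end of the memory range and high with high (the slide does not say how the ranges were paired):

```python
# (low, high) ranges from the table: peak in PFLOP/s, memory in PB.
plans = {
    2012: ((10, 20), (0.5, 1)),
    2016: ((100, 200), (5, 10)),
    2020: ((500, 2000), (32, 64)),
    2024: ((2000, 4000), (50, 100)),
}

def flops_per_byte(peak_pf, mem_pb):
    # Assumption: pair low with low and high with high.
    return (peak_pf[0] / mem_pb[0], peak_pf[1] / mem_pb[1])

ratios = {year: flops_per_byte(peak, mem) for year, (peak, mem) in plans.items()}

for year, (low, high) in ratios.items():
    print(year, low, high)
```

Under that pairing the 2020 range comes out at roughly 16 to 31, and the ratio doubles from 20 to 40 between 2012 and 2024: compute grows faster than memory.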
Moore’s Law: Last Call?
“There’s no getting around the fact that we make these things out of atoms.”
- Gordon Moore
We need some new Moore’s Laws
Density challenge
$10 per gigabyte (DRAM) today
– A DRAM exabyte costs $10B
– At exascale time, still billions
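The cost figure is simple arithmetic. A minimal sketch, assuming decimal units (1 EB = 10^9 GB); the function name is illustrative:

```python
GB_PER_EB = 10**9  # decimal units: 1 exabyte = 10^9 gigabytes

def dram_cost_usd(capacity_gb, usd_per_gb):
    """Cost of a given DRAM capacity at a flat price per gigabyte."""
    return capacity_gb * usd_per_gb

exabyte_today = dram_cost_usd(GB_PER_EB, 10)

# Price per gigabyte at which an exabyte would cost exactly $1B.
breakeven_usd_per_gb = 1e9 / GB_PER_EB

print(f"exabyte at $10/GB: ${exabyte_today:,}")
print(f"price for a $1B exabyte: ${breakeven_usd_per_gb:.2f}/GB")
```

An exabyte stays in the billions for as long as DRAM costs more than $1 per gigabyte.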
Let’s consider other memory technologies
Cost, Density, Power
• Feature shrink – running on empty
• MLC technologies
• 3D technologies (not stacking, real 3D…)
• Static power and memory capacity
Power challenges
DRAM
– Reduced-memory exascale
– Overfetch, leakage, refresh, scrubbing
– Giridhar et al., SC13: 100 PB can be achieved at 4.7 MW
Nonvolatile memory is usually energy-costly to write,
but no static power, no scrub, no refresh
We could make a lot more money if our customers had a bigger plug to plug our machines into.
Flash in Exascale Systems
Return of the millisecond
And it can wear out; so it would be a separate tier
Flash is the new disk
NVRAM has a future
"I'm reasonably confident that ... nonvolatile technologies will replace flash and bring non-volatile memory very close [to compute] with dramatic improvements in latency. Architectures will clearly have to react and respond to that."
-- Justin Rattner
New memory on the horizon
• Spin-Torque-Transfer RAM (STTRAM)
– Grandis (54nm, acquired by Samsung)
• Phase-Change RAM (PCRAM)
– Samsung (20nm, diode, up to 8Gb)
– Micron and Nokia: in phones now
• Resistive RAM (memristor)
– Panasonic (180nm process, 4-layer xpoint)
– Unity Semi (64MB, acquired by Rambus)
– Crossbar
– Several others under development
ReRAM
– Very promising, still
– Technical issues
– Fits in the density range between DRAM and Flash
– Should scale well
– Low power: an exabyte will be feasible
NVM Programming
• Storage (files, databases)
• Persistent heaps
• Just plain memory
• The SNIA PM Programming TWG
• Caches and persistence, transactions, failure atomicity
• A co-design opportunity
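To make the failure-atomicity item concrete, here is a minimal sketch of one common pattern, a shadow-copy update with a one-byte commit point. It is an illustration only, not the SNIA programming model: a memory-mapped file stands in for NVM, `mmap.flush()` stands in for the cache flush a real persistent-memory store would need, and all function names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<q")   # one signed 64-bit value per slot
SIZE = 1 + 2 * SLOT.size     # selector byte followed by two slots

def create(path, value):
    """Create the backing file with `value` in slot 0 and slot 0 selected."""
    with open(path, "wb") as f:
        f.write(bytes([0]) + SLOT.pack(value) + SLOT.pack(0))
        f.flush()
        os.fsync(f.fileno())

def read(m):
    """Return the value in whichever slot the selector byte names."""
    return SLOT.unpack_from(m, 1 + m[0] * SLOT.size)[0]

def update(m, value, crash_before_commit=False):
    """Write the spare slot, flush, then flip the selector and flush again."""
    spare = 1 - m[0]
    SLOT.pack_into(m, 1 + spare * SLOT.size, value)
    m.flush()                # new value is persisted before it becomes visible
    if crash_before_commit:
        return               # simulated failure: selector still names the old slot
    m[0] = spare             # single-byte commit point
    m.flush()

path = os.path.join(tempfile.mkdtemp(), "cell.bin")
create(path, 7)
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), SIZE)
    update(m, 8)
    after_commit = read(m)
    update(m, 9, crash_before_commit=True)
    after_crash = read(m)
    m.close()

print(after_commit, after_crash)
```

A reader sees either the old value or the new one, never a half-written mix, because the new data is flushed before the selector changes. The ordering between the two flushes is exactly where caches and persistence interact, which is why the slide lists them together.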
Networks and machine usability
• Ethernet cadence: 1Gb, 10, 40, 100.
• No Moore’s Law
• Very high overheads
• New interest, even outside HPC, in
– RDMA
– Topological routing
– User-level comm
– Very fine grained, low latency
Error detection and correction
How will we know we got a correct answer?
How do we respond to an error flag?
Verify in-line (at every timestep) and recompute if there is a probable error!
Embedded auxiliary scheme
[Figure: apparent local error (vertical axis, 0 to 0.8) versus iteration (horizontal axis, 0 to 500)]
Bigger errors are easier to detect.
There is high recall (almost all errors found) and very few false positives.
(For a simple case: the heat equation.)
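The verify-and-recompute idea can be sketched for the 1D heat equation. The slides do not say which auxiliary scheme was used, so this sketch assumes step doubling (two half-size steps) as the auxiliary estimate, a fixed tolerance, and a fault injected by hand. All names and parameter values are illustrative.

```python
import math

def ftcs_step(u, r):
    """One explicit (FTCS) step of u_t = u_xx; end values are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

def checked_step(u, r, tol, fault=None):
    """Advance one step, verify it in-line, and recompute on a probable error.

    fault=(index, delta) simulates a silent corruption of the primary result.
    Returns (new_state, error_was_flagged).
    """
    primary = ftcs_step(u, r)
    if fault is not None:
        index, delta = fault
        primary[index] += delta
    # Auxiliary estimate (assumed): two half-size steps from the same state.
    auxiliary = ftcs_step(ftcs_step(u, r / 2.0), r / 2.0)
    apparent_error = max(abs(p - a) for p, a in zip(primary, auxiliary))
    if apparent_error > tol:
        return ftcs_step(u, r), True   # probable error: recompute the step
    return primary, False

def run(steps, fault_at=None, n=33, r=0.25, tol=1e-4):
    """Integrate a sine profile, optionally corrupting one step."""
    u = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
    flagged = []
    for k in range(steps):
        fault = (n // 2, 1e-2) if k == fault_at else None
        u, flag = checked_step(u, r, tol, fault)
        if flag:
            flagged.append(k)
    return u, flagged

clean, clean_flags = run(50)
repaired, repaired_flags = run(50, fault_at=10)
print(clean_flags, repaired_flags, clean == repaired)
```

In this sketch the fault-free run raises no flags, and the faulty run flags only the corrupted step and then matches the fault-free result. Step doubling triples the work per step, so it only illustrates the idea; an embedded scheme that reuses the primary computation would be much cheaper.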
A plan for progress
DRAM cannot provide adequate low-static-power capacity
No disks. Solid-state memory+storage
Twilight of the one-size-fits-all server
Low-latency communication
Self-checking algorithms + in-NVRAM checkpoints for resilience