The Memory is the Computer Rob Schreiber HP Labs DOE Salishan Conference, 2014

Dec 18, 2015


Carol Rich
Page 1

The Memory is the Computer

Rob Schreiber, HP Labs

DOE Salishan Conference, 2014

Page 2

Let’s Build an Exascale Machine

And make it useful.
– Adequate memory capacity
– No disks (except for archival store)
– 20 MW (good thing)
– More than a loosely-connected cluster
– It won’t always work!

Page 3

It will be a very parallel machine

Exaflops at gigahertz

a billion operations per clock
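
The arithmetic behind this slide can be sketched in a few lines. The sketch assumes "exaflops" means 10^18 operations per second and "gigahertz" means a 10^9 Hz clock; the variable names are illustrative only.

```python
# Assumed round numbers: 1 exaflop/s = 10**18 ops/s, 1 GHz = 10**9 cycles/s.
EXA_OPS_PER_SECOND = 10**18
CLOCK_HZ = 10**9

# Operations that must complete in every clock cycle, machine-wide.
ops_per_clock = EXA_OPS_PER_SECOND // CLOCK_HZ
print(f"{ops_per_clock:,} operations per clock")
```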

Page 4

How will we get that much parallelism?

• Big problems

• More than one problem

• Pipelines of problems
– Preprocess, mesh gen
– Solve, resolve, UQ, optimize
– Postprocess and visualize

• Solving the same problem twice!

Page 5

And we will need a lot of memory

• Many problems on the machine at one time

• Performance costs memory

• No disks!

Page 6

Disk is the New Tape

Page 7

How slow is a millisecond?

• From 1960 to the present: clock improves 6000X (2 microseconds to 0.3 nanoseconds).

• Latency: 11X
– IBM 1405 (1961): 23 RPS. Today: 250 RPS.

• Seek: 100X (IBM 1405: 600 ms. Today: 6 ms.)

The millisecond is not a reasonable unit in the exascale era.
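
The improvement factors on this slide can be checked with a short calculation. This is a sketch using only the figures quoted above; the last quantity, the number of present-day clock cycles that fit in one seek, is an added illustration and not a number from the talk.

```python
# Figures quoted on the slide.
CLOCK_1960_S = 2e-6                  # 2 microseconds per clock
CLOCK_NOW_S = 0.3e-9                 # 0.3 nanoseconds per clock
RPS_1961, RPS_NOW = 23, 250          # disk revolutions per second
SEEK_1961_MS, SEEK_NOW_MS = 600, 6   # seek times in milliseconds

clock_gain = CLOCK_1960_S / CLOCK_NOW_S   # roughly 6,700; the slide rounds to 6000
latency_gain = RPS_NOW / RPS_1961         # roughly 11
seek_gain = SEEK_1961_MS / SEEK_NOW_MS    # 100

# Added illustration: how many of today's clock cycles one seek costs.
cycles_per_seek = (SEEK_NOW_MS * 1e-3) / CLOCK_NOW_S

print(f"clock {clock_gain:.0f}x, latency {latency_gain:.1f}x, seek {seek_gain:.0f}x")
print(f"one seek is about {cycles_per_seek:.0e} clock cycles")
```

The clock has improved by three to four orders of magnitude while the mechanical figures have improved by one to two, which is the gap the slide's closing line points at.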

Page 8

We need a lot of memory

Page 9

Things are becoming unbalanced!

Argonne National Labs plans for leading-edge supercomputing. Thanks, Rick!

              2012       2016         2020          2024
Peak FLOP/s   10-20 PF   100-200 PF   500-2000 PF   2000-4000 PF
Memory        0.5-1 PB   5-10 PB      32-64 PB      50-100 PB
Flops/Bytes   20         20           16-32         40
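
The Flops/Bytes row follows from the two rows above it. The sketch below recomputes it, assuming the low end of each FLOP/s range pairs with the low end of the memory range (and high with high); that pairing is an assumption, as are the names used.

```python
# (peak PFLOP/s range, memory PB range) per year, copied from the table.
plans = {
    2012: ((10, 20), (0.5, 1)),
    2016: ((100, 200), (5, 10)),
    2020: ((500, 2000), (32, 64)),
    2024: ((2000, 4000), (50, 100)),
}

def flops_per_byte(pf_range, pb_range):
    """Ratio of peak FLOP/s to bytes, low end with low end, high with high."""
    return tuple(pf / pb for pf, pb in zip(pf_range, pb_range))

ratios = {year: flops_per_byte(pf, pb) for year, (pf, pb) in plans.items()}
for year, (low, high) in ratios.items():
    print(year, f"{low:g} to {high:g} flops per byte")
```

Under that pairing the 2020 column works out to 15.625 and 31.25, which the table rounds to 16-32.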

Page 10

Moore’s Law: Last Call?

“There’s no getting around the fact that we make these things out of atoms.”

- Gordon Moore

We need some new Moore’s Laws

Page 11

Density challenge

$10 per gigabyte (DRAM) today
– A DRAM exabyte costs $10B
– At exascale time, still billions
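
The $10B figure is a direct multiplication. A minimal sketch, assuming decimal units (1 EB = 10^9 GB):

```python
DRAM_DOLLARS_PER_GB = 10   # price quoted on the slide
GB_PER_EXABYTE = 10**9     # decimal units: 1 EB = 10**18 B, 1 GB = 10**9 B

exabyte_cost = DRAM_DOLLARS_PER_GB * GB_PER_EXABYTE
print(f"${exabyte_cost / 1e9:.0f}B for one exabyte of DRAM")
```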

Let’s consider other memory technologies

Page 12

How warm

As we move data from disk to memory, all other things being equal, the memory cools

Page 13

Cost, Density, Power

• Feature shrink – running on empty

• MLC technologies

• 3D technologies (not stacking, real 3D…)

• Static power and memory capacity

Page 14

Power challenges

DRAM
– Reduced-memory exascale
– Overfetch, leakage, refresh, scrubbing
– Giridhar et al., SC ’13: 100 PB can be achieved at 4.7 MW
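
To put the 4.7 MW figure in context, the sketch below compares it with the 20 MW target from the opening slide. The final line extrapolates to a full exabyte by assuming power scales linearly with capacity; that is an assumption made here for illustration, not a claim from the talk or from the cited paper.

```python
DRAM_POWER_MW = 4.7      # figure quoted on the slide for 100 PB
DRAM_CAPACITY_PB = 100
SYSTEM_BUDGET_MW = 20    # power target from the opening slide

budget_fraction = DRAM_POWER_MW / SYSTEM_BUDGET_MW    # share of the 20 MW budget
kw_per_pb = DRAM_POWER_MW * 1000 / DRAM_CAPACITY_PB   # kilowatts per petabyte

# Assumption (not from the talk): power grows linearly with capacity.
exabyte_power_mw = DRAM_POWER_MW * (1000 / DRAM_CAPACITY_PB)

print(f"{budget_fraction:.1%} of the budget, {kw_per_pb:.0f} kW per PB")
print(f"linear extrapolation to 1 EB: {exabyte_power_mw:.0f} MW")
```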

Nonvolatile memory is usually energy-costly to write, but it has no static power, no scrub, and no refresh.

We could make a lot more money if our customers had a bigger plug to plug our machines into.

Page 15

Flash

3D NAND Flash is BIG
128 Gb chips reported (vs. 4-8 Gb for DRAM).
But ...

Page 16

Characteristics

Page 17

Flash in Exascale Systems

Return of the millisecond

And it can wear out; so it would be a separate tier

Flash is the new disk

Page 18

NVRAM has a future

"I'm reasonably confident that ... nonvolatile technologies will replace flash and bring non-volatile memory very close [to compute] with dramatic improvements in latency. Architectures will clearly have to react and respond to that."

-- Justin Rattner

Page 19

New memory on the horizon


• Spin-Torque-Transfer RAM (STTRAM)
– Grandis (54nm, acquired by Samsung)

• Phase-Change RAM (PCRAM)
– Samsung (20nm, diode, up to 8Gb)
– Micron and Nokia – In phones now

• Resistive RAM (memristor)
– Panasonic (180nm process, 4-layer xpoint)
– Unity Semi (64MB, acquired by Rambus)
– Crossbar
– Several others under development

Page 20

ReRAM

– Very promising, still
– Technical issues
– Fits in the density range between DRAM and Flash
– Should scale well
– Low power – exabyte will be feasible

Page 21

NVM Programming

• Storage (files, databases)
• Persistent heaps
• Just plain memory
• The SNIA PM Programming TWG
• Caches and persistence, transactions, failure atomicity
• A co-design opportunity
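
To give a feel for the "just plain memory" style with an explicit persistence point, here is a minimal sketch that uses an ordinary memory-mapped file as a stand-in for byte-addressable NVM. It is not the SNIA programming model or any vendor API; the function name `bump_persistent_counter` and the one-counter layout are hypothetical. It also does not attempt failure atomicity: updates larger than a single small record would need logging or transactions, which is the harder part the slide alludes to.

```python
import mmap
import os
import struct

RECORD = struct.Struct("<Q")  # a single 8-byte unsigned counter

def bump_persistent_counter(path):
    """Increment a counter kept in a memory-mapped file, then flush it."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(RECORD.pack(0))  # first use: start the counter at zero
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), RECORD.size) as region:
            (value,) = RECORD.unpack_from(region, 0)   # ordinary load
            RECORD.pack_into(region, 0, value + 1)     # ordinary store
            region.flush()  # persistence point: push the update to the backing store
            return value + 1
```

Calling the function twice on the same path returns 1 and then 2, since the second call reads back what the first one flushed. On real persistent memory the flush step would correspond to cache-line flushes and fences, not a file write-back.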

Page 22

Attack of the killer cellphones

The end of Moore’s Law: a restoration of diversity

Page 23

Networks and machine usability

• Ethernet cadence: 1Gb, 10, 40, 100.
• No Moore’s Law
• Very high overheads
• New interest, even outside HPC, in
– RDMA
– Topological routing
– User-level comm
– Very fine grained, low latency

Page 24

Error detection and correction

How will we know we got a correct answer?

How do we respond to an error flag?

Verify in-line (at every timestep) and recompute if there is a probable error!

Embedded auxiliary scheme
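
The verify-and-recompute idea can be sketched for the 1D heat equation. The talk does not spell out its embedded auxiliary scheme, so this sketch swaps in a simple substitute: each explicit step is compared against two half-size steps, and the largest difference is treated as the apparent local error. The function names, the periodic boundary, the tolerance, and the `fault` hook that simulates a soft error are all assumptions made for illustration.

```python
import math

def ftcs_step(u, r):
    """One explicit step of the 1D heat equation on a periodic grid.

    r is alpha * dt / dx**2 and must stay at or below 0.5 for stability.
    """
    n = len(u)
    return [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) for i in range(n)]

def apparent_local_error(u_old, u_new, r):
    """Largest gap between the proposed step and two half-size steps."""
    reference = ftcs_step(ftcs_step(u_old, r / 2.0), r / 2.0)
    return max(abs(a - b) for a, b in zip(u_new, reference))

def checked_step(u, r, tol, fault=None):
    """Take a step, verify it in-line, and recompute once if it looks wrong.

    fault is an optional function that corrupts the result (a simulated soft error).
    Returns the accepted state and whether a recompute was triggered.
    """
    u_new = ftcs_step(u, r)
    if fault is not None:
        u_new = fault(u_new)
    if apparent_local_error(u, u_new, r) > tol:
        return ftcs_step(u, r), True   # probable error: redo the step
    return u_new, False

def corrupt_one_value(values):
    """Simulated soft error: perturb a single grid value."""
    damaged = list(values)
    damaged[10] += 0.5
    return damaged

u0 = [math.sin(2.0 * math.pi * i / 64) for i in range(64)]
clean, clean_flag = checked_step(u0, 0.25, 1e-3)
fixed, fault_flag = checked_step(u0, 0.25, 1e-3, fault=corrupt_one_value)
print(clean_flag, fault_flag, fixed == clean)
```

On a smooth solution the two estimates differ only by the truncation error, so the check stays quiet; a corrupted value stands far outside that band, which is why larger errors are the easy ones to catch. The check roughly triples the work per step in this naive form, so a practical scheme would use a much cheaper auxiliary estimate.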

Page 25

[Figure: apparent local error (0 to 0.8) plotted against iteration (0 to 500).]

Page 26

Bigger errors are easier to detect.
There is high recall (almost all errors found) and very few false positives.
(For a simple case – the heat equation)

Page 27

A plan for progress

DRAM cannot provide adequate low-static-power capacity

No disks. Solid-state memory+storage

Twilight of the one-size-fits-all server

Low-latency communication

Self-checking algorithms + in-NVRAM checkpoints = resilience

Page 28

Generic Disclaimer, and Acknowledgements

The views in this presentation are not necessarily those of HP.

Thanks to: Sarah Anthony, Cullen Bash, Al Davis, Paolo Faraboschi, Dick Henze, Kevin Lim, Moray McLaren, Naveen Muralimanohar, Jerry Rolia, Mike Tan