Expanding the World of Heterogeneous Memory Hierarchies
The Evolving Non-Volatile Memory Story
16 May 2019
Bill Gervasi, Principal Systems Architect
Agenda
Data Processing Challenges
Checkpointing
Memory Tiers
Persistence
Current Solutions
Seeking the Ideal
A New Standard
Mixed Mode Solutions
Distributed Processing
Security
Sharing Time
Data processing is great
Until something goes wrong
The Cost of Power Failure
[Figure: DRAM run phases alternate with checkpoints to storage]
Checkpointing degrades performance
Checkpointing burns power
Checkpointing sucks
[Figure: after a FAIL!, the run RESTARTs from the last storage checkpoint]
But checkpointing avoids data loss from failure
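The trade-off on these slides can be sketched as a toy simulation in Python. Everything in it is hypothetical: the function name, the step-based timing, and the one-time failures. Progress lives in volatile DRAM and is copied to storage every `interval` steps. A failure rolls the job back to the last checkpoint, so it costs at most the work done since that checkpoint.

```python
def run_with_checkpoints(total_steps: int, interval: int,
                         fail_at: set[int]) -> tuple[int, int]:
    """Toy model of a job that checkpoints DRAM state to storage.

    total_steps: units of work the job must complete.
    interval:    take a checkpoint after every `interval` completed steps.
    fail_at:     steps at which a one-time failure wipes volatile state.
    Returns (steps_executed, checkpoints_taken); steps_executed includes
    work that had to be redone after a failure.
    """
    checkpoint = 0      # last progress point saved to storage
    progress = 0        # progress held only in volatile DRAM
    executed = 0
    checkpoints = 0
    pending = set(fail_at)
    while progress < total_steps:
        progress += 1
        executed += 1
        if progress in pending:
            pending.discard(progress)
            progress = checkpoint   # volatile state lost: RESTART from checkpoint
            continue
        if progress % interval == 0:
            checkpoint = progress   # copy state to storage
            checkpoints += 1
    return executed, checkpoints


print(run_with_checkpoints(10, 4, set()))  # (10, 2)
print(run_with_checkpoints(10, 4, {6}))    # (12, 2): steps 5-6 are redone
print(run_with_checkpoints(10, 1, {6}))    # (11, 10): less redo, many more checkpoints
```

Checkpointing every step reduces the redone work, but it multiplies the number of checkpoints. That extra checkpointing is the performance and power cost described above.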
Data persistence is essential
System failure is a key factor in server software design
Storage access time impacts transaction granularity
The game we play to trade off performance, capacity, and cost
To reduce the penalties from checkpointing…
…move non-volatile storage closer to the CPU
Traditional Server Architecture Review
[Figure: CPU with I/O and memory control, surrounded by tiers of memory and networked resources; the tiers closest to the CPU are faster, with lower latency]
The Search for the Holy Grail
DATA PERSISTENCE
When we no longer fear power failure…
What if you could replace DRAM with a non-volatile memory?
You’d call it Memory Class Storage
The non-volatile memory revolution is under way
3DXP, ReRAM, PCM, MRAM, NRAM™
When was the last time you read about a new volatile memory?
From vacuum tubes to core memory to DRAM to NVRAM
THIS is why the term “Persistent Memory” is insufficient
The industry must distinguish between deterministic and non-deterministic persistent memory
Only “Memory Class Storage” is fully deterministic AND persistent
Not all “persistence” is created equal
[Figure: degrees of persistence across SRAM, DRAM, Flash, 3DXpoint, NRAM, FeRAM, MRAM, and ReRAM]
“Write endurance” determines HOW persistent
Wear leveling is needed if writes are limited
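The sketch below makes “wear leveling” concrete with textbook dynamic wear leveling. It is not the scheme of any particular device, and the class and method names are hypothetical. Each write to a logical block lands on the least-worn spare physical block. A single hot address therefore spreads its wear instead of burning out one region.

```python
class WearLeveler:
    """Toy dynamic wear leveler for endurance-limited memory.

    Every write to a logical block is redirected (out of place) to the
    least-worn free physical block; the previous copy becomes free.
    """

    def __init__(self, logical_blocks: int, physical_blocks: int):
        if physical_blocks <= logical_blocks:
            raise ValueError("need at least one spare physical block")
        self.wear = [0] * physical_blocks                # writes per physical block
        self.map = list(range(logical_blocks))           # logical -> physical
        self.free = set(range(logical_blocks, physical_blocks))  # spare blocks

    def write(self, lba: int) -> int:
        # least-worn free block; ties broken by block number for determinism
        target = min(self.free, key=lambda p: (self.wear[p], p))
        self.free.remove(target)
        self.free.add(self.map[lba])    # the old location is released
        self.map[lba] = target
        self.wear[target] += 1
        return target


wl = WearLeveler(logical_blocks=4, physical_blocks=6)
for _ in range(300):
    wl.write(0)          # hammer a single logical block
print(wl.wear)           # [100, 0, 0, 0, 100, 100]
```

Blocks 1 to 3 hold static data and never move. Real controllers add static wear leveling to rotate those blocks as well.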
Temperature sensitivity impacts long-term retention
[Chart: weeks of data retention as a function of temperature]
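DRAM interface is deterministic: data latency is FIXED
[Timing: READ WRITE WRITE READ WRITE, each followed by DATA at the same fixed latency]
Any endurance limit breaks determinism
[Timing: the same commands, but after three DATA transfers a HOUSEKEEPING interval causes the last two transfers to miss their slots (X X)]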
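The timing argument can be modeled in a few lines. A DDR host expects data a fixed number of cycles after each command. The sketch assumes a hypothetical device that stalls during a housekeeping window, and it shows which transfers miss their expected slot. The name `data_cycles` and the cycle numbers are illustrative.

```python
def data_cycles(issue_cycles, latency, busy_windows=()):
    """Cycle at which each command's data actually moves on the bus.

    issue_cycles: cycles at which READ/WRITE commands are issued.
    latency:      fixed command-to-data delay the host expects.
    busy_windows: sorted (start, end) cycles during which a hypothetical
                  device does internal housekeeping and cannot move data.
    """
    result = []
    for issue in issue_cycles:
        t = issue + latency
        for start, end in busy_windows:
            if start <= t < end:
                t = end     # transfer slips until housekeeping finishes
        result.append(t)
    return result


commands = [0, 4, 8, 12, 16]            # READ WRITE WRITE READ WRITE
expected = [c + 10 for c in commands]   # what a DDR host assumes

dram = data_cycles(commands, 10)
nvm = data_cycles(commands, 10, busy_windows=[(20, 30)])
print([a == e for a, e in zip(dram, expected)])  # [True, True, True, True, True]
print([a == e for a, e in zip(nvm, expected)])   # [True, True, True, False, False]
```

The last two transfers arrive late, and they also collide at the same cycle. A host built around fixed DDR latency cannot handle either condition.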
Memory Class Storage
Full DRAM speed
No endurance limits
Fully deterministic
NVRAM is a Memory Class Storage
For now…
NVRAM = Memory Class Storage
In the future?
[Figure: NVRAM and Memory Class Storage, with their future relationship left open]
Storage Class Memory is NOT a Memory Class Storage
[Figure: memory and storage landscape. Technologies shown: Hard Disk, SSD, NVMe, Flash, 3D NOR, 3DXpoint, Phase Change, Magnetic RAM, Resistive RAM, DDR DRAM, DDR NVRAM. Regions: Storage, Storage Class Memory, Memory Class Storage (≥ DRAM performance, = DRAM endurance, ≥ DRAM capacity), and a “Wasteland”. A follow-up build labels each region Deterministic or Non-Deterministic.]
DRAM
NVDIMM-N
Optane
NVRAM Memory Class Storage
NVDIMM-P
DRAM
[Timing: ACT RD WR PRE sequences repeat, interrupted by REFRESH]
Itty bitty leaky capacitors lose charge
Refresh time consumes up to 15% of bandwidth
On power fail, you lose
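The refresh penalty is roughly the refresh duration divided by the refresh interval. The quick Python check below uses assumed JEDEC DDR4 timing values (tRFC, tREFI), not figures from this talk. It shows how the overhead climbs toward the 15% quoted above for larger dies at high temperature.

```python
def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a DRAM rank is unavailable because of refresh.

    One REFRESH, which blocks the rank for tRFC, is due on average every tREFI.
    """
    return t_rfc_ns / t_refi_ns


# Assumed DDR4 figures: 8Gb die, tRFC = 350 ns, tREFI = 7.8 us (normal temperature)
print(f"{refresh_overhead(350, 7800):.1%}")   # 4.5%
# Assumed DDR4 figures: 16Gb die, tRFC = 550 ns, tREFI = 3.9 us (above 85 C)
print(f"{refresh_overhead(550, 3900):.1%}")   # 14.1%
```

This measures the time the rank is blocked. How much of that shows up as lost bandwidth depends on the traffic pattern.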
[Figure: a DRAM run ends in FAIL!]
NVDIMM-N
[Figure: NVDIMM-N block diagram with DRAM array, Flash backup, NVM control, isolation buffers, two voltage regulators, energy source, host system CPU, and Power Fail signal]
Use DRAM normally
On power fail, copy to Flash
Power restored, copy to DRAM
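The NVDIMM-N behavior above is a small state machine. This sketch uses hypothetical state and event names, which are not JEDEC register or signal names. It also omits error cases such as the energy source running out mid-save.

```python
from enum import Enum, auto


class NvdimmState(Enum):
    NORMAL = auto()       # host reads and writes the DRAM directly
    SAVING = auto()       # on the energy source: copy DRAM to Flash
    OFF = auto()          # image safe in Flash, module unpowered
    RESTORING = auto()    # power back: copy Flash to DRAM, host waits


TRANSITIONS = {
    (NvdimmState.NORMAL, "power_fail"): NvdimmState.SAVING,
    (NvdimmState.SAVING, "save_done"): NvdimmState.OFF,
    (NvdimmState.OFF, "power_restored"): NvdimmState.RESTORING,
    (NvdimmState.RESTORING, "restore_done"): NvdimmState.NORMAL,
}


def next_state(state: NvdimmState, event: str) -> NvdimmState:
    """Advance the module state; reject events that make no sense here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not valid in state {state.name}") from None


state = NvdimmState.NORMAL
for event in ("power_fail", "save_done", "power_restored", "restore_done"):
    state = next_state(state, event)
    print(f"{event:>15} -> {state.name}")
```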
[Figure: NVDIMM-N run, FAIL!, switch to battery power, copy DRAM to Flash; on RESTORE, copy Flash to DRAM and the run resumes]
NVDIMM-N
Copy DRAM to Flash: 1-2 MINUTES
Copy Flash to DRAM: 1-2 MINUTES
One power fail cycle pays for a LOT of protection
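The 1-2 minute figures follow from dividing the DRAM image size by the sustained Flash bandwidth. The capacity and bandwidth below are hypothetical values chosen for illustration.

```python
def copy_time_s(dram_gb: float, flash_mb_per_s: float) -> float:
    """Seconds to stream the full DRAM image to Flash (or back)."""
    return dram_gb * 1000 / flash_mb_per_s


# Hypothetical module: 32 GB of DRAM, 400 MB/s sustained Flash bandwidth
seconds = copy_time_s(32, 400)
print(f"{seconds:.0f} s = {seconds / 60:.1f} min each way")   # 80 s = 1.3 min each way
```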
Optane
[Figure: 3DXpoint array and NVM control attached to the host system CPU]
Reads (RD to Data) are slow
Writes (WR to Data) are deathly slow
Could be used as a very slow DRAM, but more commonly used as expansion
Faster than Flash!!!
But vs DRAM? Meh
Decent capacity, though
Optane
App Direct: 512GB Optane = 512GB visible
Memory Mode (DRAM as cache): 512GB Optane + 64GB DRAM = 512GB visible
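In Memory Mode, the 64GB of DRAM is a cache in front of the 512GB of Optane. That is why it adds nothing to visible capacity. Intel documents this cache as direct-mapped; the toy model below assumes that and ignores write-back traffic. The class name and sizes are illustrative.

```python
class DirectMappedCache:
    """Toy model of DRAM used as a direct-mapped cache for a larger, slower
    memory: each far-memory line can live in exactly one DRAM slot."""

    def __init__(self, slots: int):
        self.tags = [None] * slots   # which far-memory line each slot holds
        self.hits = 0
        self.misses = 0

    def access(self, line: int) -> bool:
        slot = line % len(self.tags)
        if self.tags[slot] == line:
            self.hits += 1
            return True
        self.tags[slot] = line       # miss: fetch the line from slow memory
        self.misses += 1
        return False


# 8 slots in front of a hypothetical 64-line far memory mirrors 64GB : 512GB (1:8)
fits = DirectMappedCache(8)
for _ in range(10):
    for line in range(8):            # working set fits in the cache
        fits.access(line)
print(fits.hits, fits.misses)        # 72 8

clash = DirectMappedCache(8)
for _ in range(10):
    clash.access(0)                  # lines 0 and 8 map to the same slot
    clash.access(8)
print(clash.hits, clash.misses)      # 0 20
```

A working set that fits in the DRAM runs at DRAM speed. Two addresses that share a slot miss every time and run at Optane speed.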
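NVDIMM-P Protocol
[Figure: NVDIMM-P module with a DRAM cache, a non-volatile memory array (any kind), NVM control, a voltage regulator, and a small energy source, attached to the host system CPU]
[Timing: host issues Read A; module signals RSP; host issues Send; Data A returns. Host issues Read B and Read C; Data C returns before Data B, each after its own RSP and Send]
New non-deterministic protocol
Not backward compatible with DDR
Requires NVDIMM-P aware CPU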
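The protocol diagram can be replayed with a toy model. Each read carries an ID, the module raises RSP when data is ready, and the host answers with Send. Data then returns tagged with its ID, so completions may arrive out of order. Event names follow the slide; the JEDEC NVDIMM-P specification defines its own command set. This sketch does not model bus conflicts or timing parameters.

```python
import heapq


def nvdimm_p_bus(reads: list[tuple[str, int, int]]) -> list[str]:
    """Toy model of transaction-based reads.

    reads: (request_id, issue_cycle, media_latency). media_latency varies
    because the module may hit its DRAM cache or go to slow media.
    Returns bus events in time order.
    """
    events = []          # (cycle, event name)
    ready = []           # min-heap of (ready_cycle, request_id)
    for req_id, issue, latency in reads:
        events.append((issue, f"Read {req_id}"))
        heapq.heappush(ready, (issue + latency, req_id))
    while ready:
        t, req_id = heapq.heappop(ready)
        events.append((t, "RSP"))                 # module: data is ready
        events.append((t + 1, "Send"))            # host: send it
        events.append((t + 2, f"Data {req_id}"))  # data returns tagged with its ID
    events.sort(key=lambda e: e[0])               # stable sort: ties keep insertion order
    return [name for _, name in events]


log = nvdimm_p_bus([("A", 0, 5), ("B", 10, 20), ("C", 11, 5)])
print(" | ".join(log))
# Read A | RSP | Send | Data A | Read B | Read C | RSP | Send | Data C | RSP | Send | Data B
```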
NVDIMM-P Persistence Options
Volatile mode: no persistence
Explicit FLUSH command
Battery backup à la NVDIMM-N
Reduced energy, cacheless
NVRAM
DRAM speed
Non-volatility
Scalable beyond DRAM
Low power
Low cost
Unlimited write endurance
Wide temperature range
Flexible fabrication & application
[Figure: in the host system, NVRAM Memory Class Storage takes the place of DRAM]
Drop-in replacement for DRAM
Permanently persistent
Always available
Fully deterministic
DDR5 NVRAM
NRAM™, ReRAM*, MRAM*, PCM*
* Future generation devices
Comparing DRAM & NVRAM
No refresh is required
“Self refresh” can be power OFF
Some timing differences (but deterministic!)
Data persistence definitions
Greater per-die capacity
NRAM™, ReRAM, MRAM, and PCM differ (≠) in timings, precharge requirement, and persistence definition
DDR5 NVRAM Specification brings coherence
DRAM: IDLE → REFRESH, “350 ns”
NVRAM: IDLE → REFRESH = NOP, “0 ns”
Refresh command is not needed; decoded as NOP for compatibility
DRAM: IDLE → SELF REFRESH, with REFRESH and FREQUENCY CHANGE; power burned
NVRAM: IDLE → SELF REFRESH, with FREQUENCY CHANGE; “No” power burned
DRAM: IDLE, ACTIVATE, WRITE/READ, PRECHARGE
NVRAM: IDLE, READ, WRITE (no PRECHARGE state)
Precharge command is not needed; decoded as NOP for compatibility
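The two compatibility rules above (REFRESH and PRECHARGE decoded as NOP) amount to a one-line change in a command decoder. Below is a minimal sketch with hypothetical names. It uses the slide's 350 ns refresh time to show where the time goes.

```python
from enum import Enum


class Cmd(Enum):
    ACT = "ACT"
    RD = "RD"
    WR = "WR"
    PRE = "PRE"
    REF = "REF"
    NOP = "NOP"


# Accepted for DDR5 compatibility, but not needed by NVRAM
NVRAM_NOPS = {Cmd.PRE, Cmd.REF}


def decode(cmd: Cmd, nvram: bool) -> Cmd:
    """Operation the device actually performs when it receives `cmd`."""
    if nvram and cmd in NVRAM_NOPS:
        return Cmd.NOP
    return cmd


def refresh_busy_ns(trace: list[Cmd], nvram: bool, t_rfc_ns: int = 350) -> int:
    """Time the device is blocked by REFRESH over a command trace."""
    return sum(t_rfc_ns for c in trace if decode(c, nvram) is Cmd.REF)


trace = [Cmd.ACT, Cmd.RD, Cmd.WR, Cmd.PRE, Cmd.REF,
         Cmd.ACT, Cmd.RD, Cmd.PRE, Cmd.REF]
print(refresh_busy_ns(trace, nvram=False))  # 700
print(refresh_busy_ns(trace, nvram=True))   # 0
```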
Persistence Definitions*
Intrinsic: immediately after WRITE
Extrinsic: after FLUSH command
Power Fail: on NVRAM RESET
* Discussions ongoing
Intrinsic persistence: WR WR WR (data is persistent after each write)
Extrinsic persistence: WR WR FLUSH WR WR FLUSH
Power fail persistence: WR WR WR WR WR RESET
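One way to read the three definitions is that each one names the command that makes earlier writes durable. The sketch below assumes that reading, since the definitions are still under discussion. It treats RESET as the completion of the power-fail sequence. The function and mode names are hypothetical.

```python
# Hypothetical reading: which command makes earlier writes durable in each mode
COMMIT = {"intrinsic": "WR", "extrinsic": "FLUSH", "power_fail": "RESET"}


def durable_writes(trace: list[tuple[str, str]], fail_at: int, mode: str) -> list[str]:
    """Addresses whose writes are guaranteed persistent if power is lost
    just before trace[fail_at] executes."""
    commit_op = COMMIT[mode]
    pending: list[str] = []
    durable: list[str] = []
    for op, addr in trace[:fail_at]:
        if op == "WR":
            pending.append(addr)
        if op == commit_op:          # intrinsic: every WR commits itself
            durable.extend(pending)
            pending.clear()
    return durable


trace = [("WR", "a"), ("WR", "b"), ("FLUSH", ""),
         ("WR", "c"), ("WR", "d"), ("FLUSH", "")]
for mode in COMMIT:
    print(mode, durable_writes(trace, fail_at=5, mode=mode))
# intrinsic ['a', 'b', 'c', 'd']
# extrinsic ['a', 'b']
# power_fail []
```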
DDR5 DRAM is limited to 32Gb per die
DDR5 NVRAM enables up to 128Tb per die
DDR5 SDRAM: ACT RD WR ACT RD WR ACT RD WR
DDR5 NVRAM: REXT ACT RD WR ACT RD WR REXT ACT RD WR
Row Extension adds up to 12 more bits of addressing
Backward compatible with DDR5: acts like REXT = 0 until needed
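The 128Tb figure is the 32Gb DRAM limit scaled by the 12 extra row-address bits. Each added address bit doubles the number of addressable rows.

```python
DRAM_DIE_LIMIT_GBIT = 32   # DDR5 DRAM per-die limit, per the slide
REXT_BITS = 12             # extra row-address bits carried by REXT

nvram_die_limit_gbit = DRAM_DIE_LIMIT_GBIT * 2**REXT_BITS
print(nvram_die_limit_gbit, "Gb =", nvram_die_limit_gbit // 1024, "Tb")
# 131072 Gb = 128 Tb
```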
[Figure: DDR5 SDRAM has bank buffers 0 through 31, each holding a ROW address and its COLUMNS; in DDR5 NVRAM, each bank buffer also holds a REXT value alongside its ROW]
“ROW” includes bank group & bank…
Row Extension Example
REXT A
ACT BANK W, ROW K
ACT BANK X, ROW L
ACT BANK Y, ROW M
REXT B
ACT BANK Z, ROW N
READ BANK W: Row A + K
READ BANK X: Row A + L
READ BANK Y: Row A + M
READ BANK Z: Row B + N
REXT C
READ BANK Y: Row A + M
ACT BANK W, ROW P
READ BANK W: Row C + P
WRITE BANK X: Row A + L
ACT BANK X, ROW L
WRITE BANK X: Row C + L
READ BANK Z: Row B + N
REXT A
ACT BANK X, ROW R
READ BANK X: Row A + R
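The example can be replayed with a toy model. REXT latches an extension value, and each ACT captures the current extension together with its row for that bank. Later READ/WRITE commands use the bank's open row. The class and helper names are hypothetical. `full_row` assumes the extension supplies the upper row-address bits, with an illustrative 16-bit row field.

```python
class RowExtensionDevice:
    """Toy model of DDR5 NVRAM Row Extension.

    REXT latches an extension value; each ACT captures (extension, row)
    for its bank; READ/WRITE use whatever row that bank has open.
    """

    def __init__(self) -> None:
        self.rext = 0            # behaves like plain DDR5 until a REXT is issued
        self.open_rows = {}      # bank -> (extension, row)

    def execute(self, cmd: str, *args):
        if cmd == "REXT":
            (self.rext,) = args
        elif cmd == "ACT":
            bank, row = args
            self.open_rows[bank] = (self.rext, row)
        elif cmd in ("READ", "WRITE"):
            (bank,) = args
            ext, row = self.open_rows[bank]
            return f"{cmd} BANK {bank}: Row {ext} + {row}"
        else:
            raise ValueError(f"unknown command {cmd}")
        return None


def full_row(ext: int, row: int, row_bits: int = 16) -> int:
    """Hypothetical physical row index: the extension supplies the upper bits.
    row_bits is illustrative, not a value from the specification."""
    return (ext << row_bits) | row


script = [
    ("REXT", "A"), ("ACT", "W", "K"), ("ACT", "X", "L"), ("ACT", "Y", "M"),
    ("REXT", "B"), ("ACT", "Z", "N"),
    ("READ", "W"), ("READ", "X"), ("READ", "Y"), ("READ", "Z"),
    ("REXT", "C"), ("READ", "Y"), ("ACT", "W", "P"), ("READ", "W"),
    ("WRITE", "X"), ("ACT", "X", "L"), ("WRITE", "X"), ("READ", "Z"),
    ("REXT", "A"), ("ACT", "X", "R"), ("READ", "X"),
]
dev = RowExtensionDevice()
for step in script:
    out = dev.execute(*step)
    if out:
        print(out)
```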
Row Extension Replacement Example: the same command sequence as the Row Extension Example. After REXT C, re-activating banks W and X replaces their extension (Row C + P, Row C + L), while banks Y and Z keep Row A + M and Row B + N.
NVRAM Memory Class Storage
[Figure: with DRAM, runs alternate with checkpoints to storage; with NVRAM, runs continue back to back]
Checkpointing can be made to persistent memory
OR
Checkpointing can be turned off completely
Phase 1: instead of checkpointing DRAM to storage, run from NVRAM and checkpoint to NVRAM
Phase 2: run from NVRAM with no checkpoint
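The gain from the two phases is simple arithmetic: run time equals useful work plus checkpoint count times checkpoint cost. All numbers below are hypothetical.

```python
def total_time_ms(work_ms: int, checkpoints: int, checkpoint_ms: int) -> int:
    """Wall time for a job: useful work plus time spent checkpointing."""
    return work_ms + checkpoints * checkpoint_ms


WORK_MS, N = 100_000, 10                    # hypothetical: 100 s of work, 10 checkpoints
scenarios = {
    "DRAM, checkpoint to storage": 2_000,   # hypothetical 2 s per checkpoint
    "Phase 1: checkpoint to NVRAM": 100,    # hypothetical 0.1 s per checkpoint
    "Phase 2: no checkpoint": 0,
}
for name, cost in scenarios.items():
    t = total_time_ms(WORK_MS, N, cost)
    print(f"{name}: {t / 1000:.1f} s ({WORK_MS / t:.0%} useful)")
# DRAM, checkpoint to storage: 120.0 s (83% useful)
# Phase 1: checkpoint to NVRAM: 101.0 s (99% useful)
# Phase 2: no checkpoint: 100.0 s (100% useful)
```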
Keep in mind…
Power failure is not the only thing to fear
Checkpoints may include system failure
Knowing when a task may resume is complicated
Remember Those Persistence Definitions
Immediately after WRITE: tasks may be safe in nanoseconds
After FLUSH command: tasks may be safe in microseconds
On NVRAM RESET: tasks may not be safe until system stability is confirmed
Performance, Capacity, Persistence
System designers have a lot of options to balance
Homogeneous Main Memory
DRAM
MCS
Optane
NVDIMM-N
NVDIMM-P
Heterogeneous Main Memory
DRAM + Optane
MCS + NVDIMM-P
MCS + Optane
When capacity meets persistence
[Figure: DRAM, NVDIMM-N, Optane, NVDIMM-P, and NVRAM Memory Class Storage compared at 32GB, 64GB, and 512GB capacity points]
Homogeneous Main Memory Combinations
               DRAM    NVDIMM-N   Optane   NVDIMM-P   MCS
Data Safe      No      Yes        Yes      Yes        Yes
Performance    Best    Best       Worst    Mid        Best+
Capacity       1.0X    0.5X       10X      10X        1X+
Heterogeneous Main Memory Combinations
                    Data Safe   Performance   Capacity
DRAM + Optane       No          High          6X
MCS + Optane        Yes         High          6X
MCS + NVDIMM-P      Yes         High          6X
DRAM + NVDIMM-P     No          High          6X
Homogeneous Main Memory Combinations
Software need not care
All functions take the same time
Heterogeneous Main Memory Combinations
Software is encouraged to put critical functions in faster memory
Slower memory is often mounted as a RAM drive
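Placing critical functions in faster memory is a packing problem. A common heuristic ranks data by accesses per byte and fills the fast tier greedily. It is not optimal in general, but it captures the idea. The sketch below uses hypothetical object names and sizes.

```python
def place(objects: dict[str, tuple[int, int]], fast_capacity: int) -> dict[str, str]:
    """Greedy tier placement.

    objects: name -> (size, accesses). Objects with the most accesses per unit
    of size go to the fast tier while it has room; the rest go to the slow tier.
    """
    placement = {}
    room = fast_capacity
    by_density = sorted(objects.items(),
                        key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
    for name, (size, _accesses) in by_density:
        if size <= room:
            placement[name] = "fast"
            room -= size
        else:
            placement[name] = "slow"
    return placement


objects = {                 # hypothetical sizes (GB) and access counts
    "index": (8, 900),
    "hot_table": (16, 800),
    "log": (32, 50),
    "archive": (64, 10),
}
print(place(objects, fast_capacity=24))
# {'index': 'fast', 'hot_table': 'fast', 'log': 'slow', 'archive': 'slow'}
```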
Software support via DAX assists in moving…
…from mounted drives…
…to RAM drive…
…to direct access mode
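With DAX, a file on persistent memory can be memory-mapped and updated with ordinary loads and stores, without a RAM-drive copy in the page cache. Below is a minimal sketch using Python's standard `mmap` module. The function name and file path are hypothetical, and the sketch assumes a filesystem mounted with the `dax` option (ext4 and XFS support one). On any other filesystem the same code still runs, just through the page cache. Production code typically uses a library such as PMDK, which flushes CPU caches directly rather than calling msync.

```python
import mmap
import os
import tempfile


def store_bytes(path: str, offset: int, payload: bytes, size: int = 4096) -> None:
    """Write `payload` at `offset` in a memory-mapped file using plain stores.

    On a DAX-mounted filesystem the mapping reaches the persistent memory
    directly; elsewhere it goes through the page cache. flush() (msync)
    asks the OS to make the mapped range durable in both cases.
    """
    if offset + len(payload) > size:
        raise ValueError("payload does not fit in the mapping")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)        # the mapping needs a file this large
        with mmap.mmap(fd, size) as mm:
            mm[offset:offset + len(payload)] = payload   # ordinary memory writes
            mm.flush()
    finally:
        os.close(fd)


# Hypothetical location; on a DAX mount this might be e.g. /mnt/pmem/records.bin
path = os.path.join(tempfile.mkdtemp(), "records.bin")
store_bytes(path, 128, b"hello, persistence")
```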
The Power of Zero Power
Putting a Node to Sleep
[Figure: Operating Mode and Self Refresh Mode]
Instant On means power must stay alive
Refresh operations burn significant power
Memory Class Storage can be turned off entirely
[Figure: Operating Mode and Power Off]
DDR5 memory modules have on-DIMM voltage regulation (PMIC)
DIMM power may be shut off independently of system power
[Figure: the system motherboard supplies system power; on the memory module (DIMM), the PMIC produces module power for the data buffers and memory media]
Multiple power management options
System power off; both DIMMs off
System power on & both DIMMs off
System power on & DIMM1 on, DIMM2 off
[Figure: DIMM1 and DIMM2, each with its own PMIC, data buffers, and memory media, fed from system power]
Nantero NRAM™
My favorite NVRAM
Full presentation on Wednesday…
Van der Waals energy barrier keeps CNTs apart or together
Data retention >300 years @ 300 °C, >12,000 years @ 105 °C
Stochastic array of hundreds of nanotubes per cell
[Figure: CNT cell between two electrodes]
5 ns balanced read/write performance
No temperature sensitivity
NRAM Data Retention = 12,000 Years
[Figure: timeline markers at 2,500, 4,500, and 10,000 years ago]
Array size tuned to the size of drivers & receivers
[Figure: NRAM layer of tiles along X, Y, and Z, with drivers, receivers, and an I/O PHY]
64 Kb tile × 256 K tiles = 16 Gb
Chip-level timing is a function of bit line flight times
Replicate this “tile” as needed for device capacity
Add I/O drivers to emulate any PHY needed
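The tile arithmetic checks out in binary units: 64 Kb is 2^16 bits and 256 K is 2^18 tiles, giving 2^34 bits, or 16 Gb. The helper for other capacities is hypothetical.

```python
TILE_BITS = 64 * 1024        # one 64 Kb NRAM tile
TILES_PER_DIE = 256 * 1024   # 256 K tiles


def tiles_needed(capacity_gbit: int) -> int:
    """Tiles to replicate for a die of `capacity_gbit` gigabits (binary units)."""
    return capacity_gbit * 2**30 // TILE_BITS


die_bits = TILE_BITS * TILES_PER_DIE
print(die_bits == 16 * 2**30)              # True: 2**16 * 2**18 = 2**34 bits = 16 Gb
print(tiles_needed(16), tiles_needed(32))  # 262144 524288
```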
DDR4, DDR5 NRAM
[Figure: NRAM device block diagram: address input with row, column, and bank decode; chip ID die selector; carbon nanotube arrays; a SECDED ECC engine between 64 data bits and 72 stored bits; FIFOs with data strobes; x4/x8 data I/O]
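The 64-bit and 72-bit widths correspond to a SECDED code: 64 data bits plus 8 check bits. The sketch below is a textbook extended Hamming (72,64) code, assumed for illustration. The slides do not describe the actual layout of the NRAM ECC engine.

```python
from typing import Optional

# Positions 1..71 hold a Hamming code with check bits at powers of two;
# position 0 holds an overall parity bit (extended Hamming = SECDED).
CHECK_POS = [1, 2, 4, 8, 16, 32, 64]
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]   # the 64 non-power-of-two slots


def secded_encode(data: int) -> list[int]:
    """Encode a 64-bit word into 72 bits (64 data + 7 Hamming + 1 parity)."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    for c in CHECK_POS:
        # each check bit makes the positions it covers have even parity
        code[c] = sum(code[j] for j in range(1, 72) if j & c) % 2
    code[0] = sum(code) % 2          # make overall parity of all 72 bits even
    return code


def secded_decode(code: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for j in range(1, 72):
        if code[j]:
            syndrome ^= j            # XOR of positions of all set bits
    parity = sum(code) % 2           # 1 means an odd number of flipped bits
    fixed = list(code)
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1 and syndrome < 72:
        fixed[syndrome] ^= 1         # single error (syndrome 0 = the parity bit)
        status = "corrected"
    else:
        return None, "uncorrectable" # double error: nonzero syndrome, even parity
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= fixed[pos] << i
    return data, status


word = 0x0123_4567_89AB_CDEF
cw = secded_encode(word)
one = list(cw); one[37] ^= 1                 # single-bit error
two = list(cw); two[5] ^= 1; two[40] ^= 1    # double-bit error
print(secded_decode(cw)[1], secded_decode(one)[1], secded_decode(two)[1])
# ok corrected uncorrectable
```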
Architectural improvements raise data throughput 15% or greater at the same clock frequency
[Chart: bandwidth (larger is better), DDR4/DDR5 base throughput vs. DDR4/DDR5 NRAM, with gains of 15-20% from elimination of refresh, tFAW restrictions, bank group restrictions, power states, and inter-die delays]
NVRAM Memory Class Storage
[Figure: RDIMM populated with NRAM devices]
Plugs into an RDIMM slot
Appears to the CPU as DRAM
Memory controller may optionally be tuned for NVRAM
One less layer of marshmallows to deal with
[Figure: layers labeled fully deterministic persistence and non-deterministic persistence]
Know Your Enemy
Would you rather…
Step on broken glass?
Or some jacks?
A LEGO?
…about those energy stores…
Batteries: high capacity, high energy density, low reliability
Supercapacitors: medium capacity, low energy density, degrade over time
Tantalums (etc.): low capacity, low energy density… but stable
[Figure: storage device with Flash or Storage Class Memory, a storage controller, I/O, a DRAM cache, and an energy source]
Energy needed for backup of DRAM cache
[Figure: the same device with NVRAM in place of the DRAM cache]
Eliminate the need for backup energy
More room for storage
NVRAM Changes the Math
DRAM cache limited by energy available
No DRAM? Cache size dictated by cost/performance
1GB/TB
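The slide does not derive the 1GB/TB ratio. A common explanation, assumed here, is a flat page-level mapping table with one 4-byte entry per 4 KiB Flash page.

```python
def mapping_table_bytes(capacity_bytes: int, page_bytes: int = 4096,
                        entry_bytes: int = 4) -> int:
    """DRAM needed for a flat logical-to-physical map: one entry per page."""
    return capacity_bytes // page_bytes * entry_bytes


TIB = 2**40
GIB = 2**30
print(mapping_table_bytes(TIB) / GIB)   # 1.0 -> 1 GiB of map per TiB of Flash
```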
Switching gears again…
…to Systems Evolution
Pop quiz: How many CPUs in a 1980s PC?
One?
Graphics Adapter
Modem
Network Adapter
SoundBlaster
They were called “DSPs”: Digital Signal Processors
They put processing next to the data
They were killed by “Native Signal Processing”: drivers plus analog front end devices
With NSP…
[Figure: cost ($ $ $) and power (W W W) comparison]
So why do it?
Now We Are Trending Back
[Figure: CPUs, memory, storage, AI, and FPGA resources connected by a fabric]
[Figure: processor elements, I/O elements, and storage elements connected through bridges to a low latency fabric]
Distributed resources
In-memory computing
Application-specific computing
Artificial intelligence and deep learning
Security
[Figure: a low latency fabric linking a network adapter, an artificial intelligence accelerator, a search engine, a graphics accelerator, a standard CPU (HTML processing, human interface management), a memory array, and filesystem-aware storage]
Example AI accelerator
[Figure: execution units, each with SRAM; I/O; NNP control; four HBM stacks; Tbps links]
SIMD architectures
Matrix interconnections
Fast pipes still limit load/save time
Challenges:
• Model checkpointing
• Data loss on power fail
• Temperature sensitivity
Back propagation algorithms complicate things
Data loss problems are amplified
Checkpointing is highly time and bandwidth consuming
The more distributed memory gets, the harder it is to load and unload
[Figure: grid of distributed MEM blocks]
NVRAM TO THE RESCUE!
Replacing dynamic memory with persistent memory resolves the data loss issues
[Figure: the example AI accelerator rebuilt with Memory Class Storage (MCS): execution units with SRAM, I/O and NNP control, MCS arrays, and MCS-based HBM stacks]
Just leave the data in place as long as you want
Replace DRAM with NVRAM
Replace eRAM with NVRAM
SRAM & Registers
The final frontier…
Continuing to look for ways to bringMemory Class Storage down under 1ns
It will happen
Faster edge rates
Voltage adjustment
Better error check
Shadow registers
Getting smarter
DATA PERSISTENCE
When we no longer fear power failure…
Full END TO END persistence
Are we getting near the day when we look back
at volatile memory…
…and LAUGH?
……b…bu…but…but…
Persistent data introduces challenges, too
Data is ALWAYS there!
Data security is a growing concern
So many potential breaches
Application opens data from previous application
Memory moved from one system to another
Spy devices on memory buses
Infection via hack
Infection via spy devices
Password: X2.Hd44**3#jj0%
General trend is to encrypt data before transmission or storage
Keep the bad guys out
[Figure: data appears only in encrypted form, X2.Hd44**3#jj0%, at both ends]
[Figure: module with a DRAM cache, a non-volatile memory array (any kind), smart NVM control, a voltage regulator, and a small energy source, attached to the host system CPU]
Some are adding in-memory compute functions, including encryption
Works as long as the bus is secure
Encryption quality may be limited by block transfer size
Management of many keys can get complicated quickly
Password: X2.Hd44**3#jj0%
ISO/IEC 11889 (Trusted Platform Module)
Summary
Power Fail Sucks
Saving Data is a Pain
Need tiers of memory & storage
Persistence is Essential
Today’s Solutions Help
But We Can Do Better: DDR5 NVRAM Spec in Progress
Mix & Match Memories
Data Distribution Challenges
Persistence Complications
Sharing Time
I’m here to learn too
What do you deal with?