
A Prototype Storage Subsystem based on Phase Change Memory

Nov 28, 2014


IBM Research

IBM scientists demonstrated, for the first time, a hybrid storage and caching subsystem, code-named Project Theseus, at the recent 2014 Non-Volatile Memories Workshop in San Francisco, California. Notably, they did so using two-year-old PCM chip prototypes.
Transcript
Page 1: A Prototype Storage Subsystem based on Phase Change Memory

© 2013 IBM Corporation

PSS: A Prototype Storage Subsystem based on PCM

IBM Research – Zurich: Ioannis Koltsidas, Roman Pletka, Peter Mueller, Thomas Weigold, Evangelos Eleftheriou

University of Patras: Maria Varsamou, Athina Ntalla, Elina Bougioukou, Aspasia Palli, Theodore Antonakopoulos

Page 2: A Prototype Storage Subsystem based on Phase Change Memory


Phase Change Memory (PCM)

Based on the thermal threshold switching effect of chalcogenide materials

Two phases: amorphous and crystalline. A Set operation crystallizes the cell; a Reset operation returns it to the amorphous phase.

Phases have very different electrical resistances (ratio of 1:100 to 1:1000)

Transition between phases by controlled heating and cooling

Read time: 100-300 nsec

Program time: 10-150 μsec

PCM cells can be reprogrammed at least 10^6 times

Performance and price characteristics between DRAM and Flash

Page 3: A Prototype Storage Subsystem based on Phase Change Memory


Placing PCM in Servers and Storage Systems

[Figure: block diagrams of a server system and a storage system, marking candidate locations for PCM. Labeled components: CPUs, DRAM, DRAM cache, I/O bus, HBA, HA, RAID controller, hybrid PCIe-attached SSD, hybrid SAS/SATA-attached SSD, HDD, SSD, all-Flash array, metadata tables.]

Page 4: A Prototype Storage Subsystem based on Phase Change Memory

PSS: A PCM-based PCI-e Prototype Card

Goal: Architect a PCM-based device and implement a fully-functional, high-performance PCI-e card

Take advantage of the characteristics of PCM and mitigate its limitations

Target is workloads dominated by 4kB requests

Simple, lightweight hardware design

System integration of multiple cards through software

Emphasis on consistently low, predictable latency

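[Figure: PSS block diagram. Host side: block device driver. Card side (PCM PCI-e card): lean PCM controller, PCM pipes #1 to #n, PCM channels, PCM chips. Host and card are connected by a multi-lane PCI-e bus. Data unit sizes shown: 4kB and 512B.]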

Limited in capacity due to the density of commercially available PCM parts (as of early 2013)

Use cases:
– Caching device
– Metadata store
– Backend for low-latency Key-Value store
– Tiered storage device in a hybrid configuration with Flash

Page 5: A Prototype Storage Subsystem based on Phase Change Memory


PCM Parts: Micron P5Q

90nm technology node

128 Mbit devices (NP5Q128AE3ESFC0E)

SPI bus compatible serial interface

Maximum clock frequency: 66 MHz

64-byte write buffer
– 120 μsec average program time
– about 0.5 MB/s write bandwidth

Block transfer time: 8.24 μsec (64+4 bytes)

Sector I/O (512B + 64B):
– Write: 1.15 msec (0.86 kIOPS)
– Read: 75.24 μsec (13.29 kIOPS)

Very asymmetric read / write performance
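
The single-chip figures on this slide can be checked with simple arithmetic. The sketch below recomputes them from the 66 MHz clock, the 64+4 byte block transfer, and the 120 μsec average program time. Treating a sector write as nine back-to-back transfer-plus-program steps is an assumption that happens to reproduce the quoted 1.15 msec; the sector read time is taken from the slide rather than derived. All names are illustrative.

```python
# Back-of-the-envelope check of the single-chip P5Q figures quoted on the slide.
CLOCK_HZ = 66e6          # maximum SPI clock
BLOCK_BYTES = 64         # write buffer size
OVERHEAD_BYTES = 4       # the "+4" in "64+4 bytes" (assumed: command and address)
PROGRAM_US = 120.0       # average program time per 64-byte block
BLOCKS_PER_SECTOR = 9    # 512B of data + 64B of metadata, in 64-byte blocks
SECTOR_READ_US = 75.24   # taken from the slide, not derived here


def block_transfer_us() -> float:
    """Time to shift one block plus overhead over the 1-bit serial bus."""
    bits = (BLOCK_BYTES + OVERHEAD_BYTES) * 8
    return bits / CLOCK_HZ * 1e6


def sector_write_us() -> float:
    """Assumed: nine sequential transfer + program steps, no overlap."""
    return BLOCKS_PER_SECTOR * (block_transfer_us() + PROGRAM_US)


def kiops(latency_us: float) -> float:
    """Operations per second (in thousands) at one operation per latency_us."""
    return 1e6 / latency_us / 1e3


def write_bandwidth_mb_s() -> float:
    """Bytes per microsecond equals MB/s."""
    return BLOCK_BYTES / (block_transfer_us() + PROGRAM_US)


if __name__ == "__main__":
    print(f"block transfer: {block_transfer_us():.2f} us")
    print(f"sector write:   {sector_write_us() / 1000:.2f} ms")
    print(f"write rate:     {kiops(sector_write_us()):.3f} kIOPS")
    print(f"read rate:      {kiops(SECTOR_READ_US):.2f} kIOPS")
    print(f"write bandwidth: {write_bandwidth_mb_s():.2f} MB/s")
```

The roughly 15x gap between the read and write rates is the asymmetry the slide points out, and it motivates the pipelined channel design on the following slides.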

Page 6: A Prototype Storage Subsystem based on Phase Change Memory


2D PCM Channel Architecture

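[Figure: 2D PCM channel architecture. A PCM channel controller, containing a data buffer and several FSMs, connects to an N_S x N_I array of PCM chips labeled PCM(1,1) through PCM(N_S, N_I). The chips are grouped into sub-channels and sub-banks; each row of the array carries a label k.]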

Page 7: A Prototype Storage Subsystem based on Phase Change Memory


Read vs. Write Performance Trade-Offs

A high degree of pipelining:
– Increases the write performance
  • By having the long programming times overlap
– May reduce the read performance
  • Read times are anyway very short
  • Less parallelism due to fewer I/O pins

For a given budget of I/O pins:
– More sub-channels → Better read performance
– More sub-banks → Better write performance

Application needs should drive the configuration
– Channels with different geometry in the same device are possible

We chose a configuration that minimizes write latency without severely penalizing reads
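
To illustrate why more sub-banks help writes, here is a deliberately simplified model of one sub-channel. It assumes that the chips of a sub-channel share one serial bus, that blocks are issued round-robin, and that a chip programs on its own once its block has been transferred. The default timings are the single-chip P5Q figures (8.24 μsec transfer, 120 μsec program). The function name and the model itself are illustrative and are not meant to reproduce the measured channel figures.

```python
def sub_channel_write_us(n_blocks: int, n_sub_banks: int,
                         transfer_us: float = 8.24,
                         program_us: float = 120.0) -> float:
    """Completion time for n_blocks written round-robin to chips on a shared bus.

    Transfers are serialized on the bus; programming proceeds inside each
    chip and therefore overlaps with transfers to the other chips.
    """
    if n_sub_banks < 1:
        raise ValueError("need at least one sub-bank")
    bus_free = 0.0
    chip_free = [0.0] * n_sub_banks
    for j in range(n_blocks):
        chip = j % n_sub_banks
        # A transfer needs both the bus and the target chip to be idle.
        start = max(bus_free, chip_free[chip])
        bus_free = start + transfer_us
        chip_free[chip] = bus_free + program_us
    return max(chip_free)


if __name__ == "__main__":
    for banks in (1, 2, 3):
        print(banks, "sub-bank(s):", round(sub_channel_write_us(3, banks), 2), "us")
```

Under these assumptions, three blocks take about 385 μsec on a single chip but about 145 μsec when spread over three sub-banks, because the three program operations overlap almost entirely.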

Page 8: A Prototype Storage Subsystem based on Phase Change Memory


3x3 Channel Architecture

[Figure: 3x3 channel. A PCM channel controller, containing a data buffer and several FSMs, connects to nine PCM chips labeled PCM(1,1) through PCM(3,3), which together form one PCM bank. Each row of three chips carries a label k and receives 3 blocks.]

1 block = 64 bytes

For each user sector (512B = 8 x 64B), we store 64 bytes of metadata, i.e., 9 blocks in total

3 pipelined program operations to the sub-banks
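
The 9-block sector layout can be sketched as follows. The slide states only that a 512-byte sector plus 64 bytes of metadata yields nine 64-byte blocks, three per group of chips; the striping order used here (consecutive blocks go to consecutive sub-channels) and the `(sub_channel, sub_bank)` naming of the chip indices are assumptions made for illustration.

```python
BLOCK_BYTES = 64
SECTOR_BYTES = 512
METADATA_BYTES = 64
N_SUB_CHANNELS = 3
N_SUB_BANKS = 3


def layout_sector(data: bytes, metadata: bytes):
    """Split sector + metadata into 64-byte blocks and assign one block per chip.

    Returns a list of ((sub_channel, sub_bank), block) pairs with 1-based
    indices. The striping order is hypothetical, not taken from the slides.
    """
    if len(data) != SECTOR_BYTES or len(metadata) != METADATA_BYTES:
        raise ValueError("expected 512 bytes of data and 64 bytes of metadata")
    payload = data + metadata
    blocks = [payload[i:i + BLOCK_BYTES]
              for i in range(0, len(payload), BLOCK_BYTES)]
    placement = []
    for n, block in enumerate(blocks):
        sub_channel = n % N_SUB_CHANNELS + 1
        sub_bank = n // N_SUB_CHANNELS + 1
        placement.append(((sub_channel, sub_bank), block))
    return placement


if __name__ == "__main__":
    sector = bytes(range(256)) * 2
    for chip, block in layout_sector(sector, bytes(METADATA_BYTES)):
        print(chip, len(block), "bytes")
```

With this layout every chip of the 3x3 bank holds exactly one block of the sector, so each sub-channel issues three program operations that can be pipelined.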

Page 9: A Prototype Storage Subsystem based on Phase Change Memory


PSS Channel Card

One PCM channel per side

Two PCM channels per card

Channel card

[Figure: channel card schematic. Each side of the card carries a 3x3 array of PCM chips, indexed [1,1] through [3,3], each with CS/, CLK, Din and Dout pins. Card-level signals: CS[3], CS[2], CS[1], CLK, D1[1:0], D2[1:0], D3[1:0], DIR. A 3-to-8 decoder (inputs A, B, C; outputs Y0 to Y5) is also shown.]

PCM channel specs

Data transfer Rate: 49.5 MBps

Sector read time: 13.8 usecs

Sector read rate: 61.6 ksectors/sec

Sector write time: 133.8 usecs

Sector write rate: 14.8 ksectors/sec

1 PCM Channel = 2 Banks (2x3x3)
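
Two of the channel specs are consistent with a simple reading of the signal list. The sketch below assumes that the three data buses D1[1:0], D2[1:0] and D3[1:0] each move 2 bits per clock at the 66 MHz maximum clock of the P5Q parts, and that a sector write costs one sector transfer plus one 120 μsec program time because the remaining programs overlap. Both are assumptions that reproduce the quoted numbers, not statements from the slides.

```python
CLOCK_HZ = 66e6        # assumed: channel clocked at the P5Q maximum
DATA_BUSES = 3         # D1, D2, D3 in the signal list
BITS_PER_BUS = 2       # each bus is [1:0]
SECTOR_READ_US = 13.8  # from the channel specs
PROGRAM_US = 120.0     # average P5Q program time


def channel_rate_mb_s() -> float:
    """Raw data rate if every data line toggles once per clock."""
    return CLOCK_HZ * DATA_BUSES * BITS_PER_BUS / 8 / 1e6


def sector_write_us() -> float:
    """Assumed: transfer time plus a single, non-overlapped program time."""
    return SECTOR_READ_US + PROGRAM_US


if __name__ == "__main__":
    print(f"data transfer rate: {channel_rate_mb_s():.1f} MB/s")
    print(f"sector write time:  {sector_write_us():.1f} us")
```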

Page 10: A Prototype Storage Subsystem based on Phase Change Memory

PSS PCI-e Card

[Figure: PSS hardware. Labels: daughter card, PCM channel modules (four labeled), Xilinx Zynq-7045 FPGA board, 2x host-attached PSS cards.]

Error correction based on simple BCH codes
– 6 BCH codewords per 512-byte sector, with 4-bit error correction capability per codeword

Wear leveling using a Start-Gap scheme

512MB of DRAM, mostly used as a write cache

Support for cached writes, direct writes with early completion, and direct writes with late completion

8 PCM channels with pipeline support per card
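
For readers unfamiliar with Start-Gap wear leveling, the following is a minimal sketch of the scheme as it is generally described in the literature: N logical lines are stored in N + 1 physical lines, and a spare "gap" line is moved by one position every few writes so that hot logical lines slowly rotate over all physical lines. The slides do not describe how PSS implements it; the class name, the `gap_interval` parameter and the in-memory list standing in for PCM are assumptions.

```python
class StartGap:
    """Start-Gap wear leveling over n logical lines and n + 1 physical lines."""

    def __init__(self, n_lines: int, gap_interval: int = 100):
        self.n = n_lines
        self.gap_interval = gap_interval      # move the gap every this many writes
        self.start = 0
        self.gap = n_lines                    # the spare line starts at the end
        self.physical = [None] * (n_lines + 1)
        self.wear = [0] * (n_lines + 1)       # writes seen by each physical line
        self._writes_since_move = 0

    def _map(self, logical: int) -> int:
        """Translate a logical line number into a physical line number."""
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def read(self, logical: int):
        return self.physical[self._map(logical)]

    def write(self, logical: int, value) -> None:
        pa = self._map(logical)
        self.physical[pa] = value
        self.wear[pa] += 1
        self._writes_since_move += 1
        if self._writes_since_move >= self.gap_interval:
            self._writes_since_move = 0
            self._move_gap()

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap around: the last physical line moves to line 0 and the
            # whole mapping shifts by one via the start register.
            self.physical[0] = self.physical[self.n]
            self.wear[0] += 1
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line just before the gap into the gap, then move the gap down.
            self.physical[self.gap] = self.physical[self.gap - 1]
            self.wear[self.gap] += 1
            self.gap -= 1


if __name__ == "__main__":
    sg = StartGap(n_lines=8, gap_interval=4)
    for i in range(3600):
        sg.write(0, i)                        # hammer a single logical line
    print("wear per physical line:", sg.wear)
```

The appeal of the scheme for a lean hardware design is that the mapping needs only two registers (start and gap) instead of a per-line remapping table.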

Page 11: A Prototype Storage Subsystem based on Phase Change Memory

PSS Experimental Results: Throughput versus Offered Load (4kB pages)

Page 12: A Prototype Storage Subsystem based on Phase Change Memory

PSS Experimental Results: I/O Completion Latency Distribution

[Figure: latency distribution panels for Random Read and Random Write (cached), with an annotation marking the PCM technology programming time.]

Page 13: A Prototype Storage Subsystem based on Phase Change Memory


Latency distribution comparison to Flash-based devices

Devices
– PSS PCI-e Card
– MLC Flash PCI-e SSD 1
– MLC Flash PCI-e SSD 2
– TLC Flash SATA SSD

Experiment
Per-I/O latency measurements for 2 hours of uniformly random 4kB writes at QD=1 (after 12 hours of preconditioning with the same workload)
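
Per-I/O measurements like these are typically summarized as tail percentiles (for example, the latency within which 99.9% of requests complete). The sketch below shows one common way to compute such a figure, the nearest-rank percentile. The function name and the synthetic samples in the usage example are illustrative and are not measurements from this experiment.

```python
import math
from fractions import Fraction


def percentile(samples_us, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_us:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_us)
    # Exact rational arithmetic avoids off-by-one ranks from float rounding.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic data for illustration only: 999 fast requests and one slow outlier.
    samples = [10] * 999 + [2000]
    print("p99.9:", percentile(samples, 99.9), "us")
    print("max:  ", percentile(samples, 100), "us")
```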

Page 14: A Prototype Storage Subsystem based on Phase Change Memory

Latency Profile up to 1 msec

[Figure: four panels, one per device: PSS, MLC1, MLC2, TLC.]

Page 15: A Prototype Storage Subsystem based on Phase Change Memory

Latency Profile up to 10 msec

[Figure: four panels, one per device: PSS, MLC1, MLC2, TLC.]

Page 16: A Prototype Storage Subsystem based on Phase Change Memory

Total Latency Profile

[Figure: four panels, one per device: PSS, MLC1, MLC2, TLC.]

Page 17: A Prototype Storage Subsystem based on Phase Change Memory

PCM Endurance Measurements

The effect of PCM aging on write time and BER

[Figure: plot with curves for minimum, maximum, and mean write time per sector.]

Experimental parameters:
• Random data
• 32 sectors per write cycle
• 4 PCM channels
• 8 PCM banks
• Pipeline is active

Experiment:
- Perform 10K write cycles with random data (x32 sectors)
- Write, read and compare a set of 32 sectors (single write cycle)
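
The write, read and compare step boils down to counting the bits that differ between the data written and the data read back. A minimal sketch of that comparison is shown below; the function names are illustrative, and the slides do not say whether errors were counted before or after BCH correction.

```python
def bit_errors(written: bytes, read_back: bytes) -> int:
    """Number of bit positions in which the two buffers differ."""
    if len(written) != len(read_back):
        raise ValueError("buffers must have the same length")
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))


def bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Bit errors divided by the total number of bits compared."""
    if not written:
        raise ValueError("empty buffer")
    return bit_errors(written, read_back) / (8 * len(written))


if __name__ == "__main__":
    sector = bytes(512)
    corrupted = bytearray(sector)
    corrupted[0] ^= 0b101      # flip two bits in the first byte
    corrupted[511] ^= 0x80     # flip one bit in the last byte
    print(bit_errors(sector, bytes(corrupted)), "bit errors")
    print(bit_error_rate(sector, bytes(corrupted)))
```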

Page 18: A Prototype Storage Subsystem based on Phase Change Memory

PCM Write Latency Distribution

Experimental parameters:
• Random data
• 32 sectors per write cycle
• 4 PCM channels
• 8 PCM banks/controllers
• Pipeline is active

Experiment:
- Perform 10K write cycles with random data (x32 sectors)
- Write, read and compare a set of 32 sectors (single write cycle)

Page 19: A Prototype Storage Subsystem based on Phase Change Memory


Conclusions

PCM is a promising new memory technology

PSS is a PCI-e attached subsystem that mitigates the limitations of current PCM technology

The 2D Channel Architecture allows the designer to trade off read performance for write performance and vice versa

PSS achieved good performance
– 65k Read IOPS @ 35 μsec
– 15k Write IOPS @ 61 μsec

PSS achieved consistently low write latency
– 99.9% of the requests completed within 240 μsec
  • 12x and 275x lower than MLC and TLC Flash SSDs, respectively
– Highest observed latency was 2 msec
  • 7x and 61x lower than MLC and TLC Flash SSDs, respectively
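
The headline numbers can be cross-checked with two small calculations. Little's law (mean outstanding requests = throughput x mean latency) gives the average concurrency implied by each IOPS and latency pair, and multiplying the PSS latencies by the quoted factors gives the approximate Flash latencies they imply. The slide does not state whether IOPS and latency were measured at the same operating point, and the factors are rounded, so the results are rough estimates only.

```python
def mean_outstanding(iops: float, latency_us: float) -> float:
    """Little's law: average number of requests in flight."""
    return iops * latency_us * 1e-6


def implied_latency_us(pss_latency_us: float, factor: float) -> float:
    """Latency of a device that is `factor` times slower than PSS."""
    return pss_latency_us * factor


if __name__ == "__main__":
    print("reads in flight: ", round(mean_outstanding(65_000, 35), 3))
    print("writes in flight:", round(mean_outstanding(15_000, 61), 3))
    print("MLC 99.9% latency ~", implied_latency_us(240, 12) / 1000, "ms")
    print("TLC 99.9% latency ~", implied_latency_us(240, 275) / 1000, "ms")
    print("MLC max latency   ~", implied_latency_us(2000, 7) / 1000, "ms")
    print("TLC max latency   ~", implied_latency_us(2000, 61) / 1000, "ms")
```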
