Accelerating SSD Performance with HLNAND
Roland Schuetz, MOSAID Technologies Inc.
David Won, INDILINX
Santa Clara, CA USA, August 2008
Jun 12, 2015
Presentation Outline
• PC Architecture and the Solid State Drive
• HLNAND Introduction
• HLNAND Enhances SSD Performance
• Conclusions
PC System Architecture: Where are the Bottlenecks?
[Diagram: CPU, North Bridge, South Bridge, GPU, DRAM, HDD, BIOS, DRAM/GDRAM]
PC System Architecture: Migration of System Memory
• Creating New Memory Sub-Architecture
• Utilize Upgraded PCIe Link for New Storage Memory Sub-Architecture
• Most Memory Interfaces and System Interconnect Have Undergone Dramatic BW Upgrades Over the Years
• Bulk Storage Access BW Lags (Mechanical Latency)
SSD Market Segment
Three categories:
• Enterprise
• Notebook / PC
• NetBook / Ultraportable
Three cost models:
• >$100 non-NAND in Enterprise SSDs
• ~$5.00 non-NAND BOM in Notebook SSDs
• <$1.00 non-NAND BOM in NetBook / Ultraportable SSDs
[Images: Enterprise, Notebook, NetBook / Ultraportable. Sources: STEC, SanDisk, STEC]
Enterprise Flash Storage: Excellent Market for HLNAND SSDs
Enterprise applications move huge amounts of data and are IO bound
These include:
• File, web, transaction servers
• Multimedia editing systems
• Simulation servers/workstations
Current flash-based enterprise applications are based on conventional 40 – 100 MB/s sub-systems
Source: Violin 1010 Memory Appliance, Violin Memory, Inc.
Current SSD Offerings (1st Generation)
Leading NAND, memory module, and specialized SSD manufacturers offer conventional NAND flash based product
Cost is pivotal for SSD adoption
Little product and architectural differentiation
• 4 channel
• 4-way interleave
Similar cost structure
• NAND flash constitutes the majority of the BOM
• Vertically integrated manufacturers have pricing advantage
Similar performance
• 30 ~ 100MB/s Read/Write performance
• No one competitor has performance lock on the market
How are Current SSDs Stacking Up?
Source: Engadget
Current SSDs vs. HDDs
Drive                                                                          Read Throughput   Write Throughput
MTRON SSD 2.5", SATA150, 32GB                                                  94.6 MB/s         76.5 MB/s
Western Digital HDD, WD1500ADFD, 150GB, SATA150, 10,000 rpm (Enterprise HDD)   86.9 MB/s         85.9 MB/s
SanDisk SSD5000, SATA150, 32GB                                                 68.3 MB/s         50.1 MB/s
Fujitsu HDD, MHW2160BJ, 160GB, SATA300, 7200 rpm                               60.0 MB/s         59.9 MB/s
[Chart: Windows XP Startup Performance, PCMark05 HDD Benchmark (MB/s); bar values 25.9, 51.5, 10.9, 7.9]
Sources: Tom’s Hardware, Mtron SSD 32 GB: Performance with a Catch, Patrick Schmid, Achim Roos, November 21, 2007; Tom’s Hardware, Flash-Based Hard Drives Cometh, Patrick Schmid, Achim Roos, August 13, 2007
HLNAND – New High Speed NAND Standard
HLNAND
• Unidirectional, point-to-point, daisy-chain cascade with programmable link width; 1 - 8 bits
• Synchronous DDR signaling up to 800Mb/s/pin
• Each ring supports up to 255 devices with no bandwidth degradation
• Single CE per ring enables controller pin count reduction
• Low power 1.8V I/O
[Diagram: host interface, controller, and HLNAND devices connected in a ring]
Conventional NAND
• 8 bit, bidirectional, multi-drop bus
• Asynchronous LVTTL signaling up to 40Mb/s/pin
• Speed degradation with more than 4 devices on bus
• Chip Enable (CE) signal required for each device
• Power hungry 3.3V I/O
[Diagram: host, controller, and NAND devices on a shared bus with CE1, CE2, CE3]
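The per-pin signaling rates above translate into peak link bandwidth once the data width is fixed. The following is a minimal sketch of that arithmetic, assuming peak bandwidth is simply the per-pin rate times the number of data bits, with no protocol overhead; the function name is illustrative and not part of any HLNAND specification.

```python
def link_bandwidth_mb_per_s(rate_mbit_per_pin: float, width_bits: int) -> float:
    """Peak link bandwidth in MB/s for a per-pin rate (Mb/s) and a data width (bits).

    Assumption: no command or protocol overhead is subtracted.
    """
    return rate_mbit_per_pin * width_bits / 8


# Conventional NAND: up to 40 Mb/s/pin on a fixed 8-bit bus.
nand_bw = link_bandwidth_mb_per_s(40, 8)

# HLNAND: up to 800 Mb/s/pin, link width programmable from 1 to 8 bits.
hlnand_bw = {width: link_bandwidth_mb_per_s(800, width) for width in range(1, 9)}

print(f"Conventional NAND, x8: {nand_bw} MB/s")
for width, bw in hlnand_bw.items():
    print(f"HLNAND, x{width}: {bw} MB/s")
```

Under these assumptions the x8 results line up with the 40MB/s per channel and 800MB/s figures quoted elsewhere in the deck.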
HLNAND – New High Speed NAND Standard (cont’d)
HLNAND
• New device features enhance performance and simplify controller and SSD design
• New, low-stress program scheme enables:
  • Random page program
  • Page-pair erase
  • Multi-page & Multi-block erase
Conventional NAND
• No new features to enhance flash performance or simplify controller and SSD design
PC Demands on System Storage
Typical PC use is IO intensive, with frequent HDD/SSD access; e.g., creating a slide presentation, virus scan, etc.
Booting a PC is very IO intensive since the OS must load a large amount of data from bulk storage to DRAM
IO Operation Example: Can SSD Deliver “Instant Boot”?
Populate 1GB DRAM from Flash Based Bulk Storage
• Conventional flash SSD with 60MB/s BW takes ~17 sec
• Optimized conventional flash SSD with 95MB/s* takes ~10 sec
10 seconds is a very noticeable wait!
* Mtron SSD 32 GB: Performance with a Catch, Patrick Schmid, Achim Roos, November 21, 2007; SSD: Mtron MSDSATA6025032NA
IO Operation Example: Can SSD Deliver “Instant Boot”?
Populate 1GB DRAM from Flash Based Bulk Storage
• HLNAND HL1 SSD with 266MB/s BW takes ~3.8 sec
• HLNAND HL2 SSD with 800MB/s BW takes ~1.3 sec
HLNAND-based SSDs offer 260% - 450%, and up to 800%, IO rate improvement
3.8 sec is not instant, but much closer
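The load times quoted in the “Instant Boot” slides follow from dividing the amount of data by the sustained bandwidth. The sketch below reproduces that arithmetic; it assumes 1GB is treated as 1000MB and that the drive sustains its quoted bandwidth for the whole transfer, and the names used are illustrative only.

```python
DRAM_MB = 1000  # assumption: the 1GB DRAM image is treated as 1000MB


def load_time_s(bandwidth_mb_per_s: float, size_mb: float = DRAM_MB) -> float:
    """Seconds needed to stream size_mb at a sustained bandwidth (MB/s)."""
    return size_mb / bandwidth_mb_per_s


# Bandwidth figures as quoted on the slides.
drives = {
    "Conventional flash SSD": 60,
    "Optimized conventional flash SSD": 95,
    "HLNAND HL1 SSD": 266,
    "HLNAND HL2 SSD": 800,
}

for name, bandwidth in drives.items():
    print(f"{name}: {load_time_s(bandwidth):.2f} s at {bandwidth} MB/s")
```

The slides round these results to ~17, ~10, ~3.8, and ~1.3 seconds.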
IOPS are the Holy Grail in Enterprise Applications
SSDs offer significantly higher I/O (inputs/outputs) per second than HDDs: HLNAND will shine
Storage                                        IOPS
HDD (typical enterprise class)                 ~150
SSD, SATA (advertised by SanDisk)              7,000
SSD, 4G Fibre Channel (advertised by STEC)     45,000
HLNAND                                         300,000*
Source: SanDisk, STEC, and Deutsche Bank estimates
* Assume similar design to STEC SSD and multiply by media speed improvement; 45,000 * (266/40). Not including 4GFC saturation limit.
Image: Holy Grail of the Monastery of Xenophontos, Macedonian Heritage, 2000-2008
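The HLNAND IOPS figure is an extrapolation rather than a measurement, so it helps to see the footnote's arithmetic spelled out. This is a minimal sketch of that scaling, assuming IOPS grow linearly with media bandwidth and ignoring the 4GFC saturation limit, exactly as the footnote does; the function name is hypothetical.

```python
def scaled_iops(baseline_iops: float, new_media_mb_per_s: float,
                old_media_mb_per_s: float) -> float:
    """Scale a baseline IOPS figure by the ratio of media bandwidths.

    Assumption: IOPS scale linearly with media speed (the slide's own premise).
    """
    return baseline_iops * new_media_mb_per_s / old_media_mb_per_s


# STEC SSD baseline of 45,000 IOPS; 40MB/s conventional NAND vs. 266MB/s HLNAND.
estimate = scaled_iops(45_000, 266, 40)
print(f"Estimated HLNAND IOPS: {estimate:,.0f}")
```

The result, 299,250, is what the slide rounds to 300,000.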
SSD Performance Comparison: HLNAND-based vs. NAND-based
128GB SSD using 128 * 8Gb SLC
SSD Controller for HLNAND
• Host interface and HL1 rings; Ring 1 - Max. 266MB/s @x8b
• 32 dies per ring
• Max. Media BW = 1066MB/s @x8b, 4 rings
• Total 96 signal pins excluding power: 24 signal pins per ring
SSD Controller for NAND
• Host interface and parallel bus; Channel 1 through Channel 8, each Max. 25MB/s x8b
• 16 dies per channel, no termination
• Max. Media BW = 200MB/s @x8b, 8 channels
• Total 208 signal pins excluding power: 29 signal pins per channel, 8 CE# and 8 R/B# per channel
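The aggregate figures in this comparison can be cross-checked from the per-ring and per-channel numbers. The sketch below does so under the assumption that media bandwidth and die count scale linearly with the number of rings or channels; the `MediaConfig` class is purely illustrative. Note that 4 rings at the quoted 266MB/s give 1064MB/s, so the slide's 1066MB/s suggests the per-ring figure was rounded down.

```python
from dataclasses import dataclass


@dataclass
class MediaConfig:
    """Illustrative description of one SSD media arrangement."""
    name: str
    lanes: int                 # number of rings (HLNAND) or channels (NAND)
    mb_per_s_per_lane: float   # peak media bandwidth of one ring/channel
    dies_per_lane: int

    def media_bw(self) -> float:
        """Aggregate media bandwidth, assuming linear scaling across lanes."""
        return self.lanes * self.mb_per_s_per_lane

    def dies(self) -> int:
        return self.lanes * self.dies_per_lane

    def capacity_gb(self, die_gbit: int = 8) -> float:
        """Capacity in GB for dies of die_gbit gigabits each."""
        return self.dies() * die_gbit / 8


hlnand = MediaConfig("HLNAND, 4 rings", lanes=4, mb_per_s_per_lane=266, dies_per_lane=32)
nand = MediaConfig("NAND, 8 channels", lanes=8, mb_per_s_per_lane=25, dies_per_lane=16)

for cfg in (hlnand, nand):
    print(f"{cfg.name}: {cfg.media_bw()} MB/s, {cfg.dies()} dies, {cfg.capacity_gb()} GB")
```

Both arrangements use the same 128 dies and reach the same 128GB, so the bandwidth difference comes from the interface rather than the amount of flash.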
HLNAND SSD vs. NAND SSD
8 Bit HLNAND-based SSD
• 266 MB/s per ring
• 1 ring to saturate SATA1 (1.5Gb/s, 188MB/s)
• 2 rings to saturate SATA2 (3Gb/s, 375MB/s)
• Reduced pin count; 24 ctrl/data pins per ring
• Expanded feature set reduces controller overhead
• Pwr/Media BW: 1.87mW/MB/s
8 Bit NAND-based SSD
• Max. 40MB/s per channel
• 4-5 channels to saturate SATA1
• 8-10 channels to saturate SATA2
• Ctrl/data pin count 29 per channel
• No new overhead-reducing features
• Pwr/Media BW: 8.96mW/MB/s
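The ring and channel counts needed to saturate each SATA generation come from dividing the host link rate by the per-ring or per-channel media rate. The following sketch assumes the slide's own conversion of the raw line rate (1.5Gb/s to 188MB/s, 3Gb/s to 375MB/s), which does not subtract SATA's 8b/10b encoding overhead, and rounds up to whole lanes; the helper names are hypothetical.

```python
import math


def raw_link_mb_per_s(line_rate_gbit: float) -> float:
    """Raw host link rate in MB/s (slide's conversion, no encoding overhead)."""
    return line_rate_gbit * 1000 / 8


def lanes_to_saturate(host_mb_per_s: float, lane_mb_per_s: float) -> int:
    """Smallest whole number of rings/channels whose sum meets the host rate."""
    return math.ceil(host_mb_per_s / lane_mb_per_s)


sata1 = raw_link_mb_per_s(1.5)  # 187.5 MB/s, quoted as 188MB/s
sata2 = raw_link_mb_per_s(3.0)  # 375.0 MB/s

for label, host in (("SATA1", sata1), ("SATA2", sata2)):
    rings = lanes_to_saturate(host, 266)    # HLNAND ring at 266MB/s
    channels = lanes_to_saturate(host, 40)  # NAND channel at max. 40MB/s
    print(f"{label}: {rings} HLNAND ring(s) vs. {channels} NAND channels")
```

Rounding up gives the upper end of the slide's 4-5 and 8-10 channel ranges.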
HLNAND SSD vs. NAND SSD: Read Throughput, Experimental Results
[Plots: HLNAND-based SSD, media rate; NAND-based SSD, media rate]
Plots from Mobile Embedded Lab, University of Seoul, 2008
HLNAND SSD vs. SSD vs. HDD: Estimates and Measured Performance
Read Throughput
Drive                                                                          Read Throughput
HLNAND SSD, HL1-266, 2KB page, 8bit Link, 2 Rings, SATA300                     420 MB/s (SATA300 IO saturated @ 300 MB/s)
HLNAND SSD, HL1-266, 2KB page, 8bit Link, 1 Ring, SATA300                      210 MB/s
MTRON SSD 2.5", SATA150, 32GB                                                  94.6 MB/s
Western Digital HDD, WD1500ADFD, 150GB, SATA150, 10,000 rpm (Enterprise HDD)   86.9 MB/s
SanDisk SSD5000, SATA150, 32GB                                                 68.3 MB/s
Fujitsu HDD, MHW2160BJ, 160GB, SATA300, 7200 rpm                               60.0 MB/s
Sources: Tom’s Hardware, Mtron SSD 32 GB: Performance with a Catch, Patrick Schmid, Achim Roos, November 21, 2007; Tom’s Hardware, Flash-Based Hard Drives Cometh, Patrick Schmid, Achim Roos, August 13, 2007; MOSAID Estimates
Throughput Calculation: Read throughput = (Maximum media BW) – (20% controller overhead)
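The HLNAND read estimates follow from the throughput calculation stated above. This is a minimal sketch of that formula, assuming 266MB/s per HL1 ring, linear scaling across rings, and a 300MB/s SATA300 ceiling as noted on the slide; the function and constant names are illustrative.

```python
CONTROLLER_OVERHEAD = 0.20  # 20% controller overhead, per the slide
SATA300_LIMIT = 300.0       # MB/s, host interface ceiling noted on the slide


def read_estimate(rings: int, ring_mb_per_s: float = 266.0) -> float:
    """Estimated read throughput: maximum media BW minus controller overhead."""
    media_bw = rings * ring_mb_per_s
    return media_bw * (1 - CONTROLLER_OVERHEAD)


def host_visible(throughput_mb_per_s: float) -> float:
    """Throughput the host can actually see once the SATA300 link saturates."""
    return min(throughput_mb_per_s, SATA300_LIMIT)


for rings in (1, 2):
    estimate = read_estimate(rings)
    print(f"{rings} ring(s): {estimate:.1f} MB/s media-side, "
          f"{host_visible(estimate):.1f} MB/s at the host")
```

The media-side results, about 212.8 and 425.6MB/s, appear on the slide rounded to 210 and 420MB/s.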
HLNAND SSD vs. NAND SSD: Write Throughput, Experimental Results
[Plots: HLNAND-based SSD, media rate; NAND-based SSD, media rate]
Plots from Mobile Embedded Lab, University of Seoul, 2008
HLNAND SSD vs. SSD vs. HDD: Estimates and Measured Performance
Write Throughput
Drive                                                                          Write Throughput
HLNAND SSD, HL1-266, 2KB page, 8bit Link, 2 Rings, SATA300                     339 MB/s (SATA300 IO saturated @ 300 MB/s)
HLNAND SSD, HL1-266, 2KB page, 8bit Link, 1 Ring, SATA300                      170 MB/s
Western Digital HDD, WD1500ADFD, 150GB, SATA150, 10,000 rpm (Enterprise HDD)   85.9 MB/s
MTRON SSD 2.5", SATA150, 32GB                                                  76.5 MB/s
Fujitsu HDD, MHW2160BJ, 160GB, SATA300, 7200 rpm                               59.9 MB/s
SanDisk SSD5000, SATA150, 32GB                                                 50.1 MB/s
Calculated based on MTRON SSD 2.5": (Read Throughput) / (Write Throughput) = 1.24
HLNAND: (Read throughput) / 1.24 = (210MB/s) / 1.24 ≈ 170MB/s
Sources: Tom’s Hardware, Mtron SSD 32 GB: Performance with a Catch, Patrick Schmid, Achim Roos, November 21, 2007; Tom’s Hardware, Flash-Based Hard Drives Cometh, Patrick Schmid, Achim Roos, August 13, 2007; MOSAID Estimates
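The HLNAND write figures are derived rather than measured: the read estimate is divided by the read/write ratio observed on the MTRON SSD. The sketch below walks through that derivation, assuming the MTRON ratio carries over to an HLNAND-based SSD as the slide does; the function names are hypothetical.

```python
def read_write_ratio(read_mb_per_s: float, write_mb_per_s: float) -> float:
    """Ratio of measured read throughput to measured write throughput."""
    return read_mb_per_s / write_mb_per_s


def write_estimate(read_mb_per_s: float, ratio: float = 1.24) -> float:
    """Estimated write throughput, assuming the same read/write ratio applies."""
    return read_mb_per_s / ratio


# Measured MTRON SSD figures: 94.6MB/s read, 76.5MB/s write.
mtron_ratio = read_write_ratio(94.6, 76.5)
print(f"MTRON read/write ratio: {mtron_ratio:.2f}")

# HLNAND read estimates for 1 and 2 rings: 210 and 420MB/s.
for rings, read_bw in ((1, 210.0), (2, 420.0)):
    print(f"{rings} ring(s): write estimate {write_estimate(read_bw):.1f} MB/s")
```

The 1-ring result of roughly 169.4MB/s is what the slide reports as approximately 170MB/s; the 2-ring result rounds to the 339MB/s shown.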
SSD Controller Design: Flash Translation Layer, FTL
The FTL is responsible for the high-cost jobs of wear leveling and garbage collection
Wear leveling operations include:
• Merge
• Switch
• Switch after copy
• Copy after PPE (Page-Pair Erase)
• Migration
HLNAND Improves FTL Performance
New HLNAND features reduce block recycling costs:
• Page-Pair Erase
• Random page program
• Partial block erase
• Multi block erase
Wear Leveling Enhancement: Copy-After-Page-Pair-Erase
[Diagram: copy-after-PPE between a data block (Blk) and a log block (Log); step 1: page-pair erase, step 2: copy]
[Equation: cost expression using the symbols defined below]
C_E: Erase cost
C_cp: Copy cost
M: Number of blocks erased concurrently
N_p: Number of pages in a block
k: Number of page-copies: Additional copy overhead
Page-pair erase introduces low cost wear-leveling opportunities
Translates into greater system longevity and less controller overhead
Synthetic Workload Experiment
• x8b
• 4KB page size
• 256KB block size
• 2048 blocks/bank
• 1 bank
Workload    Read   Program   Copyback   Erase
            60%    25%       10%        5%
            617    263       82         38
[Chart: Total Elapsed Time (us), 0 to 40000, vs. # of Chips per Ring/Bus, 1 to 10]
Experiment performed by Mobile Embedded Lab, University of Seoul, 2008
New Hierarchy for New User Experience
Historical Storage Hierarchy
• Register
• Cache
• DRAM (5.3 ~ 6.4GB/s per Channel)
• Memory BW Gap
• HDD (20 ~ 70MB/s)
New Storage Hierarchy
• Register
• Cache
• DRAM (5.3 ~ 6.4GB/s per Channel)
• HLNAND (500MB/s ~ 1GB/s)
• HDD (20 ~ 70MB/s)
Conclusions
High Speed Interface
Low Power Consumption
High Scalability
Interface Extensibility
Reduced Overall Cost with Increased Performance
Advanced Core Features
HLNAND’s features contribute to the acceleration of SSD satisfaction and adoption