NANDFlashSim: Intrinsic Latency Variation Aware NAND Flash Memory System Modeling and Simulation at Microarchitecture Level
Myoungsoo Jung (MJ), Ellis H. Wilson III, David Donofrio, John Shalf, Mahmut T. Kandemir
Slides: …storageconference.us/2012/Presentations/R25.Flash.2.NANDFlashSim.…
Legacy Operation
• An I/O operation splits into several operation stages
• Each stage should be appropriately handled by device drivers
[Figure: legacy operation timing diagram (labels: cmd, addr, data, write, status check)]
Cache Operation
• Cache mode operations use internal registers in an attempt to hide performance overhead from data movements
[Figure: cache operation timing diagram (labels: data1, data2)]
Internal Data Move Mode
• Saving space and cycles to copy data
• Source and destination page address should be located in the same die
[Figure: internal data move timing diagram (labels: cmd, src addr, dst addr, data); no data movement through NAND interface]
Multi-plane Mode Operation
• Two different pages can be served in parallel
• Addresses should indicate the same page offset in a block and the same die address, and should have different plane addresses (plane addressing rule)
[Figure: multi-plane operation timing diagram (labels: cmd, addr1, addr2, data1, data2, check)]
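The plane addressing rule above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `PageAddress` tuple with `die`, `plane`, `block`, and `page` fields; these names are illustrative and are not taken from NANDFlashSim.

```python
from typing import NamedTuple


class PageAddress(NamedTuple):
    # Hypothetical address layout, for illustration only.
    die: int
    plane: int
    block: int
    page: int  # page offset within the block


def satisfies_plane_addressing_rule(a: PageAddress, b: PageAddress) -> bool:
    """Return True if two pages may be served by one multi-plane operation."""
    same_page_offset = a.page == b.page
    same_die = a.die == b.die
    different_planes = a.plane != b.plane
    return same_page_offset and same_die and different_planes


# Same die, same page offset, different planes: allowed.
ok = satisfies_plane_addressing_rule(
    PageAddress(die=0, plane=0, block=10, page=5),
    PageAddress(die=0, plane=1, block=11, page=5),
)
# Both pages sit in the same plane: rejected.
bad = satisfies_plane_addressing_rule(
    PageAddress(die=0, plane=0, block=10, page=5),
    PageAddress(die=0, plane=0, block=11, page=5),
)
```

A controller model could run a check like this before merging two requests into one multi-plane command and fall back to two legacy operations when it fails.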
Interleaved-Die Mode Operation
• Provides a way to take advantage of internal parallelism by interleaving NAND transactions
• Scheduling of NAND transactions and bus arbitration are critical determinants of memory system performance
[Figure: interleaved-die operation timing diagram (labels: cmd, addr1, data1, addr2, data2)]
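To illustrate why interleaving helps, here is a rough sketch that compares a fully serialized schedule with a simple in-order interleaved one. It assumes a single shared bus and one busy period per die after each transfer. The tuple layout and the time values are made up for illustration and do not come from the slides.

```python
def serialized_time(transactions):
    """Each transaction finishes completely before the next one starts."""
    return sum(bus + cell for _die, bus, cell in transactions)


def interleaved_time(transactions):
    """Bus transfers serialize on the shared bus; cell activity overlaps across dies."""
    bus_free = 0   # time at which the shared bus becomes free
    die_free = {}  # die -> time at which that die becomes idle
    finish = 0
    for die, bus, cell in transactions:
        # A transfer needs both the bus and its target die to be idle.
        start = max(bus_free, die_free.get(die, 0))
        bus_free = start + bus
        die_free[die] = bus_free + cell
        finish = max(finish, die_free[die])
    return finish


# (die, bus_time, cell_time) in arbitrary time units; values are assumptions.
writes = [(0, 10, 100), (1, 10, 100)]
t_serial = serialized_time(writes)
t_interleaved = interleaved_time(writes)
```

With these assumed numbers the second transfer starts as soon as the bus is free, so most of the cell time on the two dies overlaps. When both transactions target the same die, the interleaved schedule degenerates to the serialized one.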
Challenges
• Performance varies based on:
– intrinsic latency variation characteristics
– internal parallelism
– advanced flash operation types
• Performance is affected by:
– how to deal with diverse advanced flash operations
– how to effectively schedule NAND transactions
Prior Simulation Works
• Flash-based Solid State Disk Simulation
– Tightly coupled to specific flash firmware
• Unaware of latency variation of NAND flash
– Latency approximation model with constants
• Coarse-grain NAND command handling
– In-order execution
[Figure: SSD simulator structure (labels: SSD Simulator, I/O Subsystem, Flash Firmware, Latency Approximation)]
NANDFlashSim
• Simulating and modeling NAND flash rather than flash firmware or SSDs
– NANDFlashSim can be applied to diverse applications such as off-chip caches of a multi-core system and I/O subsystems of mobile systems
– Multiple instances can be used for building SATA- or PCI-e-based SSDs
[Figure: simulator structure (labels: SSD Simulator, I/O Subsystem, Flash Firmware, Latency Approximation)]
NANDFlashSim
• Detailed Timing Model
• Awareness of intrinsic latency variation
– Designed to be performance variation-aware, accounting for different page offsets in a physical block
• Reconfigurable Microarchitecture
– Supports highly reconfigurable architectures in terms of multiple dies and planes
• Fine-grain NAND flash command handling
– 16 combinations of advanced flash operations
– Supports out-of-order execution
High-level View
• Command set architecture and an individual state machine associated with it
• Host and NAND flash clock domains are separate
• All entries (controller, register, die, …) are updated at every cycle
[Figure: high-level architecture (labels: Logical Unit, NAND Flash I/O Bus, k*j Blocks)]
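The cycle-by-cycle update with separate clock domains can be pictured with a small loop. This is only a sketch under stated assumptions: the `Counter` class stands in for any simulated entry, and the clock periods are arbitrary values chosen for illustration, not NANDFlashSim parameters.

```python
class Counter:
    """Stand-in for a simulated entry (controller, register, die, ...)."""

    def __init__(self):
        self.cycles = 0

    def update(self):
        # A real entry would advance its internal state machine here.
        self.cycles += 1


def run(duration_ns, host_period_ns, nand_period_ns, host_entries, nand_entries):
    """Update every entry once per cycle of its own clock domain."""
    for t in range(1, duration_ns + 1):
        if t % host_period_ns == 0:  # host clock edge
            for entry in host_entries:
                entry.update()
        if t % nand_period_ns == 0:  # NAND flash clock edge
            for entry in nand_entries:
                entry.update()


host, nand = Counter(), Counter()
# Assumed periods: 5 ns host clock, 20 ns NAND clock.
run(duration_ns=100, host_period_ns=5, nand_period_ns=20,
    host_entries=[host], nand_entries=[nand])
```

Stepping in fixed 1 ns increments keeps the sketch short. An actual simulator would more likely jump straight to the next clock edge of either domain.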
Command Set Architecture
• Multi-stage Operation
– Stages are defined by common operations
– CLE, ALE, TIR, TIN, TOR, TON, etc.
• Command Chains
– Define command sequences
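One way to picture a command chain is as an ordered list of stages that a small state machine walks through. The sketch below is illustrative only: the stage names come from the slide, but the two chains and the `ChainStateMachine` class are assumptions and do not reflect NANDFlashSim's actual definitions.

```python
from enum import Enum


class Stage(Enum):
    # Stage names are taken from the slide.
    CLE = "CLE"
    ALE = "ALE"
    TIR = "TIR"
    TIN = "TIN"
    TOR = "TOR"
    TON = "TON"


# Hypothetical chains; the stage ordering is an assumption for illustration.
WRITE_CHAIN = [Stage.CLE, Stage.ALE, Stage.TIR, Stage.CLE, Stage.TIN]
READ_CHAIN = [Stage.CLE, Stage.ALE, Stage.CLE, Stage.TON, Stage.TOR]


class ChainStateMachine:
    """Accepts stages only in the order given by its command chain."""

    def __init__(self, chain):
        self.chain = list(chain)
        self.position = 0

    @property
    def done(self):
        return self.position >= len(self.chain)

    def expect(self, stage):
        """Advance if `stage` is the next stage in the chain, else raise."""
        if self.done or self.chain[self.position] is not stage:
            raise ValueError(f"unexpected stage {stage.name}")
        self.position += 1


fsm = ChainStateMachine(WRITE_CHAIN)
for stage in WRITE_CHAIN:
    fsm.expect(stage)
```

Rejecting out-of-order stages is one way a simulator could catch a driver that mishandles a stage, which ties back to the requirement that each stage be handled appropriately.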
Latencies of NANDFlashSim almost completely overlap with real product latencies (Hynix)
Read-intensive Workload (Websearch, User)
Performance of Multiple Planes
• Write performance is significantly enhanced as the number of planes increases
– Cell activities (TIN) can be executed in parallel
• Data movement (TOR) is a dominant factor in determining bandwidth
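A back-of-the-envelope model shows both effects. It is a sketch that assumes data transfers for each plane are serialized on the bus while cell activity runs once, in parallel, across all planes. The function names and timing values are hypothetical and are not measurements from the slides.

```python
def multiplane_time(planes, transfer_us, cell_us):
    """One multi-plane operation: serialized transfers, one parallel cell phase."""
    return planes * transfer_us + cell_us


def single_plane_time(pages, transfer_us, cell_us):
    """The same pages issued one at a time: every page pays the full cell latency."""
    return pages * (transfer_us + cell_us)


# Assumed timings in microseconds, for illustration only.
TRANSFER_US = 50
CELL_US = 800

t_multi = multiplane_time(2, TRANSFER_US, CELL_US)
t_single = single_plane_time(2, TRANSFER_US, CELL_US)
```

Under these assumptions the cell latency is paid once per operation, so adding planes improves throughput. The serialized transfer term grows with the plane count and eventually bounds the achievable bandwidth.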