Architectural Techniques for Improving NAND Flash Memory Reliability
Thesis Proposal
Yixin Luo
B.S., Computer Engineering, University of Michigan
B.S., Electrical Engineering, Shanghai Jiao Tong University
Thesis Prospectus Committee: Prof. Onur Mutlu (Chair), Prof. Phillip B. Gibbons, Prof. James C. Hoe, Dr. Yu Cai, Dr. Erich F. Haratsch
July 12, 2016
Carnegie Mellon University, Pittsburgh, PA
We divide the endurance capacity by the cold write frequency (writes per day) to determine
the number of days remaining before the cold pool wears out. We use hill climbing to find the
cold pool size (i.e., the partition boundary) that maximizes flash lifetime. The cold write
frequency depends on the cold pool size: as the cold pool grows, the hot pool correspondingly
shrinks, shifting higher-frequency writes into the cold pool.
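The partition-boundary search described above can be sketched as follows. The function name, the even-split starting point, and the shape of `lifetime_fn` (remaining endurance capacity divided by cold writes per day) are illustrative assumptions, not WARM's exact implementation:

```python
def find_best_partition(total_blocks, lifetime_fn, step=1):
    """Hill climbing on the cold pool size: move the partition boundary
    one block at a time as long as the estimated lifetime improves.
    `lifetime_fn(cold_size)` would divide the cold pool's remaining
    endurance capacity by its write frequency (writes per day)."""
    size = total_blocks // 2          # start from an even split (assumption)
    best = lifetime_fn(size)
    while True:
        for cand in (size + step, size - step):
            if 0 < cand < total_blocks and lifetime_fn(cand) > best:
                size, best = cand, lifetime_fn(cand)
                break
        else:
            return size               # neither neighbor improves: converged
```

Because a larger cold pool absorbs higher-frequency writes, the lifetime curve is unimodal in practice, which is what lets a simple hill climb converge to the maximum.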
Finally, once the partition boundary converges to obtain the maximum lifetime, we must adjust
what portion of the cold pool belongs in the cooldown window. We size this window to minimize
1 Due to wear-leveling, the remaining endurance (i.e., the number of P/E operations that can still be performed on the block) is the same across all of the blocks.
the ping-ponging of requests between the hot and cold pools. For this, we want to maximize the
number of hot virtual queue hits (⑥ in Figure 8), while minimizing the number of requests evicted
from the hot window (⑤ in Figure 8). We maintain a counter for each of these events, and then use
hill climbing on the cooldown window size to maximize the utility function Utility = (⑥ − ⑤).
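The cooldown-window tuning can be sketched as a small epoch-based controller. The class name, the one-step adjustment, and the direction-reversal rule are illustrative assumptions about how the hill climb might be realized:

```python
class CooldownTuner:
    """Hill climbing on the cooldown window size to maximize
    Utility = (hot virtual queue hits) - (hot-window evictions)."""

    def __init__(self, window, max_window):
        self.window, self.max_window = window, max_window
        self.prev_utility = None
        self.direction = +1           # direction of the last adjustment

    def end_of_epoch(self, hits, evictions):
        """Called once per measurement epoch with the two event counters."""
        utility = hits - evictions
        if self.prev_utility is not None and utility < self.prev_utility:
            self.direction = -self.direction   # last move hurt: reverse course
        self.prev_utility = utility
        self.window = max(1, min(self.max_window, self.window + self.direction))
        return self.window
```

A rising utility keeps the window moving in the same direction; a drop reverses it, so the window settles around the size that best suppresses ping-ponging.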
In our work, we limit the hot pool size to the number of over-provisioned blocks within the flash
device (i.e., the extra blocks beyond the visible capacity of the device). While the hot pages are
expected to represent only a small portion of the total flash capacity, there may be rare cases where
the size limit prevents the hot pool from holding all of the hot data (i.e., the hot pool is significantly
undersized). In such a case, some less-hot pages are forced to reside in the cold pool, and lose the
benefits of WARM (i.e., endurance improvements from relaxed retention times). WARM will not,
however, incur any further write overhead from keeping the less-hot pages in the cold pool. For
example, the dynamic sizing of the cooldown window prevents the less-hot pages from going back
and forth between the hot and cold pools.
4.2. Flash Management Policies
WARM partitions all of the blocks in a flash device into two pools, storing write-hot data in
the blocks belonging to the hot pool, and storing write-cold data in the blocks belonging to the cold
pool. Because of the different degrees of write-hotness of the data in each pool, WARM also applies
different management policies (i.e., refresh, garbage collection, and wear-leveling) to each pool, to
best extend their lifetime. We next describe these management policies for each pool, both when
WARM is applied alone and when WARM is applied along with refresh.
4.2.1. WARM-Only Management
WARM relaxes the internal retention time of only the blocks in the hot pool, without requiring
a refresh mechanism for the hot pool. Within the cold pool, WARM applies conventional garbage
collection (i.e., finding the block with the fewest valid pages to minimize unnecessary data move-
ment) and wear-leveling policies. Since the flash blocks in the cold pool contain data with much
lower write frequencies, they (1) consume a smaller number of P/E cycles, and (2) experience much
lower fragmentation (which only occurs when a page is updated), thus reducing garbage collec-
tion activities. As such, the lifetime of blocks in the cold pool increases even when conventional
management policies are applied.
Within the hot pool, WARM applies simple, in-order garbage collection (i.e., finding the oldest
block) and no wear-leveling policies. WARM performs writes to hot pool blocks in block order
(i.e., it starts on the block with the lowest ID number, and then advances to the block with the
next lowest ID number) to maintain a sequential ordering by write time. Writing pages in block
order enables garbage collection in the hot pool to also be performed in block order. Due to the
higher write frequency in the hot pool, all data in the hot pool is valid for a shorter amount of
time. Most of the pages in the oldest block are already invalid when the block is garbage collected,
increasing garbage collection efficiency. Since both writing and garbage collection are performed
in block order, each of the blocks will be naturally wear-leveled, as they will all incur the same
number of P/E cycles. Thus, we do not need to apply any additional wear-leveling policy.
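The hot-pool policy above can be sketched as a tiny FTL model. The class name, the block/page counts, and the simplification that every page in the garbage-collected (oldest) block is already invalid are illustrative assumptions:

```python
from collections import deque

class HotPoolFTL:
    """Sketch of WARM's hot-pool management: writes fill blocks in ID
    order, and garbage collection always reclaims the oldest written
    block, so every block accrues the same number of P/E cycles
    (natural wear-leveling, no separate wear-leveling policy needed)."""

    def __init__(self, num_blocks, pages_per_block):
        self.free = deque(range(num_blocks))   # free blocks, in ID order
        self.used = deque()                    # written blocks, oldest first
        self.pages_per_block = pages_per_block
        self.current, self.next_page = self.free.popleft(), 0
        self.pe_cycles = [0] * num_blocks

    def write_page(self):
        if self.next_page == self.pages_per_block:   # current block is full
            self.used.append(self.current)
            if not self.free:                        # no free block: GC oldest
                victim = self.used.popleft()         # assumed all-invalid
                self.pe_cycles[victim] += 1          # erase = one P/E cycle
                self.free.append(victim)
            self.current, self.next_page = self.free.popleft(), 0
        self.next_page += 1
        return self.current
```

Because both allocation and reclamation walk the blocks in the same order, the P/E counters stay equal across blocks without any explicit balancing.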
4.2.2. Combining WARM with Refresh
WARM can also be used in conjunction with a refresh mechanism to reap additional endurance
benefits. WARM, on its own, can significantly extend the lifetime of a flash device by enabling re-
tention time relaxation on only the write-hot pages. However, these benefits are limited, as the cold
pool blocks will eventually exhaust their endurance at the original internal retention time. While
WARM cannot enable retention time relaxation on the cold pool blocks due to infrequent writes to
such blocks, a refresh mechanism can enable the relaxation, greatly extending the endurance of the
cold pool blocks. WARM still provides benefits over a refresh mechanism for the hot pool blocks,
since it avoids unnecessary write operations that refresh operations would incur.
When WARM and refresh are combined, we split the lifetime of the flash device into two phases.
The flash device starts in the pre-refresh phase, during which the same management policies as
WARM-only are applied. Note that during this phase, internal retention time is only relaxed for
the hot pool blocks. Once the endurance at the original retention time is exhausted, we enter
the refresh phase, during which the same management policies as WARM-only are applied and a
refresh policy (such as FCR [8]) is applied to the cold pool to avoid data loss. During this phase,
the retention time is relaxed for all blocks. Note that during both phases, the internal retention
time for hot pool blocks is always relaxed without the need for a refresh policy.
During the refresh phase, WARM also performs global wear-leveling to prevent the hot pool
from being prematurely worn out. The global wear-leveling policy rotates the entire hot pool to a
new set of physical flash blocks (which were previously part of the cold pool) every 1K hot block
P/E cycles. Over time, this rotation will use all of the flash blocks in the device for the hot pool
for one 1K P/E cycle interval. Thus, WARM wears out all of the flash blocks equally despite the
heterogeneity in write-frequency between the two pools.
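The rotation step can be sketched as follows. Picking the least-worn cold blocks as the new hot pool is our assumption; the text only requires that the new hot blocks previously belonged to the cold pool:

```python
def rotate_hot_pool(hot_pool, cold_pool):
    """Global wear-leveling rotation, invoked every 1K hot-block P/E
    cycles during the refresh phase: the hot pool moves onto blocks
    drawn from the cold pool, and the retired hot blocks join the cold
    pool. Blocks are dicts with 'id' and 'pe' (P/E count) fields."""
    n = len(hot_pool)
    by_wear = sorted(cold_pool, key=lambda b: b['pe'])  # least worn first
    return by_wear[:n], by_wear[n:] + hot_pool          # (new_hot, new_cold)
```

Over successive rotations, each physical block takes a 1K-P/E-cycle stint as hot, which is how wear equalizes despite the skewed write traffic.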
4.3. Summary of Results
We evaluate lifetime improvement using an I/O trace based simulator, which simulates a NAND
flash-based SSD with different flash management policies. Figure 10 plots flash lifetime provided by
WARM alone (WARM), adaptive rate flash correct and refresh mechanism (ARFCR) [8], and WARM
combined with refresh (WARM+ARFCR), normalized to a conventional management policy without
WARM or refresh (Baseline). Using these results, we show that, when applied alone, WARM
improves overall flash lifetime by an average of 3.24× over Baseline. When WARM is applied
together with an adaptive refresh mechanism, the average lifetime improves by 12.9×, 1.21× over
adaptive refresh alone. We also analyze the hardware and performance overhead of WARM. WARM
requires four hardware counters and 1056 B of memory overhead. In the worst case, WARM has a
performance penalty of 5.8% over Baseline due to flash management overhead. On average across
all workloads, this overhead is negligible (<2%). In conclusion, WARM can improve flash lifetime
significantly while requiring minimal hardware and performance overhead.
Figure 10. Normalized lifetime improvement when WARM is applied on top of Baseline and ARFCR.
5. Proposed Work 1: Online Characterization and Modeling of
NAND Flash Memory Errors
Motivation: NAND flash memory errors are common in raw flash chips and they significantly
impact flash reliability. To guarantee the reliable operation of NAND flash memory, strong ECC
codes are applied to mask these errors from the user, leading to significant hardware and capacity
overhead [2, 8, 18, 54]. Understanding these errors through offline characterization and modeling
can enable more cost-effective ways to tolerate them than uniformly applying stronger ECC codes
regardless of the error properties. We expect to examine modern NAND flash chips (3D, multi-level
cell (MLC), or triple-level cell (TLC) chips, depending on which chips we can obtain for testing)
to understand the nature of these errors. Based on these results,
we expect to construct an accurate threshold voltage distribution model online, which will enable
other mechanisms to exploit the knowledge for improving flash reliability. In this work, we hope to
arrive at a new online mechanism to characterize and model the threshold voltage distribution of
flash cells during system operation at low cost and low latency. We divide this proposed work into
four major directions.
First, we expect to perform a thorough characterization of the threshold voltage distribution.
Such characterization is enabled by the existing read-retry capability of raw NAND flash chips,
which allows us to sweep the read reference voltage and accurately obtain the threshold voltage
for each flash cell. Using this methodology, we expect to study the properties of the threshold
voltage distribution, especially those that affect NAND flash error rates such as the tail distribution
(i.e., the part of the distribution that is far from its mean). We expect to study the effects of
wear out on the distribution by programming different flash blocks to different P/E cycles before
the characterization. We expect to analyze the effects of temperature, stored data pattern, and
retention on the threshold voltage distribution to achieve higher accuracy.
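The read-retry-based measurement described above can be sketched as a reference-voltage sweep. The `read_bit` interface is an illustrative abstraction of the real read-retry command, and the bracketing granularity is set by the voltage step size:

```python
def measure_vth(read_bit, v_refs):
    """Bracket a cell's threshold voltage by sweeping the read reference
    voltage upward. `read_bit(v)` models a read-retry command at
    reference voltage v, returning 1 once the cell conducts (Vth < v)."""
    prev = float('-inf')
    for v in sorted(v_refs):
        if read_bit(v) == 1:          # cell conducts: Vth is below v
            return (prev, v)          # Vth lies in the interval (prev, v]
        prev = v
    return (prev, float('inf'))       # Vth above the highest swept reference
```

Repeating this sweep over every cell in a block yields the empirical threshold voltage distribution, including the tail far from the mean that dominates the error rate.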
Second, we expect to model the threshold voltage distribution from real characterization data.
We start by statically fitting various distribution models to the characterized data under each P/E
cycle count. We expect to evaluate and compare the accuracy of these different static models by
comparing their estimated NAND flash error rates as well as their modeling error rates. Once we
have determined the best static model, we then expect to model the dynamic shift of threshold
voltage distribution over P/E cycles. We expect to evaluate the accuracy of the dynamic model
by showing how well it can predict future threshold voltage distribution and flash errors using
only data obtained in lower P/E cycles. To make it more practical to construct the models,
we also expect to develop techniques that minimize the computation and hardware overhead of
the construction process of these models, by selecting an easy-to-compute model and designing
hardware to accelerate the models.
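As one concrete instance of the static modeling step, a Gaussian fit to the measured threshold voltages of a programmed state can be sketched as below. The Gaussian is a single illustrative candidate; the evaluation described above would compare it against other distribution families:

```python
import math

def fit_gaussian(vth_samples):
    """Fit one candidate static model (a Gaussian) to measured Vth
    samples of a single programmed state; returns (mu, sigma)."""
    n = len(vth_samples)
    mu = sum(vth_samples) / n
    sigma = (sum((x - mu) ** 2 for x in vth_samples) / n) ** 0.5
    return mu, sigma

def tail_beyond(mu, sigma, v_ref):
    """Modeled probability that a cell of this state lands past the read
    reference voltage v_ref -- the tail mass that becomes raw bit errors,
    computed from the Gaussian CDF."""
    return 0.5 * math.erfc((v_ref - mu) / (sigma * math.sqrt(2.0)))
```

Comparing `tail_beyond` against the measured error rate at the same `v_ref` gives exactly the kind of model-accuracy metric the evaluation above calls for.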
Third, we expect to understand different factors affecting the accuracy of such a model. In
particular, we would like to develop an understanding of how temperature, stored data pattern,
and retention age affect the distribution and model accuracy.
Fourth, in order to examine the evolution of flash errors and the accuracy of models for them
in newer generation NAND flash chips, we expect to study error patterns in 3D NAND flash
devices. These include error patterns for P/E cycling errors, retention errors, read disturb errors,
and program interference errors. We also expect to perform similar characterization and modeling
(as described above) on 3D NAND flash chips. We expect to understand different factors affecting
the accuracy of our model. In particular, we would like to develop an understanding of how process
variation across layers, P/E cycling, retention, and read disturb affect the distribution and model
accuracy. We also expect to study how three-dimensional program interference impacts NAND
flash reliability.
Towards these four directions, we expect to answer at least the following research questions:
• How does flash wear out affect threshold voltage distribution and flash error rates? How do
other effects (retention and read disturb) affect threshold voltage distribution and flash error
rates?
• How can we accurately model the threshold voltage distribution under any static amount of
wear out? What is the property of the tail distribution (i.e., the part of the distribution far
away from the mean), and which models can be used to represent the shape of the tail?
• How can we model the dynamic shift of the threshold voltage distribution under wear out?
• Can we combine the dynamic and static models of the threshold voltage distribution to increase
the prediction accuracy of NAND flash error rate?
• How can we minimize the computation and hardware overhead of characterizing and modeling the
threshold voltage distribution in flash controllers?
• How do temperature, stored data pattern, and retention age affect the threshold voltage distribution
and accuracy of the models we develop?
• What are the flash error characteristics in 3D NAND devices for P/E cycling errors, retention
errors, read disturb errors, and program interference errors?
• How do process variation, P/E cycling, retention, read disturb, and program interference in 3D
NAND affect threshold voltage distribution and our model?
• How can we characterize three-dimensional program interference in 3D NAND? Does three-
dimensional program interference affect data reliability in neighboring flash blocks?
6. Proposed Work 2: Model-Driven Flash Management Policies
Motivation: Today’s flash controllers manage multiple flash chips based on a set of fixed, con-
servatively estimated flash parameters provided by the flash vendor. These parameters, such as
read reference voltages, ECC strength, flash memory health, etc., are not specifically tuned for the
NAND flash chips connected to the flash controller and therefore cannot adjust to the amount of
wear on each flash block to improve flash reliability and performance. In this work, however, we
expect to take advantage of the threshold voltage distribution model constructed in Section 5. Our
goal is to show that our online model can be exploited in various ways to improve flash reliability.
Our approach can be divided into three steps.
First, we aim to find out which flash parameters to estimate. We expect to explore those
parameters that can be estimated using our proposed model. Among these parameters, we expect
to select those that can be helpful for improving flash reliability. We expect to estimate the optimal
read reference voltage of each read using the predicted threshold voltage distribution. We expect to
estimate the raw bit error rate under different assumptions of read reference voltages. We expect
to estimate the expected remaining lifetime of each flash block without counting P/E cycles. We
expect to estimate the log-likelihood ratio of the distribution, which can be used for improving
ECC coding efficiency. We also expect to quantitatively evaluate and compare the accuracy of
these estimations using the different models developed in Section 5.
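The first of these estimations can be sketched concretely: given two adjacent threshold voltage states predicted by the online model, the optimal read reference voltage is the one minimizing the modeled raw bit errors. Representing each state as a `(mu, sigma)` Gaussian and using a grid search are both illustrative assumptions:

```python
import math

def raw_bit_errors(v_ref, dist_low, dist_high):
    """Modeled raw bit errors at v_ref: mass of the lower state that
    reads above v_ref plus mass of the higher state that reads below it.
    Each distribution is a (mu, sigma) Gaussian from the online model."""
    (mu0, s0), (mu1, s1) = dist_low, dist_high
    tail_right = 0.5 * math.erfc((v_ref - mu0) / (s0 * math.sqrt(2.0)))
    tail_left = 0.5 * math.erfc((mu1 - v_ref) / (s1 * math.sqrt(2.0)))
    return tail_right + tail_left

def optimal_v_ref(dist_low, dist_high, lo, hi, steps=1000):
    """Grid search for the read reference voltage that minimizes the
    modeled raw bit error rate between two adjacent states."""
    best_v, best_e = lo, float('inf')
    for i in range(steps + 1):
        v = lo + (hi - lo) * i / steps
        e = raw_bit_errors(v, dist_low, dist_high)
        if e < best_e:
            best_v, best_e = v, e
    return best_v
```

Applying the predicted optimal voltage before the first read is what lets the controller avoid most read-retry attempts.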
Second, we aim to develop techniques to utilize these estimations to improve flash reliability. We
expect to quantitatively evaluate and show how these techniques improve flash reliability in different
ways. With the optimal read reference voltage estimation, we can adapt the read reference voltage
to minimize the raw bit error rate before applying the read-retry technique. With the raw bit error rate
estimation, we can provide the right amount of ECC protection with the lowest overhead. With the
expected lifetime estimation, we can fully utilize the lifetime of each flash block without suffering
from loss of flash memory capacity. With the more accurate log-likelihood ratio estimation, we can
improve the efficiency of existing ECC codes.
Third, we aim to develop techniques for 3D NAND flash chips based on the characterization
and modeling we perform in Section 5. In particular, we expect to develop techniques to mitigate
any potential new reliability issues in 3D NAND such as three dimensional program interference.
We also expect to develop techniques to tolerate process variation across layers for 3D NAND chips.
To this end, we expect to answer the following research questions:
• How can we use our proposed models to predict flash parameters such as optimal read reference
voltage, raw bit error rate, remaining flash lifetime, and optimal ECC parameters? How does
the accuracy of the model affect these estimations?
• How often and at what granularity shall we predict the optimal read reference voltage to mini-
mize flash read error rate and read latency with low overhead?
• How can we efficiently adapt the amount of ECC protection to the predicted raw
bit error rate? How can we provide appealing flash reliability or lifetime benefits using such
techniques?
• How can we adjust flash management policies to adapt to the expected remaining flash lifetime,
instead of the P/E cycle counts? How often and for which flash block shall we estimate the
remaining flash lifetime to maximize flash lifetime and minimize performance overhead?
• How does the accuracy of log-likelihood ratio estimation affect error correcting capability?
• How can we mitigate three-dimensional program interference in 3D NAND?
7. Proposed Work 3: Characterization and Utilization of NAND
Flash Memory Self-Healing Effect
Motivation: The self-healing effect is a phenomenon whereby NAND flash memory cells gradually
recover a fraction of their wear over time [37, 56], a process that can be accelerated by high
temperature. As we discussed in Section 3.6, no prior work verifies its model on modern NAND flash
chips or demonstrates successful self-healing operation in real flash chips. In this work, we strive
to characterize and understand the self-healing effect and design techniques that utilize this effect
to improve flash reliability. Our approach has two steps.
First, we expect to comprehensively characterize different aspects of the self-healing effect using
real NAND flash chips. We expect to investigate the effectiveness of the self-healing effect (i.e., if
it can be used to improve flash reliability) by comparing raw bit error rates before and after heat-
accelerated self-healing under different P/E cycles. We expect to study whether the self-healing
effect persists after P/E cycling (i.e., if it can be used to improve overall lifetime of the flash
memory) by comparing total P/E cycle endurance with and without self-healing. We also expect
to study whether the self-healing effect is repeatable (i.e., if it can be used multiple times to further
improve flash lifetime) by comparing the endurance improvement of the first self-healing, second
self-healing, etc. Dwell-time, the time duration between two consecutive P/E cycles for which the
flash memory cell can recover, directly affects the effectiveness of the self-healing operation. Dwell-
time, similar to retention time, can be accelerated by high temperature according to the Arrhenius
Law [6], allowing faster recovery of flash memory cells. We expect to study the relation between
dwell-time and the effectiveness of self-healing operation.
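The Arrhenius acceleration can be made concrete with the standard acceleration-factor formula, AF = exp((Ea/k) · (1/T_use − 1/T_bake)). The 1.1 eV default activation energy and the specific temperatures in the usage below are illustrative assumptions, not measured values:

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(t_use_k, t_bake_k, ea_ev=1.1):
    """Arrhenius acceleration factor: one hour of dwell time at the bake
    temperature t_bake_k is equivalent to AF hours at the use temperature
    t_use_k. ea_ev is the assumed activation energy in eV."""
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_bake_k))
```

For example, under these assumptions a bake at 85 °C (358 K) compresses hundreds of hours of room-temperature (300 K) dwell time into a single hour, which is what makes heat-accelerated self-healing experiments tractable.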
Second, we expect to design techniques to utilize the self-healing effect to improve flash reliability
and lifetime. We expect to investigate the feasibility of heating NAND flash memory at different
granularities using an internal or external heat source. We expect to design mechanisms to trigger
self-healing operations that maximize flash lifetime and minimize performance overhead. As heat-
accelerated self-healing operation also accelerates the retention effect, the data currently stored on
the flash memory can be damaged. We expect to design mechanisms that avoid data corruption
due to retention loss by moving the data, while minimizing the performance overhead. We can
predict the idle period of the workload and the effectiveness of the self-healing operation such that
we can schedule the self-healing operation when it is most effective and has the least interference.
We expect to evaluate the flash lifetime improvement and the performance penalty of our proposed
techniques based on our characterization results.
To this end, we expect to answer the following research questions:
• Can heat-accelerated self-healing operation effectively reduce raw bit error rate in real flash
chips?
• How well does the benefit of a self-healing operation persist over P/E cycles? How much can a
self-healing operation improve flash lifetime when performed at different P/E cycle counts?
• Can we repeat the self-healing operation to further extend flash lifetime?
• How does the self-healing effect correlate with dwell time? How do we design experiments to
characterize this effect?
• How can we utilize the self-healing effect to improve flash reliability and lifetime? How can we
design online/offline mechanisms to trigger the self-healing effect?
8. Timeline
Depending on the success of the different ideas presented in this proposal and the availability
of time, we will aim to explore as many ideas as possible. My goal is to graduate in the Summer of
2017. Table 1 lists my tentative timeline for pursuing the ideas proposed in this document. Note
that the success of some of our ideas heavily depends on the data from the experimental results.
Duration Description
Apr-Jun 2016 Work on model-driven flash management policies (Potential milestone: submission to JSAC).
Jul-Sep 2016 Work on characterization and modeling of 3D NAND flash memory errors (Potential milestone: submission to SIGMETRICS).
Oct 2016-Mar 2017 Work on characterization and utilization of self-healing effect (Potential milestone: submission to MICRO).
Apr-Jul 2017 Defend and submit thesis.
Table 1. Timeline for this proposal.
9. Conclusion
In this proposal, our goal is to improve NAND flash memory reliability with a multitude of
low-cost architectural techniques. To this end, we first describe a mechanism that we have already
worked on: WARM, a technique that manages flash retention differently for write-hot data and
write-cold data, and improves flash lifetime at low cost and low performance overhead. For our
future work, we propose to explore three new directions. The first direction proposes to develop an
online technique to characterize and model flash errors. The second direction proposes to develop
flash management policies that improve flash lifetime by exploiting our online model. The third
direction proposes to understand and develop new techniques that utilize the flash self-healing effect.
We hope that this research will demonstrate that NAND flash memory reliability can be improved
at low cost and with low performance overhead by deploying various architectural techniques that
are aware of higher-level application behavior and underlying flash device characteristics.
[2] R. C. Bose and D. K. Ray-Chaudhuri. On a Class of Error Correcting Binary Group Codes. Information and Control, 1960.
[3] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai. Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis. In DATE, 2012.
[4] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai. Threshold Voltage Distribution in NAND Flash Memory: Characterization, Analysis, and Modeling. In DATE, 2013.
[5] Y. Cai, Y. Luo, S. Ghose, and O. Mutlu. Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery. In DSN, 2015.
[6] Y. Cai, Y. Luo, E. F. Haratsch, K. Mai, and O. Mutlu. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. In HPCA, 2015.
[7] Y. Cai, O. Mutlu, E. F. Haratsch, and K. Mai. Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation. In ICCD, 2013.
[8] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai. Flash Correct and Refresh: Retention Aware Management for Increased Lifetime. In ICCD, 2012.
[9] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai. Error Analysis and Retention-Aware Error Management for NAND Flash Memory. Intel Technology Journal (ITJ), 2013.
[10] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, O. Unsal, A. Cristal, and K. Mai. Neighbor Cell Assisted Error Correction in MLC NAND Flash Memories. In SIGMETRICS, 2014.
[11] Y.-M. Chang, Y.-H. Chang, J.-J. Chen, T.-W. Kuo, H.-P. Li, and H.-T. Lue. On Trading Wear-Leveling With Heal-Leveling. In DAC, 2014.
[12] F. Chen, T. Luo, and X. Zhang. CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory Based Solid State Drives. In FAST, 2011.
[13] J. Cooke. The Inconvenient Truths of NAND Flash Memory. Flash Memory Summit, 2007.
[14] N. Dayan, P. Bonnet, and S. Idreos. GeckoFTL: Scalable Flash Translation Techniques for Very Large Flash Devices. In SIGMOD, 2016.
[15] G. Dong, N. Xie, and T. Zhang. Enabling NAND Flash Memory Use Soft-Decision Error Correction Codes at Minimal Read Latency Overhead. IEEE Transactions on Circuits and Systems, 2013.
[17] T. Frankie, G. Hughes, and K. Kreutz-Delgado. SSD TRIM Commands Considerably Improve Overprovisioning. Flash Memory Summit, 2011.
[18] R. G. Gallager. Low-Density Parity-Check Codes. IRE Transactions on Information Theory, 1962.
[19] A. Gupta, Y. Kim, and B. Urgaonkar. DFTL: A Flash Translation Layer Employing Demand-Based Selective Caching of Page-Level Address Mappings. In ASPLOS, 2009.
[20] J.-U. Kang, J. Hyun, H. Maeng, and S. Cho. The Multi-Streamed Solid-State Drive. In HotStorage, 2014.
[21] S. Lee, T. Kim, K. Kim, and J. Kim. Lifetime Management of Flash-Based SSDs Using Recovery-Aware Dynamic Throttling. In FAST, 2012.
[22] J. Li, K. Zhao, J. Ma, and T. Zhang. Realizing Unequal Error Correction for NAND Flash Memory at Minimal Read Latency Overhead. IEEE Transactions on Circuits and Systems II: Express Briefs, 2014.
[23] J. Li, K. Zhao, X. Zhang, J. Ma, M. Zhao, and T. Zhang. How Much Can Data Compressibility Help to Improve NAND Flash Memory Lifetime? In FAST, 2015.
[24] W. Li, G. Jean-Baptise, J. Riveros, G. Narasimhan, and M. Zhao. CacheDedup: In-line Deduplication for Flash Caching. In FAST, 2016.
[25] C.-Y. Liu, Y.-M. Chang, and Y.-H. Chang. Read Leveling for Flash Storage Systems. In SYSTOR, 2015.
[26] Y. Lu, J. Shu, and W. Zheng. Extending the Lifetime of Flash-Based Storage Through Reducing Write Amplification from File Systems. In FAST, 2013.
[27] Y. Luo, Y. Cai, S. Ghose, J. Choi, and O. Mutlu. WARM: Improving NAND Flash Memory Lifetime with Write-Hotness Aware Retention Management. In MSST, 2015.
[28] Y. Luo, S. Ghose, T. Li, S. Govindan, B. Sharma, B. Kelly, A. Boroumand, and O. Mutlu. CREAM (Capacity- and Reliability-Adaptive Memory): Enabling the Use of ECC DRAM to Increase Memory Capacity. Under submission to MICRO, 2016.
[29] Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, and O. Mutlu. Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory. In DSN, 2014.
[30] D. Ma, J. Feng, and G. Li. LazyFTL: A Page-Level Flash Translation Layer Optimized for NAND Flash Memory. In SIGMOD, 2011.
[31] D. Ma, J. Feng, and G. Li. A Survey of Address Translation Technologies for Flash Memories. CSUR, 2014.
[32] D. J. MacKay and R. M. Neal. Near Shannon Limit Performance of Low Density Parity Check Codes. Electronics Letters, 1996.
[33] SATA-IO Board Members. Serial ATA International Organization: Serial ATA Revision 3.0, 2009.
[34] J. Meza, Q. Wu, S. Kumar, and O. Mutlu. A Large-Scale Study of Flash Memory Failures in the Field. In SIGMETRICS, 2015.
[35] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, and F. Triverdi. Bit Error Rate in NAND Flash Memories. In IRPS, 2008.
[36] V. Mohan. Modeling the Physical Characteristics of NAND Flash Memory. PhD thesis, University of Virginia, 2010.
[37] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. R. Stan. How I Learned to Stop Worrying and Love Flash Endurance. 2010.
[38] R. Motwani. Estimation of Flash Memory Level Distributions Using Interpolation Techniques for Optimizing the Read Reference. In GLOBECOM, 2015.
[39] R. Motwani and C. Ong. Design of LDPC Coding Schemes for Exploitation of Bit Error Rate Diversity Across Dies in NAND Flash Memory. In ICNC, 2013.
[40] R. Motwani and C. Ong. Soft Decision Decoding of RAID Stripe for Higher Endurance of Flash Memory Based Solid State Drives. In ICNC, 2015.
[41] I. Narayanan, D. Wang, M. Jeon, B. Sharma, L. Caulfield, A. Sivasubramaniam, B. Cutler, J. Liu, B. Khessib, and K. Vaid. SSD Failures in Datacenters: What, When and Why? In SIGMETRICS, 2016.
[42] Y. Pan, G. Dong, and T. Zhang. Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance. In FAST, 2011.
[43] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls. Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND Flash Memory Systems. In GLSVLSI, 2014.
[44] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls. Enhancing the Reliability of MLC NAND Flash Memory Systems by Read Channel Optimization. TODAES, 2015.
[45] D. Park, B. Debnath, and D. Du. CFTL: A Convertible Flash Translation Layer Adaptive to Data Access Patterns. In SIGMETRICS, 2010.
[46] J. Park, J. Jeong, S. Lee, Y. Song, and J. Kim. Improving Performance and Lifetime of NAND Storage Systems Using Relaxed Program Sequence. In DAC, 2016.
[47] K.-T. Park, M. Kang, D. Kim, S.-W. Hwang, B. Y. Choi, Y.-T. Lee, C. Kim, and K. Kim. A Zeroing Cell-To-Cell Interference Page Architecture With Temporary LSB Storing and Parallel MSB Program Scheme for MLC NAND Flash Memories. JSSC, 2008.
[48] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis. Modelling of the Threshold Voltage Distributions of Sub-20nm NAND Flash Memory. In GLOBECOM, 2014.
[49] A. Prodromakis, S. Korkotsides, and T. Antonakopoulos. MLC NAND Flash Memory: Aging Effect and Chip/Channel Emulation. Microprocessors and Microsystems, 2015.
[50] B. Schroeder, R. Lagisetty, and A. Merchant. Flash Reliability in Production: The Expected and the Unexpected. In FAST, 2016.
[51] K.-D. Suh, B.-H. Suh, Y.-H. Lim, J.-K. Kim, Y.-J. Choi, Y.-N. Koh, S.-S. Lee, S.-C. Suk-Chon, B.-S. Choi, J.-S. Yum, et al. A 3.3 V 32 Mb NAND Flash Memory With Incremental Step Pulse Programming Scheme. IEEE Journal of Solid-State Circuits, 1995.
[52] H. Tabrizi, B. Peleato, R. Agarwal, and J. Ferreira. Improving NAND Flash Read Performance Through Learning. In ICC, 2015.
[53] Seagate Technology. Serial Attached SCSI (SAS) Interface Manual, 2009. http://www.seagate.com/staticfiles/support/disc/manuals/Interface%20manuals/100293071c.pdf.
[54] J. Wang, K. Vakilinia, T.-Y. Chen, T. Courtade, G. Dong, T. Zhang, H. Shankar, and R. Wesel. Enhanced Precision Through Multiple Reads for LDPC Decoding in Flash Memories. IEEE Journal on Selected Areas in Communications, 2014.
[55] G. Wu and X. He. Delta-FTL: Improving SSD Lifetime via Exploiting Content Locality. In EuroSys, 2012.
[56] Q. Wu, G. Dong, and T. Zhang. Exploiting Heat-Accelerated Flash Memory Wear-Out Recovery to Enable Self-Healing SSDs. In HotStorage, 2011.