What you should know about Flash Storage
Flash storage is a frequent topic on our support channels. Toradex invests a lot of resources into
making the storage as reliable as possible. Nevertheless, it is important to understand some basics of
the underlying storage device. One of the most important things to know is that flash storage wears
out: writing large amounts of data to the built-in storage device can eventually destroy it.
With this post, we want to give you a basic overview of the potential issues flash storage can have. Let’s start with a short
technology overview.
Flash types: Raw Flash vs Managed Flash
Currently, Toradex computer modules use NOR, NAND, and eMMC flash.
NOR and NAND are raw storage devices. The main differences between them are that NOR allows
random access, does not need error correction, and has a higher cost per bit. NAND, on the other hand,
can only be read in pages, and some bits in a page may be wrong and have to be corrected by an error
correction mechanism.
eMMC flash combines NAND memory with a built-in controller that handles most of the nasty things you have to take
care of when dealing with NAND flash. eMMC is also called managed NAND. With raw NAND and NOR flash, on the other
hand, the OS and device drivers are responsible for handling these issues. We will discuss the different kinds of challenges
later in this blog post.
Here is a small overview of the flash types used on our computer modules:
Evolution of NAND Flash: From SLC to MLC
The bit density of NAND flash has evolved over time. The first NAND devices were Single Level Cell (SLC) flash,
which means every flash cell stores a single bit. With Multi Level Cell (MLC) flash, each cell stores two or more
bits, so the bit density increases. That sounds great, but MLC has downsides as well: a higher bit error rate
and lower endurance. All eMMC devices use MLC NAND. Some eMMC devices allow you to switch part or all of
the storage into a pseudo-SLC (pSLC) mode. This reduces the usable capacity but increases the endurance
of the device.
Here is a rough comparison of SLC and MLC.
Endurance: Limited amount of erase cycles
As already mentioned, one of the most important things you have to know about any
flash technology used on our devices is that you can write and erase flash only a limited
number of times.
Writing huge amounts of data to the flash device is not a good idea! As shown in the table above, depending on the
type of flash you have between 10K and 100K erase cycles available before the data potentially gets corrupted or
lost. The term “erase cycles” can be confusing. One limitation of flash storage is that it cannot be rewritten without
being erased first. Furthermore, erasing cannot be done at the bit level but only in bigger chunks called blocks. In the
worst case, this means that if you want to write just a single byte, you potentially have to erase and rewrite a
whole block. The block size can be up to 512 KB. The effect of erasing and writing more than you actually want is
called write amplification. The flash file system may need additional write operations on top of that. If
you want to estimate the lifetime of the flash storage on your embedded device, you should take all of this into
consideration.
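To make the estimate concrete, here is a minimal Python sketch of the arithmetic. It assumes ideal wear leveling across the whole device and a constant write load. The function names and parameters are hypothetical and chosen for illustration only; they are not part of any Toradex tool.

```python
def worst_case_write_amplification(payload_bytes, block_bytes):
    """Worst case: every payload write forces whole erase blocks to be rewritten."""
    if payload_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks_touched = -(-payload_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes / payload_bytes


def estimate_flash_lifetime_years(capacity_bytes, erase_cycles,
                                  app_bytes_per_day, write_amplification):
    """Rough lifetime in years, ASSUMING perfectly even wear over all blocks."""
    if app_bytes_per_day <= 0 or write_amplification < 1:
        raise ValueError("need a positive write load and amplification >= 1")
    # Total amount of data the flash can absorb before cells reach their limit.
    total_writable_bytes = capacity_bytes * erase_cycles
    # What the flash really sees is the application data times the amplification.
    flash_bytes_per_day = app_bytes_per_day * write_amplification
    return total_writable_bytes / flash_bytes_per_day / 365.0
```

For example, `worst_case_write_amplification(1, 512 * 1024)` returns `524288.0`: a single-byte update that rewrites a full 512 KB block costs the flash over half a million times more than the payload. Real write amplification depends on the file system and the write pattern, so measure it on the target rather than relying on this worst-case bound.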
Increase lifetime of flash
The following section shows how the lifetime of NAND or eMMC flash can be improved. Don’t worry:
all these things are already handled by Toradex, so there is no need for any action on your side.
Prevent wearing: Wear leveling
Let’s assume you are aware that flash can be erased and written only a limited number of times, and you only
update small amounts of data periodically. If this data were always written to the same flash cells, you could
write it at most 15K times on MLC flash. Although you never touched the other flash cells, your data could get lost and
the flash would be broken, because the cells you have been writing to are worn out. Smart flash drivers therefore use wear
leveling. This technique ensures that all flash cells wear evenly instead of the same cells being used every time.
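The effect can be illustrated with a small simulation. This is a simplified sketch, not the algorithm used by UBI or by an eMMC controller: it assumes each update costs exactly one block erase, and the "least-worn block first" policy and the function name are illustrative assumptions.

```python
def writes_until_wearout(num_blocks, max_erase_cycles, wear_leveling):
    """Count how many updates succeed before the target block is worn out."""
    erase_counts = [0] * num_blocks
    writes = 0
    while True:
        if wear_leveling:
            # Simplified policy: always pick the block with the fewest erases.
            target = min(range(num_blocks), key=erase_counts.__getitem__)
        else:
            # No wear leveling: the same physical block is hit every time.
            target = 0
        if erase_counts[target] >= max_erase_cycles:
            return writes
        erase_counts[target] += 1
        writes += 1
```

With 8 blocks and 100 erase cycles per block, `writes_until_wearout(8, 100, False)` returns `100`, while `writes_until_wearout(8, 100, True)` returns `800`. In this idealized model the number of possible updates scales with the number of blocks that share the wear.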
Detect and correct errors: Error Correction Codes
On a NAND flash device, single bits can start flipping and corrupt your data. This can be caused
by wear or by other disturbances. Therefore, the data is protected
by Error Correction Codes (ECC). ECC makes it possible first to detect corrupted data and then to correct
it. Depending on the flash controller and the NAND / eMMC flash itself, more or fewer errors can be
detected and corrected.
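To show the principle of detecting and then correcting a flipped bit, here is a minimal sketch using the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and can correct one flipped bit per codeword. This code was picked for illustration only; it is an assumption, not the ECC used on Toradex modules. NAND controllers typically work on whole pages and use stronger codes.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if none found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position
```

Flipping any single bit of an encoded word and passing it to `hamming74_decode` yields the original data together with the position of the flipped bit. Two flipped bits in the same codeword exceed what this code can correct, which is why the number of correctable errors matters for the bad block threshold.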
Bad block handling
Because ECC lets us find erroneous blocks, we can stop using them. Depending on the ECC and
the number of bits that can be corrected, a threshold defines the maximum number of errors that are accepted
before further action is taken. Once this threshold is reached, the data is corrected and moved to a good block on
the device. The previous location is marked as bad. Bad blocks are not used any longer, as they are potentially broken.
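The threshold logic described above can be sketched as follows. This is a toy model under stated assumptions: the class name, its methods, and the in-memory bookkeeping are hypothetical, and real flash translation layers keep this state on the flash itself.

```python
class BlockManager:
    """Toy bad block manager: relocates data once ECC corrections hit a threshold."""

    def __init__(self, num_blocks, ecc_threshold):
        self.ecc_threshold = ecc_threshold
        self.free_blocks = list(range(num_blocks))
        self.bad_blocks = set()
        self.data = {}  # physical block number -> stored payload

    def allocate(self, payload):
        """Store a payload in the next free block and return its number."""
        if not self.free_blocks:
            raise RuntimeError("no good blocks left")
        block = self.free_blocks.pop(0)
        self.data[block] = payload
        return block

    def on_read(self, block, corrected_bits):
        """Handle a read; return the block that holds the data afterwards."""
        if corrected_bits < self.ecc_threshold:
            return block  # still within the accepted error budget
        # Threshold reached: copy the ECC-corrected data to a good block first,
        # then retire the old location so it is never used again.
        new_block = self.allocate(self.data[block])
        del self.data[block]
        self.bad_blocks.add(block)
        return new_block
```

Copying the data before retiring the old block means a failure during relocation leaves the original copy in place.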
Power fail tolerance
What happens to your device in case of a sudden power loss while writing to the flash? On embedded
devices, you expect that the device still boots properly and that your data is not corrupted. To achieve
that, all software layers and hardware parts involved have to be capable of handling such a situation.
The next section gives more details on how we reach that goal.
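Since every software layer is involved, the application can contribute as well. A common application-level pattern, shown here as a hedged sketch and not as something taken from the Toradex BSPs, is to write new data to a temporary file, flush it to the storage device, and then rename it over the old file. The helper name `atomic_write` is hypothetical. The sketch relies on the file system performing the rename atomically, which POSIX file systems are expected to do.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` so readers see the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays
    # within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to the storage device
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Also sync the directory so the rename itself is persisted.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Each call rewrites the complete file, so this pattern trades extra flash writes for consistency. It fits small, rarely changing files such as configuration data better than high-frequency logging.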
Implementation Details on Toradex SoMs
As seen above, a proper setup for the underlying storage type is crucial. Let’s go into the details of
the current setup in the Toradex BSPs.
NAND-based devices
The following figure gives you a generic overview of the setup of our WinCE and Linux BSPs on NAND-based devices.
Storage device: On all our devices using NAND, we use SLC NAND.
Hardware Driver: The hardware driver offers a generic interface between the NAND device and the upper layers. This
layer is also responsible for detecting and correcting errors. On Linux, all our current images use MTD. On WinCE, we use the
Microsoft Flash PDD layer. There are some exceptions, such as the Colibri T20, where we use a device-specific PDD layer on
WinCE.
Flash Translation Layer: This layer is responsible for wear leveling and bad block management. On
Linux, this is done by the UBI subsystem; on WinCE, it is done by the Microsoft Flash MDD layer.
Again, on the Colibri T20, we use a device-specific layer instead of the Microsoft Flash MDD.
Filesystem: The file system is the part that manages the partitions and the files stored in them. Applications
access it through the file API (on Linux, through the VFS layer). On Linux, we currently use UBIFS; on
WinCE, we use Transaction-Safe exFAT (TexFAT). Both are power-cut tolerant. The underlying layers are power-cut tolerant
as well, because they support atomic operations.
eMMC-based devices
The following table shows the setup of the Toradex System on Modules that use eMMC flash devices.
Storage device: Compared to raw NAND, most of the magic is done by the eMMC itself. Higher layers do not have to take
care of wear leveling, error correction, or bad block management.
Hardware Driver: This is the interface between the MMC controller and the file system.
Filesystem: As on the NAND-based devices, we use TexFAT on WinCE; our Linux images use the ext3
filesystem. Again, both are power-cut tolerant.
Conclusion and Recommendations
Toradex does its best to provide reliable and enduring flash storage. Nevertheless, you should always keep an eye on
flash usage during application development.
• Reduce write access to the flash device
• Know the write behavior of your final product
• Check whether the required lifetime of your product is feasible with this write behavior
• Run stress tests and long-term tests
• Avoid using the full capacity; free space greatly improves the efficiency of wear leveling algorithms
If you need any further information or you think we could improve our default setup, please get in
contact with our engineers.
Thank you