What you should know about Flash Storage
Flash storage is a frequent topic on our support channels. Toradex invests a lot of resources into making the storage as reliable as possible. Nevertheless, it is important to understand some basics of the underlying storage device. One of the most important things to know is that flash storage wears out: you can destroy the built-in storage device by writing too much data to it. With this post, we want to give you a basic overview of potential issues flash storage can have. Let’s start with a short technology overview.

Flash types: Raw Flash vs Managed Flash

Currently, Toradex computer modules use NOR, NAND, and eMMC flash. NOR and NAND are raw storage devices. The main differences between the two are that NOR allows random access, does not need error correction, and has a higher cost per bit. NAND, on the other hand, can only be read in pages; some bits in a page may be wrong and need to be corrected by an error correction mechanism.
eMMC flash combines NAND memory with a built-in controller that handles most of the nasty things you have to take care of when dealing with NAND flash. For this reason, eMMC is also called managed NAND. With raw NAND and NOR flash, on the other hand, the OS and device drivers are responsible for handling these issues. We will discuss the different kinds of challenges later in this blog post. Here is a small overview of the flash types used on our computer modules:
Evolution of NAND Flash: From SLC to MLC

The bit density of NAND flash has evolved over time. The first NAND devices were Single Level Cell (SLC) flash, which means every flash cell stores a single bit. Multi Level Cell (MLC) flash stores two or more bits per cell, which increases the bit density. That sounds great, but MLC has downsides as well: a higher bit error rate and lower endurance. All eMMC devices use MLC NAND. Some eMMC devices allow you to switch part or all of the storage into a pseudo-SLC (PSLC) mode. This reduces the usable capacity but increases the endurance of the device.
Here is a rough comparison of SLC and MLC.
Endurance: Limited number of erase cycles

As already mentioned, one of the most important things you have to know about any flash technology used on our devices is that you can write and erase flash only a limited number of times.
Writing huge amounts of data to the flash device is not a good idea! As shown in the table above, depending on the type of flash, you have between 10K and 100K erase cycles available before data potentially gets corrupted or lost. The term “erase cycles” can be confusing. One limitation of flash storage is that it cannot be rewritten without being erased first. Furthermore, erasing cannot be done at the bit level but only in bigger chunks called blocks. In the worst case, this means that writing a single byte may require erasing and rewriting a whole block. The block size can be up to 512 KB. The effect of erasing and writing more than you actually want is called write amplification. The flash file system may need additional write operations on top of that. If you want to estimate the lifetime of the flash storage on your embedded device, you should take this into consideration.
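As a rough illustration of such an estimate, the following Python sketch combines the quantities discussed above. It assumes ideal wear leveling (every block is erased equally often) and a constant write amplification factor. The function names and the example numbers (a 4 GiB device, 10K erase cycles, 1 GiB written per day, write amplification of 4) are hypothetical and are not taken from any Toradex datasheet.

```python
def worst_case_write_amplification(payload_bytes, block_bytes):
    """Write amplification if every write erases and rewrites whole blocks."""
    if payload_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks_touched = -(-payload_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes / payload_bytes


def estimate_lifetime_years(capacity_bytes, erase_cycles,
                            app_bytes_per_day, write_amplification):
    """Rough lifetime in years, assuming writes are spread evenly over all blocks."""
    if app_bytes_per_day <= 0 or write_amplification < 1:
        raise ValueError("need a positive write rate and amplification >= 1")
    total_writable_bytes = capacity_bytes * erase_cycles
    physical_bytes_per_day = app_bytes_per_day * write_amplification
    return total_writable_bytes / physical_bytes_per_day / 365.0


GIB = 2**30

# Hypothetical device: 4 GiB, 10K erase cycles, 1 GiB/day, amplification of 4.
years = estimate_lifetime_years(4 * GIB, 10_000, 1 * GIB, 4)
print(f"Estimated lifetime: {years:.1f} years")

# Worst case from the text: one byte written into a 512 KB block.
print(worst_case_write_amplification(1, 512 * 1024))
```

Real devices will deviate from this estimate, so it is best treated as an order-of-magnitude check rather than a guarantee.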
Increase lifetime of flash

The following section shows how the lifetime of NAND or eMMC flash can be improved. Don’t worry, all these things are already handled by Toradex; there is no need for any action on your side.
Prevent wearing: Wear leveling

Let’s assume you are aware that flash can be erased and written only a limited number of times, and you only update small amounts of data periodically. If this data were always written to the same flash cells, you could write it at most 15K times on MLC flash. Although you have never touched any of the other flash cells, your data could get lost and the flash would be broken, because the cells you have been writing to are worn out. Smart flash drivers therefore use wear leveling. This technique ensures that all flash cells wear evenly instead of the same cells being used over and over.

Detect and correct errors: Error Correction Codes

On a NAND flash device, single bits can flip and corrupt your data. This can be caused by wear or by other disturbances. Therefore, the data is protected by Error Correction Codes (ECC), which make it possible first to detect corrupted data and second to correct it. Depending on the flash controller and the NAND / eMMC flash itself, more or fewer errors can be detected and corrected.
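To make the detect-then-correct idea concrete, here is a minimal Python sketch of a Hamming(7,4) code, which can correct one flipped bit per 7-bit codeword. This is only a textbook illustration: real NAND controllers typically use stronger codes, and nothing here reflects the ECC actually used on Toradex modules. The function names are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome gives the 1-based bit position
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a single bit flip in the flash cell
data, position = hamming74_decode(stored)
print(data, position)
```

The decoder reports where the error was, which is the kind of information a driver can use to count corrected bit errors per block.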
Bad block handling

As ECC enables us to find erroneous blocks, we can stop using these bad blocks. Depending on the ECC and the number of bits that can be corrected, a threshold defines the maximum number of errors that are accepted before further action is taken. Once this threshold is reached, the data is corrected and moved to a good block on the device. The previous location is marked as bad. Bad blocks are no longer used, as they are potentially broken.
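The threshold logic described above can be sketched as a toy model in Python. This is not how UBI or any real driver is implemented; the class `BlockManager`, its methods, and the way the number of corrected bit errors is passed in are all hypothetical simplifications for illustration.

```python
class BlockManager:
    """Toy model of threshold-based bad block handling (not a real driver)."""

    def __init__(self, num_blocks, error_threshold):
        self.error_threshold = error_threshold
        self.free = list(range(num_blocks))  # physical blocks available for use
        self.bad = set()                     # physical blocks retired as bad
        self.mapping = {}                    # logical block -> physical block
        self.data = {}                       # physical block -> stored payload

    def _take_free_block(self):
        if not self.free:
            raise RuntimeError("no spare blocks left")
        return self.free.pop(0)

    def write(self, logical, payload):
        """Store payload in a fresh block; recycle the previously used block."""
        physical = self._take_free_block()
        old = self.mapping.get(logical)
        self.data[physical] = payload
        self.mapping[logical] = physical
        if old is not None:
            del self.data[old]
            self.free.append(old)

    def read(self, logical, corrected_bit_errors):
        """Return the (ECC-corrected) payload; relocate it if the threshold is hit."""
        physical = self.mapping[logical]
        payload = self.data[physical]
        if corrected_bit_errors >= self.error_threshold:
            new_physical = self._take_free_block()
            self.data[new_physical] = payload   # move corrected data to a good block
            self.mapping[logical] = new_physical
            del self.data[physical]
            self.bad.add(physical)              # never reuse the old location
        return payload
```

For example, with `BlockManager(num_blocks=4, error_threshold=3)`, a read that reports 3 corrected bit errors moves the data to a spare block and adds the old physical block to the `bad` set, while a read with fewer errors leaves the mapping untouched.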
Power fail tolerance

What happens to your device in case of a sudden power loss while writing to the flash? On embedded devices, you expect that the device still boots properly and that your data does not get corrupted. To achieve that, all software layers and hardware parts involved have to be capable of handling such a situation. You will find more details on how we achieve that goal in the next section.
Implementation Details on Toradex SoMs

As seen above, a proper setup that matches the underlying storage type is crucial. Let’s go into the details of the current setup in the Toradex BSPs.

NAND-based devices

The following figure gives you a generic overview of the setup of our WinCE and Linux BSPs on NAND-based devices.
Storage device: On all our devices using NAND, we use SLC NAND.

Hardware Driver: The hardware driver offers a generic interface between the NAND device and the upper layers. This layer is also responsible for detecting and correcting errors. On Linux, all our current images use MTD. On WinCE, we use the Microsoft Flash PDD layer. There are some exceptions, such as the Colibri T20, where we use a device-specific PDD layer on WinCE.

Flash Translation Layer: This layer is responsible for wear leveling and bad block management. On Linux, this is done by the UBI subsystem, while on WinCE it is done by the Microsoft Flash MDD layer. Again, on the Colibri T20 we use a device-specific layer instead of the Microsoft Flash MDD.
Filesystem: The file system is the part that manages the partitions and the files stored in them. Applications access the file system through the file API (on Linux, through the VFS layer). On Linux, we currently use UBIFS, while on WinCE we use Transaction-Safe exFAT (TexFAT). Both are power-cut tolerant. The underlying layers are power-cut tolerant as well, as they support atomic operations.
eMMC-based devices

The following table shows the setup on Toradex System on Modules that use eMMC flash.
Storage device: Compared to raw NAND, most of the magic is done by the eMMC itself. Higher layers do not have to take care of wear leveling, error correction, or bad block management.

Hardware Driver: This is the interface between the MMC controller and the file system.

Filesystem: As on the NAND-based devices, we use TexFAT on WinCE; our Linux images use the ext3 file system. Again, both are power-cut tolerant.
Conclusion and Recommendations

Toradex does its best to provide reliable and enduring flash storage. Nevertheless, you should always keep an eye on flash usage during application development.

• Reduce write access to the flash device
• Know the write behavior of your final product
• Check whether the required lifetime of your product is feasible with that write behavior
• Run stress tests and long-term tests
• Leave some capacity unused, as this greatly improves the efficiency of wear leveling algorithms

If you need any further information or you think we could improve our default setup, please get in contact with our engineers.
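One possible way to get a first impression of your product's write behavior on a Linux image with eMMC is to read the kernel's block device statistics. The sketch below parses the `stat` file that Linux exposes under `/sys/block/<device>/`, where the seventh field is the number of 512-byte sectors written. The device name `mmcblk0` is an assumption and may differ on your module. The counter only covers writes issued by the host, so it says nothing about write amplification inside the eMMC, and it does not apply to raw NAND accessed through MTD.

```python
SECTOR_BYTES = 512  # the stat file always counts 512-byte sectors


def bytes_written(stat_line):
    """Extract the number of bytes written from one /sys/block/<dev>/stat line."""
    fields = stat_line.split()
    sectors_written = int(fields[6])  # seventh field: sectors written
    return sectors_written * SECTOR_BYTES


def read_bytes_written(device="mmcblk0"):
    """Read the counter for a block device (device name is an assumption)."""
    with open(f"/sys/block/{device}/stat") as stat_file:
        return bytes_written(stat_file.read())
```

Calling `read_bytes_written()` at two points in time and dividing the difference by the elapsed time gives an average write rate, which can then be compared against the lifetime you need.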
Thank you