MLC Flash for Big Data Acceleration

Big data analysis demands bandwidth and concurrent access to stored data. Write load depends on data ingest rates and batch processing demands; the data involved is typically new data and updates to existing data. Indices and other metadata may be recalculated, but this is generally not done in real time. The economics of supporting such workloads center on the ability to cost-effectively provide bulk access for concurrent streams. If only a single stream is being processed, spinning disk is fine. However, providing highly concurrent access to the dataset requires either a widely striped caching solution or a clustered architecture with local disk (e.g., Hadoop). Because flash write lifetimes are not stressed in this environment, wide stripes of MLC for caching are the most cost-effective way to provide highly concurrent access to the dataset in a shared-storage environment.

Much of the SLC versus MLC debate centers on blocking and write performance – specifically, write latency and its blocking impact on reads. With a traditional storage layout, data can be striped over only a few disks (e.g., four data disks in a RAID 5/6 stripe). This creates a high read-blocking probability under even the smallest write loads. By distributing data over very wide non-RAID stripes (up to 40 disks wide), the effect of variable write latency can be mitigated: new cache data is dynamically placed on the least-read disks, greatly reducing the impact of writes on the overall read load. The wider the striping across physical disks in the caching media, the better the support for concurrent access and mixed read/write loads from the application. MLC is an excellent media choice, both technically and economically.
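The placement policy described above can be sketched in a few lines. This is a minimal illustration, not GridIron's actual implementation: it assumes the cache layer tracks a per-disk read counter and simply places each new cache block on the disk currently serving the fewest reads.

```python
class WideStripeCache:
    """Sketch: place new cache blocks on the least-read device in a
    wide non-RAID stripe, so writes land away from hot readers.
    (Hypothetical model; read counters and placement map are assumptions.)"""

    def __init__(self, num_disks=40):
        self.read_load = [0] * num_disks   # per-disk count of recent reads
        self.placement = {}                # block_id -> disk index

    def record_read(self, block_id):
        # bump the read counter for the disk holding this block
        disk = self.placement.get(block_id)
        if disk is not None:
            self.read_load[disk] += 1
        return disk

    def place(self, block_id):
        # dynamically select the least-read disk for new cache data
        disk = min(range(len(self.read_load)), key=lambda d: self.read_load[d])
        self.placement[block_id] = disk
        return disk
```

With a 40-wide stripe, a write headed for one least-read disk leaves the other 39 free to serve reads, which is the mitigation the paragraph describes.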

By employing affordable MLC as a write-through caching layer that remains consistent with the backend storage, the impact of even multiple simultaneous flash SSD failures can be eliminated. Most traditional storage systems cannot survive multiple concurrent drive failures, and they suffer significant performance degradation while recovering (rebuilding) from even a single device failure. A cache system, by contrast, can continue operating through cache media failures by simply fetching the missing data from the storage system and redistributing it across the remaining caching media. It is important to note, however, that placing the cache in front of the storage controller is critical to achieving this concurrency; the storage controller lacks the horsepower necessary to sustain performance – but that’s a topic for another day.
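Why a write-through cache tolerates cache media failure may be clearer in code. The sketch below is a hedged illustration (the backend is modeled as a plain dictionary, and `fail_cache` is a hypothetical way to simulate losing the flash device): because every write also reaches the backend, losing the cache never loses data – a miss is just refetched.

```python
class WriteThroughCache:
    """Sketch of a write-through flash cache in front of backend storage.
    The backend is always authoritative, so cache device loss costs only
    cache hits, never data. (Simplified model, not a vendor implementation.)"""

    def __init__(self, backend):
        self.backend = backend   # authoritative store (a dict in this sketch)
        self.cache = {}          # MLC cache contents
        self.failed = False      # simulated cache media failure flag

    def write(self, key, value):
        self.backend[key] = value        # write-through: backend always updated
        if not self.failed:
            self.cache[key] = value      # cache copy is a performance optimization

    def fail_cache(self):
        # simulate losing the cache device: contents gone, service continues
        self.failed = True
        self.cache.clear()

    def read(self, key):
        if not self.failed and key in self.cache:
            return self.cache[key]       # cache hit
        # miss or failed media: fetch from the backend, then it could be
        # redistributed to surviving cache devices
        return self.backend[key]
```

Contrast this with a RAID rebuild: here there is nothing to rebuild, only a temporarily colder cache.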

MLC is driving the price point of flash toward that of high-performance enterprise spinning disk. Constant growth in the consumer space means that MLC will continue to be the most cost-effective flash technology and will benefit the most from technology scaling and packaging innovations. Lower-volume technologies such as eMLC and SLC do not share the same economic drivers and thus will continue to be much more expensive. The ability to utilize MLC efficiently and adapt the technology to meet the performance and access needs of Big Data will be hugely advantageous to customers and to the vendors who can deliver intelligent, cost-effective solutions that utilize MLC – such as the GridIron TurboCharger™!