In the first part of this series on Disruptive Technology Trends, I discussed disruptive core technologies such as flash, virtualization and cloud/scale out. In this continuation, I will discuss the storage architectures that have grown up around these disruptions.
Looking back ten years, before virtualization, flash and cloud architectures were mainstream, we had traditional shared/multi-host storage arrays. The strongest trends were unified block and file protocols served from the same array, and storage tiering: mixing media with different performance characteristics in the same array and exporting LUNs with different characteristics. Tiering was primitive and very coarse grained by today’s abstraction standards. The biggest architectural difference then was whether you had a native block device with file support layered on or a native file server with block layered on.
Today’s world is far more diverse and sophisticated. I assert that there are at least five distinct storage architectures in the market today, giving us a variety of options to consider.
Before I delve into each of the five, one important dimension to consider is scale-up vs. scale-out architectures and the related division between centralized, shared storage controllers vs. more distributed architectures. Although there are exceptions, the two approaches can be characterized as follows:
- Scale-up architectures are monolithic and most centralized multi-host storage controllers fall into this category. By scale-up, I mean that when you run out of capacity or performance, you either need to add faster/larger components to your array (if possible) or you need to replace it with a bigger model.
- Scale-out architectures tend to take a more modular, distributed building block approach. When more performance or capacity is needed, an additional unit can be added to the existing pool. Today, scale-out architectures are often associated with converged infrastructure solutions, which execute the storage controller software on the same servers that are executing the application workload.
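To make the contrast concrete, here is a rough sizing sketch. The array models, node sizes and IOPS figures are numbers I made up for illustration, not vendor data, and the function names are hypothetical. With scale-up you pick the smallest model that covers the requirement (or find that none does), while with scale-out you keep adding identical nodes.

```python
import math

# Hypothetical array models: (name, max capacity in TB, max IOPS).
# These figures are illustrative assumptions only.
ARRAY_MODELS = [
    ("small", 50, 50_000),
    ("medium", 200, 150_000),
    ("large", 800, 400_000),
]


def scale_up_choice(required_tb, required_iops):
    """Return the smallest single array model that meets both needs, or None."""
    for name, max_tb, max_iops in ARRAY_MODELS:
        if max_tb >= required_tb and max_iops >= required_iops:
            return name
    return None  # requirement exceeds the largest model


def scale_out_nodes(required_tb, required_iops, node_tb=20, node_iops=30_000):
    """Return how many identical nodes the pool needs (assumes linear scaling)."""
    by_capacity = math.ceil(required_tb / node_tb)
    by_performance = math.ceil(required_iops / node_iops)
    return max(by_capacity, by_performance, 1)


if __name__ == "__main__":
    print(scale_up_choice(60, 40_000))   # outgrew "small", so replace it
    print(scale_out_nodes(60, 40_000))   # add nodes to the existing pool
```

The linear scaling assumption is optimistic. Real scale-out pools lose some efficiency to replication and inter-node traffic.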
In the world of dedicated/centralized storage arrays, there are two newer variants – Hybrid flash/disk Arrays and All-Flash Arrays (aka AFA). Hybrid arrays add PCIe flash or SSD to the array, but do not expose the flash directly as usable application storage.
Instead, the array software uses the flash as a combination of a read cache for frequently accessed blocks and a write log buffer for consolidating and accelerating writes. Note that populating the SSD read cache takes additional operations when compared to a memory cache. In the write case, the array frequently prunes the write log by moving data asynchronously from the write log to the spinning media. In summary, this architecture uses relatively small amounts of flash to deliver better performance, and especially lower latency, from disk arrays. These designs were originally intended to make optimal use of a limited amount of expensive flash cards or SSDs.
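The read cache and write log behavior can be sketched in a few lines. This is a toy model under my own assumptions, not any vendor's implementation: the class name, the LRU eviction policy and the tiny sizes are all hypothetical, and destaging runs synchronously here where a real array would do it in the background.

```python
from collections import OrderedDict


class HybridArraySketch:
    """Toy model: flash acts as an LRU read cache plus a write log in front of disk."""

    def __init__(self, cache_blocks=2, log_limit=2):
        self.disk = {}                    # stands in for the spinning media
        self.read_cache = OrderedDict()   # flash read cache, least recently used first
        self.write_log = OrderedDict()    # flash write log awaiting destage
        self.cache_blocks = cache_blocks
        self.log_limit = log_limit
        self.disk_reads = 0               # counts reads that had to go to disk

    def write(self, lba, data):
        # The write is considered complete once it lands in the flash log.
        self.write_log[lba] = data
        self.write_log.move_to_end(lba)
        self.read_cache.pop(lba, None)    # drop any stale cached copy
        if len(self.write_log) > self.log_limit:
            self.destage()

    def destage(self):
        # Prune the log by moving data to disk (asynchronous in a real array).
        while self.write_log:
            lba, data = self.write_log.popitem(last=False)
            self.disk[lba] = data

    def read(self, lba):
        if lba in self.write_log:         # newest data may still be in the log
            return self.write_log[lba]
        if lba in self.read_cache:        # cache hit, no disk access
            self.read_cache.move_to_end(lba)
            return self.read_cache[lba]
        self.disk_reads += 1
        data = self.disk[lba]
        self.read_cache[lba] = data       # populating the cache costs an extra flash write
        if len(self.read_cache) > self.cache_blocks:
            self.read_cache.popitem(last=False)
        return data


if __name__ == "__main__":
    arr = HybridArraySketch()
    for lba, data in [(1, b"a"), (2, b"b"), (3, b"c")]:
        arr.write(lba, data)
    arr.read(1)
    arr.read(1)
    print(arr.disk_reads)
```

Reading the same block twice only touches the disk once, which is the latency benefit the flash tier is there to provide.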
All-Flash Arrays are also dedicated/centralized multi-host storage devices but, as the name implies, do not have any spinning mechanical disks. Instead, they are composed entirely of flash and solid-state disk media, which is exposed to applications as traditional storage would be. Historically, flash/SSDs have been more costly than spinning disk media, so AFAs had less capacity than alternatives, and hence effective data deduplication and compression strategies were critical for mainstream uses. More recently, prices have been falling as flash/SSD becomes more common. Maturing dedup and compression implementations reduce capacity consumption as well. However, prices have been falling on mechanical disk devices as well, while their capacities continue to grow ever larger.
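To show why dedup and compression matter so much for flash capacity, here is a minimal sketch that fingerprints blocks by content hash, stores each unique block once, and compresses what it stores. It is an illustration under my own assumptions (fixed-size blocks, SHA-256 fingerprints, zlib compression, hypothetical function names), not a description of how any particular AFA does it.

```python
import hashlib
import zlib


def store_blocks(blocks):
    """Deduplicate blocks by content hash, then compress each unique block."""
    unique = {}      # fingerprint -> compressed bytes actually stored
    block_map = []   # logical block index -> fingerprint
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in unique:
            unique[fingerprint] = zlib.compress(block)
        block_map.append(fingerprint)
    return unique, block_map


def read_block(unique, block_map, index):
    """Look up the logical block and decompress the stored copy."""
    return zlib.decompress(unique[block_map[index]])


def reduction_ratio(blocks, unique):
    """Logical bytes written divided by physical bytes stored."""
    logical = sum(len(b) for b in blocks)
    physical = sum(len(c) for c in unique.values())
    return logical / physical


if __name__ == "__main__":
    blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
    unique, block_map = store_blocks(blocks)
    print(len(unique), "unique blocks stored for", len(blocks), "written")
    print("reduction ratio:", round(reduction_ratio(blocks, unique), 1))
```

The sample data is artificially repetitive, so the ratio it prints is far better than real workloads would see.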
An alternative to centralized traditional, hybrid and all-flash arrays is to execute the storage controller software stack on the same servers that are running the general application workloads. Performing storage stack computation and optimizations locally using the processing power of the server can have several advantages when compared to centralized arrays. Storage is located closer to the application with less overhead and storage processing power scales with additional servers. This is especially useful when thinking about flash, where more processing power is required to handle the expected increased IOPS.
There are several variants of this architecture, and the first I’ll discuss is Converged Infrastructure. In my opinion, converged infrastructure is as much a sales strategy as it is a technology architecture. Customers purchase integrated building blocks that deliver compute, memory, storage and network capacity as an integrated unit for a specific workload. These building blocks can easily be aggregated to deliver additional workload capability, but the converged nature means trading that simplicity off against the ability to address specific limitations individually, such as increasing just storage capacity or alleviating performance bottlenecks. My former VMware/EMC colleague, Chuck Hollis, wrote a very informative blog post drilling further into this topic with more subcategories: converged vs. hyper-converged vs. what he called “hypervisor converged” (aka VMware VSAN).
Software-Defined Storage (aka SDS) is another architecture where the storage array controller software stack is executing on general purpose servers alongside application workloads. Typically the software stack is controlling direct-attached (i.e. non-multi-host) storage media on each server as well. SDS software can be built into a hypervisor and execute in kernel-mode similar to a device driver or alternatively can execute in user-mode as an application inside of a VM or virtual appliance. SDS software is usually “scale out”, meaning that it is a distributed application that is pooling storage resources across multiple instances and servers. Storage is abstracted and the same service is offered from multiple servers to applications that span the cluster. Many times, the control plane or management plane and the data plane are separate entities. Note that most Converged Infrastructure solutions also contain SDS software, but they do not offer that software independently.
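The pooling idea can be sketched as follows. Each server contributes its local storage, and a placement function decides which servers hold each block. I am using a rendezvous-style hash ranking with two replicas purely as an example; the class, the method names and the placement scheme are my assumptions, and real SDS products use their own placement and consistency mechanisms.

```python
import hashlib


class StoragePool:
    """Toy pool that spreads replicated blocks across the local storage of several servers."""

    def __init__(self, nodes, replicas=2):
        self.nodes = {name: {} for name in nodes}  # server name -> its local block store
        self.replicas = replicas

    def _placement(self, key):
        # Rank servers by a hash of (server, key) and take the first few.
        # Every server can compute this without consulting a central table.
        ranked = sorted(
            self.nodes,
            key=lambda node: hashlib.sha256(f"{node}:{key}".encode()).hexdigest(),
        )
        return ranked[: self.replicas]

    def put(self, key, data):
        for node in self._placement(key):
            self.nodes[node][key] = data

    def get(self, key, failed=()):
        # Try each replica in turn, skipping servers marked as failed.
        for node in self._placement(key):
            if node not in failed and key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(key)


if __name__ == "__main__":
    pool = StoragePool(["server1", "server2", "server3"])
    pool.put("vol1/block0", b"payload")
    primary = pool._placement("vol1/block0")[0]
    # The block is still readable when its first replica is unavailable.
    print(pool.get("vol1/block0", failed={primary}))
```

Here the placement logic plays the role of a very small control plane, while `put` and `get` are the data plane.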
One final variant of Software Defined Storage is Software Defined Storage Services. This type of technology consists of value-added storage software capabilities that also execute on the server, but deliver that value to one of the other core storage architectures described earlier, including traditional arrays. These technologies are not complete storage stacks. Instead, they offer next generation storage services such as scale-out performance, caching, encryption, deduplication, etc. to the other storage solution types.
I’d say this is a pretty rich set of diverse options! Each of these architectures has its use cases. To some extent, some or all of these are likely to be transitional technologies. I expect flash and SSDs to evolve beyond some of the limitations highlighted in my previous Disruptive Technology Trends post. I also expect the distributed scale-out architecture trend in particular, and its impact on both storage and application design, to continue and eventually dominate the landscape, but that’s a topic for a subsequent blog post!