A Not-So-Brief History of Scale-Out Storage Design

Cloud application architectures such as Hadoop/MapReduce and the Google File System have profoundly influenced how we build scalable systems, and I’m seeing that influence carry over into storage system architectures as well. One of the key principles at work here is to partition data and minimize synchronization.

Distributed systems didn’t always work this way. In the late ’80s and early ’90s, I worked on the precursor to many of these systems – VAXclusters for DEC. For all its pioneering concepts (such as single system image, distributed compute availability, and distributed scalability), the VAXcluster design proved to have significant scalability bottlenecks rooted in its shared-everything, non-partitioned data model. VAXcluster-style technology topped out at roughly 100 nodes.

While VAXclusters were pioneering complex distributed systems, storage arrays for the most part stuck to simpler designs: scale-up architectures with failover for availability. Scale-up means that to add performance or capacity, you add more components to the storage array or buy a bigger, more powerful array. Clustering storage devices together for scalability usually entailed coarse-grained partitioning and manually dividing your data between the devices. These designs worked well enough for a long time.

But first virtualization – many more workloads and IOPS from each server – started to seriously increase the load on centralized storage arrays. Then distributed cloud workloads – many instances of applications on many servers, all demanding IOPS – really began to tax these architectures. Buying more spindles to address the performance problem was inefficient and not cost-effective, because it meant taking on significant, unnecessary capacity.

Enter scale-out architecture: explicitly partition the data across multiple systems, paying careful attention to minimizing or eliminating shared metadata. Those lessons from the VAX/VMS days steered the industry toward these highly scalable, shared-nothing designs. The goals of such systems are to distribute data evenly and automatically across the nodes, to make locating the node that holds a given piece of data efficient, and to minimize coordination across those data nodes. A poor design on any of these elements results in a scalability bottleneck.
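
To make the partitioning idea concrete, here is a minimal sketch of one common approach – hash-based placement on a consistent-hash ring – which spreads data evenly and lets any client compute a key’s owner locally, with no coordination. The node names, key format, and virtual-node count are purely illustrative, not drawn from any particular product.

    import bisect
    import hashlib

    def ring_hash(value: str) -> int:
        # Map an arbitrary string to a stable position on the ring.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes, vnodes=64):
            # Give each physical node several virtual points so the keyspace
            # spreads evenly even with only a handful of nodes.
            self._ring = sorted(
                (ring_hash(f"{node}#{i}"), node)
                for node in nodes
                for i in range(vnodes)
            )
            self._points = [point for point, _ in self._ring]

        def node_for(self, key: str) -> str:
            # Any client computes the owner locally: no lookup service,
            # no shared metadata, no cross-node coordination.
            idx = bisect.bisect(self._points, ring_hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("volume-42/block-7"))  # deterministic placement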

[Figure: Scale-up vs. Scale-Out]

A fundamental tenet of distributed systems is that the more systems involved, the more often failures occur, so failure processing and recovery must be carefully considered as well. This has driven designs toward treating failure recovery as a steady-state condition rather than a temporary situation to be quickly recovered from. Most scale-out architectures also co-locate the application instance with the storage stack on the same server, which enables scaling that is far more modular and cost-effective than scale-up.
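
As a rough illustration of recovery as a steady-state activity, the sketch below continuously compares the desired replica count against the replicas that are actually reachable and emits repair work, rather than treating a node failure as a special event. The chunk map, node names, and replica count here are hypothetical.

    REPLICAS = 3  # desired copies of each chunk (illustrative)

    def repair_pass(chunks: dict, live_nodes: set) -> list:
        # Return (chunk_id, copies_needed) for every under-replicated chunk.
        # Run periodically; a failed node simply shows up as missing replicas.
        work = []
        for chunk_id, holders in chunks.items():
            alive = [n for n in holders if n in live_nodes]
            if len(alive) < REPLICAS:
                work.append((chunk_id, REPLICAS - len(alive)))
        return work

    # Example: node-c has failed, so chunk-2 needs one more replica.
    chunks = {"chunk-1": ["node-a", "node-b", "node-d"],
              "chunk-2": ["node-a", "node-b", "node-c"]}
    print(repair_pass(chunks, live_nodes={"node-a", "node-b", "node-d"}))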

Scale-out storage designs are most frequently seen in the software-defined storage and converged infrastructure approaches that I discussed in my recent blog post on Software Defined Storage & Hyper-Converged Infrastructure. However, the hyper-converged infrastructure offerings tend toward monolithic scaling, where compute, storage performance, and storage capacity are delivered together in one unit. That is simple, and it is a good business model.

However, I’d observe that a more flexible, optimal approach is a scale-out design that separates storage performance from storage capacity. Storage performance is best suited to server-side execution and resources, because it is directly correlated with the number of application instances and the load each generates. Storage capacity, by contrast, is driven by the size of the data sets, which is largely independent of the number of application instances and their load. Scaling out the capacity tier also adds considerable complexity and resource overhead to maintain data availability: compute load and storage performance are ephemeral, whereas storage capacity must persist across failures. And as I pointed out above, more distributed components mean more failures, which requires more data copies and thus more management overhead.

This all leads me to the observation that the ideal storage architecture for today’s applications may be moving towards separating storage performance from storage capacity, with a scale-out server-side architecture for storage performance coupled with a more centralized model for storage capacity. And that is what Infinio looks like, which is not a coincidence! 😉

Infinio is a pure scale-out, server-side storage performance layer whose content-based addressing scheme seamlessly and efficiently distributes the storage performance load across all of the servers. It is independent of the storage capacity tier, which can be as centralized or as distributed as you desire. It is a quite flexible design, described in this video series by Infinio’s founder and Chief Scientist Vishal Misra.
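
To give a general sense of what content-based addressing buys you – this is a generic illustration, not a description of Infinio’s actual implementation – the sketch below uses a fingerprint of a block’s contents, rather than its logical address, to decide which server is responsible for it. Identical blocks therefore map to the same place regardless of where they are read, and the load spreads by content. The host names are hypothetical.

    import hashlib

    servers = ["esx-host-1", "esx-host-2", "esx-host-3"]  # hypothetical hosts

    def cache_location(block: bytes) -> tuple:
        # The content fingerprint, not the logical block address, picks the
        # owning server, so duplicate content naturally lands in one spot.
        fingerprint = hashlib.sha256(block).hexdigest()
        owner = servers[int(fingerprint, 16) % len(servers)]
        return fingerprint, owner

    # Two reads of identical content yield the same fingerprint and host.
    print(cache_location(b"hello world"))
    print(cache_location(b"hello world"))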
