Technology marches onward, and there is probably no area as ripe for dramatic change as storage is today. This is the first in a series of blog posts in which I plan to explore the forces of change affecting storage and the new architectures coming to the forefront as a result.
Today’s post is on three disruptive trends: virtualization, flash technology, and application storage access APIs.
Virtualization is certainly not new, but I would be remiss if I didn’t mention the profound effect this technology has had on storage architecture and workloads. Operating systems, their caching mechanisms, file systems, storage stacks, and shared storage devices such as SAN and NAS had all been tuned, together, for physical workloads and for maximizing locality of reference. Remember that locality of reference is critical for spinning media, which has to move drive heads to access data. Virtualization dramatically reduced the effectiveness of these techniques. Running many workloads on the same server, and moving them around dynamically with vMotion and DRS, interleaves their I/O streams into what looks like random access at the array: the so-called I/O blender effect. Performance characteristics shifted radically, exposing the limitations of existing storage designs.
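To make the I/O blender effect concrete, here is a minimal sketch under stated assumptions. Each hypothetical VM reads its own region of a shared device strictly sequentially, and the host is assumed to interleave the streams round-robin. The function names and the "mean seek distance" metric are illustrative choices of mine, not a model of any particular hypervisor.

```python
from itertools import chain, zip_longest


def sequential_stream(start, length):
    """Block addresses for one VM reading its region sequentially."""
    return list(range(start, start + length))


def blend(streams):
    """Interleave per-VM streams round-robin (assumed host behavior)."""
    mixed = chain.from_iterable(zip_longest(*streams))
    return [addr for addr in mixed if addr is not None]


def mean_seek_distance(addresses):
    """Average absolute jump between consecutive block addresses."""
    jumps = [abs(b - a) for a, b in zip(addresses, addresses[1:])]
    return sum(jumps) / len(jumps)


# Four hypothetical VMs, each reading 1,000 blocks from its own region.
streams = [sequential_stream(vm * 1_000_000, 1000) for vm in range(4)]

single = mean_seek_distance(streams[0])
blended = mean_seek_distance(blend(streams))
print(f"one VM alone:     {single:,.0f} blocks per jump")
print(f"four VMs blended: {blended:,.0f} blocks per jump")
```

Every individual stream is perfectly sequential, yet the device sees large jumps on almost every request, which is exactly the case that defeats read-ahead and head-movement optimizations.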
Enter flash-based SSDs. Flash provides significantly lower-latency persistent data storage. In Figure 2 (Relative IOPS and Latency), I show the performance and latency landscape at a high level. At one end of the spectrum is memory access, in the nanosecond range with great consistency and predictability. At the opposite end are traditional mechanical spindles, clocking in at 4-7 milliseconds; under load with poor locality of reference, they can be a lot worse than that. For the not so mathematically inclined, a single millisecond equals 1,000,000 nanoseconds. Ouch!
Flash is much faster than disk, typically in the 15-100 microsecond range. Flash write latency varies a lot, however, as I will discuss in a moment. Remote memory shouldn’t be overlooked either: access over the network can be similarly speedy, with today’s 10GbE coming in at 4-20 microseconds. Read on for more on this point as well, as it is a key enabler of scale-out storage architectures.
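For a feel of the gaps between these tiers, the arithmetic can be sketched as follows. The disk, flash, and network ranges are the rough figures quoted above; the 100 ns figure for local memory is purely my assumption, used as a baseline for illustration.

```python
NS_PER_US = 1_000
NS_PER_MS = 1_000_000

# (low, high) latency in nanoseconds. Only the memory figure is assumed.
latency_ns = {
    "local memory (assumed)": (100, 100),
    "network hop over 10GbE": (4 * NS_PER_US, 20 * NS_PER_US),
    "flash": (15 * NS_PER_US, 100 * NS_PER_US),
    "spinning disk": (4 * NS_PER_MS, 7 * NS_PER_MS),
}

baseline = latency_ns["local memory (assumed)"][0]
for name, (low, high) in latency_ns.items():
    print(
        f"{name:24s} {low:>9,} to {high:>9,} ns "
        f"({low / baseline:,.0f}x to {high / baseline:,.0f}x memory)"
    )

# How much slower is disk than flash, at the narrowest and widest gap?
disk_low, disk_high = latency_ns["spinning disk"]
flash_low, flash_high = latency_ns["flash"]
disk_vs_flash_min = disk_low / flash_high
disk_vs_flash_max = disk_high / flash_low
print(f"disk is {disk_vs_flash_min:.0f}x to {disk_vs_flash_max:.0f}x slower than flash")
```

Even at the narrowest gap in these figures, disk is 40 times slower than flash, and a network hop costs no more than a flash access, which is why remote memory and remote flash are plausible building blocks.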
Flash’s unpredictability is due to the way write operations need to be handled, and to the mitigation techniques this handling requires. Flash is divided into blocks, which are further divided into pages. While empty, pages can be written to directly, but they can’t be rewritten directly. They need to be erased first, and the erase granularity is the block, not the page. The result is that a simple overwrite with new content can require multiple operations performed somewhat serially, including moving pages around so that a block can be erased.
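The erase-before-write constraint can be sketched with a toy model. This is an illustration under assumptions, not how any real controller is implemented: the `FlashBlock` class, the four-page block size, and the naive read-erase-reprogram strategy in `overwrite_page` are all hypothetical.

```python
class FlashBlock:
    """Toy flash block: pages are programmed one at a time,
    but can only be erased all together."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block  # None means erased
        self.erase_count = 0

    def program(self, page_index, data):
        if self.pages[page_index] is not None:
            raise ValueError("page must be erased before it is rewritten")
        self.pages[page_index] = data

    def erase(self):
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def overwrite_page(block, page_index, new_data):
    """Naive overwrite: read out the block, erase it, reprogram it.
    Returns the number of low-level operations performed."""
    saved = list(block.pages)
    ops = sum(1 for page in saved if page is not None)  # page reads
    block.erase()
    ops += 1  # one block erase
    saved[page_index] = new_data
    for index, data in enumerate(saved):
        if data is not None:
            block.program(index, data)
            ops += 1  # one page program
    return ops


block = FlashBlock(pages_per_block=4)
for index, data in enumerate(["a", "b", "c", "d"]):
    block.program(index, data)

ops = overwrite_page(block, 1, "B")
print(ops, block.pages, block.erase_count)
```

In this model, changing a single page costs four reads, one erase, and four programs. Real devices usually avoid paying that cost inline by writing the new data to an already-erased page elsewhere and reclaiming stale pages later, which is one source of the latency variability described above.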
Another factor that makes flash unpredictable is that its pages have a limited cycle lifetime, meaning they wear out over many write and erase operations. Hence, to maximize the life of a flash-based device, software must perform wear leveling (sometimes loosely called “load-leveling”), i.e. making sure that all the blocks and pages on the device are cycled through and written a similar number of times. Note that many forms of traditional RAID are poorly suited for these flash mechanics as well. Fortunately, this level of complexity is handled by storage software, not application developers!
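The leveling idea above, commonly called wear leveling, can be sketched in a few lines. This is a simplified illustration under my own assumptions: the `WearLeveler` class is hypothetical, each write is counted as one erase/program cycle, and the policy is simply "send every rewrite to the least-worn free block".

```python
class WearLeveler:
    """Toy wear leveler: redirect each rewrite of a logical block
    to the least-worn physical block that is currently free."""

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.location = {}  # logical block id -> physical block index

    def write(self, logical_id):
        in_use = set(self.location.values())
        free = [b for b in range(len(self.erase_counts)) if b not in in_use]
        if not free:
            raise RuntimeError("no free physical blocks")
        target = min(free, key=lambda b: self.erase_counts[b])
        self.erase_counts[target] += 1  # one erase/program cycle
        # Remapping implicitly frees the block that held the old copy.
        self.location[logical_id] = target
        return target


leveled = WearLeveler(num_physical_blocks=8)
for _ in range(800):
    leveled.write("hot-logical-block")

# Without leveling, every rewrite would land on the same physical block.
naive_counts = [800] + [0] * 7

print("leveled:", leveled.erase_counts)
print("naive:  ", naive_counts)
```

Rewriting one hot logical block 800 times spreads the wear evenly across all eight physical blocks here, instead of burning through a single block's cycle budget.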
Application Storage Access APIs
This brings me to fundamental changes in the way we build applications and access storage today with scale-out architectures, cloud, and big data. There are several intersecting trends here. A key one is the move toward object storage: a flat namespace coupled with monolithic get/put operations for object updates. This is fundamentally different from in-place, POSIX-compliant read/write interfaces within a file system or database. Object-based storage has come to prominence with cloud workloads and big data, alongside popular key-value and NoSQL data abstractions and the scalability requirements of the cloud.
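The difference between the two access styles can be sketched side by side. The `ObjectStore` class below is a hypothetical in-memory stand-in, not the API of any real object service; the file half uses ordinary Python file calls to show an in-place update.

```python
import os
import tempfile


class ObjectStore:
    """Toy flat-namespace store: whole objects in, whole objects out."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


# File-style access: seek to an offset and overwrite a few bytes in place.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(b"hello world")
    with open(path, "r+b") as f:
        f.seek(6)
        f.write(b"WORLD")
    with open(path, "rb") as f:
        file_result = f.read()

# Object-style access: no seek, no partial write. An update means
# fetching the whole object and putting a whole new one under the key.
store = ObjectStore()
key = "reports/report.txt"  # the "/" is just a character in a flat key
store.put(key, b"hello world")
current = store.get(key)
store.put(key, current[:6] + b"WORLD")
object_result = store.get(key)

print(file_result, object_result)
```

Both paths end with the same bytes, but the object path never modifies data in place, which is part of what makes objects easy to replicate and distribute as whole units.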
Another interesting trend is the relaxation of consistency guarantees in order to achieve cloud scale and match these application requirements. This is a major architectural shift in and of itself that I expect to cover more fully in a future post. In this case, strict consistency has been traded off for availability and partition tolerance. Application-level failure tolerance and object-level replication replace the block-level RAID prevalent in traditional scale-up storage arrays. The fundamental shift is partly philosophical: we replace instead of repair.
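Object-level replication with a replace-instead-of-repair policy can be sketched roughly as follows. Everything here is an assumption made for illustration: the `Cluster` class, the three-copy default, and the hash-ranking placement (a rendezvous-style scheme) are hypothetical, and the sketch ignores concurrent writes and consistency entirely.

```python
import hashlib


class Cluster:
    """Toy cluster that keeps several whole copies of each object."""

    def __init__(self, node_names, replicas=3):
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas

    def _placement(self, key):
        # Rank live nodes by a hash of (node, key) and take the first few.
        def rank(node):
            return hashlib.sha256(f"{node}:{key}".encode()).hexdigest()

        return sorted(self.nodes, key=rank)[: self.replicas]

    def put(self, key, data):
        for node in self._placement(key):
            self.nodes[node][key] = data

    def get(self, key):
        for node in self._placement(key):
            if key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(key)

    def copies(self, key):
        return sum(1 for store in self.nodes.values() if key in store)

    def replace_node(self, failed, replacement):
        """Discard the failed node, add a blank one, and re-copy
        objects from surviving replicas. Nothing is repaired."""
        del self.nodes[failed]
        self.nodes[replacement] = {}
        surviving = {k for store in self.nodes.values() for k in store}
        for key in surviving:
            data = next(s[key] for s in self.nodes.values() if key in s)
            for node in self._placement(key):
                self.nodes[node].setdefault(key, data)


cluster = Cluster([f"node-{i}" for i in range(5)])
for i in range(20):
    cluster.put(f"object-{i}", f"payload-{i}".encode())

cluster.replace_node("node-2", "node-5")
restored = all(cluster.copies(f"object-{i}") >= 3 for i in range(20))
print("all objects back to at least 3 copies:", restored)
```

The failed node is never inspected or rebuilt; its data is regenerated from the surviving copies onto whatever nodes are now responsible. A real system would also trim surplus copies and reconcile concurrent updates, which this sketch does not attempt.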
As a final thought, I will borrow a humorous analogy on this last topic from my friend and former colleague, Joe Baguley (@JoeBaguley). Joe likens enterprise scale-up servers to cats: system administrators have tens of them, name them individually, care for them carefully, and fix them when they break. The admin would describe the job as caring for servers. Joe likens cloud and scale-out to chicken farming: you have thousands of them, none of them have names, you don’t fix them when they get ill, and the job is to produce eggs!
In my next few posts I will delve into how these disruptive innovations are expressed in:
- Flash, SSDs and memory technologies
- All-flash and hybrid arrays
- Hyper-Converged systems
- Software-Defined Storage & Services
- Scale-Out Architectures