Why the Right Hypervisor-Level Cache is a Great Use of Memory

When I first joined VMware back in 2007 and started delving into the technical details of the vSphere storage stack, one of the things that initially surprised me was the lack of caching in the hypervisor. VMFS had an i-node cache for its file system metadata, but that was all. As an experienced operating system architect, I was well aware of the benefits of caching. However, VMware’s ESX storage architects had chosen not to build a cache for virtual disk (VMDK) blocks in the hypervisor, the rationale being that any extra memory in the platform would be better spent on the working sets of the guest OSes running inside the VMs. The guest OSes were closer to the workloads and could make better decisions about how to use “excess” memory. After all, a physical address-based cache applied to VMDKs would sit downstream of the guest OS buffer cache and provide little resource-sharing benefit, since most VMs use dedicated, non-shared VMDKs.

Later on, when looking to address the VDI boot storm issue in VMware View, I had the insight to create a form of dedup’ed cache for the OS image that would enable an effective shared cache based on content addressability. That idea eventually became the View Storage Accelerator feature, a niche implementation of the general concept. I call it niche because this single-node cache implementation requires static, offline generation of the dedup hash signatures for the OS images, and hence is only suitable for the boot storm use case.
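To make the “static, offline” part concrete, here is a minimal sketch of what generating per-block dedup hash signatures for an OS image might look like. The block size, hash function, and map layout are assumptions chosen for illustration; they are not a description of View Storage Accelerator’s actual implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def build_dedup_signatures(image_path: str) -> dict:
    """Walk an OS image offline and record a content hash per block.

    The resulting map (block index -> SHA-256 fingerprint) is the kind of
    precomputed signature table a single-node cache could consult at
    runtime to recognize identical blocks across clones of the image.
    """
    signatures = {}
    with open(image_path, "rb") as image:
        index = 0
        while True:
            block = image.read(BLOCK_SIZE)
            if not block:
                break
            signatures[index] = hashlib.sha256(block).hexdigest()
            index += 1
    return signatures
```

Because the signatures are computed ahead of time, the cache never has to hash live I/O, but it also cannot recognize content that changes after the image is prepared, which is why this approach fits the boot storm case and little else.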

So was VMware right back in 2007 not to dedicate memory to a hypervisor-level cache? It depends on your technical assumptions and insights. If you have a buffer cache in the guest OS in a VM, that cache should be more effective than a shared, physical-address-indexed cache at the hypervisor level. Such a hypervisor cache is not really sharing the dedicated content in each VM, and the guest OS simply has more knowledge about the blocks being referenced. This latter aspect is similar to the philosophy behind the VMware balloon driver – the guest OS can make a more informed decision about which pages to swap out under memory pressure than the hypervisor can.

However, the key assumption here is that we are working with a physical-address-indexed cache, so that there will be no shared cache blocks between VMs. At Infinio, we had a better idea, namely our patent-pending dynamic, distributed content-based cache. This type of cache structure at the hypervisor level can be very effective and complementary to guest OS-based caches. It puts dedicated memory to great use across all VMs by efficiently identifying common content across the VM images and caching it in a manner that facilitates cross-VM cache hits for like content. A hypervisor cache with these properties is caching not just on access frequency within one VM, but on the access frequency of the content across all the VMs in an entire vSphere cluster. With this mechanism, the hypervisor-based cache brings greater and complementary insight to the blocks it chooses to cache.
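As a rough illustration of the difference between the two indexing schemes, the sketch below models a content-indexed read cache. The class name, SHA-256 fingerprinting, and the two-level map structure are my own simplifications for illustration, not Infinio’s actual implementation.

```python
import hashlib


class ContentIndexedCache:
    """Minimal sketch of a content-indexed read cache for VMDK blocks.

    Two maps are kept:
      * logical map:   (vmdk_id, offset) -> content fingerprint
      * content store: content fingerprint -> block data, stored once

    Identical blocks read through different, non-shared VMDKs collapse
    onto a single cached copy, and popularity is tracked per content
    fingerprint rather than per VM.
    """

    def __init__(self, backing_reader):
        self._read_backing = backing_reader  # callable(vmdk_id, offset) -> bytes
        self._logical_map = {}               # (vmdk_id, offset) -> fingerprint
        self._content_store = {}             # fingerprint -> block data
        self._access_count = {}              # fingerprint -> cross-VM reference count

    @staticmethod
    def _fingerprint(block: bytes) -> str:
        return hashlib.sha256(block).hexdigest()

    def read(self, vmdk_id: str, offset: int) -> bytes:
        key = (vmdk_id, offset)
        fp = self._logical_map.get(key)
        if fp is not None and fp in self._content_store:
            # Hit -- possibly on data originally cached for a different VM.
            self._access_count[fp] = self._access_count.get(fp, 0) + 1
            return self._content_store[fp]

        # Miss: fetch from backing storage, then index the block by content.
        block = self._read_backing(vmdk_id, offset)
        fp = self._fingerprint(block)
        self._logical_map[key] = fp
        # Store the block only once, no matter how many VMDKs contain it.
        self._content_store.setdefault(fp, block)
        self._access_count[fp] = self._access_count.get(fp, 0) + 1
        return block


# Two VMs booted from clones of the same image: both VMDKs contain the
# same block content, so the cache holds just one copy and repeat reads
# from either VM are served from that shared entry.
shared_block = b"\x00" * 4096
cache = ContentIndexedCache(lambda vmdk, off: shared_block)
cache.read("vm-a.vmdk", 0)   # miss: fetched from storage, cached once
cache.read("vm-b.vmdk", 0)   # fetched once, then deduplicated to the same entry
cache.read("vm-b.vmdk", 0)   # hit on the single shared copy
```

By contrast, a physical-address-indexed cache would key each entry on (VMDK, offset) alone, holding a separate copy of identical content per VMDK and seeing each VM’s access pattern in isolation.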

Infinio’s Content-Based Architecture

Essentially, Infinio virtualizes the cache address space so that discrete VMDKs share a common index, while dynamically identifying common content across those VMDKs, so the cache can serve shared content to discrete, non-shared VMs across the entire cluster. This is why Infinio is a unique and complementary addition to vSphere deployments, and why it changes the equation on the value of dedicating memory to a hypervisor-level cache.
