IDF Shows Intel's Mission to Power the Cloud Includes Silencing Noisy Neighbors


Noisy neighbors are a major problem in any shared environment, whether it’s a cubicle farm, apartment building or cloud service. However, unlike the party animal upstairs who regularly keeps you up until the wee hours, in multi-tenant cloud IaaS environments you typically don’t know precisely who the noisy tenant is. The only thing certain is that at least one workload on the same physical server is hogging resources and slowing down everyone else. But wait, isn’t virtualization supposed to solve this? Doesn’t each workload have its own virtual slice of the system, walled off and isolated from every other VM? Of course, but the dirty little secret of virtualization is that the software abstraction only goes so far; several resources on every system remain shared.

The most visible resources hogged by noisy neighbors are network and disk I/O, but those are relatively easy to spot. Another shared element, processor cache memory, has been completely invisible and inaccessible to the hypervisors powering all cloud services, at least until now. Intel, in its new Haswell architecture Xeon E5 v3 processors just announced at IDF, takes a first step toward silencing noisy neighbors at the source: the on-chip execution environment.

Xeon E5 v3 die. Source: Intel

The problem stems from a fundamental tenet of modern, multi-core processor design: shared cache memory. On-chip cache memory, typically partitioned in a three-level hierarchy, offers running programs the fastest access to frequently used data, but the biggest chunk of cache, level 3 (L3), is shared among all cores. While this wasn’t much of a problem in the days of dual- and quad-core CPUs running conventional multi-threaded applications, the advent of virtualization, along with vastly more integrated CPUs – the new Xeons top out at 18 cores sharing 45 MB of so-called last-level cache (LLC) – allows many more, and more diverse, applications to share the same slice of silicon. IaaS systems like AWS, Google Cloud or Microsoft Azure compound the problem, since applications with vastly different compute workload profiles may share the same physical server. This means a single application that repeatedly accesses the same relatively large data set, say a database whose index conveniently fits in the L3 cache, can starve other applications of the cache memory they need, with a consequent, sometimes dramatic, drop in performance. The problem for system administrators is that until now it has been impossible to see the root cause, since cache memory allocation is handled entirely on chip by the processor’s memory management system and is not visible to monitoring and virtualization software.
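To make the contention mechanism concrete, here is a minimal, hypothetical C sketch of such a cache-hogging tenant: a loop that strides through a buffer sized to the 45 MB LLC mentioned above, touching one byte per 64-byte cache line so that every pass evicts everyone else’s data. The buffer size and line size are assumptions for illustration; this is not Intel code or a measured workload.

```c
/* noisy_neighbor.c -- illustrative cache-thrashing loop (not Intel code).
 * Assumes a 45 MB last-level cache -- the 18-core Xeon E5 v3 figure
 * cited above -- and 64-byte cache lines; both are assumptions. */
#include <stdio.h>
#include <stdlib.h>

#define LLC_BYTES  (45UL * 1024 * 1024)  /* working set ~= entire L3 */
#define LINE_BYTES 64UL                  /* typical x86 cache line   */

int main(void)
{
    unsigned char *buf = calloc(LLC_BYTES, 1);
    if (!buf)
        return 1;

    unsigned long sum = 0;
    for (int pass = 0; pass < 1000; ++pass) {
        /* Touch one byte per cache line across the whole buffer.
         * Each pass evicts roughly everything else from the shared
         * LLC, starving co-tenant workloads of cache capacity. */
        for (unsigned long i = 0; i < LLC_BYTES; i += LINE_BYTES)
            sum += buf[i];
    }
    printf("%lu\n", sum);  /* keeps the loop from being optimized out */
    free(buf);
    return 0;
}
```

Run alongside a latency-sensitive workload on the same socket, a loop like this is what turns a well-behaved server into the “noisy” one, even though every VM-level metric looks normal.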

Don’t like the noise? Move

The only recourse has been to move poor-performing applications to another physical system in hopes of improving things: the cloud equivalent of rebooting your PC when it gets slow. Netflix, which famously runs its service on AWS, saw the problems of shared infrastructure early on and described its experience in a nearly four-year-old blog post. Writing about the problems of co-tenancy, a Netflix engineer said:

“When designing customer-facing software for a cloud environment, it is all about managing down expected overall latency of response. AWS is built around a model of sharing resources; hardware, network, storage, etc. Co-tenancy can introduce variance in throughput at any level of the stack. You’ve got to either be willing to abandon any specific subtask, or manage your resources within AWS to avoid co-tenancy where you must.”

In other words, be prepared for systems to behave erratically, and even fail, and become adept at moving and restarting applications. It’s clearly a sub-optimal, hit-and-miss approach.

Xeon E5 v3 package. Source: Intel

Haswell Finally Provides Visibility to Cache Usage

The Haswell architecture Xeon server CPUs, the “tock” in Intel’s Tick-Tock processor development strategy, include a host of functional improvements. Many, like support for faster, next-generation DDR4 memory, more sophisticated and efficient power states and improvements to vector processing and floating point instructions, are logical (and expected) extensions of existing technology. However, the new cache monitoring feature takes a first step toward allowing hypervisors and cloud control software to offer quality-of-service (QoS) guarantees and limits for on-chip memory.

Existing x86 processor designs allocate cache on a first come, first served basis, which allows some workloads to monopolize the shared pool. Cache monitoring gives the hypervisor and cloud management software exact, per-VM cache usage on each chip, information that can be used to automatically move cache-abusing noisy neighbors to an underutilized system and free up cache for the remaining “good neighbors.” Intel benchmarks show that cache interference can double or even triple the run time of common workloads.
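Linux later exposed this hardware capability, branded Cache Monitoring Technology (CMT), through its resctrl filesystem; the sketch below assumes that later kernel interface, a resctrl mount at /sys/fs/resctrl, and a socket-0 domain named mon_L3_00, none of which are part of Intel’s announcement itself. It simply reads the occupancy counter, in bytes, for the default monitoring group.

```c
/* llc_occupancy.c -- read last-level-cache occupancy via Linux resctrl.
 * Assumptions: CMT-capable hardware and a kernel with resctrl support,
 * mounted with:  mount -t resctrl resctrl /sys/fs/resctrl
 * The file reports occupancy in bytes for the default group. */
#include <stdio.h>

int main(void)
{
    /* "mon_L3_00" is the L3 domain of socket 0; the exact name
     * varies by system, so this path is an assumption. */
    const char *path =
        "/sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy";

    FILE *f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    unsigned long long bytes = 0;
    if (fscanf(f, "%llu", &bytes) == 1)
        printf("LLC occupancy: %llu bytes (%.1f MB)\n",
               bytes, bytes / (1024.0 * 1024.0));
    fclose(f);
    return 0;
}
```

Sampling this counter per resource group is exactly the kind of signal a scheduler needs to decide which tenant is hogging the LLC and should be migrated.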

Although the current implementation only monitors the cache and can’t truly implement QoS by reserving a minimum allocation for each core, that is clearly the ultimate goal, and I would expect to see cache allocation limits in a future Xeon release.
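For a sense of what such allocation limits could look like, later Xeon generations did add Cache Allocation Technology, which Linux manages through the same resctrl filesystem. The sketch below assumes that later interface; the group name tenant0 and the four-way bitmask 0xf are hypothetical, and none of this applies to the E5 v3 parts discussed here.

```c
/* cat_reserve.c -- sketch of capping a tenant's L3 share with Cache
 * Allocation Technology via Linux resctrl. CAT shipped after the
 * E5 v3 generation; group name and bitmask are hypothetical. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(void)
{
    /* Creating a directory under /sys/fs/resctrl defines a new
     * resource group whose tasks share one cache allocation. */
    if (mkdir("/sys/fs/resctrl/tenant0", 0755) != 0)
        perror("mkdir (group may already exist)");

    /* The schemata file takes a capacity bitmask per cache domain;
     * mask 0xf confines the group to four L3 ways on domain 0.
     * Kernels accept partial writes that touch only one domain. */
    FILE *f = fopen("/sys/fs/resctrl/tenant0/schemata", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "L3:0=f\n");
    fclose(f);

    puts("tenant0 limited to 4 L3 ways (mask 0xf)");
    return 0;
}
```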

Intel is serious about eliminating VM performance bottlenecks

Xeon E5 v3 also has several other features designed to improve performance in virtualized environments.

Together, these architectural advancements clearly demonstrate that Intel is committed to ensuring its x86 processors remain the engines powering the cloud server farms that have become this generation’s mainframes. As virtualization and multi-tenant cloud deployments have changed the way IT and service providers use servers, Intel needs to adapt with technology optimized for new workloads. The Haswell architecture E5 v3 Xeons show that it has no intention of slowing the innovation train.