Data Locality & Why is important for vSphere DRS clusters

I have had a lot of people reach out to me since VMworld SFO, where I was interviewed by Eric Sloof (@esloof) on VMworldTV (interview can be seen here) about Nutanix.

So I thought I would expand on the topic of Data Locality and why it is so important for vSphere DRS clusters to maintain consistent high performance and low latency.

So first, the below diagram shows three (3) Nutanix nodes, and one (1) Guest VM.

NutanixLocalRead

The guest VM is reading data from the local storage in the Nutanix node and as a result this read access is very fast. The read I/O will be served from one of 4 places.

1. Extent Cache (DRAM – For “Active Working Set”)
2. Local SSD (For “Active Working Set”)
3. Local SATA (Only for “Cold” data)

and the forth we will discuss is a moment.

So as a result for Read I/O

1. There is no dependency on a Storage Area Network (FCoE, IP, FC etc)
2. Read I/O from one node does not contend with another node
3. Read I/O is very low latency as it does not leave the ESXi host
4. More Network bandwidth is available for Virtual Machine traffic, ESXi Mgmt, vMotion , FT etc

But wait, the what happens if DRS (or a vSphere admin) vMotion’s a VM to another node? – I’m glad you asked!

The below shows what happens immediately after a vMotion

NutanixAftervmotion

As you can see, only the Purple data is local to the new node, so transparently to the virtual machine, if/when remote data is required by the VM (ie: The VMs “Active Working Set”) the Nutanix controller VM (CVM) will get the requested data over the 10GB Network in 1MB extents. (It does not do a bulk movement or “Storage vMotion” type movement of all the VMs data EVER!)

And, all future Write I/O will be written local, so future Read I/O will all be local by default.

So, the worst case scenario for a read I/O in a Nutanix environment, is that the required data is not available locally and the CVM will access the data over a 10GB network.

This scenario will only occur in situations where

1. Maintenance is occurring and hosts are rebooted
2. A Host Failure (HA restarts VM on another node)
3. Following a vMotion

Generally in BAU (Business as Usual) operation Read I/O should be local in the high 90% range.

So the worst case scenario for Read I/O on a vSphere Cluster running on Nutanix, is actually the Best case scenario for a traditional storage array, because in a traditional array all data is accessed over some form of storage network and generally via a small number of controllers.

It is important to note, the Nutanix DFS (Distributed File System) only accesses data over the network when its required by the VM at a granular (1MB extent) level. So only the “Active Working Set” will be accessed over the 10Gb network, before being copied locally, again in 1MB extents. So if the data is not “Active” having it remotely does not impact performance at all so moving the data would create an overhead on the environment for no benefit.

In the event 90% of a VMs data is on a remote node, but the “Active Working Set” is local, read performance will all be at local speeds, again from Extent Cache (DRAM), Local SSD or Local SATA (for “cold” data).

Now some vendors are working on or have some local caching capabilities which in my experience are not widely deployed and have various caveats such as Operating System version, and in guest drivers, but for the vast majority of environments today, these technologies are not deployed.

The Nutanix DFS has data locality built in, it works with any hypervisor , Guest OS and does not require any configuration.

So now you know why ensuring the Active Working Set (data) is as close to the VM is essential for consistent high performance and low latency.

Related Articles

1. Write I/O Performance & High Availability in a scale-out Distributed File System

Example Architectural Decision – Hyperthreading with Business Critical Applications (Exchange 2013)

Problem Statement

When Virtualizing Exchange 2013 (which is considered by the customer as a Business Critical Application) in a vSphere cluster shared with other production workloads of varying sizes and performance requirements,  should Hyper Threading (HT) be used?

Assumptions

1. vSphere 5.0 or greater
2. Exchange Servers are correctly sized day one or are subsequently “Right Sized”
3. Cluster average CPU overcommitment of 3:1

Motivation

1. Ensure Optimal performance for BCAs (Exchange)
2. Ensure Optimal performance for other Virtual servers in the shared vSphere cluster

Architectural Decision

Enable Hyper Threading (HT)

Alternatives

1. Disable Hyper Threading (HT)
2. Enable Hyper Threading but configure Exchange Virtual machine/s with Advanced CPU, HT Sharing Mode of “None” to ensure Exchange is always scheduled onto physical CPU cores
3. Split off a limited number of ESXi hosts and form a dedicated BCA cluster w/ <= 2:1 overcommitment and disable HT
4. Disable HT on a number of nodes within the cluster but leave HT enabled on other nodes and use DRS rules to pin Exchange VMs to non HT hosts

Justification

1. Enabling Hyper Threading (HT) improves the efficiency of the CPU scheduler, which will minimize the possibility of CPU Ready for the Exchange server/s and other virtual machines on the host where a low level of overcommitment exists (<2:1)
2. For optimal performance, DRS “Should” rules will be used to keep Exchange (BCA) workloads on specific ESXi hosts within the cluster where <=2:1 CPU overcommitment is maintained
3. Configuring Advanced CPU, HT Sharing Mode to “None” (to guarantee only pCore’s are used) may result in increased CPU Ready as the CPU scheduler is forced to find (and wait) for available pCore’s which may result in degraded or inconsistent performance.
4. Sizing for the Exchange solution was completed taking into account only pCore’s (Not HT cores) to simplify sizing
5. As the cluster where Exchange is virtualized is shared with other server workloads with varying levels of importance and performance, HT benefits the vast majority of workloads and results in a higher consolidation ratio and better performance for the vSphere cluster as a whole.
6. In physical servers, enabling Hyper Threading on Exchange servers resulted in wasted or excessive RAM usage for .NET garbage collection due to memory for .NET being allocated based on logical cores. This does not impact “Right Sized” Virtual Machines as only the required number of vCPUs are assigned to the VM, and therefore available to the Guest OS. This avoids the issue of memory being wasted for HT cores.
7. The CPU scheduler in vSphere 5.0 or later is very efficient and can intelligently schedule workload on a hyper-thread or a physical core depending on the VMs CPU demand. While the Exchange server will at some point be scheduled onto a HT thread, this is not likely to be for any extended duration or have any significant impact on performance.
8. Splitting the cluster into BCA’s and server workloads would increase the HA overhead, and effective reduce the usable compute capacity of the infrastructure.
9. Having a cluster with varying configurations (eg: HT enabled on some hosts and not others) is not advisable as it may lead to inconsistent performance and adds unnecessary complexity to the environment

Implications

1. In the event the vCPU to pCore ratio is > 2:1 for any reason (including HA event & Virtual Server Sprawl) the number of users supported and/or the performance of the Exchange environment may be impacted
2. DRS “Should” rules will need to be created to keep Exchange workloads on hosts with <2:1 vCPU to pCore ratio

 

 

 

Storage DRS and Nutanix – To use, or not to use, that is the question?

Storage DRS (SDRS) is an excellent feature which was released with vSphere 5.0 in late 2011. For those of you who are not familiar with SDRS I recommend reading the following article prior to reading the rest of this post as SDRS knowledge is assumed from now on.

Understanding VMware vSphere 5.1 Storage DRS

This post also assumes basic knowledge of the Nutanix platform, for those of you who are not familiar with Nutanix please review the following links prior to reading the remainder of this post.

About Nutanix | How Nutanix Works | 8 Strategies for a Modern Datacenter

Storage DRS & Nutanix – To use, or not to use, that is the question?

With Storage DRS (SDRS), both capacity and performance can be managed, but what should SDRS manage in a Nutanix environment?

Lets start with performance. SDRS can help ensure optimal performance of virtual machines by enabling the I/O metric for SDRS recommendations as shown in the screen shot below.

SDRSsettingsIOmetricCircledSmall

Once this is done, SDRS will evaluate I/O every 8 hours (by default) and where the configured latency threshold is exceeded, perform a cost/benefit analysis before deciding to make a migration recommendation or do nothing.

So the question is, does SDRS add value in a Nutanix environment from a performance perspective?

The Nutanix solution adopts the “Scale-out” methodology by having one (1) Nutanix Controller VM (CVM) per Nutanix Node (ESXi Host) and then presents NFS datastore/s to the vSphere cluster which are serviced by all CVMs. The CVMs use intelligent auto-tiering to ensure optimal performance. The way this works at a high level, is as follows.

Data is written to an SSD tier (either PCIe SSD such as Fusion-io or SATA SSD) before being migrated off to a SATA tier once the blocks are determined to be “Cold” and if/when required, promoted back the an SSD tier when they become “Hot” again for improved read performance.

As with other vendor storage solutions with auto tiering technologies (such as FAST-VP , FlashPools etc) the same recommendation around SDRS and the I/O metric is true for Nutanix, leave it disabled.

So, at this point we have concluded the I/O metric will be “Disabled”, lets move onto Capacity management.

The Nutanix solution presents large NFS datastore/s to the ESXi hosts (Nutanix nodes) which are shared across all ESXi hosts in one or more vSphere clusters.

When using SDRS, it can manage initial placement of a new Virtual machine based on the configured “Utilized Space” metric (shown below) to ensure there is not a capacity imbalance between the datastores in a datastore cluster, as well as move virtual machines around when new machines are provisioned to ensure the balance is maintained.

UtilizedSpaceSDRS

So this is a really good feature which I have and do recommend in several scenarios, however the Nutanix solution presents typical a small number of large NFS datastores to the vSphere cluster (or clusters) which are serviced by all Controller VMs (CVMs) in the Nutanix cluster. Using SDRS for initial placement does not add much (if any) value as the initial placement will almost always be on the same large NFS datastore.

Where actual physical capacity becomes an issue, space saving technologies such as compression can be enabled, or the environment can be granularly scaled by adding just a single additional Nutanix node which linearly scales the solution from both a capacity and performance perspective.

The only real choice is when you choose to present two (or more) datastores where one datastore leverage’s the Nutanix compression technology. This is a very easy scenario for a vSphere admin to choose the placement of a VM and is the same amount of administrative effort as choosing a datastore cluster which would be a collection of datastores either using compression, or not depending on the workloads.

As a result there is no advantage to using SDRS to manage utilized space.

In conclusion, Storage DRS is a great feature when used with storage arrays where performance does not scale linearly or provide intelligent tiering to address I/O bottlenecks and/or where your environment has large numbers of datastores where you need to actively manage capacity.

As performance and capacity management are intelligently managed natively by the Nutanix solution, the requirement (or benefit) provided by SDRS is negated, as a result there is no requirement or benefit for using SDRS with a Nutanix solution.

Related Articles

1. Example Architectural Decision – VMware DRS automation level for a Nutanix environment