Unlimited VMs per datastore? Its not a myth with Nutanix!

For many years, I have been asked on countless occasions questions relating to how many VMs can (or should) be placed in one datastore.

In fact, just this morning I was asked this same question, and I decided to whip up a quick post.

I have previously posted an Example Architectural Decision relating to Datastore sizing for Block based storage. What this example was aimed to show was a how things like RPO/RTO and performance should be taken into consideration when choosing a datastore size.

The above example is not a hard and fast rule, but an example of one deployment which I was involved in.

There is a great article written on this topic by VCDX, Jason Boche (@jasonboche), titled  “VAAI and the Unlimited VMs per Datastore Urban Myth” which covers in great detail this topic as it relates to block based storage, being iSCSI, FC & FCoE.

But what about NFS, and what about with Hyper-converged solutions like Nutanix?

NFS has gained significant popularity in recent years, and in my opinion, people who know what they are talking about, no longer refer to NFS as “Tier 3 Storage” which was once common.

With traditional storage solutions, generally only a smaller number of controllers can actively serve IO to the one NFS mount, so the limiting factor preventing running more virtual machines per NFS mount, in my experience was performance but things like RPO/RTO were and are important considerations.

NFS does not suffer from SCSI reservations which resulted in increased latency ,which is what VAAI, specifically the Atomic Test & Set or ATS primitive helped too all but eliminate for block based datastores.

LUNs are limited by there queue depth, which in most cases is 32 (sometimes 64). This is also a limiting factor, as all the VMs in a datastore (LUN) share the same queue which can lead to contention. SIOC helps manage the contention by ensuring fairness based on share values, but it does not solve the issue.

NFS on the other hand has a much larger queue depth, in fact its basically unlimited as shown below.

NFSqueuedepth

So as NFS does not suffer from SCSI reservations, or queue depth issues, what is limiting us having hundreds or more VMs per datastore?

It comes down to how many active storage controllers are able to service the NFS mount, and the performance of the storage controller/s. In addition to this your business requirements around RPO/RTO. In other words, if a NFS mount is lost, how quickly can you recover.

For most traditional shared storage products,

1. Have only 1 or 2 active controllers – thus potentially limiting performance which would lead to lower VMs per NFS datastore.

2. Do snapshots at the NFS mount layer, so if you need to recover an entire NFS mount, the larger it is, the longer it may take.

For Nutanix, by default, NFS is used to present the Nutanix Distributed File System (NDFS) to vSphere, however the key difference between Nutanix and traditional shared storage is every controller in the Nutanix cluster, can and does Actively serve IO to any datastore in the cluster concurrently.

So the limit from a performance perspective is gone thanks to Nutanix scale out, shared nothing architecture, with one virtual storage controller (CVM) per Nutanix node. The number of nodes that’s can be scaled too, is also unlimited. An example of Nutanix ability to scale can be found here – Scaling to 1 million IOPS and beyond, Linearly!

Next what about the RPO/RTO issue? Well, Nutanix does not rely on LUNs or NFS mounts for our data protection (or snapshots), this is all done at a VM layer so your RPO/RTO is now per VM, which gives you much more flexibility.

With Nutanix, you can literally run hundreds or even thousands of VMs per NFS datastore, without performance or RPO/RTO problems thanks to scale out, shared nothing architecture and the Nutanix Distributed File System.

There are some reasons why you may choose to have multiple NFS datastores even in a Nutanix environment, these include, if you want to enable Compression and/or De-duplication which are enabled/disabled on a per container (or datastore) level. As some workloads don’t compress or dedupe well, these types of workloads should be excluded to reduce the overhead on the cluster.

It is important to note, Nutanix uses a concept called a “Storage Pool” which contains all the storage for the Nutanix cluster. On top of a “Storage Pool” you create “Containers” (or datastores). This means regardless of if you have 1 or 100 datastores, they all still sit on top of the one “Storage Pool” which means you still have access to the same amount of storage capacity, with no silos for maximum capacity utilization (and performance!).

Lastly, Nutanix does not suffer from the same availability concerns as traditional shared storage where a single LUN could potentially be lost. This is due to the distributed architecture of the Nutanix solution. For more information on how Nutanix is more highly available than traditional shared storage, check out “Scale out, Shared Nothing Architecture Resiliency by Nutanix

Check out a screen shot of one cluster with ~800 VMs on a single datastore. Note: The sub millisecond latency and 14K IOPS w/ ~900MBps throughput. Not bad!

800VMsonDatastore

New FREE eLearning: VMware vSphere: What’s New [V5.5]

I have always been impressed with the quality and quantity of free eLearning material that VMware provides, and they have just released eLearning: VMware vSphere: What’s New [V5.5]

The course is short and sweet, Self paced and is estimated to last around 1 hour.

The course can be found here – VMware vSphere: What’s New [V5.5]

Below is the official overview from VMware.

Self-Paced (1 Hour)

Overview:
VMware vSphere® 5.5 introduces many new features and enhancements to further extend the core capabilities of the vSphere platform. This free eLearning will discuss features and capabilities of the vSphere platform,
including vSphere ESXi Hypervisor™, VMware vSphere High Availability (vSphere HA), virtual machines,
VMware vCenter Server™, storage networking and vSphere Big Data Extensions.

The future of NAS Storage (NFS) for Virtual Environments

I read the article (below) by Howard Marks after seeing it come up on Twitter today, and I found it to be very interesting and refreshing to read as it hits the nail on the head.

http://www.networkcomputing.com/storage-networking-management/vmware-has-to-step-up-on-nfs/240163350

For a long time Network Attached Storage (NAS) has been considered by many (including myself in the past) as a second class citizen, or Tier 3 storage and not a serious choice for mission critical virtual environments.

In recent years, I have used more and more NFS in vSphere environments, and as I went through my VCDX journey I formed the strong view that NFS was in fact the best storage protocol for vSphere/vCloud/View environments having gone through a process of trying to learn as much as possible about every storage alternative available to vSphere.

In fact my VCDX design was based on a vCloud solution running on NFS, and this was one area I found quite easy to present and defend during the VCDX defence due to the many advantages of NFS.

In the article, Howard wrote

“It’s time for VMware to upgrade its support for file storage (as opposed to block storage) and embrace the pioneering vendors who are building storage systems specifically for the virtualization environment.”

I totally agree with this statement, and I think it is in the best interest of VMware, its partners and customers for VMware to go down this path. I think most would agree that Netapp have been leading the charge with NFS based storage for a long time, and in my opinion rightly so, with some new storage vendors also choosing to build solutions around NFS.

Another comment Howard made was

“managing vSphere with NFS storage is somewhat simpler than managing an equivalent system on block storage. Even better, a good NFS storage system, because it knows which blocks belong to which virtual machine, can perform storage management tasks such as snapshots, replication and storage quality of service per virtual machine, rather than per volume.”

I totally agree with the above statement and VMware’s development of features such as View Composer for Array Integration (VCAI) which is only supported on NFS, shows the protocol has significant advantages over block based storage especially for deployment speed and reduced workload on the storage compared. (VCAI uses the Fast File Clone VAAI-NAS Primitive to create near instant space efficient Linked Clone desktops)

I wrote an example architectural decision regarding storage protocol choice for Horizon View (VDI) environments which covers in more depth the advantages of NFS for VDI environments. The article can be viewed here : Example Architectural Decision – Storage Protocol Choice for Horizon View

Also NFS does not suffer from the same challenges as block based storage, as much larger numbers of VMs can share an NFS datastore compared to VMFS datastore without being negatively impacted by latency as a result of SCSI reservations (although vastly improved with VAAI) or contention resulting from limited SCSI queue depths which is something VAAI does still not address.

These limitations of block storage leads to the number of VMs per datastore remaining at the old rules of thumb of <25 for non I/O intensive workloads even with VAAI which some felt was the magic solution to the issue which sadly was incorrect. (Note: Number of desktop VMs per VMFS datastore with VAAI the recommended maximum is 140 compared to 64 without VAAI and NFS of >200).

Howard went on to write

“The first step would be for VMware to acknowledge that NFS has advanced in the past decade.”

I think this has been acknowledged by VMware along with many experts in the industry which is a positive step forward and I believe VMware will give more attention to NFS in future versions.

Howard further commented that

“Today vSphere supports version 3.0 of NFS—which is seventeen years old. NFS 4.1 has much more sophisticated security, locking and network improvements than NFS 3.0. The optional pNFS extension can bring the performance and multipathing of SANs with centralized file system management.”

I really think that VMware adding support in the future for NFS 4.1 will really help cement NFS as the protocol of choice for virtual environments and will be complimentary to VMware’s upcoming VSAN offering.

I think with bolstered NFS support and VSAN, VMware will have a solid storage layer to take virtualization into the future, without requiring storage vendors to immediately support vVOLs which in my opinion is being built (at least in part) to solve the challenges of VMFS and block based storage, when NFS (even version 3.) addresses most requirements in virtual environments very well today, and NFS 4.1 support will only improve the situation.

Howard’s comment (below) appears to echo these thoughts.

“Better NFS support will empower storage vendors to innovate and strengthen the vSphere ecosystem and fill the gap until vVols are ready. NFS support will also provide an alternative once vVols hit the market.”

 

To finish I thought Howard’s comment on snapshots (below) and replication being per Virtual Machine rather than volume or LUN, several vendors are doing this today moving towards NFS 4.1 will help these vendors continue to innovate and provide better and more efficient storage solutions for VMware’s customers which I think is what everyone wants.

Even better, a good NFS storage system, because it knows which blocks belong to which virtual machine, can perform storage management tasks such as snapshots, replication and storage quality of service per virtual machine, rather than per volume.