Cloning VMs – Why less (I/O & throughput) is better!

I’ve seen the picture below floating around Twitter and LinkedIn. It shows a 32GB VM being cloned in just 7 seconds on an All Flash Array (AFA) and has received a lot of attention.

The AFA peaked at over 7000MB/s during this time, showing the AFA is capable of some serious throughput!

[Image: screenshot of the 32GB VM clone completing in 7 seconds on the AFA]

At this stage some people may be thinking I’m talking about Nutanix, so I would like to point out the above AFA is not a Nutanix NX-9000 All Flash Node.

So why did I write this post?

I am still surprised that technical people find this sort of test and result impressive. To me, the fact the AFA used 7000MB/s of bandwidth to perform the clone means it has not performed the clone intelligently: the process has used additional capacity while potentially having a high impact on the other workloads using the storage.

At this stage I guess I should explain what I mean by intelligently clone.

An intelligent clone in my mind is where:

a) The clone takes a few seconds to occur
b) The clone is offloaded to the storage layer
c) Uses almost zero I/O & bandwidth to perform the clone
d) Uses almost zero additional space

So in the above example, the solution has cloned the VM in a few seconds, so a) has been satisfied. Since there is no information provided, I’m going to give it the benefit of the doubt and say the clone was offloaded to the storage layer, so I’m assuming (rightly or wrongly) that b) is also satisfied.

But what about c) and d)?

If the clone uses 7000MB/s of bandwidth that must have some impact (if not a significant impact) on other workloads running on the storage, even if it is only for 7 seconds.

The clone was also writing data throughout the 7 seconds, so it’s also duplicating the data.

So the net result is a fast yet high impact (capacity / performance) clone.
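
As a rough check of the figures quoted above, the sketch below works out the average throughput needed to move a 32GB VM in 7 seconds. It assumes 1GB = 1024MB and that the full 32GB is actually copied; neither detail is stated in the picture, and the function name is purely illustrative.

```python
def average_throughput_mb_s(size_gb: float, seconds: float) -> float:
    """Average MB/s needed to move size_gb of data in the given time.

    Assumption: 1GB = 1024MB, and every byte of the VM is copied.
    """
    return size_gb * 1024 / seconds


# Figures from the example above: a 32GB VM cloned in 7 seconds.
avg = average_throughput_mb_s(32, 7)
print(f"{avg:.0f} MB/s")  # prints "4681 MB/s"
```

Under those assumptions the clone needs roughly 4,700MB/s sustained for the full 7 seconds, which is bandwidth the other workloads on the array cannot use while the clone runs.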

Back in 2012, when I worked at IBM, I wrote this post (Netapp Edge VSA – Rapid Cloning Utility) about intelligent cloning, as a customer was suffering terrible VDI recompose times due to using a big dumb storage solution which had no intelligent cloning capabilities. The post shows that even an old IBM x3850 M2 with slow old 4 core processors, running a Virtual Storage Appliance on 3 pieces of spinning rust (146GB SAS disks), still completes the task in just 4.73 seconds per clone, in full compliance with the 4 items I identified as aspects of intelligent cloning (below).

a) The clone takes a few seconds to occur
b) The clone is offloaded to the storage layer
c) Uses almost zero I/O & bandwidth to perform the clone
d) Uses almost zero additional space

Intelligent cloning is so much faster because there is no need to duplicate the VM. The intelligent cloning process simply creates pointers back to the original file (which remains Read Only) and only uses I/O & capacity when new data is created.
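
The difference can be sketched with a toy model. This is not how any particular array or VAAI-NAS plugin is implemented; `BlockStore`, `VirtualDisk` and their methods are hypothetical names used only to contrast a full copy (every block rewritten) with a pointer-based clone (blocks shared until new data is written).

```python
class BlockStore:
    """Toy storage layer that counts how many blocks it has written."""

    def __init__(self):
        self.blocks = {}         # block id -> data
        self.blocks_written = 0  # stands in for I/O and capacity used
        self._next_id = 0

    def write_block(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.blocks_written += 1
        return block_id


class VirtualDisk:
    """A disk is just an ordered list of pointers to blocks in the store."""

    def __init__(self, store, block_ids):
        self.store = store
        self.block_ids = list(block_ids)

    @classmethod
    def create(cls, store, data_blocks):
        return cls(store, [store.write_block(d) for d in data_blocks])

    def full_copy(self):
        # "Dumb" clone: read and rewrite every block of the source.
        copied = [self.store.write_block(self.store.blocks[b])
                  for b in self.block_ids]
        return VirtualDisk(self.store, copied)

    def pointer_clone(self):
        # "Intelligent" clone: copy only the pointers, write no data.
        return VirtualDisk(self.store, self.block_ids)

    def write(self, index, data):
        # New data goes to a new block; the shared original is untouched.
        self.block_ids[index] = self.store.write_block(data)

    def read(self, index):
        return self.store.blocks[self.block_ids[index]]


store = BlockStore()
base = VirtualDisk.create(store, [b"a", b"b", b"c", b"d"])
clone = base.pointer_clone()   # no blocks written by the clone itself
clone.write(1, b"X")           # one new block, only when data changes
copy = base.full_copy()        # four more blocks written up front
```

In this model the pointer clone costs nothing at the storage layer regardless of disk size, while the full copy's cost grows with every block in the source.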

The process is actually mostly dependent on vCenter to register the new VM, which is why it takes a couple of seconds; it takes almost no time at the storage layer. The size of the VM being cloned is irrelevant. (Note: In my post from 2012 it was a 10GB VM, although again the size has no impact on the speed of an intelligent clone.)

In the post from 2012, I made the following observation:

Even if you have the world’s fastest array (insert your favorite vendor here), storage connectivity and the biggest and most powerful ESXi hosts, the process of cloning a large number of virtual machines will still:

1. Take more time to complete than an intelligent cloning process like RCU

2. Impact the performance of your ESXi hosts and more than likely production VMs

3. Impact the performance of your storage network & array (and anything that uses it, physical or virtual).

So fast forward to 2015: we have lots of really fast All-Flash storage solutions, but for tasks like cloning, even these super fast all-flash solutions can’t outperform the single controller (2vCPU) Virtual Storage Appliance that used intelligent cloning on an old IBM x3850 M2 server in my test lab back in 2012.

I also wrote this article (Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?) recently explaining the benefits of VAAI-NAS and how VAAI-NAS supports intelligent cloning even with Virtual Storage Appliance solutions.

In Summary:

I find a clone taking a few seconds and using next to no throughput and capacity to be impressive. This is a perfect example of less I/O and throughput (to perform the same task) being better!

It’s great if a storage array has the capability to drive many GB/s of throughput, but it’s totally unnecessary for cloning and is only demonstrating the lack of intelligent cloning capabilities for the storage solution.

In my opinion it’s much better for a storage solution to use its high performance capability for driving I/O to virtual machines servicing business applications than for tasks like cloning which can be done intelligently.

To show off more real world performance capabilities of a storage solution (especially an All-Flash array), the example really has to include multiple workloads with different I/O characteristics. This is something the storage industry (all vendors) continues to fail to provide and it’s something I would like to be a part of changing, as things like “Peak” performance are nowhere near as important as “consistent” performance.

Back on topic though, if cloning is something you or your customers require, for say a VDI, Cloud deployment or just for rapid provisioning of testing & development VMs, consider a storage solution which has intelligent cloning capabilities such as VAAI-NAS which integrates with products like Horizon View (VCAI Clones) and vCloud Director (FAST Provisioning).

Fight the FUD! – Not all VAAI-NAS storage solutions are created equal.

At a meeting recently, a potential customer who is comparing NAS/Hyper-converged solutions for an upcoming project advised me they only wanted to consider platforms with VAAI-NAS support.

As the customer was considering a wide range of workloads, including VDI and server, the requirement for VAAI-NAS makes sense.

Then the customer advised us they are comparing 4 different Hyper-Converged platforms and a range of traditional NAS solutions. The customer eliminated two platforms due to no VAAI support at all (!) but then said Nutanix and one other vendor both had VAAI-NAS support so this was not a differentiator.

Having personally completed the VAAI-NAS certification for Nutanix, I was curious which other vendor had full VAAI-NAS support, as it was (and remains) my understanding Nutanix is the only Hyper-converged vendor who has passed the full suite of certification tests.

The customer advised who the other vendor was, so we checked the HCL together and sure enough, that vendor only supported a subset of VAAI-NAS capabilities even though the sales reps and marketing material all claim full VAAI-NAS support.

The customer was more than a little surprised that VAAI-NAS certification does not require all capabilities to be supported.

Any storage vendor wanting its customers to get support for VAAI-NAS with VMware is required to complete a certification process which includes a comprehensive set of tests. There are a total of 66 tests for VAAI-NAS vSphere 5.5 certification which are required to be completed to gain the full VAAI-NAS certification.

However as this customer learned, it is possible and indeed common for storage vendors not to pass all tests and gain certification for only a subset of VAAI-NAS capabilities.

The image below shows the Nutanix listing on the VMware HCL for VAAI-NAS, highlighting the 4 VAAI-NAS features which can be certified and supported:

1. Extended Stats
2. File Cloning
3. Native SS for LC
4. Space Reserve

[Image: Nutanix VAAI-NAS listing on the VMware HCL]

This is an example of a fully certified solution supporting all VAAI-NAS features.

Here is an example of a VAAI-NAS certified solution which has only certified 1 of the 4 capabilities. (This is a Hyper-converged platform, although it was not being considered by the customer.)

[Image: VMware HCL listing with 1 of the 4 VAAI-NAS capabilities certified]

Here is another example of a VAAI-NAS certified solution which has only certified 2 of the 4 capabilities. (This is a Hyper-converged platform).

[Image: VMware HCL listing with 2 of the 4 VAAI-NAS capabilities certified]

So customers using the above storage solution cannot for example create Thick Provisioned Virtual Disks, therefore preventing the use of Fault Tolerance (FT) or virtualization of business critical applications such as Oracle RAC.

In this next example, the vendor has certified 3 out of 4 capabilities and is not certified for Native SS for LC. (This is a traditional centralized NAS platform).

[Image: VMware HCL listing with 3 of the 4 VAAI-NAS capabilities certified]

So this solution does not support using storage level snapshots for the creation of Linked Clones, so things like Horizon View (VDI) or vCloud Director FAST Provisioning deployments will not get the cloning performance or optimal capacity saving benefits of fully certified/supported VAAI-NAS storage solutions.

The point of this article is simply to raise awareness that not all solutions advertising VAAI-NAS support are created equal and ALWAYS CHECK THE HCL! Don’t believe the friendly sales rep as they may be misleading you or flat out lying about VAAI-NAS capabilities / support.

When comparing traditional NAS or Hyper-converged solutions, ensure you check the VMware HCL and compare the various VAAI-NAS capabilities supported as some vendors have certified only a subset of the VAAI-NAS capabilities.

To properly compare solutions, use the VMware HCL Storage/SAN section and as per the below image select:

Product Release Version: All
Partner Name: All or the specific vendor you wish to compare
Features Category: VAAI-NAS
Storage Virtual Appliance Only: No for SAN/NAS, Yes for Hyperconverged or VSA solutions

[Image: VMware HCL Storage/SAN search options]

Then click on the Model you wish to compare e.g.: NX-3000 Series

[Image: VMware HCL model selection]

Then you should see something similar to the below:

[Image: VMware HCL model details showing the “View” link]

Click the “View” link to show the VAAI-NAS capabilities and you will see the below which highlights the VAAI-NAS features supported.

Note: if the “View” link does not appear, the product is NOT supported for VAAI-NAS.

[Image: VAAI-NAS features listed for the Nutanix model on the VMware HCL]

If the Features do not list Extended Stats, File Cloning, Native SS for LC and Space Reserve, the solution does not support the full VAAI-NAS capabilities.
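
When comparing several platforms, the same check can be written down as a small helper. This is only a sketch: the feature names are the four listed in this article, the lists are typed in by hand from each HCL entry, and `missing_vaai_nas_features` is a hypothetical name (no HCL API is assumed).

```python
# The four VAAI-NAS features named in this article.
FULL_VAAI_NAS = {
    "Extended Stats",
    "File Cloning",
    "Native SS for LC",
    "Space Reserve",
}


def missing_vaai_nas_features(listed):
    """Return the VAAI-NAS features absent from an HCL listing, sorted."""
    listed_set = {feature.strip() for feature in listed}
    return sorted(FULL_VAAI_NAS - listed_set)


# Hypothetical listing, copied by hand from an HCL entry.
example = ["Extended Stats", "File Cloning", "Space Reserve"]
print(missing_vaai_nas_features(example))  # prints "['Native SS for LC']"
```

An empty result means all four features are listed; anything else names exactly what the solution has not certified.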

Related Articles:

1. My checkbox is bigger than your checkbox (@HansDeLeenheer)

2. Unchain My VM, And Set Me Free! (Snapshots)

3. VAAI-NAS – Some snapshot chains are deeper than others

How to configure Network I/O Control (NIOC) for Nutanix (or any IP Storage)

This video shows how to configure Network I/O Control (NIOC) as per Nutanix Best Practices, however this configuration is also applicable to any IP based Storage.

For more information see the Nutanix vNetworking Best Practices Guide.

Related Articles:

1. Network I/O Control Shares/Limits for ESXi Host using IP Storage

2. Network I/O Control for ESXi Host using IP Storage (4x10Gb NICs)

3. Example VMware vNetworking Design for IP Storage