Dare2Compare Part 2 : HPE/Simplivity’s claim Nutanix snaps take 10x longer

As discussed in Part 1, HPE have been relentless with their #HPEDare2Compare Twitter campaign focused on the market-leading Nutanix Enterprise Cloud platform.

In part 2 of this series, I will respond to the claim (below) that Nutanix snaps take 10x longer than HPE/SVT with a YouTube demonstration.

The video runs for 2 minutes and 20 seconds. In it I show the Nutanix PRISM GUI and the virtual machine I will be creating a snapshot of, and I show that the VM does not currently have a snapshot, to avoid any claims the demonstration was pre-baked.

In summary, the video is a 100% unedited walk-through in real time. The snapshot is taken at the 0:49-0:50 mark and completes in less than 1 second. The restore of the 38TB VM occurs between 1:12 and 1:16, for a total duration of 4 seconds. The VM is then powered on and fully booted into the OS by the 1:57 mark, 41 seconds after the restore completes.
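The durations quoted above follow directly from the video timestamps. As a quick sanity check, here is a minimal sketch (the helper name `to_seconds` is purely illustrative):

```python
def to_seconds(mark: str) -> int:
    """Convert an 'm:ss' video timestamp into seconds."""
    minutes, seconds = mark.split(":")
    return int(minutes) * 60 + int(seconds)


# Timestamps taken from the summary above.
restore_start, restore_end, booted = "1:12", "1:16", "1:57"

restore_duration = to_seconds(restore_end) - to_seconds(restore_start)
boot_duration = to_seconds(booted) - to_seconds(restore_end)

print(f"Restore: {restore_duration}s, power-on to booted OS: {boot_duration}s")
```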

I then tweeted the following which at this time has not received a reply or retraction of the incorrect statement from HPE.

The YouTube video can be viewed below and very much speaks for itself.

Don’t take my word for it: even our Community Edition (CE) can perform 2000 clones in 16 minutes. Note: this is without any hardware acceleration, and it is hypervisor agnostic!
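To put the CE figure in per-clone terms, the arithmetic can be sketched as follows. It uses only the two numbers quoted above and assumes the clones complete at a steady rate:

```python
clones = 2000        # number of clones, from the CE figure above
total_minutes = 16   # total elapsed time, from the CE figure above

# Assumption: clones complete at a steady rate over the whole run.
seconds_per_clone = total_minutes * 60 / clones
clones_per_minute = clones / total_minutes

print(f"{seconds_per_clone:.2f}s per clone, {clones_per_minute:.0f} clones per minute")
```

That works out to roughly half a second per clone on average.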

 


 

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 7 – Agility (Time to value)

Deploying other hypervisors and management solutions typically requires considerable design effort and expertise to ensure consistent performance and to minimize the risk of downtime while enabling as much agility as possible. Acropolis management requires almost no design at all, as the built-in management is optimized and highly available out of the box. This enables much faster deployment of AHV than any other hypervisor and its associated management components.

Regardless of the starting size of an AHV-based environment, all management, Analysis, Data Protection and BC/DR components are automatically deployed and suitably sized, so no management design effort is required for any AHV cluster. This results in a very fast time to value (typically <1hr for a single block deployment).

AHV also provides numerous features which ensure customers can deploy solutions in a timely manner:

  • In-Built Management & Analytics

Because all tools required for cluster management are deployed automatically with the cluster, time to value is not dependent on the design, deployment and validation of these tools. There isn’t even a need to install a client to manage AHV; it is simply accessed via a web browser.

  • Out of the box hardened configuration with In-Built Security/Compliance Auditing

Being hardened by default removes the risk of security flaws being introduced during the implementation phase, while the automated auditing ensures that if security settings are modified during business-as-usual operations, they are returned to the required security profile.

  • Intelligent cloning

The Distributed Storage Fabric combines with AHV to allow near-instant clones of a virtual machine. This feature works regardless of the power state of the VM, so it’s not restricted to VMs which are powered off, as with other hypervisors.

For a demo of this capability see: Nutanix Acropolis hypervisor acli cloning operations

Note: Cloning can be performed via Prism or acli (Acropolis CLI)

Summary:

  1. Minimal design/implementation effort for AHV management is required
  2. Where Multi-cluster central management is required, only a single VM is required (Prism Central) which is deployed as a virtual appliance
  3. No additional appliances/components to install for Analytics, Data Protection, Replication or Management High Availability
  4. No Subject Matter Experts required for an optimal Acropolis platform deployment


Cloning VMs – Why less (I/O & throughput) is better!

I’ve seen the picture below floating around Twitter and LinkedIn which shows a 32GB VM being cloned in just 7 seconds on an All Flash Array (AFA) and has got a lot of attention.

The AFA peaked at over 7000MB/s during this time, showing the AFA is capable of some serious throughput!

At this stage some people may be thinking I’m talking about Nutanix, so I would like to point out that the above AFA is not a Nutanix NX-9000 All Flash Node.

So why did I write this post?

I am still surprised that technical people find this sort of test and result impressive. To me, the fact the AFA used 7000MB/s of bandwidth to perform the clone means it has not performed the clone intelligently: the process has used additional capacity while potentially having a high impact on the other workloads using the storage.

At this stage I guess I should explain what I mean by intelligently clone.

An intelligent clone in my mind is where:

a) The clone takes a few seconds to occur
b) The clone is offloaded to the storage layer
c) Uses almost zero I/O & bandwidth to perform the clone
d) Uses almost zero additional space

So in the above example, the solution has cloned the VM in a few seconds, so a) has been satisfied. Since there is no information provided, I’m going to give it the benefit of the doubt and say the clone was offloaded to the storage layer, so I’m assuming (rightly or wrongly) that b) is also satisfied.

But what about c) and d)?

If the clone uses 7000MB/s of bandwidth that must have some impact (if not a significant impact) on other workloads running on the storage, even if it is only for 7 seconds.

The clone was also writing data throughout the 7 seconds, so it’s also duplicating the data.

So the net result is a fast yet high impact (capacity / performance) clone.
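To put rough numbers on that impact, here is a back-of-the-envelope sketch. The 32GB, 7 second and 7000MB/s figures come from the example above. The assumptions are mine: that the clone wrote a full copy of the VM and that 1GB = 1024MB.

```python
vm_size_gb = 32        # size of the cloned VM, from the example above
clone_seconds = 7      # clone duration, from the example above
peak_mb_per_s = 7000   # peak throughput reported for the AFA

# Assumption: the clone writes a full copy of the VM, and 1GB = 1024MB.
data_written_mb = vm_size_gb * 1024
average_mb_per_s = data_written_mb / clone_seconds

print(f"Full copy writes {data_written_mb}MB, averaging {average_mb_per_s:.0f}MB/s")
print(f"That is {average_mb_per_s / peak_mb_per_s:.0%} of the reported peak")
```

Under those assumptions the array sustains well over 4000MB/s for the whole 7 seconds, and ends up storing a second 32GB copy of data it already had.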

Back in 2012, when I worked at IBM, I wrote this post (Netapp Edge VSA – Rapid Cloning Utility) about intelligent cloning, as a customer was suffering terrible VDI recompose times due to using a big dumb storage solution which had no intelligent cloning capabilities. The post shows that even a Virtual Storage Appliance running on an old IBM x3850 M2, with slow old 4 core processors and 3 pieces of spinning rust (146GB SAS disks), still completes the task in just 4.73 seconds per clone, in full compliance with the 4 items I identified as aspects of intelligent cloning (below).

a) The clone takes a few seconds to occur
b) The clone is offloaded to the storage layer
c) Uses almost zero I/O & bandwidth to perform the clone
d) Uses almost zero additional space

The reason intelligent cloning is so much faster is that there is no need to duplicate a VM. The intelligent cloning process simply creates pointers back to the original file (which remains Read Only) and only uses I/O & capacity when new data is created.
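The pointer-based approach described above can be illustrated with a toy model. This is only a sketch of the general idea, not how Nutanix, NetApp RCU or VAAI-NAS actually implement it; the `VDisk` class and its methods are hypothetical names made up for this example.

```python
from typing import Dict, Optional


class VDisk:
    """Toy virtual disk: a sparse map of block number -> data, plus an
    optional read-only parent that unmodified blocks are read from."""

    def __init__(self, parent: Optional["VDisk"] = None) -> None:
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}  # only blocks written to THIS disk
        self.read_only = False

    def write(self, block_no: int, data: bytes) -> None:
        if self.read_only:
            raise PermissionError("original is read-only once it has been cloned")
        self.blocks[block_no] = data  # new data is the only thing that uses space

    def read(self, block_no: int) -> bytes:
        disk = self
        while disk is not None:  # follow the pointers back towards the original
            if block_no in disk.blocks:
                return disk.blocks[block_no]
            disk = disk.parent
        return b"\x00"  # block was never written anywhere in the chain

    def clone(self) -> "VDisk":
        """Create a clone without copying any data blocks."""
        self.read_only = True
        return VDisk(parent=self)


base = VDisk()
for n in range(1000):
    base.write(n, b"os-data")

clone = base.clone()          # no blocks copied, whatever the size of base
clone.write(5, b"new-data")   # only this write consumes extra capacity

print(len(clone.blocks))      # number of blocks the clone itself owns
print(clone.read(5), clone.read(6), base.read(5))
```

The clone operation does the same small amount of work whether the original holds 1,000 blocks or 1,000,000, which is why the size of the VM does not affect how long an intelligent clone takes.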

The process is actually mostly dependent on vCenter registering the new VM, which is why it takes a couple of seconds; it takes almost no time at the storage layer. The size of the VM being cloned is irrelevant. (Note: in my post from 2012 it was a 10GB VM, although again the size has no impact on the speed of an intelligent clone.)

In the post from 2012, I made the following observation:

Even if you have the world’s fastest array (insert your favorite vendor here), storage connectivity and the biggest and most powerful ESXi hosts, the process of cloning a large number of virtual machines will still:

1. Take more time to complete than an intelligent cloning process like RCU

2. Impact the performance of your ESXi hosts and more than likely production VMs

3. Impact the performance of your storage network & array (and anything that uses it, physical or virtual).

So fast forward to 2015: we have lots of really fast All-Flash storage solutions, but for tasks like cloning, even these super fast all-flash solutions can’t outperform the single controller (2vCPU) Virtual Storage Appliance that was running in my test lab on an old IBM x3850 M2 server, using intelligent cloning, back in 2012.

I also wrote this article (Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?) recently explaining the benefits of VAAI-NAS and how VAAI-NAS supports intelligent cloning even with Virtual Storage Appliance solutions.

In Summary:

I find a clone taking a few seconds and using next to no throughput and capacity to be impressive. This is a perfect example of less I/O and throughput (to perform the same task) being better!

It’s great if a storage array has the capability to drive many GB/s of throughput, but it’s totally unnecessary for cloning and only demonstrates the storage solution’s lack of intelligent cloning capabilities.

In my opinion it’s much better for a storage solution to use its high performance capability for driving I/O to virtual machines servicing business applications than for tasks like cloning, which can be done intelligently.

To show off more real world performance capabilities of a storage solution (especially an All-Flash array), the example really has to include multiple workloads with different I/O characteristics. This is something the storage industry (all vendors) continues to fail to provide, and it’s something I would like to be a part of changing, as things like “Peak” performance are nowhere near as important as “consistent” performance.

Back on topic though: if cloning is something you or your customers require, for say a VDI or Cloud deployment, or just for rapid provisioning of testing & development VMs, consider a storage solution which has intelligent cloning capabilities such as VAAI-NAS, which integrates with products like Horizon View (VCAI Clones) and vCloud Director (FAST Provisioning).