NetApp HCI Versus Nutanix – The Rebuttal

I was made aware of a recent article from Rob Klusman at Netapp titled “Netapp HCI Verses Nutanix” by a Nutanix Technology Champion (NTC) who asked us to respond to the article “’cause there’s some b*llsh*t in it”.

** UPDATE **

Netapp have since removed the post; it can now be viewed via Google Cache here:

http://webcache.googleusercontent.com/search?q=cache:https://blog.netapp.com/netapp-hci-vs-nutanix/

I like it when people call it like it is, so here I am responding to the bullshit (article).

The first point I would like to address is the final statement in the article.

NetApp HCI is the first choice, and Nutanix is the second choice. Leading in an economics battle just doesn’t work if performance is lacking.

Rob rightly points out Nutanix leads the economic battle, so kudos for that, but he follows up by implying Nutanix performance is lacking. Wisely, Rob does not provide any supporting detail that could be discredited, so I will just leave you with these three posts discussing how Nutanix scales performance for single VMs, Monster VMs and physical servers, from my Scalability, Resiliency & Performance blog series.

Part 3 – Storage Performance for a single Virtual Machine
Part 4 – Storage Performance for Monster VMs with AHV!
Part 5 – Scaling Storage Performance for Physical Machines

Rob goes on to make the claim:

Nutanix wants infrastructure “islands” to spread out the workloads

This is simply incorrect; in fact, Nutanix has been recommending mixed-workload deployments for many years. Here is an article I wrote in July 2016 titled “The All-Flash Array (AFA) is Obsolete!” where I conclude with the following summary:

[Image: MixedWorkloads2016]

I specifically state that mixed workloads, including business critical applications, are supported without creating silos. It’s important to note this statement was made in July 2016, before Netapp had even started shipping (Oct 25th, 2017) their 3-tier architecture product, which they continue to incorrectly refer to as HCI.

Gartner supports my statement that the Netapp product is not HCI and states:

“NetApp HCI competes directly against HCI suppliers, but its solution does not meet Gartner’s functional definition of HCI.”

Mixed workloads are nothing new for Nutanix. Not only is mixing workloads supported, I frequently recommend it, as it increases performance and resiliency, as described in detail in my blog series Nutanix | Scalability, Resiliency & Performance.

Now let’s address the “Key Differences” Netapp claim:

User interface. Both products have an intuitive graphical interface that is well integrated into the hypervisor of choice. But what’s not obvious is that simplicity goes well beyond where you click. NetApp HCI has the most extensive API in the market, with integration that allows end users to automate even the most minute features in the NetApp HCI stack.

The philosophy behind the Nutanix GUI (whose intuitiveness Netapp concedes) is that every feature available in the GUI must also be available via an API. In the PRISM GUI, Nutanix provides the “REST API Explorer” (shown below), where users can easily discover the available operations and automate anything they choose.

[Image: RestAPIexplorer – the REST API Explorer in PRISM]

[Image: NutanixRESTAPI]
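To make that concrete, here is a minimal sketch (mine, not from the original post) of driving the same API from Python. The host name and credentials are placeholders, and the exact endpoints and response fields for your AOS version are best confirmed in the REST API Explorer shown above.

```python
import requests

PRISM_HOST = "prism.example.local"   # placeholder: your cluster virtual IP / Prism address
USERNAME = "admin"                   # placeholder credentials
PASSWORD = "secret"

def list_vms():
    """Print name, vCPU count and RAM for every VM the cluster knows about."""
    url = f"https://{PRISM_HOST}:9440/api/nutanix/v2.0/vms"
    # verify=False only for lab use with self-signed certificates
    resp = requests.get(url, auth=(USERNAME, PASSWORD), verify=False, timeout=30)
    resp.raise_for_status()
    for vm in resp.json().get("entities", []):
        print(vm.get("name"), vm.get("num_vcpus"), vm.get("memory_mb"))

if __name__ == "__main__":
    list_vms()
```

Anything you can click in PRISM can be scripted the same way, which is the whole point of the API-first philosophy.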

Next up we have:

Versatile scale. How scaling is accomplished is important. NetApp HCI scales in small infrastructure components (compute, memory, storage) that are all interchangeable. Nutanix requires growth in specific block components, limiting the choices you can make.

When vendors attack Nutanix, I am always surprised they go after scalability because, if anything, this is one of the strongest areas for Nutanix.

I’ve already referenced my Scalability, Resiliency & Performance blog series, where I go into a lot of detail on these topics, but in short, Nutanix can scale:

  1. Storage Only by adding drives or nodes
  2. Compute Only by adding RAM or nodes
  3. Compute + Storage by adding drives and/or nodes

Back in mid-2013 when I joined Nutanix, the claim by Netapp was true, as only one node type (NX-3450) was available. Later that same year the 1000 and 6000 series were released, giving more flexibility, and things have continued to become more flexible over the years.

Today the flexibility (or versatility) in scale for Nutanix solutions is second to none.

Performance. Today, it’s an absolute requirement for HCI to have an all-flash solution. Spinning disks are slightly less expensive, but you’re sacrificing production workloads. NetApp HCI only offers an all-flash solution.

Congratulations Netapp, you do all flash, just like everyone else (but you came to the party years later). There are many use cases for bulk storage capacity, be it all flash or hybrid, and Nutanix provides NVMe+SATA-SSD, all-SATA-SSD and SATA-SSD+SAS/SATA-HDD options to cover all use cases and requirements.

Not only that but Nutanix allows mixing of All Flash and Hybrid nodes to further avoid the creation of silos.

Enterprise ready. This is an important test. One downfall of Nutanix software running on exactly the same CPU cores as your applications is the effect on enterprise readiness. Many of our customers have shifted away from Nutanix once they’ve seen what happens when a Nutanix component fails. It’s easier to move the VM workload off the current Nutanix system (the one that’s failing) than it is to wait for the fix. Nutanix does not run optimally in hardware-degraded situations. NetApp HCI has no such problem; it can run at full workloads, full bandwidth, and full speed while any given component has failed.

It’s a huge claim by Netapp to dispute Nutanix’s enterprise readiness, considering we have many more years of experience shipping HCI product, but hey, Netapp’s article is proving to be without factual basis every step of the way.

The beauty of Nutanix is the ability to self-heal after failures (hardware or software) and then tolerate subsequent failures. Nutanix can also tolerate multiple concurrent failures, including up to 8 nodes and 48 physical drives (NVMe/SSD/HDD).

Nutanix can also tolerate one or more failures and FULLY self-heal without any hardware being replaced. This is critical, as I detailed in my post: Hardware support contracts & why 24×7 4 hour onsite should no longer be required.

For more details on these failure scenarios checkout the Resiliency section of my blog series Nutanix | Scalability, Resiliency & Performance.

Workload performance protection. No one should attempt an advanced HCI deployment without workload performance protection. Only NetApp HCI provides such a guarantee, because this protection is built into the native technology.


One critical factor in delivering consistently high performance is data locality: the further data is from the compute layer, the more potential bottlenecks there are to impact performance. It’s important to Evaluate Nutanix’ original & unique implementation of Data Locality to understand that features such as QoS for storage IO are critical with scale-up shared storage (a.k.a. SAN/NAS), but with a highly distributed scale-out architecture, noisy-neighbour problems are all but eliminated by the fact that you have more controllers and that those controllers are local to the VMs.

Storage QoS is added complexity, and it is only required when a product such as a SAN/NAS has no choice but to deal with the IO blender effect, where sequential IO is received as random due to competing workloads. This effect is minimised with the Nutanix Distributed Storage Fabric.

Shared CPU cores. One key technical difference between the Nutanix product and NetApp HCI is the concept of shared CPU cores. Nutanix has processes running in the same cores as your applications, whereas NetApp HCI does not. There is a cost associated with sharing cores when applications like Oracle and VMware are licensed by core count. You actually pay more for those applications when Nutanix runs their processes on your cores. It’s important to do that math.

I’m very happy Rob raised the point regarding VMware’s licensing (part of what I’d call #vTAX); this is one of the many great reasons to move to Nutanix’s next-generation hypervisor, AHV (Acropolis Hypervisor).

In addition, for workloads like Oracle or SQL where licensing is an issue, Nutanix offers two solutions which address these issues:

  1. Compute Only Nodes running AHV
  2. Acropolis Block Services (ABS) to provide the Nutanix Distributed Storage Fabric (ADSF) to physical or virtual servers not running on Nutanix HCI nodes.

But what about the Nutanix Controller VM (CVM) itself? It is assigned vCPUs which share physical CPU cores with other virtual machines.

Sharing physical cores is a bad idea, as virtualisation has taught us over many years. Hold on, wait, no that’s not it (LOL!). Virtualisation has taught us we can share physical CPU cores very successfully, even for mission critical applications, when it’s done correctly.

Here is a detailed post on the topic titled: Cost vs Reward for the Nutanix Controller VM (CVM)

Asset fluidity. An important part of the NetApp scale functionality is asset fluidity – being able to move subcomponents of HCI around to different applications, nodes, sites, and continents and to use them long beyond the 3-year depreciation cycle.

This is possibly the weakest argument in Netapp’s post. Nutanix nodes can be removed non-disruptively from a cluster and added to any other cluster, including mixing all flash and hybrid. Brand new nodes can be mixed with any other generation of nodes, and I regularly form large clusters using multiple generations of hardware.

Here is a tweet of mine from 2016 showing a 22-node cluster with four different node types across three generations of hardware (G3 being the original NX-8150, plus G4 and G5).

Data Fabric. The NetApp Data Fabric simplifies and integrates data management across clouds and on the premises to accelerate digital transformation. To plan an enterprise rollout of HCI, a Data Fabric is required – and Nutanix has no such thing. NetApp delivers a Data Fabric that’s built for the data-driven world.

I had to look up what Netapp mean by “Data Fabric” as it sounded to me like a nonsense marketing term, and surprise, surprise, I was right. Here is how Netapp describe “Data Fabric”:

Data Fabric is an architecture and set of data services that provide consistent capabilities across a choice of endpoints spanning on-premises and multiple cloud environments.

It’s a fluffy marketing phrase, but by that definition the same could easily be said of the Nutanix Distributed Storage Fabric (ADSF). ADSF is hypervisor agnostic, which straight away delivers a multi-platform solution (cloud or on-premises), including AWS and Azure (below).

[Image: CloudSite]

Nutanix can replicate and protect data including virtual machines across different hardware, clusters, hypervisors and clouds.

So the claim “Nutanix does not have a Data Fabric” is pretty laughable based on Netapp’s own description of “Data Fabric”.

Now the final point:

Choosing the Right Infrastructure for Your Enterprise

I’ve written about Things to consider when choosing infrastructure and my conclusion was:

[Image: ThingtoconsiderSummary]

Nutanix has for many years provided a platform which can be your standard for all workloads, and with all the enhancements we’ve made over the years, the niche workloads that cannot genuinely be supported are now very rare.

The best thing about Nutanix is that, with our world-class enterprise architect enablement and Nutanix Platform Expert (NPX) certification programmes, we ensure our field SEs, architects and certified individuals who design and implement solutions for customers every day know exactly when to say “No”.

This culture of customer success first, sales last, comes from our former President Sudheesh Nair, who wrote this excellent article during his time at Nutanix:

Quite possibly the most powerful 2-letter word in Sales – No

After addressing all the points raised by Netapp, it’s easy to see that Nutanix has a very complete solution thanks to years of development and experience with enterprise customers and their mission critical applications.

Have you read any other “b*llsh*t” you’d like Nutanix to respond to? If so, don’t hesitate to reach out.

Nutanix Scalability – Part 2 – Compute (CPU/RAM)

Following on from Part 1 of the Scalability series, where we discussed how Nutanix can scale storage capacity separately from compute, the next obvious topic is scaling CPU and memory resources at both the workload and cluster level.

Let’s first recap the problems with scaling compute with traditional shared storage.

[Image: HCInotHCI]

Yuk! That looks like old school 3-tier stuff to me!

Non HCI workloads on compute only nodes would therefore:

  • Be running in the same setup as traditional 3-tier infrastructure
  • Have different performance than HCI based workloads
  • Lose the advantage of having compute + storage close together
  • Increase dependency on Network
  • Impact network utilization of HCI node/s
  • Impact benefits of HCI for the native HCI workloads and much more.

The industry has accepted HCI as the way of the future, and while adding compute-only nodes might sound nice at a high level, it just re-introduces the classic 3-tier complexity and problems of the past. If we review the actual requirements, it’s very rare to see a Nutanix node with insufficient resources when sized/configured correctly.

Customers are often surprised when they show me their workloads and I don’t seem surprised by the CPU/RAM, storage IO or capacity requirements. I can’t tell you how many times I’ve made statements like “Your application’s requirements are not that high, I’ve seen much worse!”.

Examples of scaling compute with Nutanix

Example 1: Scaling up a Virtual Machine’s compute resources:

SQL/Oracle DBA: Our application is growing/running slowly, we need more CPU/RAM!

Nutanix: You have several options:

a) Scale up the virtual machine’s vCPUs and vRAM to match the size of the NUMA node.
b) Scale up the virtual machine’s vCPUs to the same number of pCores as the host has, minus the Nutanix CVM’s vCPUs, and do the same with the RAM.

The first option is optimal as it will ensure maximum memory performance, since the CPU will be accessing memory within the NUMA boundary. However, the second option is still viable, and for applications such as SQL, the impact of insufficient memory can be higher than the penalty of crossing a NUMA boundary.
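As a quick illustration of the two options, here is a small sizing sketch. The host and CVM figures below are example placeholders, not recommendations; substitute the actual core count, RAM and CVM configuration of your own nodes.

```python
# Rough sizing helper for the two scale-up options above.
# All inputs are example placeholders - substitute your own host and CVM configuration.

host_sockets = 2
cores_per_socket = 28      # e.g. dual Intel Xeon Platinum 8180
host_ram_gb = 768
cvm_vcpus = 8              # typical CVM sizing - confirm for your own cluster
cvm_ram_gb = 32

# Option (a): keep the VM within a single NUMA node for best memory locality
numa_vcpus = cores_per_socket
numa_vram_gb = host_ram_gb // host_sockets

# Option (b): give the VM every pCore and GB of RAM the CVM doesn't need
max_vcpus = host_sockets * cores_per_socket - cvm_vcpus
max_vram_gb = host_ram_gb - cvm_ram_gb

print(f"Option a (NUMA-aligned):   {numa_vcpus} vCPUs / {numa_vram_gb} GB vRAM")
print(f"Option b (host minus CVM): {max_vcpus} vCPUs / {max_vram_gb} GB vRAM")
```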

BUT MY WORKLOAD IS UNIQUE, IT NEEDS A PHYSICAL SERVER!!

Despite hearing these types of statements from prospective and existing customers, very few workloads actually need more CPU/RAM than a modern Nutanix (or OEM/software-only) node can provide, even after you subtract resources for the Controller VM (CVM). I find that a physical server is usually only a perceived requirement, and in reality a reasonably sized VM on a standard node will deliver the desired business outcome/s comfortably.

Currently Nutanix NX nodes support Intel Platinum 8180 processors which have 28 physical cores @ 2.5 GHz per socket for a total of 56 physical cores (112 threads).

If you had, say, an existing physical server using fairly modern Intel Broadwell E5-2699 v4 CPUs with dual 22-physical-core processors, you have a total SPECint_rate of 1760, or 40 per core.

Compare that to the Intel Platinum 8180 processor and you have a total SPECint_rate of 2720 or 48.5 per core.

This is an increase per core of 21.25%.

So if you move that workload from the physical server using Intel Broadwell E5-2699 v4 CPUs (44 cores) to Nutanix with ZERO CPU overcommitment (vCPU:pCore ratio of 1:1) using the Intel Platinum 8180 processor, and assuming we reserve 8 pCores for the CVM, we still have 48 pCores for the SQL VM.

That’s a SPECint_rate of 2328, which is higher than the physical server using all of its cores.

That’s over 32% more CPU performance for the Virtual Machine compared to the dedicated physical server.
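For anyone who wants to check the arithmetic, here is the working using the SPECint_rate figures quoted above (rounding explains the small differences from the numbers in the text):

```python
# Worked arithmetic for the example above, using the SPECint_rate figures quoted in the text.

broadwell_total, broadwell_cores = 1760, 44   # dual Intel E5-2699 v4
platinum_total, platinum_cores = 2720, 56     # dual Intel Platinum 8180

per_core_broadwell = broadwell_total / broadwell_cores   # 40.0
per_core_platinum = platinum_total / platinum_cores      # ~48.6

print(f"Per-core uplift: {per_core_platinum / per_core_broadwell - 1:.1%}")                # ~21%

# Reserve 8 pCores for the CVM, leaving 48 pCores for the SQL VM at a 1:1 vCPU:pCore ratio
vm_cores = platinum_cores - 8
vm_specint = vm_cores * per_core_platinum                                                  # ~2331

print(f"Estimated SPECint_rate for the VM: {vm_specint:.0f}")
print(f"Uplift vs the dedicated physical server: {vm_specint / broadwell_total - 1:.1%}")  # ~32%
```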

The reality is the Nutanix CVM and Acropolis Distributed Storage Fabric (ADSF) provide high-performance, low-latency storage, which also drives further CPU efficiency by eliminating CPU WAIT (CPU cycles wasted waiting for I/O to complete).

As you can see from this simple example, a Virtual Machine on Nutanix can easily replace even a modern physical server, and provide better performance with a CPU only one generation newer. Think about how your 3-5 year old physical servers will feel when they jump multiple generations of CPU and get scale-out flash-based storage.

Example 2: A VM (genuinely) needs more CPU/RAM than Nutanix nodes have.

SQL/Oracle DBA: Our application/s needs more CPU/RAM than our biggest node/s can provide.

Nutanix: You have several options:

a) Purchase one or more larger nodes (e.g.: NX-8035-G6 w/ Intel Platinum or Gold processors), add them to the existing cluster and live migrate your VM/s to that/those nodes. Use affinity rules to keep critical VMs on the highest-performance nodes.

Nutanix supports mixing different hardware types/generations in the same cluster, and this can be preferable to creating a dedicated cluster for several reasons:

  • Larger clusters provide more targets for replication traffic (i.e.: RF2 or RF3), meaning lower average write latency
  • Larger clusters provide higher resiliency, as they can potentially tolerate more failures and rebuild faster following the failure of a drive, a node or even multiple nodes
  • Larger clusters help ensure the impact of a failure is lower, as a smaller percentage of cluster resources is lost (a quick illustration follows this list)
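A simple way to see the last point is to express the impact of a single node failure as a fraction of total cluster resources. This is illustrative arithmetic only, not a Nutanix sizing tool:

```python
# Illustrative arithmetic only: the share of cluster resources lost when a single node fails.
for nodes in (4, 8, 16, 32):
    print(f"{nodes:>2}-node cluster: one node failure removes {1 / nodes:.1%} of resources")
# A 4-node cluster loses 25% of its resources; a 32-node cluster loses ~3%.
```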

b) Purchase one or more larger nodes (e.g.: NX-8035-G6 w/ Intel Platinum or Gold processors), create a new cluster and migrate your VM/s to that cluster.

A dedicated cluster may sound attractive, but in most cases I recommend mixed-workload clusters as they ultimately provide higher performance, resiliency and flexibility.

c) Scale out your workloads

Applications like MS Exchange, MS SQL and Oracle RAC can (and arguably should) be scaled out rather than scaled up, as doing so increases performance and resiliency and reduces overall infrastructure costs (e.g.: more, cheaper/smaller processors can be used as opposed to premium processors like the Intel Platinum series).

One large VM hosting dozens of databases is rarely a good idea, so scale out: run more VMs distributed across your Nutanix cluster and spread the workload across them.

For 99% of workloads, I do not see the real world value of compute only nodes. But there are always exceptions to every rule.

Potential Exceptions:

Example 3: Re-using existing hardware

SQL DBA: I love my Nutanix gear (duh!), but I have some physical servers which won’t be end of life for 12 months. Can I continue using them with Nutanix?

Nutanix: We have several options:

a) If the hardware is on our Software-only hardware compatibility list (HCL), it’s possible you can purchase SW-only licenses and deploy Nutanix on your existing hardware.

b) Use Nutanix Acropolis Block Services (ABS) to provide highly available scale out storage to your physical server via iSCSI.

ABS was released in 2015 and supports SCSI-3 persistent reservations for shared storage-based Windows clusters, which are commonly used with Microsoft SQL Server and clustered file servers.

ABS supports several use cases, including:

  • iSCSI for Microsoft Exchange Server
  • Shared storage for Linux-based clusters
  • Windows Server Failover Clustering (WSFC)
  • SCSI-3 persistent reservations for shared storage-based Windows clusters
  • Shared storage for Oracle RAC environments
  • Bare-metal environments

Therefore ABS allows you to re-use your existing hardware to maximise your return on investment (ROI) while getting the benefits of ADSF. Once the hardware is end of life, the storage already on the Nutanix cluster can be quickly presented to a VM so the workload will benefit from the full Nutanix HCI experience.
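For illustration, here is a minimal sketch (mine, not from the original post) of what attaching a Linux server to an ABS volume group looks like from the client side. It assumes the cluster has an external Data Services IP configured, the volume group already whitelists this host’s initiator IQN, and open-iscsi is installed; the IP shown is a placeholder.

```python
"""Minimal client-side sketch: attach a Linux host to a Nutanix volume group via ABS (iSCSI).

Assumptions (mine, not from the post): the cluster has an external Data Services IP,
the volume group already whitelists this host's initiator IQN, and open-iscsi
(iscsiadm) is installed. The IP below is a placeholder.
"""
import subprocess

DATA_SERVICES_IP = "10.0.0.50"   # placeholder: your cluster's external Data Services IP

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Discover the iSCSI targets exposed by the cluster, then log in to them.
run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", f"{DATA_SERVICES_IP}:3260"])
run(["iscsiadm", "-m", "node", "--login"])

# The volume group's disks now show up as ordinary block devices (e.g. /dev/sdX)
# and can be partitioned and formatted like any other LUN.
```

When the physical server is eventually retired, the same volume group data is already on the cluster and can simply be presented to a VM instead, as described above.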

Future Capabilities:

In late 2017, Nutanix announced Nutanix Acropolis Compute Cloud (AC2) which will provide the ability to have true compute-only nodes in a Nutanix cluster as shown below.

I reluctantly mention this upcoming capability because I do not want to see customers go back to a 3-tier model or think that HCI isn’t the way forward, because it is. That’s not what compute-only is about.

This capability is specifically designed to work around the niche circumstances where a software vendor, such as Oracle, is extorting customers from a licensing perspective and it’s desirable to maximise the CPU cores the application can use.

Let me have a quick rant and put an end to the nonsense before it gets out of hand:

IT IS NOT FOR GENERAL VM USE!

NO, IT’S NOT FOR PERFORMANCE REASONS.

NO NUTANIX IS NOT MOVING BACK TO A 3-TIER COMPUTE+STORAGE MODEL.

HCI WITH NUTANIX IS STILL THE WAY FORWARD

Summary:

Nutanix provides excellent scalability at the CPU/RAM level for both virtual and physical workloads. In the rare circumstances where physical servers are a real (or, more likely, just a perceived) requirement, ABS can be used, and Nutanix will soon also provide compute-only nodes for AHV customers to ensure licensing value is maximised for those rare cases.

Back to the Scalability, Resiliency and Performance Index.

Dare2Compare Part 3 : Nutanix can’t support Dedupe without 8vCPUs

As discussed in Part 1, we have proven HPE made false claims about Nutanix snapshot capabilities as part of the #HPEDare2Compare Twitter campaign.

In part 2, I explained how HPE/Simplivity’s 10:1 data reduction HyperGuarantee is nothing more than smoke and mirrors and that most vendors can provide the same if not greater efficiencies, even without hardware acceleration.

Now in part 3, I will respond to yet another false claim (below) that Nutanix cannot support dedupe without 8vCPUs.

This claim is interesting for a number of reasons.

1. There is no minimum or additional vCPU requirement for enabling deduplication.

The only additional CVM (Controller VM) requirement for enabling deduplication is detailed in the Nutanix Portal (online documentation), which states:

[Image: DedupeEnable]

There is no additional vCPU requirement for enabling cache or capacity deduplication.

I note that the maximum 32GB RAM requirement is well below the RAM requirements for the HPE SVT product which can exceed 100GB RAM per node.

2. Deduplication is part of our IO engine (stargate) which is limited in AOS to N-2 vCPUs.

In short, this means the maximum number of vCPUs that stargate can use on an 8vCPU CVM is 6. However, these 6 vCPUs are not just for dedupe; they process all I/O and things like statistics for PRISM (our HTML 5 GUI). Depending on the workload, only a fraction of the maximum 6 vCPUs is used, allowing those cores to be used for other workloads. (Hey, this is virtualization after all.)

Deduplication itself uses a small fraction of those N-2 CPU cores, which brings us to my next point: the efficiency of Nutanix deduplication compared to vendors like HPE SVT, who brute-force dedupe all data regardless of the ratio achieved, which is clearly inefficient.

3. Nutanix Controller VM (CVM) CPU usage depends on the workload and feature set being used.

This is a critical point: Nutanix has configurable data reduction at per-vDisk granularity, meaning that for workloads whose datasets do not provide significant (or any) savings from deduplication, it can be left disabled (the default).

This ensures CVM resources are not wasted performing what I refer to as “brute force” data reduction on all data regardless of the benefits.

4. Nutanix actually has global deduplication which spans all nodes within a cluster, whereas HPE Simplivity deduplication is not truly global. HPE Simplivity does not form a cluster of nodes; the nodes act more like HA pairs for the virtual machines, and the deduplication, in simple terms, is confined to one or a pair of HPE SVT nodes.

I’ve shown this where 4 copies of the same appliance are deployed across four HPE SVT nodes and the deduplication ratio is only 2.1:1; if the deduplication were global, the ratio would be closer to (if not exactly) 4:1, and this is what we see on Nutanix.
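A toy model makes the difference in dedupe scope obvious: four identical copies spread across four nodes can only approach 2:1 if each dedupe domain covers a pair of nodes, whereas a cluster-wide domain approaches 4:1.

```python
# Toy model: four identical copies of the same appliance spread across four nodes.
logical_copies = 4

# Global dedupe: the whole cluster is one dedupe domain, so a single physical copy remains.
global_ratio = logical_copies / 1        # 4.0:1

# Dedupe scoped to node pairs: two domains, each keeping its own physical copy.
pair_scoped_ratio = logical_copies / 2   # 2.0:1, close to the 2.1:1 observed

print(f"Cluster-wide dedupe: {global_ratio:.1f}:1")
print(f"Pair-scoped dedupe:  {pair_scoped_ratio:.1f}:1")
```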

Nutanix can also have defined deduplication boundaries, so customers needing to separate data for any reason (e.g.: multi-tenancy / compliance) can create two containers, both with deduplication enabled, and enjoy global deduplication across the entire cluster without different customers referring to the same blocks.

5. Deduplication is vastly less valuable than vendors lead you to believe!

I can’t stress this point enough. Deduplication is a great technology and it works very well on many different platforms depending on the dataset.

But deduplication does not solve 99.9% of the challenges in the datacenter, and is one of the most overrated capabilities in storage.

Even if Nutanix did not support deduplication at all, it would not prevent our existing and future customers achieving great business outcomes. If a vendor such as HPE SVT wants to claim they have the best dedupe in the world, I don’t think anyone really cares, because even if it were true (which in my opinion it is not), the value of Nutanix goes so far beyond basic storage functionality that we’re still far and away the market leader, and deduplication is all but a moot point.

For more information about what the vCPUs assigned to the Nutanix CVM provide beyond storage functions, check out the following posts, which address FUD from VMware about the CVM’s overheads and the value the CVM provides, much of which is unique to Nutanix.

Nutanix CVM/AHV & vSphere/VSAN overheads

Cost vs Reward for the Nutanix Controller VM (CVM)


Return to the Dare2Compare Index: