Dare2Compare Part 3 : Nutanix can’t support Dedupe without 8vCPUs

As discussed in Part 1, we have proven HPE have made false claims about Nutanix snapshot capabilities as part of the #HPEDare2Compare twitter campaign.

In part 2, I explained how HPE/Simplivity’s 10:1 data reduction HyperGuarantee is nothing more than smoke and mirrors and that most vendors can provide the same if not greater efficiencies, even without hardware acceleration.

Now in part 3, I will respond to yet another false claim (below) that Nutanix cannot support dedupe without 8vCPUs.

This claim is interesting for a number of reasons.

1. There is no minimum or additional vCPU requirement for enabling deduplication.

The only additional CVM (Controller VM) requirement for enabling deduplication is detailed in the Nutanix Portal (online documentation), which states:

[Image: Nutanix Portal documentation detailing CVM requirements for enabling deduplication]

There is no additional vCPU requirement for enabling cache or capacity deduplication.

I note that the maximum 32GB RAM requirement is well below the RAM requirements for the HPE SVT product which can exceed 100GB RAM per node.

2. Deduplication is part of our IO engine (stargate) which is limited in AOS to N-2 vCPUs.

In short, this means the maximum number of vCPUs that stargate can use of an 8vCPU CVM is 6. However, these 6 vCPUs are not just for dedupe; they process all I/O as well as things like statistics for PRISM (our HTML 5 GUI). Depending on the workload, only a fraction of the maximum 6 vCPUs is used, allowing those cores to be used for other workloads. (Hey, this is virtualization after all.)
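The N-2 rule above is simple enough to express directly. This is an illustrative sketch only (the function name is mine, not a Nutanix API), showing the stargate vCPU ceiling for a given CVM size:

```python
# Illustrative sketch of the N-2 rule: of the vCPUs assigned to the
# Nutanix CVM, two are reserved for work other than the stargate IO
# engine, so stargate can use at most N-2. Function name is hypothetical.

def stargate_vcpu_limit(cvm_vcpus: int) -> int:
    """Return the maximum vCPUs stargate may use for an N-vCPU CVM."""
    if cvm_vcpus < 3:
        raise ValueError("CVM needs at least 3 vCPUs in this model")
    return cvm_vcpus - 2

print(stargate_vcpu_limit(8))  # an 8 vCPU CVM leaves 6 for stargate
```

Remember that this is a ceiling, not a reservation: depending on the workload, only a fraction of those vCPUs is actually consumed.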

Deduplication itself uses a small fraction of the N-2 CPU cores, which brings us to my next point: the efficiency of Nutanix deduplication compared to other vendors like HPE SVT, who brute-force dedupe all data regardless of the ratio, which is clearly inefficient.

3. Nutanix Controller VM (CVM) CPU usage depends on the workload and feature set being used.

This is a critical point: Nutanix has configurable data reduction at a per-vDisk granularity, meaning for workloads whose dataset provides no significant (or any) savings from deduplication, it can be left disabled (the default).

This ensures CVM resources are not wasted performing what I refer to as “brute force” data reduction on all data regardless of the benefits.

4. Nutanix actually has global deduplication which spans all nodes within a cluster, whereas HPE Simplivity deduplication is not truly global. HPE Simplivity does not form a cluster of nodes; the nodes act more like HA pairs for the virtual machines, and the deduplication, in simple terms, is within one node or a pair of HPE SVT nodes.

I’ve shown this by deploying 4 copies of the same appliance across four HPE SVT nodes, where the deduplication ratio is only 2.1:1. If the deduplication were global, the ratio would be closer to, if not exactly, 4:1, which is what we see on Nutanix.
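The experiment above can be modelled as simple arithmetic. This toy model is my own illustration, assuming one unique copy of the data is stored per deduplication domain (a node pair in the pair-scoped case, the whole cluster in the global case):

```python
# Toy model: four identical appliances deployed across four nodes.
# If deduplication is scoped per pair of nodes, identical data is stored
# once per pair; if it is global, once for the whole cluster.
# The scoping assumption is mine, inferred from the observed 2.1:1 ratio.

def dedup_ratio(copies: int, dedup_domains: int) -> float:
    """Logical copies divided by physical copies, one unique copy per domain."""
    logical = copies
    physical = min(copies, dedup_domains)
    return logical / physical

pair_scoped = dedup_ratio(copies=4, dedup_domains=2)  # two HA pairs
global_scope = dedup_ratio(copies=4, dedup_domains=1)  # whole cluster
print(pair_scoped, global_scope)  # 2.0 vs 4.0
```

The pair-scoped result of 2:1 lines up closely with the 2.1:1 observed on HPE SVT, while the global result of 4:1 matches what we see on Nutanix.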

Nutanix can also have defined deduplication boundaries, so customers needing to separate data for any reason (e.g. multi-tenancy / compliance) can create two containers, both with deduplication enabled, and enjoy global deduplication across the entire cluster without having customers refer to the same blocks.

5. Deduplication is vastly less valuable than vendors lead you to believe!

I can’t stress this point enough. Deduplication is a great technology and it works very well on many different platforms depending on the dataset.

But deduplication does not solve 99.9% of the challenges in the datacenter, and is one of the most overrated capabilities in storage.

Even if Nutanix did not support deduplication at all, it would not prevent our existing and future customers achieving great business outcomes. If a vendor such as HPE SVT wants to claim they have the best dedupe in the world, I don’t think anyone really cares, because even if it were true (which in my opinion it is not), the value of Nutanix is so far beyond basic storage functionality that deduplication is all but a moot point.

For more information about what the vCPUs assigned to the Nutanix CVM provide beyond storage functions, check out the following posts, which address FUD from VMware about the CVM’s overheads and the value the CVM provides, much of which is unique to Nutanix.

Nutanix CVM/AHV & vSphere/VSAN overheads

Cost vs Reward for the Nutanix Controller VM (CVM)

 

Return to the Dare2Compare Index:

Dare2Compare Part 1 : HPE/Simplivity’s 10:1 data reduction HyperGuarantee Explained

HPE have been relentless with their #HPEDare2Compare twitter campaign focused on the market leading Nutanix Enterprise Cloud platform and I for one have had a good laugh from it. But since existing and prospective customers have been asking for clarification I thought I would do a series of posts addressing each claim.

In part 1 of this series, I will respond to the claim (below) that Nutanix can’t guarantee at least a 10:1 data efficiency ratio.

[Image: HPE #HPEDare2Compare tweet claiming Nutanix can’t guarantee 10:1 data efficiency]

Firstly let’s think about what is being claimed by HPE/SVT.

If you’re a potential customer, it would be fair for you to assume that if you have 100TB on your current SAN/NAS and you purchase HPE/SVT, you would only need to buy 10TB plus some room for growth or to tolerate failures.

But this couldn’t be further from the truth. In fact, if you have 100TB today, you’ll likely need to purchase a similar capacity of HPE/SVT, as most platforms, even older/legacy ones, already have some data efficiency; what HPE/SVT is offering with deduplication and compression is nothing new or unique.

Let’s go over what the “HyperGuarantee” states and why it’s not worth the paper it’s written on.

[Image: HPE/SVT HyperEfficient guarantee wording]

It sounds pretty good, but two things caught my eye. The first is the phrase “relative to comparable traditional solutions”, which excludes any modern storage with this functionality (such as Nutanix), and the second is the words “across storage and backup combined”.

Let’s read the fine print about “across storage and backup combined”.

[Image: HyperEfficient guarantee fine print regarding “storage and backup combined”]

Hold on, I thought we were talking about a data reduction guarantee but the fine print is talking about a caveat requiring we configure HPE/Simplivity “backups”?

The first issue: what if you use an enterprise backup solution such as Commvault, or an SMB play such as Veeam? The guarantee is void, and with good reason, as you will (HPE)Discover shortly. 😉

Let’s do the math on how HPE/SVT can guarantee 10:1 without giving customers ANY real data efficiency compared to even legacy solutions such as Netapp or EMC VNX type platforms.

  1. Let’s use a single 1 TB VM as a simple example.
  2. Take 30 snapshots (1 per day for 30 days) and count each snapshot as if it was a full “backup” to disk.
  3. Data stored now equals 31TB  (1 TB + 30 TB)
  4. Actual Size on Disk is only ~1TB (This is because snapshots don’t create any copies of the data)
  5. Claimed Data Efficiency is 31:1
  6. Effective Capacity Savings = 96.8% (1TB / 31TB = 0.032) which is rigged to be >90% every time
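The six steps above can be reduced to a few lines of arithmetic. This is my own illustration of the maths; the function name is hypothetical, and the point is that counting metadata snapshots as full “backups” guarantees >90% “savings” regardless of the actual data:

```python
# Sketch of the HyperGuarantee arithmetic: each snapshot is counted as a
# full copy of the VM, while physically the snapshots store no new data.

def claimed_efficiency(vm_tb: float, snapshots: int):
    """Return (claimed ratio, claimed savings) for one VM plus its snapshots."""
    logical_tb = vm_tb + snapshots * vm_tb  # each snapshot counted as a full backup
    physical_tb = vm_tb                     # snapshots create no copies of the data
    ratio = logical_tb / physical_tb
    savings = 1 - physical_tb / logical_tb
    return ratio, savings

ratio, savings = claimed_efficiency(vm_tb=1.0, snapshots=30)
print(f"{ratio:.0f}:1 claimed, {savings:.1%} savings")  # 31:1 claimed, 96.8% savings
```

Note that the VM size cancels out entirely: a 1TB VM and a 100TB VM produce the same “31:1”, which is exactly why the guarantee is satisfied for every customer by default.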

So the guarantee is satisfied by default, for every customer and without actually providing data efficiency for your actual data!

I have worked with numerous platforms over the years, and the same result could be guaranteed by Netapp, Dell/EMC, Nutanix and many more. In my opinion the reason these vendors don’t have a guarantee is because this capability has long been table stakes.

Let’s take a look at a screenshot of the HPE/SVT interface (below).

[Image: HPE/SVT interface screenshot showing an 896:1 efficiency ratio]

Source of the image is an official SVT case study, which can be found at: https://www.simplivity.com/case-study-coughlan-companies/

It shows an efficiency of 896:1 which again sounds great, but behind the smoke and mirrors it’s about as misleading as you can get.

Firstly, the total “VM data” is 9.9TB.

The “local backups”, which are actually just pointer based copies (not backups at all), report 3.2PB.

Note: To artificially inflate the reported “deduplication” ratio, simply schedule more frequent metadata copies (what HPE/SVT incorrectly refer to as “backups”) and the ratio will increase.

The “remote backups”, funnily enough, are 0.0KB, which means the solution actually has no backups.

The real data reduction ratio can be easily calculated by taking the VM data of 9.9TB and dividing it by the “Used” capacity of 3.7TB, which equates to 2.67:1. This breaks down to the 2:1 compression shown in the GUI and a <1.5:1 deduplication ratio.
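To see where the headline number comes from, here is the same screenshot maths in a few lines. The formula for the headline ratio is my assumption (dividing VM data plus “local backups” by used capacity), not HPE’s published method, but it lands close to the 896:1 shown:

```python
# Reproducing the screenshot figures. The headline formula is an
# assumption inferred from the numbers; the "real" ratio uses VM data only.

vm_data_tb = 9.9
local_backups_tb = 3.2 * 1024  # 3.2 PB of pointer-based metadata copies
used_tb = 3.7

headline = (vm_data_tb + local_backups_tb) / used_tb  # roughly the 896:1 shown
real = vm_data_tb / used_tb                           # ~2.67:1

print(f"headline ~{headline:.0f}:1, real ratio ~{real:.2f}:1")
```

The gap between the two numbers is entirely the “local backups” figure, which is why scheduling more metadata copies inflates the ratio.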

In short, the 10:1 data efficiency HyperGuarantee is not worth the paper it’s written on, especially if you’re using a 3rd party backup product. If you choose to use the HPE/SVT built in pointer based option with or without replication, you will see the guaranteed efficiency ratio but don’t be fooled into thinking this is something unique to HPE/SVT as most other vendors including Nutanix have the same if not better functionality.

Remember, other vendors including Nutanix do not report metadata copies as “backups” or “data reduction”, because they are not.

So ask your HPE/SVT rep: “How much deduplication and compression is guaranteed WITHOUT using your pointer based ‘backups’?” The answer is NONE!

For more information read this article which has been endorsed by multiple vendors on what should be included in data reduction ratios.


Expanding Capacity on a Nutanix environment – Design Decisions

I recently saw an article about design decisions around expanding capacity for a HCI platform which went through the various considerations and made some recommendations on how to proceed in different situations.

While reading the article, it really made me think how much simpler this process is with Nutanix and how these types of areas are commonly overlooked when choosing a platform.

Let’s start with a few basics:

The Nutanix Acropolis Distributed Storage Fabric (ADSF) is made up of all the drives (SSD/SAS/SATA etc.) in all nodes in the cluster. Data is written locally on the node where the VM performing the write resides, and replicas are distributed throughout the cluster based on numerous factors, with no concepts such as node pairing, HA pairs, or preferred nodes.

In the event of a drive failure, regardless of what drive (SSD,SAS,SATA) fails, only that drive is impacted, not a disk group or RAID pack.

This is key as it limits the impact of the failure.

It is important to note that ADSF does not store large objects, nor does the file system require tuning to stripe data across multiple drives/nodes. ADSF by default distributes the data (at a 1MB granularity) in the most efficient manner throughout the cluster while keeping the hottest data local to ensure the lowest overheads and highest performance read I/O.

Let’s go through a few scenarios, which apply to both All Flash and Hybrid environments.

  1. Expanding capacity: When adding a node or nodes to an existing cluster, without moving any VMs, changing any configuration or making any design decisions, ADSF will proactively send replicas from write I/O to all nodes within the cluster, thereby improving performance, while reactively performing disk balancing where a significant imbalance exists within a cluster.

    This might sound odd but with other HCI products new nodes are not used unless you change the stripe configuration or create new objects e.g.: VMDKs which means you can have lots of spare capacity in your cluster, but still experience an out of space condition.

    This is a great example of why ADSF has a major advantage especially when considering environments with large IO and/or capacity requirements.

    The node addition process only requires the administrator to enter the IP addresses and it’s basically one click; capacity is available immediately and there is no mass movement of data. There is also no need to move data off and recreate disk groups or similar, as these legacy concepts & complexities do not exist in ADSF.

    Nutanix is also the only platform to allow expansion of capacity via Storage Only nodes, and supports VMs which have larger capacity requirements than a single node can provide. Both are supported out of the box with zero configuration required.

    Interestingly, adding storage only nodes also increases performance and resiliency for the entire cluster, as well as for the management stack, including PRISM.

  2. Impact & implications to data reduction of adding new nodes: With ADSF, there are no considerations or implications. Data reduction is truly global throughout the cluster, and regardless of hypervisor, or whether you’re adding Compute+Storage or Storage Only nodes, the benefits, particularly of deduplication, continue to apply to the environment.

    The net effect of adding more nodes is better performance, higher resiliency, faster rebuilds from drive/node failures and again with global deduplication, a higher chance of duplicate data being found and not stored unnecessarily on physical storage resulting in a better deduplication ratio.

    No matter what size node/s are added & no matter what Hypervisor, the benefits from data reduction features such as deduplication and compression work at a global level.

    What about Erasure Coding? Nutanix EC-X creates the most efficient stripe based on the cluster size: if you start with a small 4 node cluster, your stripe would be 2+1; if you expand the cluster to 5 nodes, the stripe automatically becomes 3+1; and if you expand further to 6 nodes or more, the stripe becomes 4+1, which is currently the largest stripe supported.

  3. Drive failures: In the event of a drive failure (SSD/SAS or SATA), as mentioned earlier, only that drive is impacted. Therefore, to restore resiliency, only the data on that drive needs to be repaired, as opposed to something like an entire disk group being marked offline.

    It’s crazy to think a single commodity drive failure in a HCI product could bring down an entire group of drives, causing a significant impact to the environment.

    With Nutanix, a rebuild is performed in a distributed manner throughout all nodes in the cluster, so the larger the cluster, the lower the per node impact and the faster the configured resiliency factor is restored to a fully resilient state.
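Stepping back to point 2 for a moment, the EC-X stripe sizing can be sketched directly. The rule below is inferred from the examples in the text (4 nodes gives 2+1, 5 gives 3+1, 6 or more gives 4+1); treat it as an illustration of the described behaviour, not the actual AOS algorithm:

```python
# Sketch of EC-X stripe width by cluster size, inferred from the examples
# given in the text. Function name and minimum-size check are hypothetical.

def ecx_stripe(cluster_nodes: int):
    """Return (data, parity) for the widest EC-X stripe described above."""
    if cluster_nodes < 4:
        raise ValueError("this model assumes at least a 4 node cluster")
    data = min(cluster_nodes - 2, 4)  # capped at 4+1 per the text
    return data, 1

for nodes in (4, 5, 6, 8):
    data, parity = ecx_stripe(nodes)
    print(f"{nodes} nodes -> {data}+{parity} stripe")
```

The key operational point is that the stripe widens automatically as the cluster grows; no administrator action or reconfiguration is required.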

At this point you’re probably asking, Are there any decisions to make?

When adding any node, compute+storage or storage only, ensure you consider what the impact of a failure of that node will be.

For example, if you add one 15TB storage only node to a cluster of nodes which have only 2TB usable each, then you would need to ensure 15TB of available space to allow the cluster to fully self heal from the loss of the 15TB node. As such, I recommend ensuring your N+1 (or N+2) node/s are equal to the size of the largest node in the cluster from a capacity, performance and CPU/RAM perspective.

So if your biggest node is an NX-8150 with 44c / 512GB RAM and 20TB usable, you should have an N+1 node of the same size to cover the worst case failure scenario of an NX-8150 failing, OR have the equivalent resources available within the cluster.

By following this one, simple rule, your cluster will always be able to fully self heal in the event of a failure and VMs will failover and be able to perform at comparable levels to before the failure.
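The self-heal rule above can be sanity-checked with a few lines. This sketch is simplified to capacity only (real sizing also considers CPU/RAM and replica placement), and the function name is illustrative:

```python
# Sketch of the N+1 sizing rule: the surviving nodes must have enough
# capacity to absorb everything that lived on the largest node.
# Capacity-only simplification; real sizing also weighs CPU/RAM.

def can_self_heal(node_capacities_tb, used_tb):
    """True if the cluster can fully rebuild after losing its largest node."""
    total = sum(node_capacities_tb)
    largest = max(node_capacities_tb)
    remaining = total - largest
    return used_tb <= remaining  # survivors must hold all the data

# One 15TB storage only node plus four 2TB nodes, as in the example:
nodes = [15, 2, 2, 2, 2]
print(can_self_heal(nodes, used_tb=7))   # survivors have 8TB, so this heals
print(can_self_heal(nodes, used_tb=10))  # 10TB in use cannot fit in 8TB
```

This makes the example concrete: the oversized 15TB node dominates the failure scenario, which is exactly why the N+1 node should match the largest node in the cluster.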

Simple as that! No RAID, Disk group, deduplication, compression, failure, or rebuild considerations to worry about.

Summary:

The above are just a few examples of the advantages Nutanix ADSF provides compared to other HCI products. The operational and architectural complexity of other products can lead to additional risk, inefficient use of infrastructure, misconfiguration, and ultimately an environment which does not deliver the business outcome it was originally designed to.