Nutanix AHV/AOS Functionality – Removing nodes

A Nutanix ADSF (Acropolis Distributed Storage Fabric) cluster is designed to live forever, meaning that as new nodes are added and older nodes are removed, the cluster remains online and, critically, in a fully resilient state at all times.

While this might not sound that critical, it avoids problems which have plagued legacy (and even many modern) datacenter products, where forklift upgrades/replacements are not only complex, high risk and time consuming, but typically also reduce the resiliency of the platform throughout the process.

A common example of reduced resiliency is where one (of two) SAN/NAS controllers is taken offline during a forklift storage controller upgrade, meaning a single failure can take the storage offline.

Nutanix has now been shipping product for around 5 years, so we have had many customers go through hardware refresh cycles, and many more who are about to embark on a hardware refresh.

I thought I would quickly demonstrate how easy it is to remove an old node from a cluster and ensure existing and prospective Nutanix customers have the facts about the node removal process.

Firstly, let’s look at the environment the demonstration is performed on.

We have an AHV environment with 8 nodes, a mix of NX3050 and NX6050 spread over 3 blocks, as shown in the Nutanix PRISM UI (below).

EnvironmentSummary

To remove a host, all we need to do is go to the hardware tab in PRISM, click the host we want to remove and select Remove Host as shown below.

RemoveHost

No preparation tasks are required at all, which also means less planning and change control. Once you select Remove Host, the host enters maintenance mode and starts performing the required tasks to remove the node, as shown below.

RemoveHost2

As you can see, Acropolis OS (AOS) is removing each individual disk from the cluster before taking the node out of the cluster. This means the configured Resiliency Factor (RF) is always in compliance, ensuring that data is still available even in the event of a drive or node failure. This can be observed on the PRISM Home screen in the data resiliency view shown below.

DataResiliencyStatus

This process is handled by the Curator function of AOS, and because data is distributed throughout all nodes within the cluster, the process is both lower impact than traditional RAID based solutions (or solutions using RAID+Replication) and faster, because all nodes, and therefore all CVMs, SSDs and HDDs, participate in the process. Nutanix ADSF does not mirror or replicate data from one node to just one other node, but to and from all nodes. This eliminates the potential bottleneck of a single node.

The following shows the speed at which the Acropolis Distributed Storage Fabric (ADSF) performs the data migration, even when the majority of data resides on the HDD tier (as it does in this example).

StoragePoolPerfNodeRemove

For a cluster with 20 x 1TB and 20 x 4TB SATA spindles, for a total of 100TB of SATA, and just 6.4TB of SSD (or approx 6.5%), the node removal rate, which reached >830MBps, is quite impressive, since most of the extents (data) which needed to be re-replicated throughout the cluster were retrieved from the SATA tier.
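To put those figures in perspective, here is a quick back-of-the-envelope sketch of the tier ratio and of what a sustained >830MBps removal rate means in practice. The numbers are simply those quoted above; this is not output from PRISM or any Nutanix tool.

```python
# Rough sanity check of the storage pool figures quoted above.
# All values come from the text; nothing here queries a real cluster.

sata_tb = 20 * 1 + 20 * 4      # 20 x 1TB + 20 x 4TB SATA spindles = 100TB
ssd_tb = 6.4                   # SSD tier capacity

ssd_ratio = ssd_tb / sata_tb   # SSD as a fraction of the SATA capacity
print(f"SSD to SATA ratio ~= {ssd_ratio:.1%}")            # ~6.4%, i.e. approx 6.5%

# What a sustained >830MBps node removal rate means in practical terms.
rate_mbps = 830
tb_per_hour = rate_mbps * 3600 / 1_000_000                # MB/s -> TB per hour
print(f"~{tb_per_hour:.1f} TB of extents re-replicated per hour")   # ~3.0 TB/h
```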

The rate at which a node can be removed will vary depending on the front end I/O, node types and cluster size, with larger clusters able to remove nodes faster due to more available controllers (CVMs) and, importantly, more choice of source and destination for extents.

The process can be monitored via the Tasks view (shown earlier) or at a very granular level such as per disk (SSD or HDD).

The below shows the status of the disk as Migrating Data, and it also shows the drive had a significant amount of data on it, as this was not an empty-cluster demonstration. In fact, this screenshot was taken about halfway through the node removal process.

DiskStatus

So, many of you may be wondering what the CVM CPU utilisation is throughout this process. During the process I took the following screenshot showing the eight Controller VMs, their vCPU configuration (8 vCPUs) and the CPU utilisation.

CVMCPUutilRemoveHost

As we can see, the utilisation ranges from just 6% through to 16%, with an average of just under 10%. It should be noted these nodes use Intel Ivy Bridge processors, so with the latest generation Intel Broadwell chipsets the process would use a lower percentage of CPU and perform faster (due to higher per-core performance) than on this 3-year-old equipment.

Note: The CVM is not just doing IO processing. It is providing the full AHV / AOS management stack which makes the fact the CVM is using under 10% CPU even more impressive.

The Remove host task also resets the configuration of the Controller VM (CVM) back to default which ensures the node can be quickly/easily added to a new or existing cluster.

The end result is a fully functional 7 node cluster as shown below.

EndResultNodeRemoval

Summary:

Node removal from a Nutanix cluster (regardless of hypervisor) is a 1-Click, non-disruptive operation which maintains cluster resiliency at all times, while being a fast and low impact process.

Related Articles:

1. VMware you’re full of it (FUD) : Nutanix CVM/AHV & vSphere/VSAN overheads

2. Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor

3. Think HCI is not an ideal way to run mission-critical x86 workloads? Think Again!

VMware you’re full of it (FUD) : Nutanix CVM/AHV & vSphere/VSAN overheads

For a long time now, VMware & EMC have been leading the charge, along with other vendors, in spreading FUD regarding the Nutanix Controller VM (CVM), making claims that it uses a lot of resources to drive storage I/O and that it being a Virtual Machine (a.k.a. Virtual Storage Appliance / VSA) is inefficient / slower than running in-kernel.

A recent example of the FUD comes from an article written by the President of VCE himself, Mr Chad Sakac, who wrote:

… it (VxRAIL) is the also the ONLY HCIA that has a fully integrated SDS stack that is embedded into the kernel – specifically VSAN because VxRail uses vSphere.  No crazy 8vCPU, 16+ GB of RAM for the storage stack (per “storage controller” or even per node in some cases with other HCIA choices!) needed.

So I thought I would put together a post covering what the Nutanix CVM provides and giving a comparison to what Chad referred to as a fully integrated SDS stack.

Let’s compare what resources are required between the Nutanix suite, which is made up of the Acropolis Distributed Storage Fabric (ADSF) & Acropolis Hypervisor (AHV), and VMware’s suite, made up of vCenter, ESXi, VSAN and associated components.

This should assist those not familiar with the Nutanix platform in understanding the capabilities and value the CVM provides, and correct the FUD being spread by some competitors.

Before we begin, let’s address the default size for the Nutanix CVM.

As it stands today, the CVM by default is assigned 8 vCPUs and 16GB RAM.

What CPU resources the CVM actually uses obviously depends on the customer’s use case/s, so if the I/O requirements are low, the CVM won’t use 8 vCPUs, or even 4 vCPUs, but it is assigned 8 vCPUs.

With the improvement in ESXi CPU scheduling over the years, the impact of having more than the required vCPUs assigned to a limited number of VMs (such as the CVM) in an environment is typically negligible, but the CVM can be right sized which is also common.

The RAM allocation is recommended to be 24GB when using deduplication, and for workloads which are very read intensive, the RAM can be increased to provide more read cache.

However, increasing the CVM RAM for read cache (Extent Cache) is more of a legacy recommendation as the Acropolis Operating System (AOS) 4.6 release achieves outstanding performance even with the read cache disabled.

In fact, the >150K 4K random read IOPS per node which AOS 4.6 achieves on NX-9040-G4 nodes were achieved without the use of the in-memory read cache, as part of engineering testing to see how hard the SSD drives can be pushed. As a result, even for extreme levels of performance, increasing the CVM RAM for read cache is no longer a requirement. As such, 24GB RAM will be more than sufficient for the vast majority of workloads, and reducing RAM levels further is also on the cards.

Thought: Even if it were true that in-kernel solutions provided faster outright storage performance (which is not the case, as I showed here), this is only one small part of the equation. What about management? VSAN management is done via the vSphere Web Client, which runs in a VM in user space (i.e.: not “in-kernel”) and connects to vCenter, which also runs as a VM in user space and commonly leverages an SQL/Oracle database which also runs in user space.

Now think about replication: VSAN uses vSphere Replication, which, you guessed it, runs in a VM in user space. For capacity/performance management, VSAN leverages vRealize Operations Manager (vROM), which also runs in user space. What about backup? The vSphere Data Protection appliance is yet another service which runs in a VM in user space.

All of these products require the data to move from kernel space into user space, so for almost every function apart from basic VM I/O, VSAN is dependent on components which are running in user space (i.e.: not in-kernel).

Let’s take a look at the requirements for VSAN itself.

According to the VSAN design and sizing guide (Page 56), VSAN uses up to 10% of host CPU and requires 32GB RAM for full VSAN functionality. Now, the RAM requirement doesn’t mean VSAN is using all 32GB, and the same is true for the Nutanix CVM: if it doesn’t need/use all the assigned RAM, it can be downsized, although 12GB is the recommended minimum and 16GB is typical. For a node with even 192GB RAM, which is small by today’s standards, 16GB is <10%, which is a minimal overhead for either VSAN or the Nutanix CVM.

In my testing VSAN is not limited to 10% CPU usage, and this can be confirmed in VMware’s own official testing of SQL in: VMware Virtual SAN™ Performance with Microsoft SQL Server

In short, the performance testing was conducted with 3 VMs, each with 4 vCPUs, on hosts containing dual-socket Intel Xeon Processor E5-2650 v2 CPUs (16 cores, 32 threads, @2.6GHz).

So assuming the VMs were at 100% utilisation, they would only be using 75% of the total cores (12 of 16). As we can see from the graph below, the hosts were almost 100% utilized, so something other than the VMs is using the CPU. Best case, VSAN is using ~20% CPU with the hypervisor using 5%; in reality the VMs won’t be pegged at 100%, so the overhead of VSAN will be higher than 20%.

VSAN10PercentBS
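To make the maths behind that estimate explicit, below is a small sketch using the figures from VMware’s test setup quoted above; the host utilisation and hypervisor overhead are read off / assumed from the published graph, so treat the result as an approximation rather than a measurement.

```python
# Back-of-the-envelope estimate of VSAN CPU overhead in VMware's SQL test.
# Inputs come from the test description above and the published graph;
# the ~100% host utilisation and ~5% hypervisor overhead are approximations.

host_cores = 16                    # dual-socket E5-2650 v2, 8 cores per socket
vm_vcpus = 3 * 4                   # 3 SQL VMs x 4 vCPUs each

vm_max_share = vm_vcpus / host_cores      # 12/16 = 75%, even if the VMs are pegged
observed_host_util = 1.00                 # hosts shown at close to 100% utilised
hypervisor_share = 0.05                   # assumed ~5% for ESXi itself

vsan_share = observed_host_util - vm_max_share - hypervisor_share
print(f"Best case VSAN CPU share ~= {vsan_share:.0%}")                 # ~20%
print(f"~{vsan_share * host_cores:.1f} cores, i.e. roughly 4 vCPUs")   # ~3.2 cores
```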

Now, I understand I/O requires CPU, and I don’t have a problem with VSAN using 20% or even more CPU. What I have a problem with is VMware lying to customers that it only uses 10% AND spreading FUD that other vendors’ virtual appliances, such as the Nutanix CVM, are resource hogs.

Don’t take my word for it; do your own testing and read their documents like the one above, where simple maths shows the claim of 10% max is a myth.

So that’s roughly 4 vCPUs (on a typical dual-socket 8-core system) and up to 32GB RAM required for VSAN, but let’s assume just 16GB RAM on average, as not all systems are scaled to 5 disk groups.

The above testing was not on the latest VSAN 6.2, so things may have changed. One such change is the introduction of software checksums into VSAN. This actually reduces performance (as you would expect) because it provides a layer of data integrity with every I/O. As such, the above performance is still a fair comparison, because Nutanix has always had software checksums, as this is essential for any production-ready storage solution.

Now keep in mind, VSAN is really only providing the storage stack, so it’s using ~20% CPU under heavy load for just the storage stack, unlike the Nutanix CVM, which also provides a highly available management layer with comparable (and in many cases better) functionality/availability/scalability to vCenter, VUM, vROM, vSphere Replication, vSphere Data Protection, vSphere Web Client, Platform Services Controller (PSC) and the supporting database platform (e.g.: SQL/Oracle/Postgres).

So comparing VSAN CPU utilization to a Nutanix CVM is about as far from apples/apples as you could get, so let’s look at what all the vSphere management components’ resource requirements are and make a fairer comparison.

vCenter Server

Resource Requirements:

Small | Medium | Large

  • 4vCPUs | 8vCPUs | 16vCPUs
  • 16GB | 24GB | 32GB RAM

Reference: http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.install.doc/GUID-D2121DC5-1FC8-48DC-A4BA-C3FD72D0BE77.html

Platform Services Controller

Resource Requirements:

  • 2vCPUs
  • 2GB RAM

Reference: http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.install.doc/GUID-D2121DC5-1FC8-48DC-A4BA-C3FD72D0BE77.html

vCenter Heartbeat (Deprecated)

If we were to compare apples to apples, vCenter would need to be fully distributed and highly available, which it’s not. The now deprecated vCenter Heartbeat used to be able to somewhat provide this, so that’s 2x the resources of vCenter, VUM etc., but since it’s deprecated we’ll give VMware the benefit of the doubt and not count the resources required to make their management components highly available.

What about vCenter Linked Mode? 

I couldn’t find its resource requirements in the documentation, so let’s give VMware the benefit of the doubt and say it doesn’t add any overheads. But regardless of overheads, it’s another product to install/validate and maintain.

vSphere Web Client

The Web Client is required for full VSAN management/functionality and has its own resource requirements:

  • 4vCPUs
  • 2GB RAM (at least)

Reference: https://pubs.vmware.com/vsphere-50/index.jsp#com.vmware.vsphere.install.doc_50/GUID-67C4D2A0-10F7-4158-A249-D1B7D7B3BC99.html

vSphere Update Manager (VUM)

VUM can be installed on the vCenter server (if you are using the Windows installation) to save having another management VM and OS to manage; if you are using the Virtual Appliance, then a separate Windows instance is required.

Resource Requirements:

  • 2vCPUs
  • 2GB

The Nutanix CVM provides the ability to do Major and Minor patch updates for ESXi and of course for AHV.

vRealize Operations Manager (vROM)

Nutanix provides built-in analytics in PRISM Element similar to what vROM provides, along with centrally managed capacity planning/management and “what if” scenarios for adding nodes to the cluster. As such, including vROM in the comparison is essential if we want to get close to apples/apples.

Resource Requirements:

Small | Medium | Large

  • 4vCPUs | 8vCPUs | 16vCPUs
  • 16GB | 32GB | 48GB
  • 14GB Storage

Remote Collectors Standard | Large

  • 2vCPUs | 4vCPUs
  • 4GB | 16GB

Reference: https://pubs.vmware.com/vrealizeoperationsmanager-62/index.jsp#com.vmware.vcom.core.doc/GUID-071E3259-625A-437B-AB34-E6A58B87C65B.html

vSphere Data Protection

Nutanix also has built in backup/recovery/snapshotting capabilities which include application consistency via VSS. As with vROM we need to include vSphere Data Protection in any comparison to the Nutanix CVM.

vSphere Data Protection can be deployed in sizes ranging from 0.5TB to 8TB, as shown below:

VDPrequirements

Reference: http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vmware-data-protection-administration-guide-61.pdf

The minimum size is 4 vCPUs and 4GB RAM, but that only supports 0.5TB; for even an average-size node which supports say 4TB, 4 vCPUs and 8GB RAM are required.

So best case scenario we need to deploy one VDP appliance per 8TB, which is smaller than some Nutanix (or VSAN Ready) nodes (e.g.: NX6035 / NX8035 / NX8150) so that would potentially mean one VDP appliance per node when running VSAN since the backup capabilities are not built in like they are with Nutanix.

Now what about if I want to replicate my VMs or use Site Recovery Manager (SRM)?

vSphere Replication

As with vROM and vSphere Data Protection, vSphere Replication provides functionality for VSAN which Nutanix has built into the CVM. So we also need to include vSphere Replication resources in any comparison to the CVM.

While vSphere Replication has fairly light resource requirements, if all my replication needs to go via the appliance, one VSAN node will be a hotspot for storage and network traffic, potentially saturating the network/node and being a noisy neighbour to any virtual machines on that node.

Resource Requirements:

  • 2vCPUs
  • 4GB RAM
  • 14GB Storage

Limitations:

  • 1 vSphere replication appliance per vCenter
  • Limited to 2000 VMs

Reference: http://pubs.vmware.com/vsphere-replication-61/index.jsp?topic=%2Fcom.vmware.vsphere.replication-admin.doc%2FGUID-E114BAB8-F423-45D4-B029-91A5D551AC47.html

So scaling beyond 2000 VMs requires another vCenter, which means another VUM, another Heartbeat VM (if it was still available), potentially more databases on SQL or Oracle.

Nutanix doesn’t have this limitation, but again we’ll give VMware the benefit of the doubt for this comparison.

Supporting Databases

The size of even a small SQL server is typically at least 2vCPUs and 8GB+ RAM and if you want to compare apples/apples with Nutanix AHV/CVM you need to make the supporting database server/s highly available.

So even in a small environment we would be talking 2 VMs @ 2 vCPUs and 8GB+ RAM ea just to support the back end database requirements for vCenter, VUM, SRM etc.

As the environment grows so does the vCPU/vRAM and Storage (Capacity/IOPS) requirements, so keep this in mind.

So what are the approx. VSAN overheads for a small 4 node cluster?

The table below shows the minimum vCPU/vRAM requirements for the various components I have discussed previously to get VSAN comparable (not equivalent) functionality to what the Nutanix CVM provides.

VSANoverheads3

The above only covers the minimum requirements for a small, say 4-node, environment. Things like vSphere Data Protection will require multiple instances, SQL should be made highly available using an AlwaysOn Availability Group (AAG), which requires a second SQL server, and as the environment grows, so do the vCPU/vRAM requirements for vCenter, vRealize Operations Manager and SQL.

A Nutanix AHV environment on the other hand looks like this:

NutanixOverheads2

So that’s just 32 vCPUs and 64GB RAM for a 4-node cluster, which is 8 vCPUs and 54GB RAM LESS than the comparable vSphere/VSAN 4-node solution.
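For anyone who wants to check the arithmetic behind the two tables, the sketch below simply tallies the minimum figures quoted earlier in this post; the screenshots above remain the authoritative tables, and the per-component values here are the small/minimum sizes discussed, not the only supported configurations.

```python
# Reconstruction of the 4-node comparison from the minimums quoted in this post.
# Values are (vCPUs, GB RAM) per component; the screenshots are the source of truth.

vsphere_vsan = {
    "vCenter (small)":               (4, 16),
    "Platform Services Controller":  (2, 2),
    "vSphere Web Client":            (4, 2),
    "vSphere Update Manager":        (2, 2),
    "vROM (small)":                  (4, 16),
    "vSphere Data Protection (min)": (4, 4),
    "vSphere Replication":           (2, 4),
    "SQL server":                    (2, 8),
    "VSAN (4 nodes x ~4 vCPU/16GB)": (16, 64),
}

nutanix_ahv = {
    "CVM x 4 (8 vCPU / 16GB each)":  (32, 64),
}

def totals(components):
    return (sum(c for c, _ in components.values()),
            sum(r for _, r in components.values()))

vsan_cpu, vsan_ram = totals(vsphere_vsan)
ntnx_cpu, ntnx_ram = totals(nutanix_ahv)

print(f"vSphere/VSAN: {vsan_cpu} vCPUs, {vsan_ram}GB RAM")    # 40 vCPUs, 118GB
print(f"Nutanix AHV : {ntnx_cpu} vCPUs, {ntnx_ram}GB RAM")    # 32 vCPUs, 64GB
print(f"Difference  : {vsan_cpu - ntnx_cpu} vCPUs, {vsan_ram - ntnx_ram}GB RAM")  # 8 / 54
```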

If we add Nutanix Scale out File Server functionality into the mix (which is optionally enabled), this increases to 48 vCPUs and 100GB RAM. That’s just 8 vCPUs more and still 18GB RAM LESS than vSphere/VSAN, while Nutanix provides MORE functionality (e.g.: Scale out File Services) and comes out of the box with a fully distributed, highly available, self-healing, FULLY INTEGRATED management stack.

The Nutanix vCPU count assumes all vCPUs are in use, which is VERY rarely the case. So this comparison is well and truly in favour of VSAN, while still showing vSphere/VSAN having higher overheads for a typical/comparable solution, with Nutanix providing additional built-in features such as Scale out File Server (another distributed and highly available solution) for only a small amount more resources than vSphere/VSAN, which does not provide comparable native file serving functionality.

What if you don’t use all those vSphere/VSAN features and therefore don’t deploy all those management VMs? VSAN overheads are lower, right?

It is a fair argument to say not all vSphere/VSAN features need to be deployed, so this will reduce the vSphere/VSAN requirements (or overheads).

The same however is true for the Nutanix Controller VM.

It’s not uncommon, where customers don’t run all features and/or have lower I/O requirements, for the CVM to be downsized to 6 vCPUs. I personally did this earlier this week for a customer running SQL/Exchange, and the CVM is still only running at ~75%, or approx ~4 vCPUs, and that’s running vBCA with in-line compression.

So the overheads depend on the workloads, and the default sizes can be changed for both vSphere/VSAN components and the Nutanix CVM.

Now back to the whole In-Kernel nonsense.

VMware also like to spread FUD that their own hypervisor has such high overheads that it’s crazy to run any storage through it. I’ve always found this funny, since VMware have been telling the market for years that the hypervisor has a low overhead (which it does), but they change their tune like the weather to suit their latest slideware.

One such example of this FUD comes from VMware’s Chief Technologist, Duncan Epping who tweeted:

DuncanInKernelRubbish

The tweet is trying to imply that going through the hypervisor to another Virtual Machine (in this case a Nutanix CVM) is inefficient, which is interesting for a few reasons:

  1. If going from one VM to another via the kernel has such high overheads, why do VMware themselves recommend virtualizing business critical, high I/O applications which access data between VMs (and ESXi hosts) all the time? e.g.: when a Web Server VM accesses an Application Server VM, which accesses data from a Database VM. All of this traffic goes from one VM, through the kernel and into another VM.
  2. Because VSAN has to do exactly this to leverage many of the features it advertises, such as:
  • Replication (via vSphere Replication)
  • vRealize Operations Manager (vROM)
  • vSphere Data Protection (vDP)
  • vCenter and supporting components

Another example of FUD from VMware, in this case from Principal Engineer Jad El-Zein, implies VSAN has low(er) overheads compared to Nutanix (Blocks = Nutanix “Blocks”):

screenshot_2015-03-10-10-05-44-1.png

I guess he forgot about the large number of VMs (and resources) required to provide VSAN functionality and basic vSphere management. Any advantage of being in-kernel (assuming you still believe it is in fact an advantage) is well and truly eliminated by the constant traffic across the hypervisor to and from the management VMs, all of which are not in-kernel, as shown below.

vSphereManagement

I’d say it’s #AHVisTheOnlyWay and #GoNutanix, since the overheads of AHV are lower than vSphere/VSAN!

Summary:

  1. The Nutanix CVM provides a fully integrated, preconfigured and highly available, self healing management stack. vSphere/VSAN requires numerous appliances and/or software to be installed.
  2. The Nutanix AHV Management stack (provided by the CVM) using just 8vCPUs and typically 16GB RAM provides functionality which in many cases exceeds the capabilities of vSphere/VSAN which requires vastly more resources and VMs/Appliances to provide comparable (but in many cases not equivalent) functionality.
  3. The Nutanix CVM provides these capabilities built in (with the exception of PRISM Central, which is a separate Virtual Appliance) rather than being dependent on multiple virtual appliances, VMs and/or 3rd party database products for various functionality.
  4. The Nutanix management stack is also more resilient/highly available than competing products such as the VMware management components, and comes this way out of the box. As the cluster scales, the Acropolis management stack continues to automatically scale management capabilities to ensure linear scalability and consistent performance.
  5. Next time VMware/EMC try to spread FUD about the Nutanix Controller VM (CVM) being a resource hog or similar, ask them what resources are required for all functionality they are referring to. They probably haven’t even considered all the points we have discussed in this post so get them to review the above as a learning experience.
  6. Nutanix/AHV management is fully distributed and highly available. Ask VMware how to make all the vSphere/VSAN management components highly available and what the professional services costs will be to design/install/validate/maintain that solution.
  7. The next conversation to have would be “How much does VSAN cost compared to Nutanix?” Now that we understand all the resource overheads and the complexity in design/implementation/validation of the VSAN/vSphere environment (not to mention that most management components will not be highly available beyond vSphere HA), cost is a topic for another post, as the ELA / Licensing costs are the least of your worries.

To our friends at VMware/EMC, the Nutanix CVM says,

“Go ahead, underestimate me”.

 

Being called out on Exchange performance & scale. Close (well not really), but no cigar.

*UPDATE March 30th 6am EST*

Following an email from Howard Marks, the author of the Atlantis document, I would like to confirm a few points. Overall, the feedback and this response from Howard simply confirmed my original comments found in this post, but they also raised new concerns.

  1. It has been confirmed that the 60,000 Exchange user solution could not be supported on the hardware described in the Atlantis document.
  2. The performance (IOPS) claims are based on a non-DAG configuration. This is clearly not a real world solution, as nobody would deploy a 60,000 user Exchange solution without a DAG configuration.
  3. The testing was purely storage based and did not take into account the requirements (such as CPU/RAM/Capacity) for a real world 60,000 user Exchange deployment.
  4. The Nutanix ESRP is not based on hero numbers. It’s a solution sized and tested for the real world, thus both the latest claim from Howard’s response, being “Hero numbers vs Hero Numbers”, and the original claim of “2.5 times the mailboxes and 5 times the IOPS” are incorrect.
  5. The Nutanix ESRP mailbox sizes are 1GB and 1.5GB respectively. The Atlantis product tested “would be stressed to hold the data for 60,000 mailboxes at 800MB each. Perhaps we should have set the mailbox size to 200MB”. This confirms the comparison is not apples/apples (or even tangerines and clementines).
  6. The industry standard for Exchange virtualization is to deploy no more than one DAG instance per host (or HCI node). Howard responded by stating “If I were trying to deploy Exchange on a four-node HCI appliance, as opposed to Jetstress, I would use twelve virtual Exchange servers, so each host would run three under normal conditions and four in the event of a node failure.” This would mean that if a DAG configuration was used (which it should be for any real world deployment, especially at this scale), then a single node failure would bring down three Exchange instances and, as a result, the entire DAG. This is Exchange virtualization 101 and I would never recommend multiple Exchange servers (within a DAG) per host/node. I describe this in more detail in How to successfully Virtualize MS Exchange – Part 4 – DRS, where I explain “Setup a DRS “Virtual Machines to Hosts” rule with the policy “Should run on hosts in group” on a 1:1 basis with Exchange MSR or MBX VMs & ESXi hosts” and summarize by saying the DRS configuration described “Ensures two or more DAG members will not be impacted in the event of a single ESXi host failure.”
  7. Howard mentions “My best understanding of Nutanix’s and Atlantis’ pricing is that the all-flash Atlantis and the hybrid Nutanix will cost a customer about the same amount of cash. The Nutanix system may offer a few more TB of capacity but not a significant amount after Atlantis’ data reduction.” In the original document he states “Atlantis is assuming a very conservative data reduction factor of 1.67:1 when they pitch the CX-12 as having 12TB of effective capacity.”. This is 12TB across 4 nodes, which equates to 3TB per node. The Nutanix NX-8150 can support ~16.50TB usable per node with RF2 without data reduction which is above the CX-12 capacity including data reduction. The Nutanix usable capacity is obviously a lot more with In-Line compression and EC-X enabled which is what we recommend for Exchange. Assuming the same compression ratio, the NX-8150 usable capacity jumps to 27.5TB with further savings available when enabling EC-X. This usable capacity even eclipses the larger Atlantis CX-24 product. As such, the claim (above) by Howard is also incorrect.

My final thoughts on this topic would be that Atlantis should take down the document, as it is misleading to customers and only provides Jetstress performance details without the context of the CPU/RAM requirements. The testing performed was essentially just an IOPS drag race, which, as I quoted in “Peak Performance vs Real World Performance”…

“Don’t perform Absurd Testing”.  (Quote Vaughn Steward and Chad Sakac)

As a result, the document has nothing which could realistically be used to help design an Exchange environment for a customer on Atlantis.

As such, it would make sense to redo the testing to a standard where the solution could be deployed in the real world, then update the document and republish it.

Remember: Real world Exchange solutions are typically constrained by RAM, then CPU, then capacity. As a result, there is all but no need for IOPS (per node) beyond that which was achieved in the Nutanix ESRP report, being 671.631. This level of IOPS would support 5535 users per node using Howard’s calculation, being “(for 150 messages/day, that’s 0.101) multiply by 1.2 to provide 20% headroom”. However, as Exchange is constrained by CPU/RAM first, the solution would have to scale out and use many more servers, meaning the IOPS requirement per node (which is what matters, as opposed to the “total IOPS” Howard provides) would be in the mid/high hundreds range (~600), not thousands, as the RAM constraint would prevent more users running per server and therefore, as I said earlier, negate the need for IOPS at the level Howard is talking about.
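The per-node user figure above comes straight from Howard’s own IOPS-per-mailbox rule; here is a quick sketch of that arithmetic using only the numbers quoted in this update (the small difference from the 5535 figure is rounding):

```python
# Users per node supportable purely on the IOPS achieved in the Nutanix ESRP
# report, using Howard's sizing rule quoted above. All inputs come from the text.

iops_per_mailbox = 0.101        # 150 messages/day profile
headroom = 1.2                  # 20% headroom, per Howard's calculation
esrp_iops_per_node = 671.631    # achieved in the Nutanix ESRP report

required_per_user = iops_per_mailbox * headroom       # ~0.121 IOPS per mailbox
users_per_node = esrp_iops_per_node / required_per_user
print(f"~{users_per_node:.0f} users per node on IOPS alone")   # ~5,540

# In practice, RAM and CPU cap each Exchange server at far fewer users, which is
# why the per-node IOPS actually needed lands in the mid/high hundreds.
```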

—— START OF ORIGINAL POST ——-

So Nutanix has been called out by Atlantis on Twitter and via a recently released Atlantis document regarding our MS Exchange ESRP. Atlantis, a little-known company which is not listed in the Gartner Magic Quadrant for Integrated Systems or the IDC MarketScape for HCI, attempted to attack Nutanix regarding our Exchange ESRP (or should I say multiple ESRPs) in what I found to be a highly amusing way.

I was reading the document published by Atlantis, which is titled “Atlantis HyperScale Supports 60,000 Mailboxes With Microsoft Jetstress”, and I honestly thought it was an April Fools joke, but seeing as it’s only March 25 (at least here in Australia), I concluded they must be serious.

I’ve got to be honest and say I don’t see Atlantis in the field, but one can only speculate they called Nutanix out to try and get some attention. Well, they have succeeded, I’m just not sure this exposure will be beneficial to them.

Let’s look at the summary of their document:

AtlantisBS

  1. Deduping Exchange DAGs
    This is widely not recommended by Microsoft and Exchange MVPs.
  2. (Update) Claiming support for 60k mailboxes at 150 (originally stated as 200) messages/day/mailbox
    No issue here; a large-ish environment and a reasonably high messages-per-day profile.
  3. 2.5x the mailboxes of a leading HCI provider’s ESRP (i.e.: Nutanix)
    Had a good chuckle here, as ESRP is not a competition to see which vendor has the most hardware lying around the office to perform benchmarks. But I appreciate that Atlantis want to be associated with Nutanix, so go you ankle biter you!
  4. Five times the IOPS of Nutanix
    This was even more of a laugh, as Nutanix has never published “peak” IOPS for Exchange because IOPS is almost never a limiting factor in the real world, as I have explained in Peak performance vs Real World – Exchange on Nutanix Acropolis Hypervisor (AHV). But hey, the claim, even though it’s not true, makes for good marketing.

Problem #1 – The Atlantis document is not an ESRP, it’s a (paid for?) analyst’s report.

The article claims:

“2.5 times the mailboxes of a leading HCI provider’s ESRP report”

Off the bat, this is an Apples/Oranges comparison as Atlantis doesn’t have an ESRP validated solution. This can be validated here: Exchange Solution Reviewed Program (ESRP) – Storage

More importantly, the number of mailboxes is irrelevant as Nutanix scales linearly: our ESRP validates 30,000 users, so simply double the number of nodes, scale out the DAG and you can support 60,000 users with the same performance. In fact, as the 30,000 user solution already has N+1, you don’t even need to double the number of nodes to get to 60,000 users.

The Nutanix ESRP is also a configuration which can be (and has been) deployed in the real world. As I will explain further in this post, the Atlantis solution as described in the document could not be successfully deployed in the real world.

Problem #2 – Five times more IOPS… lol!

The article claims:

“Five times the IOPS of that HCI provider’s system”

Another apples/oranges comparison as Nutanix uses a hybrid deployment (SSD + SATA) whereas Atlantis used an All-Flash configuration. Atlantis also used deduplication and compression, which I have previously blogged about in Jetstress Testing with Intelligent Tiered Storage Platforms where I conclude by saying:

No matter what any vendor tells you, 8:1 dedupe for Exchange (excluding DAG copies) is not realistic for production data in my experience. As such, it should never be used for performance testing with Jetstress.

Solution: Disable dedupe when using Jetstress (and in my opinion for production DAGs)

So any result (including this one from Atlantis) achieved with dedupe or compression enabled is not realistic, regardless of whether it delivers more or less IOPS than Nutanix.

But regarding Nutanix All-Flash performance, I have previously posted YouTube videos showing Nutanix performance which can be found here: Microsoft Exchange 2013/2016 Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV)

In Part 1 of the series “The Baseline Test” I explained:

Note: This demonstration is not showing the peak performance which can be achieved by Jetstress on Nutanix. In fact it’s running on a ~3 year old NX-3450 with Ivy Bridge processors and Jetstress is tuned (as the video shows) to a low thread count.

So let’s look at Nutanix performance on some 3-year-old hardware as a comparison.

At the 4:47 mark of the video in Part 1, we can see the Jetstress latency for read and write I/O was between 1.3 and 1.4ms for all 4 databases. This is considerably lower, and critically much more consistent, than the latency Atlantis achieved (below), which varies significantly across databases.

AtlantisLatency

As you can see above, DB read latency varies between 3ms and 16ms, DB writes between 14ms and 105ms, and log writes between 2ms and 5ms. In 2013 when I joined Nutanix, we had similar (poor) performance for Exchange to what the Atlantis Jetstress results show, and this was something I was actively involved in addressing; by early 2014 we had a fantastic platform for Exchange deployments, as we did a lot of work in the back end to deal with very large working sets which exceeded the SSD tier.

In contrast, Nutanix NX-8150, which is a hybrid (SSD + SATA) node, achieved very consistent performance as shown below from our official ESRP which has been validated by Microsoft.

NutanixConsistentPerfJetstress

Obviously, as Nutanix is hybrid, the average latency for reads (the bulk of which are served from SATA) is higher than an all-flash configuration, but interestingly, Nutanix latencies are much more consistent and the difference between peak and minimum latency is low, compared to Atlantis, which varies in some cases by 90ms!

In the real world, I recently visited a large customer who runs almost the exact same configuration as our ESRP and they average <5ms latency across their 3 Nutanix clusters running MS Exchange for 24k users (Multi-site DAG w/ 3 Copies).

At the 4:59 mark of the Baseline Test video we can see the Nutanix test achieved 2342 IOPS using just 8 threads; compare this to Atlantis, which achieved 2937.191 IOPS but required 3x the threads (24). For those of you not familiar with Jetstress, more threads = more potential IOPS. I have personally tested Nutanix with 64 threads and performance continues to increase.
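Since thread count materially affects Jetstress results, one rough way to put the two results on a comparable footing is IOPS per thread. This is not an official Jetstress metric and Jetstress does not scale perfectly linearly with threads, so treat the sketch below as illustrative only:

```python
# Rough IOPS-per-thread normalisation of the two Jetstress results quoted above.
# Not an official Jetstress metric; purely illustrative.

nutanix_iops, nutanix_threads = 2342, 8          # NX-3450 baseline test video
atlantis_iops, atlantis_threads = 2937.191, 24   # Atlantis document, page 10

print(f"Nutanix : {nutanix_iops / nutanix_threads:.0f} IOPS per thread")    # ~293
print(f"Atlantis: {atlantis_iops / atlantis_threads:.0f} IOPS per thread")  # ~122
```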

I have previously blogged about how Nutanix outperformed VSAN for Jetstress when using more threads, with the Jetstress summary in that post showing 4381 IOPS. So Nutanix comfortably outperformed Atlantis even back in June 2015 when I wrote that post, and with the release of AOS 4.6, performance has improved significantly.

So Nutanix Exchange performance isn’t limited by IOPS as I explained in Peak performance vs Real World – Exchange on Nutanix Acropolis Hypervisor (AHV).

Exchange scale/performance is always constrained by RAM first (96GB), then CPU (20 vCPUs), then (typically in HCI environments) storage capacity, with IOPS potentially a distant fourth.

Problem #3 – The solution could not be deployed in the real world

Tony Redmond, a well known Microsoft Exchange MVP, has previously blogged about unrealistic ESRP submissions, such as: Deficient IBM and Hitachi 120,000 mailbox configurations

Tony rightly points out:

“The core problem in IBM’s configuration is that the servers are deficient in memory”

The same is true for the Atlantis document. With just 3 servers, that’s 20,000 users per server. In the configuration they published, that would mean the mailbox servers vastly exceed the recommended sizing for an Exchange instance of 20 vCPUs and 96GB RAM.

The servers Atlantis are using have Intel E5-2680 v3 processors, which have a Spec base rate of between ~57 and 61 depending on the exact server vendor, so let’s use the higher number to give Atlantis the benefit of the doubt. The E5-2680 v3 has 12 cores, so in a 2-socket host that’s 24 x 61 = a maximum SpecIntRate of 1464.
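Both of those numbers are simple arithmetic; the sketch below reproduces them from the figures in the document and this post, treating the ~57–61 Spec base rate as a per-core figure (which is how the multiplication above uses it):

```python
# The user load per Exchange server and the SpecIntRate ceiling assumed above.
# Figures come from the Atlantis document and the text of this post.

users = 60_000
servers = 3                       # as per the Atlantis document
print(f"{users // servers:,} users per Exchange server")        # 20,000

cores_per_host = 2 * 12           # dual-socket E5-2680 v3, 12 cores per socket
spec_rate_per_core = 61           # top of the ~57-61 range, benefit of the doubt
print(f"Maximum SpecIntRate per host: {cores_per_host * spec_rate_per_core}")  # 1464
```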

Again to give Atlantis the benefit of the doubt, let’s assume their VSA uses zero resources and see if we have enough compute available, in the real world, for the 20,000 users per server they tested.

The first issue is the Exchange 2013 Server Role Requirements Calculator reports the following error:

AtlantisError1

I solved this by adding a fourth server to the sizing since I’m a nice guy.

The second issue is the Exchange 2013 Server Role Requirements Calculator reports the following error (even with the fourth server being added).

AtlantisIssue2

Ok, not the end of the world I guess, Oh wait… then comes the third issue:

AtlantisIssue3

Each of the four mailbox servers vastly exceeds the recommended maximums for Exchange 2013 servers (physical or virtual), being 20 vCPUs and 96GB RAM.

Reference: Exchange 2013 Sizing and Configuration Recommendations

The below is a screenshot from the above link highlighting the recommendation from MS as at the time/date this article was published (Friday March 25th 2016 AEST).

20vcpus96GBRAM

In fact, they exceed the resources of the tested equipment as well. The testing was performed on Atlantis HyperScale CX-12 systems, which have only 256GB RAM each (updated: 384GB each).

The below is from page 6 of the Atlantis document:

We performed our testing remotely on a HyperScale CX-12 system, with 256GB of RAM per node, in Atlantis’ quality-control lab.

Update: The solution has 384GB RAM, which is still insufficient for the number of users and messages per day in the real world.

In this case Atlantis, even with the benefit of the doubt of a fourth server and the maximum SpecIntRate, would be running at 332% CPU utilisation even when using over 3 times the recommended maximum number of pCores, and would require almost 1TB of RAM per Exchange server. That’s almost 10x the recommended RAM per instance!

So as if the first three weren’t enough, the fourth issue is kind of funny.

The required capacity for the solution (60k users @ 800MB Mailboxes w/ 2 DAG copies) is shown below. If you look at the Database Volume Space Required line item, it shows ~46TB required capacity per server not including restore volume.

AtlantisIssue4

Let’s keep giving Atlantis the benefit of the doubt and accept their claim (below) relating to data reduction.

Atlantis is assuming a very conservative data reduction factor of 1.67:1

So that’s 46TB / 1.67 = ~27.54TB of required usable capacity per Exchange server, which equates to ~110TB for the 4 Exchange servers.
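The capacity sums, with Atlantis’ own 1.67:1 data reduction assumption applied (a sketch of the arithmetic quoted above, nothing more):

```python
# Usable capacity required per Exchange server and for the 4-server solution,
# applying Atlantis' claimed 1.67:1 data reduction. Figures come from the text.

raw_tb_per_server = 46        # "Database Volume Space Required" per the calculator
data_reduction = 1.67         # Atlantis' "very conservative" assumption
servers = 4                   # after adding the fourth server

usable_per_server = raw_tb_per_server / data_reduction
total_usable = usable_per_server * servers

print(f"~{usable_per_server:.2f} TB usable required per Exchange server")  # ~27.54TB
print(f"~{total_usable:.0f} TB usable required across {servers} servers")  # ~110TB
```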

The HyperScale CX-24 system is guaranteed to provide up to 24TB according to the document on Page 6, screenshot below.

Atlantis24tB

With a requirement of 27.54TB per node (and that’s with 4 servers, not 3 as per the document), the solution tested has insufficient capacity, even when giving Atlantis the benefit of the doubt regarding its data reduction capabilities.

Prior to publishing this blog, I re-read the document several times and it turns out I made a mistake. On further analysis I discovered that the HyperScale CX-24 system provides just 24TB across ALL FOUR NODES (as per the document on Page 8), not 24TB per node.

AtlantisNodeSpecs

So in reality, my above comments were actually in favour of Atlantis, as the actual nodes have just 6TB usable each (24/4), which is just under 5x LESS storage capacity than is required, assuming 100% capacity utilisation.

The fifth issue however takes the cake!

The solution requires 4824 IOPS for the Databases and 1029 IOPS for Logs as shown by the sizing calculator.

AtlantisIssue5

Now the Atlantis document shows they achieved 2937.191 IOPS on Page 10, so they are not achieving the required IOPS for the 60000 users even with their all Flash configuration and 24 threads!

I would have thought a storage company would at least get the storage (capacity and IOPS) sizing correct, but neither capacity nor IOPS has been sized correctly.

Too harsh?

Ok, maybe I’m going a little honey badger on Atlantis, so let’s assume they meant 150 messages/day per mailbox, as the document says both 200 messages (Page 4) and the following on Page 11.

AtlantisIssue6

If I change the messages per day to 150 then the IOPS requirement drops to 3618 for DBs and 773 for Logs as shown below.

AtlantisIssue9

So… they still failed sizing 101, as they only achieved 2937 IOPS as per Page 10 of their document.

What about CPU/RAM? That’s probably fine now right? Wrong. Even with the lower messages per day, each of the 4 Exchange instances are still way over utilized on CPU and 8x oversized on RAM.

AtlantisIssue8

Let’s drop to 50 messages per day per user. Surely that would work right? Nope, we’re still 3x higher on RAM and over the recommended 80% CPU utilisation maximum for Exchange.

Atlantis50MsgsDayCompute

What about IOPS? We’ve dropped the user profile by 4x. Surely Atlantis can support the IOPS now?

Woohoo! By dropping the messages per day by 4x Atlantis can now support the required IOPS. (You’re welcome Atlantis!)

AtlantisIOPS

Too bad I had to include 33% more servers to even get them to this point where the Mailbox servers are still oversized.

Problem #4 – Is it a supported configuration?

No details are provided about the hypervisor storage protocol used for the tests, but Atlantis is known to use NFS for vSphere, so if VMDK on NFS was used, this configuration is not supported by MS (as much as I think VMDK on NFS should be). Nutanix has two ESRP validated solutions which are fully supported configurations using iSCSI.

UPDATE: I have been informed the testing was done using iSCSI Datastores (not In-Guest iSCSI)

However, this only confirms this is not a supported configuration (at least not by VMware), as Atlantis is not certified for iSCSI according to the VMware HCL as of 5 minutes ago when I checked (see below, which shows only NFS).

AtlantisiSCSI

Problem #5 – Atlantis claims this is a real world configuration

AtlantisRealWorld

The only positive that comes out of this is that Atlantis follow Nutanix recommendations and have an N+1 node to support failover of the Exchange VM in the event of a node failure.

As I have stated above, unfortunately Atlantis has insufficient CPU, RAM, storage capacity and storage performance for the 60,000 user environment described in their document.

As the testing was actually on HyperScale CX-12 nodes, the usable capacity is 12TB for the four node solution. The issue is we need ~110TB in total for the 4 Exchange servers (~27TB per instance), so with only 4 nodes Atlantis has insufficient capacity for the solution (the actual guaranteed usable capacity is 12TB, or almost 10x less than what is required) OR they are assuming >10x data reduction.

If this is what Atlantis recommends in the real world, then I have serious concerns for any of their customers who try to deploy Exchange as they are in for a horrible experience.

Nutanix ESRP assumes ZERO data reduction, as we want to show customers what they can expect in the worst case scenario AND not cheat ESRP by deduping and compressing data which results in unrealistic data reduction and increased performance.

Nutanix compression and Erasure Coding provide excellent data reduction. In a 24k user customer environment I reviewed recently, they had >2:1 data reduction using just in-line compression. As they are not capacity constrained, EC-X is currently not in use, but it would also provide further savings and is planned to be enabled as they continue putting more workloads into the Nutanix environment.

However, Nutanix sizes assuming no data reduction, and the savings from data reduction are considered a bonus. In cases where customers have a limited budget I give them estimates on data reduction, but typically we just start small, size for smaller mailbox capacities and allow the customer to scale capacity as required with additional storage-only nodes OR additional compute+storage nodes where additional messages/day or users are required.

Rule of Thumb: Use data reduction savings as a bonus and not for sizing purposes. (Under promise, over deliver!)

Problem #6 – All Flash (or should I say, All Tier 0/1) for Exchange?

To be honest, I think a hybrid (or tiered) storage solution (HCI or otherwise) is the way to go, as it’s only a small percentage of workloads which require the fastest performance. Lots of applications, like Exchange, require high capacity and low IOPS, so lower cost, high capacity storage is better suited to this kind of workload. Using a small persistent write/read buffer for hot data gives the Tier 0/1 type performance without the cost, all while having much larger capacity for larger mailboxes/archiving and things like LAGGED copies or snapshot retention on primary storage for fast recovery (but not backup, of course, since snapshots on primary storage are not backups).

As SSD prices come down and technology evolves, I’m sure we’ll see more all SSD solutions, but with faster technology such as NVMe as the persistent read/write buffers and commodity SSD for capacity tier. But having the bulk of the data from workloads like Exchange on Tier 0/1 doesn’t make sense to me. The argument that data reduction makes the $/GB comparable to lower tier storage will fluctuate over time while the fact remains, storage IOPS are of the least concern when sizing for Exchange.

Problem #7 – $1.50 per mailbox… think again.

The claim of $1.50 per mailbox is just simply wrong. With correct sizing, the Atlantis solution would require significantly more nodes and the price would be way higher. I’d do the maths exactly, but it’s so far-fetched it’s not worth the time.

Summary:

I would be surprised if anyone gets to this summary, as even after Problem #1 or #2 Atlantis has been knocked out faster than Jose Aldo at UFC 194 (13 seconds). But nonetheless, here are a few key points.

  1. The Atlantis solution has insufficient CPU for the 60,000 users, even with 4x fewer messages per day than the document claims
  2. The Atlantis solution has insufficient RAM for the solution, again even with 4x fewer messages per day than the document claims
  3. Atlantis can only achieve the required IOPS by reducing the messages per day by 4x, down to 50 messages per day (funnily enough, that’s lower than the Nutanix ESRP)
  4. The nodes tested do not have sufficient storage capacity for the proposed 60,000 users w/ 800MB mailboxes and 2 DAG copies even with the assumption of 1.67:1 data reduction AND a fourth node.
  5. Atlantis does not have a supported configuration on VMware vSphere (Nutanix does using “Volume Groups” over iSCSI)
  6. Atlantis does not have an ESRP validated solution. I believe this is at least in part due to their only supporting NFS and their configuration not being valid for ESRP submission due to having all DAG copies on the same underlying storage failure domain. Note: Nutanix uses iSCSI, which is fully supported, and our ESRP uses two failure domains (separate Nutanix ADSF clusters for each DAG copy).
  7. As for the $/Mailbox claim on twitter of $1.50 per mailbox, think again Atlantis.