MS Exchange on Nutanix now a Microsoft-validated ESRP solution

I am pleased to announce that Nutanix has successfully completed the Microsoft Exchange Solution Reviewed Program requirements and is now listed as a validated solution on the following page:

Exchange Solution Reviewed Program (ESRP) – Storage

The submission shows a dual-site solution for 24,000 mailboxes of 1GB each, running on just 8 NX-8150 nodes. It is highly resilient, with N+1 availability at each site allowing full self-healing and failover in the event of a node failure.

Nutanix is also the FIRST and only hyper-converged platform to be validated under ESRP, further strengthening our leadership in the market.

Performance testing (using Jetstress) was performed with the nodes at around 90% capacity, with 8.5TB per node, demonstrating that Nutanix delivers great performance even at high utilization, where the working set far exceeds the SSD tier. This is key to a truly enterprise solution for a business-critical application such as Exchange.
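As a rough sanity check, the figures quoted above can be worked through with simple arithmetic. This sketch uses only the published numbers and deliberately ignores mailbox database copies, the replication factor and Exchange overheads, so the gap between the mailbox quota and the data stored should not be read as a breakdown of either. The implied capacity is only approximate because utilization was "around" 90%.

```python
# Figures quoted in the ESRP summary above.
MAILBOXES = 24_000
MAILBOX_SIZE_GB = 1
NODES = 8
DATA_PER_NODE_TB = 8.5
UTILIZATION = 0.90  # "around 90% capacity"

def esrp_arithmetic():
    """Simple arithmetic on the published figures; ignores copies, RF and overheads."""
    return {
        "mailbox_quota_tb": MAILBOXES * MAILBOX_SIZE_GB / 1000,  # decimal TB
        "data_stored_tb": NODES * DATA_PER_NODE_TB,
        "implied_capacity_per_node_tb": round(DATA_PER_NODE_TB / UTILIZATION, 2),
    }

print(esrp_arithmetic())
# {'mailbox_quota_tb': 24.0, 'data_stored_tb': 68.0, 'implied_capacity_per_node_tb': 9.44}
```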

The solution runs on Hyper-V with SMB 3.0 on the underlying Nutanix Distributed Storage Fabric. The same solution can also be deployed on vSphere or the Acropolis Hypervisor (AHV) using iSCSI, in a fully supported configuration.

The above solution was validated without compression or erasure coding, both of which improve performance and give significant capacity savings, allowing for larger mailboxes. As a result, the Nutanix platform provides even more value than the ESRP submission shows.

If there was any doubt about whether you should virtualize MS Exchange on the Nutanix platform, the fact that Nutanix is now validated by Microsoft should put your mind at ease.

Now you can move one step closer to a fully web-scale datacenter by removing another application-specific silo and enjoying improved resiliency and performance while reducing operational cost and complexity.

MS Exchange on Nutanix Acropolis Hypervisor (AHV)

While virtualization of MS Exchange is now common across multiple hypervisors, it continues to be a hotly debated topic. The most common objection is cost (CAPEX), the next is complexity (which translates to CAPEX & OPEX), and the third is that virtualization adds minimal value because MS Exchange provides application-level high availability. The other objection I hear is that virtualization isn’t supported, which always makes me laugh.

In my experience, these objections are typically raised in the context of a dedicated MS Exchange environment, and in that specific context some of the points have some truth. But the question becomes: how many customers run only MS Exchange? In my experience, none.

The customers I see typically run tens, hundreds or even thousands of workloads in their datacenters, so when we think outside the box, architecting a silo for each application is what actually leads to cost and complexity.

Since most customers have virtualization and want to remove silos in favour of a standardized platform, MS Exchange is just another business-critical application that needs to be considered.

Let’s discuss each of the common objections and how I believe Acropolis + Nutanix XCP addresses these challenges:

Microsoft Support for Virtualization

For some reason, there is a huge amount of FUD regarding Microsoft support for virtualization (other than Hyper-V), but Nutanix + Acropolis is certified under the Microsoft Server Virtualization Validation Program (SVVP) and runs on block storage via the iSCSI protocol, so Nutanix + Acropolis is 100% supported for MS Exchange as well as other workloads like SharePoint & SQL.
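The support argument boils down to two conditions: a validated hypervisor, and Exchange data on block-level storage (Microsoft's Exchange virtualization guidance requires block-level storage, with SMB 3.0 acceptable only for the Hyper-V scenario described earlier). The sketch below is a simplified, hypothetical decision function capturing just those two inputs; it is not Microsoft's official support matrix, and the field names and protocol set are illustrative.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the two support conditions discussed above.
BLOCK_PROTOCOLS = {"iSCSI", "FC"}  # block-level protocols (illustrative set)

@dataclass
class Deployment:
    hypervisor: str             # e.g. "AHV", "vSphere", "Hyper-V"
    hypervisor_validated: bool  # SVVP-validated (or Microsoft's own Hyper-V)
    storage_protocol: str       # e.g. "iSCSI", "FC", "SMB3", "NFS"

def exchange_storage_supported(d: Deployment) -> bool:
    """Return True if the deployment meets this simplified support model."""
    if not d.hypervisor_validated:
        return False
    if d.storage_protocol in BLOCK_PROTOCOLS:
        return True
    # SMB 3.0 is treated as acceptable only when the hypervisor is Hyper-V.
    return d.storage_protocol == "SMB3" and d.hypervisor == "Hyper-V"

print(exchange_storage_supported(Deployment("AHV", True, "iSCSI")))    # True
print(exchange_storage_supported(Deployment("vSphere", True, "NFS")))  # False
```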

Cost (CAPEX)

Unlike other hypervisors and management solutions, Acropolis and the Acropolis Hypervisor (AHV) come free with every Nutanix node, which eliminates the licensing cost for the virtualization layer.

Acropolis management components also do not require the purchase or installation of Tier 1 database platforms; all required management components are built into the distributed platform and scale automatically as clusters are expanded. This means not even Windows operating system licenses are required for management.

As a result, Nutanix + Acropolis gives Exchange deployments all of the following virtualization features, and their benefits, at no cost:

  • High Availability & Live Migration
  • Hardware abstraction
  • Performance monitoring
  • Centralized management

Complexity (CAPEX & OPEX)

Nutanix XCP + Acropolis can go from out of the box to operational, in a fully optimal configuration, in less than 60 minutes. This includes all required management components, which are automatically deployed as part of the Nutanix Controller VM (CVM). For single-cluster environments, no design or installation is required for any management components, and for multi-cluster environments only a single virtual appliance (PRISM Central) is required for single-pane-of-glass management across all clusters.

Acropolis gives Exchange deployments all the advantages of Virtualization without:

  • Complexity of deploying/maintaining database server/s to support management components
  • Deployment of dedicated management clusters to house management workloads
  • The need for onsite Subject Matter Experts (SMEs) in virtualization platform/s

Virtualization adds minimal value

While applications such as Exchange have application-level high availability, virtualization can further improve resiliency and flexibility for the application while making better use of infrastructure investments.

The Nutanix XCP, including Acropolis + Acropolis Hypervisor (AHV), ensures infrastructure is completely abstracted from the operating system and application, allowing it to deliver a more highly available and resilient platform.

Microsoft’s advice is to limit the maximum compute resources per Exchange server to 24 CPU cores and 96GB RAM. However, as CPU core counts continue to increase, a physical server dedicated to Exchange will increasingly have more resources than this guidance allows Exchange to use, so an application-specific silo may either leave capacity unused on each server or require more servers to be purchased and maintained. This leads to increased datacenter and licensing costs, not to mention the operational overhead of managing more infrastructure. As a result, being able to run Exchange alongside other workloads in a mixed environment (where contention can easily be avoided) reduces the total cost of infrastructure while providing higher levels of availability to all workloads.
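To make the stranded-resource point concrete, the sketch below compares the per-server guidance against a physical host. The host specification (36 cores, 512GB RAM) and the silo size of 8 servers are hypothetical values chosen for illustration, not figures from this post.

```python
# Hypothetical host specification (illustrative only, not from the post).
HOST_CORES = 36       # e.g. 2 sockets x 18 cores
HOST_RAM_GB = 512

# Per-server guidance quoted above.
EXCHANGE_MAX_CORES = 24
EXCHANGE_MAX_RAM_GB = 96

def stranded_per_server(host_cores, host_ram_gb,
                        max_cores=EXCHANGE_MAX_CORES, max_ram_gb=EXCHANGE_MAX_RAM_GB):
    """Cores and RAM a dedicated physical Exchange server cannot put to use."""
    return max(host_cores - max_cores, 0), max(host_ram_gb - max_ram_gb, 0)

def stranded_in_silo(servers, host_cores=HOST_CORES, host_ram_gb=HOST_RAM_GB):
    """Total unused cores and RAM across a silo of identical Exchange servers."""
    cores, ram = stranded_per_server(host_cores, host_ram_gb)
    return servers * cores, servers * ram

print(stranded_per_server(HOST_CORES, HOST_RAM_GB))  # (12, 416)
print(stranded_in_silo(8))                            # (96, 3328)
```

In a mixed virtualized cluster, that leftover capacity on each host remains available to other workloads instead of sitting idle in an Exchange-only silo.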

Virtualization allows Exchange servers to be sized for the current workload and resized quickly and easily if and when required, which helps avoid oversizing.

Some of the benefits include:

  • Minimizing infrastructure in the datacenter
  • Increasing utilization and therefore value for money of infrastructure
  • Removal of application specific silos
  • Ability to upgrade, replace or perform maintenance on hardware with zero impact to application/s
  • Faster deployment of new Exchange servers
  • Increasing availability and providing higher fault tolerance
  • Self-healing capabilities at the infrastructure layer to complement application-level high availability
  • Ability to increase compute/storage resources beyond those of the current underlying physical server (Nutanix node), e.g. adding storage capacity/performance

The Nutanix XCP Advantages (for Exchange)

  • More usable capacity

With features such as in-line compression giving between 1.3:1 and 1.7:1 capacity savings and erasure coding providing up to a further 60% usable capacity, Nutanix XCP can provide more usable capacity than raw, while still providing protection from SSD/HDD and entire server failures.

In-line compression also improves the performance of the SATA drives, so it’s a win/win. Erasure coding (EC-X) stores data more efficiently, which allows more data to be served from the SSD tier: also a win/win.
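The "more usable capacity than raw" claim follows from multiplying the savings together. The sketch below assumes replication factor 2 (RF2) as the baseline and applies the EC-X gain to the RF2 usable capacity; both are assumptions, since the post does not state the baseline, and the ratios used are the best-case figures quoted above.

```python
def effective_capacity_tb(raw_tb, rf=2, ecx_gain=0.6, compression_ratio=1.3):
    """Rough estimate of logical data that fits on raw_tb of raw capacity.

    Assumptions (not stated in the post): RF2 is the baseline, and the EC-X
    gain is applied on top of the RF-protected usable capacity.
    """
    usable = raw_tb / rf               # capacity left after replication-factor copies
    usable *= 1 + ecx_gain             # up to 60% more usable capacity from EC-X
    return usable * compression_ratio  # in-line compression stores more logical data

for ratio in (1.3, 1.7):
    print(ratio, round(effective_capacity_tb(100, compression_ratio=ratio), 1))
```

With 100TB raw, this model gives roughly 104TB at 1.3:1 and 136TB at 1.7:1. With EC-X disabled (`ecx_gain=0`) the same 100TB yields about 65TB at 1.3:1, so in this model exceeding raw capacity depends on using both features together.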

  • More Messages/Day and/or Users per physical CPU core

With all write I/O serviced by SSD, CPU wait time is significantly reduced, which frees up the physical CPU to perform other activities rather than waiting for a slow SATA drive to respond. As MS Exchange is CPU intensive (especially from 2013 onwards), this means more messages per day and/or users can be supported per MSR VM compared to physical servers.

  • Better user experience

As Nutanix XCP is a hybrid platform (SSD+SATA), newer/hotter data is serviced by the SSD tier, which means faster response times for users AND less CPU wait. This further increases CPU efficiency, again leading to more messages/day and/or users per CPU core.
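The benefit of serving hot data from SSD can be expressed as a weighted average of the two tiers' response times: the higher the fraction of I/O served from SSD, the closer the average gets to SSD latency. The latency values below are hypothetical placeholders for illustration, not measurements from any Nutanix system.

```python
def effective_latency_ms(ssd_hit_ratio, ssd_latency_ms, hdd_latency_ms):
    """Weighted-average response time for a two-tier (SSD + SATA) hybrid."""
    if not 0.0 <= ssd_hit_ratio <= 1.0:
        raise ValueError("ssd_hit_ratio must be between 0 and 1")
    return ssd_hit_ratio * ssd_latency_ms + (1 - ssd_hit_ratio) * hdd_latency_ms

# Hypothetical latencies, chosen for illustration only.
SSD_MS, SATA_MS = 0.5, 10.0
for hit in (0.5, 0.9, 0.99):
    print(hit, round(effective_latency_ms(hit, SSD_MS, SATA_MS), 2))
```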

Summary:

With cost (CAPEX), complexity (CAPEX & OPEX) and supportability issues well and truly addressed, and numerous clear value-adds, running a business-critical application like MS Exchange on Nutanix + Acropolis Hypervisor (AHV) will make a lot of sense for many customers.

Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

Many customers and partners have expressed interest in Acropolis since it was officially launched at .NEXT in June this year, and since then many questions have been asked around resiliency, availability, etc.

In this post I will cover how I/O failover occurs and how AHV load balances in the event of I/O failover to ensure optimal performance.

Let’s start with an Acropolis node under normal circumstances. The iSCSI initiator for QEMU connects to the iSCSI redirector, which directs all I/O to the local stargate instance running within the Nutanix Controller VM (CVM), as shown below.

[Figure: QEMU iSCSI initiator connected via the iSCSI redirector to the local stargate]

I/O will always be serviced by the local stargate unless a CVM upgrade, shutdown or failure occurs. If one of these occurs, QEMU will lose its connection to the local stargate, as shown below.

[Figure: QEMU loses its connection to the local stargate]

When this loss of connectivity to stargate occurs, QEMU reconnects to the iSCSI redirector and establishes a connection to a remote stargate, as shown below.

[Figure: QEMU connected to a remote stargate]

The process of re-establishing an iSCSI connection is near-instant, and you will likely not even notice it has occurred.
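The redirector's choice of target can be modelled as a small function: prefer the local stargate while it is healthy, otherwise pick any healthy remote one. This is a hypothetical sketch of the behaviour described above, not the actual Acropolis code; the function name and the use of plain string IDs for stargates are assumptions. Calling it again after the chosen remote disappears simply picks another remote, mirroring the redirector behaviour the post describes for that unlikely case.

```python
import random

def select_stargate(local, healthy, rng=random):
    """Hypothetical model of the iSCSI redirector's choice of stargate.

    local:   ID of the stargate in this node's CVM
    healthy: set of stargate IDs currently reachable
    """
    if local in healthy:
        return local                      # normal case: keep I/O local
    remotes = sorted(healthy - {local})   # sorted so seeded choices are reproducible
    if not remotes:
        raise RuntimeError("no stargate available")
    return rng.choice(remotes)            # failover: any healthy remote stargate

cluster = {"cvm1", "cvm2", "cvm3", "cvm4"}
print(select_stargate("cvm1", cluster))            # cvm1
print(select_stargate("cvm1", cluster - {"cvm1"})) # one of cvm2, cvm3, cvm4
```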

Once the local stargate is back online (and has been stable for 300 seconds), I/O will be redirected back locally to ensure optimal performance.

[Figure: I/O redirected back to the local stargate]
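The failback rule amounts to a stability timer. The function below is a hypothetical sketch using timestamps in seconds; only the 300-second window comes from the post, and how Acropolis actually tracks stability internally is not described here.

```python
STABILITY_WINDOW_S = 300  # local stargate must be stable this long before failback

def should_fail_back(local_up_since, now, window_s=STABILITY_WINDOW_S):
    """Hypothetical failback check.

    local_up_since: time (seconds) the local stargate came back online,
                    or None while it is still down.
    now:            current time in the same units.
    """
    if local_up_since is None:
        return False
    return now - local_up_since >= window_s

print(should_fail_back(None, 1_000))   # False: local stargate still down
print(should_fail_back(1_000, 1_200))  # False: only 200 s stable
print(should_fail_back(1_000, 1_300))  # True: 300 s stable
```

If the local stargate drops again, the caller would reset `local_up_since`, so the timer starts over. Presumably the window exists to avoid flapping I/O back to a CVM that is still restarting or unstable.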

In the unlikely event that the remote stargate goes down before the local stargate is back online, the iSCSI redirector will redirect traffic to another remote stargate.

Next, let’s talk about load balancing.

Unlike traditional 3-tier infrastructure (i.e. SAN/NAS), Nutanix solutions do not require multi-pathing, as all I/O is serviced by the local controller. As a result, there is no multi-pathing policy to choose, which removes another layer of complexity and a potential point of failure.

However, if the local CVM is unavailable for any reason, we need to service I/O for all the VMs on the node in the most efficient manner. Acropolis does this by redirecting I/O on a per-vDisk level to a random remote stargate instance, as shown below.

[Figure: per-vDisk I/O redirection to random remote stargates]

Acropolis can do this because every vDisk is presented via iSCSI as its own target/LUN, which means it has its own TCP connection. As a result, a business-critical application such as MS SQL, Exchange or Oracle with multiple vDisks will be serviced by multiple controllers concurrently.

This means all VM I/O is load balanced across the entire Acropolis cluster, which ensures no single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or maintenance scenario.
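Because each vDisk has its own iSCSI target and TCP connection, redirection can be decided independently per vDisk. The sketch below models that with a seeded random choice per vDisk; the function, vDisk names and CVM names are hypothetical and stand in for whatever Acropolis does internally.

```python
import random
from collections import Counter

def redirect_vdisks(vdisks, remote_stargates, seed=None):
    """Hypothetical per-vDisk redirection: each vDisk has its own iSCSI
    target and TCP connection, so each can be sent to a different
    randomly chosen remote stargate."""
    if not remote_stargates:
        raise ValueError("at least one remote stargate is required")
    rng = random.Random(seed)
    return {vdisk: rng.choice(remote_stargates) for vdisk in vdisks}

# Illustrative VM with several vDisks on a node whose local CVM is unavailable.
vdisks = [f"exchange-vm1-disk{i}" for i in range(8)]
remotes = ["cvm2", "cvm3", "cvm4"]
assignment = redirect_vdisks(vdisks, remotes, seed=7)
print(Counter(assignment.values()))  # number of vDisks per remote stargate
```

Random placement does not guarantee a perfectly even split for a handful of vDisks, but across the many vDisks on a node the load tends to spread over the remote CVMs, and a single multi-vDisk VM is likely to be served by several of them at once.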

As I’m sure you can now see, Acropolis provides excellent resiliency and performance even during maintenance or failure scenarios.
