Deduplication and MS Exchange

Virtualization and Storage always seem to be a hot topics in regards to Exchange deployments and many of you would have seen my post Virtualizing Exchange on vSphere with NFS backed storage a while back.

This post was motivated by a tweet from fellow VCDX which stated:

dedupe not supported for Exchange, no we can’t turn it off.

Later in the twitter conversation he went on to say

To be clear not an MS employee, another integrator MS “master” certified. It’s the whole NFS thing again

I have heard similar over the years and for me the disappointing thing is the support statement is unclear as are the motivations behind support statements for Exchange in general. e.g.: Support for VMDK on NFS

The only support statement I am aware of regarding Exchange and deduplication is in the technet article “Exchange 2013 storage configuration options” under the section “Volume configurations for the Exchange 2013 Mailbox server role” at it states:

storageexchange

In the above statement which specifically refers to “a new technique to optimize storage utilization for Windows Server 2012” is states that for Stand-alone or High availability solutions de-duplication is not supported for Exchange database file unless the DB files are completely offline and used for backup or archives.

So the first question is “Is array level deduplication supported”?

There is nothing that says that it isn’t supported that I am aware of, so if you are aware of such a statement please let me know in the comments and I will update this post.

My interpretation of the support statement is that array level deduplication is supported and MS have simply called out that the deduplication in Windows 2012 is not. Regardless of if you agree or disagree with my interpretation, I think its safe to say the support statement should be clarified with justification.

The next question I would like to discuss is “Should deduplication be used with Exchange”?

Firstly we should discuss the fact Exchange can be deployed with Database Availability Groups (DAGs) which creates multiple copies of Exchange databases across up to 16 Exchange Mailbox (or Multi-Role) servers.

The purpose of a DAG is to provide high availability for the application and data.

So if the application is by design making duplicate copies, should the storage be undoing this work?

Before I give my opinion on deduplicating DAG copies, I want to be clear on two things:

1. Deduplication is a well proven technology which many different vendors implement either in-line or post process or in some cases both.

2. As array level deduplication is abstracted from the Guest OS and Application, there is no risk to the application such as data corruption or anything like that.

So back to deduplicating DAG copies.

I work for Nutanix and I wrote our best practice guide for Exchange which can be found below. In the guide, I recommended Compression but not deduplication. In an upcoming update of the document the recommendation remains to use compression but adds a further recommendation to use Erasure coding (EC-X) for data reduction.

Nutanix Best Practices Guide: Virtualizing Microsoft Exchange on Web-Scale Converged Infrastructure.

The reason for these recommendations is three fold:

1. Compression + EC-X give excellent data reduction savings for Exchange which generally result in usable capacity higher than RAW capacity while still providing data protection at the storage layer.

2. Deduplicating data which is deliberately written multiple times is a huge overhead on any infrastructure as data is still processed multiple times by the Guest OS, Storage Network and storage controller even if deplicate copies are not written to disk. To be clear, the Guest OS (CPU) and Storage network overhead are not eliminated by dedupe.

3. Nutanix recommends the use of hybrid nodes for Exchange with a small percentage of capacity provided by SSD (for all write I/O and hot data) and a large percentage of capacity provided by SATA. As a result the bulk of the data is stored on low cost SATA so the commercial benefit ($ per GB) of deduplication is minimal especially after compression and EC-X.

In my opinion deduplicating everything regardless of its profile is not the answer, so data reduction such as deduplication, compression and Erasure Coding should be able to be turned off for workloads which give minimal benefit.

For Exchange DAGs, deduplication should give excellent data reduction results in line with the number of DAG copies. So if an Exchange DAG has 4 copies, then approx 4:1 data reduction should be achieved right off the bat. Now this sounds great but when running a DAG on highly available shared storage (SAN/NAS/HCI) it is unnessasary to have 4 copies of data.

In reality, I recommend 2 copies when running on Nutanix because the shared storage provided by Nutanix keeps at least 1 additional copy (if using EC-X) or where using RF2 or RF3, 2 or 3 copies of data meaning in the event of a drive or node failure, the data is still available to the application without requiring a DAG failover. Similar is true when running Exchange on SAN/NAS/HCI solutions with some form of RAID or replication for data protection.

So the benefit of deduplication would therefore reduce to from possibly 4:1 down to 2:1 because only 2 DAG copies are really required if the storage is highly available.

Considering the data reduction from compression and storage solutions supporting Erasure Coding, I think deduplication is only commercially viable/required when using expensive all flash storage which lets face it, is not required for Exchange.

If you have chosen an all flash solution and you want to run all workloads on it and eliminate having silos of infrastructure for different workloads, then by all means deduplicate Exchange DAGs otherwise it will be a super expensive solution. But, in my opinion hybrid is still the best solution overall with the only real advantage of all flash being potentially higher and more consistent performance depending on many factors.

Summary:

I hope that Microsoft clarify their position regarding support for array level data reduction technologies including deduplication with detailed justifications.

I would be disappointed to see Microsoft come out and update the support policy stating deduplication (for array’s) is not supported as there is not technical reason it should not be supported (Happy to be corrected if credible evidence can be provided) regardless of if you think its a good idea or not.

Having worked in the storage industry for a long time, I have seen many different deduplication solutions used successfully with MS Exchange and I am yet to see any evidence that it is not a totally viable and enterprise grade option for Exchange databases.

The question which remains is, do you need to deduplicate Exchange databases? – My thinking is only where your using all flash systems and need to lower cost per GB.

My position being the better solution would be choose a hybrid solution when eliminating silos which gives you the best of all worlds and applications requiring all flash can have all flash and other workloads can use flash for hot data and lower cost SATA for cold storage or data which doesn’t require SSD (like Exchange).

Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

Many customers and partners have expressed interest in Acropolis since it was officially launched at .NEXT in June earlier this year, and since then lots of questions have been asked around resiliency/availability etc.

In this post I will cover how I/O failover occurs and how AHV load balances in the event of I/O failover to ensure optimal performance.

Let’s start with an Acropolis node under normal circumstances. The iSCSI initiator for QEMU connects to the iSCSI redirector which directs all I/O to the local stargate instance which runs within the Nutanix Controller VM (CVM) as shown below.

AHVMPdefault

I/O will always be serviced by the local stargate unless a CVM upgrade, shutdown or failure occurs. In the event one of the above occurs QEMU will loose connection to the local stargate as shown below.AHVMPfailedlocal

When this loss of connectivity to stargare occurs, QEMU reconnects to the iSCSI redirector and establishes a connection to a remote stargate as shown below.AHVMPremote

The process of re-establishing an iSCSI connection is near instant and you will likely not even notice this has occurred.

Once the local stargate is back online (and stable for 300 seconds) I/O will be redirected back locally to ensure optimal performance.

AHVMPfailback

In the unlikely event that the remote stargate goes down before the local stargate is back online then the iSCSI redirector will redirect traffic to another remote stargate.

Next lets talk about Load Balancing.

Unlike traditional 3-tier infrastructure (i.e.: SAN/NAS) Nutanix solutions do not require multi-pathing as all I/O is serviced by the local controller. As a result, there is no multi-pathing policy to choose which removes another layer of complexity and potential point of failure.

However in the event of the local CVM being unavailable for any reason we need to service I/O for all the VMs on the node in the most efficient manner. Acropolis does this by redirecting I/O on a per vDisk level to a random remote stargate instance as shown below.

pervmpathfailover

Acropolis can do this because every vdisk is presented via iSCSI and is its own target/LUN which means it has its own TCP connection. What this means is a business critical application such as MS SQL / Exchange or Oracle with multiple vDisks will be serviced by multiple controllers concurrently.

As a result all VM I/O is load balanced across the entire Acropolis cluster which ensures no single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or maintenance scenario.

As i’m sure you can now see, Acropolis provides excellent resiliency and performance even during maintenance or failure scenarios.

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Advanced Storage Performance Monitoring with Nutanix

3. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

4. Nutanix – Erasure Coding (EC-X) Deep Dive

5. Acropolis: VM High Availability (HA)

6. Acropolis: Scalability

7. NOS & Hypervisor Upgrade Resiliency in PRISM

Exchange 2013 & VMware – The latest round.

Recently there has been some back and forth on social media (Twitter & Blogs) around the following article published on the Exchange Team Blog:

Troubleshooting High CPU utilization issues in Exchange 2013
The article discusses several topics including Common Configuration Issues, Over-sizing and several performance metrics which can help Exchange administrators identify and hopefully resolve performance problems.
As a result of some of the recent updated recommendations, the Exchange 2013 Server Role Requirements calculator has been updated to reflect the newer recommendations.

The calculator can be found here: Exchange 2013 Server Role Requirements Calculator.

On key recommendation from Microsoft is the maximum Exchange Mailbox (or Multi-Role) Server sizing now being:

Recommended Maximum CPU Core Count

24

Recommended Maximum Memory

96 GB

In reply to the above, VMware have released the following article.

A Stronger Case For Virtualizing Exchange Server 2013 – Think “Performance”

Personally, I don’t think the article will (or has) had the effect VMware and Virtualization architects/admins would have liked due to its negativity. As such I am going to highlight the important points in this post.

In response to VMware’s article, a well known member of the Exchange community and MVP, Tony Redmond wrote the below article.

VMware tells Microsoft that they don’t know anything about Exchange 2013 performance

In this case, even as a strong Virtualization (and VMware) evangelist, I think Tony has some valid points.

Tony mentions the following regarding the sizing tool:

Most experienced people take the output from any general-purpose sizing tool and cast a cold eye over its recommendations to put them into context with the operational and business requirements for a deployment. In other words, the recommendations are adjusted. And yes, sometimes those recommendations are adjusted to make sure that Exchange 2013 works well when deployed on virtualized servers, either Hyper-V or VMware.

I totally agree! No matter if the sizing tool was recommending more CPUs or not, experienced architects should be making decisions based on many factors, the most important being their real world experience and using the calculator as a secondary (or tertiary) input and adjusting for the underlying infrastructure, regardless of it being physical or virtualized on one of many viable hypervisors.

I also agree with the following point:

Third, wouldn’t it be better use of VMware’s time to publish well-argued and pertinent observations on how you can take the output of Microsoft sizing tools and adjust them for their platform

VMware can be considered the experts in their own technology and they should as Tony suggested publish and continue to update documentation on how best to successfully deploy Exchange on vSphere. There is no advantage in an “I told you so” style blog posts when Microsoft have actually published recommendations which to the one point of VMware’s blog that I agree with, Strengthens the case for Virtualization of Exchange.

For companies like Nutanix who have numerous Virtualization experts across multiple hypervisors including vSphere, Hyper-V and Acropolis (which is fully supported for Windows 2012 and Exchange) we should also post reference architectures and best practice guides on how to deploy Exchange successfully.

Tony also commented in reply to VMware’s post which was not favourable towards Multi-role deployments:

Combined role means a multi-role server, which I think that every reasonable expert working with Exchange has concluded is the only way to go because it increases server utilization and improves the overall resilience of any deployment. But it’s a bad thing in VMware’s world, which is a pity for them because Exchange 2016 only supports multi-role servers, so I guess they will just have to get used to that fact.

VMware’s point here is scaling out (rather than up) means in general better overall consolidation and performance. Now to be fair this is true but one cannot apply a blanket rule for all applications.

Deploying Exchange 2013 in a Multi-Role configuration is a good way to simplify as well as ensure consistent performance and resiliency in the event of a server failure as all servers can service all roles.

As Exchange is generally considered a Business Critical Application, Virtualization architects like myself should consider the recommendations for the application, the critically of the solution to the customer and make an informed recommendation for its deployment. For an app like Exchange, I think its fair to say the MS Exchange team have solid justification to recommend MSR deployments.

The impact of less scale out (i.e.: MBX + CAS) is simply a design consideration, and not an uncommon one so this in my opinion is not an issue for virtual deployments.

Key Take Aways:

1. Ensure you size within Microsoft’s revised vCPU / vRAM recommended configuration maximums (24 vCPUs / 96GB RAM)

2. When sizing Exchange on any Hypervisor, start small and scale up vCPU/vRAM requirements as required to avoid oversizing. (This is a major advantage of virtualization, so don’t be afraid to use it!)

3. Multi-Role Deployments are perfectly fine for Virtual environments. Exchange is a Business Critical Application, so treat it as such in your design phase.

Final thoughts:

With the ever increasing number of cores per socket, the case to virtualize Exchange is strengthened when considering 12-18c CPUs are not uncommon these days. As such an 18vCPU / 96GB RAM Exchange 2013 MSR VM could be virtualized with zero CPU/RAM overcommitment (as is generally recommended) while running other VMs on the same host.

This helps to remove silos within the datacenter as well as driving up utilization (without creating resource contention) which equates to lower cost/power/cooling/maintenance, the list goes on.

While Virtualization does add some complexity & cost, I would argue with newer technologies such as Nutanix which replace complex and costly SAN/NAS storage with simple to deploy and manage scale out storage these (storage) challenges are soon going to be things of the past.

Nutanix also allows customers to choose a premium feature rich hypervisor such as ESXi, or lower cost/feature solutions such as Hyper-V or Acropolis which allows customers to work with what they are comfortable with.

Acropolis for example is fully supported by Microsoft (see Microsoft SVVP) and can be deployed in <30mins with just a few clicks and its free with Nutanix, so the cost and complexity arguments against virtualization of Exchange just went out the window and Exchange would have all the benefits of things like Virtual Machine High Availability, Migration (vMotion),

Please to hear everyone’s thoughts.