Deduplication and MS Exchange

Virtualization and storage always seem to be hot topics when it comes to Exchange deployments, and many of you will have seen my post Virtualizing Exchange on vSphere with NFS backed storage a while back.

This post was motivated by a tweet from a fellow VCDX which stated:

dedupe not supported for Exchange, no we can’t turn it off.

Later in the Twitter conversation he went on to say:

To be clear not an MS employee, another integrator MS “master” certified. It’s the whole NFS thing again

I have heard similar comments over the years, and for me the disappointing thing is that the support statement is unclear, as are the motivations behind support statements for Exchange in general, e.g. support for VMDK on NFS.

The only support statement I am aware of regarding Exchange and deduplication is in the TechNet article “Exchange 2013 storage configuration options”, under the section “Volume configurations for the Exchange 2013 Mailbox server role”, and it states:

[Screenshot: “Volume configurations for the Exchange 2013 Mailbox server role” support table from the TechNet article]

The above statement, which specifically refers to “a new technique to optimize storage utilization for Windows Server 2012”, states that for stand-alone or high availability solutions, de-duplication is not supported for Exchange database files unless the DB files are completely offline and used for backup or archives.

So the first question is: “Is array level deduplication supported?”

There is nothing I am aware of that says it isn’t supported, so if you are aware of such a statement please let me know in the comments and I will update this post.

My interpretation of the support statement is that array level deduplication is supported and MS have simply called out that the deduplication in Windows 2012 is not. Regardless of whether you agree or disagree with my interpretation, I think it’s safe to say the support statement should be clarified with justification.

The next question I would like to discuss is: “Should deduplication be used with Exchange?”

Firstly, we should discuss the fact that Exchange can be deployed with Database Availability Groups (DAGs), which create multiple copies of Exchange databases across up to 16 Exchange Mailbox (or Multi-Role) servers.

The purpose of a DAG is to provide high availability for the application and data.

So if the application is by design making duplicate copies, should the storage be undoing this work?

Before I give my opinion on deduplicating DAG copies, I want to be clear on two things:

1. Deduplication is a well proven technology which many different vendors implement either in-line, post-process, or in some cases both.

2. As array level deduplication is abstracted from the Guest OS and application, it introduces no application-level risk such as data corruption.

So back to deduplicating DAG copies.

I work for Nutanix and wrote our best practice guide for Exchange, which can be found below. In the guide I recommended compression but not deduplication. In an upcoming update of the document the recommendation to use compression remains, with a further recommendation added to use Erasure Coding (EC-X) for data reduction.

Nutanix Best Practices Guide: Virtualizing Microsoft Exchange on Web-Scale Converged Infrastructure.

The reason for these recommendations is threefold:

1. Compression + EC-X give excellent data reduction savings for Exchange, which generally result in usable capacity higher than RAW capacity, while still providing data protection at the storage layer (a rough worked example follows this list).

2. Deduplicating data which is deliberately written multiple times is a huge overhead on any infrastructure, as the data is still processed multiple times by the Guest OS, Storage Network and storage controller even if duplicate copies are not written to disk. To be clear, the Guest OS (CPU) and Storage Network overheads are not eliminated by dedupe.

3. Nutanix recommends the use of hybrid nodes for Exchange, with a small percentage of capacity provided by SSD (for all write I/O and hot data) and a large percentage of capacity provided by SATA. As a result the bulk of the data is stored on low cost SATA, so the commercial benefit ($ per GB) of deduplication is minimal, especially after compression and EC-X.
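To put rough numbers against point 1, here is a simple back-of-the-envelope sketch in Python. The compression ratio and the 4 data + 1 parity EC-X strip below are illustrative assumptions only, not guaranteed results for any particular environment.

# Illustrative capacity maths only - the ratios below are assumptions,
# not guaranteed data reduction results for any particular environment.

raw_tb = 100.0            # raw cluster capacity in TB (example)
compression_ratio = 1.6   # assumed compression saving for Exchange databases
rf2_overhead = 2.0        # RF2 keeps 2 copies of every write
ecx_overhead = 1.25       # e.g. a 4 data + 1 parity strip: (4 + 1) / 4

def usable(raw, protection_overhead, data_reduction):
    # Effective usable capacity after protection overhead and data reduction.
    return raw / protection_overhead * data_reduction

print("RF2 only:           %.0f TB" % usable(raw_tb, rf2_overhead, 1.0))                # 50 TB
print("RF2 + compression:  %.0f TB" % usable(raw_tb, rf2_overhead, compression_ratio))  # 80 TB
print("EC-X + compression: %.0f TB" % usable(raw_tb, ecx_overhead, compression_ratio))  # 128 TB

With these hypothetical ratios, the EC-X + compression case ends up with more usable capacity than RAW, which is the effect point 1 describes.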

In my opinion, deduplicating everything regardless of its profile is not the answer, so it should be possible to turn data reduction features such as deduplication, compression and Erasure Coding off for workloads which gain minimal benefit.

For Exchange DAGs, deduplication should give excellent data reduction results, in line with the number of DAG copies. So if an Exchange DAG has 4 copies, then approx 4:1 data reduction should be achieved right off the bat. Now this sounds great, but when running a DAG on highly available shared storage (SAN/NAS/HCI) it is unnecessary to have 4 copies of data.

In reality, I recommend 2 copies when running on Nutanix, because the shared storage provided by Nutanix keeps at least 1 additional copy (when using EC-X), or 2 or 3 copies of data when using RF2 or RF3. This means that in the event of a drive or node failure, the data is still available to the application without requiring a DAG failover. Similar is true when running Exchange on SAN/NAS/HCI solutions with some form of RAID or replication for data protection.

So the benefit of deduplication would therefore reduce from possibly 4:1 down to 2:1, because only 2 DAG copies are really required if the storage is highly available.
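To make that arithmetic explicit, here is a trivial sketch. The database size and copy counts are made-up examples:

# Why dedupe savings roughly track the number of DAG copies.
# The figures are hypothetical examples, not measurements.

db_size_tb = 2.0   # logical size of one Exchange database in TB (example)

def best_case_dedupe_ratio(dag_copies):
    # If the array dedupes identical DAG copies down to a single instance,
    # the best-case ratio is simply the number of copies.
    logical = db_size_tb * dag_copies
    physical = db_size_tb
    return logical / physical

print("4 DAG copies -> ~%.0f:1" % best_case_dedupe_ratio(4))  # ~4:1
print("2 DAG copies -> ~%.0f:1" % best_case_dedupe_ratio(2))  # ~2:1

This is the best case; the real ratio depends on how closely the DAG copies align at the block level on the array.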

Considering the data reduction from compression and from storage solutions supporting Erasure Coding, I think deduplication is only commercially viable/required when using expensive all flash storage, which, let’s face it, is not required for Exchange.

If you have chosen an all flash solution and you want to run all workloads on it and eliminate silos of infrastructure for different workloads, then by all means deduplicate Exchange DAGs, otherwise it will be a super expensive solution. But in my opinion, hybrid is still the best solution overall, with the only real advantage of all flash being potentially higher and more consistent performance, depending on many factors.

Summary:

I hope that Microsoft clarify their position regarding support for array level data reduction technologies, including deduplication, with detailed justifications.

I would be disappointed to see Microsoft come out and update the support policy stating deduplication (for arrays) is not supported, as there is no technical reason it should not be supported (happy to be corrected if credible evidence can be provided), regardless of whether you think it’s a good idea or not.

Having worked in the storage industry for a long time, I have seen many different deduplication solutions used successfully with MS Exchange and I am yet to see any evidence that it is not a totally viable and enterprise grade option for Exchange databases.

The question which remains is: do you need to deduplicate Exchange databases? My thinking is only where you’re using all flash systems and need to lower the cost per GB.

My position is that the better solution when eliminating silos is to choose a hybrid solution, which gives you the best of all worlds: applications requiring all flash can have all flash, and other workloads can use flash for hot data and lower cost SATA for cold storage or for data which doesn’t require SSD (like Exchange).

Scaling Hyper-converged solutions – Compute only.

A quick bit of history on Nutanix: back in mid 2013 when I joined, there was a common theme in almost every meeting I went to and every presentation I gave. People wanted to scale compute and storage at different rates.

Now this makes perfect sense, and this issue has long been addressed by a large range of node types which can be mixed in the same Nutanix cluster.

For example: NX3060 nodes with Dual Intel Haswell CPUs and ~2TB usable storage can be mixed with NX6060 nodes also running dual Intel Haswell CPUs but with ~8TB usable each.
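As a simple illustration of how mixing node types changes the compute-to-storage ratio of a cluster, here is a quick sketch. The per node usable capacities are the approximate figures above; the node counts are arbitrary examples:

# Sketch of sizing a mixed Nutanix cluster. Usable capacities are the
# approximate per-node figures mentioned above; node counts are examples.

nodes = [
    {"model": "NX3060", "count": 4, "usable_tb": 2.0},  # compute heavy
    {"model": "NX6060", "count": 2, "usable_tb": 8.0},  # storage heavy
]

total_nodes = sum(n["count"] for n in nodes)
total_tb = sum(n["count"] * n["usable_tb"] for n in nodes)

print("Nodes: %d, usable: %.0f TB (avg %.1f TB per node)"
      % (total_nodes, total_tb, total_tb / total_nodes))
# Adding NX6060s grows capacity faster than compute;
# adding NX3060s grows compute faster than capacity.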

Nutanix also has configure-to-order (CTO) nodes where the size of SSDs and HDDs can be modified to suit customer requirements. So at this point I never have a challenge sizing for a customer workload, as I have plenty of great options to choose from.

Another common question has been “How do I scale storage only?”. Nutanix has also addressed this in an intelligent way and as a result adding “Storage Only” nodes makes sense as I described in Scale Storage separately to Compute on Nutanix!

In recent months a new question has emerged and a small percentage of partners/customers have been asking about adding Compute only nodes (e.g.: Traditional ESXi hosts) to a Nutanix (or HCI) cluster.

My first question to these customers/partners is: Why?

The typical reply is something like “Because we need to add more VMs which have low storage requirements” or “Because we don’t need storage”.

Let’s look at these answers:

Firstly, my favourite one, “Because we don’t need storage”.

Is this really true, or do you mean the new VMs have low storage requirements? In almost all cases the truth is the new VMs have a small requirement for storage capacity and performance.

So next let’s look at the other common (and more realistic) situation:

“Because we need to add more VMs which have low storage requirements”

So this is very possible and something an HCI solution should cater for, and Nutanix does. For example, one of our most popular nodes is the NX-3050 or NX-3060, which are compute heavy nodes with 2 sockets each with up to 24 physical CPU cores (Haswell) and 512GB RAM.

These nodes also come with 2 x SSDs and 4 x SATA HDDs, with a minimum usable capacity of approx 2TB (of which 20% is SSD).

So while the solution adds some capacity, it also gives all the advantages of HCI while eliminating the complexity of a 3-tier architecture, which is why customers are flocking to HCI in the first place.

Even if the capacity is not required, the SSDs simply service reads locally where required and increase the shared SSD tier of the cluster, which means more write performance for workloads throughout the cluster. Sounds pretty good to me!

Does having an additional 4 x SATA drives really matter? Well, from a cost perspective it’s a minimal cost, and thanks to Disk Balancing the SATA drives will hold some data (such as replicas), which lowers the overheads on other nodes, therefore improving resiliency and performance.

So there are lots of advantages to adding even a small amount of storage, even if the new workloads don’t require most of it.

But for those of you who aren’t already convinced that adding some storage is advantageous, how about adding dual Intel Haswell CPUs and up to 512GB RAM with just 1 x SSD (to accelerate write I/O and serve what little storage the VMs need locally) and just 2 x SATA HDDs?

Nutanix has such a node, which is another option to scale high compute and very low storage.

Another question I get is: “Is the fact Nutanix can’t do this why you don’t recommend it?”

The answer is: Nutanix can add compute only, and we can actually do it very well and get very good performance, but it’s not HCI and it adds unnecessary complexity, which is why we don’t recommend (or productise) this option.

Now let’s look at what adding compute only to HCI looks like.

[Diagram: “HCInotHCI” – compute only nodes (traditional ESXi hosts) added to an HCI cluster, accessing storage across the network, i.e. a 3-tier style architecture for those workloads]

Yuk! That looks like old school 3-tier stuff to me!

As the above shows, adding Compute Only to HCI basically means you have a non HCI solution for part of your workloads.

Non HCI workloads on compute only nodes would therefore:

  • Be running in the same setup as traditional 3-tier infrastructure
  • Have different performance than HCI based workloads
  • Lose the advantage of having compute + storage close together
  • Increase dependency on Network
  • Impact network utilization of HCI node
  • Impact benefits of HCI for the native HCI workloads and much more.

The industry has accepted HCI as the way of the future, and while adding compute only nodes might sound nice at a high level, it just re-introduces the classic 3-tier complexity and problems of the past.

Summary:

If you have already invested in HCI, you clearly understand the advantages and value of the solution. Adding compute only is not a true “value”, it’s just a “perceived value”.

Adding “Compute only” is just adding complexity and moving away from the value HCI brings. So my advice: don’t make the mistake, but if you have, you now know the solution.

Invest in a compute+storage node (albeit at a higher CAPEX) and enjoy the continued value of HCI, while improving the performance and resiliency of your entire cluster! Now that’s real value (at a reasonable cost).


Related Posts:

1. Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

2. Advanced Storage Performance Monitoring with Nutanix

3. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

4. Nutanix – Erasure Coding (EC-X) Deep Dive

5. Acropolis: VM High Availability (HA)

6. Acropolis: Scalability

7. NOS & Hypervisor Upgrade Resiliency in PRISM

Nutanix – Erasure Coding (EC-X) Deep Dive

I published a post earlier this month during the .NEXT conference titled “What’s .NEXT? – Erasure Coding!”, which covered the basics of the Nutanix EC-X implementation.

This post is a deep dive follow-up to answer numerous questions I have received about EC-X, such as:

1. Does it work with Compression and De-duplication?
2. Can I use EC-X to reduce the overhead of RF3?
3. Does it work on Hot or Cold data?
4. Does it work only on the SATA tier?
5. What is the performance impact?
6. When should I use/not use EC-X?
7. What’s different about Nutanix (Patent pending) EC-X compared to other EC algorithms?
8. How does EC-X impact Data Locality?
9. What Hypervisors is EC-X supported with?

So let’s start with: “What’s different about Nutanix (Patent pending) EC-X compared to other EC algorithms?”

* Nutanix EC-X is optimized for a distributed platform, where data is spread across nodes, not individual disks, to ensure optimal performance. This also means rebuilds are faster and lower impact, as the rebuild is performed across all the nodes/drives.

* Nutanix EC-X is also performed as a background task and only on Write Cold data. This means the configured RF write completes as normal, and EC-X is then applied as a post process, so the write path is not slowed by requiring numerous nodes within the cluster to participate in the initial write I/O. (A generic illustration of the erasure coding concept follows.)
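To illustrate the general erasure coding concept described above, here is a generic single parity (XOR) example. This is not the actual EC-X algorithm or strip size, just the underlying idea: a strip of data blocks plus parity can survive the loss of a block while storing far less than multiple full copies.

# Generic erasure coding illustration using a single XOR parity block.
# This is NOT the Nutanix EC-X implementation, only the underlying concept.

from functools import reduce

def xor_blocks(blocks):
    # XOR a list of equal-length byte strings together.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # 4 data blocks on 4 nodes
parity = xor_blocks(data)                    # 1 parity block on a 5th node

# Simulate losing the third block (e.g. a node failure) and rebuilding it
# from the surviving data blocks plus the parity block.
surviving = data[:2] + data[3:]
rebuilt = xor_blocks(surviving + [parity])
assert rebuilt == data[2]

# Capacity overhead comparison:
print("RF2 overhead         : 2.00x")
print("RF3 overhead         : 3.00x")
print("4+1 EC strip overhead: %.2fx" % ((len(data) + 1) / len(data)))  # 1.25x

The comparison at the end is the key point: an example 4+1 strip protects data with roughly 1.25x overhead instead of the 2x or 3x of RF2/RF3, which is where the capacity savings come from.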

How does EC-X affect existing Nutanix Data Reduction technologies?

* Short answer: EC-X is complementary to both compression and deduplication, so you will get even more data reduction. Here is a sample screenshot from the Home screen in PRISM which shows a breakdown of Dedup, Compression and Erasure Coding savings.

[Screenshot: PRISM Home screen Capacity Optimization summary showing Dedup, Compression and Erasure Coding savings]

In the Storage Tab within PRISM, we can get further details on the capacity savings. Here we see an example Container with Compression and EC-X enabled:

[Screenshot: PRISM Storage tab showing the capacity savings for a container with Compression and EC-X enabled]

Does it work only on the SATA tier?

No. EC-X works on all tiers, being SSD and SATA today, and in the future, when newer technologies or more than two tiers are used, EC-X will work across all of them.

Does EC-X work on Hot or Cold data?

EC-X waits until data written (via RF2 or RF3) is “Write Cold”, meaning the data is not being overwritten. The data might be white hot from a read I/O perspective, but as long as it’s not being overwritten, the extent group (4MB) will be a candidate for EC-X.

This means that for data which is Write Cold, the effective capacity of the SSD tier is increased, as the data requires less space thanks to EC-X.
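Conceptually, candidate selection can be thought of along the lines of the sketch below. This is illustrative pseudologic only, not Nutanix internals; the 60 minute threshold is an assumption based on the “<60mins” overwrite figure mentioned later in this post, and the 4MB extent group size is the figure quoted above.

# Illustrative pseudologic only - not Nutanix internals. It expresses the
# rule described above: an extent group becomes an EC-X candidate once it
# has not been overwritten for some threshold, regardless of read activity.

import time

WRITE_COLD_THRESHOLD_SECS = 60 * 60        # assumed threshold (see "<60mins" below)
EXTENT_GROUP_SIZE_BYTES = 4 * 1024 * 1024  # 4MB extent groups, as noted above

def is_ecx_candidate(last_write_timestamp, now=None):
    # "Write Cold" = no overwrites for the threshold period.
    # Read activity is irrelevant: Read Hot data can still be a candidate.
    now = time.time() if now is None else now
    return (now - last_write_timestamp) >= WRITE_COLD_THRESHOLD_SECS

print(is_ecx_candidate(time.time() - 2 * 3600))  # written 2 hours ago -> True
print(is_ecx_candidate(time.time() - 5 * 60))    # written 5 mins ago  -> False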

What is the performance impact?

As EC-X is a post process task which waits until data is “Write Cold” before being applied, in general it will not impact write performance.

The exception is when data has been Write Cold for a period of time and is then overwritten: this “overwrite” will incur a higher penalty than a typical RF2/RF3 write. As such, some workloads may not be suitable for EC-X, which I will discuss later.

Overall, if the workload is suitable, EC-X will keep the data in the SSD tier and the parity on the SATA tier which effectively extends the usable capacity of the SSD tier therefore helping to increase performance (as with compression and dedup).

What Hypervisors is EC-X supported with?

Everything in the Nutanix Distributed Storage Fabric (part of the Nutanix Xtreme Computing Platform or XCP) is designed to be hypervisor agnostic. So whatever Hypervisor/s you choose, you can benefit from EC-X!

How does EC-X impact Data Locality?

As the initial Write path is not impacted by enabling EC-X, Data Locality is still maintained and ensures one copy of data is written to the local node where the VM is running while replicating a further one or two copies (dependent on RF configuration) throughout the cluster.

This means that newly written data, as well as data being overwritten at frequencies of <60 mins, will always maintain data locality.

For data which meets the criteria for EC-X to be performed, such as Read Hot or Write Cold data, Data Locality can only be partially maintained, as the data is by design striped across nodes. As a result, it is probable that some read I/O will be performed over the network.

Importantly though, Read Hot data will be maintained in the SSD tier and be distributed throughout the cluster. This means a single VM’s read I/O can be served by multiple nodes concurrently, which can lead to increased performance.

As EC-X also provides capacity savings, more data can be serviced by the SSD tier, which enables a larger active working set to perform at SSD speeds.

In summary, while Data Locality is not always maintained when using EC-X, the advantages of EC-X far outweigh the partial loss in Data Locality.

And finally: “When should I use/not use EC-X?”

As discussed earlier, EC-X is applied to Write Cold data, and if/when that data is overwritten, the write penalty is higher than a typical RF2 write I/O. So if your dataset has a high percentage of overwrites, it is recommended not to use EC-X. The good news is storage can be assigned at a per VMDK level (or per vDisk at the NDFS layer), so you can have one VM using EC-X for some data and RF2/3 for other data, again giving customers the best of both worlds.

The best workloads for EC-X are:

1. File Servers
2. Backup
3. Archive
4. Email
5. Logging

Summary:

Nutanix EC-X gives customers more choice without compromising functionality or performance, while dramatically reducing the cost/GB of storage.

Related Articles:

  1. Large scale clusters and increased resiliency with RF3 + EC-X
  2. What I/O will Nutanix Erasure coding (EC-X) take effect on?
  3. Sizing assumptions for solutions with Erasure Coding (EC-X)