Exchange 2013 & VMware – The latest round.

Recently there has been some back and forth on social media (Twitter & Blogs) around the following article published on the Exchange Team Blog:

Troubleshooting High CPU utilization issues in Exchange 2013
The article discusses several topics including Common Configuration Issues, Over-sizing and several performance metrics which can help Exchange administrators identify and hopefully resolve performance problems.
As a result of some of the recent updated recommendations, the Exchange 2013 Server Role Requirements calculator has been updated to reflect the newer recommendations.

The calculator can be found here: Exchange 2013 Server Role Requirements Calculator.

On key recommendation from Microsoft is the maximum Exchange Mailbox (or Multi-Role) Server sizing now being:

Recommended Maximum CPU Core Count

24

Recommended Maximum Memory

96 GB

In reply to the above, VMware have released the following article.

A Stronger Case For Virtualizing Exchange Server 2013 – Think “Performance”

Personally, I don’t think the article will (or has) had the effect VMware and Virtualization architects/admins would have liked due to its negativity. As such I am going to highlight the important points in this post.

In response to VMware’s article, a well known member of the Exchange community and MVP, Tony Redmond wrote the below article.

VMware tells Microsoft that they don’t know anything about Exchange 2013 performance

In this case, even as a strong Virtualization (and VMware) evangelist, I think Tony has some valid points.

Tony mentions the following regarding the sizing tool:

Most experienced people take the output from any general-purpose sizing tool and cast a cold eye over its recommendations to put them into context with the operational and business requirements for a deployment. In other words, the recommendations are adjusted. And yes, sometimes those recommendations are adjusted to make sure that Exchange 2013 works well when deployed on virtualized servers, either Hyper-V or VMware.

I totally agree! No matter if the sizing tool was recommending more CPUs or not, experienced architects should be making decisions based on many factors, the most important being their real world experience and using the calculator as a secondary (or tertiary) input and adjusting for the underlying infrastructure, regardless of it being physical or virtualized on one of many viable hypervisors.

I also agree with the following point:

Third, wouldn’t it be better use of VMware’s time to publish well-argued and pertinent observations on how you can take the output of Microsoft sizing tools and adjust them for their platform

VMware can be considered the experts in their own technology and they should as Tony suggested publish and continue to update documentation on how best to successfully deploy Exchange on vSphere. There is no advantage in an “I told you so” style blog posts when Microsoft have actually published recommendations which to the one point of VMware’s blog that I agree with, Strengthens the case for Virtualization of Exchange.

For companies like Nutanix who have numerous Virtualization experts across multiple hypervisors including vSphere, Hyper-V and Acropolis (which is fully supported for Windows 2012 and Exchange) we should also post reference architectures and best practice guides on how to deploy Exchange successfully.

Tony also commented in reply to VMware’s post which was not favourable towards Multi-role deployments:

Combined role means a multi-role server, which I think that every reasonable expert working with Exchange has concluded is the only way to go because it increases server utilization and improves the overall resilience of any deployment. But it’s a bad thing in VMware’s world, which is a pity for them because Exchange 2016 only supports multi-role servers, so I guess they will just have to get used to that fact.

VMware’s point here is scaling out (rather than up) means in general better overall consolidation and performance. Now to be fair this is true but one cannot apply a blanket rule for all applications.

Deploying Exchange 2013 in a Multi-Role configuration is a good way to simplify as well as ensure consistent performance and resiliency in the event of a server failure as all servers can service all roles.

As Exchange is generally considered a Business Critical Application, Virtualization architects like myself should consider the recommendations for the application, the critically of the solution to the customer and make an informed recommendation for its deployment. For an app like Exchange, I think its fair to say the MS Exchange team have solid justification to recommend MSR deployments.

The impact of less scale out (i.e.: MBX + CAS) is simply a design consideration, and not an uncommon one so this in my opinion is not an issue for virtual deployments.

Key Take Aways:

1. Ensure you size within Microsoft’s revised vCPU / vRAM recommended configuration maximums (24 vCPUs / 96GB RAM)

2. When sizing Exchange on any Hypervisor, start small and scale up vCPU/vRAM requirements as required to avoid oversizing. (This is a major advantage of virtualization, so don’t be afraid to use it!)

3. Multi-Role Deployments are perfectly fine for Virtual environments. Exchange is a Business Critical Application, so treat it as such in your design phase.

Final thoughts:

With the ever increasing number of cores per socket, the case to virtualize Exchange is strengthened when considering 12-18c CPUs are not uncommon these days. As such an 18vCPU / 96GB RAM Exchange 2013 MSR VM could be virtualized with zero CPU/RAM overcommitment (as is generally recommended) while running other VMs on the same host.

This helps to remove silos within the datacenter as well as driving up utilization (without creating resource contention) which equates to lower cost/power/cooling/maintenance, the list goes on.

While Virtualization does add some complexity & cost, I would argue with newer technologies such as Nutanix which replace complex and costly SAN/NAS storage with simple to deploy and manage scale out storage these (storage) challenges are soon going to be things of the past.

Nutanix also allows customers to choose a premium feature rich hypervisor such as ESXi, or lower cost/feature solutions such as Hyper-V or Acropolis which allows customers to work with what they are comfortable with.

Acropolis for example is fully supported by Microsoft (see Microsoft SVVP) and can be deployed in <30mins with just a few clicks and its free with Nutanix, so the cost and complexity arguments against virtualization of Exchange just went out the window and Exchange would have all the benefits of things like Virtual Machine High Availability, Migration (vMotion),

Please to hear everyone’s thoughts.

Advanced Storage Performance Monitoring with Nutanix

Nutanix provides excellent performance monitoring and analytic capabilities through our HTML 5 based PRISM UI, but what if you want to delve deeper into the performance of a specific business critical application?

Nutanix also provides advanced storage performance monitoring and workload profiling through port 2009 on any CVM which shows very granular details for Virtual disks.

By default, Nutanix secures our CVM and the http://CVM_IP:2009 page is not accessible, but for advanced troubleshooting this can be enabled by using the following command.

sudo iptables -t filter -A WORLDLIST -p tcp -m tcp –dport 2009 -j ACCEPT

 

When accessing the 2009 page (which is part of the Nutanix process called “Stargate”) you will see things like Extent (In Memory Read) cache usages and hits as well as much more.

On the main 2009 page you will see a section called “Hosted VDisks” (shown below) which shows all the current VDisks (equivalent of a VMDK in ESXi) which are currently running on that node.

HostedvDisks

 

The Hosted VDisks shows high level details about the VDisk such as Outstanding Operations, capacity usage, Read/Write breakdown and how much data is in the OpLog (Persistent Write Cache).

If you need more information, you can click on the “VDisk Id” and you will get to a page titled “VDisk XXXXX Stats” where the XXXXX is the VDisk ID.

The below is some of the information which can be discovered in the VDisk Stats Page.

VDisk Working Set Size (WWS)

The working set size can be thought of as the data which you would ideally want to fit within the SSD tier of a Nutanix node, which would result in all-flash type performance.

In the below example, in the last 2mins, the VDisk had a combined (or Union) working set of 6.208GB and over the last 1hr over 111GB.

WSSExchange

 

 

VDisk Read Source

The Read Source is simply what tier of storage is servicing the VDisks IO requests. In the below example, 41% was from Extent Cache (In Memory), 7% was from the SSD Extent Store and 52% was from the SATA Extent Store.
ReadSource

 

In the above example, this was an Exchange 2013 workload where the total dataset was approx 5x the size of the SSD tier. The important point here is its not always possible to have all data in the SSD tier, but its critical to ensure consistent performance. If 90% was being served from SATA and performance was not acceptable, you could use this information to select a better node to migrate (vMotion) the VM too, or help choose to purchase a new node.

VDisk Write Destination

The Write Destination is fairly self explanatory, if its Oplog it means its Random IO and its being written to SSD, if its straight to the extent store (SSD) it means the IO is either sequential, OR in rare cases the OpLog is being bypassed if the SSD tier reached 95% full (which is generally prevented by Nutanix ILM tiering process).

WriteDestination

VDisk Write Size Distribution

The Write Size Distribution is key to determining things like the Windows Allocation Size when formatting drives as well as understanding the workload.

WriteSizeOverall

VDisk Read Size Distribution

The Read Size Distribution is similar to Write Size in that its key to determining things like the Windows Allocation Size when formatting drives as well as understanding the workload. In this case, a 64k allocation size would be ideal as both the Write (shown above) and the Read (below) are >32K and <64K 86% of the time. (Which is expected as this was an Exchange 2013 workload).

ReadSizeExchange

VDisk Write Latency

The Write Latency shows the percentage of Write I/O which are serviced within the latency ranges shown. In this case, 52% of writes are sub-millisecond. It also shows for this vDisk 1% of IO being outliers being served between 5-10ms. This is something that outside of a lab, if the outliers were a significant percentage that could be investigated to ensure the VM disk configuration (e.g.: PVSCSI and number of VMDKs) is optimal.

WriteLatency

VDisk Ops and Randomness

Here we see the number of IOPS, the Read/Write split, MB/s and the split between Random and Sequential.

vDisksOps

Summary

For any enterprise grade storage solution, it is important that performance monitoring be easy as it is with Nutanix via PRISM UI, but also to be able to quickly and easily dive deep into very granular details about a specific VM or VDisk. The above shows just a glimpse of the information which is tracked by default for all VDisks allowing customers , partners and Nutanix support to quickly and easily monitor & profile workloads.

Importantly these capabilities are hypervisor agnostic giving customers the same capabilities no matter what choice/s they make.

 

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

3. Advanced Storage Performance Monitoring with Nutanix

4. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

5. Nutanix – Erasure Coding (EC-X) Deep Dive

6. Acropolis: VM High Availability (HA)

7. Acropolis: Scalability

8. NOS & Hypervisor Upgrade Resiliency in PRISM

Nutanix – Erasure Coding (EC-X) Deep Dive

I published a post earlier this month during the .NEXT conference titled “What’s .NEXT? – Erasure Coding!” which covered the basics of Nutanix EC-X implementation.

This post is a deep drive follow on to answer numerous questions I have received about EC-X such as:

1. Does it work with Compression and De-duplication?
2. Can I use EC-X to reduce the overhead of RF3?
3. Does it work on Hot or Cold data?
4. Does it work only on the SATA tier?
5. What is the performance impact?
6. When should I use/not use EC-X?
7. What’s different about Nutanix (Patent pending) EC-X compared to other EC algorithms?
8. How does EC-X impact Data Locality?
9. What Hypervisors is EC-X supported with?

So let’s start with What’s different about Nutanix (Patent pending) EC-X compared to other EC algorithms?

* Nutanix EC-X is optimized for a distributed platform, where data is spread across nodes, not individual disks to ensure optimal performance. This also ensures rebuild times are faster and lower impact as the rebuild is performed across all the nodes/drives.

* Nutanix EC-X is also performed as a background task and only on Write Cold data meaning the configured RF is completed as normal and then as a post process EC-X is performed to ensure the write process is not potentially slowed by requiring numerous nodes within the cluster to participate in the initial write I/O.

How does EC-X affect existing Nutanix Data Reduction technologies.

* Short answer, EC-X is complimentary to both compression and deduplication so you will get even more data reduction. Here is a sample screen shot from the Home screen in PRISM which shows a breakdown of Dedup, Compression and Erasure Coding savings.

CapacityOptimization

In the Storage Tab within PRISM, we can get further details on the capacity savings. Here we see an example Container with Compression and EC-X enabled:

CompplusECXhighlighted

Does it work only on the SATA tier?

No, EC-X works on all tiers, being SSD and SATA today, but in the future when newer technology or more than two tiers are used, EC-X works across all tiers.

Does EC-X work on Hot or Cold data?

EC-X waits until data written (via RF2 or RF3) is “Write Cold”, meaning the data is not being overwritten. The data might be white hot from a read I/O perspective, but as long as its not being overwritten the extent group (4MB) will be a candidate for EC-X.

This means for data which is Write Cold, the effective capacity of the SSD tier will be increased due to requiring less space thanks to EC-X.

What is the performance impact?

As EC-X is a post process task and EC-X waits until data is “Write Cold” before performing EC-X on the data, in general it will not impact the Write performance.

The exception to this is in the event data is Write Cold for a period of time, then the data is overwritten, this “overwrite” will incur a higher penalty than a typical RF2/RF3 write. As such some workloads may not be suitable for EC-X which I will discuss later.

Overall, if the workload is suitable, EC-X will keep the data in the SSD tier and the parity on the SATA tier which effectively extends the usable capacity of the SSD tier therefore helping to increase performance (as with compression and dedup).

What Hypervisors is EC-X supported with?

Everything in the Nutanix Distributed Storage Fabric (part of the Nutanix Xtreme Computing Platform or XCP) is designed to be hypervisor agnostic. So whatever Hypervisor/s you choose, you can benefit from EC-X!

How does EC-X impact Data Locality?

As the initial Write path is not impacted by enabling EC-X, Data Locality is still maintained and ensures one copy of data is written to the local node where the VM is running while replicating a further one or two copies (dependent on RF configuration) throughout the cluster.

This means that for newly written data as well as data being overwritten at frequencies of <60mins will always maintain data locality.

For data which meets the criteria for EC-X to be performed, such as Read Hot or Write Cold data, Data Locality can only be partially maintained as the data is by design striped across nodes. The result of this means that it is probable Read I/O will be performed over the network.

Importantly though Read Hot data will be maintained in the SSD tier and be distributed throughout the cluster. This means a single VMs read I/O can be served by multiple nodes concurrently which can lead to increased performance.

As EC-X also provides capacity savings, this allows for more data to be serviced by the SSD tier which enabled a larger active working set to perform at SSD speeds.

In summary, while Data Locality is not always maintained when using EC-X, the advantages of EC-X far outweigh the partial loss in Data Locality.

And finally, When should I use/not use EC-X?

As discussed earlier, EC-X is applied to Write Cold data and if/when that data is overwritten, the write penalty is higher than a typical RF2 write I/O. So if your dataset has a high percentage of overwrites, it is recommended not to use EC-X. The good news is storage can be assigned on a per VMDK level (or vDisk at the NDFS layer) so you can have one VM using EC-X for some data and RF2/3 for other data, again giving customers the best of both worlds.

The best workloads for EC-X are:

1. File Servers
2. Backup
3. Archive
4. Email
5. Logging

Summary:

Nutanix EC-X gives customers more choice without compromising functionality and performance while dramatically reduces the cost/GB of storage.

Related Articles:

  1. Large scale clusters and increased resiliency with RF3 + EC-X
  2. What I/O will Nutanix Erasure coding (EC-X) take effect on?

  3. Sizing assumptions for solutions with Erasure Coding (EC-X)