How to successfully Virtualize MS Exchange – Part 11 – Types of Datastores

Datastores are a logical construct which allows DAS,SAN or NAS storage is presented to ESXi. In the case of SAN and NAS storage it is generally “shared storage” which enabled virtualization features such as HA, DRS and vMotion.

When storage is presented to ESXi from DAS or SAN (block based) storage it is formatted with VMFS (Virtual Machine File System) and when storage is presented via file storage (NFS), it is presented to ESXi as an NFS mount.

Regardless of datastores being presented via block (iSCSI,FC,FCoE) or file based (NFS) protocols, they both host VMDKs (Virtual Machine Disks) which are block based storage. In the case of NFS, the SCSI commands are emulated by the hypervisor. This process is explained in Emulation of the SCSI Protocol and can be compared to Hyper-V SMB 3.0 (File) storage with VHDX which also emulates SCSI commands over File (SMB 3.0) storage.

The following diagram is courtesy of http://pubs.vmware.com and shows “host1” and “host2” runnings VMs across VMFS (block) and NFS (file) datastores. Note the VMs residing on datastore1 and datastore2 all have .vmx and .vmdk files and operate in the exact same way from the perspective of the VM, Guest OS and applications.

GUID-AD71704F-67E4-4AC2-9C22-10B531755566-high

The next paragraph is controversial and may be hotly debated, but to the best of my knowledge and the countless industry experts (from several different vendors) I have investigated this with over the last year including VMware’s formal position, it is completely true and I welcome any credible and detailed evidence to the contrary! (I even asked this question of Microsoft here).

Using either VMFS or NFS datastores meets the technical requirements for Exchange, being Write Ordering, Forced Unit Access (FUA) and SCSI abort/reset commands and because drives within Windows are formatted with NTFS which is a journalling file system as such the requirement to protect against Torn I/O is also achieved.

With that being said, Microsoft currently do not support Exchange running in VMDKs on NFS datastores.

The below is a quote from Exchange 2013 storage configuration options outlining the storage support statement for MS Exchange with the underlined section applying to NFS datastores.

All storage used by Exchange for storage of Exchange data must be block-level storage because Exchange 2013 doesn’t support the use of NAS volumes, other than in the SMB 3.0 scenario outlined in the topic Exchange 2013 virtualization. Also, in a virtualized environment, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.

If your interested in finding out more information about MS Exchange running in VMDKs on NFS datastores, see the links at the end of this post.

Now let’s discuss the limitations of Datastore’s and what impact they have on vSphere environments with MS Exchange deployments and why.

Number of LUNs / NFS Mounts : 256

This can be a significant constraint when using one or multiple datastore/s per Exchange MBX/MSR VM however in my opinion this should not be necessary nor is it recommended.

Generally Exchange VMDKs can be mixed with other VMs in the same datastore providing there is not a performance constraint. As such keep high I/O VMs (including other Exchange VMs) in different datastores.

As discussed in Part 10, if legacy per LUN snapshot based backup solutions are being used then in-Guest iSCSI may have to be used but for new deployments especially where new storage will be purchased, per LUN solutions should not be considered!

Number of Paths per ESXi host : 1024

If using VMFS datastores, the simple fact is 4 paths per LUN is the maximum you can use if you plan to reach or near the limit of 256 datastores. This is not a performance limiting factor with any enterprise grade storage solution.

Number of Paths per LUN : 32

If you configure 32 paths per LUN, you straight away restrict yourself to 32 LUNs per ESXi host (and vSphere cluster) so don’t do it! As mentioned earlier 4 paths per LUN is the maximum if you plan to reach 256 datastores. Do the math and this limit is not a problem.

Number of Paths per NFS mount : N/A

NFS mounts connect over IP to the Storage Controllers via vNetworking so there is no maximum as such although with NFS v3 only one physical NIC will be used at a time per IP subnet. This topic will be covered in a future post on vNetworking for MS Exchange.

VMFS Datastore Maximum Size : 64TB

256 LUNs x 64TB = 16,384TB per vSphere HA cluster. This is not a problem.

NFS Datastore Maximum Size : Varies depending on vendor

The limit depends on vendor but its typically higher than the VMFS limit of 64TB per mount, with some vendors not having a limit.

So you’re safe to assume >=16.384TB but always check with your current or potential storage vendor.

ESXi hosts per Volume (Datastore) : 64 (Note: HA cluster limit of 32)

As a vSphere cluster is limited to 32 currently, this limitation isn’t really an issue. With vSphere 6.0 it is expected cluster size will increase to 64 but ESXi hosts per Volume is not a maximum I have ever heard of being reached.

Recommendations:

1. If you require a fully supported configuration use VMFS datastores.
2. Maximum 4 paths per LUN to ensure maximum scalability (if required).
3. Consider the underlying storage configuration and type of a datastore before deploying MS Exchange.
4. Do not deployment MS Exchange VMDKs onto datastores with other high I/O workloads
5. When mixing workloads on a datastore, enable SIOC to ensure fairness between workloads in the event of storage contention.
6. Spread Exchange VMDKs across multiple datastores for maximum performance and resiliency. e.g.: 12 VMDKs per Exchange MBX/MSR VM across 4 mixed workload datastores.
7. Do not use dedicated datastore/s per MS Exchange database or VM. (This is unnecessary from a performance perspective)
8. If choosing to use NFS Datastores, purchase Premier Support from Microsoft and negotiate support for NFS. Microsoft do provide support for many customers with Premier Support with Exchange running on NFS datastores although it is not their preference.

On a final note, in future posts I will be discussing in detail the underlying storage from a performance and availability perspective with Database availability Groups in mind.

Thank you to @mattliebowitz for reviewing this post. I highly recommend his book Virtualizing Business Critical Applications by VMware Press. I purchase and reviewed this book mid 2014, well worth a read!

Back to the Index of How to successfully Virtualize MS Exchange.

Articles on MS Exchange running in VMDK on NFS datastores

1. “Support for Exchange Databases running within VMDKs on NFS datastores

2. Microsoft Exchange Improvements Suggestions Forum – Exchange on NFS/SMB

3. What does Exchange running in a VMDK on NFS datastore look like to the Guest OS?

4. Integrity of I/O for VMs on NFS Datastores Series

Part 1 – Emulation of the SCSI Protocol
Part 2 – Forced Unit Access (FUA) & Write Through
Part 3 – Write Ordering
Part 4 – Torn Writes
Part 5 – Data Corruption

How to successfully Virtualize MS Exchange – Part 5 – High Availability (HA)

HA has two main configuration options which can significantly impact the availability and consolidation of any vSphere environment but can have an even higher impact when talking about Business Critical Applications such as MS Exchange.

When considering MS Exchange MBX or MSR VMs can be very large in terms of vCPU and vRAM, understanding and choosing an appropriate setting is critical for the success of not only the MS Exchange deployment but any other VMs which are sharing the same HA cluster.

Let’s start with the “Admission Control Setting“.

Admission control can be configured in either “Enabled” or “Disabled” mode. “Enabled” means that if the Admission Control Policy (discussed later in this post) is going to be breached by powering on one or more VMs, the VM will not be permitted to power on, which guarantees a minimum level of performance for the running VMs.

If the setting is “Disabled” it means no matter what, VMs will be powered on. In this situation, it leads to the possibility of significant contention for compute resources which for MS Exchange MBX or MSR VMs would not be ideal.

As a result, it is my strong recommendation that the “Admission Control Setting” be set to “Enabled”.

Next lets discuss the “Admission Control Policy“.

There are three policies to choose from (shown below) each with their pros and cons.

AdmissionControlPolicies

1. Host failures the cluster tolerates

This option is the default and most conservative option. However it calculates the utilization of the cluster using what many describe as a very inefficient algorithm using what is called “slot sizes”.

A slot size is calculated as the largest VM from a vCPU perspective, AND the largest VM from a vRAM perspective and combines the two. Then the cluster will calculate how many “slots” the cluster can support.

The issue with this is for environments with a range of VM sizes, a small VM of 1vCPU and 1Gb RAM uses 1 slot, as would a VM with 8vCPU & 64Gb VM. This leads to the cluster having very low consolidation ratio and leads to unnecessary high numbers of ESXi hosts and underutilization.

As such, this is not recommended for environments with mixed VM sizes, such as MS Exchange MBX or MSR combined with VMs such as Domain Controllers.

2. Specify failover hosts

Specify failover hosts is a very easy setting to understand. You specify a failover host, and it does exactly that, acts as a failover host so if one host fails, all the VMs fail over onto the failover host.

Great, but the ESXi host then remains powered on doing nothing until such time there is a failure. So the fail-over HW provides no value during normal operations.

As such, this setting is not recommended.

3. Percentage of cluster resources reserved as failover spare capacity

This setting also is fairly easy to understand at a high level, although under the covers is more complicated and does not work how many people believe it does.

With that being said, it is a very efficient policy for environments with large VMs like Exchange MBX or MSR.

It avoids the inefficient “slot size” calculation, and works on virtual machines reservations to calculate cluster capacity.

For VMs with no reservation, 32Mhz and 0Mb ram (plus memory overhead) is used from vSphere 5.0 onwards. However for Exchange MBX/MSR VMs which as discussed in Part 3 should have Memory Reservations, HA will then use the full reserved memory to ensure sufficient cluster capacity for the Exchange VM to fail over without impacting memory performance. Now this is great news as we don’t want to overcommit memory for Exchange even in a failure scenario.

From a CPU perspective, 32Mhz will be the default reserved for any Exchange MBX or MSR VM which does not have a CPU reservation, so it makes sense from a HA perspective to use CPU reservations for Exchange VMs to ensure sufficient capacity exists within the cluster to tolerate an ESXi host failure.

CPU reservations will be discussed in more detail in a future post in this series.

As a result, I recommend using “Percentage of cluster resources reserved as failover spare capacity” for the admission control policy for Exchange environments.

Next we need to discuss what is the most suitable percentage to set for CPU and RAM.

The below table shows the required percentage for N+1 (Green) and N+2 (Blue) deployments based on the number of nodes in a vSphere HA cluster.

Table 1:

AdmissionControlPercetage

The above is generally what I recommend as N+2 provides excellent availability, including being able to tolerate a failure during maintenance or multiple host failures concurrently with little or no impact to performance after VMs restart.

So for clusters of sub 16 ESXi hosts, N+1 can be considered, but I recommended N+2 for greater than 16 ESXi host clusters.

The next table shows the required percentage for a cluster scaling from N+1 availability for up to 8 hosts, N+2 for up to 16 host, N+3 for up to 24 hosts and N+4 for the current maximum vSphere cluster of 32.

Table 2:

originalhapercetages

Its safe to say the above table is quite a conservative option (going up to N+4), however depending on business requirements these HA reservation values may be perfectly suited and are worth considering.

For more information see:
1. Example Architectural Decision – Admission Control Setting and Policy
2. Example Architectural Decision – VMware HA – Percentage of Cluster Resources Reserved for HA

Next lets discuss the “HA Virtual Machine Options“.

The below shows the “Cluster default settings” along with the “Virtual Machine settings” which allow you to override the cluster settings.

HAVMoptions

For the “VM restart priority”, I recommend leaving the “Cluster default setting” as “Medium” (Default).

For “Host Isolation Response” this heavily depends on your underlying storage and availability requirements, as such, I will address this setting in detail later in this series.

For the “VM Restart Priority” under “Virtual Machine Settings“, we have a number of options. If a DAG is being used, one option would be to Disable VM Restart and depend solely on the DAG for availability.

This has the advantage of reducing the compute requirements for the cluster to satisfy HA and gives the same level of availability as a DAG, which in many cases will meet the customers requirements.

Alternatively, the Exchange MBX or MSR VMs could be configured to “High” to ensure they are started asap following a failure, above less critical VMs such as Testing/Development.

Regarding “Datastore Heartbeating” and “VM Monitoring“, these will be discussed in future posts.

Recommendations for HA:

1. The “Admission Control Setting” be set to “Enabled”.
2. The “Admission control policy” to set to “Percentage of cluster resources reserved as failover spare capacity”
3. The “percentage of cluster resources reserved as failover spare capacity” be configured as per Table 1 (at a minimum).

Recommendations for HA Virtual Machine Options:

1. Do not disable HA restart for Exchange MBX or MSR VMs
2. Leave “HA restart priority” for Exchange MBX or MSR VMs to “Normal” for DAG deployments
3. Set “HA restart priority” for Exchange MBX or MSR VMs to “High” for non DAG deployments

Back to the Index of How to successfully Virtualize MS Exchange.

Virtual machines lose network connectivity after HA fail over. (ESXi 5.5)

If you’re running ESXi 5.5 pre Release 2068190 (Update 2) and have not upgraded from GA release or are avoided upgrading due to the NFS issue in Update 1 and are running on a vSphere Distributed Switch (VDS) then read on:

When configured with Static Port Binding, which is the default and recommended port binding setting (See VMware KB: 1022312) after a HA event, a problem has been discovered which prevents VMs from connecting to the network and the VM starts up with its vmNIC in a “Disconnected” state.

You may receive the following error when trying to reconnect the vmNIC.

DVSfailure2

If you are having this problem you can connect the VMs to a dvPortGroup which uses Ephemeral Binding to get the environment online. This could be used for VMs such as Domain Controllers and vCenter (and its dependencies) then once these VMs are online, this will allow all other VMs to connect to the network normally.

Once everything is back online, I recommend you connect all VMs back to their original dvPortGroup/s with Static Binding.

If you don’t have a dvPortGroup with Ephemeral binding, create a Standard vSwitch and connect a single NIC to it, follow the same process, then migrate the VMs back to the dvPortGroup once they are online and return the pNIC to the dvSwitch.

To future proof the environment, you may choose to create a dvPortGroup for the Infrastructure VMs and use Ephemeral Binding, or just have a dvPortGroup with Ephemeral binding ready to use just in case.

The good news is VMware have already resolved this issue in ESXi Update 2 (Release: 2068190) so I recommend bypassing Update 1 (due to the NFS bug) and going straight to Update 2 which means you will avoid the issue altogether.