How to successfully Virtualize MS Exchange – Part 16 – Virtual Disk Provisioning Types

Once you have made the decision on a storage platform, and assuming you have chosen to use VMFS or NFS datastores, the next decision is: how should my VMDKs be provisioned?

The VMware Exchange 2013 Best Practice Guide makes no mention of disk provisioning options, nor does it make any recommendations; however, you're in luck, as we will cover all the options along with their pros and cons here.

For Exchange 2010, Microsoft state in Understanding Exchange 2010 Virtualization:

Virtual disks that dynamically expand aren’t supported by Exchange.

Virtual disks that use differencing or delta mechanisms (such as Hyper-V’s differencing VHDs or snapshots) aren’t supported.

However, I have been unable to find confirmation of whether or not this has changed for Exchange 2013. The Exchange 2013 storage configuration options document does state that thin provisioning for Storage Spaces is supported, but it does not state whether any other form of thin provisioning is or is not supported.

While technically not supported for Exchange 2010, there are plenty of experts who understand and recommend thin provisioning, including Exchange MCM and MVP Dustin Smith, who in this video talks about some of the considerations and benefits of thin provisioning for Exchange 2010.

Now on to the topic at hand:

When creating a Virtual Machine, VMDKs can be provisioned in one of three ways:

1. Thick Provisioned Lazy Zeroed
2. Thick Provisioned Eager Zeroed
3. Thin Provisioned

Starting with Thick Provisioned Lazy Zeroed: the VMDK is thick provisioned, but blocks are only zeroed in a just-in-time fashion (i.e.: on first write).

The advantages of Thick Provisioned Lazy Zeroed VMDKs include:

1. Faster VM creation time than Eager Zeroed Thick (the difference is minimal if the storage supports the VAAI Write Same primitive)
2. The entire VMDK's capacity is reserved, making capacity planning easier than with Thin Provisioning

The disadvantages of Thick Provisioned Lazy Zeroed VMDKs include:

1. Slower provisioning than Thin Provisioning (although the difference is generally minimal)
2. The entire VMDK's capacity is reserved and unavailable for use by other virtual machines.

With Thick Provisioned Eager Zeroed (EZT), the VMDK is thick provisioned and all blocks are zeroed at the time of creation. Eager Zeroed Thick VMDKs are supported on all VMFS datastores and on NFS datastores which support the VAAI-NAS Reserve Space primitive.

The advantages of EZT VMDKs these days are really minimal but include:

1. Supporting Oracle RAC and VMware Fault Tolerance (neither being applicable to Exchange)
2. Increased performance versus Lazy and Thin Provisioned VMDKs (but more on this topic later).

However there are a number of downsides to this method which include:

1. Slower VM creation times. The time depends on the size of the VMDK/s being created and the speed of your storage, as every GB needs to be zeroed, just like performing a Full (not Quick) format on a physical server.

Note: Storage arrays that support VAAI with the "Write Same" primitive can offload the zeroing to the array, reducing the load on the ESXi host and speeding up provisioning time dramatically.

2. Increased potential for wasted capacity on a datastore.

3. Free space within VMDKs cannot be shared with other VMs, which requires every VMDK to have some free space (generally >10% is recommended) to ensure the VM does not run out of space.

Lastly there is Thin Provisioning, which means the VMDK only consumes the amount of space that data is actually written to, and before each write to a new block, the block must be zeroed.

The advantages of Thin Provisioning VMDKs include:

1. You can create larger VMDKs with no space utilization penalty making capacity planning and growth easier.
2. Reduce wasted or unused space on the storage
3. Allows for disk space to be overcommitted ensuring maximum utilization and flexibility.
4. Free space in VMDKs is not wasted on the datastore reducing capacity requirements compared to Eager and Lazy Zeroed VMDKs.
5. The impact of SCSI reservations (VMFS datastores ONLY) causing performance issues (increased latency) when thin provisioned VMDKs grow is no longer a concern, as the VAAI Atomic Test & Set (ATS) primitive alleviates SCSI reservation contention.
6. Thin provisioned VMs reduce the overhead for Storage vMotion, Cloning and Snapshot activities. E.g.: For Storage vMotion it eliminates the requirement for Storage vMotion (or the array, when offloaded by the VAAI XCOPY primitive) to relocate "white space". Note: Storage vMotion should rarely if ever be required for Exchange VMs.
7. Thin provisioning leaves maximum available free space on the physical spindles which should improve performance of the storage subsystem as a whole.

The disadvantages of thin provisioning include:

1. Increased risk of running out of space on a datastore or underlying storage array.
2. Additional write penalty of zeroing a block before writing to it. (again more on performance later in this post).
3. Increased importance of monitoring storage capacity utilization.
4. Not supported for Exchange 2010. Note: There is no technical inhibitor to using Thin Provisioning, but supported options are obviously preferable.
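Before weighing up these pros and cons, it can be useful to audit which provisioning type your existing VMDKs are actually using. The vSphere API exposes this on each virtual disk's backing, and the following is a minimal sketch using the pyVmomi Python SDK. The vCenter address, credentials and SSL handling are placeholders for illustration only, and depending on your pyVmomi version the connection arguments may differ slightly.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details - replace with your own vCenter and credentials.
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="VMware1!",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Walk every VM in the inventory and classify each of its virtual disks.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
for vm in view.view:
    if vm.config is None:
        continue  # skip templates / inaccessible VMs
    for device in vm.config.hardware.device:
        if not isinstance(device, vim.vm.device.VirtualDisk):
            continue
        backing = device.backing
        # Flat VMDK backings expose thinProvisioned and eagerlyScrub flags.
        if getattr(backing, "thinProvisioned", False):
            disk_type = "Thin Provisioned"
        elif getattr(backing, "eagerlyScrub", False):
            disk_type = "Thick Provisioned Eager Zeroed"
        else:
            disk_type = "Thick Provisioned Lazy Zeroed"
        size_gb = device.capacityInKB / (1024 * 1024)
        print(f"{vm.name} / {device.deviceInfo.label}: {size_gb:.0f} GB, {disk_type}")

Disconnect(si)
```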

All in all, @FrankDenneman (VCDX #29) sums it up perfectly with his article Thin or thick disks? – it's about management not performance. I would also suggest considering all other workloads in the environment, not just Exchange, when making decisions about Thin Provisioning, as it can be very beneficial and deliver huge cost savings (especially CAPEX) when purchasing new equipment.

Which brings us to our next topic, Thin Vs Thick Provisioning Performance!

There have been many recommendations not to use Thin Provisioning due to the performance impact of zeroing a block before writing to it. This recommendation has been around for a long time and, like the VMDK on NFS debate, appears to have strong opinions on both sides.

Now for the facts!

From a performance perspective most people are surprised to learn there is no significant performance advantage to using Thick Provisioned (Eager or Lazy Zeroed) VMDKs compared to Thin Provisioned disks.

In addition, with Exchange 2010 reducing I/O by around 50% compared to Exchange 2007, and Exchange 2013 reducing it by another 50% compared to 2010, Exchange is no longer the storage I/O heavy monster it once was.

VMware conducted a Performance Study of VMware vStorage Thin Provisioning back in the ESXi 4.0 days (~2009) which I will briefly summarize.

On page 6 of the performance study, the following graph shows the difference in performance between Thin and Thick VMDKs during zeroing and post-zeroing.

As you can see the performance is almost identical.

[Image: Thin vs Thick VMDK performance scaling during zeroing and post-zeroing]

The next chart, also from page 6, shows a comparison of throughput between Thin and Thick VMDKs. Again we see the difference is insignificant.

[Image: Aggregate throughput comparison of Thick vs Thin VMDKs]

As there is no significant performance impact from using Thin Provisioning, performance should no longer be considered an objection to using it!

I recommend taking advantage of the flexibility of using Thin Provisioning and creating larger Thin Provisioned VMDKs which can help simplify capacity management from a VM/OS and application perspective as well as making growth easier for Exchange as mailbox sizes increase over time.


When using thin provisioning, always ensure alerting is properly set up with early warning on your vSphere environment AND the underlying storage, so you are advised when the capacity of a datastore, underlying LUN/NFS mount or storage array is running low and can remediate it.
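vSphere datastore alarms and your storage vendor's own tooling should be the first port of call here, but as an illustration of the kind of early-warning check you want in place, below is a minimal pyVmomi sketch that flags any datastore above a chosen utilisation threshold. The 80% threshold, vCenter address and credentials are example values only.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

THRESHOLD = 0.80  # example value: warn when a datastore is more than 80% full

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="VMware1!",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)
for ds in view.view:
    summary = ds.summary
    used = summary.capacity - summary.freeSpace
    utilisation = used / summary.capacity
    # summary.uncommitted shows space promised to thin disks but not yet written,
    # which is useful for tracking overcommitment on thin provisioned datastores.
    if utilisation > THRESHOLD:
        # Hook this into your alerting tool of choice (email, SNMP trap, etc.)
        print(f"WARNING: {summary.name} is {utilisation:.0%} full, "
              f"only {summary.freeSpace / 1024**3:.0f} GB free")

Disconnect(si)
```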

In an upcoming post I will discuss the underlying storage, including provisioning type for LUNs and NFS mounts (i.e.: Thin on Thick / Thin on Thin / Thick on Thick and Thick on Thin).

Recommendations for VMDK provisioning:

1. Check with your storage vendor and unless they have solid justification for not using Thin Provisioning OR you have an operational constraint preventing it, use Thin Provisioned VMDKs. (The pros outweigh the cons in my opinion)
2. When using Thin Provisioning create larger VMDKs to simplify capacity management at the VM and OS/Application layer.
3. When using Thick or Thin provisioning, ensure you test performance using Jetstress and LoadGen with the same provisioning type.
4. Ensure alerting is configured and working to monitor capacity utilization especially when using thin provisioned VMDKs.

More Information on VMDK and Datastore provisioning options:

1. Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

2. Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning (Thin on Thick)

Back to the Index of How to successfully Virtualize MS Exchange.

Ensuring Data Integrity with Nutanix – Part 2 – Forced Unit Access (FUA) & Write Through

In the Integrity of Write I/O for VMs on NFS Datastores Series, I discussed Forced Unit Access (FUA) and Write Through in part 2 which covered a vendor agnostic view of FUA and Write through.

In this series, the goal is to explain how Nutanix can guarantee data integrity and how this is Nutanix's number one priority. In addition, I want to show how Nutanix supports Business Critical Applications such as MS SQL and MS Exchange, which have strict storage requirements such as write ordering, Forced Unit Access (FUA) and SCSI abort/reset commands, and how it protects against Torn I/O.

Note: From Windows Server 2012 onwards, FUA is no longer used in favour of issuing a "Flush" of the drive's write cache. However this change makes no difference to Nutanix environments, because regardless of whether FUA or a Flush is used, write I/O is not acknowledged until it is written to persistent media on 2 or more nodes, which will be explained further later in this post.
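To illustrate what a "Flush" means from the application's side, here is a small, generic Python example (not Exchange-specific, and purely illustrative) of a write that is only treated as durable once the operating system has been asked to flush it to stable media, which is the guarantee both FUA and the newer Flush mechanism are intended to provide.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data and only return once the OS has been asked to flush it to
    stable media - the guarantee FUA and the newer Flush mechanism both target."""
    with open(path, "wb") as f:
        f.write(data)         # data may still be sitting in OS / drive write caches
        f.flush()             # push Python's userspace buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush its caches through to the device
    # Only now should the write be treated as durable - and even then the storage
    # underneath must honour the flush, which is exactly what this post is about.

durable_write("example.bin", b"transaction log record")
```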

Currently, MS Exchange is not supported to run in a VMDK on an NFS datastore, although interestingly Active Directory and MS SQL Server, which have the exact same storage requirements (discussed earlier), are supported. This post will show why Microsoft should allow storage vendors to certify Exchange in VMDK on NFS datastore deployments by proving compliance with the storage requirements stated earlier.

Note: Nutanix provides support for Exchange 2010/2013 deployments in VMDKs on NFS datastores. Customers can find this support statement on http://portal.nutanix.com/ under article number 000001303.

Firstly I would like to start by stating that FUA is fully supported by VMware ESXi.

In the Microsoft article, Deploying Transactional NTFS, it states:

“The caching control mechanism used by TxF is a flag known as the Force Unit Access (FUA) function. This flag specifies that the drive should write the data to stable media storage before signaling complete.”

Nutanix meets this requirement as all writes are written to persistent media (SSD) on at least two independent nodes, and no write caching is performed at any layer, including the Nutanix Controller VM (CVM), the physical storage controller card or the physical drives themselves.

For more information on how Nutanix is compliant with this requirement click here.

The article also states:

“Some Host Bus Adapters (HBAs) and storage controllers (for example, RAID systems) have built-in battery-backed caches. Because these devices preserve cached data if a power fault occurs, any disks connected to them are not required to honor the FUA flag. Further, a disk whose power supply is protected by an uninterruptable power supply (UPS) does not need to honor the FUA flag. This is because the UPS will maintain power long enough for the disk to flush its cache to the media.”

As discussed with the previous requirement, Nutanix meets this requirement as the write acknowledgement is not given until writes are successfully committed to persistent storage on at least two nodes. As a result, even without a UPS, data integrity can be guaranteed in a Nutanix environment.

For more information on how Nutanix is compliant with this requirement click here.

Another key point in the article is:

“Disabling a drive’s write cache eliminates the requirement for the drive to honor the FUA flag.”

All physical drives (SSD and SATA) in Nutanix nodes have their write cache disabled, therefore removing the requirement for FUA.

The article concludes with the following:

“Note  For TxF to be capable of consistently protecting your data’s integrity through power faults, the system must satisfy at least one of the following criteria:

 

1. Use server-class disks (SCSI, Fiber Channel)

2. Make sure the disks are connected to a battery-backed caching HBA.

3. Use a storage controller (for example, RAID system) as the storage device.

4. Ensure power to the disk is protected by a UPS.

5. Ensure that the disk’s write caching feature is disabled.”

We have already discussed that because Nutanix does not use a non-persistent write cache, there is no requirement for the OS to issue the FUA flag, or the Flush command in Windows 2012, to ensure data is written to persistent media. But for fun, let's see how many of the above criteria Nutanix is compliant with.

1. YES – Nutanix uses enterprise grade Intel S3700 SSDs for all write I/O
2. N/A – There is no need for battery backed caching HBAs due to Nutanix write acknowledgement not being given until written to persistent media on two or more nodes
3. YES – Nutanix Distributed File System (NDFS) with Resiliency Factor (RF) 2 or 3
4. Recommended to ensure system uptime but not required to ensure data integrity as writes are not acknowledged until written to persistent media on two or more nodes
5. YES – All write caching features are disabled on all SSDs/HDDs

So to meet Microsoft’s FUA requirements, only one of the above is required. Nutanix meets 3 out of 5 outright, with a 4th being Recommended (but not required) and the final requirement not being applicable.

Write Cache and Write Acknowledgements.

Nutanix does not use a non-persistent write cache, period.

When an I/O is issued in a Nutanix environment, if it is random, it is sent to the "OpLog", which is a persistent write buffer stored on SSD.

If the I/O is sequential, it is sent straight to the Extent Store which is persistent data storage, also located on SSD.

Both Random and Sequential I/O flows are shown in the below diagram from The Nutanix Bible by @StevenPoitras.

[Image: NDFS I/O path diagram showing the Random and Sequential write flows]

All Writes are also protected by Resiliency Factor (RF) of 2 or 3, meaning 2 or 3 copies of the data are synchronously replicated to other Nutanix nodes within the cluster prior to the write being acknowledged.

To be clear, write acknowledgements are NOT sent until the data is written to the OpLog or Extent Store of 2 or 3 nodes (depending on the configured RF). This means the intent of Forced Unit Access (FUA) is achieved, as every write is written to persistent media before the write acknowledgement is sent, regardless of whether FUA (or a Flush) is issued by the OS.
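To make that ordering explicit, below is a deliberately simplified, conceptual Python sketch of the "commit to persistent media on RF nodes, then acknowledge" pattern described above. This is a toy illustration of the principle only, not Nutanix code; the ToyNode class and file-based "persistent media" are invented purely for the example.

```python
import os

class ToyNode:
    """Toy stand-in for a storage node: 'persistent media' is just a local file."""
    def __init__(self, name: str):
        self.path = f"{name}.oplog"

    def commit_to_persistent_media(self, data: bytes) -> None:
        with open(self.path, "ab") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # do not return until the data is on stable media

def write_with_rf(data: bytes, nodes: list, rf: int = 2) -> bool:
    """Acknowledge a write only after 'rf' copies exist on persistent media."""
    for node in nodes[:rf]:
        node.commit_to_persistent_media(data)  # synchronous: blocks until durable
    return True  # only now is the acknowledgement sent to the hypervisor / guest

cluster = [ToyNode("node-a"), ToyNode("node-b"), ToyNode("node-c")]
acknowledged = write_with_rf(b"guest write I/O", cluster, rf=2)
print("write acknowledged:", acknowledged)
```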

Importantly, this write acknowledgement process is the same regardless of the storage protocol (iSCSI, NFS, SMB 3.0) used to present storage to the hypervisor (ESXi, Hyper-V or KVM).

Physical Drive Configuration

As Nutanix does not use a non-persistent write cache, and does not acknowledge writes until they are written to persistent media on 2 or 3 nodes, that's the end of the problem, right?

Not really, as physical drives also have write caches, and in the event of a power failure it's possible (albeit unlikely) that data in the drive cache may not be written to disk even after a write acknowledgement has been given.

This is why all physical SSD/SATA drives in a Nutanix environment have the disk's write caching feature disabled.

This ensures there is no dependency on uninterruptible power supplies (UPS) for data to be successfully written to disk in the event of a power failure.

This means Nutanix is compliant with the “Ensure that the disk’s write caching feature is disabled” requirement specified by Microsoft.

Uninterruptible Power Supplies (UPS)

As non-persistent write caching is not used at the Nutanix Controller VM (CVM), the physical storage controller OR the physical SSDs/HDDs, the use of a UPS is not a requirement for a Nutanix environment to ensure data integrity; however, it is still recommended to use a suitable UPS to ensure uptime of the environment. Assuming a power outage is not catastrophic (e.g.: for a single node) and the cluster is still online, write acknowledgements are still not given until data is written in accordance with the configured RF policy, as Nutanix nodes are effectively stateless.

The Microsoft article quoted earlier states:

“Further, a disk whose power supply is protected by an uninterruptable power supply (UPS) does not need to honor the FUA flag. This is because the UPS will maintain power long enough for the disk to flush its cache to the media.”

Even where a storage solution or disk is protected by a UPS, the UPS must provide sufficient time for all data in the cache to be written to persistent media. This is a potential risk to data consistency, as a UPS is just another link in the chain which can go wrong, and this is why Nutanix does not depend on a UPS for data integrity.

Another Microsoft Article, Key factors to consider when evaluating third-party file cache systems with SQL Server gives two examples of how data corruption can occur:

“Example 1: Data loss and physical or logical corruption”

“Example 2: Suspect database”

So how does Nutanix protect against these issues?

The article states:

“How to configure a product providing file cache from something like non-battery backed cache is specific to the vendor implementation. A few rules, however, can be applied:

1. All writes must be completed in or on stable media before the cache indicates to the operating system that the I/O is finished.

2. Data can be cached as long as a read request serviced from the cache returns the same image as located in or on stable media.”

Regarding the first point: All write I/O is written to persistent media (as is the intention of FUA) as described earlier in this article.

For the second point: in the Nutanix Distributed File System (NDFS), read I/O can be serviced from one of the following places:

  1. “Extent Cache”, located in RAM.
  2. “Content Cache”, located on SSD as per earlier diagram.
  3. “Oplog”, the persistent write cache located on SSD as per earlier diagram.
  4. "Extent Store", located on either SSD or SATA depending on whether the data is "Hot" or "Cold".
  5. A remote node's Extent Cache, Content Cache, OpLog or Extent Store.

To ensure the Extent Cache (in RAM) remains consistent with the Content Cache or Extent Store located on persistent media, when a write I/O modifies data that has been cached in the Extent Cache, the corresponding data is discarded from the Extent Cache and is only promoted back if the data profile remains Hot (i.e.: frequently accessed).
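As a conceptual illustration of that discard-on-write behaviour (again, a toy model of the principle only, not NDFS code), the sketch below shows a read cache that always invalidates a cached entry when the underlying data is written, so a stale copy can never be returned.

```python
class ToyReadCache:
    """Toy model of discard-on-write consistency for an in-RAM read cache."""
    def __init__(self):
        self._cache = {}  # stands in for the Extent Cache (RAM)

    def read(self, key, persistent_store):
        if key in self._cache:
            return self._cache[key]        # served from RAM
        value = persistent_store[key]      # served from persistent media
        self._cache[key] = value           # promote (in reality only if still "hot")
        return value

    def write(self, key, value, persistent_store):
        persistent_store[key] = value      # the write lands on persistent media
        self._cache.pop(key, None)         # discard any now-stale cached copy

store = {}                 # stands in for the Extent Store / Content Cache
cache = ToyReadCache()
cache.write("extent-1", "v1", store)
print(cache.read("extent-1", store))   # read from persistent media, then cached
cache.write("extent-1", "v2", store)   # invalidates the cached "v1"
print(cache.read("extent-1", store))   # always returns "v2", never the stale copy
```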

In Summary:

The Nutanix write path guarantees (even without the use of a UPS) that writes are written to persistent media, with at least one redundant copy on another node in the cluster for resiliency, before the I/O is acknowledged back to the hypervisor and on to the guest. This is critical to ensuring data consistency/resiliency.

This is in full compliance with the storage requirements of applications such as SQL, Exchange and Active Directory.

——————————————————–

Integrity of Write I/O for VMs on NFS Datastores Series

Part 1 – Emulation of the SCSI Protocol
Part 2 – Forced Unit Access (FUA) & Write Through
Part 3 – Write Ordering
Part 4 – Torn Writes
Part 5 – Data Corruption

Nutanix Specific Articles

Part 6 – Emulation of the SCSI Protocol (Coming soon)
Part 7 – Forced Unit Access (FUA) & Write Through
Part 8 – Write Ordering (Coming soon)
Part 9 – Torn I/O Protection (Coming soon)
Part 10 – Data Corruption (Coming soon)

Related Articles

1. What does Exchange running in a VMDK on NFS datastore look like to the Guest OS?
2. Support for Exchange Databases running within VMDKs on NFS datastores (TechNet)
3. Microsoft Exchange Improvements Suggestions Forum – Exchange on NFS/SMB
4. Virtualizing Exchange on vSphere with NFS backed storage

Fight the FUD! – Not all VAAI-NAS storage solutions are created equal.

At a meeting recently, a potential customer who is comparing NAS/Hyper-converged solutions for an upcoming project advised me they only wanted to consider platforms with VAAI-NAS support.

As the customer was considering a wide range of workloads, including VDI and server workloads, the requirement for VAAI-NAS makes sense.

Then the customer advised us they are comparing 4 different Hyper-Converged platforms and a range of traditional NAS solutions. The customer eliminated two platforms due to no VAAI support at all (!) but then said Nutanix and one other vendor both had VAAI-NAS support so this was not a differentiator.

Having personally completed the VAAI-NAS certification for Nutanix, I was curious what other vendor had full VAAI-NAS support, as it was (and remains) my understanding Nutanix is the only Hyper-converged vendor who has passed the full suite of certification tests.

The customer advised who the other vendor was, so we checked the HCL together and sure enough, that vendor only supported a subset of VAAI-NAS capabilities even though the sales reps and marketing material all claim full VAAI-NAS support.

The customer was more than a little surprised that VAAI-NAS certification does not require all capabilities to be supported.

Any storage vendor wanting its customers to be supported for VAAI-NAS by VMware is required to complete a certification process which includes a comprehensive set of tests. There are a total of 66 tests for VAAI-NAS vSphere 5.5 certification, all of which must be passed to gain full VAAI-NAS certification.

However as this customer learned, it is possible and indeed common for storage vendors not to pass all tests and gain certification for only a subset of VAAI-NAS capabilities.

The image below shows the Nutanix listing on the VMware HCL for VAAI-NAS, highlighting the four VAAI-NAS features which can be certified and supported:

1. Extended Stats
2. File Cloning
3. Native SS for LC
4. Space Reserve
[Image: Nutanix VMware HCL listing showing all four VAAI-NAS features supported]

This is an example of a fully certified solution supporting all VAAI-NAS features.

Here is an example of a VAAI-NAS certified solution which has only certified 1 of the 4 capabilities. (This is a Hyper-converged platform although they were not being considered by the customer)

[Image: HCL listing for a Hyper-converged platform certified for only 1 of the 4 VAAI-NAS capabilities]

Here is another example of a VAAI-NAS certified solution which has only certified 2 of the 4 capabilities. (This is a Hyper-converged platform).

[Image: HCL listing for a Hyper-converged platform certified for only 2 of the 4 VAAI-NAS capabilities]

So customers using the above storage solution cannot, for example, create Thick Provisioned virtual disks, therefore preventing the use of Fault Tolerance (FT) or the virtualization of business critical applications such as Oracle RAC.

In this next example, the vendor has certified 3 out of 4 capabilities and is not certified for Native SS for LC. (This is a traditional centralized NAS platform).

[Image: HCL listing for a traditional centralized NAS platform certified for 3 of the 4 capabilities, with Native SS for LC missing]

So this solution does not support using storage-level snapshots for the creation of Linked Clones, meaning deployments such as Horizon View (VDI) or vCloud Director Fast Provisioning will not get the cloning performance or capacity saving benefits of a fully certified/supported VAAI-NAS storage solution.

The point of this article is simply to raise awareness that not all solutions advertising VAAI-NAS support are created equal, so ALWAYS CHECK THE HCL! Don't believe the friendly sales rep, as they may be misleading you or flat out lying about VAAI-NAS capabilities/support.

When comparing traditional NAS or Hyper-converged solutions, ensure you check the VMware HCL and compare the various VAAI-NAS capabilities supported as some vendors have certified only a subset of the VAAI-NAS capabilities.

To properly compare solutions, use the VMware HCL Storage/SAN section and as per the below image select:

Product Release Version: All
Partner Name: All or the specific vendor you wish to compare
Features Category: VAAI-NAS
Storage Virtual Appliance Only: No for SAN/NAS , Yes for Hyperconverged or VSA solutions

[Image: VMware HCL Storage/SAN search filters set for VAAI-NAS]

Then click on the Model you wish to compare e.g.: NX-3000 Series

[Image: VMware HCL search results showing the NX-3000 Series]

Then you should see something similar to the below:

[Image: HCL model listing with the "View" link for VAAI-NAS features]

Click the "View" link to show the VAAI-NAS capabilities and you will see something like the below, which highlights the VAAI-NAS features supported.

Note: if the “View” link does not appear, the product is NOT supported for VAAI-NAS.

[Image: VAAI-NAS features supported by the Nutanix NX-3000 Series]

If the Features column does not list Extended Stats, File Cloning, Native SS for LC and Space Reserve, the solution does not support the full set of VAAI-NAS capabilities.

Related Articles:

1. My checkbox is bigger than your checkbox – @HansDeLeenheer

2. Unchain My VM, And Set Me Free! (Snapshots)

3. VAAI-NAS – Some snapshot chains are deeper than others