Microsoft Exchange 2013/2016 Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV)

Virtualization of business critical applications has been commonplace for a number of years; however, it is less well known that these business critical applications are also regularly deployed on Nutanix Hyper-converged Infrastructure (HCI), as I discuss in the following post:

Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again!

I am regularly involved in discussions with customers about how well MS Exchange and other business critical applications perform on Nutanix, especially during:

  • Storage software upgrades (Acropolis Base Software)
  • Hypervisor upgrades
  • VM migrations (e.g. vMotion)
  • Failure scenarios

Customers also ask how Data Locality works with workloads like Exchange, which have large amounts of data: what overheads exist (if any), how much data is served locally versus remotely, and so on.

As a result, I have created a series of videos demonstrating the following:

  • Setting a baseline for Jetstress performance on Node 1
  • Migrating the VM to a 2nd node and repeating the Jetstress performance test
  • Migrating the VM to a 3rd node and repeating the Jetstress performance test
  • Migrating the VM to a 4th node and repeating the Jetstress performance test
  • Migrating the VM back to the 1st node and repeating the Jetstress performance test
  • Repeating the test on the 2nd, 3rd and 4th nodes (second Jetstress run for comparison)
  • Performing a Jetstress performance test on a VM with the local Nutanix Controller VM (CVM) offline (to simulate a CVM failure, Storage Maintenance or Upgrade scenarios)

During the above videos I will show advanced Nutanix Distributed Storage Fabric (NDSF) performance statistics, such as how write I/O is being served and what percentage of data is being served locally versus remotely.

Enjoy the videos:

Part 1 – Setting a baseline for Jetstress performance on Nutanix AHV

Part 2 – Migrating Jetstress to 2nd node and repeating Jetstress test

Part 3 – Migrating Jetstress to 3rd node and repeating Jetstress test

Part 4 – Migrating Jetstress to 4th node and repeating Jetstress test

Parts 5 through 8 – Repeat Jetstress tests on all four nodes (Coming soon)

Part 9 – Take the local Nutanix Controller VM (CVM) offline and repeat test (Coming soon)

Part 10 – Scale out Performance Validation (Coming soon)

Related Articles:

Jetstress Testing with Intelligent Tiered Storage Platforms

As virtualization of mission-critical applications is now commonplace, customers are increasingly looking to run mixed/multiple workloads on their chosen infrastructure. It's now common for shared storage, be it SAN or hyper-converged (HCI), to be used, and these days most products have some form of storage tiering and/or read/write buffers.

It is also common for storage to have one or more data reduction technologies such as Deduplication, Compression & Erasure Coding.

A quick note on Exchange support requirements: you must have storage which enforces Forced Unit Access (FUA) / Write Through (when requested by the Guest OS), which means data must be written to persistent media (not a write cache) before being acknowledged to the guest OS/application.

For more information on how Nutanix is compliant (regardless of hypervisor), see the following post: Ensuring Data Integrity with Nutanix – Part 2 – Forced Unit Access (FUA) & Write Through

Now back to Jetstress performance testing. When considering a storage platform, or migrating an existing workload onto your shared storage, it's a no-brainer to run the tried and tested MS Exchange Jetstress tool to validate storage performance, right?

Well, not necessarily, and here’s why.

When using Jetstress, you typically create multiple databases (e.g. 8) and spread them across multiple virtual disks. Jetstress then creates the 1st database and proceeds to duplicate it “X” number of times, in this example an additional 7 times.

Here is a screenshot showing Jetstress creating a 159.9GB database and then duplicating it 3 times.

[Screenshot: Jetstress database creation and duplication]

Problem 1: Jetstress duplicates databases multiple times, leading to unrealistic deduplication ratios.

Arguably deduplication should not be used for DAG deployments, which I have discussed previously, but putting that issue to one side, what about performance testing with Jetstress? Well, think about it: if we have 8 databases, 7 of which are exact copies of the 1st, then of course we will see great deduplication ratios.

With, say, an 8:1 deduplication ratio, 8x more data can be served out of the cache/SSD tier(s), leading to unrealistically high performance and low latency.
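To make the arithmetic concrete, here is a minimal sketch; the database size and count are assumptions matching the example above, not measurements from any particular system:

```python
# Illustrative only: why duplicated Jetstress databases produce unrealistic
# deduplication ratios. The size and count below are assumptions.

db_size_gb = 160   # size of the first Jetstress database
db_count = 8       # total databases (1 original + 7 duplicates)

logical_data_gb = db_size_gb * db_count   # the data set Jetstress tests against
unique_data_gb = db_size_gb               # duplicates dedupe back down to one copy

print(f"Logical data set : {logical_data_gb} GB")                       # 1280 GB
print(f"Unique data set  : {unique_data_gb} GB")                        # 160 GB
print(f"Dedupe ratio     : {logical_data_gb / unique_data_gb:.0f}:1")   # 8:1

# With only ~160GB of unique data, the entire "1.28TB" test set can be served
# from the SSD tier/cache, producing latency and IOPS you won't see in production.
```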

No matter what any vendor tells you, 8:1 dedupe for Exchange (excluding DAG copies) is not realistic for production data in my experience. As such, dedupe should never be enabled when performance testing with Jetstress.

Solution: Disable dedupe when using Jetstress (and in my opinion for production DAGs)

Problem 2: Jetstress databases contain lots of zeros, which can be easily compressed.

In the real world, I personally recommend compression for Exchange databases (not logs), with or without DAG deployments, as compressing data can achieve excellent data reduction while not removing copies of data deliberately created by the DAG. It lowers the cost/GB and even increases performance on some storage systems, especially when writing to or accessing data on the slower cold tier. (In fact, it can lead to more usable capacity than RAW, but caution: your mileage may vary.)

However, databases created through Jetstress are packed with a ton of zeros, which means compression ratios are also much higher than in the real world. I’ve seen >7:1 compression ratios for Jetstress databases, which, as with dedupe, means more data will be served out of the cache/SSD tier(s), again leading to unrealistically high performance and low latency.

Solution: Disable compression when using Jetstress

Problem 3: Jetstress performs random read/write I/O across the entire data set

This is a valid test for deployments using physical servers and JBOD, as the databases are spread across multiple drives (usually SATA) and there is no tiering between drives. As such, testing I/O across the entire data set concurrently is important.

It is also a reasonable test for shared storage if no tiering is being used, as with many legacy storage solutions.

However, when you have intelligent storage with tiering, such as the Nutanix Distributed Storage Fabric (NDSF), write I/O is always served by the SSD tier and the coldest data is tiered off to SATA. Cold data is then served from the SATA tier only when required.

As such, the larger the Exchange mailbox size, the higher the percentage of data that will typically be cold, which means an increasingly smaller percentage of total capacity needs to be SSD to give all-flash-style performance the vast majority of the time. This also allows customers to maintain large mailboxes cost-effectively and with consistent performance on SATA. As such, I believe hybrid storage (a small SSD tier with a large, low-cost capacity tier) is advantageous for Exchange, but that's another topic.
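To illustrate that sizing logic with a rough, hypothetical example (the mailbox count, mailbox size and hot-data fraction below are assumptions for illustration, not Nutanix sizing guidance):

```python
# Rough illustration: the bigger the mailboxes, the smaller the fraction of
# total capacity that needs to be SSD. All numbers are assumptions.

mailbox_count = 5000
mailbox_size_gb = 25    # large mailboxes, mostly old/cold email
hot_fraction = 0.05     # assume ~5% of each mailbox is actively accessed

total_capacity_tb = mailbox_count * mailbox_size_gb / 1024
hot_data_tb = total_capacity_tb * hot_fraction

print(f"Total mailbox data : {total_capacity_tb:.1f} TB")   # ~122.1 TB
print(f"Estimated hot data : {hot_data_tb:.1f} TB")         # ~6.1 TB

# A hybrid node only needs enough SSD to hold the hot data (plus overheads);
# the cold majority can sit on low-cost SATA without hurting day-to-day latency.
```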

Because Jetstress actively performs I/O as if all data is hot, it effectively negates the benefits of tiering and does not demonstrate the real-world performance of a tiered storage platform such as Nutanix. On Nutanix, the application will see close to all-flash performance even with TBs of mailbox databases sitting on SATA, since active I/O is predominantly serviced by SSD. The small percentage of I/O serviced by the SATA tier also performs much better than JBOD, because I/O to those drives is limited thanks to all new/active data being served by the SSD tier.

As such, to get an idea of real-world performance, Jetstress tests need to be performed on carefully sized databases which fit within the (persistent) performance tier (i.e. not a RAM-style cache, which Nutanix calls the Extent Cache and which is typically only a few GB per node). This test should easily produce a passing result for Jetstress.

This style of test will show you something close to real-world performance, although I also recommend what I call a worst case scenario test, which I cover later in this post.

This second Jetstress test is the one you want to make sure stays under 20ms database read latency and 10ms log write latency, which are the Microsoft-accepted performance thresholds for Exchange.
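As a simple check, the sketch below compares the averages from a Jetstress report against those thresholds; the example input values are placeholders, not real results:

```python
# Minimal sketch: check Jetstress-reported averages against the Microsoft
# accepted latency thresholds for Exchange. Example inputs are placeholders.

DB_READ_THRESHOLD_MS = 20.0    # average database read latency must be under 20ms
LOG_WRITE_THRESHOLD_MS = 10.0  # average log write latency must be under 10ms

def jetstress_latency_pass(db_read_ms: float, log_write_ms: float) -> bool:
    """Return True if both averages are within Microsoft's thresholds."""
    return (db_read_ms < DB_READ_THRESHOLD_MS
            and log_write_ms < LOG_WRITE_THRESHOLD_MS)

print(jetstress_latency_pass(db_read_ms=14.2, log_write_ms=2.8))   # True  (pass)
print(jetstress_latency_pass(db_read_ms=23.5, log_write_ms=2.8))   # False (fail)
```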

Problem 4: Jetstress performs lots of overwrites

As Jetstress runs, it performs frequent random overwrites within the databases, which in my experience does not represent real-world behaviour. So a Jetstress pass result is really a strong indication the solution will perform well, provided the achieved IOPS are >= the MS Exchange Server Role Requirements Calculator estimates (which is a good thing!).

However, Nutanix uses a technology called Erasure Coding (EC-X) for data reduction, which is designed specifically for cold data, such as older email in a large mailbox. EC-X is recommended for production Exchange environments as it provides more usable capacity and is complementary to compression.

When overwrites occur, NDSF re-stripes the data, which incurs a small write penalty. In the real world this is insignificant, as it happens infrequently, but with Jetstress performing constant overwrites, EC-X provides limited/no data reduction and decreases performance.

As such, this is another case where benchmarks do not properly represent real world performance, so when using Jetstress, ensure EC-X is not enabled.

For non-Nutanix storage platforms, large numbers of overwrites will typically also reduce Jetstress performance compared to the real world, where the percentage of overwrites is much lower.

How to Test on Tiered Storage Solutions

If the vendor does not have a Microsoft ESRP certification (Nutanix ESRP can be found here), then you should validate the infrastructure is capable of supporting your requirements.

If the vendor does have ESRP, you should still use Jetstress as an operational verification tool following initial implementation and prior to going into production.

In this example I will specifically cover the Nutanix Distributed Storage Fabric (NDSF). While the below may be applicable to other vendors' products, please refer to each vendor's recommendations, although in my opinion the data reduction recommendations should be consistent across vendors.

Solution: Perform two stages of Jetstress testing.

Stage 1: All flash performance test

If the SSD tier has 1TB usable, make the Jetstress databases total 75% of the usable capacity (in the case of Nutanix, 75% of the per node SSD usable capacity per Jetstress instance).

Run a short 15-minute test and fine-tune the threads, starting from 32 and reducing until you achieve <4x the required IO levels according to the MS Exchange Server Role Requirements Calculator (4x should be easy to achieve for all-flash testing at low latency), then run a 24-hour stress test with all Jetstress instances concurrently (Multi-Host Test).
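Here is a hedged sketch of that workflow. The SSD capacity, IOPS requirement and run_jetstress_trial() function are all placeholders; the trial function simply stands in for running a short Jetstress test at a given thread count and reading the achieved IOPS from its report:

```python
# Hedged sketch of the Stage 1 workflow. All values are assumptions, and
# run_jetstress_trial() is a hypothetical stand-in for running a short (~15 min)
# Jetstress test and reading the achieved IOPS from its report.

ssd_usable_gb_per_node = 1000                        # assumption: 1TB usable SSD per node
stage1_db_total_gb = ssd_usable_gb_per_node * 0.75   # 750GB of databases per Jetstress instance

required_iops = 500                       # from the MS Exchange Server Role Requirements Calculator
stage1_iops_ceiling = required_iops * 4   # Stage 1 tunes the threads to roughly 4x the requirement

def run_jetstress_trial(threads: int) -> float:
    """Placeholder: run a short Jetstress test at this thread count and return
    the achieved IOPS from the report."""
    raise NotImplementedError

def tune_stage1_threads(start_threads: int = 32) -> int:
    """Step the thread count down from 32 until the achieved IOPS is no longer
    far above (~4x) the requirement; that count is then used for the 24-hour
    multi-host stress test."""
    threads = start_threads
    while threads > 1 and run_jetstress_trial(threads) > stage1_iops_ceiling:
        threads -= 1
    return threads

print(f"Stage 1 database target size per Jetstress instance: {stage1_db_total_gb:.0f} GB")
# The Stage 2 (worst case scenario) test described below uses a similar loop,
# starting at 12 threads and keeping achieved IOPS at or above required_iops.
```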

This result should be indicative of (although not identical to) the performance you should see under normal circumstances.

Stage 2: Worst case scenario test (90% capacity)

If the usable capacity is 1TB per node, then make the Jetstress databases total >90% of the usable capacity (in the case of Nutanix, per-node usable capacity per Jetstress instance). Nutanix recommends N+1 for any mission-critical application, so the actual cluster utilisation for a 4-node cluster would be ~67.5% (25% of capacity reserved for N+1, with databases consuming 90% of the remaining nodes' capacity).

Note: Larger clusters equate to a higher usable percentage of capacity; e.g. an 8-node cluster would be 100% – 12.5% for N+1 – 10% = ~78% of cluster capacity.
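Here is the same capacity arithmetic as a small worked example, assuming homogeneous nodes; the node counts and 90% fill level come from the text above, the rest is just arithmetic:

```python
# Worked version of the worst case scenario capacity arithmetic, assuming
# homogeneous nodes: hold one node's capacity back for N+1 and fill the
# remaining nodes to 90% with Jetstress databases.

def worst_case_cluster_utilisation(nodes: int, fill_fraction: float = 0.90) -> float:
    """Fraction of total cluster capacity consumed when (nodes - 1) nodes are
    filled to fill_fraction and one node's worth of capacity is reserved for N+1."""
    return (nodes - 1) * fill_fraction / nodes

for n in (4, 8):
    pct = worst_case_cluster_utilisation(n) * 100
    print(f"{n}-node cluster: ~{pct:.1f}% of total capacity used")

# 4-node cluster: ~67.5% of total capacity used (90% of 3 of the 4 nodes)
# 8-node cluster: ~78.8% of total capacity used (in line with the ~78% noted above)
```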

Run a short 15-minute test and fine-tune the threads, starting from 12 and reducing until you achieve >= the required IO levels according to the MS Exchange Server Role Requirements Calculator (which has a 20% buffer built in), then run a 24-hour stress test with all Jetstress instances concurrently (Multi-Host Test).

The worst case scenario test shows how the system will perform if the tiering/cache layers are totally saturated, hence the name worst case scenario. This is how Nutanix runs testing for Microsoft ESRP certification to ensure every Nutanix deployment for Exchange performs flawlessly in production.

Real World vs Jetstress

I will be publishing a case study on this topic in the future, but to give you a teaser: a 30,000-seat Exchange deployment I designed and validated achieved roughly 700 IOPS @ 5-15ms read/write latency on the Jetstress worst case scenario test, while the SSD-only Jetstress report showed ~4000 IOPS @ 1-2ms for read and write I/O. In production the average latency is 3-4ms and the number of messages per day is within +2% of the estimates in the MS Exchange Server Role Requirements Calculator.

The cluster average latency includes read and write I/O as well as other workloads sharing the cluster.

As you can see, a Jetstress result showing 15ms doesn’t sound very impressive, yet the SSD test is super impressive considering the thread count could have been increased to provide higher IOPS. Since the requirement was <500 IOPS, the 4000 IOPS achieved was well in excess of what was required, so no further testing was performed.

Now that you understand why Jetstress is not designed for modern tiered shared storage, you can use the above-mentioned tests to ensure you get results which are indicative of real-world performance, and not be fooled by data reduction (dedupe/compression) giving you unrealistically high performance.

Summary:

When using tiered storage with MS Exchange Jetstress, ensure:

  • Deduplication is disabled (as it should be for production DAGs)
  • Compression is disabled
  • Erasure Coding (EC-X) is disabled (Nutanix specific)

Once the above is complete, run the following Jetstress tests:

  1. All-flash performance tier test to see best case scenario performance (indicative of real-world performance)
  2. 90% capacity performance test to show worst case scenario performance (which should rarely, if ever, be experienced)

Related Articles:

 

Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again! – Part 1

I recently wrote a post called Fight the FUD: Nutanix scale limitations which corrected some misinformation VCE COO Todd Pavone stated about Nutanix scalability in the article COO: VCE converged infrastructure not affected by Dell-EMC.

In the same interview, Todd makes several comments (see quote below) which I can only trust to be accurate for VSPEX Blue, but as he refers more generally to hyper-converged systems, I have to disagree with many of the comments from a Nutanix perspective, and thought it would be good to discuss where I see Nutanix.

Where does VSPEX Blue fit into the portfolio?

Hyper-converged by definition is where you use software-defined technology to manage what people like to call a commoditized infrastructure, where there is no external storage. So, the intelligence is in the software, and you don’t require the intelligence in the infrastructure. In the market, everyone has had an appliance, which is just a server with embedded storage or some marketed software, and ideal for edge locations or for single use cases. But you’re not going to put SAP and run your mission-critical business on an appliance. They have scaling challenges, right? You get to a certain number of nodes, and then the performance degrades; you have to then create another cluster, another cluster. It’s just not an ideal way to go run your mission-critical x86 workloads. [It’s] good for an edge, good for a simple form factors, good for single use cases or what I’ll call more simplified workloads.

In this post I will be specifically discussing the Nutanix HCI solution, and while I have experience with and opinions about other products in the market, I will let other vendors speak for themselves.

The following quotes are not in the order Todd mentioned them in the above interview; they have been grouped and ordered to avoid overlapping/repeated comments and to make this blog flow better (hopefully). As such, if any comments appear to be taken out of context, that is not my intention.

So let’s break down what Todd has said:

  • Todd: In the market, everyone has had an appliance, which is just a server with embedded storage or some marketed software, and ideal for edge locations or for single use cases.

I agree that hyper-converged systems such as Nutanix run on commodity servers with embedded storage. I also agree Nutanix is ideal for edge locations and can be successfully used for single use cases, but as my next response will show, I strongly disagree with any implication that Nutanix (the market's most innovative leader in HCI according to Gartner, with 52% market share according to IDC) is limited to edge or single use cases.

  • Todd: “It’s just not an ideal way to go run your mission-critical x86 workloads” & “But you’re not going to put SAP and run your mission-critical business on an appliance.”

Interestingly, Nutanix is the only certified HCI platform for SAP.

As an architect, when designing for mission critical workloads, I want a platform which:

a) Can start small and scale as required (for example, as a vBCA's demands increase)
b) Is highly resilient and has automated self-healing
c) Offers fully automated, non-disruptive (and low-impact) maintenance
d) Is easy to manage and scale
e) Delivers the required levels of performance

In addition to the above, the fewer dependencies the better, as there is less to go wrong, less to troubleshoot, and fewer potential bottlenecks.

Nutanix HCI delivers all of the above, so why wouldn't you run vBCA on Nutanix? In fact, the question I would ask is, "Why would you run vBCA on legacy 3-tier platforms?"

With legacy 3-tier, in my experience it's more difficult to start small and scale. Typical 3-tier solutions have only two controllers which cannot self-heal in the event of a failure, have complex and time-consuming patching/upgrading procedures, have multiple points of management (not a single pane of glass like Nutanix with the Acropolis Hypervisor), and are typically much more difficult to scale (often requiring rip and replace).

The only thing most monolithic 3-tier products provide (if architected correctly) is reasonable performance.

Here is a typical example of a Nutanix customer upgrade experience compared to a legacy 3-tier product.

[Customer tweet comparing a Nutanix upgrade experience with a legacy 3-tier upgrade]

Think the above isn’t a fair comparison? I agree! Nutanix vs Legacy is no contest.

When I joined Nutanix in 2013, I was immediately involved with testing of mission critical workloads, and I have no problem saying performance was not good enough for some workloads at that time. Since then, Nutanix has focused on building out a large team (three of whom are VCDXs with years of vBCA experience) focusing on business critical applications, and now applications like SQL, Oracle (including RAC deployments), MS Exchange and SAP are becoming common workloads for our customers, many of whom originally started with Test/Dev or VDI.

Think of Nutanix like VMware in 2005: everyone was concerned about performance and resiliency and didn't run business critical applications on VI3 (later renamed vSphere), but over time everyone (including myself) learned virtualization was in fact not only suitable for vBCA but an ideal platform for it. I'm here to tell everyone: don't make the same mistake we all made with virtualization by assuming Nutanix isn't suitable for vBCA and waiting 5 years to realise the value. Nutanix is more than ready (and has been for a while) for mission-critical applications.

Regarding Todd’s second statement “But you’re not going to put SAP and run your mission-critical business on an appliance.”

If not on an appliance, then what are we supposed to put mission-critical applications on? Regardless of what you think of traditional converged products, the fact is they are really just a single SKU for multiple pre-existing products (generally from multiple vendors) which have been pre-architected and configured. They are not radically different, nor do they eliminate ongoing operational complexity the way HCI solutions such as Nutanix do.

If anything, putting mission critical applications on a simple and highly performant/scalable HCI appliance-based solution (especially Nutanix) makes more sense than converged / 3-tier products. Nutanix is no longer the new kid on the block; Nutanix is well proven across all industries and on different workloads, including mission critical. Hell, most US Federal agencies, including the Pentagon, use Nutanix; how much more critical do you want? (Also, anyone saying VDI isn't mission critical has rocks in their head! If all your users are offline, how productive is your company and how much use are all your servers?)

Imagine if the sizing of a traditional converged solution is wrong, or a mission critical application outgrows it before its scheduled end of life. With Nutanix, you add one or more nodes (no rip and replace), vMotion the workload/s, and you've scaled completely non-disruptively. In fact, with Nutanix you should intentionally start small and scale in as close to a just-in-time fashion as possible, so your mission-critical application can take advantage of newer HW over the 3-5 years! Lower CAPEX and better long-term performance sounds like a WIN/WIN to me!

Even if it were true that converged (or any other product) had higher peak performance than a Nutanix HCI solution (which in the real world has minimal value), so what? Do you really want point solutions (a.k.a. silos) for every different workload? No. I wrote the following post covering things to consider when choosing infrastructure, including why you want to avoid silos, which I encourage you to read when considering any new infrastructure.

  • Todd: “They have scaling challenges, right? You get to a certain number of nodes, and then the performance degrades; you have to then create another cluster, another cluster.”

My previous post Fight the FUD: Nutanix scale limitations covers this FUD in detail. In short, Nutanix has proven numerous times that we can scale linearly; see Scaling to 1 Million IOPS and beyond linearly! for an example (and that video is from October 2013). Note: Ignore the actual IO number; the important factor is the linear scalability, not the peak benchmark numbers, which have little value in the real world as I discuss here: “Peak Performance vs Real World Performance”.

  • Todd:  [It’s] good for an edge, good for a simple form factors, good for single use cases or what I’ll call more simplified workloads.

To be honest I'm not sure what he means by “good for a simple form factors”, but I can only assume he is talking about how HCI solutions like Nutanix have compact 4-node-per-2RU form factors and use less rack space, power, cooling, etc.

As for single use cases, I recommend customers run mixed workloads for several reasons. Firstly, Nutanix is a truly distributed solution, which means the more nodes in a cluster, the more performant and resilient the cluster becomes. Scaling out a cluster also helps eliminate silos, which reduces waste.

I recently wrote this post: Heterogeneous Nutanix Clusters Advantages & Considerations, which covers how mixing node types works in a Nutanix environment. The Nutanix Distributed Storage Fabric has lots of back-end optimisations (run by Curator) which have been developed over the years to ensure heterogeneous clusters perform well. This is an example of technology whose value marketing slides can't represent, but the real-world value is huge.

I have been involved with numerous mission critical application deployments, and there are heaps of case studies for these deployments available on the Nutanix website at http://www.nutanix.com/resources/case-studies/.

A final thought for Part 1: with Nutanix, you can build what you need today and have mission critical workloads benefit from the latest generation of HW on a frequent basis (e.g. annually) by adding new nodes over time and simply vMotioning mission critical VMs to the newer nodes. So over, say, a 5-year lifespan of the infrastructure, your mission critical applications could benefit from the performance improvements of five generations of Intel chipsets, not to mention the ever-increasing efficiency of the Nutanix Acropolis base software (formerly known as NOS).

Try getting that level of flexibility/performance improvements with legacy 3 tier!

Next up, Part 2