Nutanix AOS 5.5 delivers 1M read IOPS from a single VM, but what about 70/30 read/write?

I recently wrote "Nutanix AOS 5.5 delivers 1M IOPS from a single VM, but what happens when you vMotion?", which showed that a vMotion reduced read performance by around 10% for approximately 3 seconds before it returned to pre-migration levels.

In this post I address the performance of a single VM with a more realistic 70% read, 30% write IO profile, tested using an 8k IO size, and the impact during and after a live migration.

While not surprising to Nutanix customers, this result shows a starting baseline of 436k random read and 187k random write IOPS. Immediately following the migration, performance dropped to 359k read and 164k write IOPS, and within a few seconds it exceeded the original baseline at 446k read and 192k write IOPS.

So in comparison to 100% random read, which achieved just over 1 million 8k IOPS, the 70/30 mix achieves in the ballpark of 600k IOPS, which is very respectable. Not bad for a platform which Nutanix competitors continue to describe as only being good for VDI. Considering even the largest array from a leading all flash SAN vendor is only advertising performance in the hundreds of thousands of random read IOPS, it shows Nutanix’s unique hyper-converged architecture can achieve higher performance from a single VM than a monolithic all flash array.
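As a quick check of the arithmetic, the short Python sketch below uses only the read and write figures quoted above (in thousands of IOPS) to derive the combined totals, the read share, and the percentage change around the migration. The variable and function names are my own and purely illustrative.

```python
# Figures quoted in this post, in thousands of IOPS (8k IO size).
baseline = {"read": 436, "write": 187}
post_migration_dip = {"read": 359, "write": 164}
recovered = {"read": 446, "write": 192}


def total(sample):
    """Combined read + write IOPS for one measurement point."""
    return sample["read"] + sample["write"]


def read_share(sample):
    """Fraction of the combined IOPS that are reads."""
    return sample["read"] / total(sample)


def change_pct(before, after):
    """Percentage change in combined IOPS from one point to another."""
    return (total(after) - total(before)) / total(before) * 100


print(f"baseline:  {total(baseline)}k IOPS, {read_share(baseline):.0%} read")
print(f"dip:       {total(post_migration_dip)}k IOPS "
      f"({change_pct(baseline, post_migration_dip):+.1f}%)")
print(f"recovered: {total(recovered)}k IOPS "
      f"({change_pct(baseline, recovered):+.1f}%)")
```

On these numbers the baseline is 623k combined IOPS at a 70% read share, the dip immediately after migration is roughly 16% on the combined figure, and the recovered level is about 2.4% above baseline.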

This shows that with the unique Nutanix Acropolis Distributed Storage Fabric, very high performance at low latency can be achieved with real world IO patterns even during and after live migrating the virtual machine across a distributed platform.

This result is further evidence of the efficiency of the Nutanix Acropolis Hypervisor, AHV (which is included at no additional charge with AOS), as well as of the IO path running in user space (not the much hyped in-kernel). This is in part thanks to AHV Turbo Mode, announced at .NEXT 2017 in Washington, which optimised the IO path. These excellent levels of performance can also be sustained when using data protection features such as snapshots, as shown in a recent post I wrote about the Nutanix X-Ray tool, where I used the Snapshot Impact scenario to compare Nutanix AHV with a leading hypervisor and SDS product. If you don’t have time to read the post, in short: the competitor’s performance degraded as snapshots were taken, while Nutanix AHV’s performance remained consistent, which is essential for real world scenarios, especially with business critical applications.

With Nutanix’s unique ability to scale out performance using storage only nodes, even higher performance can be achieved without modification to the virtual machine or applications, which gives Nutanix a further advantage over the competition.

Nutanix data locality ensures optimal performance by keeping new data local to the VM. Cold data can remain remote indefinitely, while only hot data is migrated locally, if and when required, at a 1MB granularity. This is intelligent data locality, not the brute force locality it is frequently mistaken for.
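To make the difference between intelligent and brute force locality concrete, here is a toy Python model of the behaviour described above. It is not Nutanix code: the class, its method names and the `HOT_THRESHOLD` value are hypothetical, and writes are tracked per 1MB extent purely to keep the sketch short. Only the 1MB granularity and the rules (new data local, cold data stays remote, hot data localised on demand) come from the text.

```python
from collections import Counter

EXTENT_SIZE = 1 << 20  # 1MB granularity, as described above
HOT_THRESHOLD = 3      # hypothetical: remote reads before an extent counts as hot


class LocalityModel:
    """Toy model of data locality for one VM; not the real implementation."""

    def __init__(self):
        self.local_extents = set()     # extents held on the VM's current node
        self.remote_reads = Counter()  # remote read count per extent

    def write(self, offset):
        """New data is always written locally."""
        extent = offset // EXTENT_SIZE
        self.local_extents.add(extent)
        self.remote_reads.pop(extent, None)

    def read(self, offset):
        """Return 'local' or 'remote' depending on where this read is served."""
        extent = offset // EXTENT_SIZE
        if extent in self.local_extents:
            return "local"
        self.remote_reads[extent] += 1
        if self.remote_reads[extent] >= HOT_THRESHOLD:
            # The extent is hot: localise it so later reads are served locally.
            self.local_extents.add(extent)
            del self.remote_reads[extent]
        return "remote"

    def migrate_vm(self):
        """After a live migration, existing data is remote to the new node."""
        self.local_extents.clear()
        self.remote_reads.clear()
```

In this model a migration does not trigger any bulk copy. An extent that is read once after the move stays remote, while an extent that keeps being read is localised after `HOT_THRESHOLD` remote reads.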


Nutanix X-Ray Benchmarking tool – Extended Node Failure Scenario

In the first part of this series, I introduced the Nutanix X-Ray benchmarking tool, which has been designed very differently to traditional benchmarking tools: the performance of the app is the control and the variable is the platform, not the other way around.

In the second part, I showed how Nutanix AHV & AOS could maintain the performance while utilising snapshots to achieve the type of recovery point objective (RPO) that is expected in production environments, especially with business critical workloads whereas a leading hypervisor and SDS platform could not.

In this part, I will cover the Extended Node Failure Scenario in X-Ray and again compare Nutanix AOS/AHV and a leading hypervisor and SDS platform in another real world scenario.

Let’s start by reviewing the description of the X-Ray Extended Node Failure scenario.

[Figure: X-Ray Extended Node Failure scenario description]

I really like that X-Ray has a scenario which simulates a node failure, as this is bound to happen regardless of the platform you choose. With hyperconverged platforms, the impact of a node failure is arguably higher than with traditional 3-tier, as the nodes contain some data which needs to be recovered.

As such, it is critical before choosing an HCI platform to understand how it behaves in a failure scenario, which is exactly what this scenario demonstrates.

[Figure: X-Ray node failure comparison]

Here we can see the impact on the performance of the surviving VMs following the power being disconnected via the out of band management interface.

The Nutanix AOS/AHV platform continues to run at a very steady rate, virtually without impact to the VMs. On the other hand, after 1 hour the other platform shows significant performance degradation.

This clearly shows the Acropolis Distributed Storage Fabric (ADSF) to be a superior platform from a resiliency perspective, which should be a primary consideration when choosing a platform for any production environment.

Back in 2014, I highlighted the Problems with RAID and Object Based Storage for data protection and in a follow up post I discussed how Nutanix Acropolis Distributed Storage Fabric (ADSF) compares with traditional SAN/NAS RAID and hyper-converged solutions using Object storage for data protection.

The above results clearly demonstrate the problems I discussed back in 2014 are still applicable to even the most recent versions of a leading hypervisor and SDS platform. This is because the problem is the underlying architecture and bolting on new features is at best masking the constraints of the original architectural decision which has proven to be significantly flawed.

This scenario clearly demonstrates the criticality of looking beyond peak performance numbers and conducting a thorough evaluation of a platform prior to purchase as well as comprehensive operational verification prior to moving any platform into production.

Related Articles:

Nutanix X-Ray Benchmarking tool Part 1 – Introduction

Nutanix X-Ray Benchmarking tool Part 2 – Snapshot Impact Scenario

Nutanix X-Ray Benchmarking tool – Snapshot Impact Scenario

In the first part of this series, I introduced the Nutanix X-Ray benchmarking tool, which has been designed very differently to traditional benchmarking tools: the performance of the app is the control and the variable is the platform, not the other way around.

This is done by generating realistic IO patterns (e.g. not 100% 4k read) and then performing functions against the platform to see how the control (the VM application performance) is impacted by the underlying platform’s functionality.

A great example of this is performing snapshots as the first step in a space efficient backup solution.

X-Ray has a built in test which generates an OLTP workload and runs it for 8 hours. For an all flash platform, it generates 6000 IOPS across the database and 400 IOPS for the logs. The scenario is detailed in the X-Ray report shown below.

[Figure: X-Ray Snapshot Impact scenario description]

The Snapshot Impact scenario is then run against multiple platforms, and using the Analysis functionality within X-Ray we can generate a report which overlays the results from multiple platforms.

The below example is GA Acropolis Hypervisor (AHV) on AOS 5.1.1 versus a leading hypervisor and SDS platform showing the Snapshot Impact scenario.

[Figure: X-Ray Snapshot Impact results]

Each of the red lines indicates a snapshot. What we observe is that the performance of both platforms remains consistent until the 10th snapshot (shown below), after which the Nutanix platform continues without impact and the leading hypervisor and SDS platform starts degrading significantly.

[Figure: X-Ray Snapshot Impact results at the 10th snapshot]
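The same "performance is the control" check can be applied to any exported IOPS time series. The Python sketch below is my own illustration and not part of X-Ray: the function name, the input format and the 5% tolerance are assumptions, and only the 6000 IOPS database target comes from the scenario description. It splits a run into intervals at each snapshot and flags the intervals whose average falls short of the target.

```python
from statistics import mean

TARGET_IOPS = 6000  # database target from the scenario description
TOLERANCE = 0.05    # hypothetical: allow a 5% shortfall before flagging


def degraded_intervals(samples, snapshot_times,
                       target=TARGET_IOPS, tolerance=TOLERANCE):
    """Return (snapshot_number, mean_iops) for each interval below target.

    samples: list of (time_seconds, iops) tuples.
    snapshot_times: sorted list of times at which snapshots were taken.
    Interval n runs from snapshot n until the next snapshot (or end of run).
    """
    floor = target * (1 - tolerance)
    flagged = []
    for n, start in enumerate(snapshot_times, start=1):
        # snapshot_times[n] is the snapshot after number n (list is 0-indexed).
        end = snapshot_times[n] if n < len(snapshot_times) else float("inf")
        window = [iops for t, iops in samples if start <= t < end]
        if window and mean(window) < floor:
            flagged.append((n, mean(window)))
    return flagged
```

A platform that holds the target throughout returns an empty list, while a platform that starts degrading after a given snapshot returns that snapshot number and every later one that stays below the floor.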

In the real world, customers use the intelligent features of storage, SDS or hyper-converged platforms but rarely test how this functionality works prior to purchasing. This is because it’s difficult and time consuming to do so.

The Nutanix X-Ray tool makes validating a platform’s performance under real world scenarios quick and easy, and provides automatically generated reports from which accurate comparisons can be made.

What this example shows is that while both platforms could achieve the required performance without snapshots, only Nutanix AHV & AOS could maintain the performance while utilising snapshots to achieve the type of recovery point objective (RPO) that is expected in production environments, especially with business critical workloads.

As part of the Nutanix Solutions and Performance engineering organisation, I can tell you that the focus for Nutanix is real world performance: using data reduction, leveraging snapshots, mixing workloads and testing at large scale.

In upcoming posts I will show more examples of X-Ray test scenarios as well as comparisons between GA Acropolis Hypervisor (AHV) & AOS 5.1.1 versus a leading hypervisor and SDS platform.

Related Articles:

Nutanix X-Ray Benchmarking tool Part 1 – Introduction

Nutanix X-Ray Benchmarking tool Part 3 – Extended Node Failure Scenario