Heterogeneous Nutanix Clusters Advantages & Considerations

Let's start with a simple example. The diagram below shows a 4-node cluster mixing 2 x NX-3060 nodes with 2 x NX-8035 nodes. Both node types use the same Haswell CPUs, but the NX-3060 has ~2TB usable capacity while the NX-8035 has ~8TB usable.

[Image: 4-node cluster mixing NX-3060 and NX-8035 nodes]

Assuming the cluster capacity was 50% utilised, the NDSF layer would look similar to this:

[Image: NDSF storage pool for the mixed NX-3060/NX-8035 cluster at 50% utilisation]

The above shows NDSF with a total Storage Pool capacity of 20TB, 50% of which (10TB) is used. As this is a heterogeneous cluster, we have two different node types with vastly different usable capacities.

Nutanix Disk Balancing automatically balances the storage to ensure the utilisation percentage of all SSDs/HDDs within the cluster is within +-15%. This means administrators do not have to worry about capacity management on a per-node basis; capacity management only needs to be performed at the storage pool (cluster) layer.
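To make the +-15% target concrete, here is a minimal sketch of the idea. It is illustrative only (not the NDSF Curator implementation), and the utilisation figures are hypothetical example values:

```python
# Illustrative sketch only: a simplified view of the +/-15% disk balancing target.
# The utilisation figures and the "rebalance candidates" idea are hypothetical
# examples, not the actual NDSF Curator logic.

def balance_report(disk_used_pct, threshold_pct=15.0):
    """Return the cluster-wide mean utilisation and disks deviating by more than threshold_pct."""
    mean = sum(disk_used_pct.values()) / len(disk_used_pct)
    outliers = {disk: pct for disk, pct in disk_used_pct.items()
                if abs(pct - mean) > threshold_pct}
    return mean, outliers

# Example: HDD utilisation on the 2 x NX-3060 + 2 x NX-8035 cluster above
utilisation = {"NX-3060-A": 70.0, "NX-3060-B": 68.0, "NX-8035-A": 35.0, "NX-8035-B": 33.0}
mean, outliers = balance_report(utilisation)
print(f"Cluster mean utilisation: {mean:.1f}%")
print(f"Disks to rebalance: {sorted(outliers)}")
```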

Advantage 1: No silos of storage capacity in heterogeneous environments

Advantage 2: NDSF disk balancing ensures the data is evenly distributed throughout the cluster

Advantage 3: There is no requirement for hypervisor level storage capacity management such as Storage DRS (SDRS).

For more information on why Storage DRS is not required see: Storage DRS and Nutanix – To use, or not to use, that is the question?

In a heterogeneous environment, it is likely you will have multiple workloads with different capacity and performance requirements. The diagram below shows the same 4-node cluster, with a single storage pool and 4 containers with different data protection and reduction settings to suit a wide range of application requirements.

Note: The RF3 container shown below would only be possible in clusters of 5 nodes or more, but is shown to illustrate the flexibility/capabilities of NDSF.

[Image: heterogeneous cluster capacity with a single storage pool and four containers]

The storage pool itself has up to 20TB usable (assuming RF2 and excluding data reduction savings). In the Pool we can see four Containers which can be thought of as policies which can be applied to Virtual Machines or Virtual Disks.

Container01 is configured with RF2 and In-Line compression and reports 10TB free space, because the underlying storage pool (where capacity is managed) is 50% utilised. The Container therefore reports as free all the available capacity within the Storage Pool, based on its configured RF.

Container02 has RF2, In-Line compression and EC-X enabled, but you will note it also reports 10TB free space: capacity is not assigned to a container, it is shared between all containers within a Storage Pool.

Container03 is configured with RF3, unlike Containers 01 and 02. As such, the container reports free space based on its configured RF of 3: it shows 13.3TB usable and 6.66TB free, as that is the maximum data that can be stored in that container based on its storage policies.

Container04 reports the same free space as Containers 01 and 02, as it is configured with the same RF. While Container04 has all data reduction technologies enabled, the Container reports actual free space; as data reduction takes effect, the usable capacity will change.
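The free-space figures above fall straight out of the shared pool's remaining raw capacity and each container's RF. Here is a minimal sketch of that arithmetic, using the 4-node example and assuming (for simplicity) that the existing 10TB of data is stored at RF2:

```python
# Illustrative arithmetic only: how a container's reported usable/free space follows
# from the shared storage pool's raw capacity and the container's configured RF.

RAW_POOL_TB = 40.0   # 20TB usable at RF2 for the example cluster => 40TB raw
RAW_USED_TB = 20.0   # 10TB of data stored at RF2 consumes 20TB of raw capacity

def container_view(rf):
    """Usable and free space a container reports, based purely on its RF and the shared pool."""
    usable_tb = RAW_POOL_TB / rf
    free_tb = (RAW_POOL_TB - RAW_USED_TB) / rf
    return usable_tb, free_tb

for name, rf in [("Container01", 2), ("Container02", 2), ("Container03", 3), ("Container04", 2)]:
    usable, free = container_view(rf)
    print(f"{name} (RF{rf}): {usable:.1f}TB usable, {free:.2f}TB free")
```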

It is possible to set capacity reservations on Containers where an application or tenant requires a guaranteed amount of usable capacity, and it is also possible to set limits on Containers to prevent workloads using more than a specified amount of capacity. However, for most use cases, I recommend not using Reservations or Limits and simply managing capacity at the Storage Pool layer.

Nutanix also supports VMs with more assigned/used capacity than the node they are running on. For more information see: What if my VMs storage exceeds the capacity of a Nutanix node?

Regardless of which node type/s reside within a Nutanix cluster, there are no advanced settings to configure, such as queue depths, VAAI or multi-pathing, which can be required when mixing legacy storage platforms in the same cluster. There is also no requirement for Storage DRS to manage either performance or capacity, as discussed earlier.

Advantage 4: Storage policies such as RF and Data Reduction can be changed on the fly as required and multiple policies are supported within the same cluster.

For more information about Nutanix data reduction technologies, see: Nutanix Implementation of Data Avoidance & Reduction Technologies

Regardless of the mixture of node types and their respective capacity/performance characteristics, there is no advanced configuration required to achieve optimal performance.

Nutanix automatically manages I/O pathing. Because data locality ensures most data is read locally, and writes always land local to the VM with replicas then distributed throughout the cluster, hot spots are minimised by default.

In the unlikely event one node's local SSD tier becomes saturated, NDSF will automatically write data across the shared SSD tier until the local node's SSD tier has sufficient capacity to resume local writes. This avoids the requirement for a storage admin to take any corrective action.

Advantage 5: In the unlikely event of saturation of a node's SSD tier, NDSF automatically redirects new I/O until ILM (tiering) can free up capacity within the local tier.
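The behaviour described above boils down to a simple placement decision. A hypothetical sketch of that decision (not NDSF's actual write path, and the threshold is an assumption for illustration):

```python
# Illustration only of the behaviour described above, not NDSF's actual write path.
# If the local SSD tier is above a saturation threshold, new writes spill over to
# SSDs on other nodes until ILM frees up local capacity again.

SATURATION_PCT = 95.0   # hypothetical threshold chosen for this sketch

def choose_write_target(local_ssd_used_pct, remote_ssd_used_pct):
    """Return where a new write lands: the local SSD tier if it has headroom, otherwise the least-utilised remote SSD tier."""
    if local_ssd_used_pct < SATURATION_PCT:
        return "local-ssd"
    node, _ = min(remote_ssd_used_pct.items(), key=lambda kv: kv[1])
    return f"remote-ssd on {node}"

print(choose_write_target(82.0, {"node-2": 60.0, "node-3": 71.0}))   # local-ssd
print(choose_write_target(97.5, {"node-2": 60.0, "node-3": 71.0}))   # remote-ssd on node-2
```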

NDSF natively distributes writes throughout all nodes within the cluster. This means all nodes within heterogeneous clusters increase the capacity, performance and resiliency of the entire cluster.

To increase the performance of a single VM, all you need to do is migrate it (vMotion for ESXi, Live Migration for Hyper-V or Migrate for AHV) to a node with higher-spec physical processors, more SSD drives and/or more SATA spindles.

There is no requirement to Storage vMotion the VM or relocate it to a new Datastore/Container. NDSF manages the storage layer automatically and will localize hot data if/when required.

Advantage 6: No silos of storage capacity, all capacity is shared in the storage pool

Advantage 7: All nodes contribute to the capacity, performance and resiliency of the cluster

Heterogeneous clusters are managed by a single HTML 5 GUI called PRISM. There is no need to access multiple management interfaces for different storage types.

Advantage 8: Heterogeneous clusters are managed via a single HTML 5 GUI.

Nutanix also supports Pin to SSD which allows workloads requiring all flash to reside within a hybrid (SSD+SATA) cluster and be guaranteed all flash performance.

VMs or Virtual Disks can also be marked to be stored solely in Flash on the fly if/when required and vice versa.

Advantage 9: No silos required for workloads requiring All Flash performance

Nutanix eliminates the complexity around managing performance at a datastore layer. Nutanix supports up to the chosen hypervisor's limits, e.g. the vSphere HA limit of 2048 VMs per datastore. As all controllers within a cluster actively service all datastores (Containers), performance isn't constrained at the datastore layer as it is with traditional storage products.

For more information see: Unlimited VMs per datastore? Its not a myth with Nutanix!

Advantage 10: No performance concerns/constraints at the datastore level

What about Considerations for Heterogeneous Clusters?

From a performance perspective, always ensure your N+x (e.g. N+1, N+2, etc.) node/s are sized >= the largest node in the cluster, so that in the event of a node failure, workloads benefiting from higher-performance nodes can fail over to equivalent nodes.

From a capacity perspective, for NDSF to restore the configured RF (RF2 or RF3) in the event of a node failure, sufficient free capacity must exist within the storage pool. As such, when using high-capacity nodes such as NX-8035s, NX-8150s or NX-6035C storage-only nodes, ensure the free capacity within the storage pool is >= the capacity of the largest node.
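The capacity rule above is easy to check. A minimal sketch (not a Nutanix sizing tool; node capacities are the example values used earlier):

```python
# Illustrative check of the capacity sizing rule above (not a Nutanix sizing tool):
# the free capacity in the storage pool should be >= the capacity of the largest
# node, so NDSF can restore the configured RF after a node failure.

def can_rebuild_after_node_failure(node_capacity_tb, pool_used_tb):
    largest_node_tb = max(node_capacity_tb.values())
    pool_free_tb = sum(node_capacity_tb.values()) - pool_used_tb
    return pool_free_tb >= largest_node_tb, pool_free_tb, largest_node_tb

# Example: the mixed NX-3060 / NX-8035 cluster from earlier, with 10TB used
node_capacity = {"NX-3060-A": 2.0, "NX-3060-B": 2.0, "NX-8035-A": 8.0, "NX-8035-B": 8.0}
ok, free_tb, largest_tb = can_rebuild_after_node_failure(node_capacity, pool_used_tb=10.0)
print(f"Pool free {free_tb}TB vs largest node {largest_tb}TB -> can restore RF: {ok}")
```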

Advantage 11: Performance and availability sizing for heterogeneous clusters is simple.

Another consideration is for mission-critical or high-I/O applications: spread these evenly across the nodes and ideally ensure the active working set fits within the local SSD tier. Doing so will maximise performance, but in the event a very large workload cannot fit within the local SSD, its data will reside within the shared SSD tier and be actively serviced by multiple Controller VMs.

For more information about sizing see:  Rule of Thumb: Sizing for Storage Performance in the new world.

Advantage 12: The NDSF shared SSD tier ensures in the event a workload exceeds the local SSD capacity that the application still enjoys all flash performance by distributing data intelligently across the cluster.

Over time, when adding new nodes, VMs can be quickly/easily migrated to newer, higher performance/capacity nodes without any preparation. The VMs will immediately benefit from the newer node's CPU, RAM and storage performance even if most of their data is still stored on older node types.

Older nodes can be non-disruptively removed once they reach end of life, again without any preparation or administrator intervention.

Advantage 13: Workloads on NDSF benefit from newer generation nodes immediately without complex design/migration activities.

Summary:

  • Nutanix supports and recommends heterogeneous clusters
  • No complexity with multi-pathing, it’s optimal out of the box
  • No custom per datastore configuration
  • VAAI just works, no advanced configuration required due to mixed node types
  • No compromise required to mix node types
  • No silos of storage capacity, all capacity is shared in the storage pool
  • All nodes contribute to performance of the cluster
  • No balancing VMs across datastores/storage devices is required to improve performance/resiliency
  • NDSF disk balancing ensures the data is evenly distributed throughout the cluster helping avoid hotspots
  • The distribution of RF traffic throughout the cluster also helps avoid hotspots
  • No silos required for workloads requiring all flash performance
  • NDSF ensures VMs can immediately benefit from the addition of newer generation node types
  • Nodes can be added/removed without system administrator performing data migrations

Nutanix Implementation of Data Avoidance & Reduction Technologies

While it's not news that the Nutanix Distributed Storage Fabric (NDSF) supports numerous data avoidance & reduction technologies, what is less well known is how these technologies can be enabled/disabled and used.

Before we begin, let me cover the technologies NDSF offers:

Data Avoidance:

  • VAAI-NAS Fast File Clone (for ESXi)
  • View Composer for Array Integration (VCAI) for Horizon View
  • Native NDSF Clones (ESXi, Hyper-V and AHV)
  • ODX Copy Offload (Hyper-V)
  • Crash and Application Consistent snapshots (ESXi, Hyper-V and AHV)

Data Reduction:

  • Compression (In-Line and Post-Process)
  • Deduplication (Fingerprint on Write/In-Line for Performance Tier and/or Capacity Tier)
  • Erasure Coding (EC-X)

Data avoidance is designed to prevent the creation of unnecessary data in the first place, which removes the requirement to leverage data reduction technologies. This means less work for the storage layer, which results in more front-end I/O being available to service the virtual machines.

An example of data avoidance is using VCAI with Horizon View to create Linked Clones near instantly, which not only reduces space but also ensures faster deployment and recompose activities with greatly reduced impact on the environment.

Data avoidance is greatly underrated in my opinion, as it results in lower compression/deduplication ratios, because there is no additional data to dedupe or compress. If Nutanix turned off these data avoidance technologies, it would result in HIGHER compression and dedupe ratios, which sounds great on a marketing slide or in a tweet, but in reality, avoiding work for the storage is a much better way to do things.

Some vendors report data avoidance such as snapshots in deduplication ratios, and this in my opinion is very misleading and designed to artificially inflate dedupe ratios for competitive purposes. For more information see: Deduplication ratios – What should be included in the reported ratio?

Data Reduction is still a valuable option to have, but in my opinion it's overrated. The reason I think it's overrated is that data reduction does not always work well: whether you see a good data reduction ratio or not depends greatly on your data type, AND on whether the overheads (and there is always an overhead) are worth it.

Let’s now focus on the NDSF implementation of Data Reduction technologies.

Compression:

Compression can be configured on new or existing containers and set to In-Line or Post-Process. For post-process, enter a "Delay" value in minutes, e.g. 60 to delay compression for one hour, or 1440 for one day.

[Image: Prism container compression settings]

Compression can be reconfigured at any time, without the requirement to relocate VMs or reformat the storage. If compression is disabled, data which is already compressed will be uncompressed as part of a low-priority background task (known as Curator). This ensures there is low/no impact from changing compression settings, giving maximum flexibility for customers.

Because compression is configured per container, you can have VMs or even Virtual Disks running compression alongside VMs or Virtual Disks not running compression within the same NDSF cluster. This helps eliminate silos and ensures mixed workloads with different data types/profiles can co-exist efficiently.

Deduplication:

As with Compression, Deduplication can be configured on new or existing containers and be set to dedupe for the performance tier (SSD) and optionally for the Capacity (HDD) Tier. This means data reduction can be maximised for either or both tiers depending on customer requirements.

[Image: Prism container deduplication settings]

As with Compression, Dedupe can be reconfigured at any time, without the requirement to relocate VMs or reformat the storage. If dedupe is disabled, data which is already deduplicated is rehydrated by the same low-priority background task (Curator), again ensuring there is low/no impact from changing dedupe settings and maximum flexibility for customers.

Because dedupe is configured per container, you can have VMs or even Virtual Disks running dedupe alongside VMs or Virtual Disks not running dedupe within the same NDSF cluster. Deduplication is also complementary to Compression, meaning both can be run at the same time to maximise data reduction and further eliminate silos, ensuring mixed workloads can co-exist efficiently.

Erasure Coding (EC-X):

As with Compression & Dedupe, EC-X is enabled on a per-container basis and is complementary to both Compression and Dedupe. EC-X is a post-process only form of data reduction designed to work on write-cold data (meaning data which is not changing).

EC-X applies to data across both the Performance Tier (SSD) and the Capacity Tier (SATA), which means the effective SSD capacity is increased, so more data can be serviced from SSD, thus increasing performance.
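As a rough worked example of that capacity effect, the sketch below compares the raw-capacity overhead of plain RF2 with EC-X. The 3 data + 1 parity strip geometry is purely an assumption for illustration; actual EC-X strip sizes depend on cluster size and RF:

```python
# Worked example with illustrative assumptions: comparing the raw-capacity overhead
# of plain RF2 with EC-X for write-cold data. The 3 data + 1 parity strip below is
# an assumption for this sketch; real strip sizes depend on cluster size and RF.

RAW_TB = 40.0                 # the 4-node example cluster used earlier

rf2_overhead = 2.0            # every extent is stored twice
ecx_overhead = (3 + 1) / 3    # 3 data blocks protected by 1 parity block ~= 1.33x

print(f"RF2 only          : {RAW_TB / rf2_overhead:.1f}TB effective capacity")
print(f"EC-X (write-cold) : {RAW_TB / ecx_overhead:.1f}TB effective capacity")
```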

[Image: Prism container erasure coding (EC-X) setting]

As previously discussed, NDSF supports different containers using different combinations of data reduction all within the same NDSF cluster to maximise efficiencies and eliminate unnecessary silos.
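To illustrate, the per-container policies from the earlier four-container example can be expressed as simple data. This is a sketch only; the field names below are hypothetical and are not Prism or API settings:

```python
# Illustrative only: the per-container data reduction policies from the earlier
# example expressed as plain data. Field names are hypothetical, not Prism/API fields.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContainerPolicy:
    name: str
    rf: int = 2
    compression: bool = False
    compression_delay_min: Optional[int] = None   # None => in-line when compression is enabled
    dedupe_performance_tier: bool = False
    dedupe_capacity_tier: bool = False
    erasure_coding: bool = False

policies = [
    ContainerPolicy("Container01", rf=2, compression=True),
    ContainerPolicy("Container02", rf=2, compression=True, erasure_coding=True),
    ContainerPolicy("Container03", rf=3),
    ContainerPolicy("Container04", rf=2, compression=True,
                    dedupe_performance_tier=True, dedupe_capacity_tier=True,
                    erasure_coding=True),
]

for policy in policies:
    print(policy)
```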

Summary:

Nutanix provides multiple technologies to minimise the data being stored on the distributed storage fabric while giving customers the flexibility to enable/disable and tune data reduction settings to suit different data profiles all within the same NDSF cluster.

Remember, "one size does not fit all", so it is important for the storage layer to be able to treat your workloads differently based on their individual requirements.

Related Articles:

Deduplication and MS Exchange

Virtualization and Storage always seem to be hot topics in regards to Exchange deployments, and many of you would have seen my post Virtualizing Exchange on vSphere with NFS backed storage a while back.

This post was motivated by a tweet from a fellow VCDX which stated:

dedupe not supported for Exchange, no we can’t turn it off.

Later in the Twitter conversation he went on to say:

To be clear not an MS employee, another integrator MS “master” certified. It’s the whole NFS thing again

I have heard similar over the years, and for me the disappointing thing is that the support statement is unclear, as are the motivations behind support statements for Exchange in general, e.g. support for VMDK on NFS.

The only support statement I am aware of regarding Exchange and deduplication is in the TechNet article "Exchange 2013 storage configuration options", under the section "Volume configurations for the Exchange 2013 Mailbox server role", where it states:

[Image: volume configuration support table from the "Exchange 2013 storage configuration options" TechNet article]

The above statement, which specifically refers to "a new technique to optimize storage utilization for Windows Server 2012", states that for stand-alone or high availability solutions, de-duplication is not supported for Exchange database files unless the DB files are completely offline and used for backup or archives.

So the first question is “Is array level deduplication supported”?

There is nothing that says that it isn’t supported that I am aware of, so if you are aware of such a statement please let me know in the comments and I will update this post.

My interpretation of the support statement is that array-level deduplication is supported, and MS have simply called out that the deduplication in Windows Server 2012 is not. Regardless of whether you agree or disagree with my interpretation, I think it's safe to say the support statement should be clarified, with justification.

The next question I would like to discuss is “Should deduplication be used with Exchange”?

Firstly we should discuss the fact Exchange can be deployed with Database Availability Groups (DAGs), which create multiple copies of Exchange databases across up to 16 Exchange Mailbox (or Multi-Role) servers.

The purpose of a DAG is to provide high availability for the application and data.

So if the application is by design making duplicate copies, should the storage be undoing this work?

Before I give my opinion on deduplicating DAG copies, I want to be clear on two things:

1. Deduplication is a well proven technology which many different vendors implement either in-line or post process or in some cases both.

2. As array level deduplication is abstracted from the Guest OS and Application, there is no risk to the application such as data corruption or anything like that.

So back to deduplicating DAG copies.

I work for Nutanix and I wrote our best practice guide for Exchange which can be found below. In the guide, I recommended Compression but not deduplication. In an upcoming update of the document the recommendation remains to use compression but adds a further recommendation to use Erasure coding (EC-X) for data reduction.

Nutanix Best Practices Guide: Virtualizing Microsoft Exchange on Web-Scale Converged Infrastructure.

The reason for these recommendations is threefold:

1. Compression + EC-X give excellent data reduction savings for Exchange, which generally result in usable capacity higher than RAW capacity while still providing data protection at the storage layer (see the worked example after this list).

2. Deduplicating data which is deliberately written multiple times is a huge overhead on any infrastructure, as data is still processed multiple times by the Guest OS, Storage Network and storage controller even if duplicate copies are not written to disk. To be clear, the Guest OS (CPU) and Storage Network overhead are not eliminated by dedupe.

3. Nutanix recommends the use of hybrid nodes for Exchange, with a small percentage of capacity provided by SSD (for all write I/O and hot data) and a large percentage of capacity provided by SATA. As a result, the bulk of the data is stored on low-cost SATA, so the commercial benefit ($ per GB) of deduplication is minimal, especially after compression and EC-X.
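As a rough worked example of point 1, the sketch below shows how compression plus EC-X can push effective capacity above raw. The compression ratio and the 3+1 EC-X strip are assumptions for illustration, not guaranteed savings:

```python
# Worked example with assumed ratios (illustration only, not guaranteed savings):
# with compression plus EC-X on write-cold Exchange data, the effective (logical)
# capacity can exceed the raw capacity while data is still protected at the storage layer.

RAW_TB = 40.0
COMPRESSION_RATIO = 1.6       # assumed compression saving for Exchange databases
ECX_OVERHEAD = (3 + 1) / 3    # assumed 3+1 strip for write-cold data (~1.33x) vs 2.0x for RF2

effective_tb = RAW_TB / ECX_OVERHEAD * COMPRESSION_RATIO
print(f"Raw capacity      : {RAW_TB:.0f}TB")
print(f"Effective capacity: {effective_tb:.0f}TB (> raw, yet still protected)")
```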

In my opinion, deduplicating everything regardless of its profile is not the answer, so data reduction such as deduplication, compression and Erasure Coding should be able to be turned off for workloads where it gives minimal benefit.

For Exchange DAGs, deduplication should give excellent data reduction results in line with the number of DAG copies. So if an Exchange DAG has 4 copies, then approx 4:1 data reduction should be achieved right off the bat. Now this sounds great, but when running a DAG on highly available shared storage (SAN/NAS/HCI), it is unnecessary to have 4 copies of data.

In reality, I recommend 2 copies when running on Nutanix, because the shared storage provided by Nutanix keeps at least 1 additional copy (if using EC-X), or 2 or 3 copies of data where using RF2 or RF3. This means that in the event of a drive or node failure, the data is still available to the application without requiring a DAG failover. The same is true when running Exchange on SAN/NAS/HCI solutions with some form of RAID or replication for data protection.

So the benefit of deduplication would therefore reduce from possibly 4:1 down to 2:1, because only 2 DAG copies are really required if the storage is highly available.
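The arithmetic behind that is simply the number of DAG copies. A sketch with example numbers (the 10TB mailbox figure is an assumption for illustration):

```python
# Illustrative arithmetic only: the best-case dedupe ratio from DAG copies roughly
# equals the number of copies, so fewer DAG copies means less for dedupe to reclaim.

def dag_dedupe_estimate(unique_mailbox_tb, dag_copies):
    logical_tb = unique_mailbox_tb * dag_copies   # what the Exchange servers write
    return logical_tb, logical_tb / unique_mailbox_tb

for copies in (4, 2):
    logical_tb, ratio = dag_dedupe_estimate(unique_mailbox_tb=10.0, dag_copies=copies)
    print(f"{copies} DAG copies: {logical_tb:.0f}TB written, ~{ratio:.0f}:1 best-case dedupe")
```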

Considering the data reduction from compression and from storage solutions supporting Erasure Coding, I think deduplication is only commercially viable/required when using expensive all flash storage which, let's face it, is not required for Exchange.

If you have chosen an all flash solution and you want to run all workloads on it and eliminate silos of infrastructure for different workloads, then by all means deduplicate Exchange DAGs; otherwise it will be a super expensive solution. But in my opinion, hybrid is still the best solution overall, with the only real advantage of all flash being potentially higher and more consistent performance, depending on many factors.

Summary:

I hope that Microsoft clarify their position regarding support for array level data reduction technologies including deduplication with detailed justifications.

I would be disappointed to see Microsoft come out and update the support policy stating deduplication (for arrays) is not supported, as there is no technical reason it should not be supported (happy to be corrected if credible evidence can be provided), regardless of whether you think it's a good idea or not.

Having worked in the storage industry for a long time, I have seen many different deduplication solutions used successfully with MS Exchange and I am yet to see any evidence that it is not a totally viable and enterprise grade option for Exchange databases.

The question which remains is: do you need to deduplicate Exchange databases? My thinking is only where you're using all flash systems and need to lower the cost per GB.

My position is that the better solution when eliminating silos is a hybrid one, which gives you the best of all worlds: applications requiring all flash can have all flash, while other workloads can use flash for hot data and lower-cost SATA for cold storage or data which doesn't require SSD (like Exchange).