What’s .NEXT 2016 – Acropolis X-Fit

Posted on June 15, 2016 by Josh Odgers

Now that Acropolis Hypervisor (AHV) has been GA for approx 18 months (with many customers using it in production well before official GA), Nutanix has had a lot of positive feedback about its ease of deployment, management, scaling and performance. However there has been a common theme that customers have wanted the ability to create rules to seperate VMs and to keep VMs together much like vSphere’s DRS functionality.

Since the GA of AHV, it has supported some basic DRS functionality including initial placement of VMs and the ability to restore a VMs data locality by migrating the VM to the node containing the most data locally.

These features work very well, so the affinity and anti-affinity rules were the main pain point. While AHV is not designed to or aimed to be feature parity with ESXi or Hyper-V, DRS style rules is one area where similar functionality makes sense whereas in many other areas, AHV is and will remain very different to legacy hypervisors.

No surprise the AHV scheduler now provides VM/host affinity and anti-affinity rule capabilities which (similar to vSphere DRS) allows for “Should” and “Must” rules to control how the cluster enforces the rules.

Rule types which can be created:

VM-VM affinity: Keep VMs on a same host.
VM-VM anti-affinity: Keep VMs on separate hosts.
VM-Host affinity: Keep a given VM on a group of hosts.
VM-Host anti-affinity: Keep a given VM out of a group of hosts.
Affinity and Anti-affinity rules are cross-cluster policies.
Users can specify MUST as well as SHOULD enforcement of DRS rules

In addition to matching the capabilities of vSphere DRS, the Acropolis X-Fit functionality is also tightly integrated with both the compute and storage layers and works to proactively identify and resolve storage and compute contention by automatically moving virtual machines while ensuring data locality is optimised.

There are many other exciting load balancing capabilities to come so stay tuned as the AHV scheduler has plenty more tricks up its sleeve.

Related .NEXT 2016 Posts

What’s .NEXT 2016 – Metro Availability Witness

Posted on June 15, 2016 by Josh Odgers

In 2014, Nutanix introduced Metro Availability which allows Virtual Machines to have mobility between sites as well as to provide failover in the event of a site failure.

The goal of the Metro Availability (MA) Witness is to automate failovers in case of inter-site network failures or site failures. By the virtue of running the Witness in a different location than the two Metro Sites, it provides the ‘outside’ view that can determine whether a site is actually down or whether the network connection between the two sites is down, avoiding a split-brain scenario that can occur without an external Witness.

The main functions of a Witness include:

Making failover decision in the event of site or inter-site network failure
Avoiding split brain where the same container is active on both sites
Design to handle situations where a single storage or network domain fails

For example, in the case of a Metro Availability (MA) relationship between clusters, a Witness residing in a separate failure domain (e.g.: 3^rd site) decides which site should be activated when a split brain occurs due to a WAN failure, or in the situation where a site goes down. For that reason, it is a requirement that there are independent network connections for inter-site connectivity *and* for connections to the witness.

How Metro works without a Witness:

In the event of the primary site failure (the site where the Metro container is currently active) or the links between the sites going offline, the Nutanix administrator is required to manually Disable Metro Availability and Promote the Container to Active on the site where VMs are desired to be ran. This is a quick and simple process, but it is not automated which may impact the Recovery Time Objective (RTO).

In case of a communication failure with the secondary site (either due to the site going down or the network link between the sites going down), the Nutanix administrator can configure the system in two ways:

Automatic: the system will automatically disable Metro Availability on the container on the primary site after a short pause if the secondary site connection doesn’t recover within that time
Manual: wait for the administrator to manually take action

How Metro Availability (MA) works with the witness:

With the new Witness capability, the process of disabling Metro Availability and Promoting the Container in case of a site outage or a network failure is fully automated which ensures the fastest possible RTO. The Witness functionality is only used in case of a failure, meaning a Witness failure itself will not affect VMs running on either site.

Failure Scenarios Addressed by MA Witness.

There are a number of scenarios which can occur and Metro Availability responds differently depending on if MA is configured in “Witness mode”, “Automatic Resume mode” or in “Manual mode”.

The following table details the scenarios and the behaviour based on the configuration of Metro Availability.

In all cases except a failure at both Site 1 and Site 2, the MA Witness automatically handles the situation and ensures the fastest possible RTO.

The following videos show how each of the above scenarios function.

Deployment of the Metro Availability (MA) Witness:

The Witness capability is deployed as a stand-alone Virtual Machine, that can be imported on any hypervisor in a separate failure domain, typically a 3^rd site. This VM can run on non-Nutanix hardware. This site is expected to have dedicated network connections to Site 1 and Site 2 to avoid a single point of failure.

As a result, MA Witness is quick and easy to deploy, resulting in lower complexity and risk compared to other solutions on the market.

Summary:

The Nutanix Metro Witness completes the Nutanix Metro Availability functionality by providing completely automatic failover in case of site or networking failures.

Related .NEXT 2016 Posts

Nutanix AHV/AOS Functionality – Removing nodes

Posted on May 30, 2016 by Josh Odgers

A Nutanix ADSF (Acropolis Distributed Storage Fabric) is designed to live forever, meaning as new nodes are added and older nodes removed, the cluster remains online and critically, in a fully resilient state at all times.

While this might not sound that critical, it avoids problems which have plagued legacy (and even many modern) datacenter products where forklift upgrades/replacements are not only complex, high risk and time consuming, they typically also reduce the resiliency of the platform throughout the process.

A common example of reduced resiliency is where one (of two) SAN/NAS controllers is taken offline during a fork lift storage controller upgrade, meaning a single failure can cause the storage to be offline.

Nutanix has now been shipping product for around 5 years so we have had many customers go through hardware refresh cycles, and many more who are about to embark on a HW refresh.

I thought I would quickly demonstrate how easy it is to remove an old node from a cluster and ensure existing and prospective Nutanix customers have the facts about the node removal process.

Firstly lets look at the environment the demonstration is performed on.

We have an AHV environment with 8 nodes with a mix of NX3050 and NX6050 spread over 3 blocks as shown in Nutanix PRISM UI (below).

To remove a host, all we need to do is go to the hardware tab in PRISM, click the host we want to remove and select Remove Host as shown below.

No preparation tasks are required at all which also means less planning and change control is required. Once you select Remove Host, the host enters maintenance mode and starts performing the required tasks to remove the node as shown below.

As you can see, Acropolis OS (AOS) is removing each individual disk from the cluster before taking the node out of the cluster. This means the configured Resiliency Factor (RF) is always in compliance, ensuring that data is still available even in the event of a drive or node failure. This can be observed on the PRISM Home screen in the data resiliency view shown below.

This process is handled by the curator function of AOS and because data is distributed throughout all nodes within the cluster, the process is both lower impact than traditional RAID based solutions or solutions using RAID+Replication, as well as faster because all nodes and therefore CVMs, SSDs and HDDs participate in the process. Nutanix ADSF does not mirror or replicate data from one node to another node, but to and from all nodes. This eliminates the potential bottleneck of a single node.

The following shows the speed at which Nutanix Distributed Storage Fabric (ADSF) performs the data migration even when the majority of data resides on the HDD tier (including in this example).

For a cluster with 20 x 1TB and 20 x 4TB SATA spindles for a total of 100TB of SATA and just 6.4TB SSD (or approx 6.5%) the node removal rate where it reached >830MBps quite impressive since most of the extents (data) which needed to be replicated throughout the cluster were retrieved from SATA tier.

The rate at which a node can be removed will vary depending on the front end I/O, node types and cluster size with larger cluster sizes able to remove nodes faster due to more available controllers (CMVs) and importantly more choice of source and destination of extents.

The process can be monitored via the Tasks view (shown earlier) or at a very granular level such as per disk (SSD or HDD).

The below shows us the status of the disk is Migrating Data and it also shows the drive had a significant amount of data on it as this was not an empty cluster demonstration. In fact this screen shot was taken about halfway through the node removal process.

So many of you may be wondering what the CVM CPU utilisation is throughout this process During the process I took the following screenshot showing the eight Controller VMs, there vCPU configuration (8 vCPUs) and the CPU utilisation.

As we can see, the utilisation ranges from just 6% through to 16% with an average of just under 10%. It should be noted these nodes are using Intel Ivy Bridge processors so with latest generation Intel Broadwell chipsets the process would use less percentage of CPU and perform faster (due to higher per core performance) than on this 3 year old equipment.

Note: The CVM is not just doing IO processing. It is providing the full AHV / AOS management stack which makes the fact the CVM is using under 10% CPU even more impressive.

The Remove host task also resets the configuration of the Controller VM (CVM) back to default which ensures the node can be quickly/easily added to a new or existing cluster.

The end result is a fully functional 7 node cluster as shown below.

Summary:

Node removal from a Nutanix cluster (regardless of hypervisor) is a 1-Click, Non disruptive operation which maintains cluster resiliency at all times while being a fast and low impact process.

Related Articles:

1. VMware you’re full of it (FUD) : Nutanix CVM/AHV & vSphere/VSAN overheads

2. Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor

3. Think HCI is not an ideal way to run mission-critical x86 workloads? Think Again!

CloudXC

By Josh Odgers – VMware Certified Design Expert (VCDX) #90

Tag Archives: AHV

What’s .NEXT 2016 – Acropolis X-Fit

What’s .NEXT 2016 – Metro Availability Witness

Nutanix AHV/AOS Functionality – Removing nodes

Share this:

Share this:

Share this: