Example Architectural Decision – Host Isolation Response for IP Storage

Problem Statement

What is the most suitable HA host isolation response when using IP-based storage (in this case, a Netapp HA pair in 7-mode), where the IP storage runs over physically separate network cards and switches from ESXi management?

Assumptions

1. vSphere 5.0 or greater (to enable the use of Datastore Heartbeating)
2. vFiler1 & vFiler2 reside on different physical Netapp Controllers (within the same HA Pair in 7-mode)
3. Virtual machine guest operating systems are configured with a disk I/O timeout of 190 seconds, to allow for a controller fail-over (which takes a maximum of 180 seconds)

Motivation

1. Minimize the chance of a false positive isolation response
2. Ensure that, in the event the storage is unavailable, virtual machines are promptly shut down to minimize the impact on applications and data.

Architectural Decision

Turn off the default isolation address and configure the isolation addresses specified below, which check connectivity to multiple Netapp vFilers (IP storage) on both the vFiler management VLAN and the IP storage interfaces.

Utilize Datastore heartbeating, checking multiple datastores hosted across both Netapp controllers (in HA Pair) to confirm the datastores themselves are accessible.

Services VLANs
das.isolationaddress1 : vFiler1 Mgmt Interface 192.168.1.10
das.isolationaddress2 : vFiler2 Mgmt Interface 192.168.2.10

IP Storage VLANs
das.isolationaddress3 : vFiler1 vIF 192.168.10.10
das.isolationaddress4 : vFiler2 vIF 192.168.20.10

Configure Datastore Heartbeating with “Select any of the cluster's datastores taking into account my preferences” and select the following datastores:

  • One datastore from vFiler1 (Preference)
  • One datastore from vFiler2 (Preference)
  • A second datastore from vFiler1
  • A second datastore from vFiler2

Configure Host Isolation Response to: Power off.
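To make the intended decision flow explicit, below is a minimal conceptual sketch in Python (not the actual vSphere HA agent logic) of the sequence this design relies on: ping the four isolation addresses and, only if none respond, consult the heartbeat datastores before taking the Power off action. The heartbeat_datastores_accessible() helper is a hypothetical placeholder for an environment-specific check.

```python
import subprocess

# The four isolation addresses configured above.
ISOLATION_ADDRESSES = [
    "192.168.1.10",   # das.isolationaddress1 - vFiler1 Mgmt Interface
    "192.168.2.10",   # das.isolationaddress2 - vFiler2 Mgmt Interface
    "192.168.10.10",  # das.isolationaddress3 - vFiler1 vIF (IP storage)
    "192.168.20.10",  # das.isolationaddress4 - vFiler2 vIF (IP storage)
]

def reachable(ip: str) -> bool:
    """Single ICMP ping, standing in for the HA agent's isolation address check."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", ip], stdout=subprocess.DEVNULL
    ) == 0

def heartbeat_datastores_accessible() -> bool:
    """Hypothetical placeholder: check that the four heartbeat datastores
    (two per Netapp controller) are still accessible from this host."""
    raise NotImplementedError("environment specific")

if any(reachable(ip) for ip in ISOLATION_ADDRESSES):
    print("At least one isolation address responded - host is not isolated")
elif heartbeat_datastores_accessible():
    print("Network isolated but storage still accessible - no isolation action yet")
else:
    print("No isolation address and no heartbeat datastore reachable - Power off VMs")
```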

Justification

1. The ESXi management traffic runs on a standard vSwitch with 2 x 1Gb connections, which connect to different physical switches from those used for the IP storage (and data) traffic (which runs over 10Gb connections). Using the ESXi management gateway (the default isolation address) to determine isolation is therefore not suitable, as the management network can be offline without impacting the IP storage or data networks, which could lead to false positive isolation responses.
2. The isolation addresses chosen test both data and IP storage connectivity over the converged 10Gb network.
3. In the event the four isolation addresses (the Netapp vFilers on the Services and IP storage interfaces) cannot be reached by ICMP, datastore heartbeating will be used to confirm whether the specified datastores (hosted on separate physical Netapp controllers) are accessible before any isolation action is taken.
4. In the event the two storage controllers do not respond to ICMP on either the Services or IP storage interfaces, and both of the specified datastores are inaccessible, it is likely there has been a catastrophic failure in the environment, either of the network or of the storage controllers themselves, in which case the safest option is to shut down the VMs.
5. In the event the isolation response is triggered and the isolation does not impact all hosts within the cluster, the VM will be restarted by HA onto a surviving host.

Implications

1. In the event the host cannot reach any of the isolation addresses, and datastore heartbeating cannot access the specified datastores, virtual machines will be powered off.

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address

For more details, refer to my post “VMware HA and IP Storage”.

Native NFS Snapshots (VAAI) w/ VMware View Composer (View 5.1)

Following my post on the Netapp Edge VSA and the Rapid Clone Utility, I thought it was an obvious next step to write a piece on the new VAAI functionality in VMware View 5.1, which allows the use of Netapp Native NFS snapshots (VAAI) for VMware View Composer linked clone deployments.

This feature is really the missing piece of the puzzle, as the Rapid Clone Utility (RCU) could deploy large numbers of desktops very quickly, however it could only create manual pools, which may have been a pain point for some customers.

So let's jump right in.

To take advantage of native NFS snapshot functionality within VAAI you need to install the NFS VAAI Plugin.

The official documentation on the plugin can be found here.

The easiest way, however, is to download the offline bundle from now.netapp.com and use the VSC plugin to complete the installation; see below for instructions.

The below screenshots are designed to be visual aids to support the above written instructions.

The below is the VSC plugin main screen.

Click the “Tools” option on the left hand side

Click the “Install on host” button, then select the hosts you want to install the plugin on and press “Install”

Select “Yes” to confirm the installation

The installation will begin as shown below. The installation was not super fast for me, so be patient.

After around 3 minutes (in my lab, anyway) it should complete, following which you should reboot your host(s).

The easiest way to confirm if the installation was successful is to check the “Hardware Accelerated” column (on the far right) for your datastores. Ensure it is now showing “Supported” as per the below example.

If for some reason it still shows “Not Supported”, reboot your host, and if that doesn’t work, reinstall the plugin.

Now that we have the plugin installed, it's time to get into VMware View Administrator.

Launch the web interface to your connection broker, and login.

You should see similar to the below after logging in.

Note: My system health shows some errors due to not having signed certificates; this will not impact the functionality.

Now, this article assumes your environment is already configured with a vCenter and View Composer server, like the below. If you do not have vCenter and View Composer configured, this article does not cover these steps.

The below shows the VMware View Administrator console. To create a pool (with or without Native NFS Snapshots), we use the “Add” button shown below.

In the Pool definitions section, we start at the “Type” menu.

For this example, to use View Composer, we select the “Automated Pool” option and press “Next”.

The “User Assignment” screen gives us two (2) options; both options can leverage the Native NFS snapshots, but in this case I have selected “Floating”.

In the “vCenter server” menu, Select “View Composer linked clones” and press “Next”.

We are now in the “Settings” section of the Add Pool wizard. Here we set the ID and Display Name; for this example, both are set to “W7TestPool”. After you set your ID and Display Name, press “Next”.

In Pool Settings, I have chosen to leave everything at the defaults for this example. In the real world, each of these settings should be carefully considered.

In the “Provisioning settings” menu, the two (2) main things to do are to set the naming pattern and the desktop counts. The naming pattern should be a logical name for your environment, followed by {n:fixed=3}; this results in three (3) digits after your chosen name, so you can support VMs 001 through 999.

Then we select the maximum number of desktops and the number of spare desktops.

In this example I want to provision all desktops up-front to demonstrate the speed of deployment.

In a production environment this would not generally be the most efficient setting.
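As a quick aside, the {n:fixed=3} pattern above simply produces a zero-padded three-digit suffix. A minimal Python sketch of the resulting machine names (the “W7TestPool-” prefix is illustrative, and View Composer performs the real expansion):

```python
# Illustrative only: View Composer expands {n:fixed=3} itself; this just shows
# the style of names produced by a pattern like "W7TestPool-{n:fixed=3}".
prefix = "W7TestPool-"
for n in (1, 2, 10, 999):
    print(f"{prefix}{n:03d}")  # W7TestPool-001, W7TestPool-002, W7TestPool-010, W7TestPool-999
```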

The “View Composer Disks” menu allows us to configure “disposable disks”; for this example these are not required, as no users will be using the desktops I am deploying (this is a test lab). However, in a production environment this is an option you need to carefully consider.

The “Storage Optimization” menu allows both persistent and replica disks to be separated from the OS disks. Again, this is something to carefully consider in your production environments, but it is not relevant to this example. As such, neither option is used.

Now we select the Parent VM & Snapshot. In this case, I am using a Windows 7 VM which I have prepared. There is nothing special about this image; it is just a bare Windows 7 installation patched using Windows Update, nothing more.

For this example, I am using my “MgmtCluster”, which is just a cluster containing my single physical ESXi 5.0 host.

The datastores option is important: to make full use of the Native NFS Snapshots, the Parent VM should reside in the same NFS datastore as the linked clones being deployed.

I have selected “NetappEdge_Vol1”, as this is where my Parent VM resides. You have the option to set the “Storage Overcommitment” level as shown below; however, this is not relevant as we're using the Native NFS Snapshots option later in the wizard.

The below shows all options are completed, now we hit “Next”.

Here we see we have the option to “Use native NFS snapshots (VAAI)”. If this is greyed out, you may have an issue with the plugin install, OR the datastore you have selected is not on a Netapp Edge/FAS or IBM N-Series controller.

We can also use Host Caching (CBRC), which will generally provide good performance; as such, I have left it enabled.

In the guest customization section, we can set an AD container where we want the linked clones to reside. In a production environment you should use this feature, but for this demonstration it's not relevant.

You can also use QuickPrep or Sysprep – each has pros and cons, but both work with the Native NFS snapshots.

Now we’re done, so all we need to do is hit “Finish”.

Now, I have included the below screenshot of the datastores prior to the linked clones being deployed, as a baseline to show there is 31.50GB free on “NetappEdge_Vol1”, which will be used for this demonstration.

Having completed the “Add Pool” wizard, after a short delay the initial clone of the master VM will start, and you will see a task similar to the below appear.

We can also see from the above that the first two clones took just 20 seconds.

Now see the next screenshot, where the tenth VM is powering on, thus confirming the storage (or cloning) part of the process is complete. Note the completion time of 20:53:39 compared to the start time of 20:50:12; this means all 10 VMs were cloned and registered to vCenter in just 3 mins 27 seconds (or 20.7 seconds per 10GB VM).
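As a quick check of that arithmetic (using the start and completion times quoted above):

```python
from datetime import datetime

start  = datetime.strptime("20:50:12", "%H:%M:%S")
finish = datetime.strptime("20:53:39", "%H:%M:%S")

elapsed = (finish - start).total_seconds()
print(elapsed)        # 207.0 seconds, i.e. 3 mins 27 seconds
print(elapsed / 10)   # 20.7 seconds per VM
```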

At this stage the VMs are all booting up, and customizing etc before registering with the connection broker.

This step in the process is largely dependent on the amount of compute in your cluster and the storage performance (mostly read performance). As I have only a single host, with my servers, storage and desktops all running on that host, the time it takes to complete this step will be longer than in a production environment.

In conclusion, the new functionality with Native NFS Snapshots (VAAI) clearly demonstrates a significant step forward in improving desktop provisioning times w/ View Composer. It also largely removes the compute and I/O impact of cloning on your vSphere cluster and storage array.

The performance appears to be similar to the performance of the Rapid Clone Utility (RCU) without the restriction of having to use “Manual Pools”.

As such I would encourage anyone looking at VDI solutions to consider this technology, as it has a number of important benefits over traditional “dumb disk”.

VMware Clusters – Scale up or out?

I get asked this question all the time: is it better to scale up or scale out?

The answer is of course, it depends. 🙂

First, let's define the two terms. Put simply:

Scale Up is having larger hosts, and fewer of them.

Scale Out is having more, smaller hosts.

What are the Pros and Cons of each?

Scale Up 

* PRO – More RAM per host will likely achieve higher transparent memory sharing (higher consolidation ratio!)

* PRO – Greater CPU scheduling flexibility as more physical cores are available (less chance for CPU contention!)

* PRO – Ability to support larger VMs (ie: The 32vCPU monster VM w/ 1TB RAM)

* PRO – Larger NUMA node sizes for better memory performance. Note: For those of you not familiar with NUMA, I recommend you check out Sizing VMs and NUMA nodes | frankdenneman.nl

* PRO – Uses fewer ports in the Data and Storage networks

* PRO – Less complex DRS simulations (which take place every 5 mins)

* CON – Potential for Network or I/O bottlenecks due to larger number of VMs per host

* CON – When a host fails, a larger number of VMs are impacted and have to be restarted on the surviving hosts

* CON – Fewer hosts per cluster leads to a higher HA overhead or “waste”

* CON – Fewer hosts for DRS to effectively load balance VMs across

Scale Out

* CON – Less RAM per host will likely achieve lower transparent memory sharing (thus reducing overcommitment)

* CON – Fewer physical cores may impact CPU scheduling (which may lead to contention – CPU Ready)

* CON – Unable to support larger VMs (ie: 8vCPU VMs or the 32vCPU monster VM w/ 1TB RAM)

* CON – Uses more ports in the Data and Storage networks – ie: cost!

* PRO – Less likely for Data or I/O bottlenecks due to smaller number of VMs per host

* PRO – When a host fails, a smaller number of VMs are impacted and have to be restarted on the surviving hosts

* PRO – More hosts per cluster may lead to a lower HA overhead or “waste”

* PRO – Greater flexibility for DRS to load balance VMs

Overall, both Scale Out and Scale Up have their advantages, so how do you choose?

When doing your initial capacity planning exercise, determine how many VMs you will have on day 1 (and their vCPU/RAM/Disk/Network/IOPS) and try to start with a cluster size which gives you the minimum HA overhead.

Example: If you have 2 large hosts with heaps of CPU / RAM, your HA overhead is 50%; if you have 8 smaller hosts, your overhead is 12.5% (both with N+1).
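The HA overhead in that example is simply the fraction of hosts reserved for failover; a minimal sketch:

```python
def ha_overhead(hosts: int, spares: int = 1) -> float:
    """Fraction of cluster capacity reserved for HA with N+<spares>."""
    return spares / hosts

print(f"{ha_overhead(2):.1%}")   # 2 large hosts  -> 50.0%
print(f"{ha_overhead(8):.1%}")   # 8 small hosts  -> 12.5%
```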

As a general rule, I believe the ideal cluster would be made up of large 4-way hosts with a bucket load of RAM, and around 16-24 of them. This would, in my opinion, be the best of both worlds. Sadly, few environments meet the requirements (or have the budget) for this type of cluster.

I believe a cluster should ideally start with enough hosts to minimize the initial HA overhead (say <25%) and to ensure DRS can load balance effectively, then scale up (eg: add RAM) to cater for additional VMs. If more compute power is required in future, scale out, then scale up (add RAM) further. I would generally suggest not designing to the configuration maximums, so clusters of up to around 24 nodes.

From a HA perspective, I feel that in a 32 node cluster, 4 hosts' worth of compute should be reserved for HA, or 1 in 8 (a 12.5% HA reservation) – similar to the RAID-DP concept from Netapp of 14+2 disks in a RAID group.

Tip: Choose hardware which can be upgraded (scaled up). Avoid designing a cluster with host hardware specs maxed out on day 1.

There are exceptions to this, such as Management clusters, which may only have (and need) 2 or 3 hosts over their life span (eg: for environments where vCloud Director is used), or environments with static or predictable workloads.

To achieve the above, the chosen hardware needs to be upgradable. ie: If a server's maximum RAM is 1TB, you may consider only half populating it (being careful to choose DIMMs that allow you to expand) to enable you to scale up as the environment's compute requirements grow.

Tip: Know your workloads! Use tools like Capacity Planner so you understand what you're designing for.

It is very important to consider the larger VMs and ensure the hardware you select has a suitable number of physical cores.

Example: Don't expect 2 x 8vCPU VMs (highly utilized) to run well together on a 2-way / 4-core host.

When designing a new cluster, or scaling an existing one, be sure to consider the CPU-to-RAM ratio, so that you don't end up with a cluster with heaps of available CPU and maxed-out memory, or vice versa. This is a common mistake I see.

Note: Typically in environments I have seen over many years, Memory is almost always the bottleneck.

The following is an example where a Scale Out approach and a Scale Up approach end up with very similar compute power in their respective clusters, but would likely have very different performance characteristics and consolidation ratios.

Example Scenario: A customer has 200 VMs day one, and let's say the average VM size is 1vCPU / 4GB RAM, but they also have 4 highly utilized 8vCPU / 64GB RAM VMs running database workloads.

The expected consolidation ratio is 2.5:1 vCPUs to physical cores and 1.5:1 vRAM to physical RAM.

The customer expects to increase the number of VMs by 25% per year, for the next 3 years.

So our total compute required is:

Day one: 92.8 CPU cores and 704GB RAM.

End of Year 1: 116 CPU cores and 880GB RAM.

End of Year 2: 145 CPU cores and 1100GB RAM.

End of Year 3: 181 CPU cores and 1375GB RAM.
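These figures can be reproduced with a quick back-of-the-envelope calculation from the stated VM mix, consolidation ratios and 25% annual growth:

```python
# 200 average VMs (1vCPU / 4GB) plus 4 database VMs (8vCPU / 64GB)
vcpus = 200 * 1 + 4 * 8      # 232 vCPUs
vram  = 200 * 4 + 4 * 64     # 1056 GB vRAM

cores_day1 = vcpus / 2.5     # 2.5:1 vCPU to physical core -> 92.8 cores
ram_day1   = vram / 1.5      # 1.5:1 vRAM to physical RAM  -> 704 GB

for year in range(4):        # day 1 plus 25% growth per year for 3 years
    growth = 1.25 ** year
    print(f"Year {year}: {cores_day1 * growth:.1f} cores, {ram_day1 * growth:.0f} GB RAM")
# Year 0: 92.8 cores, 704 GB ... Year 3: 181.3 cores, 1375 GB
```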

The day 1 requirements could be achieved in a number of ways, see two examples below.

Option 1 (Scale Out) – Use 9 hosts with 2-way / 6-core / 96GB RAM w/ HA reservation of 12% (~N+1)

Total Cluster Resources = 108 Cores & 864GB RAM

Usable assuming N+1 HA = 96 cores & 768GB RAM

Option 2 (Scale Up) – Use 4 hosts with 4-way / 8-core / 256GB RAM w/ HA reservation of 25% (~N+1)

Total Cluster Resources = 128 Cores & 1024GB RAM

Usable assuming N+1 HA = 96 cores & 768GB RAM
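The total and usable (N+1) figures for both options fall straight out of the host specifications; a quick sanity check:

```python
def cluster_capacity(hosts, cores_per_host, ram_gb_per_host, spares=1):
    """Total and N+<spares> usable capacity for a cluster of identical hosts."""
    total  = (hosts * cores_per_host, hosts * ram_gb_per_host)
    usable = ((hosts - spares) * cores_per_host, (hosts - spares) * ram_gb_per_host)
    return total, usable

print(cluster_capacity(9, 2 * 6, 96))    # Option 1: ((108, 864), (96, 768))
print(cluster_capacity(4, 4 * 8, 256))   # Option 2: ((128, 1024), (96, 768))
```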

Both Option 1 and Option 2 appear to meet the Day 1 compute requirements of the customer, right?

Well, yes, at the high level, both scale out and up appear to provide the required compute resources.

Now let's review how the clusters will scale to meet the End of Year 3 requirements; after all, we don't design just for day 1, do we? 🙂

End of Year 3 Requirements: 181 CPU cores and 1375GB RAM.

Option 1 (Scale Out) would require ~15 hosts (2RU per host) based on CPU & ~15 hosts based on RAM, plus HA capacity of ~12% (N+2, as the cluster is >8 hosts), taking the total required hosts to 18.

Total Cluster Resources = 216 Cores & 1728GB RAM

Usable assuming N+2 HA = 190 cores & 1520GB RAM

Note: At between 16 and 24 hosts, N+3 should be considered (this equates to 1 spare host of compute per 8 hosts).

Option 2 (Scale Up) would require 6 hosts (4RU per host) based on CPU & 6 hosts based on RAM, plus HA capacity of ~15% (N+1, as the cluster is <8 hosts), taking the total required hosts to 7.

Total Cluster Resources = 224 Cores & 1792GB RAM

Usable assuming N+1 HA = 190 cores & 1523GB RAM
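Again, the usable end-of-year-3 numbers can be sanity checked using the HA reservation percentages quoted above:

```python
def usable_capacity(hosts, cores_per_host, ram_gb_per_host, ha_reservation):
    """Usable capacity after reserving a percentage of the cluster for HA."""
    factor = 1 - ha_reservation
    return hosts * cores_per_host * factor, hosts * ram_gb_per_host * factor

print(usable_capacity(18, 12, 96,  0.12))   # Option 1: ~190 cores, ~1520 GB
print(usable_capacity(7,  32, 256, 0.15))   # Option 2: ~190 cores, ~1523 GB
```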

So, on the raw compute numbers, we have two viable options which scale from day 1 to the end of Year 3 and meet the customer's compute requirements.

Which option would I choose, I hear you ask? Good question.

I think I could easily defend either option, but I believe Option 2 would be more economically viable and result in better performance. Below are a few reasons for my conclusion.

* Option 2 would give significantly better transparent page sharing compared to Option 1, therefore achieving a higher consolidation ratio.

* Option 2 would likely be much cheaper from a Network / Storage connectivity point of view (fewer connections)

* Option 2 is more suited to hosting the 4 x 8vCPU highly utilized VMs (as they fit within a NUMA node and will only use 1/4 of the host's cores, as opposed to 2/3 of the cores in the 2-way host)

* The 4-way (32-core) host would provide better CPU scheduling due to the larger number of cores

* From a data center perspective, Option 2 would only use 28RU compared to 36RU

Note: A cluster of 7 hosts is not really ideal, but in my opinion it is large enough to get both HA and DRS efficiencies. The 18 node cluster (Option 1) is really in the sweet spot for cluster sizing, but the CPUs did not suit the 8vCPU workloads. Had Option 1 used 8-core processors, that would have made it more attractive.

Happy to hear everyone’s thoughts on the topic.