With IP storage (particularly NFS in my experience) becoming more popular over recent years, I have been designing more and more VMware solutions with IP Storage, both iSCSI and NFS.
The purpose of this post is not to debate the pros and cons of IP storage, or block vs file, or even vendor vs vendor, but to explore how to ensure a VMware environment (vSphere 4 or 5) using IP storage can be made as resilient as possible purely from a VMware HA perspective. (I will be writing another post on highly available vNetworking for IP storage.)
So what are some considerations when using IP storage and VMware HA?
In many solutions I’ve seen (and designed), the ESXi Management VMKernel is on “vSwitch0” and uses two (2) x 1Gb NICs, while the IP storage (and data network) is on a dvSwitch (or dvSwitches) and uses two or more 10Gb NICs which connect to different physical switches than the ESXi Management 1Gb NICs.
So does this matter? Well, while it is a good idea, there are some things we need to consider.
What happens if the 1Gb network is offline for whatever reason, but the 10Gb network is still operational?
Do we want this event to trigger a HA isolation event? In my opinion, not always.
So let’s investigate further.
1. Host Isolation Response.
Host Isolation response is important to any cluster, but for IP storage it is especially critical.
How does Host Isolation Response work? Well, in vSphere 5, it requires three conditions to be met before the isolation response is triggered:
1. The host fails to receive heartbeats from the HA Master
2. The host does not receive any HA election traffic
3. Having met conditions 1 & 2, the host attempts to ping the “isolation address/es” and is unsuccessful
Once all three conditions are met, the isolation response is triggered. (A minimal sketch of this decision flow is shown below.)
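To make the detection sequence concrete, here is a minimal Python sketch of the host-side decision flow described above. It is purely illustrative pseudologic with names of my own choosing (this is not how the FDM agent is actually implemented), and it assumes the isolation addresses have already been configured.

```python
def host_declares_isolation(master_heartbeat_received: bool,
                            election_traffic_received: bool,
                            isolation_addresses: list[str],
                            ping) -> bool:
    """Illustrative sketch of the three-step HA isolation detection flow.

    `ping` is any callable taking an address and returning True on success;
    the real HA agent obviously does not accept one as a parameter.
    """
    # Condition 1: heartbeats from the HA Master are still arriving -> not isolated
    if master_heartbeat_received:
        return False
    # Condition 2: HA election traffic is still being received -> not isolated
    if election_traffic_received:
        return False
    # Condition 3: try each configured isolation address in turn
    for address in isolation_addresses:
        if ping(address):
            return False  # at least one isolation address responded
    # All three conditions met: declare isolation and trigger the isolation response
    return True
```

For example, with “das.isolationaddress1/2” pointed at two (hypothetical) vFiler addresses, `host_declares_isolation(False, False, ["192.168.10.10", "192.168.10.11"], ping)` returns True only when neither vFiler responds.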
So in the scenario I have provided, the goal is to ensure that if a host becomes isolated from the HA Primary nodes (or HA Master in vSphere 5) via the 1Gb network, the host does not unnecessarily trigger the “host isolation response”.
Now why would you want to stop HA restarting the VM on another host? Don’t we want the VMs to be restarted in the event of a failure?
Yes & No. In this scenario its possible the ESXi host still has access to the IP Storage network, and the VM the data network/s via the 10Gb Network. The 1Gb Network may have suffered a failure, which may effect management, but it may be desirable to leave the VMs running to avoid outages.
If both the 1Gb and 10Gb networks go down to the host, this would result in the host being isolated from the HA Primary nodes (or HA Master in vSphere 5), the host would not receive HA election traffic, and the host would suffer an “APD” (All Paths Down) condition. HA isolation response will then rightly be triggered and the VMs will be “Powered Off”. This is desirable, as the VMs can then be restarted on the surviving hosts, assuming the failure is not network wide.
Here is a screen grab (vSphere 5) of the “Host Isolation response” setting, which is located when you right click your cluster “Edit Settings”, “vSphere HA” and “Virtual Machine Options”.
The host isolation response setting for environments with IP Storage should always be configured to “Power Off” (and not Shutdown). Duncan Epping explained this well in his blog, so no need to cover this off again.
But wait, there’s more! 😉
How do I avoid false positives which may cause outages for my VMs?
If using vSphere 5, we can use Datastore Heartbeating (which I will discuss later), but in vSphere 4 some more thought needs to go into the design.
So let’s recap step three of the isolation detection process we discussed earlier:
“3. Having met conditions 1 & 2, the host attempts to ping the ‘isolation address/es’ and is unsuccessful”
What is the “isolation address”? By default, it’s the ESXi Management VMKernel default gateway.
Is this the best address to check for isolation? In an environment without IP storage, in my experience it normally is, although it is best to discuss this with your network architect, as the device you ping needs to be highly available. Note: It also needs to respond to ICMP!
When using IP storage, I recommend overriding the default by setting the advanced option “das.usedefaultisolationaddress” to “false”, then configuring “das.isolationaddress1” through “das.isolationaddress9” with the IP address/es of your IP storage (in this example, NetApp vFilers). The ESXi host will then ping your IP storage (assuming the HA Primaries, or the “Master” in vSphere 5, are unavailable and no election traffic is being received) to check whether it is isolated.
If the host completes the isolation detection process and is unable to ping any of the isolation addresses (the IP storage), and therefore cannot access its storage, it will declare itself isolated and trigger the HA isolation response (which, as we discussed earlier, should always be “Power Off”).
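For anyone who prefers to script the change rather than click through the Advanced Options dialog, below is a rough pyVmomi (Python) sketch of setting these options on a cluster. Treat it as a hedged example: the vCenter address, credentials, cluster name and vFiler IPs are placeholders, and you should validate the behaviour in your own environment.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details -- replace with your own
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the cluster by name (hypothetical cluster name)
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")

# HA advanced options: stop HA pinging the VMKernel default gateway and
# point isolation detection at the IP storage (vFiler) addresses instead
das_config = vim.cluster.DasConfigInfo()
das_config.option = [
    vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
    vim.option.OptionValue(key="das.isolationaddress1", value="192.168.10.10"),
    vim.option.OptionValue(key="das.isolationaddress2", value="192.168.10.11"),
]

spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)
cluster.ReconfigureComputeResource_Task(spec, True)  # modify=True merges with the existing config

Disconnect(si)
```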
The below screen shot shows the Advanced options and the settings chosen.
In this case, the IP storage (NetApp vFilers) is connected to the same physical 10Gb switches as the ESXi hosts (one “hop”), so the vFilers are a perfect way to test both network connectivity and access to the storage.
In the event the IP storage (NetApp vFilers) is inaccessible, this alone would not trigger the HA isolation response, as connectivity to the HA Primary nodes (or HA Master in vSphere 5) may still be functional. If the storage is in fact inaccessible for >125 seconds (using the default settings – an NFS “HeartbeatFrequency” of 12 seconds & “HeartbeatMaxFailures” of 10), the datastore/s will be marked as unavailable and an “APD” event may occur. See VMware KB 2004684 for details on APD events.
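As a quick sanity check of where the ~125 second figure comes from, the sketch below works through the arithmetic. The 5 second “NFS.HeartbeatTimeout” is my assumption of the vSphere default; the other two values are the defaults quoted above.

```python
# Default vSphere NFS heartbeat settings (HeartbeatTimeout is assumed, see above)
heartbeat_frequency = 12      # NFS.HeartbeatFrequency: seconds between heartbeats
heartbeat_max_failures = 10   # NFS.HeartbeatMaxFailures: missed heartbeats tolerated
heartbeat_timeout = 5         # NFS.HeartbeatTimeout: seconds to wait for each reply

# Worst-case window before the datastore is marked unavailable
apd_window = heartbeat_frequency * heartbeat_max_failures + heartbeat_timeout
print(apd_window)  # 125 seconds, matching the >125 second figure above
```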
Below is a screen grab of a vSphere 5 host showing the advanced NFS settings discussed above.
Note: With NetApp storage it is recommended to configure the VMs with a disk timeout of 190 seconds, to allow for intermittent network issues and/or a total controller loss (failover of which takes place in <180 seconds, usually much less), so the VMs can continue running and no outage is caused.
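For Windows guests, this 190 second value is normally applied by the storage vendor’s guest tools or scripts, but for completeness here is a small sketch of the equivalent change (run inside the guest with administrative rights). The registry path is the standard Windows disk timeout value rather than anything VMware specific.

```python
import winreg

# Standard Windows disk timeout value (in seconds); NetApp's guidance for VMs
# on NetApp storage, as discussed above, is 190 seconds.
DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "TimeoutValue", 0, winreg.REG_DWORD, 190)
```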
My advice would be that modifying “das.usedefaultisolationaddress” and “das.isolationaddressX” is an excellent way in vSphere 4 (and 5) of determining whether your host is isolated or not by checking that the IP storage is available; after all, the storage is critical to the ESXi host’s functionality! 😀
If for any reason the IP storage is not responding, then assuming steps 1 & 2 of the HA isolation detection process have completed, an isolation event is triggered and HA will take swift action (powering off the VM) to ensure the VM can be restarted on another host (assuming the issue is not network wide).
Note: Powering Off the VM in the event of Isolation helps prevent a split brain scenario where the VM is live on two hosts at the same time.
While datastore heartbeating is an excellent feature, it is only used by the HA Master to verify whether a host is “isolated” or “failed”. The “das.isolationaddressX” setting is a very good way to ensure your ESXi host can check if the IP storage is accessible or not, and in my experience (and testing) it works well.
Now, this brings me onto the new feature in vSphere 5…..
2. Datastore Heartbeating.
It provides that extra layer of protection against HA isolation “false positives”, but it adds little value for IP storage unless the Management and IP storage networks run over different physical NICs (in the scenario we are discussing, they do).
Note: If the “Network Heartbeat” is not received and the “Datastore Heartbeat” is not received by the HA Master, the host is considered “Failed” and the VMs will be restarted. But if the “Network Heartbeat” is not received and the “Datastore Heartbeat” is received by the HA Master, the host is “Isolated” and HA will trigger the “Host isolation response”.
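The note above boils down to a simple decision for the Master. The Python sketch below is my own paraphrase of that logic (not actual FDM code), covering the failed vs isolated distinction for a host that has stopped sending network heartbeats.

```python
def master_view_of_host(network_heartbeat: bool, datastore_heartbeat: bool) -> str:
    """Illustrative paraphrase of how the HA Master classifies a host."""
    if network_heartbeat:
        # Network heartbeats are arriving -> the host is fine
        return "alive"
    if datastore_heartbeat:
        # No network heartbeat, but the host is still updating its heartbeat
        # datastores -> isolated (or partitioned); the host itself runs the
        # configured host isolation response ("Power Off" in our scenario)
        return "isolated"
    # Neither heartbeat is seen -> the Master treats the host as failed and
    # restarts its VMs on the surviving hosts
    return "failed"
```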
The benefit here, in the scenario I have described, is that the “das.usedefaultisolationaddress” setting is “false” (preventing HA from pinging the VMKernel default gateway), and “das.isolationaddress1” & “das.isolationaddress2” have been configured so HA will ping the IP storage (vFilers) to check for isolation.
Datastore heartbeating was configured to “Select any of the cluster datastores taking into account my preferences”. This allows a VMware administrator to specify a number of datastores, and these should be datastores critical to the operation of the cluster (yes, I know, almost every datastore will be important).
In this case, being a NetApp environment, the best practice is to separate OS / page file / data / vSwap datastores etc.
Therefore I decided to select the Windows OS & the swap file datastores, as without these none of the VMs would function, so they are the logical choice.
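If you want to set the same preference from a script, the pyVmomi sketch below shows one way it might look. The `hBDatastoreCandidatePolicy` and `heartbeatDatastore` properties are my reading of the vSphere 5 API, and the datastore names are placeholders, so treat this as an assumption to verify rather than a definitive recipe.

```python
from pyVmomi import vim

# `cluster` is a vim.ClusterComputeResource, e.g. located as in the earlier sketch
preferred_names = {"OS_Datastore01", "Swap_Datastore01"}   # placeholder names
preferred = [ds for ds in cluster.datastore if ds.name in preferred_names]

das_config = vim.cluster.DasConfigInfo()
# "Select any of the cluster datastores taking into account my preferences"
das_config.hBDatastoreCandidatePolicy = "allFeasibleDsWithUserPreference"
das_config.heartbeatDatastore = preferred

spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)
cluster.ReconfigureComputeResource_Task(spec, True)  # modify=True
```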
The below screen grab shows where datastore heartbeating is configured, under the Cluster settings.
So what has this achieved?
We have the ESXi host pinging the isolation addresses (NetApp vFilers), and we have the HA Master checking datastore heartbeating, to accurately identify whether the host is failed, isolated or partitioned. In the event the HA Master does not receive network heartbeats or datastore heartbeats, it is extremely likely there has been a total failure of the network (at least for this host) and the storage is no longer accessible, which obviously means the VMs cannot run; the host will therefore be considered “Failed” by the Master. The host will then trigger the configured “host isolation response”, which for IP storage is “Power Off”.
QUOTE: Duncan Epping – Datastore Heartbeating: “To summarize, the datastore heartbeat mechanism has been introduced to allow the master to identify the state of hosts and is not used by the ‘isolated host’ to prevent isolation.”
I couldn’t have said it better myself.
If the failure is not affecting the entire cluster, then the VM will power off and be recovered by VMware HA shortly thereafter. If the network failure affects all hosts in the cluster, then the VM will not be restarted until the network problem is resolved.
Another quality post by the BigO
Coming from a Hyper-V environment, features like this are why vSphere runs rings around it (anyone who knows Hyper-V will understand) 🙁
Keep them coming, good reading material #90 🙂
Hi Josh. This was indeed a quality article. Coincidentally, it has reinforced some of the findings from vmware HA “crash tests” I’ve been running only in the last week or 2.
My HA crash tests were performed on a VMware 4.1 platform, using das.usedefaultisolationaddress = false, and a series of das.isolationaddressX entries which include the default (mgmt) network gateway AND our 2 NetApp filer addresses, on a different network of course, exactly as in your described scenario.
Long story short: my overall conclusion, from this article and indeed from my tests, is that the mgmt network shouldn’t even come into the picture vis-a-vis HA advanced options! I am thinking it’s entirely sufficient, and indeed perfectly correct, even for VMware 4.1, to have only TWO isolation addresses – those of your 2 NetApp filers, and nothing more.
The reason for this (from the HA perspective) is that the non-pingability of those vfiler addresses is the most effective “simulation” and/or “guarantor” of an ACTUAL host failure.
If I’m not interested in any “VM specific” monitoring from the HA perspective (such as das.uptime or whatever, due to having other, non-vmware based monitoring already in place for the moment), is this not a perfectly good HA advanced option scenario, even for v4.1?
Long story short: I can’t think of any scenario in which an ACTUAL host failure would mean anything other than the loss of the ESXi storage network, hence there’s no point in even considering the mgmt network (or isolation addresses contained within it) in the HA advanced options.
Strangely enough, some months ago we had a gigantic “false positive” where the mgmt network really went down in a big way, the vmware HA settings were all wrong, and this resulted in a large number of production machines being rebooted for no reason. This adds yet more weight to the argument that one should ignore the mgmt network when configuring vmware HA with IP storage.
Thanks for your comments,
Andrei