Example Architectural Decision – Host Isolation Response for FC Based storage

Problem Statement

What are the most suitable HA / host isolation settings where the environment uses Storage (IBM SVC) with FC connectivity via a dedicated highly available Storage Area Network (SAN) fabric where ESXi Management and Virtual Machine traffic run over a highly available data network?

Requirements

1. Ensure in the event of one or more hosts becoming isolated, the environment responds in an automated manner to recover VMs where possible

Assumptions

1.The Network is highly available (>99.999% availability)
2. The Storage is highly available (>99.999% availability)
3. vSphere 5.0 or later
4. ESXi hosts are connected to the network via two physical separate switches via two physical NICs

Constraints

1. FC (Block) based storage

Motivation

1. Meet/Exceed availability requirements
2. Minimize the chance of a false positive isolation event

Architectural Decision

Turn off the default isolation address by setting the below advanced setting

“das.usedefaultisolationaddress” = False

Configure three (3) isolation addresses by setting the below advanced settings

“das.isolationaddress1″ = 192.168.1.1 (Core Router)

“das.isolationaddress2″ = 192.168.1.2 (Core Switch 1 )

“das.isolationaddress3″ = 192.168.1.3 (Core Switch 2 )

Configure Datastore Heartbeating with “Select any of the clusters datastores”

Configure Host Isolation Response to: “Shutdown”

Justification

1. When using FC storage, it is possible for the Management and Virtual Machine Networks to be unavailable, while the Storage network is working perfectly. In this case Virtual machines may not be able to communicate to other servers, but can continuing reading/writing from disk. In this case, they will likely not be servicing customer workloads, as such, Shutting the VM down gracefully allows HA to restart the VM/s on host/s which are not isolated gives the VM a greater chance of being able to resume servicing workloads than remaining on an isolated host.
2. Datastore heartbeating will allow HA to confirm if the host is “isolated” or “failed”. In either case, Shutting down the VM will allow HA to recover the VM on a surviving host.
3. As all storage is presented via Active/Active IBM SVC controllers, there is no benefit is specifying specific datastores to be used for heartbeating
4. The selected isolation addresses were chosen as they are both highly available devices in the network which are essential for network communication and cover the core routing and switching components in the network.
5. In an environment where the Network is highly available an isolation event is extremely unlikely  as such, where the three (3) isolation addresses cannot be contacted, it is unlikely the network can be restored in a timely manner OR the host has suffered multiple concurrent failures (eg: Multiple Network Cards) and performing a controlled shutdown helps ensure when the network is recovered, the VMs are brought back up in a consistent state, OR in the event the isolation impacts only a subset of ESXi hosts in the cluster, the VM/s can be recovered by HA and resume normal operations.

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address

Implications

1. In the event the host cannot reach any of the isolation addresses, virtual machines will be Shutdown
2.  Using “Shutdown” as opposed to “Power off” ensures a graceful shutdown of the guest operating system, however this will delay the HA restart of the VM for up to 5 mins (300 seconds) if VMware Tools is unable to do a controlled shutdown, in which case after 300 seconds a “Power Off” will be executed.
3. In the unlikely event of network instability, VMs may be Shutdown prematurely.

CloudXClogo

 

 

Example Architectural Decision – vSphere configuration for handling APD/PDL scenarios

Problem Statement

What is the best way to configure the vSphere environment to handle All Paths Down (APD) and Permanent Device Loss (PDL) situations where the environment uses Active/Active (IBM SVC) storage with FC connectivity via a dedicated highly available Storage Area Network (SAN) fabric?

Requirements

1. Ensure in the event of storage issues the impact to the vSphere environment is minimized.
2. Where possible have the environment automatically respond in the event of storage problems

Assumptions

1. vSphere 5.1 or later
2. The Storage Area Network (SAN)  fabric is highly available (>99.999% availability)
3. All storage is FC (block) based via an Active/Active Disk array (IBM SVC disk system)
4. All ESXi hosts have storage connectivity via multiple HBAs
5. All ESXi hosts are connected to two (2) physically separate FC switches
6. The Path Selection Plugin (PSP) being used is “VMW_PSP_RR” (Round Robin)

Constraints

1. None

Motivation

1. Minimize impact of APD and PDL situations

Architectural Decision

Configure the following advanced settings

Set “Misc.APDHandlingEnable” to 1 (0 is default which is Disabled)
Set “Misc.APDTimeout” to 20 (140 seconds is default)

Set “disk.terminateVMOnPDLDefault” to 1 (Enabled)
Set “das.maskCleanShutdownEnabled” to 1 (Enabled)

Justification

1. The storage array (IBM SVC) operates in an Active/Active manor where the Path Selection Plugin (PSP) is either “VMW_PSP_RR” (Round Robin), “VMW_PSP_MRU” (Most Recently Used) OR “VMW_PSP_FIXED_AP” (Note: Now included in VMW_PSP_FIXED in vSphere 5.1), in the event of one or more path failures, the PSP will handle this event and use a working path. Where an APD situation occurs in a highly available SAN fabric it is likely the issue is a catastrophic failure and it is ideal to terminate I/O as soon as possible. As such lowering the “Misc.APDTimeout” to 20 (minimum) allows for a short outage but does not allow the VM to continue attempting I/O where it cannot be committed to disk.

2. After 20 seconds, any I/O from the VMs will be “fast-failed” with a status of “No_Connect” to prevent “hostd” worker threads being exhausted and causing the “hostd” service to become hung thus increasing resiliency at the ESXi layer.

3. In the event not all hosts in the cluster are impacted by the PDL, HA can detect the PDL on one (or more) hosts and restart the virtual machines on one of the hosts in the cluster which do not have the PDL state on the datastore/s

  • 4. Having “disk.terminateVMOnPDLDefault” enabled , ensures VMs are shutdown in a PDL event
  • 5.

  • The “das.maskCleanShutdownEnabled” setting allows VMs shutdown as a result of a PDL to be automatically restarted by HA

5. Setting the Misc.APDTimeout to “20” does not impact the storage connectivity even in the event of a single SVC cluster node failing as all Storage is Active on all SVC cluster nodes. Note: Half the paths would be lost in the event of a failed SVC cluster node but this does not constitute an APD situation.

Alternatives

1. Leave “Misc.APDHandlingEnable” at 0 (default)
2. Leave “Misc.APDTimeout” at 140 (default) OR set a higher or lower value (20 Min / 99999 Max)
3. Set “das.maskCleanShutdownEnabled” to Disabled
4. Set “disk.terminateVMOnPDLDefault” to 0 (Disabled)
5. Various combinations of the above

Implications

1. After 20 seconds, any I/O from the VMs will be “fast-failed” with a status of “No_Connect”., in the unlikely event of an outage lasting >20 seconds manual intervention will be required.
2. In the event of APD situation, Virtual machines will not be restarted by HA even where other ESXi hosts are not impacted by the APD situation
3. Due to the nature of an APD situation, there is no clean way to recover. Once the issue is resolved at the SAN fabric or disk system layer, ESXi hosts may need to be rebooted.

Related Articles

1. Advanced Configuration options for VMware High Availability in vSphere 5.0 and 5.1 (2033250)

CloudXClogo

 

 

Example Architectural Decision – VMware HA – Percentage of Cluster resources reserved for HA

Problem Statement

The decision has been made to use “Percentage of cluster resources reserved for HA” admission control setting, and use Strict admission control to ensure the N+1 minimum redundancy level is maintained. However, as most virtual machines do not use  “Reservations” for CPU and/or Memory, the default reservation is only 32Mhz and 0MB+overhead for a virtual machine. In the event of a failure, this level of resources is unlikely to provide sufficient compute to operate production workloads. How can the environment be configured to ensure a minimum level of performance is guaranteed in the event of one or more host failures?

Requirements

1. All Clusters have a minimum requirement of N+1 redundancy
2. In the event of a host failure, a minimum level of performance must be guaranteed

Assumptions

1. vSphere 5.0 or later (Note: This is Significant as default reservation dropped from 256Mhz to 32Mhz, RAM remained at 0MB + overhead)

2. Percentage of Cluster resources reserved for HA is used and set to a value as per Example Architectural Decision – High Availability Admission Control

3. Strict admission control is enabled

4. Target over commitment Ratios are <=4:1 vCPU / Physical Cores | <=1.5 : 1 vRAM / Physical RAM

5. Physical CPU Core speed is >=2.0Ghz

6. Virtual machines sizes in the cluster will vary

7. A limited number of mission critical virtual machines may be set with reservations

8. Average VM size uses >2GB RAM

9. Clusters compute resources will be utilized at >=50%

Constraints

1. Ensuring all compute requirements are provided to Virtual machines during BAU

Motivation

1. Meet/Exceed availability requirements
2. Minimize complexity
3. Ensure the target availability and performance is maintained without significantly compromising  over commitment ratios

Architectural Decision

Ensure all clusters remain configured with the HA admission control setting use
“Enable – Do not power on virtual machines that violate availability constraints”

and

Use “Percentage of Cluster resources reserved for HA” for the admission control policy with the percentage value based on the following Architectural Decision – High Availability Admission Control

Configure the following HA Advanced Settings

1. “das.vmMemoryMinMB” with a value of “1024″
2. “das.vmCpuMinMHz” with a value of “512”

Justification

1. Enabling admission control is critical to ensure the required level of availability.
2. The “Percentage of cluster resources reserved for HA” setting allows a suitable percentage value of cluster resources to reserved depending on the size of each cluster to maintain N+1
3.The potentially inefficient slot size calculation used with “Host Failures cluster tolerates” does not suit clusters where virtual machines sizes vary and/or where some mission Critical VMs require reservations

  • 4.
  • Using advanced settings “das.vmCpuMinMHz” & “das.vmMemoryMinMB” allows a minimum level of performance (per VM) to be guaranteed in the event of one or more host failures
  • 5.
  • Advanced settings have been configured to ensure the target over commit ratios are still achieved while ensuring a minimum level of resources in a the event of a host failure
  • 6.
  • Maintains an acceptable minimum level of performance in the event of a host failure without requiring the administrative overhead of setting and maintaining “reservations” at the Virtual machine level
  • 7.
  • Where no reservations are used, and advanced settings not configured, the default reservation would be 32Mhz and 0MB+ memory overhead is used. This would likely result in degraded performance in the event a host failure occurs.

Alternatives

1. Use “Specify a fail over host” and have one or more hosts specified
2. “Host Failures cluster tolerates” and set it to appropriate value depending on hosts per cluster without using advanced settings
3.Use higher Percentage values
4. Use Higher / Lower values for “das.vmMemoryMinMB” and “das.vmCpuMinMHz”
5. Set Virtual machine level reservations on all VMs

Implications

1. The “das.vmCpuMinMHz” advanced setting applies on a per VM basis, not a per vCPU basis, so VMs with multiple vCPUs will still only be guarenteed 512Mhz in a HA event

2. This will reduce the number of virtual machines that can be powered on within the cluster (in order to enforce the HA requirements)

CloudXClogo