vForum 2012 Sydney – TechTalk – VMware View 5.1 Desktop Deployment Solutions

At this year's vForum in Sydney, I presented a TechTalk on VMware View 5.1 Desktop Deployment Solutions.

Below is the link to the recording. Apologies that the audio is not great, as the TechTalk was recorded in the vForum lounge.

Josh Odgers – vForum 2012 TechTalk – VMware View 5.1 Desktop Deployment Solutions

As the video is not very clear, I have provided a copy of the presentation below.

vForum VMware View Provisioning Options V0.

Example Architectural Decision – Host Isolation Response for IP Storage

Problem Statement

What is the most suitable HA host isolation response when using IP-based storage (in this case, a NetApp HA pair in 7-mode), where the IP storage runs over physically separate network cards and switches from ESXi management?

Assumptions

1. vSphere 5.0 or greater (to enable the use of Datastore Heartbeating)
2. vFiler1 & vFiler2 reside on different physical NetApp controllers (within the same HA pair in 7-mode)
3. Virtual machine guest operating systems are configured with an I/O timeout of 190 seconds to allow for a controller fail-over (maximum 180 seconds); a sketch for Linux guests follows this list
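
For example, on a Linux guest the disk timeout can be raised via sysfs. The following Python sketch is illustrative only; it assumes standard /sys/block paths and root privileges inside the guest (on Windows the equivalent is the TimeOutValue registry value). Note the sysfs change does not persist across reboots; a udev rule is normally used to make it permanent.

# Raise the SCSI I/O timeout on all sd* disks to 190 seconds so the guest
# tolerates a NetApp controller fail-over (which can take up to 180 seconds).
import glob

TIMEOUT_SECONDS = 190

for path in glob.glob("/sys/block/sd*/device/timeout"):
    with open(path, "w") as f:
        f.write(str(TIMEOUT_SECONDS))
    print("Set %s to %d seconds" % (path, TIMEOUT_SECONDS))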

Motivation

1. Minimize the chance of a false positive isolation response
2. Ensure that, in the event the storage is unavailable, virtual machines are promptly shut down to minimize the impact on applications/data

Architectural Decision

Turn off the default isolation address (das.usedefaultisolationaddress = false) and configure the isolation addresses specified below, which check connectivity to multiple NetApp vFilers (IP storage) on both the vFiler management VLAN and the IP storage interfaces.

Utilize Datastore Heartbeating, checking multiple datastores hosted across both NetApp controllers (in the HA pair), to confirm the datastores themselves are accessible.

Services VLANs
das.isolationaddress1: vFiler1 Mgmt Interface 192.168.1.10
das.isolationaddress2: vFiler2 Mgmt Interface 192.168.2.10

IP Storage VLANs
das.isolationaddress3: vFiler1 vIF 192.168.10.10
das.isolationaddress4: vFiler2 vIF 192.168.20.10

Configure Datastore Heartbeating with “Select any of the cluster datastores taking into account my preferences” and select the following datastores:

  • One datastore from vFiler1 (Preference)
  • One datastore from vFiler2 (Preference)
  • A second datastore from vFiler1
  • A second datastore from vFiler2

Configure Host Isolation Response to: Power off.
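
These settings can also be applied programmatically. Below is a minimal, illustrative pyVmomi sketch (not the exact method used at the time); the vCenter address, credentials and cluster name are placeholders, and supplying the four preferred heartbeat datastores is omitted for brevity.

import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Connect to vCenter (placeholder host/credentials)
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ssl._create_unverified_context())

# Locate the cluster by name (placeholder name)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")
view.Destroy()

das = vim.cluster.DasConfigInfo()
# Disable the default isolation address (ESXi management gateway) and
# test the vFiler management and IP storage interfaces instead.
das.option = [
    vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
    vim.option.OptionValue(key="das.isolationaddress1", value="192.168.1.10"),
    vim.option.OptionValue(key="das.isolationaddress2", value="192.168.2.10"),
    vim.option.OptionValue(key="das.isolationaddress3", value="192.168.10.10"),
    vim.option.OptionValue(key="das.isolationaddress4", value="192.168.20.10"),
]
# "Select any of the cluster datastores taking into account my preferences";
# the preferred heartbeat datastores would be supplied via
# das.heartbeatDatastore (a list of datastore references), omitted here.
das.hBDatastoreCandidatePolicy = "allFeasibleDsWithUserPreference"
# Host isolation response: Power off
das.defaultVmSettings = vim.cluster.DasVmSettings(
    restartPriority="medium", isolationResponse="powerOff")

spec = vim.cluster.ConfigSpecEx(dasConfig=das)
cluster.ReconfigureComputeResource_Task(spec, modify=True)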

Justification

1. The ESXi management traffic runs on a standard vSwitch with 2 x 1Gb connections, which connect to different physical switches from the IP storage (and data) traffic (which runs over 10Gb connections). Using the ESXi management gateway (the default isolation address) to determine isolation is not suitable, as the management network can be offline without impacting the IP storage or data networks. This situation could lead to false positive isolation responses.
2. The isolation addresses chosen test both data and IP storage connectivity over the converged 10Gb network.
3. In the event the four isolation addresses (NetApp vFilers on the Services and IP storage interfaces) cannot be reached via ICMP, Datastore Heartbeating will be used to confirm whether the specified datastores (hosted on separate physical NetApp controllers) are accessible before any isolation action is taken.
4. In the event the two storage controllers do not respond to ICMP on either the Services or IP storage interfaces, and both of the specified datastores are inaccessible, it is likely there has been a catastrophic failure in the environment, either to the network or to the storage controllers themselves, in which case the safest option is to power off the VMs.
5. In the event the isolation response is triggered and the isolation does not impact all hosts within the cluster, the VMs will be restarted by HA on a surviving host.
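
To make this decision flow concrete, below is a simplified Python model of the behaviour described in points 3 and 4. It is purely illustrative and not a representation of the actual HA agent implementation.

# Illustrative model only: the isolation (Power off) action is taken only
# when every isolation address fails ICMP AND no heartbeat datastore is
# accessible.
def isolation_action(address_reachable, datastore_accessible):
    """Each argument is a list of booleans, one per isolation address /
    heartbeat datastore."""
    if any(address_reachable):
        return "not isolated - no action"
    if any(datastore_accessible):
        return "storage still accessible - no isolation action"
    return "catastrophic failure assumed - power off VMs"

# All four addresses unreachable and all heartbeat datastores inaccessible:
print(isolation_action([False] * 4, [False] * 4))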

Implications

1. In the event the host cannot reach any of the isolation addresses, and datastore heartbeating cannot access the specified datastores, virtual machines will be powered off.

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address

For more details, refer to my post “VMware HA and IP Storage”.

Example Architectural Decision – DRS Automation Level

Problem Statement

What is the most suitable DRS automation level and migration threshold for a vSphere cluster running an IaaS offering with a self-service portal and unpredictable workloads?

Assumptions

1. Workload types and sizes are unpredictable in an IaaS environment; workloads may vary greatly and without notice
2. The solution needs to be as automated as possible without introducing significant risk

Motivation

1. Prevent unnecessary vMotion migrations, which impact host and cluster performance
2. Ensure the cluster load imbalance (standard deviation) remains minimal
3. Reduce the administrative overhead of reviewing and approving DRS recommendations

Alternatives

1. Use Fully Automated and migration threshold 1 – apply priority 1 recommendations
2. Use Fully Automated and migration threshold 2 – apply priority 1 and 2 recommendations
3. Use Fully Automated and migration threshold 4 – apply priority 1, 2, 3 and 4 recommendations
4. Use Fully Automated and migration threshold 5 – apply priority 1, 2, 3, 4 and 5 recommendations
5. Set DRS to Manual and have a VMware administrator assess and apply recommendations

Justification

1. Prevent excessive vMotion migrations that do not provide significant benefit to cluster balance, as the vMotion itself consumes cluster and network resources
2. Ensure the cluster remains in a reasonably load-balanced state without resources being wasted on load balancing for minimal improvement
3. DRS is a low-risk, proven technology which has been used in large production environments for many years
4. Setting DRS to Manual would create significant administrative overhead and introduce additional risk of human error
5. Setting a more aggressive DRS migration threshold would put additional load on the cluster and would likely not result in significantly better balance

Architectural Decision

Use DRS in Fully Automated mode with migration threshold “3” – apply priority 1, 2 and 3 recommendations.
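
As an illustration, this decision could also be applied programmatically. The pyVmomi sketch below assumes a cluster object obtained as in the HA example earlier; the string values are the vSphere API enum values, and vmotionRate 3 is the midpoint of the 1–5 range, corresponding to the middle UI threshold.

# Sketch: enable DRS in Fully Automated mode with the middle migration
# threshold, which applies priority 1, 2 and 3 recommendations.
from pyVmomi import vim

drs = vim.cluster.DrsConfigInfo()
drs.enabled = True
drs.defaultVmBehavior = "fullyAutomated"  # vSphere API DrsBehavior enum value
drs.vmotionRate = 3  # midpoint threshold, equivalent to UI setting "3"

spec = vim.cluster.ConfigSpecEx(drsConfig=drs)
cluster.ReconfigureComputeResource_Task(spec, modify=True)  # "cluster" as before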

Implications

1. DRS will not move workloads via vMotion where only a moderate improvement to the cluster will be achieved
2. At times, including after performing updates of ESXi hosts (via VUM), the cluster may appear unevenly balanced, as DRS may calculate minimal benefit from migrations. Setting DRS to “Fully Automated, migration threshold 5” for a short period of time following maintenance should result in a more evenly balanced cluster.