Example Architectural Decision – Advanced Power Management for vSphere Clusters with Business Critical Applications

Problem Statement

In a vSphere environment where Business Critical Applications have been successfully virtualized, should Advanced Power Management be used to help reduce data center costs?

Requirements

1. Fully Supported solution

2. Reduce data center costs where possible

3. Business Critical Application performance must not be significantly degraded

Assumptions

1. Supported Hardware

2. vSphere 5.0 or later

3. Admission Control is enabled with >= N+1 redundancy

Constraints

1. None

Motivation

1. Reduce Datacenter costs where possible with minimal/no impact to performance

Architectural Decision

Configure the BIOS to “OS Controlled”

Set ESXi Advanced Power Management to “Balanced”

Justification

1. Power savings can be realized with almost no impact to performance

2. The performance difference between “High performance” & “Balanced” options is insignificant however Power savings can be achieved reducing cost and environmental impacts

3. In the unlikely event of performance issues as a result of using the “Balanced” option, the BIOS is set to OS Controlled so ESXi can be updated without downtime during troubleshooting

4. Advanced Power Management Options (other than “High Performance” & “Balanced”) have proven to have excellent power savings but at a high cost to performance which is not suitable for Business Critical Applications

5. As HA Admission Control is used to provide >=N+1 redundancy, the ESXi hosts will generally not be fully utilized which will give Advanced Power Management opportunities to conserve power

6. The workloads in the cluster/s run 24/7 however demand is generally higher during business hours and some low demand or idle time exists

7. Even where only a small power saving is realized, if performance is not significantly impacted then a faster ROI can be achieved due to cost savings

Implications

1. Where performance issues exist using “Balanced” a vSphere administrator may need to change Advanced Power Management to “High Performance”

Alternatives

1. Use “High Performance”

2. Use “BIOS Controlled”

3. Do not use Advanced Power Management

4. Use Advanced Power Management in conjunction with DPM

Relates Articles

1. Power Management and Performance in ESXi 5.1 – By Rebecca Grider (@RebeccaGrider)

 AdvancedPowerManagement

 

Example Architectural Decision – Jumbo Frames for IP Storage (Do not use Jumbo Frames)

Problem Statement

When using IP based storage over a converged 10GB network, should Jumbo Frames be used?

Requirements

1. Fully Supported storage

2. Maximum vSphere environment availability

3. Maximize performance where possible

Assumptions

1. Converged 10GB Network which is highly available

2. Two (or more) 10GB connections per ESXi host

Constraints

1. No dedicated network for IP storage traffic

Motivation

1. Simplify the environment

Architectural Decision

Do not use Jumbo Frames

Justification

1. Reduce the complexity in the environment for initial implementation

2. Simplify ongoing support / troubleshooting

3. For a Jumbo Frame to be transmitted without fragmentation, All devices end to end must support and be configured for Jumbo Frames

4. While there can be performance benefits of Jumbo Frames for IP Storage this is not generally seen across the board and depends on I/O types

5. Ensure IP storage packets are not fragmented or dropped by mis-configured devices or devices that do not support Jumbo Frames

6. Storage performance for the virtual environment will generally be constrained by the storage array controllers not the storage area network

7. Ensure packet fragmentation does not occur as all devices support a default MTU of 1500

8. Increasing the MTU will decrease the number of packets required for the same bandwidth but where the bottleneck is throughput (bytes) there will be minimal/no benefit

9. Jumbo Frames will only assist where the network is constrained at an interrupt level

Implications

1. IP Storage may have reduced performance in some circumstances compared to what Jumbo Frames may offer

Alternatives

1. Use Jumbo Frames

Relates Articles

1. Example Architectural Decision – Jumbo Frames for IP Storage (Use Jumbo Frames)

 Contributors

Thanks to Rob McNab (IBM) and Peter McCrystal (IBM) for their input into this example architectural decision.

 

 

Example Architectural Decision – VMware HA – Percentage of Cluster resources reserved for HA

Problem Statement

The decision has been made to use “Percentage of cluster resources reserved for HA” admission control setting, and use Strict admission control to ensure the N+1 minimum redundancy level is maintained. However, as most virtual machines do not use  “Reservations” for CPU and/or Memory, the default reservation is only 32Mhz and 0MB+overhead for a virtual machine. In the event of a failure, this level of resources is unlikely to provide sufficient compute to operate production workloads. How can the environment be configured to ensure a minimum level of performance is guaranteed in the event of one or more host failures?

Requirements

1. All Clusters have a minimum requirement of N+1 redundancy
2. In the event of a host failure, a minimum level of performance must be guaranteed

Assumptions

1. vSphere 5.0 or later (Note: This is Significant as default reservation dropped from 256Mhz to 32Mhz, RAM remained at 0MB + overhead)

2. Percentage of Cluster resources reserved for HA is used and set to a value as per Example Architectural Decision – High Availability Admission Control

3. Strict admission control is enabled

4. Target over commitment Ratios are <=4:1 vCPU / Physical Cores | <=1.5 : 1 vRAM / Physical RAM

5. Physical CPU Core speed is >=2.0Ghz

6. Virtual machines sizes in the cluster will vary

7. A limited number of mission critical virtual machines may be set with reservations

8. Average VM size uses >2GB RAM

9. Clusters compute resources will be utilized at >=50%

Constraints

1. Ensuring all compute requirements are provided to Virtual machines during BAU

Motivation

1. Meet/Exceed availability requirements
2. Minimize complexity
3. Ensure the target availability and performance is maintained without significantly compromising  over commitment ratios

Architectural Decision

Ensure all clusters remain configured with the HA admission control setting use
“Enable – Do not power on virtual machines that violate availability constraints”

and

Use “Percentage of Cluster resources reserved for HA” for the admission control policy with the percentage value based on the following Architectural Decision – High Availability Admission Control

Configure the following HA Advanced Settings

1. “das.vmMemoryMinMB” with a value of “1024″
2. “das.vmCpuMinMHz” with a value of “512”

Justification

1. Enabling admission control is critical to ensure the required level of availability.
2. The “Percentage of cluster resources reserved for HA” setting allows a suitable percentage value of cluster resources to reserved depending on the size of each cluster to maintain N+1
3.The potentially inefficient slot size calculation used with “Host Failures cluster tolerates” does not suit clusters where virtual machines sizes vary and/or where some mission Critical VMs require reservations

  • 4.
  • Using advanced settings “das.vmCpuMinMHz” & “das.vmMemoryMinMB” allows a minimum level of performance (per VM) to be guaranteed in the event of one or more host failures
  • 5.
  • Advanced settings have been configured to ensure the target over commit ratios are still achieved while ensuring a minimum level of resources in a the event of a host failure
  • 6.
  • Maintains an acceptable minimum level of performance in the event of a host failure without requiring the administrative overhead of setting and maintaining “reservations” at the Virtual machine level
  • 7.
  • Where no reservations are used, and advanced settings not configured, the default reservation would be 32Mhz and 0MB+ memory overhead is used. This would likely result in degraded performance in the event a host failure occurs.

Alternatives

1. Use “Specify a fail over host” and have one or more hosts specified
2. “Host Failures cluster tolerates” and set it to appropriate value depending on hosts per cluster without using advanced settings
3.Use higher Percentage values
4. Use Higher / Lower values for “das.vmMemoryMinMB” and “das.vmCpuMinMHz”
5. Set Virtual machine level reservations on all VMs

Implications

1. The “das.vmCpuMinMHz” advanced setting applies on a per VM basis, not a per vCPU basis, so VMs with multiple vCPUs will still only be guarenteed 512Mhz in a HA event

2. This will reduce the number of virtual machines that can be powered on within the cluster (in order to enforce the HA requirements)

CloudXClogo