Example Architectural Decision – ESXi Host Hardware Sizing (Example 1)

Problem Statement

What are the most suitable hardware specifications for this environment's ESXi hosts?

Requirements

1. Support Virtual Machines of up to 16 vCPUs and 256GB RAM
2. Achieve up to 400% CPU overcommitment
3. Achieve up to 150% RAM overcommitment
4. Ensure cluster performance is both consistent & maximized
5. Support IP based storage (NFS & iSCSI)
6. The average VM size is 1 vCPU / 4GB RAM
7. The cluster must support approximately 1,000 average-sized virtual machines on day one
8. The solution should be scalable beyond 1,000 VMs (future-proofing)
9. N+2 redundancy

Assumptions

1. vSphere 5.0 or later
2. vSphere Enterprise Plus licensing (to support Network I/O Control)
3. VMs range from Business Critical Applications (BCAs) to non-critical servers
4. Software licensing for applications hosted in the environment is either per vCPU or per host, where DRS “Must” rules can be used to isolate VMs to licensed ESXi hosts

Constraints

1. None

Motivation

1. Create a Scalable solution
2. Ensure high performance
3. Minimize HA overhead
4. Maximize flexibility

Architectural Decision

Use Two Socket Servers w/ >= 8 cores per socket with Hyper-Threading (HT) support (16 physical cores / 32 logical cores), 256GB RAM, 2 x 10GB NICs

Justification

1. Two socket, 8 core (or greater) CPUs with Hyper-Threading provide flexibility for CPU scheduling of large numbers of VMs with diverse vCPU sizes, helping to minimize CPU Ready (contention)

2. Two socket servers of the proposed specification will support the required 1,000 average-sized VMs with 18 hosts, with two hosts (approximately 11% of cluster resources) reserved for HA to meet the required N+2 redundancy (see the sizing sketch after this list).

3. A cluster size of 18 hosts delivers excellent cluster (DRS) efficiency and flexibility with minimal HA overhead (only 11%), ensuring cluster performance is both consistent and maximized.

4. The cluster can be expanded with up to 14 more hosts (to the 32 host cluster limit) in the event the average VM size is greater than anticipated or the customer experiences growth

5. Having 2 x 10GB connections should comfortably support the IP Storage, vMotion, FT and network traffic with minimal possibility of contention. In the event of contention, Network I/O Control will be configured to minimize any impact (see Example VMware vNetworking Design w/ 2 x 10GB NICs)

6. RAM is one of the most common bottlenecks in a virtual environment; with 16 physical cores and 256GB RAM, this equates to 16GB of RAM per physical core. For the average-sized VM (1 vCPU / 4GB RAM), this meets the CPU overcommitment target (up to 400%) with no RAM overcommitment, minimizing the chance of RAM becoming the bottleneck

7. In the event of a host failure, the number of virtual machines impacted will be up to 64 (based on the assumed average VM size), which is minimal compared to a four socket ESXi host, where up to 128 VMs would be impacted by a single host outage

8. If four socket ESXi hosts were used, the cluster size would be approximately 10 hosts, and 20% of cluster resources would have to be reserved for HA to meet the N+2 redundancy requirement. This smaller cluster is less efficient from a DRS perspective, and the higher HA overhead equates to higher CapEx and, as a result, lower ROI

9. The solution supports virtual machines of up to 16 vCPUs and 256GB RAM, although VMs of this size would be discouraged in favour of a scale-out approach (where possible)

10. The cluster aligns with a virtualization friendly “Scale out” methodology

11. Using smaller hosts (either single socket, or fewer cores per socket) would not meet the requirement to support virtual machines of up to 16 vCPUs and 256GB RAM, would likely require multiple clusters, and would require additional 10GB and 1GB cabling compared to the two socket configuration

12. The two socket configuration allows the cluster to be scaled (expanded) at a very granular level (if required), reducing CapEx and minimizing the wasted/unused cluster capacity that would result from adding larger hosts

13. Enabling features such as Distributed Power Management (DPM) is more attractive and lower risk in larger clusters and may result in lower environmental costs (i.e. power / cooling)
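
Note: The host count, HA overhead and failure domain figures quoted in points 2, 7 and 8 above can be sanity checked with some simple arithmetic. The following Python sketch is illustrative only (the variable and function names are not from any VMware tool) and uses the figures from the requirements: 1,000 x 1 vCPU / 4GB average VMs, up to 400% CPU overcommitment, no RAM overcommitment and N+2 redundancy.

import math

# Requirement figures (from the Requirements section above)
vm_count = 1000              # average sized VMs required on day one
vm_vcpu, vm_ram_gb = 1, 4    # average VM: 1 vCPU / 4GB RAM
cpu_overcommit = 4.0         # up to 400% vCPU : physical core
ram_overcommit = 1.0         # no RAM overcommitment targeted (requirement allows up to 150%)
ha_spares = 2                # N+2 redundancy

def size_cluster(cores_per_host, ram_per_host_gb):
    """Return (total hosts including N+2, HA overhead, average VMs per host)."""
    host_vcpu_capacity = cores_per_host * cpu_overcommit
    host_ram_capacity = ram_per_host_gb * ram_overcommit
    hosts_for_cpu = math.ceil(vm_count * vm_vcpu / host_vcpu_capacity)
    hosts_for_ram = math.ceil(vm_count * vm_ram_gb / host_ram_capacity)
    hosts = max(hosts_for_cpu, hosts_for_ram) + ha_spares
    vms_per_host = min(host_vcpu_capacity / vm_vcpu, host_ram_capacity / vm_ram_gb)
    return hosts, ha_spares / hosts, vms_per_host

for label, cores, ram_gb in [("Two socket (16 cores / 256GB)", 16, 256),
                             ("Four socket (32 cores / 512GB)", 32, 512)]:
    hosts, ha_overhead, vms_per_host = size_cluster(cores, ram_gb)
    print(f"{label}: {hosts} hosts, HA overhead {ha_overhead:.0%}, "
          f"up to {vms_per_host:.0f} average VMs per host")

# Two socket (16 cores / 256GB): 18 hosts, HA overhead 11%, up to 64 average VMs per host
# Four socket (32 cores / 512GB): 10 hosts, HA overhead 20%, up to 128 average VMs per host

As the expected output shows, the proposed two socket specification works out to 18 hosts with 11% HA overhead and up to 64 average VMs per host, versus 10 hosts, 20% HA overhead and up to 128 VMs per host for the four socket alternative.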

Alternatives

1. Use Four Socket Servers w/ >= 8 cores per socket, 512GB RAM, 4 x 10GB NICs
2. Use Single Socket Servers w/ >= 8 cores, 128GB RAM, 2 x 10GB NICs
3. Use Two Socket Servers w/ >= 8 cores, 512GB RAM, 2 x 10GB NICs
4. Use Two Socket Servers w/ >= 8 cores, 384GB RAM, 2 x 10GB NICs
5. Have two clusters of 9 hosts with the recommended hardware specifications

Implications

1. Additional IP addresses for ESXi Management, vMotion, FT & Out of band management will be required as compared to a solution using larger hosts

2. Additional out of band management cabling will be required as compared to a solution using larger hosts

Related Articles

1. Example Architectural Decision – Network I/O Control for ESXi Host using IP Storage (4 x 10 GB NICs)

2. Example VMware vNetworking Design w/ 2 x 10GB NICs

3. Network I/O Control Shares/Limits for ESXi Host using IP Storage

4. VMware Clusters – Scale up for Scale out?

5. Jumbo Frames for IP Storage (Do not use Jumbo Frames)

6. Jumbo Frames for IP Storage (Use Jumbo Frames)


Example Architectural Decision – Single Sign On Configuration for Single Site w/ Multiple vCenter Servers

Problem Statement

What is the most suitable deployment mode for vCenter Single Sign-On (SSO) in an environment where there is a single physical datacenter with multiple vCenter servers?

Requirements

1. The solution must be a fully supported configuration
2. Meet/Exceed RTO of 4 hours
3. Support Single Pane of glass management
4. Ability to scale for future vCenters and/or datacenters

Assumptions

1. All vCenter instances can access the same Authentication source (Active Directory or OpenLDAP)

2. The average number of authentications per second for each SSO instance is <30 (Configuration Maximum)

Constraints

1. vCenter servers reside in different network security zones within the datacenter

Motivation

1. Future proof the environment

Architectural Decision

1. Use “Multi-site” SSO deployment mode

2. Use one SSO instance per vCenter

3. Each SSO instance will reside with its vCenter on a Windows Server 2008 R2 (x64) virtual machine in a vSphere cluster with HA enabled

4. Each SSO instance will use the bundled SQL database

5. (Optional) For greater availability, vCenter Heartbeat can be used to protect each SSO instance along with vCenter and the bundled SSO database

6. The virtual machine hosting vCenter/SSO will be configured with 2 vCPUs and 10GB RAM to support vCenter/SSO/Inventory Service, plus an additional 2GB RAM (12GB total) to support the bundled SSO database

7. Using the bundled SSO database ensures only a single vCenter Heartbeat deployment is required to protect each vCenter/SSO instance, and reduces Windows licensing costs

Justification

1. One SSO instance per vCenter simplifies the maintenance/upgrade process for vCenter/SSO, as different versions of vCenter cannot co-exist with the same SSO instance

2. If “High Availability” mode were used, it would prevent single pane of glass management

3. “High Availability” mode currently requires an SSL load balancer to be configured as well as manual intervention which can be complicated and problematic to implement and support

4. “Basic” mode prevents the use of Linked Mode which will prevent the management of the environment being single pane of glass

5. Where vCenter servers reside in different network security zones, using Multi-site mode allows each SSO instance to use authentication sources that are as logically close as possible while still supporting single pane of glass management. This should provide faster access to authentication services, as each SSO instance is configured with Active Directory servers located in the same or logically closest network security zone/s.

6. If one SSO instance goes offline for any reason, it will only impact a single vCenter server; it will not prevent authentication to the other vCenter servers.

7. Reduces the licensing costs for Microsoft Windows 2008 by combining the SSO and vCenter roles onto a single OS instance

Alternatives

1. Use “Basic” Mode, resulting in a standalone version of SSO for each vCenter server with no single pane of glass management

2. Use “High Availability” mode per vCenter

3. Use a shared “High Availability” mode for all vCenters in the datacenter

4. In any SSO configuration, host the SSO database (per vCenter) on an Oracle or SQL Server database server

5. Run SSO on a dedicated Windows 2008 instance with or without the SSO database locally

6. Run a single SSO instance in “Multi-Site” mode, use vCenter Heartbeat to protect SSO (including the database) and share the SSO instance with all vCenters

Implications

1. Where SSO is not protected by vCenter Heartbeat (optional), the SSO instance for each vCenter is a single point of failure; if it fails, authentication to the affected vCenter will fail

2. “Multi-Site” mode requires the installable version of SSO, which is Windows only; this prevents the use of the vCenter Server Appliance (VCSA), as the appliance only supports Basic mode.

Related Articles

1. vSphere 5.1 Single Sign On (SSO) deployment mode across Active/Active Datacenters

2. vSphere 5.1 Single Sign On (SSO) Architectural Decision Flowchart

3. Disabling Single Sign On – Dont Do It! – Michael Webster (VCDX#66) @vcdxnz001


Example Architectural Decision – Datastore Heartbeats for Clusters protected by SRM

Problem Statement

Datastore Heartbeats will be used to enhance vSphere's isolation detection capabilities and minimize the chance of false positive isolation responses. What is the most suitable configuration of Datastore Heartbeats for an environment using SRM?

Requirements

1. SRM solution must not be impacted

2. Maximum vSphere environment availability

Assumptions

1. Site Recovery Manager 5.1 protects virtual machines in the cluster/s

2. Appropriate isolation address/es have been configured OR the default isolation address is suitable

3. All storage is presented via Active/Active storage controllers

4. There are some datastores which are not replicated

5. Isolation response is set to “Shutdown”

Constraints

1. None

Motivation

1. Minimize the chance of a false positive isolation event

2. In the event of isolation, automate the recovery of VMs

Architectural Decision

Use Datastore Heartbeats to enhance the isolation detection capabilities of vSphere.

For each cluster where SRM is used, configure Datastore Heartbeating to manually select two non-replicated datastores per cluster as the heartbeat datastores (a selection sketch follows below).
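
As a rough illustration of the selection rule only (this is not an SRM or vSphere tool, and the cluster/datastore names are hypothetical), the following Python sketch picks two non-replicated datastores per cluster to be manually selected as the heartbeat datastores.

# Hypothetical inventory: datastore name -> whether it is SRM replicated
clusters = {
    "Prod-Cluster-01": {
        "DS-SRM-Repl-01": True,
        "DS-SRM-Repl-02": True,
        "DS-NonRepl-01": False,
        "DS-NonRepl-02": False,
        "DS-NonRepl-03": False,
    },
}

def heartbeat_datastores(datastores, required=2):
    """Pick the datastores to manually select for HA Datastore Heartbeating."""
    candidates = [name for name, replicated in sorted(datastores.items()) if not replicated]
    if len(candidates) < required:
        raise ValueError("Cluster needs at least two non-replicated datastores for heartbeating")
    return candidates[:required]

for cluster, datastores in clusters.items():
    print(cluster, "->", heartbeat_datastores(datastores))

# Prod-Cluster-01 -> ['DS-NonRepl-01', 'DS-NonRepl-02']

A real implementation would source the cluster and replication status from the vSphere/SRM inventory; the point of the sketch is simply that each cluster must contain at least two non-replicated datastores for this decision to be applied.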

Justification

1. Datastore heartbeating frequently writes to the datastores selected for heartbeating so that, in the event the network is down, isolation, partition or failure can be properly determined. During an SRM recovery, datastores need to be unmounted from the failed site, and this heartbeat I/O may cause one or more datastores to fail to unmount.

2. Datastores failing to unmount will cause one or more of the SRM recovery steps to report as failed; selecting non-replicated datastores prevents this from impacting SRM

3. The environment benefits from increased resiliency as a result of datastore heartbeats being used

4. There is no negative impact to the SRM solution

Implications

1. Each cluster will need at least two non-replicated datastores if Datastore Heartbeating is to be configured as per this decision

2. Additional configuration is required to manually select the non-replicated datastores for heartbeating

Alternatives

1. Do not use Datastore heartbeating

2. Use Datastore Heartbeats and have datastores automatically selected

Related Articles

1. Example Architectural Decision – Host Isolation Response for FC Based storage