Example Architectural Decision – Network I/O Control Shares/Limits for ESXi Host using IP Storage

Problem Statement

With 10Gb connections becoming the norm, ESXi hosts will generally have fewer physical connections than in the past, when 1Gb NICs were common, but more bandwidth per connection (and in total) than a host with 1Gb NICs.

In this case, the hosts have only two (2) x 10Gb NICs and the design needs to cater for all traffic (including IP storage) for the ESXi hosts.

The design needs to ensure each type of traffic has sufficient burst and sustained bandwidth without significantly impacting the other traffic types.

How can this be achieved?

Assumptions

1. No additional network cards (1Gb or 10Gb) can be supported
2. vSphere 5.1
3. Multi-NIC vMotion is desired

Constraints

1. Two (2) x 10Gb NICs

Motivation

1. Ensure IP Storage (NFS) performance is optimal
2. Ensure vMotion activities (including a host entering maintenance mode) can be performed in a timely manner without impacting IP Storage or Fault Tolerance
3. Fault Tolerance is a latency-sensitive traffic flow, so where custom shares are used it is recommended the corresponding resource pool shares be set to a reasonably high relative value
4. Proactively address potential contention due to limited physical network interfaces

Architectural Decision

Use one dvSwitch to support all VMKernel and virtual machine network traffic.

Enable Network I/O Control and configure NFS and/or iSCSI traffic with a share value of 100; ESXi Management, vMotion and FT will each have a share value of 25, and Virtual Machine traffic will have a share value of 50.

Configure the two (2) VMKernel interfaces for IP Storage on the dvSwitch and set them as Active on one 10Gb interface and Standby on the second.

Configure two VMKernel interfaces for vMotion on the dvSwitch, with the first set as Active on one interface and Standby on the second, and the second vMotion VMKernel configured the opposite way around (so each 10Gb interface is Active for one vMotion VMKernel, as required for Multi-NIC vMotion).

A single VMKernel interface will be configured for Fault Tolerance, set as Active on one interface and Standby on the second.

For ESXi Management, the VMKernel will be configured as Active on the interface where FT is standby and standby on the second interface.

All dvPortGroups for Virtual machine traffic will be active on both interfaces.
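
To illustrate what these share values mean in practice, below is a minimal sketch (plain Python, using the share values from the decision above) of the worst-case bandwidth each traffic type is guaranteed on a single saturated 10Gb uplink. Note that NIOC shares are applied per physical adapter and only arbitrate between the traffic types actually contending at the time, so these figures are floors rather than caps.

```python
# Worst-case bandwidth per traffic type on a single saturated 10Gb uplink,
# based on the Network I/O Control share values chosen above.
# Shares are a minimum guarantee under contention, not a cap: when other
# traffic types are idle, their allocation is redistributed.

UPLINK_GBPS = 10  # one physical 10Gb NIC

shares = {
    "NFS / iSCSI (IP Storage)": 100,
    "Virtual Machine": 50,
    "ESXi Management": 25,
    "vMotion": 25,
    "Fault Tolerance": 25,
}

total_shares = sum(shares.values())  # 225 in this design

for traffic_type, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    guaranteed_gbps = UPLINK_GBPS * share / total_shares
    print(f"{traffic_type:<28} {share:>3} shares -> "
          f"~{guaranteed_gbps:.2f} Gb/s minimum under contention")
```

So even under full contention, IP Storage is guaranteed roughly 4.4Gb/s per uplink, Virtual Machine traffic roughly 2.2Gb/s, and vMotion, FT and Management roughly 1.1Gb/s each; because these are shares rather than limits, any traffic type can burst well beyond those figures whenever the uplink is not saturated.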

Justification

1. The share values were chosen to ensure IP Storage traffic is not impacted, as this can cause flow-on effects for the environment's performance. vMotion and FT are considered important, but during periods of contention should not monopolize or impact IP Storage traffic.
2. IP Storage is more critical to ongoing cluster and VM performance than ESXi Management, vMotion or FT
3. IP storage requires higher priority than vMotion which is more of a burst activity and is not as critical to VM performance
4. With a share value of 25, Fault Tolerance still has ample bandwidth to support the maximum of four (4) FT-protected virtual machines per host, even during periods of contention
5. With a share value of 25, vMotion still has ample bandwidth to support multiple concurrent vMotions during contention, and performance should not be impacted on a day-to-day basis. Up to eight (8) concurrent vMotions are supported on a 10Gb interface (the limit is four (4) on a 1Gb interface). Where no contention exists, vMotion traffic can burst and use a large percentage of both 10Gb interfaces to complete vMotion activity as fast as possible
6. With a share value of 25, ESXi Management still has ample bandwidth to continue normal operations even during periods of contention
7. When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution.
8. With a share value of 50, Virtual Machine traffic still has ample bandwidth and should result in minimal or no impact to VM performance across the 10Gb NICs
9. Setting Limits may prevent operations from completing in a timely manner where there is no contention

Implications

1. In the unlikely event of significant and ongoing contention, vMotion performance may affect the ability to evacuate a host in a timely manner, which may extend scheduled maintenance windows.
2. VMs protected by FT may be impacted

Alternatives

1. Use a share value of 50 for IP Storage traffic to more evenly share bandwidth during periods of contention. However, this may impact VM performance, e.g. increased CPU WAIT if the IP storage cannot keep up with the storage demand

Related Posts
1. Example VMware vNetworking Design for IP Storage (4 x 10GB NICs)
2. Example VMware vNetworking Design for IP Storage (2 x 10GB NICs)
3. Frank Denneman (VCDX) – Designing your vMotion Network – Multi-NIC vMotion & NIOC

vCenter Operations for View – Scalable Architecture for a 10,000 user Pod

Recently I was putting together a design for a vCenter Operations for View 1.0.x solution for a customer with approximately 6,000 virtual desktops, and it got me thinking: what would be the best way to implement vCenter Operations for View for a 10,000 user “Pod” in a View 5.0 or 5.1 environment?

Before we begin, I would like to clarify that this solution is designed to work with a standard View “Pod” design. If your environment does not follow the VMware View Reference Architecture then I do not recommend this architecture, e.g. managing server and desktop workloads via the same vCenter may cause issues for this solution.

Example: It is assumed that if a customer is deploying a VMware View solution with greater than 2,000 users, a management cluster will be used, or the View Management VMs (View Connection Server / View Security Server / View Composer etc.) are hosted somewhere other than the View Blocks themselves.

For more details on why using a Management cluster for View Management VMs is preferred, see my post “Example Architectural Decision – Supporting VMware View Infrastructure Servers”.

Below is the basic concept of the View “Pod”, which is made up of five (5) “Blocks”. Each Block is a vSphere cluster which supports up to 2,000 View users.

[Image: View “Pod” made up of five “Blocks”]

The above graphic is courtesy of John Dodge, from his blog post Demystifying VMware View Large Scale Designs.

In summary, a 10,000 user View “Pod” is made up of:

* Five (5) vCenter Servers
* Five (5) View Composer Servers (Note: Can be installed on the vCenter server)
* Seven (7) View Connection Servers (Brokers)
* Five (5) View “Blocks”

Now we need to confirm what is required to implement vCenter Operations for View.

Let's look at the system requirements for the vC Ops for View Adapter server.

[Image: vC Ops for View Adapter server sizing table]

Reference: Pg 11 of vCenter Operations for View Integration Guide

So based on the above, to support a 10,000 user Pod we would require a “Monster” VM with 20 vCPUs and 40GB RAM!


This would require an ESXi host with at least two physical CPUs w/ 10 cores each and the “Monster” VM would basically monopolize the host, so this doesn’t seem like a viable solution for the vast majority of customers.

Alternatively, if we take a scale-out approach, we can use five (5) VMs with 4 vCPUs and 8GB RAM each. This sounds perfectly reasonable and would fit within the majority of vSphere clusters currently deployed.
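
As a quick sanity check on the two options, here is a minimal sketch comparing the aggregate resources of the single “Monster” VM with the scale-out approach, using the per-Block figures quoted above (4 vCPU / 8GB RAM per 2,000 users):

```python
# Compare a single "Monster" View Adapter server with a scale-out approach
# of one Adapter server per 2,000 user Block. Per-Block figures (4 vCPU /
# 8GB RAM) are taken from the sizing guidance quoted above.

POD_USERS = 10_000
USERS_PER_BLOCK = 2_000

# Scale-up: one Adapter server sized for the whole Pod
scale_up = {"vms": 1, "vcpu_per_vm": 20, "ram_gb_per_vm": 40}

# Scale-out: one Adapter server per Block
blocks = POD_USERS // USERS_PER_BLOCK            # 5 Blocks
scale_out = {"vms": blocks, "vcpu_per_vm": 4, "ram_gb_per_vm": 8}

for name, sizing in [("Scale-up", scale_up), ("Scale-out", scale_out)]:
    total_vcpu = sizing["vms"] * sizing["vcpu_per_vm"]
    total_ram = sizing["vms"] * sizing["ram_gb_per_vm"]
    print(f"{name}: {sizing['vms']} VM(s) of {sizing['vcpu_per_vm']} vCPU / "
          f"{sizing['ram_gb_per_vm']}GB -> {total_vcpu} vCPU / {total_ram}GB total")
```

The aggregate resources are identical, but five 4 vCPU VMs are easy for DRS and the CPU scheduler to place across an existing management cluster, whereas a single 20 vCPU VM effectively demands a dedicated host.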

Next, let's look at the requirements for the vCenter Operations Manager vApp.

[Image: vCenter Operations Manager vApp sizing table]

Reference: Pg 13 of the vCenter Operations Manager 5.0 Installation Guide

Here we can see three examples, which support up to 1,500, 3,000 and 6,000 virtual machines respectively.

From these numbers, if we used a single vC Ops manager vApp it would require greater than 8vCPUs each for the UI and Analytics VMs, which could monopolize smaller management ESXi hosts and/or cause CPU scheduling difficulties or reduced consolidation ratios for the management cluster.

So similarly to the View Adapter server, I am proposing a “scale out” approach for the vC OPS Manager vApp.

In this case, I want each vApp to comfortably support one “Block” of 2,000 virtual machines, so allowing for some headroom, the “up to 3,000 virtual machines” configuration appears to be the best option.

Therefore we will require five (5) vCenter Operations Manager vApp deployments for this solution.

It is also important to consider the storage capacity and performance requirements, which are shown below.

[Image: vCenter Operations Manager vApp storage sizing table]

Reference: Pg 13 of the vCenter Operations Manager 5.0 Installation Guide

From a capacity/performance perspective, the solution for 10,000 users needs to be sized to support between:

* 15,000 and 30,000 IOPS
* ~6TB and ~12TB of capacity
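
A minimal sketch of how those Pod-level totals break down across the five vApp deployments; the per-vApp figures here are simply the totals above divided by five, so treat them as indicative and verify against the installation guide for the configuration you deploy:

```python
# Aggregate storage requirements for five vC Ops Manager vApp deployments
# (one per 2,000 user Block). Per-vApp figures are inferred by dividing the
# Pod-level totals quoted above by five; treat them as indicative only.

vapps = 5

per_vapp = {
    "iops_low": 3_000, "iops_high": 6_000,   # IOPS range per deployment
    "tb_low":   1.2,   "tb_high":   2.4,     # capacity range per deployment (TB)
}

print(f"Pod total IOPS:     {vapps * per_vapp['iops_low']:,} - "
      f"{vapps * per_vapp['iops_high']:,}")
print(f"Pod total capacity: ~{vapps * per_vapp['tb_low']:.0f}TB - "
      f"~{vapps * per_vapp['tb_high']:.0f}TB")
```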

At this stage we have determined we require the following:

5 x vCenter Operations for View Adapter Servers

5 x vCenter Operations Manager vApps

Next we need to work out how best to configure each View Adapter server.

If you review the vCenter Operations for View Integration Guide on page 19 you will see the below graphic which states “Enter the name of a View connection server in your VDI environment….”

So the question is, which View Connection Server (Broker) should we use?

[Image: vC Ops for View Adapter settings dialog]

Here I have come up with two solutions which are overall very similar, but differ in a couple of ways:

1. How many connection brokers are used to service user connections
2. Which connection broker/s the View Adapter servers connect to.

Let's go over both solutions, as well as a potential Option 3 which needs further investigation. (I will follow up with another article on Option 3.)

Solution 1: Dedicated Connection Brokers for vCenter Operations for View Adapter Servers

The concept here is that there are a total of seven (7) connection brokers servicing the 10,000 user “Pod”. We remove two (2) of the brokers (in this example, numbers 6 and 7) from the round robin on the load balancer and configure the View Adapter servers to use either Connection Broker 6 or 7.

Here is a diagram showing the solution

[Diagram: Dedicated Connection Brokers for vC Ops for View Adapter Servers]

Note: Even though Connection Brokers 6 & 7 are not included in the round robin to service user connections, they continue to replicate with all other brokers.

Here are the Pros and Cons for Solution 1

PROS

* Traffic from vC Ops for View does not impact the performance of the connection brokers to which users connect

CONS

* Only five (5) connection brokers are available to service user connections (Note: five should be sufficient, as each broker can support 2,000 connections)
* In the event one (or more) of the five connection brokers has an issue, user connection times may be impacted

Solution 2: One-to-One Mappings between View Adapter Servers and Connection Brokers

The concept here is that there are a total of seven (7) connection brokers and five (5) View Adapter servers. Each View Adapter server is configured to connect on a one-to-one basis to a specific connection broker, e.g. View Adapter Server 1 connects to Connection Broker 1, and so on.

Here is a diagram showing the solution

[Diagram: View Adapter Server one-to-one mappings to Connection Brokers]
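
To make the difference between the two mappings concrete, here is a minimal sketch using purely hypothetical host names (none of these names come from a real environment):

```python
# Hypothetical host names used purely for illustration.
brokers = [f"viewcs{n:02d}" for n in range(1, 8)]        # 7 Connection Brokers
adapters = [f"vcops-view{n:02d}" for n in range(1, 6)]   # 5 View Adapter servers

# Solution 1: brokers 6 & 7 are removed from the load balancer pool and
# dedicated to vC Ops for View Adapter traffic (adapters alternate between them).
solution1 = {
    "load_balancer_pool": brokers[:5],
    "adapter_to_broker": {a: brokers[5 + i % 2] for i, a in enumerate(adapters)},
}

# Solution 2: all 7 brokers stay in the load balancer pool and each Adapter
# server maps one-to-one to a specific broker (Adapter 1 -> Broker 1, etc.).
solution2 = {
    "load_balancer_pool": brokers,
    "adapter_to_broker": dict(zip(adapters, brokers)),
}

for name, sol in [("Solution 1", solution1), ("Solution 2", solution2)]:
    print(name, "- user-facing brokers:", sol["load_balancer_pool"])
    for adapter, broker in sol["adapter_to_broker"].items():
        print(f"  {adapter} -> {broker}")
```

In Solution 1 the load balancer pool shrinks to five user-facing brokers while the five Adapter servers share brokers 6 and 7; in Solution 2 all seven brokers stay in the pool and each Adapter server pairs with its own broker.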

Here are the Pros and Cons for Solution 2

PROS

* All seven (7) connection brokers are available to the load balancer to service user connections
* The performance of all the connection servers (Brokers) should be consistent as they have a fairly equal workload (except for 6 & 7, which don't service View Adapter traffic)

CONS

* Traffic from vC Ops for View may impact the performance of the connection brokers which users are using to connect

Solution 3: Configuring the View Adapter servers to use the load balancer address

What about configuring the View Adapter servers to use the load balancer address, rather than connecting to a specific connection broker? I am currently investigating this option, and from discussions with a number of the VMware EUC team, there may be reasons this won't work. I will post an update once I have further investigated and tested this option.

Moving on, once you decide which of the above solutions suits your environment best, the following applies to all of them.

For each “Block” (managed by a dedicated vCenter server), one (1) vC Ops for View Adapter server and one (1) vCenter Operations Manager vApp are deployed into the Management cluster.

For a complete 10,000 user “Pod”, this means a total of five (5) vC Ops for View Adapter servers and five (5) vCenter Operations Manager vApps.

Each vCenter Operations Manager vApp will be configured on a one-to-one basis with one vCenter, e.g. the Block 1 vC Ops vApp connects to the Block 1 vCenter.

Next we need to ensure each vC Ops for View Adapter server isn't monitoring all desktop pools in the Pod (which it will by default). This is very important, otherwise each View Adapter server will be saturated with all pools, and therefore managing all desktops in the “Pod” (up to 10,000 desktops!), which would cause major performance issues and result in the vC Ops for View environment hitting some hard limits.

To avoid this issue we select the tick box “Specify Desktop Pools” (shown below) and enter all pool names (separated by a “,”) for the “Block” being monitored by this View Adapter server, similar to the below.

[Image: vC Ops for View Adapter “Specify Desktop Pools” settings]

Now each View Adapter server is configured to monitor only one Block, so a maximum of 2,000 users.
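
As a minimal sketch of building that filter, assuming a hypothetical “BlockN-PoolM” pool naming convention (your actual pool IDs will differ), the comma-separated string for each Block's View Adapter server could be generated like this:

```python
# Build the comma-separated pool filter string for each Block's View Adapter
# server. Pool names here follow a hypothetical "BlockN-PoolM" convention;
# substitute the actual pool IDs from your View environment.

BLOCKS = 5
POOLS_PER_BLOCK = 4   # assumed number of desktop pools per Block

for block in range(1, BLOCKS + 1):
    pools = [f"Block{block}-Pool{p}" for p in range(1, POOLS_PER_BLOCK + 1)]
    pool_filter = ",".join(pools)
    print(f"View Adapter server {block}: {pool_filter}")
```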

However, there is a catch: although the pool filter is configured on the View Adapter server/s, there is still an ongoing overhead (both compute and network) on both the connection broker/s and the View Adapter server/s, as the entire View Pod topology is still processed by the Adapter server before being filtered (by specifying the desktop pools as in the above screen capture). It is important to understand this overhead does not impact the vC Ops Manager vApp in any way, as it is associated with a vCenter which will only manage up to 2,000 users.

So in addition to the standard components of the 10,000 user View “Pod” discussed earlier, we now have:

* Five (5) vC Ops for View Adapter Servers
* Five (5) vCenter Operations Manager vApps

In summary, here are the Pros and Cons of the overall concept.

PROS

1. Avoids the requirement for very large (>8 vCPU) vC Ops UI/Analytics and View Adapter VMs to support the solution
2. Prevents large VMs from potentially monopolizing management cluster ESXi hosts
3. Prevents an increased CPU scheduling overhead in scheduling large vCPU VMs for the Management ESXi hosts
4. In the event of a View Connection Server (Broker), View Adapter Server and/or vC Ops Manager vApp failure, only part of the monitoring solution is impacted.
5. The solution is scalable and can start from a smaller deployment of <2,000 users (with one vC Ops vApp and one View Adapter Server) and easily scale with linear performance even beyond a 10,000 user Pod, as it is based on a repeatable model supporting 2,000 users at a time. In short, this solution can scale essentially without limit.
6. No single View Connection Server (Broker) is managing all vC Ops for View traffic
7. Increased DRS flexibility/efficiency for the management cluster, as smaller VMs are easier for DRS to load balance
8. Flexibility in ESXi host hardware, i.e. the ability to have smaller and potentially cheaper (2 socket / 8 core) servers for management

CONS

1. Additional installation / configuration time as five (5) View Adapter servers & five (5) vCenter Operations Manager vApps need to be deployed
2. Increased Microsoft Windows 2008 licensing – although Windows 2008 Datacenter edition licenses on the management cluster/s allow unlimited Windows VMs, which avoids this issue if they can be justified.
3. Additional maintenance effort in patching/upgrading five (5) View Adapter servers & five (5) vCenter Operations Manager vApps instead of one.
4. You do not get a single pane of glass for monitoring the entire 10,000 user Pod; you will have five (5) separate vC Ops web interfaces, one per Block.

In conclusion, the above architecture is scalable and allows the deployment of vCenter Operations for View 1.0.x in a way that avoids a number of potential “gotchas”, some of which may degrade the performance of your View environment. Each View Adapter server and vC Ops Manager vApp will only service one (1) Block of 2,000 users. When adding additional Blocks or Pods (new Pods are required for >10,000 users), the solution scales 2,000 users at a time.

My advice would be to use Solution 1. As the Adapter server still has to process all topology data before filtering it, Solution 1 ensures there is no impact on the connection brokers servicing user connections; the CPU/network overhead (discussed earlier) only affects the connection brokers that are not servicing users.

Looking forward, the upcoming version of vC Ops for View (1.5) will bring increased scalability and is penciled in for late Q1 2013, so the above architecture is really an interim solution until the new product is released.

Note: In VMware View 5.2 (which will also be released in the first half of 2013) there are some major improvements in scalability which may change the “Block” and “Pod” architecture discussed in this post, as well as some improvements to the View Agent; again, these changes will likely change the vC Ops for View architecture.

I will be following up this article, with a similar post on vC Ops for View 1.5 architecture closer to the release date.

I would like to thank John Dodge, David Wooten, David Homoki & Tim Whiffen from the VMware EUC team, as well as Michael Webster (@vcdxnz001) & Andre Leibovici (@andreleibovici), for their input into this article.

I hope this article has been helpful and I welcome any constructive feedback / comments etc.

Example Architectural Decision – Supporting VMware View Infrastructure Servers

Problem Statement

When designing a VMware View environment, there are numerous management virtual machines required to run the environment, including but not limited to Domain Controllers, vCenter, VUM, View Connection Brokers, View Security Servers, View Transfer Servers and View Composer. These servers are typically heavily utilized in larger View deployments and, in the event of compute or storage contention, would likely impact the performance of the Virtual Desktop Infrastructure, especially where View Composer or virtual desktop power or provisioning operations are frequent.

How can the VDI environment be designed so management servers have a consistently high level of performance, while ensuring high consolidation ratios can be achieved for desktops and a consistent end-user experience is maintained?

Assumptions

1.  One or more VMware View “Blocks”
2. ~2000 Users per Block
3. Using VMware View Linked Clones
4. Target vCPU overcommitment for virtual desktops is >=6:1 – this is a conservative overcommitment ratio; >10:1 can be achieved
5. Target vRAM overcommitment for virtual desktops is >=1.5:1 – this is a reasonable overcommitment ratio, although higher can be achieved
6. vSphere 4.1 or later
7. VMware View 4.5 or later
8. ESXi hosts are large enough to support >200 users each (e.g. at least 2-way / 256GB RAM, assuming 1 vCPU / 1GB RAM VDI VMs)
9. An existing vSphere cluster supporting server workloads is not available or is at or near capacity
10. Antivirus has been optimized for virtual desktop environments, e.g. using vShield Endpoint to offload AV scanning to the hypervisor

Motivation

1.  Ensure consistent & optimal performance for Virtual desktops and VMware View Infrastructure VMs
2. Achieve the best ROI for the solution

Architectural Decision

Create a three (3) node “Management Cluster” with a scale-out approach using 2-way servers (as opposed to 4-way servers like the VMware View Blocks) to ensure a lower HA overhead (33% for N+1) and higher DRS efficiency than a two (2) node cluster. Have management virtual machines use different underlying storage, being either dedicated RAID packs or aggregates, or for large environments, dedicated storage controllers. Have a vCenter dedicated to running the Management infrastructure.
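
A quick sketch of the HA overhead figure referenced above, i.e. the proportion of cluster capacity that N+1 admission control effectively reserves (the capacity of one host as a share of the cluster):

```python
# Percentage of cluster compute capacity reserved for N+1 HA admission
# control, i.e. the capacity of one host as a share of the whole cluster.

for nodes in (2, 3, 4):
    overhead = 1 / nodes * 100
    print(f"{nodes}-node cluster: {overhead:.0f}% reserved for N+1 HA")
```

A two (2) node cluster reserves 50% of its capacity for N+1 and gives DRS only one placement alternative, whereas the three (3) node design reduces the reserve to 33% while still using relatively inexpensive 2-way hosts.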

Justification

1.  The CPU overcommitment ratio for Virtual desktops is generally much higher than for server workloads
2. Server workloads are less tolerant to high CPU overcommitment ratios than virtual desktops
3. CPU contention (a.k.a CPU Ready) will likely have significant impact on infrastructure VMs
4. If Management VMs were hosted within the VMware View Blocks, the overcommitment would have to be lower to enable adequate performance, thus reducing the ROI for the solution
5. Server and desktop workloads have very different compute and storage profiles and generally are not good candidates to share the same ESXi host or cluster
6. During VMware View Linked Clone deployments, or maintenance activities such as a “recompose” of one or more Pools, Management VMs such as vCenter and View Composer should have minimal or no compute contention to ensure timely completion of maintenance. This does not fit well in a cluster with >6:1 CPU overcommitment.
7. Having a management cluster minimizes or removes the requirement for complexity/overheads of setting CPU or Memory reservations in an attempt to ensure performance for management VMs competing for compute resources with virtual desktops. (See “Common Mistake – Using CPU reservations to solve CPU ready” for more information)
8. Maximize the efficiency of the CPU scheduler, as the majority of Virtual Desktops should be 1vCPU as compared to management VMs such as vCenter / SQL / Connection brokers which will likely be 2 and 4 vCPU. Scheduling VMs with higher vCPU numbers on an environment with >6:1 vCPU overcommitment is unlikely to result in acceptable performance for the management virtual machines.
9. Having a cluster/s dedicated to desktops will give more flexibility to use features such as Distributed Power Management (DPM) for VMware View Blocks which will help achieve a faster ROI
10. vCenter’s workload with virtual desktops is generally higher (compared to vCenter servers managing server workloads) due to increased frequency of things like power operations and provisioning operations from View Composer. One (1) vCenter should be used per Block, or up to 2000 users.
11. In the event of performance/stability issues in the View Block/s, if the management servers shared the cluster, the ability for vSphere/View administrators to access management servers will likely be impacted, which may delay the troubleshooting process and eventual resolution of the issue/s
12. Having a separate management cluster with dedicated storage (RAID packs/aggregates and/or storage controllers) prevents the IO load of the View Desktops impacting the ability to manage the environment, especially during recompose and provisioning operations.

Implications

1. Hardware will be required for the Management cluster – although the ESXi hosts in the View Blocks (as they won't be hosting management workloads) should as a result achieve higher consolidation ratios, which should largely if not entirely neutralize the cost of the Management host hardware
2. The storage solution will need to provide storage for Management virtual machines which is separate to Virtual desktops
3. The scale-out approach for the management cluster may not achieve as high memory savings from Transparent Page Sharing due to having fewer virtual machines per host
4. Having an additional cluster is an additional administrative overhead, albeit a minimal one; however, it should reduce risk in the environment, leading to lower BAU effort/costs.

Alternatives

1. Run Management VMs in the VMware View Blocks (with desktop workloads) – not recommended
2. Run management VMs in an existing vSphere cluster running server workloads (if available)

A special thanks to Michael Webster (VCDX#66) for his contribution to this example Architectural Decision.