Transparent Page Sharing (TPS) Example Architectural Decisions Register

The following is a register of all Example Architectural Decisions related to Transparent Page Sharing on VMware ESXi following the announcement from VMware that TPS will be disabled by default in future patches and versions.

See The Impact of Transparent Page Sharing (TPS) being disabled by default for more information.

The goal of this series is to give the pros and cons for multiple options for the configuration of TPS for a wide range of virtual workloads from VDI, to Server, Business Critical Apps , Test/Dev and QA/Pre-Production.

Business Critical Applications (vBCA) :

1. Transparent Page Sharing (TPS) Configuration for Virtualized Business Critical Applications (vBCA)

Mixed Server Workloads:

1. Transparent Page Sharing (TPS) Configuration for Production Servers (1 of 2)

2. Transparent Page Sharing (TPS) Configuration for Production Servers (2 of 2) – Coming Soon!

Virtual Desktop (VDI) Environments:

1. Transparent Page Sharing (TPS) Configuration for VDI (1 of 2)

2. Transparent Page Sharing (TPS) Configuration for VDI (2 of 2)

Testing & Development:

1. Transparent Page Sharing (TPS) Configuration for Test/Dev Servers (1 of 2) – Coming Soon!

2. Transparent Page Sharing (TPS) Configuration for Test/Dev Servers (2 of 2) – Coming Soon!

QA / Pre-Production:

1. Transparent Page Sharing (TPS) Configuration for QA / Pre-Production Servers

Related Articles:

1. Example Architectural Decision Register

2. The Impact of Transparent Page Sharing (TPS) being disabled by default – @josh_odgers (VCDX#90)

3. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)

Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for Virtualized Business Critical Applications (vBCA)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for an environment running Virtualized Business Critical Applications?

Assumptions

1. TPS is disabled by default.
2. Storage is expensive.
3. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
4. vSphere 5.5 or earlier

Requirements

1. Applications must meet strict Service Level Agreements (SLAs)
2. The environment must deliver high consistent performance
3. Minimize the cost of shared storage

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure
3. Meet/Exceed SLAs

Architectural Decision

Leave TPS disabled (default) and leave Large Memory pages enabled (default).

Justification

1. Setting 100% memory reservations ensures consistent performance by eliminating the possibility of swapping.
2. The 100% memory reservation also eliminates the capacity usage by the vswap file which saves space on the shared storage as well as reducing the impact on the storage in the event of swapping.
3. RAM is cheaper than Tier 1 storage (which is recommended for vSwap storage to ensure minimal performance impact during swapping) so the increased cost of memory in the hosts is easily offset by the saving in Tier 1 shared storage.
4. Simplicity. Leaving default settings is advantageous from both an architectural and operational perspective.  Example: ESXi Patching can cause settings to revert to default which could negate TPS savings and put a sudden high demand on storage where TPS savings may be expected.
5. TPS savings for vBCA workloads is typically much less than with mixed server or desktop workloads due to the memory requirements for vBCAs being typically much higher.
6. HA admission control will calculate fail-over requirements (when using Percentage of cluster resources reserved for HA) so that performance will be approximately the same in the event of a fail-over due to reserving the full RAM reserved for every VM leading to more consistent performance under a wider range of circumstances.
7. Remove the real or perceived security risk of sensitive information being gathered from other VMs using TPS as described in VMware KB 2080735
8. Many business critical applications such as SAP and MS SQL use in Memory caching, overcommitment of memory could lead to serious degradation in performance.
9. Many business critical applications such as MS SQL claim by default all available RAM in the Virtual Machine, as such, TPS savings would be minimal to none for SQL servers.

Implications

1. Using 100% memory reservations requires ESXi hosts and the cluster be sized at a 1:1 ratio of vRAM to pRAM (Physical RAM) and should include N+1 so a host failure can be tolerated.
2. Potential Increased RAM costs
3. No memory overcommitment can be achieved
4. Potential for lower CPU utilization / overcommitment as RAM may become the first constraint.

Alternatives

1. Use 50% reservation and enable TPS
2. Use no reservation, Enable TPS and disable large pages

Summary

I highly recommend sizing vBCA’s with no memory overcommitment in mind and setting 100% memory reservations for these VMs. By definition, these are “Business Critical” and the requirement of consistent high performance far outweigh potential memory savings!

In the event vBCAs are running in a cluster with no vBCAs such as Server or even Test/Dev where TPS may be enabled or desired, TPS can be enabled along with disabling large pages but vBCAs should always have 100% memory reservation.

Related Articles:

1. Cloud XC Transparent Page Sharing Example Architectural Decisions

2. The Impact of Transparent Page Sharing (TPS) being disabled by default @josh_odgers (VCDX#90)

3. Future direction of disabling TPS by default and its impact on capacity planning –@FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)

Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (2 of 2)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for a Virtual Desktop environment?

Assumptions

1. TPS is disabled by default
2. Storage is expensive
3. Two Socket ESXi Hosts have been chosen to align with a scale out methodology.
4. Average VDI user is Task Worker with 1vCPU and 2GB Ram.
5. Memory is the first compute level constraint.
6. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
7. vSphere 5.5 or earlier

Requirements

1. VDI environment costs must be minimized

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure

Architectural Decision

Enable TPS and disable Large Memory pages

Justification

1. Disabling Large pages is essential to maximizing the benefits of TPS
2. Not disabling large pages would likely result in minimal TPS savings
3. With Kiosk and Task worker VDI profiles, the percentage of memory which is likely to be shared is higher than for Power users.
4. Existing shared storage has plenty of spare Tier 1 capacity to vSwap files

Implications

1. Sufficient capacity for VM swap files must be catered for.
2. VDI & Storage performance may be impacted significantly in the event of memory contention.
3. Decreased memory costs may result in increased storage costs.
4. During patching, and operational verification that non default settings have not been reverted by the patching of ESXi.
5. Additional CPU overhead on ESXi from enabling TPS.
6. HA admission control will calculate fail-over requirements (when using Percentage of cluster resources reserved for HA) so that performance will be approximately the same in the event of a fail-over due to reserving the full RAM reserved for every VM,
6. HA admission control (when configured to Percentage of Cluster resources reserved for HA) will only calculate fail-over capacity based on 0MB + VM overhead for each VM which can lead to significantly degraded performance in a HA event.
7. Higher core count (and higher cost) CPUs may be desired to drive overcommitment ratios as RAM will be less likely to be a point of contention.

Alternatives

1. Use 100% memory reservation and leave TPS disabled (default)
2. Use 50% memory reservation and Enable TPS and disable large pages

Related Articles:

1. The Impact of Transparent Page Sharing (TPS) being disabled by default @josh_odgers (VCDX#90)

2. Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (1 of 2)

3. Future direction of disabling TPS by default and its impact on capacity planning –@FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl(VCDX#104)