Problem Statement
In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for a Virtual Desktop environment?
Assumptions
1. TPS is disabled by default
2. Storage is expensive
3. Two Socket ESXi Hosts have been chosen to align with a scale out methodology.
4. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
5. vSphere 5.5 or earlier
Requirements
1. VDI environment must deliver consistent performance
2. VDI environment supports a high percentage of Power Users
Motivation
1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure
Architectural Decision
Leave TPS disabled (default) and apply 100% Memory Reservations to VDI VMs and/or Golden Master Image.
Justification
1. Setting 100% memory reservations ensures consistent performance by eliminating the possibility of swapping.
2. The 100% memory reservation also eliminates the capacity usage by the vswap file which saves space on the shared storage as well as reducing the impact on the storage in the event of swapping.
3. RAM is cheaper than Tier 1 storage (which is recommended for vSwap storage to ensure minimal performance impact during swapping) so the increased cost of memory in the hosts is easily offset by the saving in shared storage.
4. Simplicity. Leaving default settings is advantageous from both an architectural and operational perspective. Example: ESXi Patching can cause settings to revert to default which could negate TPS savings and put a sudden high demand on storage where TPS savings are expected.
5. TPS savings for desktops can be significant, however with a high percentage of Power Users with >=4GB desktops and 2vCPUs, the TPS savings are lower compared to Kiosk or Task users typically with 1-2GB per desktop.
6. The decision has been made to use 2 socket ESXi hosts and scale out so the TPS savings per host compared to a 4 socket server with double the RAM will be lower.
7. HA admission control will calculate fail-over requirements (when using Percentage of cluster resources reserved for HA) so that performance will be approximately the same in the event of a fail-over due to reserving the full RAM reserved for every VM leading to more consistent performance under a wider range of circumstances.
8. Lower core count (and lower cost) CPUs will likely be viable as RAM will likely be the first constraint for further consolidation.
Implications
1. Using 100% memory reservations requires ESXi hosts and the cluster be sized at a 1:1 ratio of vRAM to pRAM (Physical RAM) and should include N+1 so a host failure can be tolerated.
2. Increased RAM costs
3. No memory overcommitment can be achieved
4. Potential for lower CPU utilization / overcommitment as RAM may become the first constraint.
Alternatives
1. Use 50% reservation and enable TPS
2. Use no reservation, Enable TPS and disable large pages
Related Articles:
1. The Impact of Transparent Page Sharing (TPS) being disabled by default @josh_odgers (VCDX#90)
2. Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (2 of 2)
3. Future direction of disabling TPS by default and its impact on capacity planning –@FrankDenneman (VCDX #29)
4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl(VCDX#104)