Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (2 of 2)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for a Virtual Desktop environment?

Assumptions

1. TPS is disabled by default
2. Storage is expensive
3. Two-socket ESXi hosts have been chosen to align with a scale-out methodology.
4. The average VDI user is a Task Worker with 1 vCPU and 2GB RAM.
5. Memory is the first compute-level constraint.
6. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
7. vSphere 5.5 or earlier

Requirements

1. VDI environment costs must be minimized

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure

Architectural Decision

Enable TPS and disable Large Memory pages
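
The decision itself is a pair of ESXi advanced settings, so it can be applied consistently across hosts with a short script. The following is a minimal sketch using pyVmomi, assuming a vCenter at vcenter.example.com and placeholder credentials; Mem.AllocGuestLargePage controls large pages, and Mem.ShareForceSalting (present only on patched 5.5 builds and later) controls inter-VM TPS salting.

```python
# Minimal pyVmomi sketch (placeholder vCenter name and credentials) applying the
# decision to every host: disable large pages and allow inter-VM page sharing.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_advanced_option(host, key, value):
    # Update a single ESXi advanced setting via the host's OptionManager.
    # Some integer options may need to be passed as a long on older builds.
    option = vim.option.OptionValue(key=key, value=value)
    host.configManager.advancedOption.UpdateOptions(changedValue=[option])

context = ssl._create_unverified_context()  # lab only; use valid certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for esxi_host in view.view:
    set_advanced_option(esxi_host, "Mem.AllocGuestLargePage", 0)  # disable large pages
    set_advanced_option(esxi_host, "Mem.ShareForceSalting", 0)    # inter-VM TPS (patched builds only)
view.Destroy()
Disconnect(si)
```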

Justification

1. Disabling Large pages is essential to maximizing the benefits of TPS
2. Not disabling large pages would likely result in minimal TPS savings
3. With Kiosk and Task worker VDI profiles, the percentage of memory which is likely to be shared is higher than for Power users.
4. Existing shared storage has plenty of spare Tier 1 capacity for vSwap files

Implications

1. Sufficient capacity for VM swap files must be catered for.
2. VDI & Storage performance may be impacted significantly in the event of memory contention.
3. Decreased memory costs may result in increased storage costs.
4. After ESXi patching, operational verification is required to confirm that non-default settings have not been reverted.
5. Additional CPU overhead on ESXi from enabling TPS.
6. HA admission control (when configured to “Percentage of Cluster Resources reserved for HA”) will only calculate fail-over capacity based on 0MB + VM memory overhead for each VM, which can lead to significantly degraded performance in a HA event (see the sketch following this list).
7. Higher core count (and higher cost) CPUs may be desired to drive overcommitment ratios as RAM will be less likely to be a point of contention.
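
To put implication 6 in perspective, the following back-of-the-envelope sketch compares the memory that admission control accounts for with and without full reservations. The desktop count and per-VM overhead figure are assumptions chosen purely for illustration.

```python
# Rough illustration of implication 6: without reservations, "Percentage of Cluster
# Resources" admission control only accounts for per-VM memory overhead, not vRAM.
vm_count = 500                # assumed number of task-worker desktops
vram_per_vm_gb = 2.0          # per the assumptions above
overhead_per_vm_gb = 0.09     # assumed per-VM memory overhead (~90MB)

with_full_reservations = vm_count * (vram_per_vm_gb + overhead_per_vm_gb)
with_no_reservations = vm_count * overhead_per_vm_gb   # 0MB reservation + overhead only

print(f"Admission control accounts for ~{with_full_reservations:.0f} GB with 100% "
      f"reservations, but only ~{with_no_reservations:.0f} GB with none - the gap "
      f"must be absorbed by TPS savings or swapping after a HA event.")
```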

Alternatives

1. Use 100% memory reservation and leave TPS disabled (default)
2. Use 50% memory reservation and Enable TPS and disable large pages

Related Articles:

1. The Impact of Transparent Page Sharing (TPS) being disabled by default – @josh_odgers (VCDX #90)
2. Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (1 of 2)
3. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)
4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX #104)

Virtual Machine Swap File Location & Capacity Usage on Nutanix

The location of the Virtual Machine swap file can be critical when deploying vSphere with traditional centralized storage solutions, or with legacy solutions which acknowledge “zeros” or “white space”, because the Virtual Machine swap file can be as large as the VM’s configured vRAM when memory reservations are not used.

The below shows the default configuration.
[Screenshot: Virtual Machine swap file location – default setting]

If a VM resides on Tier 1 storage, for example, and does not have a memory reservation set (or has a reservation of less than 100%), the swap file will take up valuable Tier 1 storage capacity.
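
As a rough illustration (assumed desktop count; swap file size equals configured vRAM minus reservation), the capacity consumed on storage that pre-allocates this space adds up quickly:

```python
# Rough sketch of Tier 1 capacity consumed by swap files on storage platforms that
# pre-allocate "white space": swap file size = configured vRAM - memory reservation.
def swap_file_gb(vram_gb, reservation_gb):
    return max(vram_gb - reservation_gb, 0.0)

desktops = 1000                                            # assumed desktop count
per_vm_gb = swap_file_gb(vram_gb=2.0, reservation_gb=0.0)  # task worker, no reservation
print(f"{desktops} desktops x {per_vm_gb:.0f} GB swap = "
      f"{desktops * per_vm_gb:.0f} GB of Tier 1 capacity consumed")
```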

This can be avoided by specifying a dedicated swap-file datastore; however, this introduces complexity, and if the swap-file datastore is on a lower tier of storage, performance will degrade significantly in the event of swapping.

Some platforms recommend having separate datastores for VM swap files to minimize the overhead of deduplication or replication for environments using SRM, as discussed in Example Architectural Decision – Virtual Machine Swap-file location for SRM Protected VMs.

The Nutanix Distributed File System does not write “white space” to disk; as a result, the capacity impact of Virtual Machine swap files is negligible, which makes swap file placement much less of a concern.

The only time when Virtual machine swap files will use storage capacity in the Nutanix Distributed File System is when host memory utilization is >100% and swapping needs to occur.

As such, the default vSphere configuration of “Virtual Machine Directory” is ideal for Nutanix environments: valuable storage capacity is not unnecessarily wasted, usable space is increased, and complexity is reduced by removing the requirement for dedicated swap-file datastores, all without compromising the benefits of deduplication and compression.
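
As a sanity check, a short pyVmomi sketch (again with placeholder connection details) can confirm that VMs are still using the default placement rather than a dedicated swap-file datastore:

```python
# Audit sketch: report any VM whose swap file placement deviates from the default
# "inherit"/"vmDirectory" behavior. Connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
for vm in view.view:
    placement = vm.config.swapPlacement if vm.config else "unknown"
    if placement not in ("inherit", "vmDirectory"):
        print(f"{vm.name}: swapPlacement={placement} (not using the VM directory default)")
view.Destroy()
Disconnect(si)
```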

Competition Example Architectural Decision Entry 4 – vCloud Allocation Pool Usable Memory

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5,VCAP-DCA4/5,VCAP-CIA,vExpert 2012/2013

Problem Statement

When using an Allocation Pool with 100% memory reservation, the usable memory is less than users expect due to the VM memory overhead. What is the best way to ensure users can use the entire memory assigned to the Allocation Pool?

Assumptions

1. vCD 5.1.2 is in use

2. vSphere 5.1 or later is in use

3. Org VDC created with Allocation Pool

Constraints

1. vCD 5.1.2 has to be used

2. Only VDCs using the Allocation Pool model are affected

Motivation

1. Need to use 100% of the memory allocated to the VDC with the Allocation Pool model

2. Optimal use of the memory assigned to the Org VDC and then to the VMs

Architectural Decision

Because VM memory overhead is consumed by design, the entire allocated memory cannot be used. This will be addressed by enabling the Elastic Allocation Pool at the vCloud system level and then setting a lower vCPU speed value (260 MHz). This will allow VMs to use the entire allocated memory (100% guaranteed) in the Org VDC.

Alternatives

1. Over allocate resources to the customer but only reserve the amount they purchased.

Historically, VM overhead ranges between <=5% and 20%. Most configurations have an overhead of less than 5%; if you assume this, you could over-allocate resources by 5% but only reserve ~95%. The effect would be that the customer could consume up to the amount of vRAM they purchased, and if they created VMs with low overhead (high vRAM allocations, low vCPU counts) they could possibly consume more than they “purchased”. In the case of a 20GHz/20GB purchase, we would have to set the allocation to 21GHz but set the reservation to 95%.
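
The arithmetic behind this alternative (illustrated here on the memory side, with the 5% overhead figure treated as an assumption) works out as follows:

```python
# Worked example of the over-allocation alternative, assuming ~5% VM overhead:
# allocate slightly more than the customer purchased, reserve only what they paid for.
purchased_gb = 20.0
assumed_overhead_fraction = 0.05

allocation_gb = purchased_gb * (1 + assumed_overhead_fraction)  # ~21 GB allocated
reservation_pct = purchased_gb / allocation_gb * 100            # ~95% reservation
print(f"Allocate {allocation_gb:.0f} GB with a ~{reservation_pct:.0f}% reservation so "
      f"roughly {purchased_gb:.0f} GB remains usable after per-VM overhead.")
```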

Justification

VM memory overhead is calculated from many moving targets, such as the CPU model of the ESXi host the VM runs on, whether 3D support is enabled for the MKS console, and so on. As a result, the entire allocated memory can never be fully consumed.

By selecting an Elastic VDC, we override this behavior while still not allowing more VMs to power on than the tenant is entitled to. An Elastic VDC also gives us the opportunity to set a custom vCPU speed, and lowering the vCPU speed allows more vCPUs to be deployed without being penalized. Without this setting, vCPUs cannot be overcommitted, which is a significant limitation.

260 MHz is the lowest vCPU speed that can be set, and it has therefore been chosen to allow system administrators to overcommit vCPUs in a VDC using the Allocation Pool model.
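
To illustrate the effect on vCPU overcommitment, the sketch below compares how many vCPUs fit into an example 20GHz pool; the 1GHz figure used as the higher per-vCPU speed is an assumption chosen for comparison only.

```python
# Back-of-the-envelope comparison of vCPU capacity in an elastic Allocation Pool VDC.
# The 20GHz pool size and the 1GHz "higher" per-vCPU speed are assumed figures.
pool_ghz = 20.0
higher_vcpu_speed_ghz = 1.0    # assumed comparison value
minimum_vcpu_speed_ghz = 0.26  # 260 MHz, the lowest configurable vCPU speed

print(f"At {higher_vcpu_speed_ghz} GHz per vCPU: {int(pool_ghz / higher_vcpu_speed_ghz)} vCPUs fit")
print(f"At 260 MHz per vCPU: {int(pool_ghz / minimum_vcpu_speed_ghz)} vCPUs fit")
```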

 

Implications

1. One caveat is that individual VMs have no memory reservation. Due to the nature of Org VDCs, an Org Admin cannot set resource reservations for individual VMs (unlike a Reservation Pool), so any VMs in an elastic VDC will have no reservation, which is a significant drawback for a customer's high-I/O VMs (such as database or mail servers).

The resource reservation can easily be overridden from vSphere, but that is not the intent. Hence, this is flagged as a RISK, as it will hamper the performance of the customer's VMs.

Although 100% of the memory is reserved at the pool level, so the VMs receive equal memory and memory cannot be oversubscribed beyond the limit the customer has bought, if there is memory contention among those VMs there is no option to prefer the VMs which are resource hungry. In a nutshell, all of the VMs will get equal shares.

Equal shares distribute the resources in a resource pool equally, so there is no guarantee that a hungry VM can get more resources on demand.
