The Impact of Transparent Page Sharing (TPS) being disabled by default

VMware recently announced via the VMware Security Blog that Transparent Page Sharing (TPS) will be disabled by default in an upcoming update of ESXi.

Since this announcement, I have been asked how it will impact the sizing of vSphere solutions, and as a result I’ve been involved in discussions about its impact on Business Critical Application, Server and VDI solutions.

Firstly, what benefits does TPS provide? In my experience, because large memory pages are essentially incompatible with TPS, the benefits in recent times have been minimal, in general <20% if that, even for VDI environments where all VMs are running the same OS.

Memory overcommitment in general does not achieve significant savings, because memory is much harder to overcommit than CPU. Overcommitment can be achieved, but only where the VM’s OS and applications are not using all of their memory, in which case simply right sizing VMs will give similar memory savings and likely result in better overall VM and cluster performance.

So to begin, in my opinion TPS is in most cases overrated.

Next Business Critical Applications (vBCA):

In my experience, Business Critical Applications such as MS Exchange, MS SQL and Oracle generally have memory reservations, and in most cases the memory reservation is 100% (All Memory Locked).

As a result, in most environments running vBCAs, TPS already provides no benefit, so TPS being disabled has no significant impact for these workloads.

Next End User Computing (EUC) Solutions:

There are a number of EUC solutions, such as Horizon View, Citrix XenDesktop and Citrix PVS, which all run very well on vSphere.

One common issue with EUC solutions is that architects fail to consider the vSwap storage requirements for Virtual Servers (for Citrix PVS) or VDI such as Horizon View.

As a result, a huge amount of Tier 1 storage can be wasted on vSwap files. This can be up to the amount of vRAM allocated to VMs less memory reservations!

The last part is a bit of a hint: how can we reduce or eliminate the need for Tier 1 storage for vSwap? By using Memory Reservations!
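To put rough numbers on this, here is a minimal Python sketch. It assumes each VM’s vSwap file is sized at its configured vRAM less its memory reservation, as described above, and it ignores any per-VM overhead swap. The function names and the 500-desktop figures are hypothetical and purely for illustration.

```python
def vswap_gb(vram_gb: float, reservation_gb: float = 0.0) -> float:
    """vSwap file size for one VM: configured vRAM less its memory reservation."""
    if reservation_gb < 0 or reservation_gb > vram_gb:
        raise ValueError("reservation must be between 0 and the configured vRAM")
    return vram_gb - reservation_gb


def total_vswap_gb(vm_count: int, vram_gb: float, reservation_gb: float = 0.0) -> float:
    """Total vSwap capacity consumed by vm_count identically sized VMs."""
    return vm_count * vswap_gb(vram_gb, reservation_gb)


if __name__ == "__main__":
    # Hypothetical VDI pool: 500 desktops with 4 GB vRAM each.
    for reservation in (0, 2, 4):
        total = total_vswap_gb(500, 4, reservation)
        print(f"{reservation} GB reserved per VM: {total:.0f} GB of vSwap")
```

In this hypothetical pool, moving from no reservation to a 100% reservation takes the vSwap requirement from 2000 GB to zero.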

While TPS can provide some memory savings, I would invite you to consider whether the cost saving of eliminating the need for vSwap storage space on your storage solution, and the guarantee of consistent performance (at least from a memory perspective), outweigh the benefits of TPS.

Next Virtual Server Solutions:

Let’s say we’re talking about general production servers excluding vBCAs (discussed earlier). These servers are providing applications and functions to your end users, so consistent performance is something the business is likely to demand.

When sizing clusters, architects should size for at least N+1 redundancy and aim for memory utilization around the 1:1 mark in a host failure scenario (i.e. size your cluster assuming one host has failed or is in maintenance).
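The sizing rule above can be sketched as a quick calculation. This is an illustrative Python sketch only: the function names, host sizes and vRAM totals are assumptions made up for the example, and hypervisor and per-VM memory overhead are ignored.

```python
import math


def memory_ratio_after_failures(hosts: int, ram_per_host_gb: float,
                                total_vram_gb: float, failures: int = 1) -> float:
    """Ratio of allocated vRAM to physical RAM once `failures` hosts are unavailable."""
    surviving = hosts - failures
    if surviving <= 0:
        raise ValueError("no hosts left to run the workload")
    return total_vram_gb / (surviving * ram_per_host_gb)


def min_hosts_n_plus_one(total_vram_gb: float, ram_per_host_gb: float,
                         target_ratio: float = 1.0, spares: int = 1) -> int:
    """Hosts needed to stay at or below target_ratio, plus `spares` hosts for failover."""
    needed = math.ceil(total_vram_gb / (ram_per_host_gb * target_ratio))
    return needed + spares


if __name__ == "__main__":
    # Hypothetical cluster: 3400 GB of allocated vRAM on hosts with 512 GB RAM each.
    hosts = min_hosts_n_plus_one(3400, 512)
    ratio = memory_ratio_after_failures(hosts, 512, 3400)
    print(f"{hosts} hosts, {ratio:.2f}:1 vRAM to physical RAM with one host down")
```

With a ratio at or below 1:1 after a host failure, the design does not depend on any TPS savings to fit the workload in physical memory.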

As a result, any reasonable architectural assumption around TPS savings would be minimal.

As with EUC solutions, I would again invite you to consider whether the cost saving of eliminating the vSwap storage requirement and the guarantee of consistent performance outweigh the benefits of TPS.

Next Test/Dev Environments:

This is probably the area where TPS will provide the most benefit. Memory overcommitment ratios can be much higher here, as the impact on the applications (VMs) of memory saving techniques such as swapping/ballooning should not affect the business as much as it would with vBCA, EUC or Server workloads.

However, what is Test/Dev for? In my opinion, Test/Dev should, where possible, simulate production conditions so the operational verification of an application can be accurately conducted before putting the workloads into production. As such, the Test/Dev VMs should be configured the same way as they are intended to run in production, including Memory Reservations and CPU overcommitment.

So, can more compute overcommitment be achieved in Test/Dev? Sure, but again, are the impact of vSwap space, potentially inconsistent performance and the increased risk of operational verification not properly simulating production worth the minimal benefits of TPS?

Summary

If VMware believes TPS is a significant enough security issue to disable it by default, this is something architects should consider. However, I would argue there are many other areas where security is a much larger issue, but that’s a different topic.

TPS being disabled by default is likely to impact only a small percentage of virtual workloads. With RAM being one of the most inexpensive components in the datacenter, ensuring consistent performance by using Memory Reservations, and eliminating the architectural considerations and potentially high storage costs of VM vSwap, make leaving TPS disabled an attractive option regardless of whether it’s truly a security advantage or not.

Related Articles:

1. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

2. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX #104)

9 thoughts on “The Impact of Transparent Page Sharing (TPS) being disabled by default”

  1. >In my experience, in recent times with large memory pages essentially not being compatible with TPS, even for VDI environments

    What is the performance impact for VDI environments with large pages disabled?
    In my experience, in any environment except heavily loaded ones, disabling large pages gives significant memory savings.
    I’ve heard many times that memory is cheap now. Really? These guys have never seen the price of 64GB modules. I have maxed out memory slots with 16GB modules and CPU load is just 20%.

    • I agree the highest capacity RAM DIMMs are not cheap, but as with any tech, the latest and greatest is always more expensive. I always try to find hosts (or Nutanix nodes) with the RAM the customer needs using the most cost effective DIMMs possible.

      In saying that, I don’t disagree that disabling large memory pages will result in higher TPS savings. My point in this post is for everyone to consider the benefits of memory reservations from a vSwap storage savings, performance consistency and architectural simplicity perspective. It’s still an architectural decision that needs to be made, and there isn’t a right or wrong answer IMO.

      Regarding CPU load, I’ve seen many EUC environments where CPU utilization is very low as a result of CPU scheduling contention, which makes the issue appear to be RAM or Storage when it may not be. In fact I wrote a post on this a while back: http://www.joshodgers.com/2013/01/12/high-cpu-ready-with-low-cpu-utilization/

      Thanks for your comment.

  2. Pingback: Transparent Page Sharing Vulnerable, Yet Largely Irrelevant - Wahl Network

  3. I agree with most of this Josh except where you say that “In my opinion, Test/Dev should where possible simulate production conditions so the operational verification of an application can be accurately conducted before putting the workloads into production”

    Test/Dev is a place for developers to cut code and QA to test it functionally. While it’s beneficial to make this environment simulate production conditions as much as possible, there is almost always a large trade-off for lowered cost and performance in these environments and that reasonably translates into larger overcommitment ratios and the potential for leveraging TPS. This is especially true in Test/Dev environments which have lots and lots of copies of exactly the same VM – identical operating system, applications and data.

    A case could even be made for lowered security expectations — no external exposure and data that has been sanitised/masked — that could tip the trade-off in favour of TPS even considering this new risk.

    A pre-production environment is completely different, of course, where performance and security considerations are different and you’ve accepted the cost of lowered overcommitment ratios.

    So while I agree with you for the most part, I think you’re underplaying the value of TPS to Test/Dev environments.

    • Hey Luke, well pointed out re: the differences between Test/Dev and QA, and I agree. I will update the article to address both QA and Test/Dev environments separately.

      Regarding Pre-Production, I would put this in the same category as Production.

  4. In my experience the % savings with TPS ‘can be’ quite extensive (often more than 20%), which at scale translates into a significant amount of money (a lot of the savings being zero pages). Without large pages disabled (as you pointed out) these savings are impacted significantly. In fact you would never see them until your utilization alarms had long been triggered, since TPS is now only activated as a memory savings technique (like ballooning, compression, etc.). Another major issue is that you are also not able to calculate capacity effectively by looking at consumed memory, as in the olden days before large pages and the introduction of Intel EPT and AMD RVI.

    I agree that there are benefits in considering just going the route of 100% reservations. With large pages enabled by default, and now TPS disabled due to security concerns, it unfortunately starts to make less and less sense to worry about memory over-commit except in specific circumstances (like the larger scale functional test/dev/demo use cases pointed out).

    Another significant side benefit of enabling 100% reservations is that the % reserved HA admission control policy will actually work as people might expect. It is a common misconception that the % reserved admission control policy will function without reservations. The default values it uses for MHz and MB are far too low for HA in virtually any environment to actually take action based on them.

    • Great point re: HA admission control, percentage based is not well understood and reservations do help admission control act in the way most people (incorrectly) understand it works.

      Disabling large pages and using TPS is def an option, and a good topic for an upcoming example architectural decision. 🙂

  5. Pingback: Transparent Page Sharing – Disabled by Default | Another vSphere Blog

  6. Pingback: The anatomy of a virtual desktop (Back to Basics) • My Virtual Vision