The Impact of Transparent Page Sharing (TPS) being disabled by default

Recently VMware announced via the VMware Security Blog, that Transparent Page Sharing (TPS) will be disabled by default in an upcoming update of ESXi.

Since this announcement I have been asked how will this impact sizing vSphere solutions and as a result I’ve been involved in discussions about the impact of this on Business Critical Application, Server and VDI solutions.

Firstly what benefits does TPS provide? In my experience, in recent times with large memory pages essentially not being compatible with TPS, even for VDI environments where all VMs are running the same OS, the benefits have been minimal, in general <20% if that.

Memory overcommitment in general is not something that can achieve significant savings from because memory is much harder to overcommit than CPU. Overcommitment can be achieved but only where memory is not all being used by the VM/OS & Applications, in which case, simply right sizing VMs will give similar memory saving and likely result in better overall VM and cluster performance.

So to begin, in my opinion TPS is in most cases overrated.

Next Business Critical Applications (vBCA):

In my experience, Business Critical Applications such as MS Exchange, MS SQL , Oracle would generally have memory reservations, and in most cases the memory reservation would be 100% (All Memory Locked).

As a result, in most environments running vBCA’s, TPS has no benefits already, so TPS being disabled has no significant impact for these workloads.

Next End User Computing (EUC) Solutions:

There are a number of EUC solutions, such as Horizon View , Citrix XenDesktop and Citrix PVS which all run very well on vSphere.

One common issue with EUC solutions is architects fail to consider the vSwap storage requirements for Virtual Servers (for Citrix PVS) or VDI such as Horizon View.

As a result, a huge amount of Tier 1 storage can be wasted with vswap file storage. This can be up to the amount vRAM allocated to VMs less memory reservations!

The last part is a bit of a hint, how can we reduce or eliminate the need for Tier 1 storage of vSwap? By using Memory Reservations!

While TPS can provide some memory savings, I would invite you to consider the cost saving of eliminating the need for vSwap storage space on your storage solution, and the guarantee of consistent performance (at least from a memory perspective) outweigh the benefits of TPS.

Next Virtual Server Solutions:

Lets say we’re talking about general production servers excluding vBCAs (discussed earlier). These servers are providing applications and functions to your end users so consistent performance is something the business is likely to demand.

When sizing your cluster/s, architects should size for at least N+1 redundancy and to have memory utilization around the 1:1 mark in a host failure scenario. (i.e.: Size your cluster assuming a host failure or maintenance of one host is being performed).

As a result, any reasonable architectural assumption around TPS savings would be minimal.

As with EUC solutions, I would again invite you to consider the cost saving of eliminating the vSwap storage requirement and the guarantee of consistent performance outweigh the benefits of TPS.

Next Test/Dev Environments:

This is probably the area where TPS will provide the most benefit, where memory overcommitment ratios can be much higher as the impact to the applications(VMs) of memory saving techniques such as swapping/ballooning should not have as high an impact on the business as with vBCA, EUC or Server workloads.

However, what is Test/Dev for? In my opinion, Test/Dev should where possible simulate production conditions so the operational verification of an application can be accurately conducted before putting the workloads into production. As such, the Test/Dev VMs should be configured the same way as they are intended to be put into production, including Memory Reservations and CPU overcommitment.

So, can more compute overcommitment be achieved in Test/Dev, sure, but again is the impact of vSwap space, potentially inconsistent performance and the increased risk of operational verification not being performed to properly simulate product worth the minimal benefits of TPS?

Summary

If VMware believe TPS is a significant enough security issue to make it disabled by default, this is something architects should consider, however I would argue there are many other areas where security is a much larger issue, but that’s a different topic.

TPS being disabled by default is likely to only impact a small percentage of virtual workloads and with RAM being one of the most inexpensive components in the datacenter, ensuring consistent performance by using Memory Reservations and eliminating the architectural considerations and potentially high storage costs for VMs vSwap make leaving TPS disabled an attractive option regardless of if its truly a security advantage or not.

Related Articles:

1. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

2. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)

The VCDX candidates advantage over the panellists.

As the candidate, you submit a VCDX application based on a project you have worked on from start to finish and in most case, lead from a technical/architectural perspective.

You therefore should have:

  • Had Initial discussions with the customer about requirements.
  • Lead or been involved in Design workshops
  • Considered design decisions
  • Documented the detailed design along with implementation & test plans etc
  • Either overseen or been actively involved with implementation

As a result of the above, you have spent many hours, potentially hundreds or thousands of hours depending on the size of the project and you will have intimate knowledge of the design & solution.

On the other hand, VCDX panellists, while they are experts in the field, have been given your application, including the design and a very limited amount of time to review and prepare for the defence.

As a result, the VCDX candidate has a HUGE advantage over the panellists!

So in a defence, who is the expert in the room on the design? The Candidate!

As a result, the candidate should be an expert in the design being presented and answering questions from the panel about the design should not be intimidating.

To all VCDX candidates, understand that you have a major advantage and go defend your design with confidence!

Microsoft Exchange on Nutanix Best Practice Guide

I am pleased to announce that the Best Practice guide for Microsoft Exchange on Nutanix is released and can be found here.

For me deploying MS Exchange on Nutanix with vSphere combines best of breed application level resiliency (in the form of Exchange Database Availability Groups), infrastructure and hypervisor technologies to provide an infrastructure with not only high performance, but with industry leading scalability, no silos and very high efficiency & resiliency.

All of this leads to overall lower CAPEX/OPEX for customers.

In summary by Virtualizing MS Exchange on Nutanix, customers realize several key benefits including:

  • Ability to use a standard platform for all workloads in the datacenter, thus allowing the removal of legacy silos resulting in lower overall cost, and increased operational efficiencies.
    • An example of this is no disruption to MS Exchange users when performing Nutanix / Hypervisor or HW maintenance
  • A highly resilient , scalable and flexible MS Exchange deployment.
  • Reducing the number of Exchange Mailbox servers required to maintain 4 copies of Exchange data thanks to the combination of NDFS + DAG. (2 copies at NDFS layer / 2 copies at DAG layer)
  • Eliminate the need for large / costly refresh cycles of HW as individual nodes can be added and removed non disruptively.
  • Simplified architecture, no need for complex sizing architecture or risk of over sizing day 1, start small and scale VMs, Compute or storage if/when required.
  • No dependency of specific HW, Exchange VMs can be migrated to/from any Nutanix node and even to non Nutanix nodes.
  • Full support from Nutanix including at the Exchange, Hypervisor and Storage layers with support from Microsoft via Premier Support contracts or via TSANet.
  • Lower CAPEX/OPEX as Exchange can be combined with new or existing Nutanix/Virtualization deployment.
  • Reduced datacenter costs including Power, Cooling , Space (RU)

I hope you enjoy the Best Practice guide and look forward to hearing about your MS Exchange on Nutanix questions & experiences.