Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for Virtualized Business Critical Applications (vBCA)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for an environment running Virtualized Business Critical Applications?

Assumptions

1. TPS is disabled by default.
2. Storage is expensive.
3. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
4. vSphere 5.5 or earlier

Requirements

1. Applications must meet strict Service Level Agreements (SLAs)
2. The environment must deliver high consistent performance
3. Minimize the cost of shared storage

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure
3. Meet/Exceed SLAs

Architectural Decision

Leave TPS disabled (default) and leave Large Memory pages enabled (default).

Justification

1. Setting 100% memory reservations ensures consistent performance by eliminating the possibility of swapping.
2. The 100% memory reservation also eliminates the capacity usage by the vswap file which saves space on the shared storage as well as reducing the impact on the storage in the event of swapping.
3. RAM is cheaper than Tier 1 storage (which is recommended for vSwap storage to ensure minimal performance impact during swapping) so the increased cost of memory in the hosts is easily offset by the saving in Tier 1 shared storage.
4. Simplicity. Leaving default settings is advantageous from both an architectural and operational perspective.  Example: ESXi Patching can cause settings to revert to default which could negate TPS savings and put a sudden high demand on storage where TPS savings may be expected.
5. TPS savings for vBCA workloads is typically much less than with mixed server or desktop workloads due to the memory requirements for vBCAs being typically much higher.
6. HA admission control will calculate fail-over requirements (when using Percentage of cluster resources reserved for HA) so that performance will be approximately the same in the event of a fail-over due to reserving the full RAM reserved for every VM leading to more consistent performance under a wider range of circumstances.
7. Remove the real or perceived security risk of sensitive information being gathered from other VMs using TPS as described in VMware KB 2080735
8. Many business critical applications such as SAP and MS SQL use in Memory caching, overcommitment of memory could lead to serious degradation in performance.
9. Many business critical applications such as MS SQL claim by default all available RAM in the Virtual Machine, as such, TPS savings would be minimal to none for SQL servers.

Implications

1. Using 100% memory reservations requires ESXi hosts and the cluster be sized at a 1:1 ratio of vRAM to pRAM (Physical RAM) and should include N+1 so a host failure can be tolerated.
2. Potential Increased RAM costs
3. No memory overcommitment can be achieved
4. Potential for lower CPU utilization / overcommitment as RAM may become the first constraint.

Alternatives

1. Use 50% reservation and enable TPS
2. Use no reservation, Enable TPS and disable large pages

Summary

I highly recommend sizing vBCA’s with no memory overcommitment in mind and setting 100% memory reservations for these VMs. By definition, these are “Business Critical” and the requirement of consistent high performance far outweigh potential memory savings!

In the event vBCAs are running in a cluster with no vBCAs such as Server or even Test/Dev where TPS may be enabled or desired, TPS can be enabled along with disabling large pages but vBCAs should always have 100% memory reservation.

Related Articles:

1. Cloud XC Transparent Page Sharing Example Architectural Decisions

2. The Impact of Transparent Page Sharing (TPS) being disabled by default @josh_odgers (VCDX#90)

3. Future direction of disabling TPS by default and its impact on capacity planning –@FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)

ESXi Host Isolation Response and custom isolation address configuration.

I was reviewing a vSphere design recently and I came across an interesting design choice which I thought I would share.

The architect selected the isolation response of “Leave Powered On” and disabled  “das.usedefaultisolationaddress”  (which is by default enabled) and configured multiple custom isolation addresses using the “das.isolationadressX” advanced setting.

The architect explained that this was done to minimize the chance of a false positive isolation event. In many environments such as ones using IP storage or where the ESXi Management VMKernel default gateway is not highly available, this can be a very good idea.

In this environment, the storage was provided via FC and the default gateway was highly available.

So was there a benefit in changing the default setting of “das.usedefaultisolationaddress” and configuring custom isolation addresses?

The short answer is No.

This is because the isolation response is configured with “Leave Powered On” so regardless of the host being isolated or not, the Virtual Machines will remain powered on.

So keep it simple, if your isolation response is “Leave Powered On” there is no need to change either of these advanced settings.

The below articles show examples of isolation response and custom isolation addresses configurations for IP Storage, FC storage and Hyper-converged environments.

Related Articles

1. Host Isolation Response for IP Storage
2. Host isolation response for FC based Storage
3. Host Isolation Response for a Nutanix Environment

Giving Back to the VMware community

After achieving my VCDX in Toronto mid last year (2012), one of my goals was to start blogging and giving back to the VMware/Virtualisation community which I wrote in my About Me post back in April last year.

Over the last year I presented at multiple VMUG events , contributed to community podcasts and kicked off this blog (CloudXC). I also got more involved with Twitter and VMware Communities forum.

I have also helped a number of VCDX candidates with mock panels and submission reviews and I am very pleased a number of those candidates have been successful.

I am pleased to say I have thoroughly enjoyed getting more involved with the community, and have met a lot of great people and learnt lots of new things along the way.

Today I received notification that along with 574 others, I was awarded the title of vExpert for 2013. (vExpert Awardees announced here)

Just like achieving VCDX, earning the vExpert title for me is just motivation to continually keep improving my skills and adding value to the community.

Thanks to everyone involved with the vExpert Program and I look forward to continuing to contribute too this great community and hopefully get another vExpert gong next year.

Congrats to everyone else who was awarded vExpert for 2013!

vExpert_pre2013_new