Example Architectural Decision – Port Binding Setting for a dvPortGroup

Problem Statement

In a VMware vSphere environment using Virtual Distributed Switches (VDS) where all VMs including vCenter is hosted on the VDS, What is the most suitable Port Binding setting for dvPortgroups to ensure maximum performance and availability?

Assumptions

1. Enterprise Plus Licensing
2. vCenter is hosted on the VDS

Requirements

1. The environment must have central management of vNetworking
2. All VMs must be able to be powered on in the event of a vCenter outage
3. Network connectivity must not be impacted if vCenter is down.

Motivation

1. Reduce complexity where possible.
2. Maximize the availability of the infrastructure

Architectural Decision

Use the default dvPortGroup Port Binding setting of “Static Binding”

Justification

1. A dvPortGroup port is assigned to a VM and reserved when a VM is connected to the dvPortGroup. This ensures connectivity at all times including when vCenter is down.
3. Using “Static Binding” ensures the vCenter VM can be powered on and connected to the dvPortGroup even after a failure/outage.
4. “Static Binding” is the default setting and there is no reason to modify this setting.

Implications

1. The number of VMs supported on the dvPortGroup / VDS is limited to the number ports on the VDS (not overcommitment of ports is possible).
2. Number of ports configured on a dvPortGroup should be greater than the maximum number of VMs required to be supported.
3. Port Allocation should be left at the default of “Elastic” to ensure the number of ports is automatically expanded if/when required.
4. New Virtual machines cannot be powered on and connected to a dvPortGroup (VDS) when vCenter is down.

Alternatives

1. Set dvPortGroup Port Binding to “Dynamic binding”
2. Set dvPortGroup Port Binding to “Ephemeral binding”

Related Articles

1. Distributed vSwitches and vCenter outage, what’s the deal? – @duncanyb (VCDX #007)

2. Choosing a port binding type in ESX/ESXi (1022312)

Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (1 of 2)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for a Virtual Desktop environment?

Assumptions

1. TPS is disabled by default
2. Storage is expensive
3. Two Socket ESXi Hosts have been chosen to align with a scale out methodology.
4. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
5. vSphere 5.5 or earlier

Requirements

1. VDI environment must deliver consistent performance
2. VDI environment supports a high percentage of Power Users

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure

Architectural Decision

Leave TPS disabled (default) and apply 100% Memory Reservations to VDI VMs and/or Golden Master Image.

Justification

1. Setting 100% memory reservations ensures consistent performance by eliminating the possibility of swapping.
2. The 100% memory reservation also eliminates the capacity usage by the vswap file which saves space on the shared storage as well as reducing the impact on the storage in the event of swapping.
3. RAM is cheaper than Tier 1 storage (which is recommended for vSwap storage to ensure minimal performance impact during swapping) so the increased cost of memory in the hosts is easily offset by the saving in shared storage.
4. Simplicity. Leaving default settings is advantageous from both an architectural and operational perspective.  Example: ESXi Patching can cause settings to revert to default which could negate TPS savings and put a sudden high demand on storage where TPS savings are expected.
5. TPS savings for desktops can be significant, however with a high percentage of Power Users with >=4GB desktops and 2vCPUs, the TPS savings are lower compared to Kiosk or Task users typically with 1-2GB per desktop.
6. The decision has been made to use 2 socket ESXi hosts and scale out so the TPS savings per host compared to a 4 socket server with double the RAM will be lower.
7. HA admission control will calculate fail-over requirements (when using Percentage of cluster resources reserved for HA) so that performance will be approximately the same in the event of a fail-over due to reserving the full RAM reserved for every VM leading to more consistent performance under a wider range of circumstances.
8. Lower core count (and lower cost) CPUs will likely be viable as RAM will likely be the first constraint for further consolidation.

Implications

1. Using 100% memory reservations requires ESXi hosts and the cluster be sized at a 1:1 ratio of vRAM to pRAM (Physical RAM) and should include N+1 so a host failure can be tolerated.
2. Increased RAM costs
3. No memory overcommitment can be achieved
4. Potential for lower CPU utilization / overcommitment as RAM may become the first constraint.

Alternatives

1. Use 50% reservation and enable TPS
2. Use no reservation, Enable TPS and disable large pages

Related Articles:

1. The Impact of Transparent Page Sharing (TPS) being disabled by default @josh_odgers (VCDX#90)

2. Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for VDI (2 of 2)

3. Future direction of disabling TPS by default and its impact on capacity planning –@FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl(VCDX#104)

The Impact of Transparent Page Sharing (TPS) being disabled by default

Recently VMware announced via the VMware Security Blog, that Transparent Page Sharing (TPS) will be disabled by default in an upcoming update of ESXi.

Since this announcement I have been asked how will this impact sizing vSphere solutions and as a result I’ve been involved in discussions about the impact of this on Business Critical Application, Server and VDI solutions.

Firstly what benefits does TPS provide? In my experience, in recent times with large memory pages essentially not being compatible with TPS, even for VDI environments where all VMs are running the same OS, the benefits have been minimal, in general <20% if that.

Memory overcommitment in general is not something that can achieve significant savings from because memory is much harder to overcommit than CPU. Overcommitment can be achieved but only where memory is not all being used by the VM/OS & Applications, in which case, simply right sizing VMs will give similar memory saving and likely result in better overall VM and cluster performance.

So to begin, in my opinion TPS is in most cases overrated.

Next Business Critical Applications (vBCA):

In my experience, Business Critical Applications such as MS Exchange, MS SQL , Oracle would generally have memory reservations, and in most cases the memory reservation would be 100% (All Memory Locked).

As a result, in most environments running vBCA’s, TPS has no benefits already, so TPS being disabled has no significant impact for these workloads.

Next End User Computing (EUC) Solutions:

There are a number of EUC solutions, such as Horizon View , Citrix XenDesktop and Citrix PVS which all run very well on vSphere.

One common issue with EUC solutions is architects fail to consider the vSwap storage requirements for Virtual Servers (for Citrix PVS) or VDI such as Horizon View.

As a result, a huge amount of Tier 1 storage can be wasted with vswap file storage. This can be up to the amount vRAM allocated to VMs less memory reservations!

The last part is a bit of a hint, how can we reduce or eliminate the need for Tier 1 storage of vSwap? By using Memory Reservations!

While TPS can provide some memory savings, I would invite you to consider the cost saving of eliminating the need for vSwap storage space on your storage solution, and the guarantee of consistent performance (at least from a memory perspective) outweigh the benefits of TPS.

Next Virtual Server Solutions:

Lets say we’re talking about general production servers excluding vBCAs (discussed earlier). These servers are providing applications and functions to your end users so consistent performance is something the business is likely to demand.

When sizing your cluster/s, architects should size for at least N+1 redundancy and to have memory utilization around the 1:1 mark in a host failure scenario. (i.e.: Size your cluster assuming a host failure or maintenance of one host is being performed).

As a result, any reasonable architectural assumption around TPS savings would be minimal.

As with EUC solutions, I would again invite you to consider the cost saving of eliminating the vSwap storage requirement and the guarantee of consistent performance outweigh the benefits of TPS.

Next Test/Dev Environments:

This is probably the area where TPS will provide the most benefit, where memory overcommitment ratios can be much higher as the impact to the applications(VMs) of memory saving techniques such as swapping/ballooning should not have as high an impact on the business as with vBCA, EUC or Server workloads.

However, what is Test/Dev for? In my opinion, Test/Dev should where possible simulate production conditions so the operational verification of an application can be accurately conducted before putting the workloads into production. As such, the Test/Dev VMs should be configured the same way as they are intended to be put into production, including Memory Reservations and CPU overcommitment.

So, can more compute overcommitment be achieved in Test/Dev, sure, but again is the impact of vSwap space, potentially inconsistent performance and the increased risk of operational verification not being performed to properly simulate product worth the minimal benefits of TPS?

Summary

If VMware believe TPS is a significant enough security issue to make it disabled by default, this is something architects should consider, however I would argue there are many other areas where security is a much larger issue, but that’s a different topic.

TPS being disabled by default is likely to only impact a small percentage of virtual workloads and with RAM being one of the most inexpensive components in the datacenter, ensuring consistent performance by using Memory Reservations and eliminating the architectural considerations and potentially high storage costs for VMs vSwap make leaving TPS disabled an attractive option regardless of if its truly a security advantage or not.

Related Articles:

1. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

2. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)