How to successfully Virtualize MS Exchange – Part 3 – Memory

In Part 1 and Part 2, we discussed how to size and configure Exchange VMs to meet CPU requirements. In Part 3 we will focus on virtual memory (vRAM).

Exchange 2013 is quite RAM intensive, and it is not unusual to have memory requirements of >128GB RAM in larger deployments. As such, one of the first things we should consider is virtual machine maximums.

Luckily in recent years the maximum VM size in vSphere has increased and is no longer a constraint for virtualizing even the largest of Exchange environments.

The current maximum vRAM configuration per VM is shown below:

vSphere Virtual Machine RAM Maximums

Maximum vRAM per VM: 1TB (vSphere 5.0 or later)
Maximum vRAM per VM: 255GB (vSphere 4.1)

The key point here is that memory is in no way a constraining factor when virtualizing Exchange, even in older vSphere 4.1 deployments.

Memory Sizing

For best memory performance, size the Exchange VM within a NUMA node so it gets the full benefit of NUMA locality, meaning the latency between CPU and RAM is minimized.

In the event the memory requirements exceed the NUMA node, consider scaling out until you have at least four Exchange VMs (across four ESXi hosts) before scaling Exchange VMs up. This ensures higher resiliency and aligns with a virtualization-friendly scale-out approach. Once the environment has four or more Exchange VMs, scaling up beyond the size of a NUMA node can be a good option to reduce the number of Exchange instances to manage and license without significantly impacting resiliency.
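To make the NUMA boundary check concrete, here is a minimal Python sketch. The host specification (2 sockets, 256GB RAM) and the VM size are example figures only, not a recommendation for your environment:

    # Check whether an Exchange VM's vRAM fits within a single NUMA node.
    # Example host: 2 sockets x 128GB per socket = 256GB total (illustrative figures).
    host_ram_gb = 256      # total physical RAM in the ESXi host
    numa_nodes = 2         # typically one NUMA node per physical socket
    vm_vram_gb = 96        # vRAM planned for the Exchange VM

    ram_per_numa_node_gb = host_ram_gb / numa_nodes

    # Note: the hypervisor itself consumes some memory in each node, so a VM sized
    # right at the node boundary will not achieve full NUMA locality in practice.
    if vm_vram_gb <= ram_per_numa_node_gb:
        print(f"{vm_vram_gb}GB fits within a {ram_per_numa_node_gb:.0f}GB NUMA node")
    else:
        print(f"{vm_vram_gb}GB exceeds the {ram_per_numa_node_gb:.0f}GB NUMA node - consider scaling out first")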

Memory Overcommitment

ESXi has excellent memory overcommitment capabilities which can work very well depending on the operating system and application running within the guest. However, Exchange is generally considered a Business Critical Application, so overcommitting memory for Exchange is not a good idea and should be avoided where possible.

Memory Reservations

For Exchange VMs, I recommend configuring the VM with “All Memory Locked”, or in other words a 100% memory reservation.

This has two advantages. The first is consistent memory performance for MS Exchange, which is critical to ensure a great end user experience.

The second is a potentially large storage saving, as the vSwap file is eliminated. For example, if an Exchange VM has 128GB RAM and no memory reservation, a 128GB vSwap file will be created by default in the same datastore as the VM’s .vmx file, which could impact storage sizing and performance.
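For anyone automating this setting, below is a minimal pyVmomi (vSphere Python SDK) sketch that applies “All Memory Locked” to a VM. The vCenter address, credentials and VM name are placeholders for illustration only:

    # Minimal pyVmomi sketch: set "Reserve all guest memory (All locked)" on a VM.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()                  # lab use only; validate certificates in production
    si = SmartConnect(host="vcenter.example.com",           # placeholder vCenter
                      user="administrator@vsphere.local",   # placeholder credentials
                      pwd="changeme",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == "EXCH01")   # placeholder VM name

        # memoryReservationLockedToMax keeps the memory reservation equal to the
        # VM's configured memory size, i.e. a 100% memory reservation.
        spec = vim.vm.ConfigSpec(memoryReservationLockedToMax=True)
        vm.ReconfigVM_Task(spec)
        print(f"Reconfigure task submitted for {vm.name}")
    finally:
        Disconnect(si)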

ESXi Host / Cluster Sizing Considerations

Exchange VMs are typically larger than the average VM and can consume a significant percentage of an ESXi host’s memory resources, so it is important to size your ESXi hosts to have sufficient RAM for the Exchange VMs.

As such, in cases where the Exchange VM is sized to exceed the NUMA node, I recommend sizing ESXi hosts to have at least 25% more physical RAM than the vRAM assigned to your Exchange VMs.

Example: If your Exchange VM is assigned 96GB, the ESXi hosts in the cluster should have at least 128GB. This leaves memory for the hypervisor and for other smaller VMs, such as Domain Controllers servicing global catalog requests for Exchange, without contention.
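As a rough worked example of the 25% guideline (re-using the figures above, and assuming hosts are populated in 32GB memory increments):

    # Rough ESXi host RAM sizing check using the "at least 25% headroom" guideline.
    import math

    exchange_vm_vram_gb = 96     # largest Exchange VM in the cluster (example)
    headroom = 1.25              # 25% headroom for the hypervisor and smaller VMs

    minimum_host_ram_gb = exchange_vm_vram_gb * headroom     # 96 x 1.25 = 120GB
    dimm_step_gb = 32            # assumed memory population increment for this example
    recommended_host_ram_gb = math.ceil(minimum_host_ram_gb / dimm_step_gb) * dimm_step_gb

    print(f"Minimum host RAM: {minimum_host_ram_gb:.0f}GB, "
          f"recommended configuration: {recommended_host_ram_gb}GB")    # 120GB -> 128GB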

Recommendations:

1. Set “All Memory Locked” (100% Memory Reservation) for Exchange VMs.
2. Where possible, size the Exchange VMs RAM within a NUMA node.
3. Where Exchange RAM requirements exceed that of the NUMA node, size ESXi hosts to have at least 25% more RAM than the Exchange VM (or the VM with the largest vRAM in the cluster).
4. Ensure each VM’s vRAM is right-sized after deployment to minimize waste (especially given the recommendation to use memory reservations).

Back to the Index of How to successfully Virtualize MS Exchange.

Transparent Page Sharing (TPS) Example Architectural Decisions Register

The following is a register of all Example Architectural Decisions related to Transparent Page Sharing on VMware ESXi following the announcement from VMware that TPS will be disabled by default in future patches and versions.

See The Impact of Transparent Page Sharing (TPS) being disabled by default for more information.

The goal of this series is to give the pros and cons of multiple options for the configuration of TPS for a wide range of virtual workloads, from VDI to Server, Business Critical Apps, Test/Dev and QA/Pre-Production.

Business Critical Applications (vBCA):

1. Transparent Page Sharing (TPS) Configuration for Virtualized Business Critical Applications (vBCA)

Mixed Server Workloads:

1. Transparent Page Sharing (TPS) Configuration for Production Servers (1 of 2)

2. Transparent Page Sharing (TPS) Configuration for Production Servers (2 of 2) – Coming Soon!

Virtual Desktop (VDI) Environments:

1. Transparent Page Sharing (TPS) Configuration for VDI (1 of 2)

2. Transparent Page Sharing (TPS) Configuration for VDI (2 of 2)

Testing & Development:

1. Transparent Page Sharing (TPS) Configuration for Test/Dev Servers (1 of 2) – Coming Soon!

2. Transparent Page Sharing (TPS) Configuration for Test/Dev Servers (2 of 2) – Coming Soon!

QA / Pre-Production:

1. Transparent Page Sharing (TPS) Configuration for QA / Pre-Production Servers

Related Articles:

1. Example Architectural Decision Register

2. The Impact of Transparent Page Sharing (TPS) being disabled by default – @josh_odgers (VCDX#90)

3. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)

Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for Virtualized Business Critical Applications (vBCA)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for an environment running Virtualized Business Critical Applications?

Assumptions

1. TPS is disabled by default.
2. Storage is expensive.
3. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
4. vSphere 5.5 or earlier

Requirements

1. Applications must meet strict Service Level Agreements (SLAs)
2. The environment must deliver high consistent performance
3. Minimize the cost of shared storage

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure
3. Meet/Exceed SLAs

Architectural Decision

Use 100% memory reservations for vBCA VMs, leave TPS disabled (default) and leave Large Memory Pages enabled (default).
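If you want to confirm hosts are still at these defaults, the following is a minimal pyVmomi sketch that queries the two ESXi advanced settings involved (Mem.ShareForceSalting for TPS salting and Mem.AllocGuestLargePage for guest large pages). Connection details are placeholders, and the sketch assumes an ESXi build that already includes the TPS salting behaviour:

    # Minimal pyVmomi sketch: report the TPS salting and guest large page settings
    # for each host. Defaults of 2 (Mem.ShareForceSalting) and 1 (Mem.AllocGuestLargePage)
    # correspond to inter-VM TPS disabled and guest large pages enabled.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()                  # lab use only
    si = SmartConnect(host="vcenter.example.com",           # placeholder vCenter
                      user="administrator@vsphere.local", pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            opt_mgr = host.configManager.advancedOption
            for key in ("Mem.ShareForceSalting", "Mem.AllocGuestLargePage"):
                value = opt_mgr.QueryOptions(key)[0].value
                print(f"{host.name}: {key} = {value}")
    finally:
        Disconnect(si)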

Justification

1. Setting 100% memory reservations ensures consistent performance by eliminating the possibility of swapping.
2. The 100% memory reservation also eliminates the vswap file, which saves space on the shared storage and removes the storage impact that swapping would otherwise cause.
3. RAM is cheaper than Tier 1 storage (which is recommended for vSwap storage to ensure minimal performance impact during swapping), so the increased cost of memory in the hosts is easily offset by the saving in Tier 1 shared storage.
4. Simplicity. Leaving default settings is advantageous from both an architectural and operational perspective. Example: ESXi patching can cause settings to revert to default, which could negate TPS savings and put a sudden high demand on storage where TPS savings were expected.
5. TPS savings for vBCA workloads are typically much lower than for mixed server or desktop workloads because vBCA memory requirements are typically much higher.
6. When using “Percentage of Cluster Resources reserved for HA”, admission control accounts for the full memory reservation of every VM when calculating fail-over capacity, so performance remains approximately the same in the event of a fail-over, leading to more consistent performance under a wider range of circumstances (a simple sizing check is sketched after this list).
7. Removes the real or perceived security risk of sensitive information being gathered from other VMs using TPS, as described in VMware KB 2080735.
8. Many business critical applications such as SAP and MS SQL use in-memory caching; overcommitting memory could lead to serious performance degradation.
9. Many business critical applications such as MS SQL claim all available RAM in the virtual machine by default, so TPS savings would be minimal to none for SQL servers.
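To illustrate point 6, the sketch below checks whether a set of fully reserved VMs still fits under the “Percentage of Cluster Resources reserved for HA” policy with N+1 capacity. All figures are example values only:

    # Rough check: do 100% memory reservations fit under "Percentage of Cluster
    # Resources reserved for HA" admission control? Figures are examples only.
    hosts = 4
    ram_per_host_gb = 512
    vm_reservations_gb = [128, 128, 96, 96, 64, 64]   # fully reserved vBCA VMs

    failover_hosts = 1                                 # N+1
    reserved_pct = failover_hosts / hosts              # 25% of cluster resources reserved for HA

    cluster_ram_gb = hosts * ram_per_host_gb
    usable_ram_gb = cluster_ram_gb * (1 - reserved_pct)
    total_reserved_gb = sum(vm_reservations_gb)

    print(f"Cluster RAM: {cluster_ram_gb}GB, usable after {reserved_pct:.0%} HA reservation: {usable_ram_gb:.0f}GB")
    print(f"Total VM reservations: {total_reserved_gb}GB -> "
          f"{'fits' if total_reserved_gb <= usable_ram_gb else 'does not fit'}")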

Implications

1. Using 100% memory reservations requires that ESXi hosts and the cluster be sized at a 1:1 ratio of vRAM to pRAM (physical RAM), including N+1 so a host failure can be tolerated.
2. Potentially increased RAM costs
3. No memory overcommitment can be achieved
4. Potential for lower CPU utilization / overcommitment as RAM may become the first constraint.

Alternatives

1. Use a 50% memory reservation and enable TPS
2. Use no reservation, enable TPS and disable Large Pages

Summary

I highly recommend sizing vBCAs with no memory overcommitment in mind and setting 100% memory reservations for these VMs. By definition these are “Business Critical” and the requirement for consistent high performance far outweighs potential memory savings!

In the event vBCAs are running in a cluster alongside non-vBCA workloads, such as general Server or even Test/Dev, where TPS may be enabled or desired, TPS can be enabled and Large Pages disabled, but the vBCAs themselves should always have a 100% memory reservation.
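If you do take that path for the non-vBCA hosts, one possible way to apply it programmatically is sketched below, again using pyVmomi. The host name and connection details are placeholders, the setting values follow VMware's published TPS salting guidance (Mem.ShareForceSalting = 0 re-enables inter-VM page sharing, Mem.AllocGuestLargePage = 0 disables guest large pages), and you should confirm the expected value types for these options against your ESXi build before using anything like this:

    # Sketch only: enable inter-VM TPS and disable guest large pages on one host.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()                  # lab use only
    si = SmartConnect(host="vcenter.example.com",           # placeholder vCenter
                      user="administrator@vsphere.local", pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == "esxi-nonbca-01.example.com")  # placeholder host

        changes = [
            vim.option.OptionValue(key="Mem.ShareForceSalting", value=0),   # allow inter-VM page sharing
            vim.option.OptionValue(key="Mem.AllocGuestLargePage", value=0), # disable guest large pages
        ]
        host.configManager.advancedOption.UpdateOptions(changedValue=changes)
        print(f"Updated TPS / large page settings on {host.name}")
    finally:
        Disconnect(si)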

Related Articles:

1. Cloud XC Transparent Page Sharing Example Architectural Decisions

2. The Impact of Transparent Page Sharing (TPS) being disabled by default – @josh_odgers (VCDX#90)

3. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)