How to successfully Virtualize MS Exchange – Part 6 – vMotion

Having a virtualized Exchange server opens up the ability to perform vMotion and migrate the VM between ESXi hosts without downtime. This is a handy feature to enable hardware maintenance , upgrades or replacement with no downtime and importantly no loss of resiliency to the application.

In this article, I am talking only about vMotion, not Storage vMotion.

Lets first discuss vMotions requirements and configuration maximums.

vMotion requirements:

1. A VMKernel enabled for vMotion
2. A minimum of 1 x 1Gb NIC
3. Shared storage between source and destination ESXi hosts (recommended).

vMotion Configuration Maximums:

Concurrent vMotion operations per host (1Gb/s network):  4
Concurrent vMotion operations per host (10Gb/s network):  8
Concurrent vMotion operations per datastore: 128

As discussed in Part 4, I recommend using DRS “VM to Host” should rules to ensure DRS does not vMotion Exchange VMs unnecessary while keeping the cluster load balanced.

However, it is still important to design your environment to ensure Exchange VMs can vMotion as fast as possible and with the lowest impact during the syncing of the memory and during the final cutover.

So that brings us to our first main topic, Multi-NIC vMotion.

Multi-NIC vMotion:

Multi-NIC vMotion is a feature introduced in vSphere 5.0 which allows vMotion traffic to be sent concurrently down multiple physical NICs to increase available bandwidth and speed up vMotion activity. This effectively lowers the impact of vMotion and enables larger VMs with very high memory change rates do be vMotioned.

For those who are not familiar with the feature, it is described in depth in VMware KB : Multiple-NIC vMotion in vSphere 5 (2007467) as is the process to set it up on Virtual Standard Switches (VSS) and Virtual Distributed Switches (VDS).

From an Exchange perspective, the larger the MBX/MSR VM’s vRAM, and more importantly the more “active” the memory, the longer the vMotion can take. If vMotion detects the memory change rate is higher than the available bandwidth, the hypervisor will insert micro “stuns” to the VM’s CPU over time until the change rate is low enough to vMotion. This is generally has minimal impact to VMs, including Exchange, but if it can be avoided the better.

So using Multi-NIC vMotion helps as more bandwidth can be utilized which means vMotion activity is either faster, or can support more active memory with a low impact.

vMotion “Slot size”:

A vMotion slot size can be thought of as the compute and ram capacity required to perform a vMotion of a VM between two hosts. So for a VM with 96Gb of vRAM and the same memory reservation, the destination host requires 96Gb of physical RAM to be available to even qualify to begin a vMotion.

The larger the VM, the more of a factor this can become in the design of a vSphere cluster.

For example, The diagram below shows a four ESXi host HA cluster with several large VMs including several which are assigned 96Gb of vRAM as is common with Exchange MBX / MSR VMs.

In this scenario the Exchange VMs are represented by VM #13,15 and 16 and have 96Gb RAM ea.

ClustervMotionSlotSizeBad

The issue here is there is insufficient memory on any host to accommodate a vMotion of any of the Exchange VMs. This leads to complexity during maintenance periods as well as a HA event.

In fact in the above example, if an ESXi host crashed, HA would not be able to restart any of the Exchange VMs.

This goes back to the point I made in Part 5 about always ensuring an N+1 (minimum) configuration for the cluster, as this should in most cases avoid this issue.

In addition to the recommendation in Part 4 about using VM to Host DRS “should” rules to ensure only one Exchange VM runs per host.

Enhanced vMotion Compatibility:

Enhanced vMotion Compatibility or EVC, is used to ensure vMotion compatibility for all the hosts within a cluster. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. The end result is configuring EVC prevents vMotion from failing because of incompatible CPUs.

The knowledge based article Enhanced vMotion Compatibility (EVC) processor support (1003212) from VMware explains the EVC modes and compatible CPU models. Note: EVC does not support Intel to AMD or vice versa.

Contrary to popular belief, EVC does not “slow down” the CPU, it only masks processor features that affect vMotion compatibility. The full speed of the processor is still utilized, the only potential performance degradation is where an application is specifically written to take advantage of masked CPU features, in which case that workload may have some performance loss. However this is not the case with MS Exchange and as a result, I recommend EVC always be enabled to ensure the cluster is future proofed and Exchange VMs can be migrated to newer HW seamlessly via vMotion.

For more details on why you should enable EVC, review the Example Architectural Decision – Enhanced vMotion Compatibility.

Jumbo Frames:

Using Jumbo frames helps improve vMotion throughput by reducing the number of packets and therefore interrupts required to migrate the same Exchange VM between two hosts.

Michael Webster @vcdxnz001 (VCDX #66) wrote the following great article showing the benefits of Jumbo Frames for vMotion is up to 19% in Multi-NIC vMotion environments: Jumbo Frames on vSphere 5

So we know there is a significant performance benefit, but what about the downsides of Jumbo Frames?

The following two Example architectural decisions covers the pros and cons of Jumbo Frames, along with justification for using and not using Jumbo for IP Storage. The same concepts are true for vMotion so I recommend you review both decisions and choose which one best suits your requirements/constraints.

Note: Neither decision is “right” or “wrong” but if your environment is configured correctly for Jumbo Frames, you will get better vMotion performance with Jumbo Frames.

  1. Jumbo Frames for IP Storage (Do not use Jumbo Frames)
  2. Jumbo Frames for IP Storage (Use Jumbo Frames)

vMotion Security:

vMotion traffic is unencrypted, as a result anyone with access to the network can sniff the traffic. To avoid this, vMotion traffic should be placed on a dedicated non route-able VLAN.

For more information see: Example Architectural Decision : Securing vMotion & Fault Tolerant Traffic in IaaS/Cloud Environments.

Note: This post is relevant to all environments, not just IaaS/Cloud/Multi-tenant.

Performing a vMotion or entering Maintenance Mode:

As per Part 4, I recommended using VM to Host DRS “should” rules to ensure only one Exchange VM runs per host. This also ensures only one Exchange VM is potentially impacted by vMotion when a host enters maintenance mode.

However, simply entering maintenance mode can kick off up to 8 concurrent vMotion activities when using 10Gb networking for vMotion. In this situation, the length of the vMotion for the Exchange VM will increase and potentially impact performance for a longer period.

As such, I recommend to manually vMotion the Exchange VM onto another host not running any other Exchange VMs (and ideally no other large vCPU/vRAM VMs) and waiting for this to complete before entering the host into maintenance mode.

The benefit of this will depend on the size of your Exchange VMs and the performance of your environment but this is an easy way to minimize the chance of performance issues.

DAG Failovers during vMotion?

This can occur as even a momentary drop of the network during vMotion, or the quiesce of the VM during the final stage of the vMotion exceeds the default Windows cluster heartbeat thresholds.

With vMotion setup correctly and ideally if using Multi-NIC vMotion, this should not occur, however there are ways to mitigate against this issue by increasing the cluster heartbeat time-out help prevent unnecessary DAG failovers.

To increase the cluster heartbeat timeout see: Tuning Failover Cluster Network Thresholds

Recommendations for vMotion:

1. Ensure vMotion is Active on 10Gb (or higher) adapters
2. Enable Multi-NIC vMotion across 2 x 10Gb adapters in environments with Exchange VMs larger than 64Gb RAM
3. Enable Enhanced vMotion Compatibility (EVC) to the highest supported level in your cluster
4. Use Jumbo Frames for vMotion Traffic
5. Ensure sufficient cluster capacity to migrate Exchange VMs
6. Use DRS rules to separate Exchange VMs to ensure vMotion is not prevented (as per Part 4)
7. When evacuating ESXi hosts running Exchange VMs, vMotion the Exchange VM first, and once it has succeeded, put the hosts into maintenance mode.
8. Use Network I/O Control (NIOC) to ensure a minimum level of bandwidth to vMotion (Further details in an upcoming post)
9. Do not Route vMotion Traffic
10. Put vMotion traffic on a dedicated non route-able VLAN (ie: No gateway)
11. Increase cluster heartbeat time-outs for Windows failover clustering with the maximums outlined in Tuning Failover Cluster Network Thresholds.

Back to the Index of How to successfully Virtualize MS Exchange.

Example Architectural Decision – Jumbo Frames for IP Storage (Do not use Jumbo Frames)

Problem Statement

When using IP based storage over a converged 10GB network, should Jumbo Frames be used?

Requirements

1. Fully Supported storage

2. Maximum vSphere environment availability

3. Maximize performance where possible

Assumptions

1. Converged 10GB Network which is highly available

2. Two (or more) 10GB connections per ESXi host

Constraints

1. No dedicated network for IP storage traffic

Motivation

1. Simplify the environment

Architectural Decision

Do not use Jumbo Frames

Justification

1. Reduce the complexity in the environment for initial implementation

2. Simplify ongoing support / troubleshooting

3. For a Jumbo Frame to be transmitted without fragmentation, All devices end to end must support and be configured for Jumbo Frames

4. While there can be performance benefits of Jumbo Frames for IP Storage this is not generally seen across the board and depends on I/O types

5. Ensure IP storage packets are not fragmented or dropped by mis-configured devices or devices that do not support Jumbo Frames

6. Storage performance for the virtual environment will generally be constrained by the storage array controllers not the storage area network

7. Ensure packet fragmentation does not occur as all devices support a default MTU of 1500

8. Increasing the MTU will decrease the number of packets required for the same bandwidth but where the bottleneck is throughput (bytes) there will be minimal/no benefit

9. Jumbo Frames will only assist where the network is constrained at an interrupt level

Implications

1. IP Storage may have reduced performance in some circumstances compared to what Jumbo Frames may offer

Alternatives

1. Use Jumbo Frames

Relates Articles

1. Example Architectural Decision – Jumbo Frames for IP Storage (Use Jumbo Frames)

 Contributors

Thanks to Rob McNab (IBM) and Peter McCrystal (IBM) for their input into this example architectural decision.

 

 

Example Architectural Decision – Jumbo Frames with IP Storage (Use Jumbo Frames)

Problem Statement

When using IP based storage over a converged 10GB network, should Jumbo Frames be used?

Requirements

1. Fully Supported storage

2. Maximum vSphere environment availability

3. Maximize performance where possible

Assumptions

1. Dedicated 10GB Storage Network which is highly available

2. Two 10GB connections per ESXi host dedicated to IP Storage

3. Storage array supports Jumbo Frames

4. Benefit of Jumbo Frames outweighs the complexity to implement/maintain/support

5. Network performance is constrained at an interrupt level

Constraints

1. Maximum of two connections per ESXi host for IP Storage

Motivation

1. Maximum performance and security

Architectural Decision

Use Jumbo Frames

Justification

1. There is a dedicated physical network for IP storage

2. All devices end to end support Jumbo Frames and this is enabled on all switches globally

3. As only IP storage traffic traverses the dedicated network, a larger MTU will not have any adverse effects on data network traffic.

4.  IP storage packets will not be fragmented or dropped as the storage network has been verified and configured to support Jumbo Frames. Thus avoiding costly re-transmits

5. No routing exists (or is required) for the IP storage network, as such the environment is flat and simple to support

6. IP Storage performance will not be constrained by MTU

7. A standard MTU of 1500 can optionally be configured at the VMKernel layer if performance is negatively impacted by Jumbo Frames without the need to modify the switch configuration which will support up to 9216 MTU

8. Increasing the MTU will decrease the number of packets required for the same bandwidth to help prevent IP storage network being constrained at an interrupt level

Implications

1. A dedicated network needs to be maintained for IP storage which reduces consolidation

2. Storage network needs to be configured for Jumbo Frames

3. The Storage controller needs to be configured for Jumbo Frames

4. The VMKernel/s need to be configured for Jumbo Frames

5. Where the networks becomes constrained at either an interrupt or throughput level, any benefit of Jumbo Frames may be reduced or lost and IP storage performance may degrade

Alternatives

1. Do not use Jumbo Frames

2. Use Jumbo Frames in a converged network (ie: No dedicated IP Storage switches)

Relates Articles

1. Example Architectural Decision – Jumbo Frames for IP Storage (Use Jumbo Frames)

 Contributors

Thanks to Rob McNab (IBM) and Peter McCrystal (IBM) for their input into this example architectural decision.