How to successfully Virtualize MS Exchange – Part 6 – vMotion

Virtualizing an Exchange server opens up the ability to use vMotion and migrate the VM between ESXi hosts without downtime. This is a handy feature for hardware maintenance, upgrades or replacement with no downtime and, importantly, no loss of resiliency to the application.

In this article, I am talking only about vMotion, not Storage vMotion.

Let's first discuss vMotion's requirements and configuration maximums.

vMotion requirements:

1. A VMKernel enabled for vMotion
2. A minimum of 1 x 1Gb NIC
3. Shared storage between source and destination ESXi hosts (recommended).

vMotion Configuration Maximums:

Concurrent vMotion operations per host (1Gb/s network):  4
Concurrent vMotion operations per host (10Gb/s network):  8
Concurrent vMotion operations per datastore: 128

As discussed in Part 4, I recommend using DRS “VM to Host” should rules to ensure DRS does not vMotion Exchange VMs unnecessarily while keeping the cluster load balanced.

However, it is still important to design your environment to ensure Exchange VMs can vMotion as fast as possible and with the lowest impact during the syncing of the memory and during the final cutover.

So that brings us to our first main topic, Multi-NIC vMotion.

Multi-NIC vMotion:

Multi-NIC vMotion is a feature introduced in vSphere 5.0 which allows vMotion traffic to be sent concurrently down multiple physical NICs to increase the available bandwidth and speed up vMotion activity. This effectively lowers the impact of vMotion and enables larger VMs with very high memory change rates to be vMotioned.

For those who are not familiar with the feature, it is described in depth in VMware KB : Multiple-NIC vMotion in vSphere 5 (2007467) as is the process to set it up on Virtual Standard Switches (VSS) and Virtual Distributed Switches (VDS).

From an Exchange perspective, the larger the MBX/MSR VM’s vRAM, and more importantly the more “active” the memory, the longer the vMotion can take. If vMotion detects the memory change rate is higher than the available bandwidth, the hypervisor will insert micro “stuns” to the VM’s CPU over time until the change rate is low enough to complete the vMotion. This generally has minimal impact on VMs, including Exchange, but it is best avoided where possible.

So using Multi-NIC vMotion helps because more bandwidth can be utilized, which means vMotion activity either completes faster or can support more active memory with low impact.
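
To illustrate the point, below is a rough back-of-the-envelope sketch (plain Python, with made-up numbers, not a VMware formula) of the iterative pre-copy behaviour: given a vRAM size, a memory change (dirty) rate and the available vMotion bandwidth, it estimates whether the migration converges without stunning the vCPUs, and roughly how long the memory copy takes.

```python
def precopy_converges(vram_gb, dirty_rate_gbps, bandwidth_gbps,
                      max_passes=10, cutover_threshold_gb=0.5):
    """Simplified model of iterative vMotion pre-copy (illustrative only).

    Each pass copies the currently dirty memory; while it is copied, the guest
    dirties more memory at dirty_rate_gbps. If the dirty set stops shrinking,
    the hypervisor would have to slow (stun) the vCPUs to converge.
    """
    remaining_gb = float(vram_gb)
    elapsed_s = 0.0
    for _ in range(max_passes):
        if remaining_gb <= cutover_threshold_gb:
            return True, elapsed_s                        # small enough for final cutover
        copy_time_s = (remaining_gb * 8) / bandwidth_gbps  # GB -> Gbit
        elapsed_s += copy_time_s
        newly_dirty_gb = (dirty_rate_gbps / 8) * copy_time_s
        if newly_dirty_gb >= remaining_gb:
            return False, elapsed_s                       # not shrinking: vCPU stuns needed
        remaining_gb = newly_dirty_gb
    return remaining_gb <= cutover_threshold_gb, elapsed_s

# 96GB Exchange VM with an (artificially high) 8Gb/s memory change rate:
# a single 10Gb/s vMotion NIC vs Multi-NIC vMotion across 2 x 10Gb/s
for bandwidth in (10, 20):
    ok, secs = precopy_converges(96, dirty_rate_gbps=8, bandwidth_gbps=bandwidth)
    print(f"{bandwidth}Gb/s: converges within 10 passes = {ok}, ~{secs:.0f}s of copy time")
```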

vMotion “Slot size”:

A vMotion “slot size” can be thought of as the compute and RAM capacity required to perform a vMotion of a VM between two hosts. For a VM with 96GB of vRAM and a matching memory reservation, the destination host requires 96GB of physical RAM to be available to even qualify as a vMotion target.

The larger the VM, the more of a factor this can become in the design of a vSphere cluster.
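
To make the concept concrete, here is a trivial sketch (plain Python, hypothetical host names and numbers) that checks which hosts in a cluster currently have a large enough free memory “slot” to accept a fully reserved Exchange VM:

```python
# Hypothetical cluster state: total and currently used physical RAM per host (GB)
hosts = {
    "esxi01": {"total_gb": 256, "used_gb": 200},
    "esxi02": {"total_gb": 256, "used_gb": 140},
    "esxi03": {"total_gb": 256, "used_gb": 180},
    "esxi04": {"total_gb": 256, "used_gb": 230},
}

vm_reservation_gb = 96  # fully reserved vRAM of the Exchange MBX/MSR VM

def vmotion_targets(hosts, required_gb):
    """Return the hosts with enough free physical RAM to accept the VM."""
    return [name for name, h in hosts.items()
            if h["total_gb"] - h["used_gb"] >= required_gb]

targets = vmotion_targets(hosts, vm_reservation_gb)
print("Hosts with a free 96GB 'slot':", ", ".join(targets) or "none")
```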

For example, the diagram below shows a four ESXi host HA cluster with several large VMs, including several assigned 96GB of vRAM, as is common with Exchange MBX/MSR VMs.

In this scenario the Exchange VMs are represented by VMs #13, #15 and #16, each with 96GB of RAM.

[Diagram: cluster vMotion slot size example – a four ESXi host cluster with insufficient free memory on any host to vMotion a 96GB Exchange VM]

The issue here is that no host has sufficient free memory to accommodate a vMotion of any of the Exchange VMs. This leads to complexity during maintenance periods as well as during an HA event.

In fact in the above example, if an ESXi host crashed, HA would not be able to restart any of the Exchange VMs.

This goes back to the point I made in Part 5 about always ensuring an N+1 (minimum) configuration for the cluster, which in most cases avoids this issue, along with the recommendation in Part 4 to use VM to Host DRS “should” rules so that only one Exchange VM runs per host. A simple check along these lines is sketched below.
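
The sketch (plain Python, hypothetical hosts and VM sizes; memory-only and first-fit, so a deliberate simplification of how HA actually restarts VMs) tests whether each host's VMs could be restarted on the surviving hosts if that host failed:

```python
# Hypothetical example: if any one host fails, can its VMs (including the 96GB
# Exchange VMs) be restarted on the survivors?
hosts = {
    "esxi01": {"total_gb": 256, "vms": {"exch-mbx1": 96, "app1": 32}},
    "esxi02": {"total_gb": 256, "vms": {"exch-mbx2": 96, "app2": 48}},
    "esxi03": {"total_gb": 256, "vms": {"exch-mbx3": 96, "app3": 16}},
    "esxi04": {"total_gb": 256, "vms": {}},  # N+1 spare capacity
}

def free_gb(host):
    return host["total_gb"] - sum(host["vms"].values())

def survives_failure_of(failed, hosts):
    """First-fit placement of the failed host's VMs onto the surviving hosts."""
    free = {name: free_gb(h) for name, h in hosts.items() if name != failed}
    for vm, ram_gb in sorted(hosts[failed]["vms"].items(), key=lambda kv: -kv[1]):
        target = next((n for n, f in free.items() if f >= ram_gb), None)
        if target is None:
            return False
        free[target] -= ram_gb
    return True

for name in hosts:
    print(f"Failure of {name} tolerated: {survives_failure_of(name, hosts)}")
```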

Enhanced vMotion Compatibility:

Enhanced vMotion Compatibility (EVC) is used to ensure vMotion compatibility across all hosts within a cluster. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. The end result is that configuring EVC prevents vMotion from failing because of incompatible CPUs.

The knowledge base article Enhanced vMotion Compatibility (EVC) processor support (1003212) from VMware explains the EVC modes and compatible CPU models. Note: EVC does not support mixing Intel and AMD CPUs.

Contrary to popular belief, EVC does not “slow down” the CPU; it only masks processor features that affect vMotion compatibility. The full speed of the processor is still utilized. The only potential performance degradation is where an application is specifically written to take advantage of masked CPU features, in which case that workload may see some performance loss. This is not the case with MS Exchange, so I recommend EVC always be enabled to ensure the cluster is future proofed and Exchange VMs can be migrated to newer hardware seamlessly via vMotion.
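
For reference, the EVC mode currently applied to each cluster can be checked programmatically. The following is a minimal read-only sketch assuming the pyVmomi SDK is installed; the vCenter address and credentials are placeholders.

```python
# Minimal read-only sketch assuming pyVmomi is installed (pip install pyvmomi).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skips certificate validation
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        # currentEVCModeKey is unset when EVC is disabled on the cluster
        print(cluster.name, "->", cluster.summary.currentEVCModeKey or "EVC disabled")
finally:
    Disconnect(si)
```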

For more details on why you should enable EVC, review the Example Architectural Decision – Enhanced vMotion Compatibility.

Jumbo Frames:

Using Jumbo frames helps improve vMotion throughput by reducing the number of packets and therefore interrupts required to migrate the same Exchange VM between two hosts.
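
A quick back-of-the-envelope calculation (plain Python, simplified header maths, illustrative only) shows the scale of the packet count reduction when migrating a 96GB Exchange VM:

```python
# Rough packet counts for copying 96GB of vRAM at standard vs jumbo MTU.
# Header sizes are simplified (40 bytes IP+TCP) and vMotion protocol overhead
# is ignored; the point is the order-of-magnitude reduction, not exact numbers.
VRAM_BYTES = 96 * 1024**3

def approx_packets(mtu_bytes, header_bytes=40):
    payload = mtu_bytes - header_bytes
    return VRAM_BYTES // payload + 1

standard = approx_packets(1500)
jumbo = approx_packets(9000)
print(f"MTU 1500: ~{standard:,} packets")
print(f"MTU 9000: ~{jumbo:,} packets ({standard / jumbo:.1f}x fewer)")
```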

Michael Webster (@vcdxnz001, VCDX #66) wrote a great article showing that the benefit of Jumbo Frames for vMotion can be up to 19% in Multi-NIC vMotion environments: Jumbo Frames on vSphere 5

So we know there is a significant performance benefit, but what about the downsides of Jumbo Frames?

The following two Example Architectural Decisions cover the pros and cons of Jumbo Frames, along with justification for using and not using Jumbo Frames for IP Storage. The same concepts are true for vMotion, so I recommend you review both decisions and choose which one best suits your requirements/constraints.

Note: Neither decision is “right” or “wrong”, but if your environment is configured correctly for Jumbo Frames, you will get better vMotion performance.

  1. Jumbo Frames for IP Storage (Do not use Jumbo Frames)
  2. Jumbo Frames for IP Storage (Use Jumbo Frames)

vMotion Security:

vMotion traffic is unencrypted; as a result, anyone with access to the network can sniff the traffic. To avoid this, vMotion traffic should be placed on a dedicated non-routable VLAN.

For more information see: Example Architectural Decision : Securing vMotion & Fault Tolerant Traffic in IaaS/Cloud Environments.

Note: This post is relevant to all environments, not just IaaS/Cloud/Multi-tenant.

Performing a vMotion or entering Maintenance Mode:

As per Part 4, I recommend using VM to Host DRS “should” rules to ensure only one Exchange VM runs per host. This also ensures only one Exchange VM is potentially impacted by vMotion when a host enters maintenance mode.

However, simply entering maintenance mode can kick off up to 8 concurrent vMotion activities when using 10Gb networking for vMotion. In this situation, the length of the vMotion for the Exchange VM will increase and potentially impact performance for a longer period.

As such, I recommend manually vMotioning the Exchange VM onto another host that is not running any other Exchange VMs (and ideally no other large vCPU/vRAM VMs) and waiting for this to complete before placing the host into maintenance mode.

The benefit of this will depend on the size of your Exchange VMs and the performance of your environment but this is an easy way to minimize the chance of performance issues.
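
For those who script their maintenance workflows, the following is a minimal sketch of this approach assuming the pyVmomi SDK; the vCenter address, credentials, VM name and target host are placeholders, and error handling is omitted for brevity.

```python
# Sketch of the workflow above: vMotion the Exchange VM first, wait for it to
# complete, then place the source host into maintenance mode. DRS is assumed to
# evacuate the remaining VMs once maintenance mode is requested.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

def find_by_name(content, vimtype, name):
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    vm = find_by_name(content, vim.VirtualMachine, "exch-mbx1")
    source_host = vm.runtime.host
    target_host = find_by_name(content, vim.HostSystem, "esxi04.lab.local")

    # Step 1: vMotion the Exchange VM on its own, before evacuating anything else
    WaitForTask(vm.MigrateVM_Task(
        host=target_host, priority=vim.VirtualMachine.MovePriority.highPriority))

    # Step 2: only once the migration has succeeded, evacuate the source host
    WaitForTask(source_host.EnterMaintenanceMode_Task(timeout=0))
finally:
    Disconnect(si)
```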

DAG Failovers during vMotion?

DAG failovers can occur during vMotion because even a momentary network drop, or the brief quiesce of the VM during the final stage of the vMotion, can exceed the default Windows failover cluster heartbeat thresholds.

With vMotion set up correctly, and ideally when using Multi-NIC vMotion, this should not occur. However, the issue can be further mitigated by increasing the cluster heartbeat timeouts to help prevent unnecessary DAG failovers.

To increase the cluster heartbeat timeout see: Tuning Failover Cluster Network Thresholds
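
The linked Microsoft article covers the actual PowerShell commands and recommended values; the small sketch below simply illustrates the arithmetic involved (the cluster declares a node down after heartbeat delay x missed-heartbeat threshold), using illustrative values only.

```python
# Values below are illustrative; see the linked article for the defaults and
# maximums applicable to your Windows Server version.
def tolerated_outage_seconds(delay_ms, threshold):
    return delay_ms * threshold / 1000

default = tolerated_outage_seconds(delay_ms=1000, threshold=5)   # older Windows defaults
relaxed = tolerated_outage_seconds(delay_ms=1000, threshold=10)  # example tuned value
print(f"Default tolerance: ~{default:.0f}s before a node is declared down")
print(f"Relaxed tolerance: ~{relaxed:.0f}s, giving more headroom for the vMotion cutover")
```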

Recommendations for vMotion:

1. Ensure vMotion is Active on 10Gb (or higher) adapters
2. Enable Multi-NIC vMotion across 2 x 10Gb adapters in environments with Exchange VMs larger than 64GB of RAM
3. Enable Enhanced vMotion Compatibility (EVC) to the highest supported level in your cluster
4. Use Jumbo Frames for vMotion Traffic
5. Ensure sufficient cluster capacity to migrate Exchange VMs
6. Use DRS rules to separate Exchange VMs to ensure vMotion is not prevented (as per Part 4)
7. When evacuating ESXi hosts running Exchange VMs, vMotion the Exchange VM first, and once it has succeeded, put the host into maintenance mode.
8. Use Network I/O Control (NIOC) to ensure a minimum level of bandwidth to vMotion (Further details in an upcoming post)
9. Do not Route vMotion Traffic
10. Put vMotion traffic on a dedicated non-routable VLAN (i.e. no gateway)
11. Increase the Windows failover cluster heartbeat timeouts to the maximums outlined in Tuning Failover Cluster Network Thresholds.

Back to the Index of How to successfully Virtualize MS Exchange.

Example Architectural Decision – Host Isolation Response for a Nutanix Environment

Problem Statement

What is the most suitable HA host isolation response when using Nutanix?

Assumptions

1. vSphere 5.0 or greater
2. 2 x 10Gb network interfaces are shared for Nutanix storage traffic and virtual machine traffic

Motivation

1. Minimize the chance of a false positive isolation response
2. Ensure that, in the event the storage is unavailable, virtual machines are promptly shut down to enable HA to recover the VMs in a timely manner (where other hosts are unaffected by isolation) and to prevent a “split brain” scenario
3. Ensure maximum availability

Architectural Decision

Turn off the default isolation address and configure the isolation address specified below, which checks connectivity to the Nutanix (NDFS) cluster IP, served by the Nutanix Controller VMs (CVMs), on the IP Storage VLAN.

Configure the following isolation address:

das.isolationaddress1 : NDFS Cluster IP Address

Configure Host Isolation Response to: Power Off

For the Nutanix Controller VMs, override the cluster setting and configure Host Isolation Response to “Leave Powered On”.
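
For those automating cluster builds, the following is a hedged pyVmomi sketch of how the cluster-level portion of this decision could be applied (the HA advanced options and the default isolation response); the vCenter address, credentials, cluster name and NDFS cluster IP are placeholders, and the per-CVM “Leave Powered On” override would still be applied separately.

```python
# Hedged sketch assuming the pyVmomi SDK: disable the default isolation address,
# set das.isolationaddress1 to the NDFS cluster IP, and set the default host
# isolation response to "Power off".
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "NTNX-Cluster-A")

    das_config = vim.cluster.DasConfigInfo(
        option=[
            vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
            vim.option.OptionValue(key="das.isolationaddress1", value="192.168.10.50"),
        ],
        defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse="powerOff"),
    )
    spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
finally:
    Disconnect(si)
```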

Justification

1. The ESXi Management traffic, along with the virtual machine traffic and inter-Nutanix node storage traffic, runs over 2 x 10Gb connections. Using the ESXi management gateway (default isolation address) to check for isolation is not suitable, as the management network can be offline without impacting the IP storage or data networks. This situation could lead to false positive isolation responses.
2. The isolation address chosen tests IP storage connectivity over the converged 10Gb network; in the event this is unavailable, there is no point testing further connectivity as virtual machines cannot function without their storage
3. In the event the Nutanix cluster IP address cannot be reached by ICMP, the node will not be able to function properly. As such, triggering the isolation response and powering off the VMs based on this criterion is logical, as the VMs will not be able to function under these conditions.
4. In the event the NDFS cluster IP address does not respond to ICMP on the management interfaces, it is likely there has been an isolation event OR a catastrophic failure in the environment, either of the network or of the storage controllers themselves, in which case the safest option is to power off the VMs.
5. In the event the isolation response is triggered and the isolation does not impact all hosts within the cluster, the VMs can be restarted by HA onto a surviving host and resume functioning
6. Using the Nutanix Controller VM (CVM) IP address (192.168.5.2) for the isolation address is not suitable, as this address exists on each ESXi host; it is not a good candidate for isolation detection because the host will always be able to reach this address, even when the network is offline, due to the CVM being local to the host
7. The Nutanix Controller VM accesses local storage and can continue to run locally even in an isolation event. When the isolation event is over, the CVM will then regain connectivity to the other CVMs in the Nutanix cluster.
8. Shutting down the CVM would only increase the recovery time once the isolation event is over and has no added benefit.

Implications

1. In the event the host cannot reach any of the isolation addresses, virtual machines will be powered off.
2. Initial cluster setup would require the vSphere administrator to override the Cluster settings for each Controller VM. Note: This is a one time task (Set & Forget)

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address
4. Leave the CVM on the default cluster setting and “Shutdown” on isolation

Related Articles

1. VMware Host Isolation Response in a Nutanix Environment #NoSAN

2. Storage DRS and Nutanix – To use, or not to use, that is the question?

3. VMware HA and IP Storage

Example Architectural Decision – ESXi Host Hardware Sizing (Example 1)

Problem Statement

What are the most suitable hardware specifications for this environment’s ESXi hosts?

Requirements

1. Support Virtual Machines of up to 16 vCPUs and 256GB RAM
2. Achieve up to 400% CPU overcommitment
3. Achieve up to 150% RAM overcommitment
4. Ensure cluster performance is both consistent & maximized
5. Support IP based storage (NFS & iSCSI)
6. The average VM size is 1vCPU / 4GB RAM
7. Cluster must support approximately 1000 average-sized virtual machines on day 1
8. The solution should be scalable beyond 1000 VMs (Future-Proofing)
9. N+2 redundancy

Assumptions

1. vSphere 5.0 or later
2. vSphere Enterprise Plus licensing (to support Network I/O Control)
3. VMs range from Business Critical Application (BCAs) to non critical servers
4. Software licensing for applications hosted in the environment is based on per-vCPU OR per-host licensing, where DRS “Must” rules can be used to restrict VMs to licensed ESXi hosts

Constraints

1. None

Motivation

1. Create a Scalable solution
2. Ensure high performance
3. Minimize HA overhead
4. Maximize flexibility

Architectural Decision

Use Two Socket Servers w/ >= 8 cores per socket with HT support (16 physical cores / 32 logical cores), 256GB RAM, 2 x 10Gb NICs

Justification

1. Two-socket servers with 8-core (or greater) CPUs and Hyper-Threading will provide flexibility for CPU scheduling of large numbers of diversely sized (vCPU) VMs, minimizing CPU Ready (contention)

2. Using two-socket servers of the proposed specification will support the required 1000 average-sized VMs with 18 hosts, with approximately 11% of cluster resources reserved for HA to meet the required N+2 redundancy (a sanity check of this arithmetic is sketched after this list).

3. A cluster size of 18 hosts will deliver excellent cluster (DRS) efficiency / flexibility with minimal overhead for HA (Only 11%) thus ensuring cluster performance is both consistent & maximized.

4. The cluster can be expanded with up to 14 more hosts (to the 32 host cluster limit) in the event the average VM size is greater than anticipated or the customer experiences growth

5. Having 2 x 10Gb connections should comfortably support the IP storage / vMotion / FT and network data with minimal possibility of contention. In the event of contention, Network I/O Control will be configured to minimize any impact (see Example VMware vNetworking Design w/ 2 x 10GB NICs)

6. RAM is one of the most common bottlenecks in a virtual environment; with 16 physical cores and 256GB RAM, this equates to 16GB of RAM per physical core. For the average-sized VM (1 vCPU / 4GB RAM), this meets the CPU overcommitment target (up to 400%) with no RAM overcommitment, minimizing the chance of RAM becoming the bottleneck

7. In the event of a host failure, the number of Virtual machines impacted will be up to 64 (based on the assumed average size VM) which is minimal when compared to a Four Socket ESXi host which would see 128 VMs impacted by a single host outage

8. If using four-socket ESXi hosts, the cluster size would be approximately 10 hosts and 20% of cluster resources would have to be reserved for HA to meet the N+2 redundancy requirement. This cluster size is less efficient from a DRS perspective, and the higher HA overhead would equate to higher CapEx and as a result lower ROI

9. The solution supports Virtual machines of up to 16 vCPUs and 256GB RAM although this size VM would be discouraged in favour of a scale out approach (where possible)

10. The cluster aligns with a virtualization friendly “Scale out” methodology

11. Using smaller hosts (either single-socket, or fewer cores per socket) would not meet the requirement to support virtual machines of up to 16 vCPUs and 256GB RAM, would likely require multiple clusters, and would require additional 10Gb and 1Gb cabling as compared to the two-socket configuration

12. The two-socket configuration allows the cluster to be scaled (expanded) at a very granular level (if required), reducing CapEx and minimizing the waste/unused cluster capacity that comes with adding larger hosts

13. Enabling features such as Distributed Power Management (DPM) is more attractive and lower risk for larger clusters and may result in lower environmental costs (i.e. power/cooling)
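
The arithmetic behind points 2, 3 and 6 can be sanity-checked with a short calculation (plain Python, using the requirement numbers above):

```python
# Sanity check of the host count: 1000 x (1 vCPU / 4GB) VMs, 16 cores and 256GB
# RAM per host, 400% CPU overcommitment, no RAM overcommitment, N+2 redundancy.
import math

vms, vcpu_per_vm, ram_per_vm_gb = 1000, 1, 4
cores_per_host, ram_per_host_gb = 16, 256
cpu_overcommit, ram_overcommit = 4.0, 1.0
n_redundancy = 2

hosts_for_cpu = math.ceil(vms * vcpu_per_vm / (cores_per_host * cpu_overcommit))
hosts_for_ram = math.ceil(vms * ram_per_vm_gb / (ram_per_host_gb * ram_overcommit))
cluster_size = max(hosts_for_cpu, hosts_for_ram) + n_redundancy

print(f"Hosts needed for CPU: {hosts_for_cpu}, for RAM: {hosts_for_ram}")
print(f"Cluster size including N+{n_redundancy}: {cluster_size} hosts "
      f"(HA reservation ~{n_redundancy / cluster_size:.0%})")
```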

Alternatives

1. Use Four Socket Servers w/ >= 8 cores per socket, 512GB RAM, 4 x 10Gb NICs
2. Use Single Socket Servers w/ >= 8 cores, 128GB RAM, 2 x 10Gb NICs
3. Use Two Socket Servers w/ >= 8 cores, 512GB RAM, 2 x 10Gb NICs
4. Use Two Socket Servers w/ >= 8 cores, 384GB RAM, 2 x 10Gb NICs
5. Have two clusters of 9 hosts with the recommended hardware specifications

Implications

1. Additional IP addresses for ESXi Management, vMotion, FT & Out of band management will be required as compared to a solution using larger hosts

2. Additional out of band management cabling will be required as compared to a solution using larger hosts

Related Articles

1. Example Architectural Decision – Network I/O Control for ESXi Host using IP Storage (4 x 10 GB NICs)

2. Example VMware vNetworking Design w/ 2 x 10GB NICs

3. Network I/O Control Shares/Limits for ESXi Host using IP Storage

4. VMware Clusters – Scale up for Scale out?

5. Jumbo Frames for IP Storage (Do not use Jumbo Frames)

6. Jumbo Frames for IP Storage (Use Jumbo Frames)
