Example Architectural Decision – vMotion configuration for Cisco UCS

Problem Statement

In an environment where a customer has pre-purchased Cisco UCS to replace end of life equipment, what is the most suitable way to configure vMotion to make the most efficient use of the infrastructure?

Assumptions

1. vSphere 5.1 or greater
2. Two x 10GB Network interfaces per UCS Blade (Cisco Palo Adapters)
3. Core & Edge Network topology is in place using Cisco Nexus
4. Cisco Fabric Interconnects are in use

Motivation

1. Optimize performance for vMotion without impacting other traffic
2. Reduce complexity where possible
3. Minimize network traffic across the Nexus core

Architectural Decision

Two (2) vNICs will be presented from the Cisco fabric interconnect to each blade (ESXi Host) which will appear to the ESXi host as vmNIC0 and vmNIC1.

vNIC0 will be connected to “Fabric A” and vNIC1 will be connected to “Fabric B”.

The vMotion VMKernel (VMK) for each ESXi host will be configured on a vSwitch (or Distributed vSwitch) with two (2) 10GB Network adapters with vmNIC0 as “Active” and vmNIC1 as “Standby”.

Fabric failover will not be enabled in the fabric interconnect.

vmNIC Failback at the vSphere layer will be disabled.

Justification

1. Under normal circumstances vMotion traffic will only traverse Fabric A and will not impact Fabric B or the core network thus it will minimize the north-south traffic.
2. In the event that Fabric A suffers a failure of any kind, the VMK for vMotion will failover to the standby vNIC (vmNIC1) which will result in the same optimal configuration as traffic will only traverse Fabric B and not the core network thus it will minimizing the north-south traffic.
3. The failover is being handled by vSphere at the software layer which removes the requirement for fabric failover to be enabled. This allows a vSphere administrator to have visibility of the status of the networking without going into the UCS Manager.
4. The operational complexity is reduced
5. The solution is self healing at the UCS layer and this is transparent to the vSphere environment
6. At the vSphere layer, failback is not required as using Fabric B for all VMK vMotion traffic is still optimal. In the event Fabric B fails, the environment can failback automatically to Fabric A.

Implications

1. Initial setup has a small amount of additional complexity however this is a one time task (Set & Forget)
2. vNIC0 and vNIC1 need to be manually configured to Fabric A and Fabric B at the Cisco Fabric Interconnect via UCS manager however this is also a one time task (Set & Forget)

Alternatives

1. Use Route Based on Physical NIC Load and have VMK for vMotion managed automatically by LBT
2. Use vPC and Route based on IP Hash for all vSwitch traffic (including vMotion VMK)
3. Use the Fabric Failover option at the UCS layer using a single vNIC presented to ESXi for all traffic
4. Use the Fabric Failover option at the UCS layer using two vNICs presented to ESXi for all traffic – Each vNIC would be pinned to a single Fabric (A or B)

Thank you to Prasenjit Sarkar (@stretchcloud) for Co-authoring this Example Architectural Decision.

Related Articles

1. Trade-off factor – Cisco UCS Fabric Failover OR OS based NIC teaming using dual fabric (Stretch-cloud – By Prasenjit Sarkar @stretchcloud)
2 . Why You Should Pin vMotion Port Groups In Converged Environments (By Chris Wahl @ChrisWahl)