Example Architectural Decision – vMotion configuration for Cisco UCS

Problem Statement

In an environment where a customer has pre-purchased Cisco UCS to replace end-of-life equipment, what is the most suitable way to configure vMotion to make the most efficient use of the infrastructure?

Assumptions

1. vSphere 5.1 or greater
2. Two (2) x 10Gb network interfaces per UCS blade (Cisco Palo adapters)
3. A core and edge network topology is in place using Cisco Nexus switches
4. Cisco Fabric Interconnects are in use

Motivation

1. Optimize performance for vMotion without impacting other traffic
2. Reduce complexity where possible
3. Minimize network traffic across the Nexus core

Architectural Decision

Two (2) vNICs will be presented from the Cisco Fabric Interconnects to each blade (ESXi host); these will appear to the host as vmNIC0 and vmNIC1.

vNIC0 will be connected to “Fabric A” and vNIC1 will be connected to “Fabric B”.

The vMotion VMkernel (VMK) port for each ESXi host will be configured on a vSwitch (or Distributed vSwitch) with two (2) 10Gb network adapters, with vmNIC0 set as “Active” and vmNIC1 as “Standby”.

Fabric failover will not be enabled in the fabric interconnect.

vmNIC Failback at the vSphere layer will be disabled.
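
The teaming settings above can also be scripted. The following is a minimal sketch, assuming a standard vSwitch, a port group named “vMotion”, and the pyVmomi SDK; the vCenter address, credentials, host name, and port group name are placeholders only (on a Distributed vSwitch the equivalent settings would be applied via the dvPortGroup teaming policy instead).

```python
# Sketch only: set vmnic0 Active / vmnic1 Standby and disable Failback on a
# standard vSwitch port group named "vMotion" using pyVmomi.
# All names and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

host = si.content.searchIndex.FindByDnsName(None, "esxi01.example.com", vmSearch=False)
netsys = host.configManager.networkSystem

# Fetch the existing port group spec so other settings are preserved
pg_spec = next(p.spec for p in netsys.networkInfo.portgroup if p.spec.name == "vMotion")

teaming = vim.host.NetworkPolicy.NicTeamingPolicy()
teaming.policy = "failover_explicit"          # explicit Active/Standby order
teaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(
    activeNic=["vmnic0"], standbyNic=["vmnic1"])
teaming.rollingOrder = True                   # rollingOrder=True corresponds to "Failback: No" in the UI

pg_spec.policy = pg_spec.policy or vim.host.NetworkPolicy()
pg_spec.policy.nicTeaming = teaming
netsys.UpdatePortGroup(pgName="vMotion", portgrp=pg_spec)

Disconnect(si)
```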

Justification

1. Under normal circumstances, vMotion traffic will only traverse Fabric A and will not impact Fabric B or the core network, thus minimizing north-south traffic.
2. In the event that Fabric A suffers a failure of any kind, the vMotion VMK will fail over to the standby vNIC (vmNIC1), which results in an equally optimal configuration, as traffic will then only traverse Fabric B and not the core network, again minimizing north-south traffic.
3. Failover is handled by vSphere at the software layer, which removes the requirement for Fabric Failover to be enabled. This gives a vSphere administrator visibility of the network status without going into UCS Manager (see the sketch after this list).
4. Operational complexity is reduced.
5. The solution is self-healing at the UCS layer, and this is transparent to the vSphere environment.
6. At the vSphere layer, failback is not required, as using Fabric B for all vMotion VMK traffic is still optimal. In the event Fabric B fails, the environment can fail back automatically to Fabric A.
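
To illustrate point 3, the vmNIC link state can be read directly from the vSphere side; a tiny sketch, assuming pyVmomi and reusing the `host` object obtained in the previous example:

```python
# Sketch: check physical NIC link state from vSphere instead of UCS Manager.
# Reuses the "host" object from the earlier pyVmomi example.
for pnic in host.config.network.pnic:
    state = "up ({} Mb/s)".format(pnic.linkSpeed.speedMb) if pnic.linkSpeed else "down"
    print("{}: {}".format(pnic.device, state))
```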

Implications

1. Initial setup has a small amount of additional complexity; however, this is a one-time task (set and forget).
2. vNIC0 and vNIC1 need to be manually assigned to Fabric A and Fabric B respectively at the Cisco Fabric Interconnects via UCS Manager; however, this is also a one-time task (set and forget) and can be scripted, as sketched below.
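
The sketch below shows roughly how that one-time fabric assignment could be scripted, assuming Cisco's ucsmsdk Python SDK and an existing service profile; the UCSM address, credentials, service profile DN, and vNIC names are illustrative only.

```python
# Sketch only: pin vNIC0 to Fabric A and vNIC1 to Fabric B in a UCS service
# profile using Cisco's ucsmsdk. DNs, names and credentials are placeholders.
from ucsmsdk.ucshandle import UcsHandle
from ucsmsdk.mometa.vnic.VnicEther import VnicEther

handle = UcsHandle("ucsm.example.com", "admin", "password")
handle.login()

sp_dn = "org-root/ls-ESXi-Blade-01"   # example service profile DN

# switch_id "A" / "B" pins each vNIC to one fabric; "A-B" (fabric failover) is deliberately not used
vnic0 = VnicEther(parent_mo_or_dn=sp_dn, name="vNIC0", switch_id="A")
vnic1 = VnicEther(parent_mo_or_dn=sp_dn, name="vNIC1", switch_id="B")

handle.add_mo(vnic0, modify_present=True)
handle.add_mo(vnic1, modify_present=True)
handle.commit()
handle.logout()
```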

Alternatives

1. Use “Route based on physical NIC load” and have the vMotion VMK managed automatically by LBT (see the sketch after this list)
2. Use vPC and Route based on IP Hash for all vSwitch traffic (including vMotion VMK)
3. Use the Fabric Failover option at the UCS layer using a single vNIC presented to ESXi for all traffic
4. Use the Fabric Failover option at the UCS layer using two vNICs presented to ESXi for all traffic – Each vNIC would be pinned to a single Fabric (A or B)
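
For completeness, alternative 1 could look roughly like the following on a Distributed vSwitch; a hedged sketch using pyVmomi, where `dvpg` is assumed to be an already-retrieved dvPortGroup managed object (the port group lookup is not shown).

```python
# Sketch: Alternative 1 - "Route based on physical NIC load" (LBT) on a
# distributed port group. "dvpg" is assumed to be a dvPortGroup managed object.
from pyVmomi import vim

def enable_lbt(dvpg):
    cfg = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
    cfg.configVersion = dvpg.config.configVersion            # required for reconfigure
    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(inherited=False)
    teaming.policy = vim.StringPolicy(inherited=False, value="loadbalance_loadbased")
    port_cfg = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy()
    port_cfg.uplinkTeamingPolicy = teaming
    cfg.defaultPortConfig = port_cfg
    return dvpg.ReconfigureDVPortgroup_Task(spec=cfg)         # returns a vCenter task
```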

Thank you to Prasenjit Sarkar (@stretchcloud) for co-authoring this Example Architectural Decision.

Related Articles

1. Trade-off factor – Cisco UCS Fabric Failover OR OS based NIC teaming using dual fabric (Stretch-cloud – By Prasenjit Sarkar @stretchcloud)
2. Why You Should Pin vMotion Port Groups In Converged Environments (By Chris Wahl @ChrisWahl)

10 thoughts on “Example Architectural Decision – vMotion configuration for Cisco UCS”

    • I am not sure how far this holds true at this time. However, with IP Hash, or indeed any load-balancing algorithm (MAC pinning) on the N1KV, it worked perfectly when I used it at a customer site.

      I believe this KB article needs revision against the current UCSM firmware and the latest VMware build.

  1. A few more points need to be considered for vMotion and normal production traffic.

    1. The Fabric Interconnect and IO Module play a big role in the design. If you are using an older generation such as the FI 61xx and IOM 21xx series, which do not support host pinning, vMotion traffic can be impacted in the event of a failure. There is no need to worry with the next-generation FI 62xx and IOM 22xx if host pinning is configured with a port channel.
    2. If you are using the Palo card in the environment, carve out separate virtual NICs in the service profile for vMotion traffic; this way it will not impact other production and management traffic.
    3. With separate virtual interfaces, network policies can be configured on the virtual switch/distributed switch for better performance.
    4. Jumbo frames for the virtual NICs can be enabled in the service profile through UCS Manager for better network performance (a VMkernel-side sketch follows below).
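
    On the vSphere side, the matching MTU on the vMotion VMkernel port could be set as sketched below, assuming pyVmomi, the `host` object from the earlier example, and “vmk1” as the vMotion VMkernel device; the UCS vNIC, the QoS system class, and the Nexus uplinks must be set to a matching MTU as well.

    ```python
    # Sketch: set MTU 9000 on the vMotion VMkernel port ("vmk1" is an example device).
    # Reuses the "host" object from the earlier pyVmomi example.
    netsys = host.configManager.networkSystem
    vmk_spec = next(v.spec for v in netsys.networkInfo.vnic if v.device == "vmk1")
    vmk_spec.mtu = 9000
    netsys.UpdateVirtualNic(device="vmk1", nic=vmk_spec)
    ```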

    • I think this is a great option; this decision is about keeping the environment simple and ensuring a vSphere admin with minimal or no UCS experience can more easily manage the environment. Stay tuned for further example decisions; you can probably guess what they will be. 😉

  2. Good article. I agree that 2 vNICs (1 on each fabric) is probably the simplest approach.

    Personally, I like to create 2 vNICs for each type of VMware traffic (mgmt, vMotion, FT, IP storage, VMs) and pin one to each fabric.

    This allows you to use QoS policies in UCS to prioritize the traffic. You can also do multi-NIC vMotion with a vNIC on each fabric (sketched below).
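
    A rough sketch of what that multi-NIC vMotion teaming could look like with pyVmomi: two vMotion port groups with mirrored Active/Standby uplink order, one pinned to each fabric. The port group and uplink names are illustrative, and `netsys`/`vim` are assumed to exist as in the earlier example.

    ```python
    # Sketch: multi-NIC vMotion with one vNIC per fabric - two port groups with
    # mirrored Active/Standby order. Names are placeholders; reuses "netsys" and "vim".
    def explicit_teaming(active, standby):
        t = vim.host.NetworkPolicy.NicTeamingPolicy()
        t.policy = "failover_explicit"
        t.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(activeNic=[active],
                                                           standbyNic=[standby])
        return t

    for pg_name, active, standby in [("vMotion-A", "vmnic0", "vmnic1"),
                                     ("vMotion-B", "vmnic1", "vmnic0")]:
        pg_spec = next(p.spec for p in netsys.networkInfo.portgroup if p.spec.name == pg_name)
        pg_spec.policy = pg_spec.policy or vim.host.NetworkPolicy()
        pg_spec.policy.nicTeaming = explicit_teaming(active, standby)
        netsys.UpdatePortGroup(pgName=pg_name, portgrp=pg_spec)
    ```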

    • Drew, I’ve used this approach at some sites as well. While we are talking about impact in this scenario, we must consider our MAC address pools.

      Impact: while two (2) vNICs per traffic type give familiar isolation, QoS capabilities, and configuration granularity, you have to plan for the MAC addresses in your MAC pools being consumed rapidly at scale. Plan accordingly.

  3. Beware of any attempt to enable fabric failover on the interconnects when using VMware: this is not a recommended configuration, and will cause issues. http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9902/white_paper_c11-558242.html
    The reason is that the FIC will respond to the chassis hosts before the northbound links have actually failed over; therefore VMware will attempt to send traffic down a “dead” path, and will initiate the appropriate failover action (in this case, an unwanted action).

  4. I just wanted to say thank you to everyone for the feedback and comments. I am pleased to say a lot of the feedback has added to the strength of this example decision and has clarified that numerous known issues with Fabric Interconnects / fabric failover have been avoided.

    I wish every example decision had this much great input.

    I will update the Justification section of this decision with some of the feedback to ensure the example is as well justified as possible.

  5. Just some comments on some of the statements here:

    * First of all, Palo is our older card; the 1240, 1240 + extender, or 1240 + 1280 would be the preferred choice. With correct engineering this allows multi-NIC vMotion within one fabric: depending on the configuration, these adapters have a port channel between the mezzanine and the IOM (up to 4 links), and hence multiple vMotion flows will use the distribution algorithm.
    * Try to avoid manual pinning as discussed here and use automatic pinning; only use manual pinning when you have a very specific use case or traffic that must be protected.
    * Try to use a low QoS policy on vMotion; you want your production and NFS traffic to have priority. When using MTU 9000, make sure the northbound stack is completely tuned for it.
    * It’s good to embed A and B in the MAC pools to clarify which adapter maps to which fabric in VMware.
    * LBT cannot be used for vMotion as it will have unpredictable results (one ESXi host on A, another on B).
    * vPC cannot be used from ESXi towards the FIs, as the two FIs do _not_ form a vPC switch pair; the interlink is only for heartbeating and cluster configuration.
    * Fabric failover should not be used, as per best practices: it is better to give ESXi visibility of a fabric failure.
    * IP hash cannot be used: it will work temporarily but start flipping under high throughput, causing MAC flapping northbound.
    * Fabric failover will not make traffic disappear. Fabric failover is triggered by a northbound uplink failure (for the mapped VLAN) or when specific vital links go down (think of the NIF ports on the IOM); the LIF will always stay online and the underlying VIFs (linked to UIF 0 and 1) will react appropriately. I cannot imagine ESXi would do this faster when using 2 vNICs, as we still propagate a similar failover on link-down, so there is no difference. So: do not use fabric failover because we want _visibility_ of the failure in ESXi, not because it will technically cause a problem.
    * Do use failback: when failback is disabled and, for example, one IOM reboots in a multi-chassis cluster, you will end up with some ESXi hosts on B and the majority still on A. This is not good, so make sure to activate failback to ensure traffic does not go northbound outside the cluster.