Competition Example Architectural Decision Entry 6 – Improve Performance for BCAs on Cisco UCS

Name: Anuj Modi
Title: Unified Computing & Virtualization Consultant @ Cisco
Twitter: @vConsultant
Blog: http://anujmodi.wordpress.com

Problem Statement

Many companies are migrating application workloads to virtual infrastructure to take advantage of virtual computing. Despite the benefits of virtualizing the environment, applications can still face I/O performance issues, and end users are unhappy with response times compared with those on physical servers. What are the ways to improve performance for business critical applications in such environments?

Assumptions

1.      Cisco Unified Computing System
2.      VMware vSphere 5.x
3.      Cisco Virtual Interface Card M81/1240/1280
4.      Critical applications/databases

Constraints

1.      No impact on the applications' production data
2.      Retain the benefits of virtual infrastructure features
3.      High Availability of Applications

Motivation

1.      Better performance and response time for business critical applications
2.      Reduce CPU cycles on ESXi servers by offloading the I/O load to the hardware level
3.      Improved I/O throughput for applications

Architectural Decision

Use Cisco VN-Link in Hardware with VMDirectPath to get better I/O performance for network traffic. All traffic will pass directly through the physical interface card, bypassing the vmkernel. This improves I/O performance because network traffic no longer has to pass through the hypervisor kernel layer to reach the physical interface card.

VN-Link in Hardware with VMDirectPath

Alternatives

Cisco provides three different options for virtual machine traffic on the hypervisor. These options are listed below:

1.      VN-Link in Software
2.      VN-Link in Hardware
3.      VN-Link in Hardware with VMDirectPath

The other two options can also be used to improve performance for virtual machine traffic.

In option 1, the Nexus 1000V switch is used for network traffic forwarding. Each virtual machine NIC connects directly to the Nexus 1000V switch, and the Nexus 1000V uplinks connect to the Cisco virtual interface card. This option provides the advanced network features of the Nexus 1000V, such as ERSPAN and NetFlow, and standardizes network switch management.

In option 2, UCSM is used as the distributed switch and is integrated with vCenter Server to control virtual machine traffic. Each virtual machine NIC maps to a distinct virtual interface (VIF) on the UCS Fabric Interconnect and passes traffic directly through it. This gives better I/O performance for network traffic and directs the I/O load to the physical interface card.

Justification

Option 3 is selected for this solution to provide higher I/O performance for network traffic. Hypervisor bypass is the ability of a virtual machine to access PCIe adapter hardware directly in order to reduce the overhead on the host CPU. Cisco UCS provides this feature through the VN-Link in Hardware with VMDirectPath option, which helps reduce host CPU and memory overhead for I/O virtualization. The virtual machine talks directly to the Cisco virtual interface card and bypasses the vmkernel, giving higher performance for network traffic. The current virtual interface card can scale up to 256 virtual interfaces, which means most of the virtual machines on a single host can get their own PCIe adapter.
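
As a rough sizing sketch of the scaling point above, the arithmetic can be written out in Python. Only the figure of 256 comes from the text. The number of interfaces reserved for the host and the number of vNICs per virtual machine are hypothetical inputs, and real limits also depend on the adapter model and the UCS configuration.

```python
# Sizing sketch: how many VMs can get dedicated virtual interfaces on one adapter.
# MAX_VIFS is the figure quoted in the text; every other number is an assumed input.

MAX_VIFS = 256  # upper bound quoted above for the current virtual interface card


def max_passthrough_vms(reserved_host_vifs: int, vnics_per_vm: int,
                        max_vifs: int = MAX_VIFS) -> int:
    """Return how many VMs fit if each VM needs `vnics_per_vm` dedicated interfaces."""
    if vnics_per_vm <= 0:
        raise ValueError("vnics_per_vm must be positive")
    if not 0 <= reserved_host_vifs <= max_vifs:
        raise ValueError("reserved_host_vifs must be between 0 and max_vifs")
    # Integer division: a VM only counts if all of its vNICs get an interface.
    return (max_vifs - reserved_host_vifs) // vnics_per_vm


if __name__ == "__main__":
    # Hypothetical host: 6 interfaces kept for ESXi itself, 2 vNICs per VM.
    print(max_passthrough_vms(reserved_host_vifs=6, vnics_per_vm=2))  # 125
```

With these assumed inputs, 250 interfaces remain after the host reservation, which is enough for 125 virtual machines with two vNICs each.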

Implications

1.      The disadvantage is that vMotion support on the VMware hypervisor is currently limited.


Example Architectural Decision – vMotion configuration for Cisco UCS

Problem Statement

In an environment where a customer has pre-purchased Cisco UCS to replace end of life equipment, what is the most suitable way to configure vMotion to make the most efficient use of the infrastructure?

Assumptions

1. vSphere 5.1 or greater
2. Two x 10Gb network interfaces per UCS Blade (Cisco Palo Adapters)
3. Core & Edge Network topology is in place using Cisco Nexus
4. Cisco Fabric Interconnects are in use

Motivation

1. Optimize performance for vMotion without impacting other traffic
2. Reduce complexity where possible
3. Minimize network traffic across the Nexus core

Architectural Decision

Two (2) vNICs will be presented from the Cisco fabric interconnect to each blade (ESXi Host) which will appear to the ESXi host as vmNIC0 and vmNIC1.

vNIC0 will be connected to “Fabric A” and vNIC1 will be connected to “Fabric B”.

The vMotion VMkernel (VMK) interface for each ESXi host will be configured on a vSwitch (or Distributed vSwitch) with two (2) 10Gb network adapters, with vmNIC0 as “Active” and vmNIC1 as “Standby”.

Fabric failover will not be enabled in the fabric interconnect.

vmNIC Failback at the vSphere layer will be disabled.
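
The decision above can be captured as data and checked for drift. The following is a minimal sketch in plain Python, not a call into any VMware or Cisco API. The class names, field names and the `design_violations` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    vmnic: str             # name seen by ESXi, e.g. "vmnic0"
    fabric: str            # "A" or "B"
    role: str              # "active" or "standby" for the vMotion port group
    fabric_failover: bool  # UCS-level fabric failover flag on the vNIC


@dataclass(frozen=True)
class VmotionTeaming:
    uplinks: tuple   # the Uplink objects backing the vMotion VMK
    failback: bool   # vSphere-level failback setting


def design_violations(cfg: VmotionTeaming) -> list:
    """Return human-readable deviations from the decision described above."""
    problems = []
    if len(cfg.uplinks) != 2:
        problems.append("expected exactly two uplinks")
    active = [u for u in cfg.uplinks if u.role == "active"]
    standby = [u for u in cfg.uplinks if u.role == "standby"]
    if len(active) != 1 or len(standby) != 1:
        problems.append("expected one active and one standby uplink")
    elif active[0].fabric == standby[0].fabric:
        problems.append("active and standby uplinks must use different fabrics")
    if any(u.fabric_failover for u in cfg.uplinks):
        problems.append("fabric failover must be disabled on every vNIC")
    if cfg.failback:
        problems.append("vSphere failback must be disabled")
    return problems


# The intended design: vmnic0 active on Fabric A, vmnic1 standby on Fabric B.
INTENDED = VmotionTeaming(
    uplinks=(
        Uplink("vmnic0", "A", "active", fabric_failover=False),
        Uplink("vmnic1", "B", "standby", fabric_failover=False),
    ),
    failback=False,
)

if __name__ == "__main__":
    print(design_violations(INTENDED))  # []
```

A check like this could be fed from whatever inventory export is available. How the real values are collected is outside the scope of this sketch.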

Justification

1. Under normal circumstances, vMotion traffic will only traverse Fabric A and will not impact Fabric B or the core network, which minimizes north-south traffic.
2. In the event that Fabric A suffers a failure of any kind, the VMK for vMotion will fail over to the standby vNIC (vmNIC1). This results in the same optimal configuration, as traffic will only traverse Fabric B and not the core network, which again minimizes north-south traffic.
3. The failover is being handled by vSphere at the software layer which removes the requirement for fabric failover to be enabled. This allows a vSphere administrator to have visibility of the status of the networking without going into the UCS Manager.
4. The operational complexity is reduced
5. The solution is self-healing at the UCS layer, and this is transparent to the vSphere environment
6. At the vSphere layer, failback is not required, as using Fabric B for all VMK vMotion traffic is still optimal. In the event Fabric B fails, the environment can fail back automatically to Fabric A.
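
The failover behaviour described in points 2 and 6 can be illustrated with a toy model. This is a simplified sketch of active/standby teaming with failback disabled, not VMware's actual implementation. The class, the `FABRIC` mapping and the event list are assumptions made for illustration.

```python
# Assumed mapping of ESXi uplinks to UCS fabrics, as in the decision above.
FABRIC = {"vmnic0": "A", "vmnic1": "B"}


class ActiveStandbyTeam:
    """Toy model of an active/standby uplink team carrying the vMotion VMK."""

    def __init__(self, active: str, standby: str, failback: bool = False):
        self.preferred = active
        self.uplinks = {active: True, standby: True}  # uplink name -> link is up
        self.failback = failback
        self.current = active

    def set_link(self, uplink: str, up: bool) -> None:
        """Record a link state change and re-evaluate which uplink carries traffic."""
        self.uplinks[uplink] = up
        if self.current is None or not self.uplinks[self.current]:
            # Current uplink is down: move to any healthy uplink, if one exists.
            healthy = [name for name, ok in self.uplinks.items() if ok]
            self.current = healthy[0] if healthy else None
        elif self.failback and self.uplinks[self.preferred]:
            # Failback enabled: return to the preferred uplink once it recovers.
            self.current = self.preferred


def fabrics_used(events, failback: bool = False) -> list:
    """Replay (uplink, is_up) events and return the fabric in use after each one."""
    team = ActiveStandbyTeam("vmnic0", "vmnic1", failback=failback)
    trace = []
    for uplink, up in events:
        team.set_link(uplink, up)
        trace.append(FABRIC.get(team.current))
    return trace


# Fabric A fails, Fabric A recovers, then Fabric B fails.
EVENTS = [("vmnic0", False), ("vmnic0", True), ("vmnic1", False)]

if __name__ == "__main__":
    print(fabrics_used(EVENTS, failback=False))  # ['B', 'B', 'A']
```

With failback disabled, vMotion stays on Fabric B after Fabric A recovers and only returns to Fabric A if Fabric B later fails. At every step, traffic is confined to a single fabric.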

Implications

1. Initial setup has a small amount of additional complexity; however, this is a one-time task (Set & Forget)
2. vNIC0 and vNIC1 need to be manually configured to Fabric A and Fabric B at the Cisco Fabric Interconnect via UCS Manager; however, this is also a one-time task (Set & Forget)

Alternatives

1. Use Route Based on Physical NIC Load and have VMK for vMotion managed automatically by LBT
2. Use vPC and Route based on IP Hash for all vSwitch traffic (including vMotion VMK)
3. Use the Fabric Failover option at the UCS layer using a single vNIC presented to ESXi for all traffic
4. Use the Fabric Failover option at the UCS layer using two vNICs presented to ESXi for all traffic – Each vNIC would be pinned to a single Fabric (A or B)

Thank you to Prasenjit Sarkar (@stretchcloud) for Co-authoring this Example Architectural Decision.

Related Articles

1. Trade-off factor – Cisco UCS Fabric Failover OR OS based NIC teaming using dual fabric (Stretch-cloud – By Prasenjit Sarkar @stretchcloud)
2. Why You Should Pin vMotion Port Groups In Converged Environments (By Chris Wahl @ChrisWahl)