Competition Example Architectural Decision Entry 3 – Scalable network architecture for VXLAN

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5, VCAP-DCA4/5, VCAP-CIA, vExpert 2012/2013

Problem Statement

You are moving towards a scalable network architecture for your large-scale virtualized datacenter and want to configure VXLAN in your environment. You want to ensure that the teaming policy for the VXLAN transport is configured optimally for better performance and reduced operational complexity.

Assumptions

1. vSphere 5.1 or greater
2. vCloud Networking & Security 5.1 or greater
3. Core & Edge Network topology is in place

Constraints

1. Switches must support Static EtherChannel or LACP (dynamic EtherChannel)
2. Must use only the IP Hash load balancing method if using vSphere 5.1
3. Cannot use Beacon Probing as the failure detection mechanism

Motivation

1. Optimize performance for VXLAN

2. Reduce complexity where possible

3. Choose the best teaming policy for VXLAN traffic for future scalability

Architectural Decision

LACP – Passive Mode will be chosen as the teaming policy for the VXLAN Transport.

Two or more physical links will be aggregated using LACP on the upstream Edge switches.

Two Edge switches will be connected to each other.

Each ESXi host will be cross-connected to these two physical upstream switches to form a LACP group.

LACP will be configured in Passive mode on the Edge switches so that the participating ports respond to the LACP packets they receive but do not initiate LACP negotiation.
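For illustration only, the sketch below models the standard 802.1AX negotiation rule this decision relies on: a passive port only responds to LACPDUs it receives, so the LAG only comes up when at least one end actively negotiates. The mode constants and the lag_forms helper are assumptions made for this sketch, not a switch or vSphere API.

```python
# Minimal sketch of the LACP negotiation rule assumed by this decision.
# "lag_forms" is a hypothetical helper, not a vendor API.

ACTIVE = "active"    # initiates negotiation by sending LACPDUs
PASSIVE = "passive"  # only responds to LACPDUs it receives

def lag_forms(end_a: str, end_b: str) -> bool:
    """A LAG negotiates successfully only if at least one end is active."""
    return ACTIVE in (end_a, end_b)

# Edge switch in Passive mode, host side actively negotiating: the LAG forms.
print(lag_forms(PASSIVE, ACTIVE))   # True
# Both ends passive: neither side sends LACPDUs, so no LAG ever forms.
print(lag_forms(PASSIVE, PASSIVE))  # False
```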

Alternatives

1. Use LACP – Active Mode and make sure you are using the IP Hash algorithm for load balancing on your vDS if using vSphere 5.1.

2. Use LACP – Active Mode and use any of the 22 available load balancing algorithms on your vDS if using vSphere 5.5.

3. Use LACP – Active Mode with the Cisco Nexus 1000V virtual switch and use any of the 19 available load balancing algorithms.

4. Use Static EtherChannel and make sure you are using *only* the IP Hash algorithm on your vDS.

5. If using Failover, have at least one 10Gb NIC to handle the VXLAN traffic.

Justification

1. The Failover teaming policy for the VXLAN VMkernel NIC uses only one uplink for all VXLAN traffic. Although redundancy is available via the standby link, not all available bandwidth is used.
2. Static EtherChannel requires IP Hash load balancing to be configured on the switching infrastructure, which uses a hashing algorithm based on source and destination IP addresses to determine which host uplink egress traffic should be routed through.

3. Static EtherChannel with IP Hash load balancing is technically complex to implement and has a number of prerequisites and limitations: for example, you cannot use Beacon Probing and you cannot configure standby or unused links.

4. Static EtherChannel does not pre-check both terminating ends before forming the channel group, so if there is a mismatch between the two ends, traffic will never pass and vSphere will not see any acknowledgement back on its Distributed Switch.

5. Active LACP mode places a port into an active negotiating state, in which the port initiates negotiation with other ports by sending LACP packets. On vSphere releases prior to 5.5, where only the IP Hash algorithm is supported, LACP will not pass any traffic if vSphere uses any algorithm other than IP Hash (such as Route based on originating virtual port ID).

6. The operational complexity is reduced

7. If using vSphere 5.5, 22 different load balancing algorithms can be used and Beacon Probing can be used for failure detection.

Implications

1. Initial setup has a small amount of additional complexity; however, this is a one-time task (Set & Forget)

2. Only the IP Hash algorithm is supported if using vSphere 5.1

3. Only one LAG is supported for the entire vSphere Distributed Switch if using vSphere 5.1

4. If the IP Hash calculation is not done manually against the VM's vNIC and the physical NICs, there is no guarantee that traffic will be balanced across the physical links (see the sketch below).
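To illustrate implication 4, here is a rough sketch of the kind of manual IP Hash check referred to above. It uses a commonly cited approximation of XOR-ing the source and destination IP addresses and taking the result modulo the number of uplinks; treat the function name and the exact hash as assumptions and verify against VMware's published calculation for your release.

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Approximate IP Hash uplink selection: XOR the two addresses,
    then take the result modulo the number of uplinks."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count

# Example: two VMs talking to the same destination over a two-uplink team.
# If both flows hash to the same uplink, the "balancing" is no better than failover.
flows = [("10.0.0.11", "10.0.0.50"), ("10.0.0.12", "10.0.0.50")]
for src, dst in flows:
    print(src, "->", dst, "uses uplink", ip_hash_uplink(src, dst, 2))
```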


Example Architectural Decision – vMotion configuration for Cisco UCS

Problem Statement

In an environment where a customer has pre-purchased Cisco UCS to replace end-of-life equipment, what is the most suitable way to configure vMotion to make the most efficient use of the infrastructure?

Assumptions

1. vSphere 5.1 or greater
2. Two 10Gb network interfaces per UCS blade (Cisco Palo adapters)
3. Core & Edge Network topology is in place using Cisco Nexus
4. Cisco Fabric Interconnects are in use

Motivation

1. Optimize performance for vMotion without impacting other traffic
2. Reduce complexity where possible
3. Minimize network traffic across the Nexus core

Architectural Decision

Two (2) vNICs will be presented from the Cisco Fabric Interconnect to each blade (ESXi host), which will appear to the ESXi host as vmNIC0 and vmNIC1.

vNIC0 will be connected to “Fabric A” and vNIC1 will be connected to “Fabric B”.

The vMotion VMkernel (VMK) port for each ESXi host will be configured on a vSwitch (or Distributed vSwitch) with two (2) 10Gb network adapters, with vmNIC0 as "Active" and vmNIC1 as "Standby".

Fabric failover will not be enabled in the fabric interconnect.

vmNIC Failback at the vSphere layer will be disabled.
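As a way to reason about the decision above, the toy model below shows why disabling failback is safe in this design: once vMotion traffic has moved to vmNIC1 (Fabric B), that path is just as optimal as Fabric A, so nothing is gained by moving it back when vmNIC0 recovers. The class and method names are illustrative assumptions only, not vSphere API calls.

```python
class VmkTeaming:
    """Toy model of an active/standby vMotion VMkernel team with failback disabled."""

    def __init__(self, active: str, standby: str):
        self.active = active      # e.g. vmNIC0 -> Fabric A
        self.standby = standby    # e.g. vmNIC1 -> Fabric B
        self.current = active     # uplink currently carrying vMotion traffic

    def link_down(self, nic: str):
        # Fail over to the other uplink if the current one fails.
        if nic == self.current:
            self.current = self.standby if self.current == self.active else self.active

    def link_up(self, nic: str):
        # Failback disabled: recovery of the original uplink does not move traffic.
        pass

team = VmkTeaming(active="vmNIC0", standby="vmNIC1")
team.link_down("vmNIC0")
print(team.current)   # vmNIC1: all vMotion traffic now stays within Fabric B
team.link_up("vmNIC0")
print(team.current)   # still vmNIC1: both fabrics are equally optimal, so no failback
```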

Justification

1. Under normal circumstances vMotion traffic will only traverse Fabric A and will not impact Fabric B or the core network, thus minimizing north-south traffic.
2. In the event that Fabric A suffers a failure of any kind, the VMK for vMotion will fail over to the standby vNIC (vmNIC1), which results in an equally optimal configuration, as traffic will only traverse Fabric B and not the core network, again minimizing north-south traffic.
3. Failover is handled by vSphere at the software layer, which removes the requirement for fabric failover to be enabled. This gives a vSphere administrator visibility of the network status without going into UCS Manager.
4. The operational complexity is reduced
5. The solution is self-healing at the UCS layer and this is transparent to the vSphere environment
6. At the vSphere layer, failback is not required, as using Fabric B for all VMK vMotion traffic is still optimal. In the event Fabric B fails, the environment can fail back automatically to Fabric A.

Implications

1. Initial setup has a small amount of additional complexity; however, this is a one-time task (Set & Forget)
2. vNIC0 and vNIC1 need to be manually mapped to Fabric A and Fabric B at the Cisco Fabric Interconnect via UCS Manager; however, this is also a one-time task (Set & Forget)

Alternatives

1. Use Route Based on Physical NIC Load and have VMK for vMotion managed automatically by LBT
2. Use vPC and Route based on IP Hash for all vSwitch traffic (including vMotion VMK)
3. Use the Fabric Failover option at the UCS layer using a single vNIC presented to ESXi for all traffic
4. Use the Fabric Failover option at the UCS layer using two vNICs presented to ESXi for all traffic – Each vNIC would be pinned to a single Fabric (A or B)

Thank you to Prasenjit Sarkar (@stretchcloud) for Co-authoring this Example Architectural Decision.

Related Articles

1. Trade-off factor – Cisco UCS Fabric Failover OR OS based NIC teaming using dual fabric (Stretch-cloud – By Prasenjit Sarkar @stretchcloud)
2. Why You Should Pin vMotion Port Groups In Converged Environments (By Chris Wahl @ChrisWahl)

Example Architectural Decision – Virtual Switch Load Balancing Policy

Problem Statement

What is the most suitable network adapter load balancing policy to configure on the vSwitch and dvSwitch(es), where 10Gb adapters are used for the dvSwitches and 1Gb adapters for the vSwitch, which is only used for ESXi management traffic?

Assumptions

1. vSphere 4.1 or later

Motivation

1. Ensure optimal performance and redundancy for the network
2. Simplify the solution without compromising performance or functionality

Architectural Decision

Use “Route based on physical NIC load” for Distributed Virtual switches and “Route based on originating port ID” for vSwitches.

Justification

1. Route based on physical NIC load achieves both availability and performance
2. Requires only a basic switch configuration (802.1q and the required VLANs tagged)
3. Where a single pNIC's utilization exceeds 75%, "Route based on physical NIC load" will dynamically rebalance workloads across the available uplinks to ensure the best possible performance (see the sketch below)
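For justification 3, the sketch below models the Load Based Teaming behaviour described above: the dvSwitch periodically checks uplink utilization (VMware documents a 30-second evaluation window and a 75% threshold) and moves a virtual port to a less-loaded uplink when the threshold is exceeded. The data structures and function names are illustrative assumptions, not the actual vDS implementation.

```python
# Toy model of "Route based on physical NIC load" (LBT) rebalancing.
# Utilization figures are fractions of link capacity sampled over the
# evaluation interval; names and structure are illustrative only.

THRESHOLD = 0.75  # LBT rebalances when an uplink's mean utilization exceeds 75%

def rebalance(uplink_util: dict, port_to_uplink: dict, port_util: dict) -> dict:
    """Move one virtual port off any uplink whose utilization exceeds the threshold."""
    for uplink, util in uplink_util.items():
        if util <= THRESHOLD:
            continue
        least_loaded = min(uplink_util, key=uplink_util.get)
        # Pick a port currently on the busy uplink and move it to the least-loaded uplink.
        for port, assigned in port_to_uplink.items():
            if assigned == uplink:
                port_to_uplink[port] = least_loaded
                uplink_util[uplink] -= port_util[port]
                uplink_util[least_loaded] += port_util[port]
                break
    return port_to_uplink

uplinks = {"vmnic0": 0.85, "vmnic1": 0.30}
ports = {"vm-a": "vmnic0", "vm-b": "vmnic0", "vm-c": "vmnic1"}
port_load = {"vm-a": 0.40, "vm-b": 0.45, "vm-c": 0.30}
print(rebalance(uplinks, ports, port_load))  # one VM port moves from vmnic0 to vmnic1
```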

Implications

1. If NFS IP storage is used with a single VMkernel port, it will not use both connections concurrently. If multiple 10Gb connections are required for NFS traffic, two or more VLANs should be created with one VMkernel port per VLAN. If only one VMkernel port is used, the only option for sending traffic down multiple uplinks is "Route based on IP hash" with EtherChannel configured on the physical switch.

Alternatives

1. Route based on the originating port ID

Pros: Chooses an uplink based on the virtual port where the traffic entered the virtual switch. The virtual machine outbound traffic is mapped to a specific physical NIC based on the ID of the virtual port to which this virtual machine is connected. This method is simple and fast, and does not require the VMkernel to examine the frame for necessary information.

Cons: When the load is distributed in the NIC team using the port-based method, no single virtual machine NIC will ever get more bandwidth than a single physical adapter can provide.

2. Route based on IP hash.

Pros: Chooses an uplink based on a hash of the source and destination IP addresses of each packet (for non-IP packets, whatever is at those offsets is used to compute the hash). This method gives a better distribution of traffic across physical NICs.

When the load is distributed in the NIC team using the IP-based method, a single virtual machine NIC might use the bandwidth of multiple physical adapters.

Cons: This method has higher CPU overhead and is not compatible with all switches (it requires IEEE 802.3ad link aggregation support).

3. Route based on source MAC hash

Pros: Chooses an uplink based on a hash of the source Ethernet MAC address. This method has low overhead and is compatible with all physical switches. The virtual machine outbound traffic is mapped to a specific physical NIC based on the virtual NIC's MAC address.

Cons: Might not spread traffic evenly across the physical NICs. When the load is distributed in the NIC team using the MAC-based method, no single virtual machine NIC will ever get more bandwidth than a single physical adapter can provide (see the sketch after this list).

4. Use explicit fail-over order

Pros: Always uses the highest order uplink from the list of Active adapters which passes failover detection criteria.

Cons: This setting is equivalent to a failover policy and is not strictly a load balancing policy.

5. Route based on Physical NIC load

Pros: The most efficient load balancing mechanism, because it is based on the actual physical NIC workload.

Cons: Not available on standard vSwitches.
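To make the "single physical adapter" limitation in alternatives 1 and 3 concrete, the sketch below shows how both the port-ID and source-MAC-hash policies statically map each virtual NIC to exactly one uplink (until a failover occurs), so no single vNIC can exceed one adapter's bandwidth. The modulo mapping is a simplification and the function names are assumptions for illustration.

```python
def uplink_by_port_id(virtual_port_id: int, uplink_count: int) -> int:
    """Port-ID policy (simplified): the virtual port number picks the uplink."""
    return virtual_port_id % uplink_count

def uplink_by_mac_hash(mac: str, uplink_count: int) -> int:
    """Source MAC hash policy (simplified): a hash of the vNIC MAC picks the uplink."""
    return sum(int(octet, 16) for octet in mac.split(":")) % uplink_count

# Either way, a given vNIC always maps to the same single uplink,
# so its bandwidth is capped at one physical adapter.
print(uplink_by_port_id(7, 2))                     # always the same uplink for port 7
print(uplink_by_mac_hash("00:50:56:ab:cd:01", 2))  # always the same uplink for this MAC
```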

For further information on the topic, check out the two articles below by a couple of very knowledgeable VCDXs:

Michael Webster – Etherchanneling or Load based teaming?
Frank Denneman – IP Hash verses LBT