Example Architectural Decision – Virtual Switch Load Balancing Policy

Problem Statement

What is the most suitable network adapter load balancing policy to be configured on the vSwitch & dvSwitch/es where 10Gb adapters are being used for dvSwitches and 1Gb for vSwitch which is only used for ESXi management traffic?

Assumptions

1. vSphere 4.1 or later

Motivation

1. Ensure optimal performance and redundancy for the network
2. Simplify the solution without compromising performance for functionality

Architectural Decision

Use “Route based on physical NIC load” for Distributed Virtual switches and “Route based on originating port ID” for vSwitches.

Justification

1. Route based on physical NIC load achieves both availability and performance
2. Requires only a basic switch configuration (802.1q and the required VLANs tagged)
3. Where a single pNIC’s utilization exceeds 75% the “route based on physical NIC load” will dynamically balance workloads to ensure the best possible performance

Implications

1. If NFS IP storage is used with a single VMKernel it will not use both connections concurrently. If using multiple 10GB connections for NFS traffic is required then two or more VLANs should be created with one VMK per VLAN. If only one VMK is used, the only option if you want traffic to go down multiple uplinks would be to use “Route based on IP hash” and have Etherchannel configured on the physical switch.

Alternatives

1. Route based on the originating port ID

Pros: Chooses an uplink based on the virtual port where the traffic entered the virtual switch. The virtual machine outbound traffic is mapped to a specific physical NIC based on the ID of the virtual port to which this virtual machine is connected. This method is simple and fast, and does not require the VMkernel to examine the frame for necessary information.

Cons: When the load is distributed in the NIC team using the port-based method, no virtual machine single-NIC will ever get more bandwidth than can be provided by a single physical adapter.

2. Route based on IP hash.

Pros: Chooses an uplink based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash. In this method, a NIC for each outbound packet is chosen based on its source and destination IP address. This method has a better distribution of traffic across physical NICs.

When the load is distributed in the NIC team using the IP-based method, a virtual machine single-NIC might use the bandwidth of multiple physical adapters.

Cons: This method has higher CPU overhead and is not compatible with all switches (it requires IEEE 802.3ad link aggregation support).

3. Route based on source MAC hash

Pros: Chooses an uplink based on a hash of the source Ethernet. This method is compatible with all physical switches. The virtual machine outbound traffic is mapped to a specific physical NIC based on the virtual NIC’s MAC address.

Cons: This method has low overhead, and might not spread traffic evenly across the physical NICs.

When the load is distributed in the NIC team using the MAC-based method, no virtual machine single-NIC will ever get more bandwidth than can be provided by a single physical adapter.

4. Use explicit fail-over order

Pros: Always uses the highest order uplink from the list of Active adapters which passes failover detection criteria.

Cons: This setting is equivalent to a fail over policy and is not strictly a load balancing policy.

5. Route based on Physical NIC load

Pros: Most efficient load balancing mechanism because it is base on the actual physical NIC workload.

Cons: Not available on standard vSwitches

For further information on the topic checkout the below two articles by a couple of very knowledgable VCDX’s

Michael Webster – Etherchanneling or Load based teaming?
Frank Denneman – IP Hash verses LBT

Example Architectural Decision – Network Failover Detection Policy

Problem Statement

What is the most suitable network failover detection policy to be used on the vSwitch or dvSwitch NIC team/s in an environment which uses IP storage and has only 2 physical NICs per vSwitch or dvSwitch?

Assumptions

1. vSphere 5.0 or greater
2. Storage is presented to the ESXi hosts is NFS via Multi Switch Link Aggregation
3. A maximum of 2 physical NICs exist per dvSwitch
4. Physical Switches support “Link state tracking”

Motivation

1. Ensure a reliable network failover detection solution
2. Ensure Multi switch link aggregation can be used for IP storage

Architectural Decision

Enable “Link state tracking” on the physical switches and Use “Link Status”

Justification

1. To work properly, Beacon Probing requires at least 3 NICs for “triangulation”  otherwise a failed link cannot be determined.
2.“Link state tracking” can be enabled on the physical switch to report upstream network failures where an “edge” & “core” network topology is used, therefore preventing the link status from being OK when traffic cannot reach the destination due to an upstream failure
3. Beacon Probing and the “route based on IP hash” network load balancing option is not compatible which prevents a single VMKernel being able to use multiple interfaces for IP storage traffic

Implications

1. Link state tracking needs to be supported and enabled on the physical switches

Alternatives

1. Use “Beacon Probing”