Example Architectural Decision – Securing vMotion & Fault Tolerance Traffic in IaaS/Cloud Environments

Problem Statement

vMotion and Fault Tolerance logging traffic is unencrypted, so anyone with access to the same VLAN/network could potentially view and/or compromise this traffic. How can the environment be made as secure as possible to ensure isolation between tenants/departments in a multi-tenant/multi-department environment?

Assumptions

1.  vMotion and FT are required in the vSphere cluster(s) (although FT is currently not supported for VMs hosted with vCloud Director)
2. IP Storage is being used and the vNetworking design has 2 x 10Gb NICs for non-Virtual Machine traffic such as VMkernel interfaces, and 2 x 10Gb NICs available for Virtual Machine traffic (similar to the Example vNetworking Design for IP Storage)
3. vSphere 4.0 or later (required for FT and distributed virtual switches)

Motivation

1. Ensure maximum security and performance for vMotion and FT traffic
2. Prevent vMotion and/or FT traffic from impacting production virtual machines

Architectural Decision

vMotion & Fault Tolerance logging traffic will each have a dedicated non-routable VLAN, which will be hosted on a dvSwitch that is physically separate from the Virtual Machine distributed virtual switch.
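The implementation details will vary with the tooling in use, but as a rough sketch, the following pyVmomi (vSphere API) snippet creates the two dedicated VLAN-backed port groups on an existing, physically separate dvSwitch. The vCenter address, credentials, dvSwitch name, port group names and VLAN IDs are placeholders for illustration only, not part of this decision.

```python
# Hypothetical sketch only: names, VLAN IDs and credentials are placeholders.
# SSL/certificate handling omitted for brevity.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='administrator', pwd='***')
content = si.RetrieveContent()

# Locate the dedicated (non Virtual Machine) dvSwitch by name
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)
dvs = next(d for d in view.view if d.name == 'dvSwitch-VMkernel')

# One dedicated, non-routable VLAN per function: vMotion and FT logging
specs = []
for pg_name, vlan_id in [('dvPG-vMotion', 100), ('dvPG-FT-Logging', 101)]:
    vlan = vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
        inherited=False, vlanId=vlan_id)
    port_cfg = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(vlan=vlan)
    specs.append(vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        name=pg_name, type='earlyBinding', numPorts=16,
        defaultPortConfig=port_cfg))

dvs.AddDVPortgroup_Task(specs)  # a single task creates both port groups
Disconnect(si)
```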

Justification

1.  vMotion / FT traffic does not require external (or public) access
2. A VLAN per function ensures maximum security / performance with minimal design / implementation overhead
3. Prevents vMotion and/or FT traffic from potentially impacting production virtual machines (and vice versa), which could occur if the traffic shared one or more broadcast domains
4. Ensures vMotion/FT traffic cannot leave its respective dedicated VLAN and potentially be sniffed

Implications

1. Two (2) VLANs with private, non-routable IP ranges must be presented over 802.1Q trunks to the appropriate pNICs

Alternatives

1.  vMotion / FT share the ESXi management VLAN – this would increase the risk of traffic being intercepted and “sniffed”
2. vMotion / FT share a dvSwitch with Virtual Machine networks while still running within dedicated non-routable VLANs over 802.1Q

Example Architectural Decision – Virtual Switch Load Balancing Policy

Problem Statement

What is the most suitable network adapter load balancing policy to configure on the vSwitch & dvSwitch(es), where 10Gb adapters are used for the dvSwitches and 1Gb adapters for the vSwitch, which is used only for ESXi management traffic?

Assumptions

1. vSphere 4.1 or later

Motivation

1. Ensure optimal performance and redundancy for the network
2. Simplify the solution without compromising performance or functionality

Architectural Decision

Use “Route based on physical NIC load” for Distributed Virtual Switches and “Route based on originating port ID” for standard vSwitches.
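As a rough illustration of how this decision could be applied programmatically, the pyVmomi sketch below reconfigures an existing distributed port group to use load-based teaming (policy string 'loadbalance_loadbased', i.e. “Route based on physical NIC load”). The port group object is assumed to have been retrieved already; standard vSwitches do not support this policy and already default to “Route based on originating port ID” ('loadbalance_srcid'), so no change is required there.

```python
# Hypothetical sketch only: the dv_portgroup object and connection handling are assumed.
from pyVmomi import vim

def enable_load_based_teaming(dv_portgroup):
    """Reconfigure a distributed port group to use 'Route based on physical NIC load'."""
    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
        inherited=False,
        policy=vim.StringPolicy(inherited=False, value='loadbalance_loadbased'))
    port_cfg = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        uplinkTeamingPolicy=teaming)
    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        configVersion=dv_portgroup.config.configVersion,  # required for a reconfigure
        defaultPortConfig=port_cfg)
    return dv_portgroup.ReconfigureDVPortgroup_Task(spec)

# Standard vSwitches cannot use load-based teaming; their default policy
# ('loadbalance_srcid') is already "Route based on originating port ID".
```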

Justification

1. Route based on physical NIC load achieves both availability and performance
2. Requires only a basic switch configuration (802.1q and the required VLANs tagged)
3. Where a single pNIC’s utilization exceeds 75% (measured over a 30-second interval), “Route based on physical NIC load” will dynamically rebalance workloads across the available uplinks to ensure the best possible performance

Implications

1. If NFS IP storage is used with a single VMkernel port, it will not use both connections concurrently. If using multiple 10Gb connections for NFS traffic is required, then two or more VLANs should be created with one VMkernel port per VLAN (a rough sketch of this is shown below). If only one VMkernel port is used, the only option to drive traffic down multiple uplinks would be to use “Route based on IP hash” and have Etherchannel configured on the physical switch.
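As a rough sketch of the “one VMkernel port per VLAN” approach referenced above, the pyVmomi snippet below adds an additional VMkernel adapter bound to a dedicated NFS port group. The port group names, IP addressing and host object are placeholders; it is shown against a standard switch port group for brevity, whereas binding to a distributed switch port would use HostVirtualNicSpec.distributedVirtualPort instead.

```python
# Hypothetical sketch only: port group names, addressing and 'host' are placeholders.
from pyVmomi import vim

def add_nfs_vmkernel(host, portgroup_name, ip_address, netmask):
    """Add a VMkernel adapter bound to a dedicated NFS port group/VLAN."""
    vnic_spec = vim.host.VirtualNic.Specification(
        ip=vim.host.IpConfig(dhcp=False, ipAddress=ip_address, subnetMask=netmask))
    # Returns the new device name, e.g. 'vmk2'
    return host.configManager.networkSystem.AddVirtualNic(portgroup_name, vnic_spec)

# One VMkernel port per NFS VLAN so each can be placed on a different uplink:
# add_nfs_vmkernel(host, 'NFS-VLAN-20', '192.168.20.11', '255.255.255.0')
# add_nfs_vmkernel(host, 'NFS-VLAN-21', '192.168.21.11', '255.255.255.0')
```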

Alternatives

1. Route based on the originating port ID

Pros: Chooses an uplink based on the virtual port where the traffic entered the virtual switch. The virtual machine outbound traffic is mapped to a specific physical NIC based on the ID of the virtual port to which this virtual machine is connected. This method is simple and fast, and does not require the VMkernel to examine the frame for necessary information.

Cons: When the load is distributed in the NIC team using the port-based method, no single virtual machine NIC will ever get more bandwidth than a single physical adapter can provide.

2. Route based on IP hash.

Pros: Chooses an uplink based on a hash of the source and destination IP addresses of each packet (for non-IP packets, whatever is at those offsets is used to compute the hash). A NIC for each outbound packet is chosen based on its source and destination IP address, which gives a better distribution of traffic across physical NICs (a simplified worked example of the uplink selection appears after this list).

When the load is distributed in the NIC team using the IP-based method, a single virtual machine NIC might use the bandwidth of multiple physical adapters.

Cons: This method has higher CPU overhead and is not compatible with all switches (it requires IEEE 802.3ad link aggregation support).

3. Route based on source MAC hash

Pros: Chooses an uplink based on a hash of the source Ethernet MAC address. This method has low overhead and is compatible with all physical switches. The virtual machine outbound traffic is mapped to a specific physical NIC based on the virtual NIC’s MAC address.

Cons: Might not spread traffic evenly across the physical NICs.

When the load is distributed in the NIC team using the MAC-based method, no single virtual machine NIC will ever get more bandwidth than a single physical adapter can provide.

4. Use explicit fail-over order

Pros: Always uses the highest order uplink from the list of Active adapters which passes failover detection criteria.

Cons: This setting is equivalent to a failover policy and is not strictly a load balancing policy.

5. Route based on Physical NIC load

Pros: The most efficient load balancing mechanism, because it is based on the actual workload of each physical NIC.

Cons: Not available on standard vSwitches
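To make the IP hash behaviour in alternative 2 more concrete, the short Python sketch below illustrates the commonly described selection scheme (an XOR of the source and destination IPv4 addresses, modulo the number of active uplinks). This is a simplified assumption for illustration rather than VMware’s exact implementation, and the addresses and uplink count are arbitrary examples.

```python
# Simplified illustration (assumed scheme, not VMware's exact implementation).
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, active_uplinks: int) -> int:
    """Return the uplink index a given source/destination IP pair would map to."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % active_uplinks

# The same source/destination pair always lands on the same uplink...
print(ip_hash_uplink('192.168.10.21', '192.168.10.51', 2))  # 0
# ...but a different destination can place the same VM NIC on another uplink.
print(ip_hash_uplink('192.168.10.21', '192.168.10.52', 2))  # 1
```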

For further information on the topic, check out the below two articles by a couple of very knowledgeable VCDXs:

Michael Webster – Etherchanneling or Load based teaming?
Frank Denneman – IP Hash versus LBT