Example Architectural Decision – Network I/O Control for ESXi Host using IP Storage

Problem Statement

With 10Gb connections, the proposed ESXi hosts will have fewer physical connections, but more bandwidth per connection, than hosts with 1Gb NICs. In this case, 4 x 10Gb NICs need to cater for all traffic (including IP storage) for the ESXi hosts.

The design needs to ensure all types of traffic have sufficient burst and sustained bandwidth without negatively impacting other types of traffic.

How can this be achieved?

Assumptions

1. No additional network cards (1Gb or 10Gb) can be supported
2. vSphere 5.0 or later
3. 2 x 48 port 10Gb and 2 x 48 port 1Gb switches exist in the environment
4. ESXi hosts are 4-way servers with 512GB RAM, which are expected to run large numbers of VMs with varying workloads
5. Multi-NIC vMotion is not required due to using 10Gb NICs

Motivation

1. When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution.
2. Ensure IP Storage (NFS) performance is optimal
3. Ensure vMotion activities (including a host entering maintenance mode) can be performed in a timely manner without impact to IP Storage or Fault Tolerance
4. Fault Tolerance is a latency-sensitive traffic flow, so it is recommended to always set the corresponding resource-pool shares to a reasonably high relative value in the case of custom shares.

Architectural Decision

Separate VMware infrastructure functions (VMkernel) from virtual machine network traffic by creating two (2) dvSwitches (each with 2 x 10Gb connections), dvSwitch-Admin and dvSwitch-Data.

Enable Network I/O Control, and configure NFS and/or iSCSI traffic with a share value of 100, and vMotion and FT traffic with a share value of 25 each.

Configure the two (2) VMkernel ports for IP Storage on dvSwitch-Admin and set them to be Active on one 10Gb interface and Standby on the second.

Configure the VMkernel port for vMotion on dvSwitch-Admin as Active on one interface and Standby on the second, and vice versa for FT.

Configure all dvPortGroups for Virtual Machine data on dvSwitch-Data.
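The Active/Standby layout on dvSwitch-Admin can be sketched as a small Python model, which makes it easier to see what shares each uplink in normal operation and after an uplink failure. The port group and uplink names below are hypothetical (the design does not name them), and the sketch assumes the two IP Storage VMkernel ports are Active on opposite uplinks.

```python
# Hypothetical names: the design above does not name port groups or uplinks.
# Assumption: the two IP Storage VMkernel ports are Active on opposite uplinks.
TEAMING = {
    "IP-Storage-1": {"active": "dvUplink1", "standby": "dvUplink2"},
    "IP-Storage-2": {"active": "dvUplink2", "standby": "dvUplink1"},
    "vMotion":      {"active": "dvUplink1", "standby": "dvUplink2"},
    "FT":           {"active": "dvUplink2", "standby": "dvUplink1"},
}


def placement(teaming, failed=frozenset()):
    """Return {uplink: sorted port groups} given a set of failed uplinks."""
    result = {}
    for port_group, order in teaming.items():
        # Try the Active uplink first, then fall back to the Standby uplink.
        for uplink in (order["active"], order["standby"]):
            if uplink not in failed:
                result.setdefault(uplink, []).append(port_group)
                break
    return {uplink: sorted(pgs) for uplink, pgs in result.items()}


normal = placement(TEAMING)
degraded = placement(TEAMING, failed={"dvUplink1"})
print(normal)
print(degraded)
```

In this model each uplink normally carries one IP Storage port plus either vMotion or FT. After a single uplink failure all four traffic types share the surviving uplink, which is the case where the Network I/O Control share values matter most.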

Justification

1. The share values were chosen to ensure storage traffic is not impacted, as this can cause flow-on effects for the environment's performance. vMotion and FT are considered important but, during periods of contention, should not monopolize or impact IP storage traffic.
2. IP Storage is more critical to ongoing cluster and VM performance than vMotion or FT
3. IP storage requires higher priority than vMotion which is more of a burst activity and is not as critical to VM performance
4. With a share value of 25, Fault Tolerance still has ample bandwidth to support the maximum of 4 supported FT machines per host, even during periods of contention
5. With a share value of 25, vMotion still has ample bandwidth to support multiple concurrent vMotions during contention; however, performance should not be impacted on a day-to-day basis. Up to 8 concurrent vMotions are supported because vMotion is configured on a 10Gb interface (the limit is 4 on a 1Gb interface)
6. The environment required 1Gb switches to accommodate various devices, such as out-of-band management and IP KVM devices; as such, having ESXi management on 2 x 1Gb ports did not add significant cost to the solution
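As a rough illustration of the share values discussed above, the following Python sketch splits a single 10Gb uplink in proportion to shares. It is a simplified model, not a description of the Network I/O Control scheduler: it assumes every listed traffic type is saturating the uplink at once and that no other traffic types are active on it. The function and variable names are illustrative only.

```python
LINK_GBPS = 10.0  # one 10Gb uplink on dvSwitch-Admin

# Share values from the architectural decision.
SHARES = {"ip_storage": 100, "vmotion": 25, "ft": 25}


def allocate(link_gbps, shares, active):
    """Split link bandwidth among the active traffic types in proportion to shares.

    Simplified model: shares only take effect under contention, and types
    that are not in `active` give up their bandwidth to the others.
    """
    total = sum(shares[t] for t in active)
    return {t: round(link_gbps * shares[t] / total, 2) for t in active}


# Worst case: all three types contend on one surviving uplink.
all_three = allocate(LINK_GBPS, SHARES, ["ip_storage", "vmotion", "ft"])
# Normal layout: IP storage and vMotion share one uplink.
storage_vmotion = allocate(LINK_GBPS, SHARES, ["ip_storage", "vmotion"])

print(all_three)        # {'ip_storage': 6.67, 'vmotion': 1.67, 'ft': 1.67}
print(storage_vmotion)  # {'ip_storage': 8.0, 'vmotion': 2.0}
```

Under this model, vMotion and FT each keep roughly 1.67 Gb/s even when all three types contend on one uplink, which is more than a dedicated 1Gb NIC would provide. When there is no contention, any traffic type can use the full link.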

Implications

1. In the unlikely event of significant and ongoing contention, reduced performance for vMotion and FT may affect the ability to evacuate a host in a timely manner. This may impact the ability to perform scheduled maintenance.
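To give a sense of scale for this implication, here is a back-of-the-envelope Python estimate of host evacuation time. It assumes the worst case where all 512GB of host RAM is in use by VMs, ignores protocol overhead and memory pages that must be re-copied, and takes 2 Gb/s as the contended vMotion bandwidth (25 shares against 100 for IP storage on one 10Gb uplink). The result is a lower bound, not a prediction.

```python
def evacuation_minutes(memory_gb, bandwidth_gbps):
    """Lower-bound minutes to copy memory_gb gigabytes at bandwidth_gbps gigabits/s."""
    seconds = memory_gb * 8 / bandwidth_gbps  # gigabytes -> gigabits
    return round(seconds / 60, 1)


# Assumption: the full 512GB of host RAM must be moved.
uncontended = evacuation_minutes(512, 10.0)  # vMotion has the whole 10Gb link
contended = evacuation_minutes(512, 2.0)     # vMotion limited to its share

print(uncontended)  # 6.8
print(contended)    # 34.1
```

Even on these optimistic assumptions, sustained contention stretches a full evacuation from under ten minutes to over half an hour per host.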

Alternatives

1. Use all 4 x 10Gb NICs on a single dvSwitch, and use “Active” and “Standby” to ensure traffic remains on a specified NIC unless there is a failure. Leverage Network I/O Control similar to the above example to ensure minimal impact of contention.

See Example VMware vNetworking Design for IP Storage for an overview of the vNetworking design described in this example.