Fight the FUD! – Not all VAAI-NAS storage solutions are created equal.

At a meeting recently, a potential customer who is comparing NAS/Hyper-converged solutions for an upcoming project advised me they only wanted to consider platforms with VAAI-NAS support.

As the customer was considering a wide range of workloads, including VDI and server workloads, the requirement for VAAI-NAS makes sense.

Then the customer advised us they were comparing 4 different Hyper-Converged platforms and a range of traditional NAS solutions. The customer eliminated two platforms due to no VAAI support at all (!) but then said Nutanix and one other vendor both had VAAI-NAS support, so this was not a differentiator.

Having personally completed the VAAI-NAS certification for Nutanix, I was curious what other vendor had full VAAI-NAS support, as it was (and remains) my understanding Nutanix is the only Hyper-converged vendor who has passed the full suite of certification tests.

The customer advised who the other vendor was, so we checked the HCL together and sure enough, that vendor only supported a subset of VAAI-NAS capabilities even though the sales reps and marketing material all claim full VAAI-NAS support.

The customer was more than a little surprised that VAAI-NAS certification does not require all capabilities to be supported.

Any storage vendor wanting its customers to get VMware support for VAAI-NAS is required to complete a certification process which includes a comprehensive set of tests. For vSphere 5.5 there are a total of 66 VAAI-NAS tests, all of which must be passed to gain full VAAI-NAS certification.

However, as this customer learned, it is possible and indeed common for storage vendors not to pass all tests and to gain certification for only a subset of VAAI-NAS capabilities.

The image below shows the Nutanix listing on the VMware HCL for VAAI-NAS, highlighting the four VAAI-NAS features which can be certified and supported:

1. Extended Stats
2. File Cloning
3. Native SS for LC
4. Space Reserve
[Image: Nutanix VAAI-NAS listing on the VMware HCL showing all four features certified]

This is an example of a fully certified solution supporting all VAAI-NAS features.

Here is an example of a VAAI-NAS certified solution which has only certified 1 of the 4 capabilities. (This is a Hyper-converged platform although they were not being considered by the customer)

[Image: HCL listing for a Hyper-converged platform with only 1 of the 4 VAAI-NAS capabilities certified]

Here is another example of a VAAI-NAS certified solution which has only certified 2 of the 4 capabilities. (This is a Hyper-converged platform).

[Image: HCL listing for a Hyper-converged platform with only 2 of the 4 VAAI-NAS capabilities certified]

So customers using the above storage solution cannot, for example, create thick provisioned virtual disks, which prevents the use of Fault Tolerance (FT) or the virtualization of business critical applications such as Oracle RAC.

In this next example, the vendor has certified 3 out of 4 capabilities and is not certified for Native SS for LC. (This is a traditional centralized NAS platform).

[Image: HCL listing for a traditional NAS platform with 3 of the 4 VAAI-NAS capabilities certified, Native SS for LC missing]

So this solution does not support using storage level snapshots for the creation of Linked Clones, meaning deployments such as Horizon View (VDI) or vCloud Director Fast Provisioning will not get the cloning performance or the optimal capacity saving benefits of a fully certified/supported VAAI-NAS storage solution.

The point of this article is simply to raise awareness that not all solutions advertising VAAI-NAS support are created equal, and to ALWAYS CHECK THE HCL! Don’t believe the friendly sales rep, as they may be misleading you or flat out lying about VAAI-NAS capabilities / support.

When comparing traditional NAS or Hyper-converged solutions, ensure you check the VMware HCL and compare the various VAAI-NAS capabilities supported as some vendors have certified only a subset of the VAAI-NAS capabilities.

To properly compare solutions, use the VMware HCL Storage/SAN section and, as per the image below, select:

Product Release Version: All
Partner Name: All or the specific vendor you wish to compare
Features Category: VAAI-NAS
Storage Virtual Appliance Only: No for SAN/NAS, Yes for Hyper-converged or VSA solutions

[Image: VMware HCL search criteria for comparing VAAI-NAS support]
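
If you keep a local note or export of the entries you are comparing, the same filtering can be expressed programmatically. The following is a minimal, purely illustrative Python sketch over hypothetical records whose field names simply mirror the search criteria above; it is not an official VMware HCL export format.

```python
# Illustrative only: hypothetical records mirroring the HCL search criteria above.
# This is NOT an official VMware HCL export format.
hcl_entries = [
    {"partner": "Nutanix", "model": "NX-3000 Series",
     "features_category": "VAAI-NAS", "storage_virtual_appliance": True},
    {"partner": "ExampleVendor", "model": "Example Array",
     "features_category": "VAAI-Block", "storage_virtual_appliance": False},
]

def filter_hcl(entries, features_category="VAAI-NAS", virtual_appliance=True):
    """Return entries matching the Features Category and the Storage Virtual
    Appliance flag (True for Hyper-converged/VSA, False for SAN/NAS)."""
    return [e for e in entries
            if e["features_category"] == features_category
            and e["storage_virtual_appliance"] == virtual_appliance]

for entry in filter_hcl(hcl_entries):
    print(entry["partner"], entry["model"])
```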

Then click on the model you wish to compare, e.g. NX-3000 Series:

[Image: HCL search results listing the NX-3000 Series]

Then you should see something similar to the below:

[Image: HCL entry for the NX-3000 Series with the “View” link for VAAI-NAS features]

Click the “View” link to show the VAAI-NAS capabilities and you will see something like the below, which highlights the VAAI-NAS features supported.

Note: if the “View” link does not appear, the product is NOT supported for VAAI-NAS.

[Image: VAAI-NAS feature details showing all four capabilities supported]

If the Features column does not list Extended Stats, File Cloning, Native SS for LC and Space Reserve, the solution does not support the full set of VAAI-NAS capabilities.
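
To make that final check mechanical, here is a minimal Python sketch that compares the features listed for a product against the four capabilities above. The four feature names come from the HCL listings shown earlier; the sample listing is hypothetical.

```python
# The four VAAI-NAS capabilities as listed on the VMware HCL.
REQUIRED_VAAI_NAS_FEATURES = {
    "Extended Stats",
    "File Cloning",
    "Native SS for LC",
    "Space Reserve",
}

def missing_vaai_nas_features(listed_features):
    """Return the VAAI-NAS capabilities NOT certified for a given HCL listing."""
    return REQUIRED_VAAI_NAS_FEATURES - set(listed_features)

# Hypothetical example: a listing certified for only 2 of the 4 capabilities.
listing = ["File Cloning", "Space Reserve"]
gaps = missing_vaai_nas_features(listing)
if gaps:
    print("Not fully VAAI-NAS capable, missing:", ", ".join(sorted(gaps)))
else:
    print("All four VAAI-NAS capabilities are certified.")
```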

Related Articles:

1. My checkbox is bigger than your checkbox (@HansDeLeenheer)

2. Unchain My VM, And Set Me Free! (Snapshots)

3. VAAI-NAS – Some snapshot chains are deeper than others

Example VMware vNetworking Design w/ 2 x 10GB NICs (IP based or FC/FCoE Storage)

I have had a large response to my earlier example vNetworking design with 4 x 10GB NICs, and I have been asked, “What if I only have 2 x 10GB NICs?”, so the below is an example of an environment which was limited to just two (2) x 10GB NICs and used IP Storage.

If your environment uses FC/FCoE storage, the below still applies and the IP storage components can simply be ignored.

Requirements

1. Provide high performance and redundant access to the IP Storage (if required)
2. Ensure ESXi hosts could be evacuated in a timely manner for maintenance
3. Prevent significant impact to storage performance by vMotion, Fault Tolerance and virtual machine traffic
4. Ensure high availability for all network traffic

Constraints

1. Two (2) x 10GB NICs

Solution

Use one dvSwitch to support all VMKernel and virtual machine network traffic and use “Route based on Physical NIC Load” (commonly referred to as “Load Based Teaming”).

Use Network I/O Control to ensure that, in the event of contention, all traffic types get appropriate network resources.

Configure the following Network Share Values (a worked example of what these values guarantee under contention follows the list):

IP Storage traffic: 100
ESXi Management: 25
vMotion: 25
Fault Tolerance: 25
Virtual Machine traffic: 50
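
As a rough illustration of what these share values mean, the sketch below (plain Python arithmetic, not measured results) shows the approximate minimum bandwidth each traffic type is guaranteed on a single 10GB uplink if every traffic type is contending at once. Remember that shares only take effect under contention and that unused allocation is redistributed, so each traffic type can use far more than its minimum when others are idle.

```python
# Approximate minimum guaranteed bandwidth per traffic type on ONE 10GB uplink,
# assuming all traffic types are active and contending simultaneously.
# NIOC shares are evaluated per physical uplink; unused share is redistributed.
UPLINK_GBPS = 10

shares = {
    "IP Storage": 100,
    "ESXi Management": 25,
    "vMotion": 25,
    "Fault Tolerance": 25,
    "Virtual Machine": 50,
}

total = sum(shares.values())  # 225 total shares
for traffic, share in shares.items():
    gbps = UPLINK_GBPS * share / total
    print(f"{traffic:17s} {share:>3d} shares -> ~{gbps:.1f} Gbps minimum under full contention")
```

For example, IP Storage (100 of 225 shares) is guaranteed roughly 4.4Gbps of each 10GB uplink in the worst case, while vMotion, FT and Management each get roughly 1.1Gbps and Virtual Machine traffic roughly 2.2Gbps.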

Configure two (2) VMKernel interfaces for IP Storage and set each on a different VLAN and subnet.

Configure VMKernel interfaces for vMotion (or Multi-NIC vMotion), ESXi Management and Fault Tolerance, and set them to active on both 10GB interfaces (the default configuration).

All dvPortGroups for Virtual machine traffic (in this example VLANs 6 through 8) will be active on both interfaces.

The above utilizes LBT to load balance network traffic, dynamically moving workloads between the two 10GB NICs once one or both network adapters reach >= 75% utilization.
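
For those interested in the mechanics, below is a simplified conceptual sketch of the LBT decision, assuming the documented behaviour of evaluating mean uplink utilization over a 30 second window and moving virtual ports off an uplink that exceeds 75% utilization; the real implementation inside the Distributed Switch is more involved.

```python
# Conceptual model of Load Based Teaming (LBT): the distributed switch samples
# uplink utilization over a 30 second window and moves virtual ports from an
# uplink whose mean utilization exceeds 75% to a less-loaded uplink.
SATURATION_THRESHOLD = 0.75   # 75% mean utilization over the sampling window
SAMPLE_WINDOW_SECONDS = 30    # evaluation interval used by LBT

def needs_rebalance(mean_utilization_by_uplink):
    """Return the uplinks whose mean utilization over the window exceeds 75%."""
    return [uplink for uplink, util in mean_utilization_by_uplink.items()
            if util >= SATURATION_THRESHOLD]

# Hypothetical sample: vmnic0 is saturated, vmnic1 has headroom, so LBT would
# move one or more virtual ports from vmnic0 to vmnic1 at the next evaluation.
print(needs_rebalance({"vmnic0": 0.82, "vmnic1": 0.35}))   # ['vmnic0']
```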

[Image: example vNetworking design with 2 x 10GB NICs]

Conclusion

Even when your ESXi hosts only have two (2) x 10GB interfaces, VMware provides enterprise grade features to ensure all traffic (including IP Storage) can get access to sufficient bandwidth to continue serving production workloads until any contention subsides.

This design ensures that, in the event a host needs to be evacuated (even during production hours), it will complete in the fastest possible time with minimal or no impact to production. The faster your vMotion activity completes, the sooner DRS can get your cluster running as smoothly as possible, and in the event you are patching, the sooner your maintenance can be completed and the patched hosts returned to the cluster to serve your VMs.

Related Posts

1. Example Architectural Decision – Network I/O Control for ESXi Host using IP Storage (4 x 10 GB NICs)
2. Network I/O Control Shares/Limits for ESXi Host using IP Storage

Example Architectural Decision – Network I/O Control Shares/Limits for ESXi Host using IP Storage

Problem Statement

With 10GB connections becoming the norm, ESXi hosts will generally have fewer physical connections than in the past when 1GB NICs were common, but more bandwidth per connection (and in total) than a host with 1GB NICs.

In this case, the hosts have only two (2) x 10GB NICs and the design needs to cater for all traffic (including IP storage) for the ESXi hosts.

The design needs to ensure all traffic types have sufficient burst and sustained bandwidth without significantly impacting one another.

How can this be achieved?

Assumptions

1. No additional network cards (1GB or 10GB) can be supported
2. vSphere 5.1
3. Multi-NIC vMotion is desired

Constraints

1. Two (2) x 10GB NICs

Motivation

1. Ensure IP Storage (NFS) performance is optimal
2. Ensure vMotion activities (including a host entering maintenance mode) can be performed in a timely manner without impact to IP Storage or Fault Tolerance
3. Fault tolerance is a latency-sensitive traffic flow, so it is recommended to always set the corresponding resource-pool shares to a reasonably high relative value in the case of custom shares.
4. Proactively address potential contention due to limited physical network interfaces

Architectural Decision

Use one dvSwitch to support all VMKernel and virtual machine network traffic.

Enable Network I/O Control and configure NFS and/or iSCSI traffic with a share value of 100; ESXi Management, vMotion and FT will each have a share value of 25, and Virtual Machine traffic will have a share value of 50.

Configure the two (2) VMKernel interfaces for IP Storage on the dvSwitch and set them to be Active on one 10GB interface and Standby on the second.

Configure two VMKernel interfaces for vMotion (Multi-NIC vMotion) on the dvSwitch, with the first set as Active on one interface and Standby on the second, and the second set the opposite way.

A single VMKernel will be configured for Fault Tolerance, set as Active on one interface and Standby on the second.

For ESXi Management, the VMKernel will be configured as Active on the interface where FT is Standby, and Standby on the other interface.

All dvPortGroups for Virtual machine traffic will be active on both interfaces.
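
To summarise the failover order described above at a glance, here is a simple data-only Python sketch of the Active/Standby assignments. The uplink names are placeholders and the choice of which uplink hosts each Active role is illustrative; only the relationships stated above (e.g. ESXi Management Active where FT is Standby, the two vMotion VMKernels on opposite uplinks) are taken from the decision.

```python
# Active/Standby uplink assignment per VMKernel / port group as described above.
# "uplink1" / "uplink2" are placeholder names for the two 10GB dvUplinks; which
# physical NIC plays which role is an arbitrary choice for illustration.
failover_order = {
    "IP Storage (2 x VMKernel)": {"active": ["uplink1"], "standby": ["uplink2"]},
    "vMotion VMKernel 1":        {"active": ["uplink1"], "standby": ["uplink2"]},
    "vMotion VMKernel 2":        {"active": ["uplink2"], "standby": ["uplink1"]},
    "Fault Tolerance":           {"active": ["uplink2"], "standby": ["uplink1"]},
    "ESXi Management":           {"active": ["uplink1"], "standby": ["uplink2"]},  # Active where FT is Standby
    "VM traffic dvPortGroups":   {"active": ["uplink1", "uplink2"], "standby": []},
}

for port_group, order in failover_order.items():
    print(f"{port_group:26s} active={order['active']} standby={order['standby']}")
```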

Justification

1. The share values were chosen to ensure IP storage traffic is not impacted, as this can cause flow-on effects for the environment's performance. vMotion & FT are considered important, but during periods of contention should not monopolize or impact IP storage traffic.
2. IP Storage is more critical to ongoing cluster and VM performance than ESXi Management, vMotion or FT
3. IP storage requires higher priority than vMotion which is more of a burst activity and is not as critical to VM performance
4. With a share value of 25, Fault Tolerance still has ample bandwidth to support the maximum of 4 FT protected VMs per host, even during periods of contention
5. With a share value of 25, vMotion still has ample bandwidth to support multiple concurrent vMotions during contention, and performance should not be impacted on a day to day basis (up to 8 concurrent vMotions are supported on a 10GB interface, compared to a limit of 4 on a 1GB interface). Where no contention exists, vMotion traffic can burst and use a large percentage of both 10GB interfaces to complete vMotion activity as fast as possible
6. With a share value of 25, ESXi Management still has ample bandwidth to continue normal operations even during periods of contention
7. When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution.
8. With a share value of 50, Virtual Machine traffic still has ample bandwidth and should result in minimal or no impact to VM performance across 10GB NICs
9. Setting Limits may prevent operations from completing in a timely manner where there is no contention

Implications

1. In the unlikely event of significant and ongoing contention, reduced vMotion performance may affect the ability to evacuate a host in a timely manner. This may extend scheduled maintenance windows.
2. VMs protected by FT may be impacted

Alternatives

1. Use a share value of 50 for IP storage traffic to more evenly share bandwidth during periods of contention. However, this may impact VM performance, e.g. increased CPU WAIT if the IP storage is not keeping up with the storage demand

Related Posts
1. Example VMware vNetworking Design for IP Storage (4 x 10GB NICs)
2. Example VMware vNetworking Design for IP Storage (2 x 10GB NICs)
3. Frank Denneman (VCDX) – Designing your vMotion Network – Multi-NIC vMotion & NIOC