Nutanix AOS 5.5 delivers 1M IOPS from a single VM, but what happens when you vMotion?

For many years Nutanix has been delivering excellent performance across multiple hypervisors as well as hardware platforms including the native NX series, OEMs (Dell XC & Lenovo HX) and more recently software only options with Cisco and HPE.

Recently I tweeted (below) showing how a single virtual machine can achieve 1 million 8k random read IOPS and >8GBps throughput on AHV, the next generation hypervisor.

While most of the response to this was positive, the usual negativity came from some competitors who tried to spread fear, uncertainty and doubt (FUD) about the performance including claims it was not sustainable during/after a live migration (vMotion) and that is does not demonstrate the performance of the IO path.

Let’s quickly cover of the IO path discussion of in-kernel vs a controller VM.

To test the IO path, in the case of Nutanix, via the Controller VM, you want to eliminate as many variables and bottlenecks as possible. This means a read/write test is not valid as writes are dependant on factors such as the network. As this was one a node using NVMe, the bottleneck would quickly become the network and not the path between the user VM and controller VM.

I’ve previously tweeted (below) showing an example of the throughput capabilities of SATA SSD, NVMe and 3DxPoint which clearly shows the network is the bottleneck with next generation flash.

I’ve also responded to 3rd party FUD about Nutanix Data locality with a post which goes in depth about Nutanix original & unique implementation of Data Locality which is how Nutanix minimises its dependancy on the network to deliver excellent performance.

So we are left with read IO to actually test and possibly stress the IO path between a User VM and software defined storage, be that in-kernel or in user space which is where the Nutanix CVM runs.

The tweet showing >1 million 8k random read IOPS and >8GBps throughput shows that the IO path of Nutanix is efficient enough to achieve this at just 110 micro (not milli) seconds.

The next question from those who try to discredit Nutanix and HCI in general is what happens after a vMotion?

Let me start by saying this is a valid question, but even if performance dropped during/after a vMotion is it even a major issue?

For business critical applications, it is common for vendors to recommend DRS should/must rules to prevent vMotion exception for in the event of maintenance or failure regardless of the infrastructure being traditional/legacy NAS/SAN or HCI.

With a NAS/SAN, the best case scenario is 100% remote IO where as with Nutanix this is the worse cast scenario. Let’s assume business as usual on Nutanix is 1M IOPS and during a vMotion and for a few mins after that performance dropped by 20%.

That would still be 800k IOPS which is higher than what most NAS/SAN solutions can delivery anyway.

But the fact is, Nutanix can sustain excellent performance during and after a vMotion as demonstrated by the video below which was recorded in real time. Hint: Watch the values in the putty session as these show the performance as measured at the guest level which is what ultimately matters.

Credit for the video goes to my friend and colleague Michael “Webscale” Webster (VCDX#66 & NPX#007).

The IO dropped below 1 million IOPS for approx 3 seconds during the vMotion with the lowest value recorded at 956k IOPS. I’d say an approx 10% drop for 3 seconds is pretty reasonable as the performance drop is caused by the migration stunning the VM and not by the underlying storage.

Over to our “friends” at the legacy storage vendors to repeat the same test on their biggest/baddest arrays.

Not impressed? Let’s see what 70/30 read/write workload performs!