Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV) – Part 1 – The Baseline Test

The following is Part 1 of the Jetstress performance testing on Nutanix Acropolis Hypervisor (AHV) series of videos.

This video shows the following:

  1. Stopping and starting the NDSF cluster to ensure a fair starting point (no artificial pre-warming of cache, etc.)
  2. The performance required for 2500 Exchange users (100 messages/day with 2 DAG copies), which is 732 Jetstress IOPS according to the MS Exchange Server Role Requirements Calculator.
  3. The performance achieved by Jetstress with 8 threads using 8 vDisks (4 for DB, 4 for logs)

The demonstration is limited to 2500 users because the virtual machine's compute requirements already exceed the maximum recommended RAM for an Exchange 2013/2016 server (96GB). As such, no additional storage performance is required, as compute is more often than not the constraining factor.

For more information see: Peak performance vs Real World – Exchange

Note: This demonstration does not show the peak performance that Jetstress can achieve on Nutanix. In fact, it is running on a ~3 year old NX-3450 with Ivy Bridge processors, and Jetstress is tuned (as the video shows) to a low thread count, which still achieves >3x the required IOPS for 2500 Exchange users.
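As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The 732 IOPS requirement and the 2500 users come from this post; the function names are purely illustrative, and the "achieved" value in the example is a hypothetical placeholder, not a result measured in the video.

```python
# Figures taken from this post.
REQUIRED_IOPS = 732   # calculator result for 2500 users, 2 DAG copies
USERS = 2500


def iops_per_user(required_iops: float, users: int) -> float:
    """Average Jetstress IOPS the calculator budgets per mailbox user."""
    return required_iops / users


def min_iops_for_multiple(required_iops: float, multiple: float) -> float:
    """IOPS a test must reach to hit a given multiple of the requirement."""
    return required_iops * multiple


def headroom(achieved_iops: float, required_iops: float) -> float:
    """How many times over the requirement an achieved result is."""
    return achieved_iops / required_iops


if __name__ == "__main__":
    print(round(iops_per_user(REQUIRED_IOPS, USERS), 4))
    print(min_iops_for_multiple(REQUIRED_IOPS, 3))
    # Hypothetical achieved value, for illustration only.
    placeholder_achieved = 2400.0
    print(round(headroom(placeholder_achieved, REQUIRED_IOPS), 2))
```

In other words, any Jetstress result above 2196 IOPS (3 × 732) satisfies the ">3x" statement for this user count.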

Part 1


Microsoft Exchange 2013/2016 Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV)

Virtualization of business-critical applications has been commonplace for a number of years; however, it is less well known that these applications are also regularly deployed on Nutanix Hyper-converged Infrastructure (HCI), as I discuss in the following post:

Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again!

I am regularly involved in discussions with customers about how well MS Exchange and other business-critical applications perform on Nutanix, especially during:

  • Storage software upgrades (Acropolis Base Software)
  • Hypervisor upgrades
  • VM migrations (e.g. vMotion)
  • Failure scenarios

Customers also ask how Data Locality works with workloads like Exchange that hold large amounts of data: what overheads there are, if any, how much data is served locally versus remotely, and so on.

As a result, I have created a series of videos demonstrating the following:

  • Setting a baseline for Jetstress performance on Node 1
  • Migrating the VM to a 2nd node and repeating the Jetstress performance test
  • Migrating the VM to a 3rd node and repeating the Jetstress performance test
  • Migrating the VM to a 4th node and repeating the Jetstress performance test
  • Migrating the VM back to the 1st node and repeating the Jetstress performance test
  • Repeating the test on the 2nd, 3rd and 4th nodes (second Jetstress run for comparison)
  • Performing a Jetstress performance test on a VM with the local Nutanix Controller VM (CVM) offline (to simulate a CVM failure, Storage Maintenance or Upgrade scenarios)

During the above videos I will show advanced Nutanix Distributed Storage Fabric (NDSF) performance statistics, such as how write I/O is being served and what percentage of data is being served locally versus remotely.

Enjoy the videos:

Part 1 – Setting a baseline for Jetstress performance on Nutanix AHV

Part 2 – Migrating Jetstress to 2nd node and repeating Jetstress test

Part 3 – Migrating Jetstress to 3rd node and repeating Jetstress test

Part 4 – Migrating Jetstress to 4th node and repeating Jetstress test

Parts 5 through 8 – Repeat Jetstress tests on all four nodes (Coming soon)

Part 9 – Take the local Nutanix Controller VM (CVM) offline and repeat test (Coming soon)

Part 10 – Scale out Performance Validation (Coming soon)

Related Articles:

Nutanix AHV I/O path efficiency

The I/O path in AHV is unlike that of other hypervisors and is remarkably simple. Each VM is made up of one or more vDisks, with each vDisk presented directly to the VM via iSCSI. vDisks appear to the guest OS as physical disks, just as a VMDK does in vSphere environments, and do not require any special in-guest configuration.

The I/O path for each vDisk bypasses the underlying QEMU storage stack and has a direct TCP connection to the iSCSI target on the local Controller VM. This bypasses any/all queues at the hypervisor layer and allows Stargate to manage the one and only queue.

Importantly, every vDisk has its own TCP connection to Stargate, which means vDisks do not share any queues until they hit the storage controller (Stargate). This limits the points of contention to Stargate itself, and because every AHV node runs a Stargate instance (within the CVM), only VMs on the same node share that Stargate queue, further reducing the chance of contention.

For those of you who are not familiar with the underlying Nutanix architecture, check out the video below describing what Stargate does.

Because the vDisk is presented as a LUN via iSCSI, the commands being sent do not require SCSI protocol emulation; they are simply sent as native SCSI commands.

The diagram below shows a VM with 3 vDisks and how they connect to Stargate. You will note that QEMU is completely bypassed, which optimises the I/O path.

Figure: AHV I/O path

If a virtual machine has more than 3 vDisks, each additional vDisk will have its own TCP connection.
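To make the one-connection-per-vDisk idea concrete, here is a minimal Python sketch of the model described above. It is a toy illustration only: the `VM` class, the `cvm-<node>` naming and the helper functions are all hypothetical and do not correspond to any Nutanix API or to how AHV implements this internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VM:
    name: str
    node: str           # AHV node the VM currently runs on
    vdisks: List[str]   # vDisk names attached to the VM


def connections_for(vm: VM) -> List[Tuple[str, str, str]]:
    """One (vm, vdisk, local CVM) connection per vDisk; none are shared."""
    local_cvm = f"cvm-{vm.node}"  # hypothetical naming for the local CVM
    return [(vm.name, vdisk, local_cvm) for vdisk in vm.vdisks]


def connections_per_stargate(vms: List[VM]) -> Dict[str, int]:
    """Count connections landing on each node's Stargate instance."""
    counts: Dict[str, int] = {}
    for vm in vms:
        for _, _, cvm in connections_for(vm):
            counts[cvm] = counts.get(cvm, 0) + 1
    return counts


if __name__ == "__main__":
    # Mirrors the Jetstress VM in Part 1: 4 DB vDisks and 4 log vDisks.
    jetstress = VM("jetstress", "node1",
                   [f"db{i}" for i in range(1, 5)] +
                   [f"log{i}" for i in range(1, 5)])
    other = VM("other-vm", "node2", ["os", "data"])
    print(connections_per_stargate([jetstress, other]))
```

In this model the two VMs never meet in a shared queue, because they sit on different nodes and so talk to different Stargate instances. Only VMs placed on the same node would add to the same count.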

In the event the local Stargate instance is offline for any reason (e.g. a rolling One-Click upgrade or CVM failure), each TCP connection will be redirected in a round-robin manner across all the CVMs within the Nutanix cluster, as described in Acropolis Hypervisor (AHV) I/O Failover & Load Balancing.
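The round-robin redirection can be pictured with a short Python sketch. This is an illustration of the general technique only, not the actual AHV logic. It assumes the offline CVM is excluded from the rotation, and the function name and CVM/vDisk names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def redirect_round_robin(vdisks: List[str], cvms: List[str],
                         offline_cvm: str) -> Dict[str, str]:
    """Assign each vDisk connection to a healthy CVM in round-robin order.

    Assumption: the offline CVM is skipped; the rest are used in list order.
    """
    healthy = [cvm for cvm in cvms if cvm != offline_cvm]
    if not healthy:
        raise RuntimeError("no healthy CVM available")
    targets = cycle(healthy)
    return {vdisk: next(targets) for vdisk in vdisks}


if __name__ == "__main__":
    cvms = ["cvm-1", "cvm-2", "cvm-3", "cvm-4"]
    vdisks = [f"vdisk{i}" for i in range(1, 9)]  # 8 vDisks, as in Part 1
    for vdisk, cvm in redirect_round_robin(vdisks, cvms, "cvm-1").items():
        print(vdisk, "->", cvm)
```

With 8 vDisks and 3 remaining CVMs, the connections spread 3/3/2 across the healthy CVMs, so no single remote controller has to absorb the whole VM's I/O.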

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Advanced Storage Performance Monitoring with Nutanix

3. Why AHV is the next generation hypervisor – 10 Part Series