Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV) – Part 4 – Performance after three VM Migrations

This is Part 4 of the Jetstress performance testing on Nutanix Acropolis Hypervisor (AHV) series of videos.

This video shows the following:

  1. Jetstress performance on the fourth node in the cluster after the VM has already been migrated in quick succession from its original node to a 2nd and 3rd node, both of which have run a Jetstress test.
  2. Nutanix Controller VM (CVM) CPU utilization

This test is the fourth of several tests to show the performance impact of migrating VMs around several nodes in the same cluster, which results in not all data being stored and accessible locally.

These post-migration tests are designed to show the worst case performance when business critical workloads with large data sets (such as Exchange) move around nodes within a Nutanix AHV cluster. By running Jetstress on multiple nodes in quick succession, the workload on the Nutanix Controller VMs (CVMs) to localize data and maintain an even balance of data across all drives (SSD/HDD) and nodes within the cluster is higher than normal.

While migrating business critical applications around multiple nodes in quick succession (especially while running intensive benchmarks at much higher IOPS than the real world requires) is not something that will occur (unless initiated by an administrator), it demonstrates the performance impact (or lack thereof) of even this unrealistic operation.

Note: As with previous videos, this demonstration is not showing the peak performance which can be achieved by Jetstress on Nutanix.

Part 4

Return to the Table of Contents

Related Articles:

How to successfully Virtualize MS Exchange – Part 17 – Virtual Machine Storage Configuration

In addition to Part 16, where we discussed Virtual Disk Provisioning options and recommendations, in this part we will cover how to optimally configure a Virtual Machine for an Exchange MBX/MSR workload from a virtual storage controller perspective.

Once you have made the decision on storage platform, and assuming you have chosen to use VMFS or NFS datastores (and not iSCSI in-Guest or RDMs), then this article is for you.

Virtual Machines, just like physical servers, have SCSI controllers (albeit virtual), and ESXi has a number of options to choose from, which include:

1. BusLogic Parallel
2. LSI Logic Parallel
3. LSI Logic SAS
4. Paravirtual SCSI (PVSCSI)
5. AHCI SATA Controller

When creating a new virtual machine, the default adapter for Windows 2008 and 2012 is “LSI Logic SAS” because Windows does not include the PVSCSI driver by default.

BusLogic Parallel and LSI Logic Parallel adapters are not recommended for Windows 2008/2012 as they are legacy controllers with lower performance. As such, I will not cover these in any more detail as they are irrelevant to Exchange deployments.

Instead I will cover the LSI Logic SAS, AHCI SATA Controller and Paravirtual SCSI (PVSCSI) adapters.

Starting with LSI Logic SAS.

This is the default controller for Windows 2008/2012 VMs and, as a result, it is very common to see Exchange deployments using this controller. It has good performance and works out of the box with a Windows install without requiring drivers.

Advantages:

1. The default Controller for Windows 2008/2012
2. No need for manually inserting drivers to install Windows
3. Higher performance than AHCI SATA controller

Disadvantages:

1. Lower performance than PVSCSI
2. Higher CPU overheads in Guest compared to PVSCSI
3. Higher latency than PVSCSI
4. Lower maximum number of VMDKs supported per controller (15) compared to AHCI SATA (30)

Next let’s discuss the AHCI SATA Controller.

The AHCI SATA controller is new in vSphere 5.5 and is only supported in Virtual Machines with Hardware version 10. The SATA controller can be used on its own or in addition to LSI or PVSCSI controllers to provide additional VMDKs / Capacity, which increases a single VM's maximum capacity from ~3.7PB to over 11PB.
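The capacity figures above can be reproduced with some quick arithmetic. The following is a minimal sketch, assuming four SCSI controllers per VM (not stated explicitly in this article, but consistent with the ~3.7PB figure), the 15 and 30 VMDKs per controller quoted in this post, and the 62TB maximum VMDK size noted later in this post. The constant and function names are only illustrative.

```python
# Per-VM figures quoted in this post (vSphere 5.5, VM hardware version 10).
MAX_VMDK_TB = 62         # maximum size of a single VMDK
SCSI_CONTROLLERS = 4     # assumption: four SCSI controllers per VM
VMDKS_PER_SCSI = 15      # LSI Logic SAS / PVSCSI
SATA_VMDKS_TOTAL = 120   # 30 per AHCI SATA controller, 120 in total


def max_capacity_pb(vmdk_count, vmdk_tb=MAX_VMDK_TB):
    """Theoretical maximum capacity in PB (decimal: 1 PB = 1000 TB)."""
    return vmdk_count * vmdk_tb / 1000


scsi_vmdks = SCSI_CONTROLLERS * VMDKS_PER_SCSI            # 60 VMDKs
scsi_only_pb = max_capacity_pb(scsi_vmdks)
scsi_plus_sata_pb = max_capacity_pb(scsi_vmdks + SATA_VMDKS_TOTAL)

print(f"SCSI controllers only: {scsi_only_pb:.2f} PB")
print(f"SCSI plus SATA:        {scsi_plus_sata_pb:.2f} PB")
```

These are theoretical ceilings only. As noted elsewhere in this post, very large VMDKs may come with reduced performance.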

Advantages:

1. Can support 30 VMDKs per Controller (120 total) compared to 15 for LSI / PVSCSI
2. Can be used in addition to PVSCSI controllers to provide more storage performance and capacity per Exchange VM (if required)
3. Higher capacity supported per controller than LSI Logic / PVSCSI

Disadvantages:

1. Higher CPU utilization per IO compared to LSI / PVSCSI options
2. Lower overall performance compared to LSI and PVSCSI
3. Higher latency compared to LSI and PVSCSI

And finally, the Paravirtual SCSI Controller.

The PVSCSI controller is the highest performing controller. It has been supported since ESXi 4.0, is designed for high performance storage environments and is available for virtual machines running hardware version 7 and later.

Advantages:

1. Performance, Performance, Performance. Oh yeah, and did I mention performance?
2. Lower Latency and Higher IOPS compared to other controllers
3. Lower CPU overhead on the Guest OS (and therefore ESXi)
4. More CPU is available for Exchange due to lower CPU overheads

Disadvantages:

1. Windows Failover Clustering is not supported, but this has no impact on MS Exchange including DAG deployments.
2. PVSCSI is not the default and requires inserting drivers into the Windows installation OR the VM to be built on LSI Logic SAS and once VMware Tools is installed, swapping to PVSCSI.
3. Lower maximum VMDKs supported per controller (15) compared to AHCI SATA (30)

Performance Comparison

From a performance perspective, Michael Webster (VCDX#66) wrote this great post “VMware vSphere 5.5 Virtual Storage Adapter Performance” and produced the following graph showing a comparison between SATA, LSI Logic SAS and PVSCSI controllers from an IOPS and latency perspective.

VMware-vSphere-5.5-Virtual-Storage-Adapter-Performance

As we can see, the PVSCSI adapter has significantly lower latency and higher IOPS than the SATA and LSI Logic SAS controllers, even when running on the same underlying storage.

While the Microsoft Exchange team has successfully reduced I/O requirements across versions (2007-2013), the performance advantages of PVSCSI also have a positive effect on vCPU utilization.

Michael’s post states:

It (PVSCSI Controller) also had the lowest CPU usage. During the 32 OIO test SATA showed 52% CPU utilization vs 45% for LSI Logic SAS and 33% for PVSCSI.

What this means is that less CPU is used for I/O, and lower average latency means more CPU is available for MS Exchange along with less CPU WAIT time (where the CPU is waiting for IO to complete before continuing). This means you're onto a winner, especially considering Exchange 2013 is very CPU intensive.
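To put the quoted CPU figures in perspective, here is a small sketch that works out the relative CPU saving of PVSCSI over the other two controllers. It uses only the three utilization numbers quoted above from the 32 OIO test; the dictionary and function names are just for illustration, and the result says nothing about other workloads or queue depths.

```python
# CPU utilization (%) during the 32 OIO test, as quoted from Michael's post.
cpu_at_32_oio = {"SATA": 52, "LSI Logic SAS": 45, "PVSCSI": 33}


def relative_saving(baseline_pct, candidate_pct):
    """Percentage reduction in CPU utilization relative to the baseline."""
    return (baseline_pct - candidate_pct) / baseline_pct * 100


for controller in ("SATA", "LSI Logic SAS"):
    saving = relative_saving(cpu_at_32_oio[controller], cpu_at_32_oio["PVSCSI"])
    print(f"PVSCSI used {saving:.1f}% less CPU than {controller}")
```

In that test PVSCSI used roughly a quarter less CPU than LSI Logic SAS and over a third less than SATA for the same I/O load.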

Which Controller should be used for Exchange VMs?

VMware have published the KB article “Do I choose the PVSCSI or LSI Logic virtual adapter on ESX\ESXi 4.0 for non-IO intensive workloads? (1017652)” which in summary explains:

The test results show that PVSCSI is better than LSI Logic, except under one condition–the virtual machine is performing less than 2,000 IOPS and issuing greater than 4 outstanding I/Os. This issue is fixed in vSphere 4.1 and later version, so that the PVSCSI virtual adapter can be used with good performance, even under this condition.

 

As the one case where LSI Logic could outperform PVSCSI only applies prior to vSphere 4.1, there are no significant downsides to using PVSCSI compared to LSI. As such, I recommend always using (multiple) PVSCSI adapters.

Now that we have decided on the PVSCSI adapter, what’s next?

As with physical servers, Virtual SCSI controllers including PVSCSI have their limits in terms of performance and scalability. To ensure maximum scalability, performance and low latency, multiple PVSCSI adapters should be used with all VMDKs evenly spread over the PVSCSI adapters as recommended in Part 11.

To do this, when adding a VMDK to the Exchange VM, ensure you select a different SCSI controller (controllers are created automatically on demand) by using the drop down box “Virtual Device Node” and selecting, for example, SCSI (1:0) as shown below.

MSRVMPVSCSI10

For subsequent VMDKs you must then select SCSI (2:0) as shown below.

MSRVMPVSCSI20

And then SCSI (3:0)

MSRVMPVSCSI30

For the fourth VMDK, you then select SCSI (0:1) because SCSI (0:0) is taken by the VMDK used for the guest OS.

MSRVMPVSCSI01

Repeat the above process until you have sufficient VMDKs for your Exchange server VM.

The following illustrates my recommended configuration showing how to configure a VM supporting 8 database drives and 8 log drives.

PVSCSIVMDKs

The above configuration will ensure maximum storage performance and can be expanded in the same configuration to support more than 3 times the number of databases + logs shown above and as such it is suitable for even very large (scale-up) Exchange MBX/MSR VMs.

For example, if each VMDK in the above configuration was just 4TB in size it would give you 64TB usable capacity and the VM can be scaled more than 3x the number of VMDKs.
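The device node pattern described above (SCSI (1:0), then (2:0), (3:0), (0:1) and so on) can be sketched as a simple round-robin. This is only an illustration of the numbering scheme, not a tool that talks to vSphere: the function name is hypothetical, the OS VMDK is assumed to sit on SCSI (0:0), and unit 7 is skipped on the assumption that it is reserved for the controller itself.

```python
CONTROLLER_ORDER = [1, 2, 3, 0]  # controller 0 goes last; it already holds the OS VMDK
RESERVED_UNIT = 7                # assumption: unit 7 is reserved for the controller
MAX_UNIT = 15


def assign_device_nodes(num_data_vmdks):
    """Return (controller, unit) pairs for data VMDKs, spread round-robin."""
    next_unit = {0: 1, 1: 0, 2: 0, 3: 0}  # SCSI (0:0) is taken by the guest OS VMDK
    nodes = []
    for i in range(num_data_vmdks):
        controller = CONTROLLER_ORDER[i % len(CONTROLLER_ORDER)]
        unit = next_unit[controller]
        if unit == RESERVED_UNIT:
            unit += 1  # skip the reserved unit number
        if unit > MAX_UNIT:
            raise ValueError("no free virtual device nodes left")
        nodes.append((controller, unit))
        next_unit[controller] = unit + 1
    return nodes


# 8 database + 8 log VMDKs, as in the recommended configuration above.
for controller, unit in assign_device_nodes(16):
    print(f"SCSI ({controller}:{unit})")
```

With 16 data VMDKs each of the four controllers ends up with four data VMDKs, which leaves plenty of free device nodes for scaling up.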

Note: VMDKs can scale to 62TB (from vSphere 5.5) each although this may result in reduced performance.

TIP: Don’t forget to spread VMDKs evenly across datastores as per the recommendation in Part 11.

Recommendations for Exchange VM Storage Configuration:

1. Use multiple Paravirtual SCSI (PVSCSI) Adapters.
2. Use one VMDK per Database or Logs
3. Spread VMDKs evenly across multiple PVSCSI adapters
4. Spread VMDKs evenly across multiple datastores when using VMFS datastores
5. Spread VMDKs evenly across multiple datastores when using NFS datastores ensuring NFS datastores are served via multiple NAS controllers
6. Use more VMDKs as opposed to fewer larger VMDKs
7. Format NTFS volumes with an Allocation Unit Size of 64k
8. Keep it simple, do not mix virtual SCSI controller types.
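Some of these recommendations are easy to check mechanically. The sketch below is a hypothetical helper (it does not use any VMware API) which takes a list of data VMDKs described as (controller type, controller number) pairs and flags layouts that break recommendations 1, 3 and 8. The "pvscsi" label and the tolerance of one VMDK difference between adapters are my own assumptions for the example.

```python
from collections import Counter


def check_layout(disks):
    """Check data VMDKs, given as (controller_type, controller_number) pairs.

    Returns a list of warnings; an empty list means no issues were found.
    """
    warnings = []
    controller_types = {ctype for ctype, _ in disks}
    if controller_types != {"pvscsi"}:
        warnings.append("use PVSCSI only and do not mix controller types")
    per_controller = Counter(number for _, number in disks)
    if len(per_controller) < 2:
        warnings.append("use multiple PVSCSI adapters")
    elif max(per_controller.values()) - min(per_controller.values()) > 1:
        warnings.append("spread VMDKs evenly across the adapters")
    return warnings


# 16 data VMDKs spread over four PVSCSI adapters: no warnings expected.
good = [("pvscsi", i % 4) for i in range(16)]
print(check_layout(good))

# Mixed controller types with an uneven spread.
bad = [("lsilogicsas", 0)] * 4 + [("pvscsi", 1)]
print(check_layout(bad))
```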

Back to the Index of How to successfully Virtualize MS Exchange.

Virtualizing Business Critical Applications – The Web-Scale Way!

Since joining Nutanix back in July 2013, I have been working on testing the performance and resiliency of a range of virtual workloads including Business Critical Applications on the Nutanix platform. At the time, Nutanix only offered a single form factor (4 nodes in 2RU) which was not always a perfect fit depending on customer requirements.

Fast forward to August 2014 and now Nutanix has a wide range of node types to meet most workload requirements which can be found here.

The only real gap in the node types was a node which would support applications with large capacity requirements and also have a very large active working set which requires consistent low latency and high performance regardless of tier.

So what do I mean when I say “Active Working Set”? I would define this as the data being regularly accessed by the VM/s. For example, a file server may have 10TB of data, but users only access 10% on a regular basis. This 10% I would classify as the Active Working Set.

Now back to the topic at hand. The reason I am writing this post is that this is a project I have been working towards for some time, and I am very excited about this product being released to the market. I have no doubt it will further increase the already fast uptake of Web-scale solutions and provide significant value and opportunities to new and existing customers wanting to simplify their datacenter/s and standardize on Nutanix Web-scale architecture.

Along with many others at Nutanix, we proposed a new node type (being the NX-8150), which has been undergoing thorough testing in my team (Solutions & Performance Engineering) for some time and I am pleased to say is being officially released (very) soon!

nx8050

What is the NX-8150?

A 1 Node per 2RU platform with the following specifications:

* 2 CPU Sockets with two CPU options (E5-2690v2 [20 cores / 3.0 GHz] OR E5-2697v2 [24 cores / 2.7 GHz])
* 4 x Intel 3700 Series SSDs (ranging from 400GB to 1.6TB ea)
* 20 x 1TB SATA HDDs
* Up to 768GB RAM
* Up to 4 x 10Gb NICs
* 4 x 1Gb NICs
* 1 x IPMI (Out of band Management)

What is the use case for the NX-8150?

Simply put, Applications which have high CPU/RAM requirements with large active working sets and/or the requirement for consistent high performance over a large data set.

Some examples of these applications include:

* Microsoft Exchange including DAG deployments
* Microsoft SQL including AlwaysOn Availability Groups
* Oracle including RAC
* SAP
* Microsoft SharePoint
* Mixed Production Server Workloads with varying Capacity & I/O requirements

The NX-8150 is a great platform for the above workloads as it not only has fast CPUs and up to a massive 768GB of RAM to provide substantial compute resources to VMs, but also up to a massive 6.4TB of raw SSD capacity for virtual machines with high IO requirements. For workloads where peak performance is not critical, the NX-8150 also provides solid, consistent performance across the “Cold Tier” provided by the 20 x 1TB HDDs.
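The raw capacity numbers follow directly from the specifications listed above, and they can be put next to the Active Working Set example from earlier in this post (10% of a 10TB file server). The sketch below only deals with raw per-node capacity; usable capacity will be lower and depends on factors this post does not cover, so treat the output as an upper bound. All names in the code are illustrative.

```python
SSD_COUNT = 4
SSD_SIZES_TB = (0.4, 1.6)     # smallest and largest SSD option
HDD_COUNT, HDD_TB = 20, 1.0


def raw_ssd_tb(ssd_size_tb):
    """Raw SSD capacity per node for a given SSD size."""
    return SSD_COUNT * ssd_size_tb


def active_working_set_tb(total_data_tb, active_fraction):
    """Size of the regularly accessed portion of a data set."""
    return total_data_tb * active_fraction


raw_hdd_tb = HDD_COUNT * HDD_TB
working_set_tb = active_working_set_tb(10, 0.10)  # the file server example

for size in SSD_SIZES_TB:
    print(f"{SSD_COUNT} x {size}TB SSD = {raw_ssd_tb(size):.1f}TB raw SSD")
print(f"{HDD_COUNT} x {HDD_TB:.0f}TB HDD = {raw_hdd_tb:.0f}TB raw HDD")
print(f"Example active working set: {working_set_tb:.1f}TB")
```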

As with all Nutanix nodes, Information Lifecycle Management (ILM) maximizes performance by dynamically migrating hot data to SSD and cold data to SATA, providing the best of both worlds: high IOPS and high capacity.

One of the many major advantages of Nutanix Web-Scale architecture is simplicity and its ability to remove the requirement for application specific silos! Now, with the addition of the NX-8150, the vast majority of workloads including Business Critical Applications can be run successfully on Nutanix, meaning fewer silos are required, resulting in a simpler, more cost effective, scalable and resilient datacenter solution.

Now, with a number of customers already placing advance orders for NX-8150s to deploy Business Critical Applications, it won’t be long until the now common “Virtual 1st” policies within many organisations turn into a “Nutanix Web-Scale 1st” policy!

Stay tuned for upcoming case studies for NX-8150 based Web-Scale solutions!