Scale out performance testing with Nutanix Storage Only Nodes

At Nutanix's inaugural user conference in 2015, Storage Only nodes were announced, allowing customers for the first time to scale capacity without having to add compute nodes. This gives customers more flexibility and eliminates the need to license the storage nodes for vSphere, as storage only nodes run Acropolis Hypervisor (AHV) and are managed entirely through Prism.

A common question from prospective and existing Nutanix customers is: what if my VM's storage exceeds the capacity of a Nutanix node? The answer is detailed in this blog post, but in short, because the Acropolis Distributed Storage Fabric (ADSF) distributes data throughout the cluster at 1MB granularity, a VM's storage can exceed the capacity of the local node, and performance even improves, including for reads from the capacity (SAS/SATA) tier.
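To make the 1MB granularity point concrete, here is a minimal Python sketch of how a vDisk larger than the local node's free capacity can still be stored across the cluster. The function `place_extents`, the round-robin spread and the example sizes are all hypothetical; ADSF's actual placement logic is more sophisticated and also places the RF replicas, which this sketch ignores.

```python
from collections import Counter

EXTENT_MB = 1  # ADSF distributes data at 1MB granularity


def place_extents(vdisk_gb, local_node, nodes, local_free_gb):
    """Hypothetical placement of one copy of a vDisk's 1MB extents.

    Fills free space on the local node first, then spreads the remaining
    extents round-robin across the other nodes. Replicas (RF2/RF3) and
    ADSF's real placement heuristics are deliberately ignored.
    """
    total_extents = vdisk_gb * 1024 // EXTENT_MB
    local_extents = min(total_extents, local_free_gb * 1024 // EXTENT_MB)
    placement = Counter({local_node: local_extents})
    remote_nodes = [n for n in nodes if n != local_node]
    for i in range(total_extents - local_extents):
        placement[remote_nodes[i % len(remote_nodes)]] += 1
    return placement


# A 10GB vDisk on node B, which only has 4GB free locally.
placement = place_extents(vdisk_gb=10, local_node="B",
                          nodes=["A", "B", "C", "D"], local_free_gb=4)
for node in sorted(placement):
    print(f"Node {node}: {placement[node] * EXTENT_MB} MB")
```

The VM's data is not capped by node B's capacity: the remainder simply lives on (and is served by) the other nodes' CVMs.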

Storage only nodes were previously limited to the NX-6035C (and Dell XC/Lenovo HX equivalents), but at Nutanix's .NEXT conference in Las Vegas in 2016, it was announced that any node (including all-flash) can be a storage only node.

This means that even in high performance and/or high capacity environments, Nutanix clusters can be scaled without the need to add compute nodes or, if you are running vSphere as the hypervisor, purchase additional licensing.

However, to date Nutanix are yet to publish any performance data showing the value of storage only nodes, so I decided to run a few tests to demonstrate the value of the Acropolis Distributed Storage Fabric (ADSF) and storage only nodes.

Before we get to the performance data: to avoid competitors' inevitable attempts to create FUD about Nutanix performance, I will not be publishing the exact specifications of the node types, drives or Jetstress configuration. I will be publishing the IOPS/latency and the database creation, duplication and checksumming durations from the direct comparisons, which clearly show the performance advantage of storage only nodes.

Jetstress was not configured to demonstrate the maximum performance of the underlying Nutanix solution; it was configured to achieve around 1000 IOPS, which is typically higher than even a large Exchange deployment requires per instance. This also allows the test to demonstrate how performance improves when the cluster is performing real world levels of IO (at least in the case of Exchange for this example).

The performance advantage will vary between node types and with how many storage only nodes are added to the cluster. But the point of this example is to show that ADSF is a truly distributed storage fabric, and that the storage only nodes and additional Nutanix Controller VMs (CVMs) servicing replication (RF) traffic and remote reads significantly improve performance for VMs residing on the Compute+Storage nodes.

Test Overview:

The first test will be performed using four Jetstress VMs running on a four node cluster. The second test will be performed after an additional four storage only nodes are added to the cluster to form an eight node cluster. Before the second test the cluster will be wiped of all data with the exception of the Windows 2012 R2 template and all Jetstress DBs will be created from scratch so we can compare DB creation as well as performance and DB checksumming durations. Wiping all data also ensures there is no pre-warming of the extent cache (in memory read cache) or metadata cache.

Test Preparation:

I performed a cluster stop / cluster destroy / cluster create to ensure the cluster is totally clean and that we have a fair baseline for the test. The cluster was made up of four nodes.

I then created a base Windows 2012 R2 virtual machine with 4 PVSCSI adapters and 9 vDisks: one for the OS, 4 for the DBs and 4 for the logs. DB drives were formatted with a 64k allocation unit size and log drives with 4k. Using different allocation unit sizes and separate virtual disks has shown an approx. 25% performance improvement in my testing, not to mention that I recommend In-Line compression and Erasure Coding (EC-X) for Exchange databases and no data reduction for logs.

Jetstress was configured to use 80% of the vDisks' capacity, which resulted in approx. 80% of the Nutanix storage pool capacity being utilised for the test. I will point out that these were not low capacity nodes such as NX-3060s, so the database creation time is significant because there was lots of data to create.

I then cloned the VM 3 times and spread the 4 VMs across 4 Nutanix nodes running ESXi 5.5 Update 3.

Test 1: Create Databases and run 2hr test

The database creation phase creates one database, then Jetstress duplicates it (in this case 3 times), and immediately after creation the performance test begins.

Note: No data reduction was used for this test as it will result in unrealistic data reduction and performance results as I described in the post Jetstress Testing with Intelligent Tiered Storage Platforms.

I configured Jetstress in this way to ensure the extent cache (in memory read cache) was not pre-warmed and so the results of the test would be fair and repeatable.

Once the performance tests finished, I waited for all four to complete before allowing the database checksum validation tasks to run. (This is done using the Multi-host option in Jetstress.)

The results for each of the four Jetstress VMs are shown below, including the average across the VMs for each of the different metrics.

(Image: Jetstress results summary, four node cluster)

Observations from Test 1:

  1. We achieved the desired >1000 IOPS per VM
  2. Performance was consistent across all Jetstress instances
  3. Log writes were in the 1ms range as they were serviced by the ADSF Oplog (persistent write buffer)
  4. Database reads were on average just under 10ms which is well below the Microsoft recommended 20ms
  5. The Database creation time averaged 2hrs 24mins
  6. The duplication of 3 databases averaged 4hrs 17mins
  7. The database checksum took on average around 38mins

Test 2: Delete all data, Add four nodes to the cluster & repeat test 1

All Jetstress VMs were deleted and a full Curator scan was manually initiated to ensure all data was removed from disk before beginning the next test, which ensured a fair baseline.

Four Jetstress VMs were then deployed from the same template, powered on and the saved Jetstress configuration was applied before beginning the test.

Note: The Jetstress thread count was not changed and remained the same as for Test 1.

As with Test 1, the database creation phase created one database, Jetstress then duplicated it 3 times, and immediately after creation the performance test began and ran for the same 2hr duration.

The results for each of the four Jetstress VMs are shown below, including the average across the VMs for each of the different metrics.

(Image: Jetstress results summary, eight node cluster)

Observations from Test 2:

  1. Achieved IOPS jumped by almost 2x
  2. Average log write latency was 13% lower
  3. Database write latency dropped by >20%
  4. Database read latency dropped by almost 2x
  5. The Database creation time was just under 15 mins faster
  6. The duplication of 3 databases improved by almost 35 mins
  7. The database checksum was 40 seconds faster.
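The duration savings in the Test 2 observations are easier to compare as percentages. This small Python calculation uses the Test 1 averages and treats "just under 15 mins" and "almost 35 mins" as exactly 15 and 35 minutes, which is an approximation, so the results are rough.

```python
def pct_faster(before_s, saved_s):
    """Percentage reduction in a duration, given the time saved."""
    return 100 * saved_s / before_s


# (Test 1 average duration, approximate Test 2 saving), both in seconds.
# "Just under 15 mins" and "almost 35 mins" are rounded to 15 and 35 minutes.
phases = {
    "DB creation": (2 * 3600 + 24 * 60, 15 * 60),
    "DB duplication": (4 * 3600 + 17 * 60, 35 * 60),
    "DB checksum": (38 * 60, 40),
}

improvements = {name: pct_faster(*times) for name, times in phases.items()}
for name, pct in improvements.items():
    print(f"{name}: ~{pct:.1f}% faster")
```

With these rounded inputs, creation comes out roughly 10% faster, duplication roughly 14% faster and checksumming under 2% faster.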

Without changing the Jetstress thread count, the improved performance of the cluster meant the achieved IOPS almost doubled!
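One way to see why a fixed thread count can yield almost double the IOPS is Little's Law: throughput equals concurrency divided by latency. The Python sketch below is illustrative only. The outstanding I/O count is hypothetical, treating the Jetstress thread count as a fixed number of outstanding I/Os is a simplification, and the 10ms and 5ms read latencies are round numbers loosely based on the observations above.

```python
def iops_from_littles_law(outstanding_ios, latency_ms):
    """Little's Law: throughput (IOPS) = concurrency / latency (seconds)."""
    return outstanding_ios / (latency_ms / 1000)


OUTSTANDING_IOS = 10  # hypothetical; held constant, like the Jetstress thread count

before = iops_from_littles_law(OUTSTANDING_IOS, latency_ms=10.0)  # four nodes
after = iops_from_littles_law(OUTSTANDING_IOS, latency_ms=5.0)    # eight nodes
print(f"{before:.0f} IOPS -> {after:.0f} IOPS ({after / before:.1f}x)")
```

Halving latency at constant concurrency doubles throughput, which is consistent with the near 2x IOPS increase observed when read latency almost halved.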

Summary:

These tests are a clear demonstration of the scalability advantage of the Acropolis Distributed Storage Fabric (ADSF) and storage only nodes for customers wanting to increase performance and/or capacity in their HCI environment.

The ability of ADSF to distribute write IO across all nodes within a cluster means write performance improves significantly as nodes (including storage only nodes) are added to the cluster, while read and write latency drop due to the decreased workload on the compute+storage nodes servicing the VMs.

But data locality is lost with storage only nodes, right?

Wrong! Storage only nodes actually improve (yes, improve!) data locality by maximising the amount of available space on the compute+storage nodes. This is a direct result of storage only nodes accepting replication data for write IO and storing the 2nd or 3rd copies (in the case of RF3) on the storage only nodes. This is also demonstrated by the lower read latency observed during this test.
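The capacity effect can be estimated with simple expected-value arithmetic. This Python sketch assumes RF2, equal write volumes on each compute+storage node and uniform placement of the second copy across all other nodes; ADSF's real replica placement weighs factors this ignores, so treat the numbers as illustrative.

```python
def stored_per_compute_node(written_gb, compute_nodes, storage_only_nodes):
    """Expected data held on each compute+storage node under RF2.

    Simplifying assumptions: every compute+storage node writes the same
    amount, the first copy stays local, and the second copy lands on any
    *other* node with equal probability. Storage only nodes run no VMs,
    so they only receive replicas.
    """
    total_nodes = compute_nodes + storage_only_nodes
    local_copy = written_gb
    # Replicas sent by the other compute+storage nodes, each spread over
    # (total_nodes - 1) possible targets.
    replicas_received = (compute_nodes - 1) * written_gb / (total_nodes - 1)
    return local_copy + replicas_received


four_node = stored_per_compute_node(100, compute_nodes=4, storage_only_nodes=0)
eight_node = stored_per_compute_node(100, compute_nodes=4, storage_only_nodes=4)
print(f"4 node cluster: {four_node:.1f} GB per compute+storage node")
print(f"4 + 4 storage only: {eight_node:.1f} GB per compute+storage node")
```

Under these assumptions, adding four storage only nodes reduces the data held on each compute+storage node from 2x to roughly 1.43x of what its VMs write, leaving more local space for data that benefits from locality.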

Storage only nodes improve performance and capacity not only for virtual machines, but also for physical servers using Acropolis Block Services (ABS) and for users of Acropolis File Services (AFS), both of which had enhancements announced at .NEXT 2016.

Fight the FUD – Cisco “My VSA is better than your VSA”

It seems like the FUD is surging out of Cisco thick and fast, which is great news as it shows Nutanix is getting all the mind share and recognition as the clear market leader.

The latest FUD from Cisco is that their Virtual Storage Appliance (VSA, or what Nutanix calls a Controller VM, or CVM) is better than Nutanix’s because it serves I/O from across the cluster, whereas Nutanix only serves I/O locally.

I quite frankly don’t care how Cisco or any other vendor does what they do; I will just explain what Nutanix does and why, and then you can make up your own mind.

Q1. Does Nutanix only serve I/O locally?

A1. No

Nutanix performs writes across two or three nodes (RF2/RF3) before providing an acknowledgement to the guest OS. One copy of the data is always written locally, except where the local SSD tier is full, in which case all copies are written remotely.

(Image: Nutanix tiering and write I/O placement)

The above image is courtesy of the Nutanix Bible by Steve Poitras.

It shows that write I/O is prioritized to be local to the virtual machine so that future read I/O can be served locally, removing the network and other nodes/controllers as a potential bottleneck/dependency and ensuring optimal performance.

This means a single VM can be serviced by “N” number of Controllers concurrently, which improves performance.

Nutanix does this because we want to avoid as many dependencies as possible. Allowing the bulk of read I/O to be serviced locally helps avoid traditional storage issues like noisy neighbour. By writing locally we also avoid at least 1 network hop and remote controller/node involvement, as one of the replicas is always written locally.
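The write placement rules described in this answer can be sketched in Python as follows. The function names, the random choice of remote nodes and the seed are hypothetical; the sketch only encodes the two rules stated here: keep one copy local unless the local SSD tier is full, and acknowledge the write only after all RF copies are written.

```python
import random


def choose_replica_nodes(local_node, nodes, rf, local_ssd_full, rng=random):
    """Pick the nodes that will hold the rf copies of a write.

    One copy goes to the local node unless its SSD tier is full, in which
    case all copies go to remote nodes. Remote targets are chosen at random
    here purely for illustration.
    """
    remote = [n for n in nodes if n != local_node]
    if local_ssd_full:
        return rng.sample(remote, rf)
    return [local_node] + rng.sample(remote, rf - 1)


def write_acknowledged(persisted_copies, rf):
    """The guest OS only gets an acknowledgement once all rf copies are persisted."""
    return persisted_copies >= rf


rng = random.Random(42)
nodes = ["A", "B", "C", "D"]
normal = choose_replica_nodes("B", nodes, rf=2, local_ssd_full=False, rng=rng)
ssd_full = choose_replica_nodes("B", nodes, rf=3, local_ssd_full=True, rng=rng)
print(normal, ssd_full)
print(write_acknowledged(1, rf=2), write_acknowledged(2, rf=2))
```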

Q2. What if a VM’s active data exceeds the local SSD tier?

A2. I/O will be served by controllers throughout the Cluster

I have previously covered this topic in a post titled “What if my VMs storage exceeds the capacity of a Nutanix node?”. As a quick summary, the diagram below shows a VM on Node B having its data served across a 4 node cluster, all from SSD.

The above diagram also shows the node types can be Compute+Storage or Storage Only nodes while still providing SSD tier capacity and a Nutanix CVM to provide I/O and data services such as Compression/Dedupe/Erasure Coding as well as monitoring / management capabilities.

Q3. What if data is not in the SSD tier?

A3. If data is migrated to the SATA tier, it is accessed either locally or remotely based on average latency.

If data is moved from SSD to SATA, the first option is to service the I/O locally, but if the local SATA latency is above a threshold, the I/O will be serviced by a remote replica. This ensures that, in the unlikely event of local contention, I/O is not unnecessarily delayed.

For reads from SATA, the bottleneck is the SATA drive itself, which means the network latency (typically <0.5ms) is insignificant when several ms can be saved by using a replica on drives which are less busy.

This is explained in more detail in “NOS 4.5 Delivers Increased Read Performance from SATA”.
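A minimal Python sketch of this read decision follows. The threshold value, the example latencies and the extra check that the remote replica is actually faster are assumptions made for illustration (the last reflects the reasoning about network latency above); the actual AOS logic and thresholds are not described in this post.

```python
def choose_read_source(local_latency_ms, remote_latency_ms,
                       threshold_ms, network_ms=0.5):
    """Decide where to serve a SATA-tier read from.

    Reads stay local while local latency is at or below the threshold.
    Above it, a remote replica is used if its drive latency plus the
    network hop is still lower than the local latency.
    """
    if local_latency_ms <= threshold_ms:
        return "local"
    if remote_latency_ms + network_ms < local_latency_ms:
        return "remote"
    return "local"


# Hypothetical latencies in ms, with a hypothetical 15ms threshold.
print(choose_read_source(6, 4, threshold_ms=15))    # local SATA is healthy
print(choose_read_source(25, 8, threshold_ms=15))   # local SATA is contended
print(choose_read_source(25, 30, threshold_ms=15))  # remote replica is busier
```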

Q4. Cisco HX outperforms Nutanix

A4. Watch out for unrealistic 4K benchmarks, especially on lower end HW & older AOS releases.

I am very vocal that peak performance benchmarks are a waste of time, as I explain in the following article “Peak Performance vs Real World Performance“.

VMware and EMC constantly attack Nutanix on performance, which is funny because Nutanix AOS 4.6 comfortably outperforms VSAN, as I show in this article: Benchmark(et)ing Nonsense IOPS Comparisons, if you insist – Nutanix AOS 4.6 outperforms VSAN 6.2

Cisco will be no different: they will focus on unrealistic benchmark(et)ing, and I will respond to the upcoming nonsense in the not too distant future when it is released.

Coming soon: Cisco HX vs Nutanix AOS 4.6

Summary:

One of the reasons Nutanix is the market leader is our attention to detail. The value of the platform exceeds the sum of its parts because we consider and test all sorts of scenarios to ensure the platform performs in a consistent manner.

Nutanix can do things like remote SATA reads, and track performance and serve I/O from the optimal location, because of the truly distributed underlying storage fabric (ADSF). These sorts of capabilities are limited or not possible without this kind of underlying fabric.

Related Posts:

  1. Peak Performance vs Real World Performance
  2. Benchmark(et)ing Nonsense IOPS Comparisons, if you insist – Nutanix AOS 4.6 outperforms VSAN 6.2
  3. NOS 4.5 Delivers Increased Read Performance from SATA
  4. What if my VMs storage exceeds the capacity of a Nutanix node?
  5. Nutanix Bible

Benchmark(et)ing Nonsense IOPS Comparisons, if you insist – Nutanix AOS 4.6 outperforms VSAN 6.2

As many of you know, I’ve taken a stand with many other storage professionals to try to educate the industry that peak performance is vastly different to real world performance. I covered this in a post titled: Peak Performance vs Real World Performance.

I have also given a specific example of Peak Performance vs Real World Performance with a Business Critical Application (MS Exchange), where I demonstrate that the first and most significant constraining factor for Exchange performance is compute (CPU/RAM), so achieving more IOPS is unnecessary to achieve the business outcome (which is supporting a given number of Exchange mailboxes/messages per day).

However, vendors (all of them) who offer products which provide storage, whether as a component such as in HCI or as a fully focused offering, continue to promote peak performance numbers. They do this because the industry as a whole has promoted, and continues to promote, these numbers as if they are relevant, with vendors trying to one-up each other with nonsense comparisons.

VMware and the EMC federation have made a lot of noise about In-Kernel delivering better performance than Software Defined Storage running within a VM, which some refer to as a VSA (Virtual Storage Appliance). At the same time, the same companies/people are recommending that business critical applications (vBCA) be virtualized. This is a clear contradiction, as I explain in an article titled In-Kernel verses Virtual Storage Appliance, which in short concludes:

…a high performance (1M+ IOPS) solution can be delivered both In-Kernel or via a VSA, it’s simple as that. We are long past the days where a VM was a significant bottleneck (circa 2004 w/ ESX 2.x).

I stand by this statement and the in-kernel vs VSA debate is another example of nonsense comparisons which have little/no relevance in the real world. I will now (reluctantly) cover off (quickly) some marketing numbers before getting to the point of this post.

VMware VSAN 6.2

Firstly, congratulations to VMware on this release. I believe you now have a minimally viable product thanks to the introduction of software based checksums, which are essential for any storage platform.

VMW Claim One: For the VSAN 6.2 release, “delivering over 6M IOPS with an all-flash architecture”

The basic math for a 64 node cluster gives ~93,750 IOPS / node, but as I have seen this benchmark from Intel showing 6.7 million IOPS for a 64 node cluster, let’s give VMware the benefit of the doubt and assume it’s an even 7M IOPS, which equates to 109,375 IOPS / node.

Reference: VMware Virtual SAN Datasheet
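The per-node figures are simple division; this Python snippet reproduces them, plus the 6.7M Intel figure, so the benefit-of-the-doubt rounding is visible.

```python
CLUSTER_NODES = 64  # cluster size used in the calculation above


def iops_per_node(cluster_iops, nodes=CLUSTER_NODES):
    """Average IOPS per node for a cluster-wide figure."""
    return cluster_iops / nodes


for label, cluster_iops in [("VMware claim (6M)", 6_000_000),
                            ("Intel benchmark (6.7M)", 6_700_000),
                            ("Rounded up (7M)", 7_000_000)]:
    print(f"{label}: {iops_per_node(cluster_iops):,.1f} IOPS per node")
```

The three results are 93,750, 104,687.5 and 109,375 IOPS per node respectively.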

VMW Claim Two: Highest Performance >100K IOPS per node

The graphic below (pulled directly from VMware’s website) shows their performance claims of >100K IOPS per node and >6 Million IOPS per cluster.

Reference: Introducing you to the 4th Generation Virtual SAN

Now what about Nutanix Distributed Storage Fabric (NDSF) & Acropolis Operating System (AOS) 4.6?

We’re now at the point where the hardware is becoming the bottleneck, as we are saturating the performance of physical Intel S3700 enterprise-grade solid state drives (SSDs) on many of our hybrid nodes. As such, we have moved on to performance testing our NX-9460-G4 model, which has 4 nodes running Haswell CPUs and 6 x Intel S3700 SSDs per node, all in 2RU.

With AOS 4.6 running ESXi 6.0 on an NX-9460-G4 (4 x NX-9040-G4 nodes), Nutanix are seeing in excess of 150K IOPS per node, which is 600K IOPS per 2RU (Nutanix Block).

The below graph shows performance per node and how the solution scales in terms of performance up to a 4 node / 1 block solution which fits within 2RU.

(Image: AOS 4.6 performance per node, scaling from 1 to 4 nodes)

So Nutanix AOS 4.6 provides approx. 36% higher performance than VSAN 6.2.

(>150K IOPS per NX-9040-G4 node compared to <=110K IOPS per All Flash VSAN 6.2 node)
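The ~36% figure and the 600K IOPS per block figure follow directly from the numbers quoted above. A short Python check, using the >150K lower bound for Nutanix and the rounded 110K figure for VSAN:

```python
NUTANIX_IOPS_PER_NODE = 150_000  # ">150K IOPS per NX-9040-G4 node"
VSAN_IOPS_PER_NODE = 110_000     # "<=110K IOPS" per all-flash VSAN 6.2 node
NODES_PER_BLOCK = 4              # NX-9460-G4: four nodes in 2RU

advantage_pct = (NUTANIX_IOPS_PER_NODE / VSAN_IOPS_PER_NODE - 1) * 100
iops_per_block = NUTANIX_IOPS_PER_NODE * NODES_PER_BLOCK

print(f"Per node advantage: ~{advantage_pct:.0f}%")
print(f"Per 2RU block: {iops_per_block:,} IOPS")
```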

It should be noted the above Nutanix performance numbers have already been improved upon in upcoming releases going through performance engineering and QA, so this is far from the best you will see.


Enough with the nonsense marketing numbers! Let’s get to the point of the post:

These 4k 100% random read IOPS (and similar) tests are totally unrealistic.

Assuming the 4k IOPS tests were realistic, to quote my previous article:

Peak performance is rarely a significant factor for a storage solution.

More importantly, SO WHAT if Vendor A (in this case Nutanix) has higher peak performance than Vendor B (in this case VSAN)!

What matters is customer business outcomes, not benchmark(eting)!


Wait a minute, the vendor with the higher performance is telling you peak performance doesn’t matter, instead of bragging about it and trying to make it sound important?

Yes you are reading that correctly, no one should care who has the highest unrealistic benchmark!

I wrote Things to consider when choosing infrastructure a while back to highlight that choosing the “Best of Breed” for every workload may not be a good overall strategy, as it requires management of multiple silos, which leads to inefficiency and increased costs.

The key point is that if you can meet all the customer requirements (e.g. performance) with a standard platform while working within constraints such as budget, power, cooling, rack space and time to value, you’re doing yourself (or your customer) a disservice by not considering a standard platform for your workloads. So if Vendor X has 10% faster performance (even for your specific workload) than Vendor Y, but Vendor Y still meets your requirements, performance shouldn’t be a significant consideration when choosing a product.
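The argument can be illustrated with a simple scale out sizing calculation. In this Python sketch the workload requirement, the per-node IOPS ratings and the three node minimum cluster size are all hypothetical; the point is that once you size to the requirement, a 10% per-node difference may not change the node count at all, as in this example.

```python
import math


def nodes_required(required_iops, iops_per_node, min_nodes=3):
    """Smallest node count whose aggregate IOPS meets the requirement.

    min_nodes=3 is an assumed minimum cluster size for illustration.
    """
    return max(min_nodes, math.ceil(required_iops / iops_per_node))


REQUIRED_IOPS = 500_000  # hypothetical workload requirement

vendor_x = nodes_required(REQUIRED_IOPS, iops_per_node=110_000)  # 10% faster
vendor_y = nodes_required(REQUIRED_IOPS, iops_per_node=100_000)
print(f"Vendor X: {vendor_x} nodes, Vendor Y: {vendor_y} nodes")
```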

Both VSAN and Nutanix are software defined storage, and I expect both will continue to rapidly improve performance through tuning done completely in software. If we were talking about a product which is dependent on offloading to hardware, then sure, performance comparisons would be relevant for longer, but VSAN and Nutanix are both 100% software and can (and do) improve performance in software with every release.

In 3 months, VSAN might be slightly faster. Then 3 months later Nutanix will overtake them again. In reality, peak performance rarely if ever impacts real world customer deployments and with scale out solutions, it’s even less relevant as you can scale.

If a solution can’t scale, or does so only in 2 node mirror type configurations, then peak performance is a much more critical consideration. I’d suggest if you’re looking at this (legacy) style of product, you have bigger issues.

Not only does performance in the software defined storage world change rapidly, so does the performance of the underlying commodity hardware, such as CPUs and SSDs. This is why it’s important to consider products (like VSAN and Nutanix) that are not dependent on proprietary hardware, as hardware eventually becomes a constraint, and it is why the world is moving towards software defined storage, networking etc.

If more performance is required, the ability to add new nodes, form a heterogeneous cluster and distribute data evenly across the cluster (like NDSF does) is vastly more important than the peak IOPS difference between two products.

While you might think that this blog post is a direct attack on HCI vendors, the same principle holds true for any hardware or storage vendor out there. It is only a matter of time before customers stop getting trapped in benchmark(et)ing wars; they will instead identify their real requirements and readily embrace the overall value of dramatically simple on-premises infrastructure.

In my opinion, Nutanix is miles ahead of the competition in terms of value, flexibility, operational benefits, product maturity and market-leading customer service, all of which matter way more than peak performance (where Nutanix is the fastest anyway).

Summary:

  1. Focus on what matters and determine whether or not a solution delivers the required business outcomes. Hint: This is rarely just a matter of MOAR IOPS!
  2. Don’t waste your time in benchmark(et)ing wars or proof of concept bake offs.
  3. Nutanix AOS 4.6 outperforms VSAN 6.2
  4. A VSA can outperform an in-kernel SDS product, so let’s put the in-kernel vs VSA nonsense to rest.
  5. Peak performance benchmarks still don’t matter even when the vendor I work for has the highest performance. (a.k.a. my opinion doesn’t change based on my employer’s current product capabilities)
  6. Storage vendors ALL should stop with the peak IOPS nonsense marketing.
  7. Software-defined storage products like Nutanix and VSAN continue to rapidly improve performance, so comparisons are outdated soon after publication.
  8. Products dependent upon proprietary hardware are not the future
  9. Put a high focus on the quality of vendors’ support.

Related Articles:

  1. Peak Performance vs Real World Performance
  2. Peak performance vs Real World – Exchange on Nutanix Acropolis Hypervisor (AHV)
  3. The Key to performance is Consistency
  4. MS Exchange Performance – Nutanix vs VSAN 6.0
  5. Scaling to 1 Million IOPS and beyond linearly!
  6. Things to consider when choosing infrastructure.