The Key to Performance is Consistency

In recent weeks I have been running a lot of proofs of concept and performance tests using tools such as Jetstress (with great success, I might add).

What I have always told customers is to focus on choosing a solution which comfortably meets their performance requirements while also delivering consistent performance.

The key word here is consistency.

Many solutions can achieve very high peak performance, especially when only cache performance is being tested, but this isn’t real world, as I discussed in Peak Performance vs Real World Performance.

So, with two Jetstress VMs on a 3 node Nutanix cluster (N+1 configuration), I configured Jetstress to create multiple databases which used about 85% of the available capacity per node. The nodes used were hybrid, meaning a mix of SSD and SATA drives.

What this means is the nodes have ~20% of the data within the SSD tier, with the bulk of the data residing within the SATA tier, as shown below on the Storage tab of the Nutanix PRISM UI.

Figure: Tier usage

As Jetstress performs I/O across all data concurrently, things like caching and tiering become much less effective.

For this testing no tricks have been used, such as de-duplicating the Jetstress DBs, which are by design duplicates. De-duplicating them would result in unrealistically high dedupe ratios where all data would be served from SSD/cache, resulting in artificially high performance and low latency. That’s not how I roll. I only talk real performance numbers which customers can achieve in the real world.
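To see why de-duplicating identical test databases inflates the ratio, here is a minimal Python sketch. It is a generic illustration, not how Nutanix dedupe works: the `dedupe_ratio` helper, the 4 KB block size and the random stand-in "databases" are all assumptions made up for this example.

```python
import hashlib
import os
from typing import Sequence

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def dedupe_ratio(blobs: Sequence[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Return logical blocks divided by unique blocks across all blobs."""
    total_blocks = 0
    unique_fingerprints = set()
    for blob in blobs:
        for offset in range(0, len(blob), block_size):
            block = blob[offset:offset + block_size]
            # Fingerprint each block; identical blocks share a fingerprint.
            unique_fingerprints.add(hashlib.sha256(block).digest())
            total_blocks += 1
    if total_blocks == 0:
        raise ValueError("no data to analyse")
    return total_blocks / len(unique_fingerprints)


# Stand-in for one test database: 64 blocks of random bytes.
database = os.urandom(64 * BLOCK_SIZE)

# Four byte-for-byte copies of the same database (the "trick" case).
identical_copies = [database] * 4

# Four databases with independent content (closer to real data).
distinct_databases = [os.urandom(64 * BLOCK_SIZE) for _ in range(4)]

print("identical copies:", dedupe_ratio(identical_copies))
print("distinct data:   ", dedupe_ratio(distinct_databases))
```

With identical copies, only one copy's worth of unique blocks has to be stored, so a small fast tier can hold everything. Real databases with distinct content get no such benefit.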

In this post I am not going to talk about the actual IOPS result, the latency figures or the time it took to create the databases, as I’m not interested in getting into performance bake-offs. What I am going to talk about is the percentage difference between the nodes in the following metrics, as observed during these tests:

1. Time to create the databases: 1.73%

2. IOPS achieved: 0.44%

3. Avg Read Latency: 4.2%

As you can see the percentage difference between the nodes for these metrics is very low, meaning performance is very consistent across a Nutanix cluster.
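For anyone wanting to run the same comparison on their own results, here is a small Python sketch of one common way to calculate a percentage difference between two nodes. The formula (absolute difference divided by the mean of the two values) is an assumption, and the per-node numbers are invented for illustration. They are not the results from the tests above.

```python
def percent_difference(a: float, b: float) -> float:
    """Absolute difference between two measurements as a % of their mean."""
    mean = (a + b) / 2
    if mean == 0:
        raise ValueError("mean of the two measurements is zero")
    return abs(a - b) / mean * 100


# Hypothetical per-node figures, invented for this example only.
node_a_iops = 1000.0
node_b_iops = 1010.0

spread = percent_difference(node_a_iops, node_b_iops)
print(f"IOPS difference between nodes: {spread:.2f}%")
```

The same function can be applied to database creation time and average read latency. The smaller the result, the more consistent the nodes are.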

Note: All testing was performed concurrently, and background tasks performed by the Nutanix “Curator” function, such as ILM (Tiering) and Disk Balancing, were all running during these tests.

What does this mean?

Running business critical workloads on the same Nutanix cluster does not cause any significant noisy neighbour type issues, which can and do occur in traditional centralised shared storage solutions.

VMware have attempted to mitigate this issue with technology such as Storage I/O Control (SIOC) and Storage DRS (SDRS), but these issues are natively eliminated thanks to the scale out, shared nothing architecture of the Nutanix Xtreme Computing Platform (XCP).

Customers can be confident that performance achieved on one node is repeatable as Nutanix clusters are scaled, even with business critical applications whose large working sets easily exceed the SSD tier.

It also means performance doesn’t “fall off the cache cliff” and become inconsistent, which has long been a fear with systems dependent on cache for performance.

Nutanix has chosen not to rely on caching to achieve high read/write performance. Instead, we tune our defaults for consistent performance across large working sets and to ensure data integrity, which means we commit writes to persistent media before acknowledging them and perform checksums on all read and write I/O. This is key for business critical applications such as MS SQL, MS Exchange and Oracle.
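The general idea of "commit before acknowledging, checksum on read and write" can be shown with a small Python sketch. This is a generic illustration using the standard library, not Nutanix code: the `durable_write` and `verified_read` helpers and the record layout (length plus CRC32 header) are hypothetical and exist only for this example.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical on-disk record header: payload length and CRC32 checksum.
HEADER = struct.Struct("<II")


def durable_write(path: str, payload: bytes) -> None:
    """Write payload with a checksum and return only once it is on disk."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "wb") as f:
        f.write(record)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to commit to persistent media
    # Returning here is the "acknowledgement" to the caller.


def verified_read(path: str) -> bytes:
    """Read a record back and fail loudly if the checksum does not match."""
    with open(path, "rb") as f:
        header = f.read(HEADER.size)
        if len(header) != HEADER.size:
            raise IOError("truncated record header")
        length, expected_crc = HEADER.unpack(header)
        payload = f.read(length)
    if len(payload) != length or zlib.crc32(payload) != expected_crc:
        raise IOError("checksum mismatch: data is corrupt")
    return payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record_path = os.path.join(tmp, "record.bin")
        durable_write(record_path, b"business critical data")
        print(verified_read(record_path))
```

The trade-off is that every write waits for persistent media rather than being acknowledged from volatile cache, which is why the acknowledged data can be trusted after a failure.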