The Key to Performance is Consistency

In recent weeks I have been doing lots of proofs of concept and performance testing using tools such as Jetstress (with great success, I might add).

What I have always told customers is to focus on choosing a solution which comfortably meets their performance requirements while also delivering consistent performance.

The key word here is consistency.

Many solutions can achieve very high peak performance, especially when only cache performance is being tested, but this isn’t representative of the real world, as I discussed in Peak Performance vs Real World Performance.

So with two Jetstress VMs on a 3-node Nutanix cluster (N+1 configuration), I configured Jetstress to create multiple databases using about 85% of the available capacity per node. The nodes used were hybrid, meaning a mix of SSD and SATA drives.

What this means is the nodes have ~20% of the data within the SSD tier and the bulk of the data residing within the SATA tier, as shown on the Storage tab of the Nutanix PRISM UI below.

[Screenshot: Nutanix PRISM Storage tab showing SSD and SATA tier usage]

As Jetstress performs I/O across all of the data concurrently, techniques such as caching and tiering become much less effective.
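To illustrate why, here is a rough back-of-envelope sketch (purely illustrative, assuming the benchmark spreads its I/O uniformly across the databases): the proportion of reads that can be served from SSD is roughly the proportion of data resident in the SSD tier, which in this test was ~20%.

```python
# Back-of-envelope estimate, not a Nutanix tool: if the benchmark touches the whole
# dataset uniformly and only ~20% of that data fits in the SSD tier, roughly 80% of
# reads must be served from the SATA tier, so cache/tiering cannot mask performance.

ssd_resident_fraction = 0.20                 # ~20% of data in the SSD tier (from PRISM)
reads_from_ssd = ssd_resident_fraction       # with uniform access, hit rate ~ resident fraction
reads_from_sata = 1.0 - ssd_resident_fraction

print(f"~{reads_from_ssd:.0%} of reads from SSD, ~{reads_from_sata:.0%} from SATA")
```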

For this testing no tricks have been used, such as de-duplicating the Jetstress DBs, which are by design duplicates. Doing so would result in unrealistically high dedupe ratios, where all data would be served from SSD/cache, resulting in artificially high performance and low latency. That’s not how I roll; I only talk about real performance numbers which customers can achieve in the real world.

In this post I am not going to talk about the actual IOPS result, the latency figures or the time it took to create the databases, as I’m not interested in getting into performance bake-offs. What I am going to talk about is the percentage difference in the following metrics between the nodes observed during these tests:

1. Time to create the databases: 1.73%

2. IOPS achieved: 0.44%

3. Avg read latency: 4.2%

As you can see, the percentage difference between the nodes for these metrics is very low, meaning performance is very consistent across a Nutanix cluster.
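For clarity, this is roughly how such a per-node spread can be calculated. The exact formula used for the figures above isn’t stated, so the (max − min) / min form and the sample values below are assumptions for illustration only, not the actual test results.

```python
# Minimal sketch of a "percentage difference between nodes" calculation.
# The formula and the sample values are placeholders, NOT the actual test results.

def pct_difference(values):
    """Spread between the best and worst node, relative to the lowest value."""
    return (max(values) - min(values)) / min(values) * 100

node_iops = [1000, 1004]  # hypothetical per-node IOPS, for illustration only
print(f"IOPS spread between nodes: {pct_difference(node_iops):.2f}%")  # 0.40%
```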

Note: All testing was performed concurrently, and background tasks performed by the Nutanix “Curator” function, such as ILM (tiering) and disk balancing, were all running during these tests.

What does this mean?

Running business critical workloads on the same Nutanix cluster does not cause any significant noisy-neighbour type issues, which can and do occur in traditional centralised shared storage solutions.

VMware has attempted to mitigate this issue with technologies such as Storage I/O Control (SIOC) and Storage DRS (SDRS), but these issues are natively eliminated thanks to the Nutanix scale-out, shared-nothing architecture (the Nutanix Xtreme Computing Platform, or XCP).

Customers can be confident that performance achieved on one node is repeatable as Nutanix clusters are scaled, even with Business Critical Applications whose large working sets easily exceed the SSD tier.

It also means performance doesn’t “fall off the cache cliff” and become inconsistent, which has long been a fear with systems dependent on cache for performance.

Nutanix has chosen not to rely on caching to achieve high read/write performance. Instead, we tune our defaults for consistent performance across large working sets and to ensure data integrity, which means we commit writes to persistent media before acknowledging them and perform checksums on all read and write I/O. This is key for business critical applications such as MS SQL, MS Exchange and Oracle.
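As a conceptual sketch only (this is not Nutanix’s actual I/O path), the behaviour described above looks something like this: acknowledge a write only after it has been committed to persistent media, and verify a checksum on every read.

```python
# Conceptual illustration only - not Nutanix's actual I/O path. It shows the two
# behaviours described above: a write is acknowledged only after the data has been
# committed to (simulated) persistent media, and every read is checksum-verified.

import hashlib

persistent_store = {}  # stand-in for persistent media (the SSD/HDD extent store)

def write(key, data):
    checksum = hashlib.sha256(data).hexdigest()  # checksum computed on write
    persistent_store[key] = (data, checksum)     # commit to "persistent" media first...
    return "ACK"                                 # ...and only then acknowledge the write

def read(key):
    data, stored_checksum = persistent_store[key]
    if hashlib.sha256(data).hexdigest() != stored_checksum:  # checksum verified on read
        raise IOError("checksum mismatch - data would be repaired from a replica")
    return data

write("extent-1", b"exchange mailbox data")
print(read("extent-1"))
```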

Virtualizing Business Critical Applications – The Web-Scale Way!

Since joining Nutanix back in July 2013, I have been working on testing the performance and resiliency of a range of virtual workloads including Business Critical Applications on the Nutanix platform. At the time, Nutanix only offered a single form factor (4 nodes in 2RU) which was not always a perfect fit depending on customer requirements.

Fast forward to August 2014, and Nutanix now has a wide range of node types to meet most workload requirements, which can be found here.

The only real gap in the node types was a node to support applications which have large capacity requirements as well as a very large active working set requiring consistently low latency and high performance regardless of tier.

So what do I mean when I say “Active Working Set”? I would define this as the data being regularly accessed by the VM(s). For example, a file server may have 10TB of data but users only access 10% of it on a regular basis; this 10% is what I would classify as the Active Working Set.
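As a quick worked example of that definition, using the hypothetical file server above:

```python
# Worked example of the definition above: a 10TB file server where users regularly
# access only ~10% of the data has an active working set of roughly 1TB.

total_data_tb = 10.0
regularly_accessed_fraction = 0.10

active_working_set_tb = total_data_tb * regularly_accessed_fraction
print(f"Active working set: {active_working_set_tb:.1f} TB")  # 1.0 TB
```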

Now back to the topic at hand. The reason I am writing this post is that this is a project I have been working towards for some time, and I am very excited about this product being released to the market. I have no doubt it will further increase the already fast uptake of Web-scale solutions and provide significant value and opportunities to new and existing customers wanting to simplify their datacenter(s) and standardize on Nutanix Web-scale architecture.

Along with many others at Nutanix, I proposed a new node type (the NX-8150), which has been undergoing thorough testing in my team (Solutions & Performance Engineering) for some time, and I am pleased to say it is being officially released (very) soon!

[Image: NX-8150 node]

What is the NX-8150?

A 1 Node per 2RU platform with the following specifications:

* 2 CPU sockets with two CPU options (E5-2690v2 [20 cores / 3.0 GHz] or E5-2697v2 [24 cores / 2.7 GHz])
* 4 x Intel 3700 Series SSDs (ranging from 400GB to 1.6TB each)
* 20 x 1TB SATA HDDs
* Up to 768GB RAM
* Up to 4 x 10GbE NICs
* 4 x 1GbE NICs
* 1 x IPMI (out-of-band management)

What is the use case for the NX-8150?

Simply put: applications which have high CPU/RAM requirements, large active working sets and/or a requirement for consistently high performance over a large data set.

Some examples of these applications include:

* Microsoft Exchange including DAG deployments
* Microsoft SQL including AlwaysOn Availability Groups
* Oracle including RAC
* SAP
* Microsoft SharePoint
* Mixed Production Server Workloads with varying Capacity & I/O requirements

The NX-8150 is a great platform for the above workloads, as it not only has fast CPUs and up to a massive 768GB of RAM to provide substantial compute resources to VMs, but also up to 6.4TB of raw SSD capacity for virtual machines with high I/O requirements. For workloads where peak performance is not critical, the NX-8150 also provides solid, consistent performance across the “cold tier” provided by the 20 x 1TB HDDs.
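As a quick sanity check on those capacity figures (raw only; usable capacity depends on the configured replication factor and other overheads, which are not calculated here):

```python
# Raw tier capacities implied by the spec list above (per node, largest SSD option).
# Raw figures only; usable capacity depends on replication factor and overheads.

ssd_count, ssd_size_tb = 4, 1.6
hdd_count, hdd_size_tb = 20, 1.0

print(f"Raw SSD tier: {ssd_count * ssd_size_tb:.1f} TB")  # 6.4 TB
print(f"Raw HDD tier: {hdd_count * hdd_size_tb:.1f} TB")  # 20.0 TB
```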

As with all Nutanix nodes, Intelligent Lifecycle Management (ILM) maximizes performance by dynamically migrating hot data to SSD and cold data to SATA, providing the best of both worlds: high IOPS and high capacity.
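To illustrate the idea, here is a minimal sketch only; it is not Curator’s actual ILM algorithm, and the one-hour threshold is an arbitrary assumption.

```python
# Minimal sketch of a hot/cold placement decision - NOT Curator's actual ILM logic.
# Assumption: data untouched for more than an hour is treated as "cold".

import time

HOT_THRESHOLD_SECONDS = 3600

def choose_tier(last_access_epoch, now):
    """Place recently accessed (hot) data on SSD, cold data on SATA."""
    return "SSD" if (now - last_access_epoch) < HOT_THRESHOLD_SECONDS else "SATA"

now = time.time()
print(choose_tier(now - 60, now))     # accessed a minute ago -> SSD
print(choose_tier(now - 86400, now))  # accessed a day ago    -> SATA
```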

One of the many major advantages of Nutanix Web-Scale architecture is simplicity and its ability to remove the requirement for application-specific silos! Now, with the addition of the NX-8150, the vast majority of workloads including Business Critical Applications can be run successfully on Nutanix, meaning fewer silos are required, resulting in a simpler, more cost-effective, scalable and resilient datacenter solution.

Now, with a number of customers already placing advance orders for NX-8150s to deploy Business Critical Applications, it won’t be long until the now common “Virtual 1st” policies within many organisations turn into a “Nutanix Web-Scale 1st” policy!

Stay tuned for upcoming case studies for NX-8150 based Web-Scale solutions!