Solving Oracle & SQL Licensing challenges with Nutanix

The Nutanix platform has and will continue to evolve to meet/exceed the ever increasing customer and application requirements while working within constraints such as licensing.

Two of the most common workloads which I work frequently with customers to design solutions around real or perceived licensing constraints are Oracle and SQL.

In years gone by, Nutanix solutions were constrained to being built around a limited number of node types. When I joined in 2013 only one type existed (NX-3450) which limited customers flexibility and often led to paying more for licensing than a traditional 3-tier solution.

With that said, the ROI and TCO for the Nutanix solutions back then were still more often than not favourable compared to 3-tier but these days we only have more and more good news for prospective and existing customers.

Nutanix has now rounded out the portfolio with the introduction of “Compute Only” nodes to target a select few niche workloads with real or perceived licensing and/or political constraints.

Compute only nodes compliment the traditional HCI nodes (Compute+Storage) as well as our unique Storage Only Nodes which were introduced in mid 2015.

So how do Compute Only nodes help solve these licensing challenges?

In short, Oracle leads the world in misleading and intimidating customers into paying more for licensing than what they need to. One of the most ridiculous claims is “You must license every physical CPU core in your cluster because Oracle could run or have ran on it”.

The below tweet makes fun of Oracle and shows how ridiculous their claim that customers need to license every node in a cluster (which I’ve never seen referenced in any actual contract) is.

So let’s get to how you can design a Nutanix solution to meet a typical Oracle customer licensing constraint while ensuring excellent Scalability, Resiliency and Performance.

At this stage we now assume you’ve given your first born child and left leg to Oracle and have subsequently been granted for example 24 physical core licenses from Oracle, what next?

If we we’re to use HCI nodes, some of the CPU would be utilised by the Nutanix Controller VM (CVM) and while the CVM does add a lot of value (see my post Cost vs Reward for the Nutanix Controller VM) you may be so constrained by licensing that you want to maximise the CPU power for just Oracle workloads.

Now in this example, we have 24 licensed physical cores, so we could use two Compute Only nodes using an Intel Gold 6128 [6 cores / 3.4 GHz] / 12 cores per server for 24 total physical cores.

Next we would assess the storage capacity, resiliency and performance requirements and decide how many and what configuration storage only nodes are required.

Because Virtual Machines cannot run on storage only nodes, the Oracle Virtual Machines cannot and will never run on any other CPU cores other than the two Compute Only nodes therefore you would be in compliance with your licensing.

The below is an example of what the environment could look like.

2CO_4SOnodes

SQL has ever changing CPU licensing models which in some cases are licensed by server or vCPU count, Compute Only can be used in the same way I explained above to address any SQL licensing constraints.

What about if I need to scale storage capacity and/or performance?

You’re in luck, without any modifications to the Oracle workloads, you can simply add one or more storage only nodes to the cluster and it will almost immediately increase capacity, performance and resiliency!

I’ve published an example of the performance improvement by adding storage only nodes to a cluster in an article titled Scale out performance testing with Nutanix Storage Only Nodes which I wrote back in 2016.

In short, the results show by doubling the number of nodes from 4 to 8, the performance almost exactly doubled while delivering low read and write latency.

What if you’ve already invested in Nutanix HCI nodes (example below) and are running Oracle/SQL or any other workloads on the cluster?

TypicalHCIcluster

Nutanix provides the ability to convert a HCI node into a Storage Only node which results in preventing Virtual Machines from running on that node. So all you need to do is add two or more Compute Only nodes to the cluster, then mark the existing HCI nodes as Storage Only and the result is shown below.

CO_PlusConvertedHCI

This is in fact the minimum supported configuration for Compute Only Environments to ensure minimum levels of resiliency and performance. For more information, check out my post “Nutanix Compute Only Minimum requirements“.

Now we have two nodes (Compute Only) which can run Virtual Machines and four nodes (HCI nodes converted to Storage Only) which are servicing the storage I/O. In this scenario, if the HCI nodes have unused CPU and/or RAM the Nutanix Controller VM (CVM) can also be scaled up to drive higher performance & lower latency.

Compute Only is currently available with the Nutanix Next Generation Hypervisor “AHV”.

Now let’s cover off a few of the benefits of running applications like Oracle & SQL on Nutanix:

  1. No additional Virtualization licensing (AHV is included when purchasing Nutanix AOS)
  2. No rip and replace for existing HCI investment
  3. Unique scale out distributed storage fabric (ADSF) which can be easily scaled as required
  4. Storage Only nodes add capacity, performance and resiliency to your mission critical workloads without incurring additional hypervisor or application licensing costs
  5. Compute Only allows scale up and out of CPU/RAM resources where applications are constrained by ONLY CPU/RAM and/or application software licensing.
  6. Storage Only nodes can also provide functions such as Nutanix Files (previously known as Acropolis File Services or AFS)

As a result of Nutanix now having HCI, Storage Only and Compute Only nodes, we’re now entering the time where Nutanix can truely be the standard platform for almost any workload including those with non technical constraints such as political or application licensing which have traditionally been at least perceived to be an advantage for legacy SAN products.

The beauty of the Nutanix examples above is while they look like a traditional 3-tier, we avoid the legacy SAN problems including:

1. Rip and Replace / High Impact / High Risk Controller upgrades/scalability
2. Difficulty in scaling performance with capacity
3. Inability to increase resiliency without adding additional Silos of storage (i.e.: Another dual controller SAN)

With Compute Only being supported by AHV, we also help customers avoid the unnecessary complexity and related operational costs of managing ESXi deployments which have become increasingly more complex over time without significantly improving value to the average customer who simply wants high performance, resilient and easy to manage virtualisation solution.

But what about VMware ESXi customers?

Obviously moving to AHV would be ideal but for those who cannot for whatever reasons can still benefit from Storage Only nodes which provide increased storage performance and resiliency to the Virtual machines running on ESXi.

Customers can run ESXi on Nutanix (or OEM / Software Only) HCI nodes and then scale the clusters performance/capacity with AHV based storage only nodes, therefore eliminating the need to license both ESXi and Oracle/SQL since no virtual machine will run on these nodes.

How does Nutanix compare to a leading all flash array?

For those of you who would like to see a HCI only Nutanix solution have better TCO as well as performance and capacity than a leading All Flash Array, checkout A TCO Analysis of Pure FlashStack & Nutanix Enterprise Cloud where even with giving every possible advantage to Pure Storage, Nutanix still comes out on top without data reduction assumptions.

Now consider that Nutanix the TCO as well as performance and capacity was better than a leading All Flash Array with only HCI nodes, imagine the increased efficiency and flexibility by being able to mix/match HCI, with Storage Only and Compute only.

This is just another example of how Nutanix is eliminating even the corner use cases for traditional SAN/NAS.

For more information about Nutanix Scalability, Resiliency and Performance, checkout this multi-part blog series.

Predictable & Scalable MS Exchange 2016 Performance on Nutanix with AHV

I’ve been doing some testing recently with Nutanix latest GA code (AOS 5.8) and I decided to do some quick MS Exchange Jetstress performance tests as part of a larger piece of work.

In short I wanted to check how well Exchange storage performance scaled so I performed three tests. I started with 4 threads, then increased to 8 and finally to 12 threads using Jetress with Exchange 2016 ESE database modules.

For this testing I disabled the Nutanix in memory read cache to ensure all read IO is serviced by the physical SSDs so the result is not artificially improved from cache.

I also disabled Compression, Erasure Coding and Deduplication as these also artificially improve performance due to Jetstress data being highly compressible & dedupable.

The hardware used was a NX-8150 with 6 x SSDs and Intel Broadwell processors. This is why the database size was only 1.7TB as that’s just below the total usable capacity of the node. The performance over larger database sizes remains the same when the metadata cache (in the Nutanix Controller VM) is sized for the desired working set size as shown by our ESRP certification.

The hypervisor is Acropolis Hypervisor (AHV) which is fully certified for Microsoft Windows under the MS SVVP programme as well as MS ESRP certified for MS Exchange.

So here is the result for 4 threads.

Jetstress2016_4Threads

5580 IOPS with just 4 threads is very good performance and is sufficient for at least five thousand mailboxes with hundreds of messages per day which is maximum recommended active users per Exchange MSR server.

The next question is: What’s the latency for the database reads and log writes? (These are two of the critical performance metrics for Jetstress Pass/Fail results)

Jetstress2016_4Threads_Latency

Here we can see log write latency average across all four log drives is below 1ms (0.99ms) and database read latency at 1.16ms.

Next up, here is the result for 8 threads.

Jetstress2016_8Threads

10147 IOPS with 8 threads is excellent performance and shows Nutanix easily has headroom for more than ten-thousand mailboxes with hundreds of messages per day which easily exceeds the requirements for the maximum recommended active users per Exchange MSR server.

Again let’s check out the latency, Here we can see log write latency average across all four log drives is still below 1ms (0.99ms) and database read latency at 1.29ms. That’s just 0.13ms higher latency for reads and exactly the same write latency while achieving almost DOUBLE the IOPS.

Jetstress2016_8Threads_Latency

Lastly here is the result for 12 threads.

Jetstress2016_12Threads

14351 IOPS with 12 threads proves how scalable the Nutanix platform is as this is almost a linear increase in IOPS.

Again let’s check out the latency, Here we can see log write latency average across all four log drives is still below 1ms (0.98ms) and database read latency at 1.42ms. That’s just 0.14ms higher latency for reads and slightly lower write latency while achieving almost linear improvement in IOPS.

Jetstress2016_12Threads_Latency

Summary:

Nutanix provides extremely high, predictable performance for even the most demanding MS Exchange environments.

 

Nutanix X-Ray Benchmarking tool – Snapshot Impact Scenario

In the first part of this series, I introduced Nutanix X-Ray benchmarking tool which has been designed very differently to traditional benchmarking tools as the performance of the app is the control and the variable is the platform,not the other way around.

This is done by generating realistic IO patterns (e.g.: Not 100% 4k read) and then performing functions against the platform to see how the control (the VM application performance) is impacted by the underlying platforms functionality.

A great example of this is performing snapshots as the first step in a space efficient backup solution.

X-Ray has a built in test which generates an OLTP workload which is ran for 8 hours which for an all flash platform generates 6000 IOPS across the database and 400 IOPS for the logs. The scenario is detailed in the X-Ray report shown below.

XraySnapshotImpactDescription

The Snapshot impact scenario is then ran against multiple platforms and using the Analysis functionality within X-ray. we can generate a report which overlays the results from multiple platforms.

The below example is GA Acropolis Hypervisor (AHV) on AOS 5.1.1 verses a leading hypervisor and SDS platform showing the snapshot impact scenario.

XraySnapshotImpact

Each of the red lines indicate a snapshot and what we observe is the performance of both platforms remains consistent until the 10th snapshot (shown below) where the Nutanix platform continues without impact and the leading hypervisor and SDS platform starts degrading significantly.

XraySnapshotImpactSnap10

In the real world, customers use the intelligent features of storage, SDS or hyper-converged platforms but rarely test how this functionality works prior to purchasing. This is because it’s difficult and time consuming to do so.

Nutanix X-Ray tool makes the process of validating a platforms performance under real world scenarios a quick and easy process and provides automatically generated reports where accurate comparisons can be made.

What this example shows is that while both platforms could achieve the required performance without snapshots, only Nutanix AHV & AOS could maintain the performance while utilising snapshots to achieve the type of recovery point objective (RPO) that is expected in production environments, especially with business critical workloads.

As part of the Nutanix Solutions and Performance engineering organisation, I can tell you that the focus for Nutanix is real world performance, using data reduction, leveraging snapshots, mixing workloads and testing a large scale.

In upcoming posts I will show more examples of X-Ray test scenarios as well as comparisons between GA Acropolis Hypervisor (AHV) & AOS 5.1.1 verses a leading hypervisor and SDS platform.

Related Articles:

Nutanix X-Ray Benchmarking tool Part 1 – Introduction

Nutanix X-Ray Benchmarking tool Part 3 – Extended Node Failure Scenario