Solving Oracle & SQL Licensing challenges with Nutanix

The Nutanix platform has and will continue to evolve to meet/exceed the ever increasing customer and application requirements while working within constraints such as licensing.

Two of the most common workloads which I work frequently with customers to design solutions around real or perceived licensing constraints are Oracle and SQL.

In years gone by, Nutanix solutions were constrained to being built around a limited number of node types. When I joined in 2013 only one type existed (NX-3450) which limited customers flexibility and often led to paying more for licensing than a traditional 3-tier solution.

With that said, the ROI and TCO for the Nutanix solutions back then were still more often than not favourable compared to 3-tier but these days we only have more and more good news for prospective and existing customers.

Nutanix has now rounded out the portfolio with the introduction of “Compute Only” nodes to target a select few niche workloads with real or perceived licensing and/or political constraints.

Compute only nodes compliment the traditional HCI nodes (Compute+Storage) as well as our unique Storage Only Nodes which were introduced in mid 2015.

So how do Compute Only nodes help solve these licensing challenges?

In short, Oracle leads the world in misleading and intimidating customers into paying more for licensing than what they need to. One of the most ridiculous claims is “You must license every physical CPU core in your cluster because Oracle could run or have ran on it”.

The below tweet makes fun of Oracle and shows how ridiculous their claim that customers need to license every node in a cluster (which I’ve never seen referenced in any actual contract) is.

So let’s get to how you can design a Nutanix solution to meet a typical Oracle customer licensing constraint while ensuring excellent Scalability, Resiliency and Performance.

At this stage we now assume you’ve given your first born child and left leg to Oracle and have subsequently been granted for example 24 physical core licenses from Oracle, what next?

If we we’re to use HCI nodes, some of the CPU would be utilised by the Nutanix Controller VM (CVM) and while the CVM does add a lot of value (see my post Cost vs Reward for the Nutanix Controller VM) you may be so constrained by licensing that you want to maximise the CPU power for just Oracle workloads.

Now in this example, we have 24 licensed physical cores, so we could use two Compute Only nodes using an Intel Gold 6128 [6 cores / 3.4 GHz] / 12 cores per server for 24 total physical cores.

Next we would assess the storage capacity, resiliency and performance requirements and decide how many and what configuration storage only nodes are required.

Because Virtual Machines cannot run on storage only nodes, the Oracle Virtual Machines cannot and will never run on any other CPU cores other than the two Compute Only nodes therefore you would be in compliance with your licensing.

The below is an example of what the environment could look like.

2CO_4SOnodes

SQL has ever changing CPU licensing models which in some cases are licensed by server or vCPU count, Compute Only can be used in the same way I explained above to address any SQL licensing constraints.

What about if I need to scale storage capacity and/or performance?

You’re in luck, without any modifications to the Oracle workloads, you can simply add one or more storage only nodes to the cluster and it will almost immediately increase capacity, performance and resiliency!

I’ve published an example of the performance improvement by adding storage only nodes to a cluster in an article titled Scale out performance testing with Nutanix Storage Only Nodes which I wrote back in 2016.

In short, the results show by doubling the number of nodes from 4 to 8, the performance almost exactly doubled while delivering low read and write latency.

What if you’ve already invested in Nutanix HCI nodes (example below) and are running Oracle/SQL or any other workloads on the cluster?

TypicalHCIcluster

Nutanix provides the ability to convert a HCI node into a Storage Only node which results in preventing Virtual Machines from running on that node. So all you need to do is add two or more Compute Only nodes to the cluster, then mark the existing HCI nodes as Storage Only and the result is shown below.

CO_PlusConvertedHCI

This is in fact the minimum supported configuration for Compute Only Environments to ensure minimum levels of resiliency and performance. For more information, check out my post “Nutanix Compute Only Minimum requirements“.

Now we have two nodes (Compute Only) which can run Virtual Machines and four nodes (HCI nodes converted to Storage Only) which are servicing the storage I/O. In this scenario, if the HCI nodes have unused CPU and/or RAM the Nutanix Controller VM (CVM) can also be scaled up to drive higher performance & lower latency.

Compute Only is currently available with the Nutanix Next Generation Hypervisor “AHV”.

Now let’s cover off a few of the benefits of running applications like Oracle & SQL on Nutanix:

  1. No additional Virtualization licensing (AHV is included when purchasing Nutanix AOS)
  2. No rip and replace for existing HCI investment
  3. Unique scale out distributed storage fabric (ADSF) which can be easily scaled as required
  4. Storage Only nodes add capacity, performance and resiliency to your mission critical workloads without incurring additional hypervisor or application licensing costs
  5. Compute Only allows scale up and out of CPU/RAM resources where applications are constrained by ONLY CPU/RAM and/or application software licensing.
  6. Storage Only nodes can also provide functions such as Nutanix Files (previously known as Acropolis File Services or AFS)

As a result of Nutanix now having HCI, Storage Only and Compute Only nodes, we’re now entering the time where Nutanix can truely be the standard platform for almost any workload including those with non technical constraints such as political or application licensing which have traditionally been at least perceived to be an advantage for legacy SAN products.

The beauty of the Nutanix examples above is while they look like a traditional 3-tier, we avoid the legacy SAN problems including:

1. Rip and Replace / High Impact / High Risk Controller upgrades/scalability
2. Difficulty in scaling performance with capacity
3. Inability to increase resiliency without adding additional Silos of storage (i.e.: Another dual controller SAN)

With Compute Only being supported by AHV, we also help customers avoid the unnecessary complexity and related operational costs of managing ESXi deployments which have become increasingly more complex over time without significantly improving value to the average customer who simply wants high performance, resilient and easy to manage virtualisation solution.

But what about VMware ESXi customers?

Obviously moving to AHV would be ideal but for those who cannot for whatever reasons can still benefit from Storage Only nodes which provide increased storage performance and resiliency to the Virtual machines running on ESXi.

Customers can run ESXi on Nutanix (or OEM / Software Only) HCI nodes and then scale the clusters performance/capacity with AHV based storage only nodes, therefore eliminating the need to license both ESXi and Oracle/SQL since no virtual machine will run on these nodes.

How does Nutanix compare to a leading all flash array?

For those of you who would like to see a HCI only Nutanix solution have better TCO as well as performance and capacity than a leading All Flash Array, checkout A TCO Analysis of Pure FlashStack & Nutanix Enterprise Cloud where even with giving every possible advantage to Pure Storage, Nutanix still comes out on top without data reduction assumptions.

Now consider that Nutanix the TCO as well as performance and capacity was better than a leading All Flash Array with only HCI nodes, imagine the increased efficiency and flexibility by being able to mix/match HCI, with Storage Only and Compute only.

This is just another example of how Nutanix is eliminating even the corner use cases for traditional SAN/NAS.

For more information about Nutanix Scalability, Resiliency and Performance, checkout this multi-part blog series.

Nutanix Scalability – Part 1 – Storage Capacity

It never ceases to amaze me that analysts as well as prospective/existing customers frequently are not aware of the storage scalability capabilities of the Nutanix platform.

When I joined back in 2013, a common complaint was that Nutanix had to scale in fixed building blocks of NX-3050 nodes with compute and storage regardless of what the actual requirement was.

Not long after that, Nutanix introduced the NX-1000 and NX-6000 series which had lower and higher CPU/RAM and storage capacity options which gave more flexibility, but still there were some use cases where Nutanix still had significant gaps.

In October 2013 I wrote a post titled “Scaling problems with traditional shared storage” which covers why simply adding shelves of SSD/HDD to a dual controller storage array does not scale an environment linearly, can significantly impact performance and add complexity.

At .NEXT 2015, Nutanix announced the ability to Scale Storage separately to Compute which allowed customers to scale capacity by adding similar to a shelf of drives like they could with their legacy SAN/NAS, but with the added advantage of having a storage controller (the Nutanix CVM) to add additional data services, performance and resiliency.

Storage only nodes are supported with any Hypervisor but the good news in they run on Nutanix’ Acropolis Hypervisor (AHV) which means no additional hypervisor licensing if you run VMware ESXi, and storage only nodes still support all the 1-click rolling upgrades so they add no additional management overhead.

Advantages of Storage Only Nodes:

  1. Ability to scale capacity seperate to CPU/RAM like a traditional disk shelf on a storage array
  2. Ability to start small and scale capacity if/when required, i.e.: No oversizing day 1
  3. No hypervisor licensing or additional management when scaling capacity
  4. Increased data services/resiliency/performance thanks to the Nutanix Controller VM (CVM)
  5. Ability to increase capacity for hot and cold data (i.e.: All Flash and Hybrid/Storage heavy)
  6. True Storage only nodes & the way data is distributed to them is unique to Nutanix

Example use cases for Storage Only Nodes

Example 1: Increasing capacity requirement:

MS Exchange Administrator: I’ve been told by the CEO to increase our mailbox limits from 1GB to 2GB but we don’t have enough capacity.

Nutanix: Let’s start small and add storage only nodes as the Nutanix cluster (storage pool) reaches 80% utilisation.

Example 2: Increasing flash capacity:

MS SQL DBA: We’re growing our mission critical database and now we’re hitting SATA for some day to day operations, we need more flash!

Nutanix: Let’s add some all flash storage only nodes.

Example 3: Increasing resiliency

CEO/CIO: We need to be able to tolerate failures and the infrastructure self heal but we have a secure facility which is difficult and time consuming to get access too, what can we do?

Nutanix: Let’s add some storage only nodes to ensure you have enough capacity (All Flash and/or Hybrid) to ensure sufficient capacity to tolerate “n” number of failures and rebuild the environment back to a fully resilient and performant state.

Example 4: Implementing Backup / Long Term Retention

CEO/CIO: We need to be able to keep 7 years of data for regulatory requirements and we need to be able to access it within 1hr.

Nutanix: We can either add storage only nodes to one or more existing clusters OR create a dedicated Backup/Retention cluster. Let’s start with enough capacity for Year 1, and then as capacity is required, add more storage only nodes as the cost per GB drops over time. Nutanix allows mixing of hardware generations so you’ll never be in a situation where you need to rip & replace.

Example 5: Supporting one or more Monster VMs

Server Administrator: We have one or more VMs with storage capacity requirements of 100TB each, but the largest Nutanix node we have only supports 20TB. What do we do?

Nutanix: The Distributed Storage Fabric (ADSF) allows a VMs data set to be distributed throughout a Nutanix cluster ensuring any storage requirement can be met. Adding storage only nodes will ensure sufficient capacity while adding resiliency/performance to all other VMs in the cluster. Cold data will be distributed throughout the cluster while frequently accessed data will remain local where possible within the local storage capacity on the node where the VM runs.

For more information on this use case see: What if my VMs storage exceeds the capacity of a Nutanix node?

Example 6: Performance for infrequently accessed data (cold data).

Server Administrator: We have always stored our cold data on SATA drives attached to our SAN because we have a lot of data and flash is expensive. One or twice a year we need to do a bulk read of our data for auditing/accounting purposes but it’s always been so slow. How can we solve this problem and give good performance while keeping costs down?

Nutanix: Hybrid Storage only nodes are a cost effective way to store cold data and combined with ADSF, Nutanix is able to deliver optimum read performance from SATA by reading from the replica (copy of data) with the lowest latency.

This means if a HDD or even a node is experiencing heavy load, ADSF will dynamically redirect Read I/O throughout the cluster to Deliver Increased Read Performance from SATA. This capability was released in 2015 and storage only nodes adding more spindles to a cluster is very complimentary to this capability.

Frequently asked questions (FAQ):

  1. How many storage only nodes can a single cluster support?
    1. There is no hard limit, typically cluster sizes are less than 64 nodes as it’s important to consider limiting the size of a single failure domain.
  2. How many Compute+Storage nodes are required to use Storage Only nodes?
    1. Two. This also allows N+1 failover for the nodes running VMs in the event a compute+storage node failed so VMs can be restarted. Technically, you can create a cluster with only storage only nodes.
  3. How does adding storage only node increase capacity for my monster VM?
    1. By distributing replicas of data throughout the cluster, thus freeing up local capacity for the running VM/s on the local node. Where a VMs storage requirement exceeds the local nodes capacity, storage only nodes add capacity and performance to the storage pool. Note: One VM even with only one monster vDisk can use the entire capacity of a Nutanix cluster without any special configuration.

Summary:

For many years Nutanix has supported and recommended the use of Storage only nodes to add capacity, performance and resiliency to Nutanix clusters.

Back to the Scalability, Resiliency and Performance Index.

Expanding Capacity on a Nutanix environment – Design Decisions

I recently saw an article about design decisions around expanding capacity for a HCI platform which went through the various considerations and made some recommendations on how to proceed in different situations.

While reading the article, it really made me think how much simpler this process is with Nutanix and how these types of areas are commonly overlooked when choosing a platform.

Let’s start with a few basics:

The Nutanix Acropolis Distributed Storage Fabric (ADSF) is made up of all the drives (SSD/SAS/SATA etc) in all nodes in the cluster. Data is written locally where the VM performing the write resides and replica’s are distributed based on numerous factors throughout the cluster. i.e.: No Pairing, HA pairs, preferred nodes etc.

In the event of a drive failure, regardless of what drive (SSD,SAS,SATA) fails, only that drive is impacted, not a disk group or RAID pack.

This is key as it limited the impact of the failure.

It is importaint to note, ADSF does not store large objects nor does the file system require tuning to stripe data across multiple drives/nodes. ADSF by default distributes the data (at a 1MB granularity) in the most efficient manner throughout the cluster while maintaining the hottest data locally to ensure the lowest overheads and highest performance read I/O.

Let’s go through a few scenarios, which apply to both All Flash and Hybrid environments.

  1. Expanding capacityWhen adding a node or nodes to an existing cluster, without moving any VMs, changing any configuration or making any design decisions, ADSF will proactively send replicas from write I/O to all nodes within the cluster, therefore improving performance while reactively performing disk balancing where a significant imbalance exists within a cluster.

    This might sound odd but with other HCI products new nodes are not used unless you change the stripe configuration or create new objects e.g.: VMDKs which means you can have lots of spare capacity in your cluster, but still experience an out of space condition.

    This is a great example of why ADSF has a major advantage especially when considering environments with large IO and/or capacity requirements.

    The node addition process only requires the administrator to enter the IP addresses and its basically a one click, capacity is available immediately and there is no mass movement of data. There is also no need to move data off and recreate disk groups or similar as these legacy concepts & complexities do not exist in ADSF.

    Nutanix is also the only platform to allow expanding of capacity via Storage Only nodes and supports VMs which have larger capacity requirements than a single node can provide. Both are supported out of the box with zero configuration required.

    Interestingly, adding storage only nodes also increases performance, resiliency for the entire cluster as well as the management stack including PRISM.

  2. Impact & implications to data reduction of adding new nodesWith ADSF, there are no considerations or implications. Data reduction is truely global throughout the cluster and regardless of hypervisor or if you’re adding Compute+Storage or Storage Only nodes, the benefits particularly of deduplication continue to benefit the environment.

    The net effect of adding more nodes is better performance, higher resiliency, faster rebuilds from drive/node failures and again with global deduplication, a higher chance of duplicate data being found and not stored unnecessarily on physical storage resulting in a better deduplication ratio.

    No matter what size node/s are added & no matter what Hypervisor, the benefits from data reduction features such as deduplication and compression work at a global level.

    What about Erasure Coding? Nutanix EC-X creates the most efficient stripe based on the cluster size, so if you start with a small 4 node cluster your stripe would be 2+1 and if you expand the cluster to 5 nodes, the stripe will automatically become 3+1 and if you expand further to 6 nodes or more, the stripe will become 4+1 which is currently the largest stripe supported.

  3. Drive FailuresIn the event of a drive failure (SSD/SAS or SATA) as mentioned earlier, only that drive is impacted. Therefore to restore resiliency, only the data on that drive needs to be repaired as opposed to something like an entire disk group being marked as offline.

    It’s crazy to think a single commodity drive failure in a HCI product could bring down an entire group of drives, causing a significant impact to the environment.

    With Nutanix, a rebuild is performed in a distributed manner throughout all nodes in the cluster, so the larger the cluster, the lower the per node impact and the faster the configured resiliency factor is restored to a fully resilient state.

At this point you’re probably asking, Are there any decisions to make?

When adding any node, compute+storage or storage only, ensure you consider what the impact of a failure of that node will be.

For example, if you add one 15TB storage only node to a cluster of nodes which are only 2TB usable, then you would need to ensure 15TB of available space to allow the cluster to fully self heal from the loss of the 15TB node. As such, I recommend ensuring your N+1 (or N+2) node/s are equal to the size of the largest node in the cluster from both a capacity, performance and CPU/RAM perspective.

So if your biggest node is an NX-8150 with 44c / 512GB RAM and 20TB usable, you should have an N+1 node of the same size to cover the worst case failure scenario of an NX-8150 failing OR have the equivalent available resources available within the cluster.

By following this one, simple rule, your cluster will always be able to fully self heal in the event of a failure and VMs will failover and be able to perform at comparable levels to before the failure.

Simple as that! No RAID, Disk group, deduplication, compression, failure, or rebuild considerations to worry about.

Summary:

The above are just a few examples of the advantages the Nutanix ADSF provides compared to other HCI products. The operational and architectural complexity of other products can lead to additional risk, inefficient use of infrastructure, misconfiguration and ultimately an environment which does not deliver the business outcome it was originally design to.