NOS 4.5 Delivers Increased Effective SSD Tier Capacity

In addition to the increased effective SSD (and SATA) tier capacity gained by using Erasure Coding (EC-X), which was announced at the Nutanix .NEXT conference earlier this year, the upcoming NOS (Nutanix Operating System) 4.5 provides yet another effective capacity increase for the SSD tier.

Here’s how it works:

The 4-node cluster below has 3 VMs actively using data (known as extents), represented by the A, B and C blocks. This is a very simplified example, as VMs will potentially have hundreds or thousands of extents distributed throughout a cluster.

AllHotDataSSD

What we can see in the above diagram is two copies of each piece of data as this is an RF2 deployment. The VM on Node A is using extent A, the VM on Node B is using extent B and the VM on Node C is using extent C.

Because the VMs are using extents A, B and C, those extents all remain within the SSD tier, including the replicas distributed throughout the cluster. When these extents become cold, they will be dynamically moved to the SATA tier.

What is changing in NOS 4.5 is that the Nutanix tiering solution, called ILM (Intelligent Lifecycle Management), now performs up-migrations (from SATA to SSD) on a per-extent basis, which means replicas are treated independently of each other. As a result, hot extents will up-migrate to SSD on the node where the VM is running (via Data Locality), giving all-flash performance, while the replicas distributed throughout the cluster remain in the SATA tier as shown below:

PerExtentUpMigrations

As we can see in the above diagram, all copies of A, B, C and D were initially in the SATA tier. Then the VM on node A started frequently reading data A, and the local extent was therefore up-migrated to SSD.

The VM on node B started frequently accessing data D and B. Data D was up-migrated from local SATA, and data B was up-migrated AND localized as it was residing on a remote node. The VM on node C had its hot data up-migrated from local SATA, the same as the VM on node A.

Now we can see that out of the 8 extents, we have 4 which have been up-migrated and localized (where required) and 4 which remain in the low-cost SATA tier.
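The walkthrough above can be modelled with a small sketch. This is a toy model and not NOS code: the `Replica` class, the `up_migrate_for_read` function, the name of the fourth node ("D") and the initial replica placement are all assumptions made for illustration, and localization is simplified to relocating one remote copy.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    extent: str          # extent name, e.g. "A"
    node: str            # node currently holding this copy
    tier: str = "SATA"   # every copy starts cold in this example


def up_migrate_for_read(replicas, extent, vm_node):
    """Promote exactly one copy of `extent` to SSD on the reading VM's node."""
    copies = [r for r in replicas if r.extent == extent]
    if not copies:
        raise ValueError(f"unknown extent {extent!r}")
    local = next((r for r in copies if r.node == vm_node), None)
    if local is None:
        # No local copy: relocate one remote copy to the VM's node
        # (a simplification of data locality).
        local = copies[0]
        local.node = vm_node
    local.tier = "SSD"
    return local


# Hypothetical RF2 placement: two copies of each extent across four nodes.
replicas = [
    Replica("A", "A"), Replica("A", "C"),
    Replica("B", "A"), Replica("B", "D"),
    Replica("C", "C"), Replica("C", "B"),
    Replica("D", "B"), Replica("D", "D"),
]

# (extent, node of the VM reading it), following the walkthrough above.
for extent, vm_node in [("A", "A"), ("D", "B"), ("B", "B"), ("C", "C")]:
    up_migrate_for_read(replicas, extent, vm_node)

ssd_count = sum(r.tier == "SSD" for r in replicas)
sata_count = sum(r.tier == "SATA" for r in replicas)
print(ssd_count, sata_count)
```

In this toy model four of the eight copies end up in SSD and four stay in SATA, because each read promotes a single copy rather than every replica of the extent.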

As a result, the SSD tier's effective capacity is doubled for RF2 and tripled for RF3. This means that for customers using RF2, the active working set can potentially double while still providing all-flash performance.
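The arithmetic behind the doubling and tripling can be sketched as follows. This is a simplified best-case model, assuming a read-hot working set where only one copy of each hot extent needs SSD; the function name and the 12 TB figure are hypothetical, and real clusters also reserve SSD space for other purposes that this sketch ignores.

```python
def effective_ssd_capacity_tb(raw_ssd_tb, rf, per_extent_up_migration):
    """Unique hot data (TB) that fits in the SSD tier under this simple model."""
    if rf < 1:
        raise ValueError("rf must be at least 1")
    if raw_ssd_tb < 0:
        raise ValueError("raw_ssd_tb must not be negative")
    # Before per-extent up-migration every replica of a hot extent sits in SSD;
    # with it, only the copy local to the VM does (read-hot data assumed).
    copies_in_ssd = 1 if per_extent_up_migration else rf
    return raw_ssd_tb / copies_in_ssd


raw = 12.0  # hypothetical raw SSD tier size in TB
for rf in (2, 3):
    before = effective_ssd_capacity_tb(raw, rf, per_extent_up_migration=False)
    after = effective_ssd_capacity_tb(raw, rf, per_extent_up_migration=True)
    print(f"RF{rf}: {before:.1f} TB -> {after:.1f} TB ({after / before:.0f}x)")
```

With 12 TB of raw SSD, the model gives 6 TB growing to 12 TB for RF2 and 4 TB growing to 12 TB for RF3.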

If data is frequently being overwritten, NDFS will detect this and up-migrate both the local and remote copy/copies to ensure write I/O is always serviced by the SSD tier. The diagram below shows Data A being up-migrated to the SSD tier on node C, ready to service the replica writes for any write I/O.

PerExtentUpMigrationsWriteIO

As typical mixed workload environments have more reads than writes (e.g. a 70/30 read/write ratio), only up-migrating one extent when it becomes hot is effective for a large percentage of the I/O.

Even in the event the read/write ratio is reversed (e.g. 30/70), which is typical for VDI environments, the new ILM process will still provide a significant effective capacity increase for the SSD tier by only up-migrating one out of two extents. It should be noted that for VDI solutions, VAAI-NAS already provides huge data reduction savings thanks to intelligent cloning, and as a result it is not uncommon to find large VDI deployments on Nutanix using only the SSD tier.
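To get a rough feel for how write-heavy data reduces the gain, here is a minimal sketch. It assumes a hypothetical parameter, `write_hot_fraction`, meaning the share of the hot working set (by capacity) that is overwritten often enough for all of its copies to be kept in SSD. This is not the same thing as the read/write I/O ratio, and the formula is an illustrative model rather than anything NOS computes.

```python
def ssd_capacity_multiplier(rf, write_hot_fraction):
    """Effective SSD capacity gain relative to keeping all `rf` copies in SSD."""
    if rf < 1:
        raise ValueError("rf must be at least 1")
    if not 0.0 <= write_hot_fraction <= 1.0:
        raise ValueError("write_hot_fraction must be between 0 and 1")
    # Read-hot extents keep 1 copy in SSD; write-hot extents keep all rf copies.
    average_copies_in_ssd = 1 + (rf - 1) * write_hot_fraction
    return rf / average_copies_in_ssd


for fraction in (0.0, 0.25, 0.5, 1.0):
    gain = ssd_capacity_multiplier(2, fraction)
    print(f"RF2, {fraction:.0%} write-hot: {gain:.2f}x")
```

Under these assumptions the gain for RF2 ranges from 2x when nothing is write-hot down to 1x when everything is, so any read-hot portion of the working set still yields an increase.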

Summary:

NOS 4.5 delivers double (or triple for RF3) the effective SSD tier capacity, in addition to data reduction savings from technologies such as deduplication, compression and Erasure Coding (EC-X). This feature, like most things with Nutanix, is hypervisor agnostic!

Not bad for a free software upgrade huh!

NOS & Hypervisor Upgrade Resiliency in PRISM

I have had several prospective and existing customers say how much they like the One Click upgrade PRISM provides for NOS, hypervisors, firmware and NCC. These customers typically also ask what happens if they perform a One Click upgrade while the cluster is degraded for any reason, such as a drive, node or block failure.

Before starting a One Click upgrade, NOS always performs Pre-Upgrade checks to ensure the cluster is healthy. In the event the cluster is not fully resilient the upgrade process will be aborted as shown below:

AcropolisUpgrade

In the above case, the cluster was “under-replicated” (meaning the configured Resiliency Factor of 2 or 3 was not in compliance) because NOS had just been upgraded on the cluster and one of the nodes had not yet come back online when the One Click Upgrade for the Acropolis hypervisor (AHV) was started.

Other situations where the cluster may be under-replicated include following an HDD, SSD, node or block failure. In all these cases, the Nutanix Distributed File System (NDFS) will restore resiliency, assuming sufficient rebuild capacity is available in the Storage Pool. This is why Nutanix always recommends clusters be designed with at least N+1 available capacity, to ensure rebuild capacity exists and the cluster can automatically self-heal.
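The N+1 sizing rule can be illustrated with a short sketch. It assumes a simplified model in which a cluster can self-heal if all physically stored data (replicas included) still fits on the remaining nodes after the largest node is lost; the function name and the capacity figures are hypothetical, and real sizing involves further reservations that are not modelled here.

```python
def has_n_plus_one_capacity(node_capacities_tb, used_tb):
    """True if stored data (including replicas) fits after losing the largest node."""
    if not node_capacities_tb:
        raise ValueError("at least one node capacity is required")
    # Worst case for a single node failure: the biggest node goes away.
    surviving_capacity = sum(node_capacities_tb) - max(node_capacities_tb)
    return used_tb <= surviving_capacity


nodes = [10.0, 10.0, 10.0, 10.0]  # hypothetical usable TB per node
print(has_n_plus_one_capacity(nodes, used_tb=28.0))
print(has_n_plus_one_capacity(nodes, used_tb=32.0))
```

With four 10 TB nodes, 28 TB of stored data still fits on the three surviving nodes, while 32 TB would leave the cluster unable to rebuild fully.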

As a general rule, it is recommended to wait approximately 10 minutes between NOS and hypervisor upgrades to avoid these kinds of issues, or you can simply check the Home screen of PRISM and ensure the Health status is Good as shown below:

HealthGood

and that the Data Resiliency Status is “OK” as shown below:

DataResiliencyOk

Both the Health and Data Resiliency status are Hypervisor agnostic and appear on the Home screen of all Nutanix deployments.

If both the Health Status and Data Resiliency are good then you can go ahead and start the upgrade and it should complete successfully.
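The manual check described above can be written down as a tiny sketch. It does not query PRISM or use any Nutanix API: the `ClusterStatus` class and `upgrade_blockers` function are hypothetical, and the two values are assumed to be read off the Home screen by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStatus:
    health: str           # Health status as read from the Home screen
    data_resiliency: str  # Data Resiliency Status as read from the Home screen


def upgrade_blockers(status):
    """Return the reasons not to start an upgrade yet (empty list means go ahead)."""
    blockers = []
    if status.health != "Good":
        blockers.append(f"Health is {status.health!r}, expected 'Good'")
    if status.data_resiliency != "OK":
        blockers.append(
            f"Data Resiliency is {status.data_resiliency!r}, expected 'OK'"
        )
    return blockers


print(upgrade_blockers(ClusterStatus(health="Good", data_resiliency="OK")))
print(upgrade_blockers(ClusterStatus(health="Good", data_resiliency="Critical")))
```

An empty list corresponds to the case where both indicators are good and the upgrade can be started; anything else is a reason to wait for the cluster to recover first.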

Summary:

PRISM will not start an upgrade of NOS or the Hypervisor if the cluster is degraded, so you can rest assured that even if you attempt an upgrade by accident when the cluster is degraded, NOS will protect you.

Not just Hypervisor Agnostic, Hypervisor Version Agnostic!

It's well known that Nutanix is hypervisor agnostic, supporting ESXi, Hyper-V and KVM, but what most people either don’t know, or haven’t considered, is that the Nutanix Operating System (NOS) version is not dependent on the hypervisor version.

What does this mean?

You can run the latest and greatest NOS 4.1.x releases on ESXi 5.0, ESXi 6.0 or anything in between. In fact, you could run older versions of NOS such as 3.x with vSphere 6.0 as well (although I see no reason you would do this).

Why is this important?

This past week I was discussing with some government customers how they can perform upgrades of NOS using our 1-Click upgrade, and I was asked a similar question on several occasions:

“What version of ESXi do we need?”

The customers asked this question because, for large environments, changing or upgrading the hypervisor can be a significant project requiring design, project management and implementation labor, which could cost huge amounts of money when the only goal is to increase storage performance or functionality.

NOS can be upgraded independently of the hypervisor (and without performing a single vMotion or putting hosts into maintenance mode). This ensures that customers who cannot or do not wish to upgrade ESXi, for any reason, continue to benefit from the ever-increasing performance and feature set of NOS.

While hyper-converged solutions like Nutanix combine the compute/hypervisor layer and the storage layer delivering numerous benefits over traditional 3-tier architecture, it’s a significant advantage to be able to independently upgrade the compute or storage layer.

This is something Nutanix delivers which is just one of the many ways we make our solution “Uncompromisingly Simple”.

Oh, did I mention NOS also provides support for 1-Click hypervisor and firmware upgrades? 🙂
