Nutanix AHV/AOS Functionality – Removing nodes

A Nutanix Acropolis Distributed Storage Fabric (ADSF) cluster is designed to live forever: as new nodes are added and older nodes removed, the cluster remains online and, critically, in a fully resilient state at all times.

While this might not sound that critical, it avoids problems which have plagued legacy (and even many modern) datacenter products where forklift upgrades/replacements are not only complex, high risk and time consuming, they typically also reduce the resiliency of the platform throughout the process.

A common example of reduced resiliency is where one (of two) SAN/NAS controllers is taken offline during a forklift storage controller upgrade, meaning a single failure can take the storage offline.

Nutanix has now been shipping product for around 5 years so we have had many customers go through hardware refresh cycles, and many more who are about to embark on a HW refresh.

I thought I would quickly demonstrate how easy it is to remove an old node from a cluster and ensure existing and prospective Nutanix customers have the facts about the node removal process.

Firstly, let’s look at the environment the demonstration is performed on.

We have an AHV environment with 8 nodes with a mix of NX3050 and NX6050 spread over 3 blocks as shown in Nutanix PRISM UI (below).

EnvironmentSummary

To remove a host, all we need to do is go to the hardware tab in PRISM, click the host we want to remove and select Remove Host as shown below.

RemoveHost

No preparation tasks are required at all which also means less planning and change control is required. Once you select Remove Host, the host enters maintenance mode and starts performing the required tasks to remove the node as shown below.

RemoveHost2

As you can see, Acropolis OS (AOS) is removing each individual disk from the cluster before taking the node out of the cluster. This means the configured Resiliency Factor (RF) is always in compliance, ensuring that data is still available even in the event of a drive or node failure. This can be observed on the PRISM Home screen in the data resiliency view shown below.
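To make the "drain each disk first" idea concrete, here is a toy Python sketch. It is not how AOS or Curator is actually implemented; the placement model, the function names and the assumed RF of 2 are all mine, purely for illustration. The point is the order of operations: a new replica is created on another node before the old one is retired, so no extent ever drops below RF.

```python
import random

RF = 2  # replicas per extent (assumed value for this toy model)

def build_cluster(nodes=4, disks_per_node=2, extents=50, seed=1):
    """Place RF replicas of each extent on disks belonging to distinct nodes."""
    rng = random.Random(seed)
    disks = [(n, d) for n in range(nodes) for d in range(disks_per_node)]
    placement = {}
    for extent in range(extents):
        chosen_nodes = rng.sample(range(nodes), RF)
        placement[extent] = {(n, rng.randrange(disks_per_node)) for n in chosen_nodes}
    return disks, placement

def drain_disk(disk, disks, placement, leaving_node, rng):
    """Re-create every replica held on `disk` elsewhere BEFORE dropping it."""
    for replicas in placement.values():
        if disk not in replicas:
            continue
        used_nodes = {n for n, _ in replicas}
        # Any disk on a node that is staying and holds no copy of this extent.
        candidates = [d for d in disks
                      if d[0] != leaving_node and d[0] not in used_nodes]
        replicas.add(rng.choice(candidates))  # new copy first (briefly RF + 1)
        replicas.remove(disk)                 # then retire the old copy
        assert len(replicas) >= RF            # never below RF

def remove_node(node, disks, placement, seed=2):
    """Drain the node disk by disk, then return the disks that remain."""
    rng = random.Random(seed)
    for disk in [d for d in disks if d[0] == node]:
        drain_disk(disk, disks, placement, node, rng)
    return [d for d in disks if d[0] != node]

disks, placement = build_cluster()
remaining = remove_node(0, disks, placement)
print(len(remaining), "disks remain; every extent still has", RF, "replicas")
```

In this model the new copy can land on any disk of any remaining node, which is also why the work spreads across the whole cluster rather than hitting a single partner node.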

DataResiliencyStatus

This process is handled by the Curator function of AOS. Because data is distributed throughout all nodes within the cluster, the process is lower impact than traditional RAID based solutions or solutions using RAID+Replication. It is also faster, because all nodes, and therefore all CVMs, SSDs and HDDs, participate in the process. Nutanix ADSF does not mirror or replicate data from one node to another node, but to and from all nodes. This eliminates the potential bottleneck of a single node.
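A rough back-of-the-envelope model shows why having every node participate matters. The sketch below assumes perfectly linear scaling and a made-up per-node throughput of 120MBps and 10TB of data to re-protect; neither number comes from the cluster in this post, and real rates depend on front end I/O, node types and cluster size.

```python
def rebuild_hours(data_tb, per_node_mbps, participating_nodes):
    """Idealised hours to re-protect data_tb when the work is shared evenly.

    Assumes decimal units (1 TB = 1,000,000 MB) and linear scaling, which a
    real cluster will only approximate.
    """
    total_mbps = per_node_mbps * participating_nodes
    seconds = data_tb * 1_000_000 / total_mbps
    return seconds / 3600

# Hypothetical inputs: 10 TB to move, 120 MBps sustained per node.
single_target = rebuild_hours(10, 120, participating_nodes=1)
all_nodes = rebuild_hours(10, 120, participating_nodes=7)

print(f"one node doing the work: {single_target:.1f} h")
print(f"seven nodes sharing it:  {all_nodes:.1f} h")
```

Under these assumptions the time drops in proportion to the number of participating nodes, which is the intuition behind larger clusters being able to remove nodes faster.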

The following shows the speed at which the Acropolis Distributed Storage Fabric (ADSF) performs the data migration, even when the majority of data resides on the HDD tier (as it does in this example).

StoragePoolPerfNodeRemove

For a cluster with 20 x 1TB and 20 x 4TB SATA spindles, for a total of 100TB of SATA and just 6.4TB of SSD (or approx 6.5%), a node removal rate which reached >830MBps is quite impressive, since most of the extents (data) which needed to be replicated throughout the cluster were retrieved from the SATA tier.
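For anyone who wants to check the arithmetic, here is a small Python sketch using the figures above. The 10TB of data to migrate is a hypothetical number I picked for illustration (the post does not state how much data was on the node), and treating the 830MBps peak as a sustained rate is an optimistic assumption.

```python
# Capacity figures quoted in the post.
sata_tb = 20 * 1 + 20 * 4          # 20 x 1TB + 20 x 4TB spindles
ssd_tb = 6.4

ssd_vs_sata_pct = ssd_tb / sata_tb * 100              # SSD relative to SATA
ssd_of_total_pct = ssd_tb / (sata_tb + ssd_tb) * 100  # SSD share of all raw capacity

def hours_to_migrate(data_tb, rate_mbps=830):
    """Hours to move data_tb at a constant rate (decimal units, 1 TB = 1,000,000 MB)."""
    return data_tb * 1_000_000 / rate_mbps / 3600

print(f"SATA total: {sata_tb} TB")
print(f"SSD vs SATA: {ssd_vs_sata_pct:.1f}%, SSD share of total: {ssd_of_total_pct:.1f}%")
print(f"Hypothetical 10 TB at 830 MBps: {hours_to_migrate(10):.1f} h")
```

Either way you calculate it, SSD is only around 6% of the raw capacity, so the bulk of the reads during the removal had to come from spinning disk.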

The rate at which a node can be removed will vary depending on the front end I/O, node types and cluster size, with larger clusters able to remove nodes faster due to more available controllers (CVMs) and, importantly, more choice of source and destination for extents.

The process can be monitored via the Tasks view (shown earlier) or at a very granular level such as per disk (SSD or HDD).

The below shows us the status of the disk is Migrating Data and it also shows the drive had a significant amount of data on it as this was not an empty cluster demonstration. In fact this screen shot was taken about halfway through the node removal process.

DiskStatus

Many of you may be wondering what the CVM CPU utilisation is throughout this process. During the process I took the following screenshot showing the eight Controller VMs, their vCPU configuration (8 vCPUs) and the CPU utilisation.

CVMCPUutilRemoveHost

As we can see, the utilisation ranges from just 6% through to 16%, with an average of just under 10%. It should be noted these nodes are using Intel Ivy Bridge processors, so with latest generation Intel Broadwell processors the process would use a lower percentage of CPU and perform faster (due to higher per core performance) than on this 3 year old equipment.

Note: The CVM is not just doing IO processing. It is providing the full AHV / AOS management stack which makes the fact the CVM is using under 10% CPU even more impressive.

The Remove Host task also resets the configuration of the Controller VM (CVM) back to default, which ensures the node can be quickly and easily added to a new or existing cluster.

The end result is a fully functional 7 node cluster as shown below.

EndResultNodeRemoval

Summary:

Node removal from a Nutanix cluster (regardless of hypervisor) is a 1-click, non-disruptive operation which maintains cluster resiliency at all times while being fast and low impact.

Related Articles:

1. VMware you’re full of it (FUD) : Nutanix CVM/AHV & vSphere/VSAN overheads

2. Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor

3. Think HCI is not an ideal way to run mission-critical x86 workloads? Think Again!

Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again! – Part 2

Continuing from Part 1, let’s look at another one of VCE COO Todd Pavone’s statements from the article “COO: VCE converged infrastructure not affected by Dell-EMC”:

We believe that there was a major gap in the core data center for hyper-converged, where customers wanted hyper-converged architecture — they don’t want to invest in tier-one storage or tier-one servers. They want the intelligence in the software, but they also want massive scale. This is for globals, large service providers in a massive scale, like thousands of nodes. We have a large financial service company in New York that is using us for a platform-free application build-up. And they want to pilot it with 10,000 users, but it’s going to go to 10 million users. And so, can we give them an infrastructure for 10,000, but can scale simply and easily to 10 million — or 20 million?

You can’t do that on an appliance, right? But they want hyper-converged. When you get to 10 million users, you want an infrastructure that scales and is nonlinear, leading to a lower cost model. So, we said, “There’s a gap in that market,” and we created the rack.

Let’s again address these points:

  • Todd: “They don’t want to invest in tier-one storage or tier-one servers. They want the intelligence in the software, but they also want massive scale.”

If customers don’t want to invest in what I would call “traditional” tier one storage and servers, then I’d have to agree with them: they need a very different solution, such as Nutanix, if they want to get to massive scale, especially if they want easy management & deployment.

Nutanix has customers ranging from 3 to thousands of nodes, and in fact many of our large customers run Acropolis Hypervisor. So any question about scalability for Nutanix is just laughable.

  • Todd: “And they want to pilot it with 10,000 users, but it’s going to go to 10 million users. And so, can we give them an infrastructure for 10,000, but can scale simply and easily to 10 million — or 20 million? You can’t do that on an appliance, right?”

Well, you can with Nutanix! In fact that sounds like a common use case for Nutanix: we frequently design and pilot repeatable models and then scale as required.

  • Todd: “But they want hyper-converged. When you get to 10 million users, you want an infrastructure that scales and is nonlinear, leading to a lower cost model. So, we said, “There’s a gap in that market,” and we created the rack.”

It’s no surprise to me at all that customers want Hyperconverged and the ability to scale both linearly and non linearly. Nutanix can do this today and has been able to do it for a long time. Back in 2013 for example, you could mix NX3000 series being Compute heavy / Storage Light with NX6000 nodes which are Compute light and Storage Heavy. This is an example of non linear scaling which achieves the reduced cost (e.g.: Cost/GB) over time.

Then in 2014 an even wider range of nodes was released (NX1000, NX3000, NX6000 & NX8000), which enhanced Nutanix’s ability to scale both up and out, linearly and non linearly.

In 2015 Nutanix launched the NX-6035C “Storage Only” node, which allows customers to Scale Storage separately to Compute, ensuring non linear scaling of compute vs storage for customers with high capacity requirements. Importantly, no hypervisor licensing is required to scale storage, as storage only nodes run Acropolis Hypervisor (AHV), which is fully interoperable with ESXi and Hyper-V environments.

Remember the Rule of thumb: Don’t scale capacity without scaling storage controllers!

Nutanix Storage Only nodes run a light weight Controller VM (CVM) to ensure Management, Monitoring and Data services (e.g.: Disk Balancing, Compression, Dedupe, Erasure Coding etc) do not degrade even when scaling compute and storage in a vastly non linear manner. Storage only nodes also help improve performance by participating in cluster replication (RF2/RF3) and disk balancing activities.

  • Todd: “So, we said, “There’s a gap in that market,” and we created the rack.”

There may have been a gap back in early 2013, but since then Nutanix has continued to innovate and lead the market with solutions to scale both linearly and non linearly, so I’d say the gap has long been filled. Nutanix also scales management with a single HTML 5 GUI called PRISM, with central management of multiple clusters/sites/geographical locations via PRISM Central.

Summary:

I’m sure it’s pretty obvious by now VCE COO Todd Pavone and I have different opinions on what HCI is capable of. During my time at Nutanix I have seen countless successful small, medium and large scale mission-critical application deployments and the percentage of Nutanix business from these workloads continues to increase thanks to our investment in a dedicated vBCA team which I am fortunate to be a part of.

Next time you’re considering new infrastructure for a mission-critical application, reach out and I’ll happily work with you to see if Nutanix is a good fit for your use case.

Let me finish by saying, I can guarantee you that in the unlikely event the workload/s are not suitable for Nutanix, I will be the first one to tell you, and help you find an alternate solution.

Back to Part 1.

Scaling Hyper-converged solutions – Compute only.

A quick bit of history on Nutanix: back in mid 2013 when I joined, in almost every meeting I went to and presentation I gave, there was a common theme. People wanted to scale compute and storage at different rates.

Now this makes perfect sense, and this issue has long been addressed by a large range of node types which can be mixed in the same Nutanix cluster.

For example: NX3060 nodes with Dual Intel Haswell CPUs and ~2TB usable storage can be mixed with NX6060 nodes also running dual Intel Haswell CPUs but with ~8TB usable each.
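To put some numbers on that, here is a quick Python sketch of how mixing the two node types changes the storage-to-compute ratio of a cluster. The ~2TB and ~8TB usable figures and the dual sockets come from the example above; the node counts, the `NodeType` class and the `cluster_profile` helper are hypothetical, and simply summing per-node usable capacity ignores any cluster-level overheads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeType:
    name: str
    sockets: int
    usable_tb: float  # approximate usable capacity per node

# Approximate figures from the example above.
NX3060 = NodeType("NX3060", sockets=2, usable_tb=2.0)
NX6060 = NodeType("NX6060", sockets=2, usable_tb=8.0)

def cluster_profile(nodes):
    """Return (total sockets, total usable TB, usable TB per socket)."""
    sockets = sum(n.sockets for n in nodes)
    usable_tb = sum(n.usable_tb for n in nodes)
    return sockets, usable_tb, usable_tb / sockets

compute_heavy = [NX3060] * 4               # hypothetical starting cluster
mixed = [NX3060] * 4 + [NX6060] * 4        # same cluster after adding NX6060s

for label, nodes in (("4 x NX3060", compute_heavy), ("4 x NX3060 + 4 x NX6060", mixed)):
    sockets, tb, ratio = cluster_profile(nodes)
    print(f"{label}: {sockets} sockets, ~{tb:.0f} TB usable, ~{ratio:.1f} TB per socket")
```

Doubling the node count here doubles the sockets but takes usable capacity from roughly 8TB to roughly 40TB, which is exactly the kind of non linear scaling people were asking for.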

Nutanix also has configure to order (CTO) nodes where size of SSDs and HDDs can be modified to suit customer requirements. So at this point I never have a challenge sizing for a customer workload as I have plenty of great options to choose from.

Another common question has been “How do I scale storage only?”. Nutanix has also addressed this in an intelligent way and as a result adding “Storage Only” nodes makes sense as I described in Scale Storage separately to Compute on Nutanix!

In recent months a new question has emerged and a small percentage of partners/customers have been asking about adding Compute only nodes (e.g.: Traditional ESXi hosts) to a Nutanix (or HCI) cluster.

My first question to these customers/partners is: Why?

The typical reply is something like “Because we need to add more VMs which have low storage requirements” or “Because we don’t need storage”.

Let’s look at these answers:

Firstly, my favourite one, “Because we don’t need storage”.

Is this really true, or do you mean the new VMs have low storage requirements? In almost all cases the truth is the new VMs have a small requirement for storage capacity and performance.

So next let’s look at the other common (and more realistic) situation:

“Because we need to add more VMs which have low storage requirements”

This is very possible and something an HCI solution should cater for, and Nutanix does. For example, one of our most popular nodes is the NX-3050 (or NX-3060), a compute heavy node with 2 sockets each with up to 24 physical CPU cores (Haswell) and 512GB RAM.

This node also comes with 2 x SSDs and 4 x SATA HDDs with a minimum usable capacity of approx 2TB (of which 20% is SSD).

So while the solution adds some capacity, it’s also retaining all the advantages of HCI while eliminating the complexity of a 3-tier architecture, which is why customers are flocking to HCI in the first place.

Even if the capacity is not required, the SSDs service reads locally where required and increase the shared SSD tier of the cluster, which means more write performance for workloads throughout the cluster. Sounds pretty good to me!

Does having an additional 4 x SATA drives really matter? Well, from a cost perspective it’s a minimal cost, and thanks to Disk Balancing the SATA drives will hold some data (such as replicas), which lowers the overheads on other nodes, therefore improving resiliency and performance.

So there are lots of advantages to adding even a small amount of storage, even if the new workloads don’t require most of it.

But for those of you who aren’t already convinced that adding some storage is advantageous, how about adding dual Intel Haswell CPUs and up to 512GB RAM with just 1 x SSD (to accelerate write I/O and serve what little storage the VMs need locally) and just 2 x SATA HDDs?

Nutanix has such a node, which is another option to scale high compute and very low storage.

Another question I get is: “Is the fact Nutanix can’t do this why you don’t recommend it?”

The answer is, Nutanix can add compute only, and we can actually do it very well and get very good performance, but it’s not HCI and it adds unnecessary complexity, which is why we don’t recommend (or productise) this option.

Now let’s look at what adding compute only to HCI looks like.

warning-contents-may-offend_design-200x200 (1)

*Scroll down when ready!

V
V
V
V
V
V
V
V
V
V
V
V
V
V
V
V
V

HCInotHCI

 

Yuk! That looks like old school 3-tier stuff to me!

As the above shows, adding Compute Only to HCI basically means you have a non HCI solution for part of your workloads.

Non HCI workloads on compute only nodes would therefore:

  • Be running in the same setup as traditional 3-tier infrastructure
  • Have different performance than HCI based workloads
  • Lose the advantage of having compute + storage close together
  • Increase dependency on the network
  • Impact network utilization of HCI nodes
  • Impact benefits of HCI for the native HCI workloads and much more.

The industry has accepted HCI as the way of the future, and while adding compute only nodes might sound nice at a high level, it’s just re-introducing the classic 3-tier complexity and problems of the past.

Summary:

If you have already invested in HCI, you clearly understand the advantages and value of the solution. Adding compute only is not a true “value”, it’s just a “perceived value”.

Adding “Compute only” is just adding complexity and moving away from the value HCI brings, so my advice is: don’t make the mistake. If you already have, you now know the solution.

Invest in a compute+storage node (albeit at a higher CAPEX), enjoy the continued value of HCI and improve the performance and resiliency of your entire cluster! Now that’s real value (at a reasonable cost).

And just remember….

cheaper

Related Posts:

1. Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

2. Advanced Storage Performance Monitoring with Nutanix

3. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

4. Nutanix – Erasure Coding (EC-X) Deep Dive

5. Acropolis: VM High Availability (HA)

6. Acropolis: Scalability

7. NOS & Hypervisor Upgrade Resiliency in PRISM