Nutanix Dares2Compare with HPE Simplivity

The following is a list of claims made by HPE as part of the #HPEDare2Compare twitter campaign regarding Nutanix and a series of blog articles and Youtube videos which disprove these claims and highlight the value of the Nutanix platform.

Dare2Compare Part 1 : HPE/Simplivity’s 10:1 data reduction HyperGuarantee Explained

Dare2Compare Part 2 : HPE/Simplivity’s claim Nutanix snaps take 10x longer

Dare2Compare Part 3 : Nutanix can’t support Dedupe without 8vCPUs

Dare2Compare Part 4 : HPE provides superior resiliency than Nutanix?

Dare2Compare Part 5 : Nutanix can’t claim single screen management w/o extra fees or GUIs

Dare2Compare Part 6 : Nutanix data efficiency stats can’t be found

Dare2Compare Part 7 : HPE provides superior performance to Nutanix

 

More coming soon! Stay tuned!

Nutanix AHV/AOS Functionality – Removing nodes

A Nutanix ADSF (Acropolis Distributed Storage Fabric) is designed to live forever, meaning as new nodes are added and older nodes removed, the cluster remains online and critically, in a fully resilient state at all times.

While this might not sound that critical, it avoids problems which have plagued legacy (and even many modern) datacenter products where forklift upgrades/replacements are not only complex, high risk and time consuming, they typically also reduce the resiliency of the platform throughout the process.

A common example of reduced resiliency is where one (of two) SAN/NAS controllers is taken offline during a fork lift storage controller upgrade, meaning a single failure can cause the storage to be offline.

Nutanix has now been shipping product for around 5 years so we have had many customers go through hardware refresh cycles, and many more who are about to embark on a HW refresh.

I thought I would quickly demonstrate how easy it is to remove an old node from a cluster and ensure existing and prospective Nutanix customers have the facts about the node removal process.

Firstly lets look at the environment the demonstration is performed on.

We have an AHV environment with 8 nodes with a mix of NX3050 and NX6050 spread over 3 blocks as shown in Nutanix PRISM UI (below).

EnvironmentSummary

To remove a host, all we need to do is go to the hardware tab in PRISM, click the host we want to remove and select Remove Host as shown below.

RemoveHost

No preparation tasks are required at all which also means less planning and change control is required. Once you select Remove Host, the host enters maintenance mode and starts performing the required tasks to remove the node as shown below.

RemoveHost2

As you can see, Acropolis OS (AOS) is removing each individual disk from the cluster before taking the node out of the cluster. This means the configured Resiliency Factor (RF) is always in compliance, ensuring that data is still available even in the event of a drive or node failure. This can be observed on the PRISM Home screen in the data resiliency view shown below.

DataResiliencyStatus

This process is handled by the curator function of AOS and because data is distributed throughout all nodes within the cluster, the process is both lower impact than traditional RAID based solutions or solutions using RAID+Replication, as well as faster because all nodes and therefore CVMs, SSDs and HDDs participate in the process. Nutanix ADSF does not mirror or replicate data from one node to another node, but to and from all nodes. This eliminates the potential bottleneck of a single node.

The following shows the speed at which Nutanix Distributed Storage Fabric (ADSF) performs the data migration even when the majority of data resides on the HDD tier (including in this example).

StoragePoolPerfNodeRemove

For a cluster with 20 x 1TB and 20 x 4TB SATA spindles for a total of 100TB of SATA and just 6.4TB SSD (or approx 6.5%) the node removal rate where it reached >830MBps quite impressive since most of the extents (data) which needed to be replicated throughout the cluster were retrieved from SATA tier.

The rate at which a node can be removed will vary depending on the front end I/O, node types and cluster size with larger cluster sizes able to remove nodes faster due to more available controllers (CMVs) and importantly more choice of source and destination of extents.

The process can be monitored via the Tasks view (shown earlier) or at a very granular level such as per disk (SSD or HDD).

The below shows us the status of the disk is Migrating Data and it also shows the drive had a significant amount of data on it as this was not an empty cluster demonstration. In fact this screen shot was taken about halfway through the node removal process.

DiskStatus

So many of you may be wondering what the CVM CPU utilisation is throughout this process During the process I took the following screenshot showing the eight Controller VMs, there vCPU configuration (8 vCPUs) and the CPU utilisation.

CVMCPUutilRemoveHost

As we can see, the utilisation ranges from just 6% through to 16% with an average of just under 10%. It should be noted these nodes are using Intel Ivy Bridge processors so with latest generation Intel Broadwell chipsets the process would use less percentage of CPU and perform faster (due to higher per core performance) than on this 3 year old equipment.

Note: The CVM is not just doing IO processing. It is providing the full AHV / AOS management stack which makes the fact the CVM is using under 10% CPU even more impressive.

The Remove host task also resets the configuration of the Controller VM (CVM) back to default which ensures the node can be quickly/easily added to a new or existing cluster.

The end result is a fully functional 7 node cluster as shown below.

EndResultNodeRemoval

Summary:

Node removal from a Nutanix cluster (regardless of hypervisor) is a 1-Click, Non disruptive operation which maintains cluster resiliency at all times while being a fast and low impact process.

Related Articles:

1. VMware you’re full of it (FUD) : Nutanix CVM/AHV & vSphere/VSAN overheads

2. Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor

3. Think HCI is not an ideal way to run mission-critical x86 workloads? Think Again!

Fight the FUD: Nutanix scale limitations

I was reading COO: VCE converged infrastructure not affected by Dell-EMC on TechTarget this morning and came across the following quote from VCE COO Todd Pavone which I found a little amusing.

One of the risks that we see in the marketplace for these appliance players is they’re trying to take that appliance that’s been architected for what I think are more single, simple, edge use cases, and they’re trying to put those into the core. We said, “Rather than trying to do that, we’re going to build an architecture for scale.” Because if you study Nutanix and <Redacted>, any of these companies that we know really well, they have scale limitations. They get to certain nodes sizes, and they break. And then, you have to cut another cluster, you have to cut another cluster.

That’s not ideal for a core data center, because now, you’re managing all of them individually — you can’t tie them into your other core systems. And so, now, you have proliferating silos, which for us is … we think that’s a big no-no. Your operational costs aren’t going to improve.

What doesn’t surprise me is how much focus Nutanix gets from other vendors, especially EMC/VCE. Its a great validation of the success of the Nutanix platform and a great indication of what will be dominant datacenter architecture (Hyperconvered/HCI) and what platform will lead the market (Nutanix XCP) in the future.

As for this post, I will only speak about Nutanix Xtreme Computing Platform (XCP) and not about the other vendor he mentioned as I don’t see the value in talking about other vendors.

The below is my summary of the points Todd has made and my thoughts:

  • Todd: Nutanix has scale limitations

Josh: Nutanix has no Maximum cluster size (nodes per cluster). In fact, as the Nutanix Distributed Storage Fabric scales, the Write I/O continues to be distributed further meaning higher Write performance.

In this article (Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 3 – Scalability) I cover all aspects of scalability including Management, Performance, Capacity, Resiliency and how scaling effects Operational aspects.

While the above post is focusing on Acropolis Hypervisor (AHV), the scalability is also true when using other supported Hypervisors such as ESXi and Hyper-V within the limitations of those hypervisors.

I wonder if Todd would say vSphere has “Scale limitations” being they support clusters of 64? Probably not, he wouldn’t want to FUD VMware.

Update: Pretty timely claim by Todd when Nutanix has just delivered a >100 node, 2PB solution used for mixed workloads such as eDiscovery for Legal, High Performance SQL, MS Exchange and more.

Nutanix2PB

  • Todd: They get the certain node sizes and they break?

Josh: I believe Todd may have been referring to “Cluster sizes” as opposed to “Node sizes” but as he is unfamiliar with Nutanix technology he is using incorrect terminology.

The first point covers “cluster” sizing, now I’ll cover nodes sizing. Nutanix along with Dell and Lenovo has numerous different node configurations which range from one to four CPU sockets and up to 768G RAM with various SSD/HDD combinations including All-Flash.

There is not a node size maximum for the Acropolis Base Software (formally known as NOS), its simply a matter of practicality. Nutanix is a distributed platform, not a legacy monolithic centralised platform. As such, scaling out is by design to improve things like resiliency and performance.

Nutanix also recommends against scaling up as this increases the impact in the event of a single node failure. e.g.: A 3 node cluster has an impact of 33% with one node failure, but an 8 node cluster has only a 12.5% impact with one failure.

  • Todd: They get to certain nodes sizes, and they break. And then, you have to cut another cluster, you have to cut another cluster.

Josh: Apart from repeating himself and using the term “node” incorrectly (again), Todd is implying Nutanix forces you to create new clusters at a given scale (which he fails to mention). As I mentioned earlier, Nutanix has no Maximum cluster size (nodes per cluster).

But as any good architect knows, there are considerations such as failure domains, security and constraints where having multiple clusters may be required or simply advantageous. One of the many great things about Nutanix XCP is multiple clusters (even with different hypervisors) can be managed centrally with PRISM central.

That brings us nicely to Todd’s next point:

  • Todd: That’s not ideal for a core data center, because now, you’re managing all of them individually

Josh: This statement is the last part of the quoted section, and again Todd is talking management of “nodes” as opposed to clusters. So first point, Nutanix XCP requires 3 nodes to form a cluster and that cluster managed via PRISM Element. Where multiple clusters exist, PRISM central is then used as a single pane of glass to manage all clusters.

The below is a video showing PRISM Element for two clusters then joining them to a PRISM central instance for central management. Note: This is a fairly old video (posted September 22, 2014) as Nutanix has been doing this for a long time, as such, PRISM Element and Central have been enhanced since this was created.

Here is an example of scaling Nutanix VDI for 20K to 200K+ Power User Desktops. It is a good example showing a real world design with Management clusters and VDI clusters which takes into consideration failure domains. This also follows well proven and accepted best practices for VMware Horizon View deployments, where the scale limitations are at the vSphere/Horizon layer, not the Nutanix layer.

Summary:

This is yet another example of one vendor talking nonsense about a vendor they compete with. If its reliable information your after, speak to the vendor who makes the product/s your interested in, get them to tell you about the product then ask to speak with reference customers to validate the information you have been provided.

Competitive vendors will only focus on what they perceive to be the issues with a given competitors platform. A good vendor will focus on their product and not discuss competitors even when asked for comparisons by customers.

To quote a person I have learnt a lot from while at Nutanix, “While our competitors focus on us, We are focusing on our customers”Dheeraj Pandey Nutanix Founder and CEO.

FocusOnCustomers

Fight the FUD!

Follow up posts:

For more information about Nutanix XCP scalability see the following posts:

1. Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 3 – Scalability

2. Scaling Hyper-converged solutions – Compute only.

3. Scale Storage separately to Compute on Nutanix!