Dare2Compare Part 6 : Nutanix data efficiency stats can’t be found

If you haven’t read Parts 1 through 5: in those posts we proved several claims by HPE Simplivity regarding Nutanix to be false, and explored the misleading way in which HPE SVT promotes data efficiency.

We continue with Part 6, where we will discuss HPE’s claim that “Nutanix data efficiency stats are stealthier than a ninja”.

While HPE’s claim is an attempt to create Fear, Uncertainty and Doubt (FUD), HPE are partially correct in that we (Nutanix) have done a very poor job of promoting the arguably market-leading data efficiency that Nutanix provides.

In fact, several colleagues and I submitted a feature request to report the ADSF data efficiencies clearly and in detail, and I am pleased to say these changes were included in the recent AOS 5.1 release.

Nutanix users now see the following in the PRISM “Storage” view (as shown below):

  1. A Capacity optimization overview
  2. Data reduction ratio, made up of deduplication, compression and erasure coding savings*.
  3. Data reduction savings, the total capacity (GB/TB/PB) saved by data reduction.
  4. An Overall Efficiency ratio, which combines Data Reduction, Cloning and Thin Provisioning.

*Metadata copies, snapshots, pointers etc. are not included in the deduplication value, as they are not deduplication.

The resulting summary is clear and easy to understand: customers can see which efficiencies come from data reduction, and which savings (typically by far the largest “efficiency”) come from cloning and thin provisioning.
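To make the distinction concrete, here is a minimal Python sketch of how a data reduction ratio and an overall efficiency ratio could be derived. It assumes each ratio is simply logical capacity divided by physical capacity; the `CapacityStats` fields and the numbers are hypothetical, and this is not the exact formula PRISM uses.

```python
from dataclasses import dataclass


@dataclass
class CapacityStats:
    # Hypothetical inputs, all in GiB.
    written_logical: float        # data written by VMs before dedupe/compression/EC-X
    stored_physical: float        # space actually consumed after data reduction
    clone_logical: float          # logical size presented by clones (shared blocks)
    provisioned_unwritten: float  # thin-provisioned space that was never written


def data_reduction_ratio(s: CapacityStats) -> float:
    # Only dedupe, compression and EC-X savings: written data vs physical footprint.
    return s.written_logical / s.stored_physical


def overall_efficiency_ratio(s: CapacityStats) -> float:
    # Adds clone and thin-provisioning "savings" on top of data reduction.
    logical = s.written_logical + s.clone_logical + s.provisioned_unwritten
    return logical / s.stored_physical


stats = CapacityStats(written_logical=10_000, stored_physical=5_000,
                      clone_logical=20_000, provisioned_unwritten=30_000)
print(f"data reduction: {data_reduction_ratio(stats):.1f}:1")          # 2.0:1
print(f"overall efficiency: {overall_efficiency_ratio(stats):.1f}:1")  # 12.0:1
```

Even with a modest 2:1 data reduction in this made-up example, cloning and thin provisioning dominate the headline number, which is why reporting them as separate items matters.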


One major item that will be added to this reporting in an upcoming release is zero suppression. Zero suppression has been a capability of the Nutanix Distributed Storage Fabric since Day 1: it avoids unnecessarily storing zeros, instead storing metadata which achieves the same outcome with much higher performance and much less capacity.
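As a rough illustration of the idea (not the Nutanix implementation), the sketch below keeps a per-block map and records all-zero blocks as a metadata entry instead of storing their contents. The 4 KiB block size and the `ZeroSuppressingStore` class are assumptions made purely for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch
ZERO_BLOCK = bytes(BLOCK_SIZE)


class ZeroSuppressingStore:
    """Toy block store: all-zero blocks become metadata, not stored data."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}   # offset -> stored payload
        self.zero_offsets: set[int] = set()  # offsets known to be all zeros

    def write(self, offset: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("sketch only handles whole blocks")
        if data == ZERO_BLOCK:
            # Record the zeros in metadata and drop any previously stored payload.
            self.blocks.pop(offset, None)
            self.zero_offsets.add(offset)
        else:
            self.zero_offsets.discard(offset)
            self.blocks[offset] = data

    def read(self, offset: int) -> bytes:
        if offset in self.zero_offsets:
            return ZERO_BLOCK  # synthesised from metadata, no media read needed
        return self.blocks.get(offset, ZERO_BLOCK)  # unwritten space reads as zeros

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = ZeroSuppressingStore()
store.write(0, b"\x01" * BLOCK_SIZE)
for i in range(1, 100):
    store.write(i * BLOCK_SIZE, ZERO_BLOCK)
print(store.bytes_stored())  # 4096: only the non-zero block consumed capacity
```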

Nutanix snapshots, or pointer-based copies (depending on how you refer to them), are also not included in the overall efficiency number; however, these will also be included as a separate line item in a future release, as we aim to be very clear about what data efficiencies a customer is achieving with Nutanix.

Some vendors recommend Eager Zero Thick (EZT) VMDKs on vSphere and then deduplicate the zeros, which artificially increases the deduplication ratio. Nutanix does not do this, as it’s inefficient to create more data to deduplicate when you can simply avoid writing that data in the first place. However, we do plan to report the savings from zero suppression as a separate line item, as it is a value our platform provides.
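A quick worked example shows why deduplicating zeros flatters the ratio. The figures (a 100 GB EZT VMDK holding 20 GB of real, non-duplicate data) are hypothetical, and the sketch assumes all zero blocks collapse to a negligible amount of stored data.

```python
def dedupe_ratio(logical_gb: float, physical_gb: float) -> float:
    return logical_gb / physical_gb


real_gb = 20.0   # hypothetical unique, non-zero data
zero_gb = 80.0   # hypothetical zeros written up front by eager zeroing

# EZT + dedupe: the zeros count as logical data but collapse to ~nothing physically.
ezt_ratio = dedupe_ratio(real_gb + zero_gb, real_gb)
# Zero suppression: the zeros are never written, so only real data is counted.
suppressed_ratio = dedupe_ratio(real_gb, real_gb)

print(f"EZT + dedupe: {ezt_ratio:.1f}:1")             # 5.0:1
print(f"zero suppression: {suppressed_ratio:.1f}:1")  # 1.0:1
```

Both approaches consume roughly the same 20 GB of physical capacity in this example; only the reported deduplication ratio differs.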

For a more detailed view, Nutanix customers can drill down into the Storage “Diagram” view, where admins can see each container’s data efficiency breakdown.


As we can see, Nutanix is very transparent, showing which data reduction features are enabled, what ratio is being achieved, the total, used, reserved and even thick-provisioned storage, an effective free capacity (physical free capacity multiplied by the data reduction ratio) and an overall efficiency value.
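The effective free figure described above can be sketched as follows. The numbers are hypothetical, and the calculation assumes the data reduction ratio achieved today will also hold for data written in the future, so the result is an estimate rather than a guarantee.

```python
def effective_free_tib(physical_free_tib: float, data_reduction_ratio: float) -> float:
    # Effective free = physical free capacity x current data reduction ratio.
    return physical_free_tib * data_reduction_ratio


print(effective_free_tib(physical_free_tib=10.0, data_reduction_ratio=1.5))  # 15.0
```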

Now that we’ve covered off how Nutanix measures and reports on data reduction/efficiency, I’d like to highlight a critical factor when discussing data reduction/efficiency: data efficiency is totally dependent on each individual customer’s data. For the same dataset, the difference between vendors with the same capabilities (e.g.: Deduplication, Compression and Erasure Coding (EC-X)) is unlikely to be vast, or better put, unlikely to change a business outcome one way or another, despite what each vendor will say about their implementation of such technologies.

In short: The biggest factor in the achieved data reduction is not the vendor, it’s the customer data.

With that said, if you’re comparing HPE SVT and Nutanix, there is a pretty major delta between the two products in terms of capabilities, because Nutanix supports Erasure Coding (EC-X) and HPE SVT does not.

As a result, Nutanix has a major advantage, as Erasure Coding in the Nutanix Acropolis Distributed Storage Fabric (ADSF) is complementary to both deduplication and compression.

Unlike Compression and Deduplication, Erasure Coding can provide savings (or another way to look at it would be lower data redundancy overheads) regardless of the data type.

So where deduplication and compression achieve minimal or no savings for data such as video files, Erasure Coding still provides savings. The less the customer’s data dedupes and/or compresses, the more the delta between Nutanix and HPE SVT grows in Nutanix’s favour.

HPE SVT, on the other hand, has a RAID overhead (RAID 6 being N-2 usable, or RAID 60 being N-4 usable) and on top of that uses replication (2 copies, 50% usable), for a usable capacity of well below 50% of raw, depending on the number of drives per node.

Nutanix, using RF2 and EC-X, provides between 50% (minimum) and 80% (maximum) usable capacity of raw, and with RF3 (N+2) between 33% (minimum) and 66% (maximum) usable, excluding the benefits of compression and deduplication.
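The usable-capacity arithmetic from the last two paragraphs can be checked with a short sketch. Two assumptions: HPE SVT is modelled as RAID 6 within a node plus 2-copy replication, with a hypothetical 8-drive RAID group; and the Nutanix EC-X maxima are modelled as 4 data + 1 parity (RF2) and 4 data + 2 parity (RF3) strips, which is what the 80% and 66% figures above imply. Compression and deduplication are excluded, as in the text.

```python
def svt_usable_fraction(drives_per_raid_group: int, parity_drives: int = 2,
                        copies: int = 2) -> float:
    # RAID 6 loses `parity_drives` per group, then replication divides what is left.
    raid_usable = (drives_per_raid_group - parity_drives) / drives_per_raid_group
    return raid_usable / copies


def replication_usable_fraction(rf: int) -> float:
    # RF2 keeps 2 full copies (50% usable), RF3 keeps 3 (~33% usable).
    return 1 / rf


def ecx_usable_fraction(data_blocks: int, parity_blocks: int) -> float:
    # Erasure coded strip: only the parity blocks are overhead.
    return data_blocks / (data_blocks + parity_blocks)


print(f"HPE SVT, 8-drive RAID 6 + 2 copies: {svt_usable_fraction(8):.1%}")  # 37.5%
print(f"Nutanix RF2, no EC-X: {replication_usable_fraction(2):.1%}")        # 50.0%
print(f"Nutanix RF2, EC-X 4+1: {ecx_usable_fraction(4, 1):.1%}")            # 80.0%
print(f"Nutanix RF3, no EC-X: {replication_usable_fraction(3):.1%}")        # 33.3%
print(f"Nutanix RF3, EC-X 4+2: {ecx_usable_fraction(4, 2):.1%}")            # 66.7%
```

None of these fractions depend on the data type, which is why the EC-X advantage persists for data that will not dedupe or compress.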

The next major factor in data efficiency ratios is how they are measured!

In Part 2 I covered how misleading HPE SVT’s 10:1 efficiency guarantee is, and this is a great example of why it can be difficult to compare apples to apples between vendors. Nutanix, on the other hand, does not measure data efficiency in the same misleading manner.

In Summary:

  1. Nutanix AOS 5.1 has comprehensive data reduction/efficiency reporting within the PRISM HTML GUI
  2. Nutanix data reduction capabilities exceed those of HPE SVT: both products have Dedupe and Compression, but Erasure Coding (EC-X) is only supported on Nutanix
  3. All data reduction capabilities on Nutanix are complementary, so Dedupe, Compression and Erasure Coding can all work together to maximise savings.
  4. Erasure Coding provides data reduction even for data which is not compressible or dedupeable
  5. Nutanix data efficiency stats are easily visible in the PRISM GUI and are much more detailed than HPE SVT

Return to the Dare2Compare Index:

But wait, there’s more!

As far as data reduction results are concerned, they are all over Twitter and a simple search comes up with many examples. The first one is my favorite, not because of the data reduction ratio itself, but because it shows one of the major values of a 100% software solution: a simple software upgrade (one-click, rolling and non-disruptive) gave the customer a significantly higher data reduction ratio. So basically, the customer got more capacity for free!

Note: None of the below show the latest data efficiency reporting capabilities from AOS 5.1.

Here are a few other examples which I found using this Twitter search:

Dare2Compare Part 4 : HPE provides superior resiliency than Nutanix?

As discussed in Part 1, we have proven that HPE made false claims about Nutanix snapshot capabilities as part of the #HPEDare2Compare Twitter campaign.

In Part 2, I explained how HPE/Simplivity’s 10:1 data reduction HyperGuarantee is nothing more than smoke and mirrors, and that most vendors can provide the same if not greater efficiencies, even without hardware acceleration.

In Part 3, I corrected HPE on their false claim that Nutanix cannot support dedupe without 8 vCPUs, and in Part 4, I will respond to the claim that Nutanix has less resiliency than the HPE Simplivity 380.

To start with, in my experience the biggest cause of data loss, downtime and outages is human error: poor design, improper use of a product, poor implementation/validation, and a lack of operational procedures or of the discipline to follow them. I can count on one hand the number of times I’ve seen properly designed solutions have issues.

Those rare situations have come down to multiple concurrent failures at different levels of the solution (e.g.: Infrastructure, Application, OS etc.), not just things like one or more drive or server failures.

Nonetheless, HPE Simplivity commonly targets Resiliency Factor 2 (RF2), claiming it is not resilient, because they lack a basic understanding of the Acropolis Distributed Storage Fabric: how it distributes data, how it rebuilds from failures and therefore how resilient it is.

RF2 is often mistakenly compared to RAID 5, where a single drive failure takes a long time to rebuild, and a subsequent failure during the rebuild (which is not uncommon) leads to data loss.

Let’s talk about some failure scenarios, comparing HPE Simplivity to Nutanix.

Note: The information below is accurate to the best of my knowledge, testing and experience with both products.

When is a write acknowledged to the Virtual Machine?

HPE Simplivity – They use what they refer to as an Omnistack Accelerator Card (OAC), which uses “Super capacitors to provide power to the NVRAM upon a power loss”. When a write hits the OAC, it is acknowledged to the VM. It is assumed, and even likely, that the capacitors will provide sufficient power to commit the writes persistently to flash, but the fact is that writes are acknowledged BEFORE they are committed to persistent media. HPE will surely argue the OAC is persistent, but until the data is on something such as a SATA-SSD drive I do not consider it persistent, and I invite you to ask your trusted advisor/s for their opinion, because this is a grey area at best.

This can be confirmed on Page 29 of the SimpliVity Hyperconverged Infrastructure Technology Overview:


Nutanix – Writes are only acknowledged to the Virtual Machine when the write IO has been checksummed and confirmed written to persistent media (e.g.: SATA-SSD) on the number of nodes/drives based on the configured Resiliency Factor (RF).

Writes are never written to RAM or any other non-persistent media, and at any stage you can pull the power from a Nutanix node/block/cluster and 100% of the data will be in a consistent state, i.e.: it was written and acknowledged, or it was not written and therefore not acknowledged.
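The acknowledgement rule can be expressed as a small sketch: checksum the write, persist it to RF replicas, and only then acknowledge. The `Replica` class and its in-memory "persistence" are hypothetical stand-ins for real media, CRC32 is just a stand-in checksum, and replicas are written sequentially for simplicity (a real system would write them in parallel). This illustrates the ordering, not Nutanix code.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a persistent device on one node."""
    node: str
    healthy: bool = True
    persisted: dict[int, tuple[bytes, int]] = field(default_factory=dict)

    def persist(self, offset: int, data: bytes, checksum: int) -> bool:
        if not self.healthy:
            return False
        # A real device would only return after a durable flush to media.
        self.persisted[offset] = (data, checksum)
        return True


def write_with_rf(replicas: list[Replica], rf: int, offset: int, data: bytes) -> bool:
    """Return True (acknowledge) only once `rf` replicas hold the checksummed data."""
    checksum = zlib.crc32(data)
    confirmed = 0
    for replica in replicas:
        if replica.persist(offset, data, checksum):
            confirmed += 1
        if confirmed == rf:
            return True  # safe to acknowledge to the VM
    return False  # not enough persistent copies: the write is NOT acknowledged


nodes = [Replica("A"), Replica("B", healthy=False), Replica("C")]
print(write_with_rf(nodes, rf=2, offset=0, data=b"hello"))      # True (A and C)
print(write_with_rf(nodes[:2], rf=2, offset=0, data=b"hello"))  # False
```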

Because Nutanix only acknowledges writes once data is written to persistent media on two or more hosts, the platform is compliant with FUA and Write Through. For HPE SVT, this is at best dependent on power protection (UPS and/or OAC capacitors). This means Nutanix is more resilient (lower risk) and has a higher level of data integrity than the HPE SVT product.

Check out “Ensuring Data Integrity with Nutanix – Part 2 – Forced Unit Access (FUA) & Write Through” for more information. It explains how Nutanix complies with critical data integrity protocols such as FUA and Write Through, so you can make up your own mind whether the HPE product does. Hint: a product is not FUA compliant unless data is written to persistent media before acknowledgement.

Single Drive (NVMe/SSD/HDD) failure

HPE Simplivity – Protects data with RAID 6 (or RAID 5 on small nodes) + Replication (2 copies). A single drive failure causes a RAID rebuild, which is a medium/high impact activity for the RAID group. RAID rebuilds are well known to be slow, which is one reason HPE chooses (wisely so) to use low-capacity spindles to minimise the impact of RAID rebuilds. But this choice to use RAID and smaller drives has implications around cost/capacity/rack unit/power/cooling and so on.

Nutanix – Protects data with configurable Replication Factor (2 or 3 copies, or N+1 and N+2) along with rack unit (block) awareness. A single drive failure causes a distributed rebuild of the data contained on the failed drive across all nodes within the cluster. This distributed rebuild is evenly balanced throughout the cluster for low impact and faster time to recover. This allows Nutanix to support large capacity spindles, such as 8TB SATA.
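Why distributing the rebuild matters can be shown with simple arithmetic. Every number here (drive size, a 100 MB/s rebuild rate per drive or per node, a 16-node cluster) is a hypothetical assumption chosen for illustration; real rebuild rates depend on workload and configuration.

```python
def raid_rebuild_hours(drive_tb: float, spare_write_mb_s: float) -> float:
    # A RAID rebuild is bottlenecked by writing a single replacement drive.
    return drive_tb * 1_000_000 / spare_write_mb_s / 3600


def distributed_rebuild_hours(drive_tb: float, per_node_mb_s: float,
                              nodes: int) -> float:
    # Re-protecting the failed drive's data is shared across all nodes in the cluster.
    return drive_tb * 1_000_000 / (per_node_mb_s * nodes) / 3600


# Hypothetical: 8 TB drive, 100 MB/s rebuild rate for one drive or one node.
print(f"RAID rebuild: {raid_rebuild_hours(8, 100):.1f} h")                    # 22.2 h
print(f"16-node distributed: {distributed_rebuild_hours(8, 100, 16):.1f} h")  # 1.4 h
```

Because `nodes` appears in the denominator, every node added to the cluster shortens the distributed rebuild, whereas the RAID rebuild time is fixed by a single drive.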

Two concurrent drive (NVMe/SSD/HDD) failures *Same Node

HPE Simplivity – RAID 6 + Replication (2 copies) supports two drive failures and, as with a single drive failure, this causes a RAID rebuild, which is a medium/high impact activity for the RAID group.

Nutanix – A two-drive failure causes a distributed rebuild of the data contained on the failed drives across all nodes within the cluster. This distributed rebuild is evenly balanced throughout the cluster for low impact and faster time to recover. This allows Nutanix to support large capacity spindles, such as 8TB SATA. No data is lost even when using Resiliency Factor 2 (which is N+1), despite what HPE claims. This is an example of the major advantage Nutanix Acropolis Distributed File System has over the RAID and mirroring type architecture of HPE SVT.

Three concurrent drive (NVMe/SSD/HDD) failures *Same Node

HPE Simplivity – RAID 6 + Replication (2 copies) supports the loss of only two drives per RAID group; with a third failure the RAID group has failed and all of its data must be rebuilt.

Nutanix – Three drive failures again just cause a distributed rebuild of the data contained on the failed drives (in this case, 3) across all nodes within the cluster. This distributed rebuild is evenly balanced throughout the cluster for low impact and faster time to recover. This allows Nutanix to support large capacity spindles, such as 8TB SATA. No data is lost even when using Resiliency Factor 2 (which is N+1). Again, despite what HPE claims. This is an example of the major advantage Nutanix Acropolis Distributed File System has over the RAID and mirroring type architecture of HPE SVT.

Four or more concurrent drive (NVMe/SSD/HDD) failures *Same Node

HPE Simplivity – RAID 6 + Replication (2 copies) supports the loss of only two drives per RAID group; three or more failures result in a failed RAID group, and a total rebuild of the data is required.

Nutanix – Nutanix can support N-1 drive failures per node, meaning in a 24 drive system, such as the NX-8150, 23 drives can be lost concurrently without the node going offline and without any data loss. The only caveat is that, on a hybrid platform, the lone surviving drive must be an SSD. This is an example of the major advantage Nutanix Acropolis Distributed File System has over the RAID and mirroring type architecture of HPE SVT.

Next let’s cover off failure scenarios across multiple nodes.

Two concurrent drive (NVMe/SSD/HDD) failures in the same cluster.

HPE Simplivity – RAID 6 protects from 2 drive failures locally per RAID group, whereas Replication (2 copies) supports the loss of one copy (N-1). Assuming the RAID groups are intact, data would not be lost.

Nutanix – Nutanix has configurable resiliency (Resiliency Factor) of either 2 copies (RF2) or three copies (RF3). Using RF3, under any two drive failure scenario there is no data loss and it causes a distributed rebuild of the data contained on the failed drives across all nodes within the cluster.

When using RF2 and block (rack unit) awareness, in the event two or more drives fail within a block (a block being up to 4 nodes with a total of 24 SSDs/HDDs), there is no data loss. In fact, in this configuration Nutanix can support the loss of up to 24 drives concurrently, e.g.: an entire block of 4 nodes and its 24 drives, without data loss/unavailability.

When using RF3 and block awareness, Nutanix can support the loss of up to 48 drives concurrently e.g.: 8 entire nodes and 48 drives without data loss/unavailability.
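Block awareness boils down to a placement rule: never put two copies of the same data in the same block. The sketch below models that rule on a hypothetical 4-block cluster and checks that losing any `rf - 1` whole blocks still leaves at least one copy of every extent. The round-robin placement and block names are simplifications for illustration, not the actual ADSF placement algorithm.

```python
from itertools import combinations


def place_extents(num_extents: int, blocks: list[str], rf: int) -> dict[int, set[str]]:
    """Place each extent's `rf` copies on `rf` distinct blocks (round-robin)."""
    if rf > len(blocks):
        raise ValueError("need at least rf blocks for block awareness")
    placement = {}
    for extent in range(num_extents):
        start = extent % len(blocks)
        placement[extent] = {blocks[(start + i) % len(blocks)] for i in range(rf)}
    return placement


def survives(placement: dict[int, set[str]], failed_blocks: set[str]) -> bool:
    # Data is available if every extent still has a copy on a surviving block.
    return all(copies - failed_blocks for copies in placement.values())


blocks = ["block1", "block2", "block3", "block4"]  # hypothetical 4-block cluster
for rf in (2, 3):
    placement = place_extents(1000, blocks, rf)
    tolerated = all(survives(placement, set(lost))
                    for lost in combinations(blocks, rf - 1))
    print(f"RF{rf}: survives any {rf - 1} whole block(s) lost -> {tolerated}")

# RF2 is not designed for two concurrent block failures:
print(survives(place_extents(1000, blocks, 2), {"block1", "block2"}))  # False
```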

Under no circumstances can HPE Simplivity support the loss of ANY 48 drives (e.g.: 2 HPE SVT nodes w/ 24 drives each) and maintain data availability.

This is another example of the major advantage Nutanix Acropolis Distributed File System has over the RAID and mirroring type architecture of HPE SVT. Nutanix distributes all data throughout the ADSF cluster, something HPE SVT cannot do, which impacts both its performance and its resiliency.

Two concurrent node failures in the same cluster.

HPE Simplivity – If the two HPE SVT nodes mirroring the data both go offline, you have data unavailability at best, and data loss at worst. As HPE SVT is not a cluster (note the careful use of the term “Federation”), it essentially scales in pairs, and the two nodes of a pair cannot fail concurrently.

Nutanix – With RF3 even without the use of block awareness, any two nodes and all drives within those nodes can be lost, with no data unavailability.

Three or more concurrent node failures in the same cluster.

HPE Simplivity – As previously discussed, HPE SVT cannot guarantee data availability when any two nodes are lost, so three or more makes matters worse.

Nutanix – With RF3 and block awareness, up to eight nodes (yes, 8!!) can be lost along with all drives within those nodes, with no data unavailability. That’s up to 48 SSDs/HDDs failing concurrently without data loss.

So we can clearly see Nutanix provides a highly resilient platform and there are numerous configurations which ensure two drive failures do not cause data loss despite what the HPE campaign suggests.

HPE’s claim is the equivalent of me configuring an HPE ProLiant server with RAID 5 and complaining that HPE lost my data when two drives failed; it’s just ridiculous.

The key point here is that when deploying any technology, you need to understand your requirements and configure the underlying platform to meet or exceed your resiliency requirements.

Installation/Configuration

HPE Simplivity – Dependent on vCenter.

Nutanix – Uses PRISM, which is a fully distributed HTML 5 GUI with no external dependencies regardless of hypervisor choice (ESXi, AHV, Hyper-V and XenServer). In the event any hypervisor management tool (e.g.: vCenter) is down, PRISM is fully functional.

Management (GUI)

HPE Simplivity – Uses a vCenter-backed GUI. If vCenter is down, Simplivity cannot be fully managed. If vCenter goes down, then in the best-case scenario (vCenter HA is in use) management will suffer only a short interruption.

Nutanix – Uses PRISM, which is a fully distributed HTML 5 GUI with no external dependencies regardless of hypervisor choice (ESXi, AHV, Hyper-V and XenServer). In the event any hypervisor management tool (e.g.: vCenter) is down, PRISM is fully functional.

In the event of a node/s failing, PRISM being a distributed management layer continues to operate.

Data Availability:

HPE Simplivity – RAID 6 (or RAID 60) + Replication (2 copies), Deduplication and Compression for all data. Not configurable.

Nutanix – Configurable resiliency and data reduction with:

  1. Resiliency Factor 2 (RF2)
  2. Resiliency Factor 3 (RF3)
  3. Resiliency Factor 2 with Block Awareness
  4. Resiliency Factor 3 with Block Awareness
  5. Erasure Coding / Deduplication / Compression in any combination across all resiliency types.

Key point:

Nutanix can scale out with compute+storage OR storage-only nodes. In either case, the resiliency of the cluster increases, as all nodes (or better said, Controllers) in our distributed storage fabric (ADSF) help with the distributed rebuild in the event of drive or node failures. This restores the cluster to a fully resilient state faster, so that it can support subsequent failures.

HPE Simplivity – Because the HPE SVT platform is not a distributed file system and works in a mirror-style configuration, adding nodes (up to the “per datacenter” limit of eight (8)) does not increase resiliency. As such, the platform does not improve as it grows, whereas improving with growth is a strength of the Nutanix platform.

Summary:

Thanks to our Acropolis Distributed Storage Fabric (ADSF) and without the use of legacy RAID technology, Nutanix can support:

  1. Equal or more concurrent drive failures per node than HPE Simplivity
  2. Equal or more concurrent drive failures per cluster than HPE Simplivity
  3. Equal or more concurrent node failures than HPE Simplivity
  4. Failure of hypervisor management layer e.g.: vCenter with full GUI functionality

Nutanix also has the following capabilities over and above the HPE SVT offering:

  1. Configurable resiliency and data reduction on a per vDisk level
  2. Nutanix resiliency/recoverability improves as the cluster grows
  3. Nutanix does not require any UPS or power protection to be compliant with FUA & Write Through

HPE SVT is less resilient during the write path because:

  1. HPE SVT acknowledges writes before committing data to persistent media (by their own admission)


Dare2Compare Part 3 : Nutanix can’t support Dedupe without 8vCPUs

As discussed in Part 1, we have proven that HPE made false claims about Nutanix snapshot capabilities as part of the #HPEDare2Compare Twitter campaign.

In Part 2, I explained how HPE/Simplivity’s 10:1 data reduction HyperGuarantee is nothing more than smoke and mirrors, and that most vendors can provide the same if not greater efficiencies, even without hardware acceleration.

Now in Part 3, I will respond to yet another false claim: that Nutanix cannot support dedupe without 8 vCPUs.

This claim is interesting for a number of reasons.

1. There is no minimum or additional vCPU requirement for enabling deduplication.

The only additional CVM (Controller VM) requirement for enabling deduplication is detailed in the Nutanix Portal (online documentation), which states:


There is no additional vCPU requirement for enabling cache or capacity deduplication.

I note that the maximum 32GB RAM requirement is well below the RAM requirements for the HPE SVT product which can exceed 100GB RAM per node.

2. Deduplication is part of our IO engine (stargate) which is limited in AOS to N-2 vCPUs.

In short, this means the maximum number of vCPUs that stargate can use on an 8 vCPU CVM is 6. However, these 6 vCPUs are not just for dedupe; they process all I/O and things like statistics for PRISM (our HTML 5 GUI). Depending on the workload, only a fraction of the maximum 6 vCPUs are used, allowing those cores to be used for other workloads. (Hey, this is virtualization after all.)

Deduplication itself uses a small fraction of the N-2 CPU cores, which brings us to my next point: the efficiency of Nutanix deduplication compared to vendors like HPE SVT that brute-force dedupe all data regardless of the achievable ratio, which is clearly inefficient.

3. Nutanix Controller VM (CVM) CPU usage depends on the workload and feature set being used.

This is a critical point: Nutanix has configurable data reduction at per-vDisk granularity, meaning that for workloads whose dataset does not provide significant (or any) savings from deduplication, it can be left disabled (the default).

This ensures CVM resources are not wasted performing what I refer to as “brute force” data reduction on all data regardless of the benefits.

4. Nutanix actually has global deduplication, which spans all nodes within a cluster, whereas HPE Simplivity deduplication is not truly global. HPE Simplivity does not form a cluster of nodes; the nodes act more like HA pairs for the virtual machines, and in simple terms deduplication happens within one node or a pair of HPE SVT nodes.

I’ve shown this in a test where 4 copies of the same appliance were deployed across four HPE SVT nodes and the deduplication ratio was only 2.1:1. If the deduplication were global, the ratio would be close to, if not exactly, 4:1, and this is what we see on Nutanix.

Nutanix also supports defined deduplication boundaries, so customers needing to separate data for any reason (e.g.: multi-tenancy / compliance) can create two containers, both with deduplication enabled, and enjoy global deduplication across the entire cluster without different customers’ data referring to the same blocks.
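The difference between pair-local, global and container-scoped deduplication comes down to the scope of the fingerprint index. The sketch below is a simplification (one fingerprint per whole appliance image, SHA-256 as the fingerprint, hypothetical node, pair and tenant names), not either vendor’s implementation. It reproduces the shape of the four-copy test described above: a per-pair index yields 2:1, while a cluster-wide index yields 4:1.

```python
import hashlib
from collections.abc import Callable


def dedupe_ratio(writes: list[tuple[str, bytes]], scope_of: Callable[[str], str]) -> float:
    """Logical/physical ratio when duplicates are only detected within one scope.

    `writes` is a list of (location, data) pairs; `scope_of(location)` returns
    the deduplication domain (pair, cluster or container) that location belongs to.
    """
    seen: set[tuple[str, str]] = set()
    physical = 0
    for location, data in writes:
        key = (scope_of(location), hashlib.sha256(data).hexdigest())
        if key not in seen:  # first copy within this scope is stored physically
            seen.add(key)
            physical += len(data)
    logical = sum(len(data) for _, data in writes)
    return logical / physical


image = b"appliance" * 1000  # one hypothetical appliance image
writes = [(node, image) for node in ("node1", "node2", "node3", "node4")]

pairs = {"node1": "pairA", "node2": "pairA", "node3": "pairB", "node4": "pairB"}
print(dedupe_ratio(writes, lambda loc: pairs[loc]))  # 2.0 (index per pair)
print(dedupe_ratio(writes, lambda loc: "cluster"))   # 4.0 (global index)

# Container boundaries: each tenant dedupes cluster-wide, but never shares blocks.
tenant_writes = [("tenant1/node1", image), ("tenant1/node3", image),
                 ("tenant2/node2", image), ("tenant2/node4", image)]
print(dedupe_ratio(tenant_writes, lambda loc: loc.split("/")[0]))  # 2.0
```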

5. Deduplication is vastly less valuable than vendors lead you to believe!

I can’t stress this point enough. Deduplication is a great technology and it works very well on many different platforms depending on the dataset.

But deduplication does not solve 99.9% of the challenges in the datacenter, and is one of the most overrated capabilities in storage.

Even if Nutanix did not support deduplication at all, it would not prevent our existing and future customers from achieving great business outcomes. If a vendor such as HPE SVT wants to claim they have the best dedupe in the world, I don’t think anyone really cares, because even if it were true (which in my opinion it is not), the value of Nutanix extends so far beyond basic storage functionality that we would still be far and away the market leader, making deduplication all but a moot point.

For more information about what the vCPUs assigned to the Nutanix CVM provide beyond storage functions, check out the following posts, which address FUD from VMware about the CVM’s overheads and the value the CVM provides, much of which is unique to Nutanix.

Nutanix CVM/AHV & vSphere/VSAN overheads

Cost vs Reward for the Nutanix Controller VM (CVM)

 
