Dare2Compare Part 1: HPE/Simplivity’s 10:1 data reduction HyperGuarantee Explained

HPE have been relentless with their #HPEDare2Compare Twitter campaign focused on the market-leading Nutanix Enterprise Cloud platform, and I for one have had a good laugh from it. But since existing and prospective customers have been asking for clarification, I thought I would do a series of posts addressing each claim.

In this part of the series, I will respond to the claim (below) that Nutanix can’t guarantee at least a 10:1 data efficiency ratio.

[Image: HPE #HPEDare2Compare tweet making the 10:1 claim]

Firstly let’s think about what is being claimed by HPE/SVT.

If you’re a potential customer, it would be fair for you to assume that if you have 100TB on your current SAN/NAS and you purchase HPE/SVT, you would only need to buy 10TB plus some room for growth or to tolerate failures.

But this couldn’t be further from the truth. In fact, if you have 100TB today, you’ll likely need to purchase a similar capacity of HPE/SVT, as most platforms, even older/legacy ones, already have some data efficiency, so what HPE/SVT is offering with deduplication and compression is nothing new or unique.

Let’s go over what the “HyperGuarantee” states and why it’s not worth the paper it’s written on.

[Image: HPE/SVT “HyperEfficient” guarantee]

It sounds pretty good, but two things caught my eye. The first is “relative to comparable traditional solutions”, which excludes any modern storage that has this functionality (such as Nutanix); the second is the phrase “across storage and backup combined”.

Let’s read the fine print about “across storage and backup combined”.

[Image: Fine print of the “HyperEfficient” guarantee]

Hold on, I thought we were talking about a data reduction guarantee, but the fine print adds a caveat requiring us to configure HPE/Simplivity “backups”?

The first issue: what if you use an enterprise backup solution such as Commvault, or an SMB play such as Veeam? The guarantee is void, and with good reason, as you will (HPE) Discover shortly. 😉

Let’s do the math on how HPE/SVT can guarantee 10:1 without giving customers ANY real data efficiency compared to even legacy solutions such as NetApp or EMC VNX-type platforms.

  1. Let’s use a single 1 TB VM as a simple example.
  2. Take 30 snapshots (1 per day for 30 days) and count each snapshot as if it was a full “backup” to disk.
  3. Data logically stored now equals 31TB (1TB + 30TB)
  4. Actual Size on Disk is only ~1TB (This is because snapshots don’t create any copies of the data)
  5. Claimed Data Efficiency is 31:1
  6. Effective capacity savings = 96.8% (1TB physical / 31TB logical = 0.032, i.e. only 3.2% of the logical capacity is consumed), which is rigged to be >90% every time (see the sketch below)
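
To make the arithmetic explicit, here is a minimal sketch (in Python, with hypothetical variable names of my own choosing) of how counting pointer-based snapshots as full backups inflates the reported ratio:

```python
# Illustrative sketch only: how counting pointer-based snapshots as full
# backups inflates a reported "data efficiency" ratio. Variable names are
# hypothetical and not taken from any vendor tool.

vm_size_tb = 1.0     # a single 1TB VM
snapshots = 30       # one "backup" per day, retained for 30 days

# Logical capacity IF each snapshot were really a full copy on disk
logical_tb = vm_size_tb + snapshots * vm_size_tb    # 31TB

# Physical capacity actually consumed (snapshots are just pointers/metadata)
physical_tb = vm_size_tb                            # ~1TB

claimed_ratio = logical_tb / physical_tb            # 31:1
capacity_savings = 1 - physical_tb / logical_tb     # ~96.8%

print(f"Claimed efficiency: {claimed_ratio:.0f}:1")
print(f"Claimed capacity savings: {capacity_savings:.1%}")
```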

So the guarantee is satisfied by default, for every customer and without actually providing data efficiency for your actual data!

I have worked with numerous platforms over the years, and the same result could be guaranteed by NetApp, Dell EMC, Nutanix and many more. In my opinion, the reason these vendors don’t offer such a guarantee is that this capability has long been table stakes.

Let’s take a look at a screenshot of the HPE/SVT interface (below).

[Image: HPE/SVT interface screenshot showing an 896:1 “efficiency” ratio]

The source of the image is an official SVT case study, which can be found at: https://www.simplivity.com/case-study-coughlan-companies/

It shows an efficiency of 896:1 which again sounds great, but behind the smoke and mirrors it’s about as misleading as you can get.

Firstly, the total “VM data” is 9.9TB.

The “local backups”, which are actually just pointer-based copies (not backups at all), are reported as 3.2PB.

Note: To artificially inflate the reported “deduplication” ratio, simply schedule more frequent metadata copies (what HPE/SVT incorrectly refer to as “backups”) and the ratio will increase.

The “remote backups”, funnily enough, are 0.0KB, which means the solution actually has no backups.

The real data reduction ratio can be easily calculated by taking the VM data of 9.9TB and dividing it by the “Used” capacity of 3.7TB, which equates to roughly 2.67:1. This can be broken down into the 2:1 compression shown in the GUI and a deduplication ratio of less than 1.5:1.
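
As a quick sanity check, here is the same arithmetic on the figures quoted above (a rough sketch only; the exact headline number depends on rounding in the GUI’s reported values):

```python
# Back-of-the-envelope check using the figures quoted from the screenshot.

vm_data_tb = 9.9                 # "VM data" reported in the GUI
used_tb = 3.7                    # "Used" physical capacity reported in the GUI
local_backups_tb = 3.2 * 1024    # 3.2PB of "local backups" (pointer-based copies)

real_ratio = vm_data_tb / used_tb                            # ~2.7:1 real reduction
headline_ratio = (vm_data_tb + local_backups_tb) / used_tb   # ~890:1 once the pointer
                                                             # copies are counted

print(f"Real data reduction:   {real_ratio:.2f}:1")
print(f"Headline 'efficiency': {headline_ratio:.0f}:1")
```

The second figure lands in the same ballpark as the 896:1 headline, which tells you where almost all of the “efficiency” is coming from.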

In short, the 10:1 data efficiency HyperGuarantee is not worth the paper it’s written on, especially if you’re using a 3rd party backup product. If you choose to use the HPE/SVT built-in pointer-based option, with or without replication, you will see the guaranteed efficiency ratio, but don’t be fooled into thinking this is something unique to HPE/SVT, as most other vendors, including Nutanix, have the same if not better functionality.

Remember, other vendors, including Nutanix, do not report metadata copies as “backups” or “data reduction”, because they are neither.

So ask your HPE/SVT rep: “How much deduplication and compression is guaranteed WITHOUT using your pointer-based ‘backups’?” The answer is NONE!

For more information, read this article, endorsed by multiple vendors, on what should be included in data reduction ratios.

Return to the Dare2Compare Index:

Sizing infrastructure based on vendor Data Reduction assumptions – Part 2

In Part 1, we discussed how data reduction ratios can, and do, vary significantly between customers and datasets, and that making assumptions about data reduction ratios, even when vendors provide guarantees, does not protect you from potentially serious problems if those ratios are not achieved.

In Part 2 we will go through an example of how misleading data reduction guarantees can be.

One HCI manufacturer provides a guarantee promising 10:1, which sounds too good to be true, and that’s because, quite frankly, it isn’t true. The guarantee includes a significant caveat for the 10:1 data reduction:

The savings/efficiency are based on the assumption that you configure a backup policy to take at least one <redacted> backup per day of every virtual machine on every <redacted> system in a given VMware Datacenter with those backups retained for 30 days.

I have a number of issues with this limitation including:

  1. The use of the word “backup” referring directly/indirectly to data reduction (savings)
  2. The use of the word “backup” when referring to metadata copies within the same system
  3. No actual deduplication or compression is required to achieve the 10:1 data reduction because metadata copies (or what the vendor incorrectly calls “backups”) are counted towards deduplication.

It is important to note that I am not aware of any other vendor who claims that metadata copies (snapshots / point-in-time copies / recovery points etc.) are deduplication. They simply are not.

I have previously written about what should be counted in deduplication ratios, and I encourage you to review this post and share your thoughts, as it is still a hot topic and one where customers are being oversold/misled regularly in my experience.

Now let’s do the math on my claim that no actual deduplication or compression is required to achieve the 10:1 ratio.

Let’s use a single 1 TB VM as a simple example. Note: The size doesn’t matter for the calculation.

Take 1 “backup” (even though we all know this is not a backup!!) per day for 30 days and count each copy as if it were a full backup to disk. Data logically stored now equals 31TB (1TB + 30TB).

The actual size on disk is only a tiny amount of metadata larger than the original 1TB, as the metadata copies (pointers) don’t create any copies of the data, which is another reason they are not backups.

Then because these metadata copies are counted as deduplication, the vendor reports a data efficiency of 31:1 in its GUI.

Therefore, effective capacity savings = 96.8% (1TB physical / 31TB logical = 0.032), which is rigged to be >90% every time.
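
To generalise the point, here is a minimal sketch showing that the guaranteed 10:1 falls out of the mandated “backup” schedule alone, with zero real deduplication or compression of the data (the function name and structure are mine, purely for illustration):

```python
def claimed_efficiency(retained_copies: int) -> float:
    """Ratio reported when each retained metadata copy is counted as a full backup.

    Logical capacity = original data + one 'full' copy per retained backup;
    physical capacity ~= the original data alone (the copies are just pointers).
    """
    logical = 1 + retained_copies
    physical = 1
    return logical / physical

# The guarantee's own caveat: at least one "backup" per day, retained for 30 days.
print(claimed_efficiency(30))   # 31.0 -> comfortably above the guaranteed 10:1

# In fact, any schedule retaining 9 or more copies clears 10:1 by itself.
print(claimed_efficiency(9))    # 10.0
```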

So the only significant capacity savings which are guaranteed come from “backups”, not actual reduction of the customer’s data from capacity saving technologies.

As every modern storage platform I can think of has the capability to create metadata based point in time recovery points, this is not a new or even a unique feature.

So back to our topic: if you’re sizing your infrastructure based on the assumption of 10:1 data efficiency, you are in for a rude shock.

Dig a little deeper into the “guarantee” and we find the following:

It’s the ratio of storage capacity that would have been used on a comparable traditional storage solution to the physical storage that is actually used in the <redacted> hyperconverged infrastructure. ‘Comparable traditional solutions’ are storage systems that provide VM-level synchronous replication for storage and backup and do not include any deduplication or compression capability.

So if you, for example, had a 5 year old NetApp FAS, and had deduplication and/or compression enabled, the guarantee only applies if you turned those features off, allowed the data to be rehydrated and then compared the results with this vendor’s data reduction ratio.

So to summarize, this “guarantee” lacks integrity because of how misleading it is. It is worthless to any customer using any form of enterprise storage platform from probably the last 5–10 years, as the capacity savings from metadata-based copies are, and have been, table stakes for many, many years from multiple vendors.

So what guarantee does that vendor provide for actual compression and deduplication of the customer’s data? The answer is NONE, as it’s all metadata copies, or what I like to call “smoke and mirrors”.

Summary:

“No one will question your integrity if your integrity is not questionable.” In this case, the guarantee and the people promoting it have questionable integrity, especially as many customers may not be aware of the difference between metadata copies and actual copies of data, critically when it comes to backups. Many customers don’t (and shouldn’t have to) know the intricacies of data reduction; they just want an outcome, and 10:1 data efficiency (savings) sounds to any reasonable person like “I need 10x less than I have now”… which is clearly not the case with this vendor’s guarantee or product.

Apart from a few exceptions which will not be applicable for most customers, 10:1 data reduction is way outside the ballpark of what is realistically achievable without using questionable measurement tactics such as counting metadata copies / snapshots / recovery points etc.

In my opinion, the delta in the data reduction ratio between all major vendors in the storage industry for the same dataset is not a significant factor when making a decision on a platform. This is because there are countless other substantially more critical factors to consider. When the topic of data reduction comes up in meetings, I go out of my way to ensure the customer understands this and has covered off the other areas like availability, resiliency, recoverability, manageability, security and so on before I, quite frankly, waste their time talking about table stakes capability like data reduction.

I encourage all customers to demand nothing less of vendors than honesty and integrity, and in the event a vendor promises you something, hold them accountable to deliver the outcome they promised.

Sizing infrastructure based on vendor Data Reduction assumptions – Part 1

One of the most common mistakes people make when designing solutions is making assumptions. Assumptions, in short, are things an architect has failed to investigate and/or validate, which puts a project at risk of not delivering the desired business outcome/s.

A great example of a really bad assumption to make is what data reduction ratio a storage platform will deliver.

But what if a vendor offers a data reduction guarantee and promises to give you as much equipment as required if the ratio is not achieved? You’re protected, right? The risk of your assumption being wrong is mitigated by the promise of free storage. Hooray!

Let’s explore this for a minute using an example of one of the more ludicrous guarantees going around the industry at the moment:

A guarantee of 10:1 data reduction!

Let’s say we have 100TB of data; that means we’d only need 10TB, right? This might only be, say, 4RU of equipment, which sounds great!

After deployment, we start migrating and we only get a more realistic 2:1 data reduction, at which point the project stalls due to lack of capacity.
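
The gap between assumption and reality is easy to quantify. Here is a quick sketch using the numbers from this example:

```python
# Sizing gap when the assumed data reduction ratio isn't achieved.
# Figures match the example above.

source_data_tb = 100
assumed_ratio = 10       # the vendor-guaranteed 10:1
achieved_ratio = 2       # the more realistic 2:1

purchased_tb = source_data_tb / assumed_ratio    # 10TB bought up front
required_tb = source_data_tb / achieved_ratio    # 50TB actually needed

shortfall_factor = required_tb / purchased_tb    # 5x more equipment required
print(f"Shortfall: {required_tb - purchased_tb:.0f}TB ({shortfall_factor:.0f}x the purchased capacity)")
```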

You go back to the vendor and, let’s say, best case scenario, they agree on the spot (HA!) to give you more equipment; it’s unlikely to be delivered in less than 4 weeks.

So your project is delayed a minimum of 4 weeks until the equipment arrives. You now need to go through your change control process, and if you’re doing this properly, it would be documented with detailed steps on how to install the equipment, including appropriate back-out strategies in the event of issues.

Typically, change control takes some time to prepare and to go through approvals, documentation etc., especially in larger, mission-critical environments.

When installing any equipment, you should also have documented operational verification steps to ensure the equipment has been installed correctly, is highly available, is performing as expected, etc.

Now that the new equipment is installed, the project continues and all 100TB of your data has been migrated to the new platform. Hooray!

Now let’s talk about the ongoing implications of the assumption of 10:1 data reduction only resulting in a much more realistic 2:1 ratio.

We now have 5x more equipment than we expected, so assuming the original 10TB was 4RU, we would now have 20RU of equipment, which is taking up valuable real estate in our datacenter, or may even have required us to lease another rack in our datacenter.

If the product you purchased was a SAN/NAS, you now have lower IOPS/GB, as you have just added a bunch more disk shelves to the existing controllers. The controllers have a finite amount of performance, and you’ve just added more drives for them to manage. More drives on a traditional two-controller SAN/NAS are only a good thing if the controllers are not maxed out, and with flash ever increasing in performance, the controllers will quickly become the bottleneck, assuming they are not already.
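
To illustrate the dilution (using made-up controller and capacity figures, purely hypothetical):

```python
# Hypothetical example of IOPS/GB dilution on a dual-controller array.
# The controller ceiling and capacities below are invented for illustration.

controller_iops = 100_000        # fixed performance ceiling of the controller pair

original_capacity_gb = 10_000    # 10TB usable, as originally sized
expanded_capacity_gb = 50_000    # 50TB usable after adding shelves (5x)

print(controller_iops / original_capacity_gb)   # 10.0 IOPS per GB
print(controller_iops / expanded_capacity_gb)   # 2.0 IOPS per GB -- same controllers,
                                                # 5x the capacity to service
```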

If the product was HCI, you now require considerably more network interfaces. Depending on the HCI platform, you may also require more hypervisor licensing, further increasing CAPEX and OPEX.

Depending on the HCI product, can you even utilise the additional storage without changing the virtual machines’ configuration? It might sound silly, but some products don’t distribute data throughout the cluster, instead using mirrored objects, so you may even need to create more virtual disks or redistribute the VMs to make use of the new capacity.

Then you need to consider if the HCI product has any scale limitations, as these may require you to redesign your solution.

What about operational expenses? We now have 5x more equipment, so our environmental costs such as power and cooling will increase significantly, as will our maintenance windows, where, in the case of HCI, we now have to patch 5x more hypervisor nodes.

Typically, customers no longer size for 3-5 years upfront, as HCI, which scales incrementally, is becoming the platform of choice compared to SAN/NAS. This is great, but when your data reduction assumption is wrong (in this example, off by 5x), the ongoing impact is enormous.

This means as you scale, you need to scale at 5x the rate you originally designed for. That’s 5x more rack units (RU), 5x more power, 5x more cooling, and potentially even 5x more hypervisor licensing.

What does all of this mean?

Your Total Cost of Ownership (TCO) and Return on Investment (ROI) goes out the window!

Interestingly, Nutanix recently considered offering a data reduction guarantee and I was one of many who objected and strongly recommended we not drop to the levels of other vendors just because it makes the sales cycle easier.

All of the reasons above and more were put to Nutanix product management, and they made the right decision: even though Nutanix data reduction (and avoidance) is very strong, we did not want to put customers in a position where their business outcomes were potentially at risk due to assumptions.

Summary:

While data reduction is a valuable part of a storage platform, the benefits (data reduction ratios) can and do vary significantly between customers and datasets. Making assumptions about data reduction ratios, even when vendors provide lots of data showing their averages and offer guarantees, does not protect you from potentially serious problems if the data reduction ratios are not achieved.

In Part 2, I will go through an example of how misleading data reduction guarantees can be.