Expanding Capacity on a Nutanix environment – Design Decisions

I recently saw an article about design decisions around expanding capacity for a HCI platform which went through the various considerations and made some recommendations on how to proceed in different situations.

While reading the article, it really made me think how much simpler this process is with Nutanix and how these types of areas are commonly overlooked when choosing a platform.

Let’s start with a few basics:

The Nutanix Acropolis Distributed Storage Fabric (ADSF) is made up of all the drives (SSD/SAS/SATA etc) in all nodes in the cluster. Data is written locally where the VM performing the write resides and replica’s are distributed based on numerous factors throughout the cluster. i.e.: No Pairing, HA pairs, preferred nodes etc.

In the event of a drive failure, regardless of what drive (SSD,SAS,SATA) fails, only that drive is impacted, not a disk group or RAID pack.

This is key as it limited the impact of the failure.

It is importaint to note, ADSF does not store large objects nor does the file system require tuning to stripe data across multiple drives/nodes. ADSF by default distributes the data (at a 1MB granularity) in the most efficient manner throughout the cluster while maintaining the hottest data locally to ensure the lowest overheads and highest performance read I/O.

Let’s go through a few scenarios, which apply to both All Flash and Hybrid environments.

  1. Expanding capacityWhen adding a node or nodes to an existing cluster, without moving any VMs, changing any configuration or making any design decisions, ADSF will proactively send replicas from write I/O to all nodes within the cluster, therefore improving performance while reactively performing disk balancing where a significant imbalance exists within a cluster.

    This might sound odd but with other HCI products new nodes are not used unless you change the stripe configuration or create new objects e.g.: VMDKs which means you can have lots of spare capacity in your cluster, but still experience an out of space condition.

    This is a great example of why ADSF has a major advantage especially when considering environments with large IO and/or capacity requirements.

    The node addition process only requires the administrator to enter the IP addresses and its basically a one click, capacity is available immediately and there is no mass movement of data. There is also no need to move data off and recreate disk groups or similar as these legacy concepts & complexities do not exist in ADSF.

    Nutanix is also the only platform to allow expanding of capacity via Storage Only nodes and supports VMs which have larger capacity requirements than a single node can provide. Both are supported out of the box with zero configuration required.

    Interestingly, adding storage only nodes also increases performance, resiliency for the entire cluster as well as the management stack including PRISM.

  2. Impact & implications to data reduction of adding new nodesWith ADSF, there are no considerations or implications. Data reduction is truely global throughout the cluster and regardless of hypervisor or if you’re adding Compute+Storage or Storage Only nodes, the benefits particularly of deduplication continue to benefit the environment.

    The net effect of adding more nodes is better performance, higher resiliency, faster rebuilds from drive/node failures and again with global deduplication, a higher chance of duplicate data being found and not stored unnecessarily on physical storage resulting in a better deduplication ratio.

    No matter what size node/s are added & no matter what Hypervisor, the benefits from data reduction features such as deduplication and compression work at a global level.

    What about Erasure Coding? Nutanix EC-X creates the most efficient stripe based on the cluster size, so if you start with a small 4 node cluster your stripe would be 2+1 and if you expand the cluster to 5 nodes, the stripe will automatically become 3+1 and if you expand further to 6 nodes or more, the stripe will become 4+1 which is currently the largest stripe supported.

  3. Drive FailuresIn the event of a drive failure (SSD/SAS or SATA) as mentioned earlier, only that drive is impacted. Therefore to restore resiliency, only the data on that drive needs to be repaired as opposed to something like an entire disk group being marked as offline.

    It’s crazy to think a single commodity drive failure in a HCI product could bring down an entire group of drives, causing a significant impact to the environment.

    With Nutanix, a rebuild is performed in a distributed manner throughout all nodes in the cluster, so the larger the cluster, the lower the per node impact and the faster the configured resiliency factor is restored to a fully resilient state.

At this point you’re probably asking, Are there any decisions to make?

When adding any node, compute+storage or storage only, ensure you consider what the impact of a failure of that node will be.

For example, if you add one 15TB storage only node to a cluster of nodes which are only 2TB usable, then you would need to ensure 15TB of available space to allow the cluster to fully self heal from the loss of the 15TB node. As such, I recommend ensuring your N+1 (or N+2) node/s are equal to the size of the largest node in the cluster from both a capacity, performance and CPU/RAM perspective.

So if your biggest node is an NX-8150 with 44c / 512GB RAM and 20TB usable, you should have an N+1 node of the same size to cover the worst case failure scenario of an NX-8150 failing OR have the equivalent available resources available within the cluster.

By following this one, simple rule, your cluster will always be able to fully self heal in the event of a failure and VMs will failover and be able to perform at comparable levels to before the failure.

Simple as that! No RAID, Disk group, deduplication, compression, failure, or rebuild considerations to worry about.

Summary:

The above are just a few examples of the advantages the Nutanix ADSF provides compared to other HCI products. The operational and architectural complexity of other products can lead to additional risk, inefficient use of infrastructure, misconfiguration and ultimately an environment which does not deliver the business outcome it was originally design to.

Benchmark(et)ing Nonsense IOPS Comparisons, if you insist – Nutanix AOS 4.6 outperforms VSAN 6.2

As many of you know, I’ve taken a stand with many other storage professionals to try to educate the industry that peak performance is vastly different to real world performance. I covered this in a post titled: Peak Performance vs Real World Performance.

I have also given a specific example of Peak Performance vs Real World Performance with a Business Critical Application (MS Exchange) where I demonstrate that the first and most significant constraining factor for Exchange performance is compute (CPU/RAM) so achieving more IOPS is unnecessary to achieve the business outcome (which is supporting a given number of Exchange mailboxes/message per day).

However vendors (all of them) who offer products which provide storage, whether it is as a component such as in HCI or a fully focused offering, continue to promote peak performance numbers. They do this because the industry as a whole has and continues to promote these numbers as if they are relevant and trying to one-up each other with nonsense comparisons.

VMware and the EMC federation have made a lot of noise around In-Kernel being better performance than Software Defined Storage running within a VM which is referred to by some as a VSA (Virtual Storage Appliance). At the same time the same companies/people are recommending business critical applications (vBCA) be virtualized. This is a clear contradiction, as I explain in an article I wrote titled In-Kernel verses Virtual Storage Appliance which in short concludes by saying:

…a high performance (1M+ IOPS) solution can be delivered both In-Kernel or via a VSA, it’s simple as that. We are long past the days where a VM was a significant bottleneck (circa 2004 w/ ESX 2.x).

I stand by this statement and the in-kernel vs VSA debate is another example of nonsense comparisons which have little/no relevance in the real world. I will now (reluctantly) cover off (quickly) some marketing numbers before getting to the point of this post.

VMware VSAN 6.2

Firstly, Congratulations to VMware on this release. I believe you now have a minimally viable product thanks to the introduction of software based checksums which are essential for any storage platform.

VMW Claim One: For the VSAN 6.2 release, “delivering over 6M IOPS with an all-flash architecture”

The basic math for a 64 node cluster = ~93700 IOPS / node but as I have seen this benchmark from Intel showing 6.7Million IOPS for a 64 node cluster, let’s give VMware the benefit of the doubt and assume its an even 7M IOPS which equates to 109375 IOPS / node.

Reference: VMware Virtual SAN Datasheet

VMW Claim Two: Highest Performance >100K IOPS per node

The graphic below (pulled directly from VMware’s website) shows their performance claims of >100K IOPS per node and >6 Million IOPS per cluster.

Reference: Introducing you to the 4th Generation Virtual SAN

Now what about Nutanix Distributed Storage Fabric (NDSF) & Acropolis Operating System (AOS) 4.6?

We’re now at the point where the hardware is becoming the bottleneck as we are saturating the performance of physical Intel S3700 enterprise-grade solid state drives (SSDs) on many of our hybrid nodes. As such we have moved onto performance testing of our NX-9460-G4 model which has 4 nodes running Haswell CPUs and 6 x Intel S3700 SSDs per node all in 2RU.

With AOS 4.6 running ESXi 6.0 on a NX9460-G4 (4 x NX-9040-G4 nodes), Nutanix are seeing in excess of 150K IOPS per node, which is 600K IOPS per 2RU (Nutanix Block).

The below graph shows performance per node and how the solution scales in terms of performance up to a 4 node / 1 block solution which fits within 2RU.

NOS46Perf

So Nutanix AOS 4.6 provides approx. 36% higher performance than VSAN 6.2.

(>150K IOPS per NX9040-G4 node compared to <=110K IOPS for All Flash VSAN 6.2 node)

It should be noted the above Nutanix performance numbers have already been improved upon in upcoming releases going through performance engineering and QA, so this is far from the best you will see.

but-wait-theres-more

Enough with the nonsense marketing numbers! Let’s get to the point of the post:

These 4k 100% random read IOPS (and similar) tests are totally unrealistic.

Assuming the 4k IOPS tests were realistic, to quote my previous article:

Peak performance is rarely a significant factor for a storage solution.

More importantly, SO WHAT if Vendor A (in this case Nutanix) has higher peak performance than Vendor B (in this case VSAN)!

What matters is customer business outcomes, not benchmark(eting)!

holdup

Wait a minute, the vendor with the higher performance is telling you peak performance doesn’t matter instead of bragging about it and trying to make it sound importaint?

Yes you are reading that correctly, no one should care who has the highest unrealistic benchmark!

I wrote things to consider when choosing infrastructure. a while back to highlight that choosing the “Best of Breed” for every workload may not be a good overall strategy, as it will require management of multiple silos which leads to inefficiency and increased costs.

The key point is if you can meet all the customer requirements (e.g.: performance) with a standard platform while working within constraints such as budget, power, cooling, rack space and time to value, you’re doing yourself (or your customer) a dis-service by not considering using a standard platform for your workloads. So if Vendor X has 10% faster performance (even for your specific workload) than Vendor Y but Vendor Y still meets your requirements, performance shouldn’t be a significant consideration when choosing a product.

Both VSAN and Nutanix are software defined storage and I expect both will continue to rapidly improve performance through tuning done completely in software. If we were talking about a product which is dependant on offloading to Hardware, then sure performance comparisons will be relevant for longer, but VSAN and Nutanix are both 100% software and can/do improve performance in software with every release.

In 3 months, VSAN might be slightly faster. Then 3 months later Nutanix will overtake them again. In reality, peak performance rarely if ever impacts real world customer deployments and with scale out solutions, it’s even less relevant as you can scale.

If a solution can’t scale, or does so in 2 node mirror type configurations then considering peak performance is much more critical. I’d suggest if you’re looking at this (legacy) style of product you have bigger issues.

Not only does performance in the software defined storage world change rapidly, so does the performance of the underlying commodity hardware, such as CPUs and SSDs. This is why its importaint to consider products (like VSAN and Nutanix) that are not dependant on proprietary hardware as hardware eventually becomes a constraint. This is why the world is moving towards software defined for storage, networking etc.

If more performance is required, the ability to add new nodes and the ability to form a heterogeneous cluster and distribute data evenly across the cluster (like NDSF does) is vastly more importaint than the peak IOPS difference between two products.

While you might think that this blog post is a direct attack on HCI vendors, the principle analogy holds true for any hardware or storage vendor out there. It is only a matter of time before customers stop getting trapped in benchmark(et)ing wars. They will instead identify their real requirements and readily embrace the overall value of dramatically simple on-premises infrastructure.

In my opinion, Nutanix is miles ahead of the competition in terms of value, flexibility, operational benefits, product maturity and market-leading customer service all of which matter way more than peak performance (which Nutanix is the fastest anyway).

Summary:

  1. Focus on what matters and determine whether or not a solution delivers the required business outcomes. Hint: This is rarely just a matter of MOAR IOPS!
  2. Don’t waste your time in benchmark(et)ing wars or proof of concept bake offs.
  3. Nutanix AOS 4.6 outperforms VSAN 6.2
  4. A VSA can outperform an in-kernel SDS product, so lets put that in-kernel vs VSA nonsense to rest.
  5. Peak performance benchmarks still don’t matter even when the vendor I work for has the highest performance. (a.k.a My opinion doesn’t change based on my employers current product capabilities)
  6. Storage vendors ALL should stop with the peak IOPS nonsense marketing.
  7. Software-defined storage products like Nutanix and VSAN continue to rapidly improve performance, so comparisons are outdated soon after publication.
  8. Products dependant upon propitiatory hardware are not the future
  9. Put a high focus on the quality of vendors support.

Related Articles:

  1. Peak Performance vs Real World Performance
  2. Peak performance vs Real World – Exchange on Nutanix Acropolis Hypervisor (AHV)
  3. The Key to performance is Consistency
  4. MS Exchange Performance – Nutanix vs VSAN 6.0
  5. Scaling to 1 Million IOPS and beyond linearly!
  6. Things to consider when choosing infrastructure.

The truth about Storage Data efficiency ratios.

We’ve all heard the marketing claims from some storage vendors about how efficient their storage products are. Data efficiency ratios of 40:1 , 60:1 even 100:1 continue to be thrown around as if they are amazing, somehow unique or achieved as a result of proprietary hardware.

Let’s talk about how vendors may try to justify these crazy ratios:

For many years, Storage vendors have been able to take space efficient copies of LUNs, Datastores, Virtual Machines etc which rely on snapshots or metadata. These are not full copies and reporting this as data efficiency is quite mis-leading in my opinion as this is and has been for many years Table stakes.

Be wary of vendors encouraging (or requiring) you configure more frequent “backups” (which are after all just Snapshots or metadata copies) to achieve the advertised data efficiencies.

  • Reporting VAAI/VCAI clones as full copies

If I have a VMware Horizon View environment, It makes sense to use VAAI/VCAI space efficient clones as they provide numerous benefits including faster provisioning, recompose and use less space which leads to them being served from cache (making performance better).

So if I have an environment with just 100 desktops deployed via VCAI, You have a 100:1 data reduction ratio, 1000 desktops and you have 1000:1. But this is again Table stakes… well sort of because some vendors don’t support VAAI/VCAI and others only have partial support as I discuss in Not all VAAI-NAS storage solutions are created equal.

Funnily enough, one vendor even offloads what VAAI/VCAI can do (with almost no overhead I might add) to proprietary hardware. Either way, while VAAI/VCAI clones are fantastic and can add lots of value, claiming high data efficiency ratios as a result is again mis-leading especially if done so in the context of being a unique capability.

  • Compression of Highly compressible data

Some data, such as Logs or text files are highly compressible, so ratios of >10:1 for this type of data are not uncommon or unrealistic. However consider than if logs only use a few GB of storage, then 10:1 isn’t really saving you that much space (or money).

For example a 100:1 data reduction ratio of 100MB of logs is only saving you ~10GB which is good, but not exactly something to make a purchasing decision on.

Also compression of databases which lots of white space also compress very well, so the larger the Initial size of the DB, the more it will compress.

The compression technology used by storage vendors is not vastly different, which means for the same data, they will all achieve a similar reduction ratio. As much as I’d love to tell you Nutanix has much better ratios than Vendors X,Y and Z, its just not true, so I’m not going to lie to you and say otherwise.

  • Deduplication of Data which is deliberately duplicated

An example of this would be MS Exchange Database Availability Groups (DAGs). Exchange creates multiple copies of data across multiple physical or virtual servers to provide application and storage level availability.

Deduplication of this is not difficult, and can be achieved (if indeed you want to dedupe it) by any number of vendors.

In a distributed environment such as HCI, you wouldn’t want to deduplicate this data as it would force VMs across the cluster to remotely access more data over the network which is not what HCI is all about.

In a centralised SAN/NAS solution, deduplication makes more sense than for HCI, but still, when an application is creating the duplicate data deliberately, it may be a good idea to exclude it from being deduplicated.

As with compression, for the same data, most vendors will achieve a similar ratio so again this is table stakes no matter how each vendor tries to differentiate. Some vendors dedupe at more granular levels than others, but this provides diminishing returns and increased overheads, so more granular isn’t always going to deliver a better business outcome.

  • Claiming Thin Provisioning as data efficiency

If you have a Thin Provisioned 1TB virtual disk and you only write 50GB to the disk, you would have a data efficiency ratio of 20:1. So the larger you create your virtual disk and the less data you write to it, the better the ratio will be. Pretty silly in my opinion as Thin Provisioning is nothing new and this is just another deceptive way to artificially improve data efficiency ratios.

  • Claiming removal of zeros as data reduction

For example, if you create an Eager Zero Thick VMDK, then use only a fraction, as with the Thin Provisioning example (above), removal of zeros will obviously give a really high data reduction ratio.

However Intelegent storage doesn’t need Eager Zero Thick (EZT) VMDKs to give optimal performance nor will they write zeros to begin with. Intelligent storage will simply store metadata instead of a ton of worthless zeros. So a data reduction ratio from a more intelligent storage solution would be much lower than a vendor who has less intelligence and has to remove zeros. This is yet another reason why data efficiency (marketing) numbers have minimal value.

Two of the limited use cases for EZT VMDKs is Fault Tolerance (who uses that anyway) and Oracle RAC, so removal of zeros with intelligent storage is essentially moot.

Summary:

Data reduction technologies have value, but they have been around for a number of years so if you compare two modern storage products, you are unlikely to see any significant difference between vendor A and B (or C,D,E,F and G).

The major advantage of data reduction is apparent when comparing new products with 5+ year old technology. If you are in this situation where you have very old tech, most newer products will give you a vast improvement, it’s not unique to just one vendor.

At the end of the day, there are numerous factors which influence what data efficiency ratio can be achieved by a storage product. When comparing between vendors, if done in a fair manner, the differences are unlikely to be significant enough to sway a purchasing decision as most modern storage platforms have more than adequate data reduction capabilities.

Beware: Dishonest and mis-leading marketing about data reduction is common so don’t get caught up in a long winded conversations about data efficiency or be tricked into thinking one vendor is amazing and unique in this area, it just isn’t the case.

Data reduction is table stakes and really shouldn’t be the focus of a storage or HCI purchasing decision.

My recommendation is focus on areas which deliver operational simplicity, removes complexity/dependancies within the datacenter and achieve real business outcomes.