It’s 2017, let’s review Thick vs Thin Provisioning

For a long time, it has been widely considered that thick provisioning is required to achieve maximum storage performance and for many years this was a good rule of thumb.

Before we get into details, what are Thick and Thin provisioning?

Thick provisioning is where storage allocated to a LUN, NFS mount or Virtual Disk (such as a VMDK in ESXi, VHDX in Hyper-V or vDisk in AHV) is zeroed out and/or fully reserved regardless of how much capacity is actually used.

Thick provisioning saves a storage subsystem from having to zero out a block before writing new data, which is one of the reasons higher performance could be achieved on many storage platforms.

Thin provisioning, on the other hand, is where storage allocated to a LUN or Virtual Disk is allocated and zeroed on demand as data is written, which allows physical capacity to be overcommitted.
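To make the difference concrete, here is a minimal sketch (in Python, purely illustrative and not modelled on any particular vendor's implementation) of the capacity accounting difference between the two approaches, including how thin provisioning permits overcommitment:

```python
# Minimal illustration (not any vendor's implementation) of the capacity
# accounting difference between thick and thin provisioned virtual disks.

class Datastore:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.reserved_gb = 0   # capacity claimed up front by thick disks
        self.written_gb = 0    # capacity actually consumed by written data

    def provision(self, size_gb, thick=True):
        if thick:
            # Thick: the full size is reserved (and typically zeroed) at
            # creation time, whether or not data is ever written to it.
            if self.reserved_gb + size_gb > self.physical_gb:
                raise RuntimeError("insufficient physical capacity to reserve")
            self.reserved_gb += size_gb
        # Thin: nothing is reserved now; blocks are allocated (and zeroed)
        # on first write, so logical size can exceed physical capacity.

    def write(self, size_gb):
        self.written_gb += size_gb
        if self.written_gb > self.physical_gb:
            raise RuntimeError("physical capacity exhausted (overcommit risk)")

ds = Datastore(physical_gb=1000)
for _ in range(20):                  # 20 x 100GB thin disks = 2TB logical
    ds.provision(100, thick=False)   # on 1TB physical: a 2:1 overcommit
```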

The advantages of Thick provisioning included easier capacity management, or simply put, “what you see is what you get”, as well as maximum performance on most platforms. That said, even on older storage platforms the performance advantage was rarely as significant as people claimed.

VMware conducted a Performance Study of VMware vStorage Thin Provisioning back in the ESXi 4.0 days (~2009), which I will briefly summarise.

On page 6 of the performance study, the following graph shows the difference in performance between Thin and Thick VMDKs during zeroing and post-zeroing.

As you can see, the performance is almost identical.

The disadvantages of Thick provisioning, though, were and remain significant to this day. These include the inability to overcommit storage, meaning physical free space has to be maintained at multiple layers, such as the RAID group, LUN and Virtual Disk layers, leading to inefficiency.

The advantages of Thin provisioning include the ability to overcommit storage which results in more flexibility when sizing LUNs & Virtual Disks and less wasted space. The only real downsides were potentially increased capacity management complexity and lower performance.

I have previously written two example architectural decisions regarding using “Thin on Thin“, meaning thin provisioned virtual disks on a thin provisioned LUN or NFS mount, as well as “Thin on Thick”, meaning thin provisioned virtual disks on a thick provisioned LUN or NFS mount. These two examples cover many of the traditional pros and cons of thick versus thin, so I won’t repeat myself here.

I never wrote an example design decision for Thick on Thick, but this was common practice when provisioning storage was time consuming, difficult and involved lengthy delays to engage subject matter experts.

In early 2015, I wrote a two part blog series where I explained that it’s not as simple as you might think to calculate usable capacity, comparing SAN/NAS versus Nutanix. In part 1, I highlight that the LUN provisioning type is one area which can greatly impact the usable capacity of a traditional storage platform.
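To illustrate the point (with made-up numbers, since every environment differs), here is a quick back-of-envelope calculation of how thick LUN provisioning erodes the usable capacity of a traditional platform:

```python
# Hypothetical worked example (illustrative numbers only) of how thick LUN
# provisioning erodes the usable capacity of a traditional SAN/NAS.
raw_tb = 100.0
usable_after_raid = raw_tb * (1 - 0.25)   # assume ~25% RAID overhead
thick_luns_reserved_tb = 60.0             # thick LUNs reserve space, used or not
data_actually_written_tb = 30.0

free_for_new_luns = usable_after_raid - thick_luns_reserved_tb
print(f"Usable after RAID: {usable_after_raid:.0f} TB")
print(f"Free to provision new LUNs: {free_for_new_luns:.0f} TB, "
      f"despite only {data_actually_written_tb:.0f} TB of real data existing")
```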

But fast forward into the era of hyper-converged platforms like Nutanix and some modern storage arrays, and the major downsides of thin provisioning (complexity of capacity management and reduced performance) have not only been reduced but, at least in the case of Nutanix, eliminated altogether.

Let’s address Capacity management w/ Nutanix:

Storage utilisation only needs to be monitored in ONE place: the storage summary, which lives on the home screen of the Nutanix HTML 5 UI.

[Image: Nutanix storage summary widget]

No matter how many nodes are in your cluster, or how many containers (which translate to datastores in a VMware environment), virtual machines & virtual disks, or physical servers connecting via ABS, this is the only place you need to monitor capacity.

There are no RAID groups, Disk Groups, Aggregates, LUNs etc. where capacity needs to be managed. All nodes in a cluster contribute to the capacity of the cluster, and even when one or more virtual machines use more capacity than the node they run on, the Nutanix Acropolis Distributed Storage Fabric (ADSF) takes care of it.

So issue #1, Capacity management, is solved. Now it’s onto the issue of performance.

Thin Provisioning Performance w/ Nutanix:

When running ESXi, Nutanix presents NFS datastores and supports thick provisioning via the VAAI-NAS space reservation primitive, as discussed in this post. This allows the creation of thick provisioned (Eager Zeroed or Lazy Zeroed Thick) VMDKs, which traditionally NFS datastores did not support.
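For those who want to see what this looks like in practice, the following is a minimal pyVmomi sketch (hypothetical; it assumes an already-authenticated session and an existing `vm` object with a SCSI controller at key 1000 and a free unit number) showing the backing flags that distinguish thin, lazy zeroed thick and eager zeroed thick VMDKs:

```python
# Minimal pyVmomi sketch (assumptions noted in comments) of adding a disk
# with a given provisioning type to an existing VM.
from pyVmomi import vim

def add_disk(vm, size_gb, provisioning="thin"):
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    backing.diskMode = "persistent"
    backing.thinProvisioned = (provisioning == "thin")
    backing.eagerlyScrub = (provisioning == "eagerzeroedthick")
    # thinProvisioned=False with eagerlyScrub=False gives lazy zeroed thick.

    disk = vim.vm.device.VirtualDisk()
    disk.key = -1                      # temporary key for a new device
    disk.capacityInKB = size_gb * 1024 * 1024
    disk.backing = backing
    disk.controllerKey = 1000          # assumption: first SCSI controller
    disk.unitNumber = 1                # assumption: slot 1 is free

    spec = vim.vm.device.VirtualDeviceSpec()
    spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    spec.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    spec.device = disk

    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[spec]))
```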

However, this capability was only required for Oracle RAC and VMware Fault Tolerance; it was not a performance requirement.

From a performance perspective, Thin provisioning actually outperforms thick on intelligent storage such as Nutanix. In the specific case of Nutanix, random write I/O is serviced by the fastest tier available (e.g. SSD) via the operations log (OPLOG), which commits random writes to persistent media, coalesces them into sequential I/O, and then writes them to SSD before, in the case of hybrid nodes, tiering data off to lower cost storage.

This means the write penalty for overwriting or zeroing blocks before writing new I/O is eliminated.
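The following is a deliberately simplified conceptual sketch of this style of write path (in no way the actual ADSF/OPLOG implementation): random writes are acknowledged the moment they are logged, then drained to the extent store as large sequential I/Os.

```python
# Conceptual sketch (simplified; not the actual ADSF/OPLOG code) of a
# persistent write log that coalesces random writes into sequential I/O.
class WriteLog:
    def __init__(self, drain_threshold=8):
        self.pending = []                    # (offset, data) already acknowledged
        self.drain_threshold = drain_threshold

    def write(self, offset, data):
        self.pending.append((offset, data))  # persist to log, ack immediately
        if len(self.pending) >= self.drain_threshold:
            self.drain()
        return "ack"

    def drain(self):
        # Sorting by offset turns scattered random writes into one
        # sequential pass over the extent store.
        for offset, data in sorted(self.pending, key=lambda w: w[0]):
            pass                             # sequential commit happens here
        self.pending.clear()
```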

In fact, if you configure thick provisioned virtual disks, as the zeros (or whitespace) are written by the hypervisor, the Nutanix storage fabric acknowledges every I/O and discards the zeros in favour of storing metadata and simply reserving the capacity. In simple terms, Nutanix just has to acknowledge a whole bunch of nothing, and thick provisioning is achieved with a simple reservation as opposed to zeroing out many GBs or TBs of storage.
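Conceptually (again, a simplified sketch rather than the real code path), the zero-suppression behaviour looks something like this:

```python
# Simplified sketch (not the real code path) of zero suppression: an all-zero
# write is acknowledged like any other, but recorded as a metadata
# reservation instead of being stored as data.
BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)

class ZeroAwareStore:
    def __init__(self):
        self.data = {}         # block number -> real payload
        self.reserved = set()  # blocks "thick provisioned" via metadata only

    def write_block(self, block_no, payload):
        if payload == ZERO_BLOCK:
            self.reserved.add(block_no)     # note the reservation, discard zeros
        else:
            self.data[block_no] = payload   # only real data consumes capacity
        return "ack"                        # either way, the I/O is acknowledged

store = ZeroAwareStore()
for block in range(1024):                   # "zeroing" 4MB costs only metadata
    store.write_block(block, ZERO_BLOCK)
```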

This means thick provisioning is actually lower performance than thin provisioning on Nutanix.

With modern, intelligent storage, there are limited, if any, benefits to using thick provisioning. The only example I can think of is artificially inflating the deduplication ratio, as thick provisioned virtual disks tend to contain a lot of zeros, all of which dedupe. I wrote an article titled: “Deduplication ratios – What should be included in the reported ratio?” which covers this point in detail, but in short: don’t create unnecessary data (in this case, zeros) just to inflate your dedupe ratio; it wastes storage controller resources and achieves no additional benefit.
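A quick back-of-envelope example (with made-up numbers) shows just how much zeroed-but-unused space can inflate a reported ratio:

```python
# Back-of-envelope illustration (made-up numbers) of how zeroed-but-unused
# space inflates a reported dedupe ratio without saving any real capacity.
real_data_tb = 10.0
zeros_from_thick_disks_tb = 40.0   # zeroed space dedupes down to almost nothing

logical_tb = real_data_tb + zeros_from_thick_disks_tb
stored_tb = real_data_tb           # assume the real data itself doesn't dedupe

print(f"Ratio including zeros: {logical_tb / stored_tb:.1f}:1")      # 5.0:1
print(f"Ratio on real data only: {real_data_tb / stored_tb:.1f}:1")  # 1.0:1
```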

The following is a comprehensive list of the real world advantages of using thick provisioning on Nutanix.

This space is intentionally left blank

Summary:

For the best efficiency and performance when deploying virtual machines or storage for physical servers via ABS on Nutanix, use thin provisioning!

Dare2Compare Part 7: HPE provides superior performance to Nutanix

In part 4, we covered a series of failure scenarios, showing how the HPE/SVT product responds and how Nutanix responds to the same scenarios. This clearly disproved HPE’s claim of having superior resiliency to Nutanix and, I would argue, highlighted how much more resilient the Nutanix platform is.

Now in part 7, I will address two false claims (below): that Nutanix has lower performance than HPE SVT, and that Nutanix doesn’t post performance results.

Tweet #1 – HPE SimpliVity 380 provides superior performance to Nutanix

Problem number 1 with HPE’s claim: their URL is dead… so we cannot review the scenario/s in which they claim HPE/SVT is higher performing.

[Image: screenshot of HPE’s broken URL]

Before we discuss Nutanix performance: HPE have repeatedly claimed that Nutanix does not post performance results, and have further complained that there are no 3rd party published performance testing results.

One recent example of these claims is shown below, which states: “I know you don’t publish performance results”.

Nutanix does in fact publish performance data, which is validated by:

  • 3rd party partners/vendors such as Microsoft and LoginVSI;
  • Independent 3rd parties such as Enterprise Storage Group (ESG); and
  • Internally created material.

The following are a few examples of published performance data.

1. Nutanix Citrix XenDesktop Validated by LoginVSI

In fairness to HPE, this is a recent example, so let’s take a look at Nutanix’s track record with LoginVSI.

[Image: LoginVSI benchmark results]

Here we can see six examples dating back to Jan 2013 where Nutanix has made performance results with LoginVSI available.

2. Nutanix Reference Architecture: Citrix Validated Solution for Nutanix

This was a jointly developed solution between Citrix and Nutanix, the first of its kind globally, and was made available in 2014.

3. Microsoft Exchange Solution Reviewed Program (ESRP) – Storage

Nutanix has for many years been working with business critical applications such as MS Exchange and has published two ESRP solutions.

The first is for 24,000 users on Hyper-V and the second is for 30,000 users on AHV.

[Image: Nutanix ESRP listing]

Interestingly, while HPE/SVT have a reference architecture for MS Exchange, they do not have an ESRP for the platform. This is because they cannot provide a supportable configuration due to a lack of multi-protocol support.

Nutanix on the other hand has Microsoft supportable configurations for ESXi, Hyper-V and AHV.

4. ESG Performance Analysis: Nutanix Hyperconverged Infrastructure

This report is an example of a 3rd party who has validated performance data for VDI, MS SQL and MS Exchange.

As we can clearly see from the above examples, Nutanix does provide, and has for a long time provided, publicly available performance data from many sources, including independent 3rd parties.

Moving onto the topic of Nutanix vs HPE/SVT performance, I feel it’s important to first review my thoughts on this topic, which I covered in detail in an article I wrote back in 2015 titled: Peak performance vs real world performance.

In short, I can take any two products and make one look better than the other simply by designing tests which highlight the strengths or weaknesses of either product. This is why many vendors have a clause in their EULA preventing the publishing of performance data without written permission.

One of the most important factors when it comes to performance is sizing. An incorrectly sized environment will likely not perform within acceptable levels, and this goes for any product on the market.

For next generation platforms like Nutanix, customers are protected from under-sizing because of the platform’s ability to scale by adding additional nodes. In 2016 I wrote the post titled “Scale out performance testing with Nutanix Storage Only Nodes”, which shows how adding storage only nodes to a Nutanix cluster increased IOPS by approx 2x while lowering read and write latency.

What is more impressive than the excellent performance improvement is that this was achieved without any changes to the configuration of the cluster or virtual machines.

The same test performed on HPE/SVT and other SDS/HCI products would not double the IOPS or decrease read/write latency, as the SVT platform is not a distributed storage fabric.

Herein lies a major advantage of Nutanix. In the event Nutanix performance was no longer sufficient, or another platform offered higher performance (say, per node), Nutanix can (if/when required) scale performance without rip/replace or reconfiguration to meet almost any performance requirement. Performance per node is not a limiting factor for Nutanix as it is with HPE/SVT and other platforms.

What about performance for customers who are maximising the ROI of existing physical servers using Acropolis Block Services? The benefits just keep coming. A server connected using ABS will automatically improve its IOPS, latency and throughput when additional nodes are added to the Nutanix cluster, as the Acropolis Distributed Storage Fabric (ADSF) increases the number of paths dynamically so that all Controller VMs in the cluster service ABS traffic, as sketched below.
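The following is a simplified model of why this happens (my own illustration, not the actual ABS implementation): virtual targets are spread across all Controller VMs, so adding nodes automatically adds active paths and reduces the load per controller.

```python
# Simplified model (illustrative only; not the actual ABS implementation) of
# path scaling: virtual targets are spread round-robin across Controller VMs,
# so adding nodes adds active paths and lowers the load per controller.
def distribute_targets(num_targets, cvms):
    paths = {cvm: [] for cvm in cvms}
    for t in range(num_targets):
        paths[cvms[t % len(cvms)]].append(f"vtarget-{t}")
    return paths

four = distribute_targets(32, [f"CVM-{i}" for i in range(4)])
eight = distribute_targets(32, [f"CVM-{i}" for i in range(8)])
print(len(four["CVM-0"]), "targets per CVM with 4 nodes")    # 8
print(len(eight["CVM-0"]), "targets per CVM with 8 nodes")   # 4, half the load
```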

As such, regardless of whether workloads are virtual or physical, when using Nutanix, performance can always be improved non-disruptively, and without compromising the resiliency of the cluster, by simply adding nodes (which, BTW, is a one click operation).

Summary:

  1. Nutanix has been publishing performance results through independent 3rd parties and partners for many years.
  2. Nutanix has validated solutions from Microsoft, LoginVSI and Citrix, to name a few.
  3. Nutanix performance can scale well beyond HPE/SVT for both virtual and physical workloads.
  4. Nutanix provides validated performance data across multiple hypervisors.
  5. HPE/SVT have provided no evidence, scenarios or references to support SVT being a higher performance platform.

Return to the Dare2Compare Index:

The All-Flash Array (AFA) is Obsolete!

Over the last few years, I’ve had numerous customers ask how Nutanix can support bare metal workloads. Up until recently, I haven’t had an answer customers wanted to hear.

As a result, some customers have been stuck using their existing SAN or, worse still, been forced to go out and buy a new SAN.

Consequently, many customers who have wanted to use, or have already deployed, hyperconverged infrastructure (HCI) for all other workloads are stuck managing an all-flash array silo to service some bare metal workloads.

In June at .NEXT 2016, Nutanix announced Acropolis Block Services (ABS) which now allows bare metal workloads to be serviced by new or existing Nutanix clusters.

[Image: Acropolis Block Services overview]

As Nutanix offers both hybrid (SSD+SATA) and all-flash nodes, customers can choose the right node type/s for their workloads and present the storage externally for bare metal workloads, while also supporting virtual machines, Acropolis File Services (AFS) and containers.

So why would anyone buy an all-flash array? Let’s discuss a few scenarios.

Scenario 1: Bare metal workloads

Firstly, what applications even need bare metal these days? This is an important question to ask yourself. Challenge the requirement for bare metal and see if the justifications are still valid, and if so, whether anything has changed which would allow virtualization of the applications. But this is a topic for another post.

If a customer only needs new infrastructure for bare metal workloads, deploying Nutanix and ABS means they can start small and scale as required. This avoids one of the major pitfalls of having to size a monolithic, centralised, dual controller storage array.

While some AFA vendors can/do allow non-disruptive controller upgrades, it’s still not a very attractive proposition, nor is it quick or easy, and it reduces resiliency during the process as one of the two controllers is offline. Nutanix on the other hand performs one click rolling upgrades, which means the larger the cluster, the lower the impact of an upgrade, as it is performed one node at a time without disruption, and can also be done without the risk of a subsequent failure taking storage offline.

If the environment will only ever be used for bare metal workloads, no problem. Acropolis Block Services offers all the advantages of an All Flash Array, with far superior flexibility, scalability and simplicity.

Advantages:

  1. Start small and scale granularly as required allowing customers to take advantage of newer CPU/RAM/Flash technologies more frequently
  2. Scale performance and capacity by adding node/s
  3. Scale capacity only with storage-only nodes (which come in all flash)
  4. Automatically scale multi-pathing as the cluster expands
  5. Solution can support future workloads including multiple hypervisors / VMs / file services & containers without creating a silo
  6. You can use hybrid nodes to save cost while delivering all-flash performance for workloads which require it, by using VM flash pinning; this ensures all data is stored in flash and can be specified on a per-disk basis.
  7. The same ability as an all-flash array environment to add compute-only nodes.

Disadvantages:

  1. Your all-flash array vendor reps will hound you.

Scenario 2: Mixed workloads inc VMs and bare metal

As with scenario 1, deploying Nutanix and ABS means customers can start small and scale as required. This again avoids the major pitfall of having to size a monolithic centralised, dual controller storage array and eliminates the need for separate environments.

Virtual machines can run on compute+storage nodes, while bare metal workloads can have storage presented by all nodes within the cluster, including storage-only nodes. For those concerned about (potential but unlikely) noisy neighbour situations, specific nodes can also be nominated to service the bare metal workloads, while maintaining all the advantages of Nutanix one-click, non-disruptive upgrades.

Advantages:

  1. Start small and scale granularly as required allowing customers to take advantage of newer CPU/RAM/Flash technologies more frequently
  2. Scale performance and capacity by adding node/s
  3. Scale capacity only with storage-only nodes (which also come in all flash)
  4. Automatically scale multi-pathing for bare metal workloads as the cluster expands
  5. Solution can support future workloads including multiple Hypervisors / VMs / file services & containers without creating a silo.

Disadvantages:

  1. Your All-Flash array vendor reps will hound you.

What are the remaining advantages of using an all flash array?

In all seriousness, I can’t think of any, but for fun let’s cover a few areas where you can expect all-flash array vendors to argue.

Performance

Ah, the age old appendage measuring contest. I have written about this topic many times, including in one of my most popular posts, “Peak performance vs real world performance“.

The fact is, every storage product has limits, even all-flash arrays and Nutanix. The major difference is that Nutanix limits apply per cluster rather than per dual controller pair, and Nutanix can continue to scale the number of nodes in a cluster and continue to increase performance. So if ultimate performance is actually required, Nutanix can continue to scale to meet almost any performance/capacity requirement.
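Some toy arithmetic (illustrative numbers only, not benchmark data) makes the contrast obvious: a dual controller array has a fixed ceiling no matter how much capacity you add, whereas a scale-out cluster's ceiling grows with every node:

```python
# Toy arithmetic (illustrative numbers only): a dual controller array has a
# fixed performance ceiling, while a scale-out cluster's ceiling grows with
# every node added.
dual_controller_max_iops = 300_000   # fixed, however much capacity is added
per_node_iops = 50_000               # assumed per-node contribution

for nodes in (4, 8, 16, 32):
    aggregate = nodes * per_node_iops
    verdict = "exceeds" if aggregate > dual_controller_max_iops else "below"
    print(f"{nodes} nodes -> ~{aggregate:,} IOPS ({verdict} the fixed ceiling)")
```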

In fact, with ABS the performance limit is not even at the cluster layer, as multiple clusters can provide storage to the same bare metal server/s while maintaining single pane of glass management through PRISM Central.

I recently completed some testing in which I demonstrated the performance advantage of storage-only nodes for virtual machines, as well as how storage-only nodes improve performance for bare metal servers using Acropolis Block Services; I will be publishing the results in the near future.

Data Reduction

Nutanix has supported deduplication and compression for a long time, and introduced Erasure Coding (EC-X) in mid 2015. Each of these technologies is supported when using Acropolis Block Services (ABS).

As a result, when comparing data reduction with all-flash array vendors, keep in mind that while the implementation of these data reduction technologies varies between vendors, they all achieve similar data reduction ratios when applied to the same dataset.

Beware of vendors who include things like backups in their deduplication or data reduction ratios; this is very misleading, and most vendors have the same capabilities. For more information on this, see: Deduplication ratios – What should be included in the reported ratio?

Cost

Here we should think about what the age-old problems are with centralized shared storage (like AFAs). Things like choosing the right controllers, and the fact that when you add more capacity to the storage you’re not (or at least rarely) scaling the controller/s at the same time, come to mind immediately.

With Nutanix and Acropolis Block Services you can start your All Flash solution with three nodes which means a low capital expenditure (CAPEX) and then scale either linearly (with the same node types) or non-linearly (with mixed types or storage only nodes) as you need to without having to rip and replace (e.g.: SAN controller head swaps).

Starting small and scaling as required also allows you to take advantage of newer technologies such as newer Intel chipsets and NVMe/3D XPoint to get better value for your money.

Starting small and scaling as required also minimizes – if not eliminates – the risk of oversizing and avoids unnecessary operational expenses (OPEX) such as rack space, power, cooling. This also reduces supporting infrastructure requirements such as networking.

Summary:

As shown below, the Nutanix Acropolis Distributed Storage Fabric (ADSF) can support almost any workload, from VDI to mixed server workloads, file, block, big data and business critical applications such as SAP / Oracle / Exchange / SQL, as well as bare metal workloads, without creating silos with point solutions.

[Image: Nutanix single fabric supporting all workloads]

In addition to supporting all these workloads, Nutanix ADSF scalability both from a capacity/performance and resiliency perspective ensures customers can start small and scale when required to meet their exact business needs without the guesswork.

With these capabilities, the All-Flash array is obsolete.

I encourage everyone to share (constructively) your thoughts in the comments section.

Note: You must sign in to comment using WordPress, Facebook, LinkedIn or Twitter, as anonymous comments will not be approved.

Related Articles:

  1. Things to consider when choosing infrastructure.

  2. Scale out performance testing with Nutanix Storage Only Nodes

  3. What’s .NEXT 2016 – Acropolis Block Services (ABS)

  4. Scale out performance testing of bare metal workloads on Acropolis Block Services (Coming soon)

  5. What’s .NEXT 2016 – Any node can be storage only

  6. What’s .NEXT 2016 – All Flash Everywhere!