PART 2 – Problems with RAID and Object Based Storage for data protection

Following on from Part 1, this post will discuss hyper-converged Distributed File Systems (i.e.: Nutanix) and compare them with traditional SAN/NAS RAID and hyper-converged solutions using Object storage for data protection.

The below diagram shows a 4-node hyper-converged solution using a Distributed File System with the same 4 x 4TB SATA drives, with data protected by replication with 2 copies (Nutanix calls this Resiliency Factor 2).

[Diagram: Hyper-converged DFS cluster in a normal (healthy) state]

The first difference you may have noticed is that the data is much more granular than in the Hyper-Converged Object store example in Part 1.

The second, less obvious, difference is that the replicated copies of the data on Node 1 (i.e.: the data with purple letters) do not reside on a single other node, but are distributed throughout the cluster.
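To illustrate how that distribution might come about, below is a minimal Python sketch of a replica placement policy: for each 1MB extent whose primary copy lives on Node 1, a second copy is placed on a randomly chosen other node. This is an illustrative stand-in only, not the actual Nutanix placement algorithm, which would also weigh factors such as capacity, load and failure domains.

```python
import random
from collections import Counter

NODES = [1, 2, 3, 4]

def place_replica(primary_node):
    # Choose any node other than the primary to hold the second copy.
    # (Hypothetical policy for illustration only.)
    return random.choice([n for n in NODES if n != primary_node])

# Place the second copy for 8 x 1MB extents whose primary copy is on Node 1.
placement = Counter(place_replica(primary_node=1) for _ in range(8))
print(placement)
# e.g. Counter({3: 4, 2: 2, 4: 2}) -- the second copies end up spread
# across the remaining nodes rather than pinned to one partner node.
```

The key point is simply that no single node is the sole holder of Node 1's second copies.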

Now let's look at a drive failure example:

Here we see Node 1 has lost a drive hosting 8 granular pieces of data, each 1MB in size.

[Diagram: Hyper-converged DFS cluster recovering from a drive failure]

Now the Distributed File System detects that the data represented by A, B, C, D, E, I, M and P has only a single copy within the cluster and starts the restoration process.

Let's walk through each step, although in practice these steps are completed concurrently.

1. Data “A” is replicated from Node 2 to Node 3
2. Data “B” is replicated from Node 2 to Node 4
3. Data “C” is replicated from Node 3 to Node 2
4. Data “D” is replicated from Node 4 to Node 2
5. Data “E” is replicated from Node 2 to Node 4
6. Data “I” is replicated from Node 3 to Node 2
7. Data “M” is replicated from Node 4 to Node 3
8. Data “P” is replicated from Node 4 to Node 3

Now the cluster has restored resiliency.

So what was the impact on each node?

[Table: read/write I/O per node during the recovery]

The above table shows a simplified representation of the workload involved in restoring resiliency to the cluster. As we can see, the workload (the replication of 8 granular pieces of data) was distributed very evenly across the nodes.
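For those who prefer to see it counted out, here is a small Python tally of the eight re-replications listed above. The source/destination pairs are taken directly from steps 1-8; the rest is just bookkeeping.

```python
from collections import Counter

# (data, source_node, destination_node) for steps 1-8 above
replications = [
    ("A", 2, 3), ("B", 2, 4), ("C", 3, 2), ("D", 4, 2),
    ("E", 2, 4), ("I", 3, 2), ("M", 4, 3), ("P", 4, 3),
]

reads = Counter(src for _, src, _ in replications)   # 1MB reads served
writes = Counter(dst for _, _, dst in replications)  # 1MB writes received

for node in (2, 3, 4):
    print(f"Node {node}: {reads[node]} reads, {writes[node]} writes")
# Node 2: 3 reads, 3 writes
# Node 3: 2 reads, 3 writes
# Node 4: 3 reads, 2 writes
```

No node serves more than 3 reads or receives more than 3 writes of 1MB each, so no single drive becomes the bottleneck.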

Next, let's look at the advantages of a hyper-converged solution with a Distributed File System (which Nutanix uses).

  1. Highly granular distribution using 1MB extents rather than large objects.
  2. The work required to restore resiliency after a drive (or node) failure is distributed across all drives and nodes in the cluster, leveraging the capability of every drive/node (i.e.: not constrained to the <100 IOPS of a single drive).
  3. The rebuild is a low-impact activity as the workload is distributed across the cluster and not dependent on a single source/destination pair of drives or nodes.
  4. The rebuild has a low impact on the virtual machines running on the distributed file system, so consistent performance is maintained.
  5. The larger the cluster, the quicker and lower-impact the rebuild, as the workload is distributed across a greater number of drives/nodes for the same amount (GB) of data being restored (see the sketch after this list).
  6. With Nutanix, SSDs are used not only as a read/write cache but also as a persistent storage tier, meaning recovering data is written to SSD. Even where the data being recovered is not in cache (memory or SSD tiers), it may still reside in the persistent SSD tier, which dramatically improves the performance of the recovery.
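To give a rough feel for point 5, here is a back-of-the-envelope sketch of how re-protection time shrinks as the cluster grows. The 4 drives per node and the ~50MB/s of rebuild bandwidth each drive can contribute alongside normal workload are assumptions for illustration only, not Nutanix figures.

```python
def rebuild_hours(data_tb, nodes, drives_per_node=4, per_drive_mbps=50):
    # Assumed rebuild bandwidth per drive (alongside production I/O);
    # substitute your own figures. 1TB is treated as 1,000,000MB.
    cluster_mbps = nodes * drives_per_node * per_drive_mbps
    return (data_tb * 1_000_000) / cluster_mbps / 3600

for nodes in (4, 8, 16, 32):
    print(f"{nodes} nodes: ~{rebuild_hours(4, nodes):.1f}h to re-protect 4TB")
# 4 nodes: ~1.4h, 8 nodes: ~0.7h, 16 nodes: ~0.3h, 32 nodes: ~0.2h
```

The exact numbers matter far less than the shape: doubling the cluster roughly halves the re-protection time, because twice as many drives share the same amount of work.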

Summary:

As discussed in Part 1, Traditional RAID used by SAN/NAS and Hyper-converged solutions using Object based storage both suffer similar issues when recovering from drive or node failure.

The Nutanix hyper-converged solution using the Nutanix Distributed File System (NDFS), on the other hand, can restore resiliency following a drive or node failure faster and with lower impact thanks to its highly granular and distributed architecture, meaning more consistent performance for virtual machines.

PART 1 – Problems with RAID and Object Based Storage for data protection

I regularly get asked to compare the resiliency of traditional centralized storage with converged, as well as newer technologies such as hyper-converged.

So this post will discuss the problems with RAID and newer hyper-converged solutions using Object based storage for data protection.

This post will cover the two examples below, with Part 2 discussing hyper-converged solutions using Distributed File Systems.

1. Traditional RAID

2. Hyper-converged Object Based Storage

Let's start with traditional shared storage and the most common RAID level in my experience, RAID 5.

The below diagram shows 3 x 4TB SATA drives in a RAID 5 with a hot spare.
[Diagram: 3-disk RAID 5 with hot spare, healthy state]

Now let's look at a drive failure scenario. The hot spare activates and starts rebuilding, as shown below.

[Diagram: 3-disk RAID 5 with hot spare, rebuilding after a drive failure]

So this all sounds fine: we’ve had a drive failure, and a spare drive has automatically taken its place and started rebuilding the data.

The problem now is that even in this simplified/small example we have 2 drives (or say 200 IOPS worth of drives) trying to rebuild onto just a single drive. So the maximum rate at which the RAID 5 can restore resiliency is limited to that of a single drive, roughly 100 IOPS.

If this were an 8-disk RAID 5, we would have 7 drives (or 700 IOPS) trying to rebuild, again onto only a single drive capable of roughly 100 IOPS.
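To put a rough number on why this hurts, here is a minimal sketch of rebuild time when every block must funnel through the single hot spare. The post reasons in IOPS; for a time estimate I assume the spare can sustain around 100MB/s of sequential writes as a best case, which is an assumption for illustration, not a vendor figure.

```python
def raid_rebuild_hours(drive_tb, spare_write_mbps=100):
    # Best case: the rebuild is bounded by the single spare's sustained
    # write rate, regardless of how many drives can supply data.
    return (drive_tb * 1_000_000) / spare_write_mbps / 3600

print(f"~{raid_rebuild_hours(4):.0f} hours minimum to rebuild a 4TB drive")
# ~11 hours -- and that assumes purely sequential writes with no competing
# production I/O, which is rarely the case during a rebuild.
```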

There are multiple issues with this architecture.

  1. The restoration of resiliency of the entire RAID is constrained by the destination drive, in this case a SATA drive which can sustain less than 100 IOPS.
  2. A single subsequent HDD failure within the RAID will cause data loss.
  3. The RAID rebuild is a high-impact activity on the storage controllers, which can impact all storage.
  4. The RAID rebuild is an especially high-impact activity on the virtual machines running on the RAID.
  5. The larger the RAID, or the larger the capacity of the drives in it, the longer the rebuild takes, and the higher the performance impact and the chance of a subsequent failure leading to data loss.

Now I’m sure most of you understand this concept, and have felt the pain of a RAID rebuild taking many hours or even days, but with new hyper-converged technology this issue is no longer a problem, right?

Wrong!

It entirely depends on how data is recovered in the event of a drive failure. Let's look at an example of a hyper-converged solution using an object store. The below diagram shows a simplified example of hyper-converged Object-based storage with 4 objects, represented by Objects A, B, C and D in black, and the 2nd replicated copy of each object, represented by Objects A, B, C and D in purple.

Note: Each object in the Object Store can be hundreds of GB in size.

[Diagram: Hyper-converged Object store in a normal (healthy) state]

Let’s take a look at what happens in a disk failure scenario.

[Diagram: Hyper-converged Object store recovering from a drive failure]

From the above diagram we can see a drive has failed on Node 1, which means the replicas of Object A and Object D have been lost. The object store will then replicate a copy of Object A to Node 4 and a replica of Object D to Node 2 to restore resiliency.

There are multiple issues with this architecture.

  1. Object-based storage can lack granularity, as objects can be 200GB+.
  2. The restoration of resiliency of any single object is constrained by the source drive or node.
  3. The restoration of resiliency of any single object is also constrained by the destination drive or node.
  4. The restoration of multiple objects (such as Objects A & D in the above example) is constrained by the same drive or node, which results in contention and slows the process of restoring resiliency to both objects (see the sketch after this list).
  5. The impact of the recovery on virtual machines running on the source and destination nodes is high.
  6. The recovery of an object is constrained by the source and destination node per object.
  7. Object stores generally require a witness, which is stored on another node in the cluster (not illustrated above).
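To put rough numbers on points 2-4, the sketch below estimates how long it takes to re-replicate a single large object when reads come from one source drive/node and writes land on one destination drive/node, and how that stretches when two objects contend for the same source. All throughput figures are assumptions for illustration only.

```python
def object_recovery_hours(object_gb, src_mbps=100, dst_mbps=100, sharing=1):
    # sharing > 1 models several objects competing for the same source
    # drive/node, each receiving only a fraction of its throughput.
    effective_mbps = min(src_mbps / sharing, dst_mbps)
    return (object_gb * 1000) / effective_mbps / 3600

print(f"200GB object, dedicated pair: ~{object_recovery_hours(200):.1f}h")
print(f"200GB object, shared source:  ~{object_recovery_hours(200, sharing=2):.1f}h")
# ~0.6h vs ~1.1h -- and for that entire window the affected objects have
# only a single surviving copy.
```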

It should be pointed out that where SSDs are used as a write cache, this can help reduce the impact and speed up recovery in some cases. However, where data needs to be recovered from outside of cache, i.e.: a SAS or SATA drive, the fact that writes go to SSD makes no difference, as the recovery is constrained by the read performance of the source drive.
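In other words, the effective recovery rate is whichever side is slower, which a quick sketch makes obvious (the throughput figures are placeholders, not measurements):

```python
def recovery_mbps(source_read_mbps, destination_write_mbps):
    # The slower side bounds the re-replication rate: a fast SSD write
    # tier cannot make recovery faster than the HDD the data is read from.
    return min(source_read_mbps, destination_write_mbps)

print(recovery_mbps(source_read_mbps=100, destination_write_mbps=500))  # -> 100
```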

Summary:

Traditional RAID used by SAN/NAS and newer hyper-converged Object-based storage both suffer similar issues when recovering from drive or node failures, which include:

  1. The restoration of resiliency is constrained by the source drive or node.
  2. The restoration of resiliency is constrained by the destination drive or node.
  3. The restoration is a high-impact activity on the destination.
  4. The recovery of one object is constrained by the network connectivity between just two nodes.
  5. The impact of the recovery on any data (such as virtual machines) running on the RAID or the source/destination node/s is high.
  6. The recovery of a RAID or an object is constrained by a single part of the infrastructure, being a RAID controller/drive or a single node.

In Part 2, we will look at hyper-converged Distributed File Systems.