PART 1 – Problems with RAID and Object Based Storage for data protection

I regularly get asked to compare the resiliency of traditional centralized storage with converged as well as newer technologies such as hyper-converged.

So this post will discuss the problems with RAID and newer hyper-converged solutions using Object based storage for data protection.

This post will discuss two examples below, with Part 2 discussing Hyper-converged solutions using Distributed File Systems.

1. Traditional RAID

2. Hyper-converged Object Based Storage

Starting with Traditional shared storage, and the most common RAID level in my experience, RAID 5.

The below diagram shows a 3 x 4TB SATA drives in a RAID 5 with a Hot Spare.
3 Disk R5 w Hot Spare NO BG

Now lets look a drive failure scenario. We now have the Hot Spare activate and start rebuilding as shown below.

3 Disk R5 w Hot Spare REBUILDING NO BG

So this all sounds fine, we’ve had a drive failure, and a spare drive has automatically taken its place and started rebuilding the data.

The problem now is that even in this simplified/small example we have 2 drives (or say 200 IOPS of drives) trying to rebuild onto just a single drive. So the maximum rate at which the RAID 5 can restore resiliency is limited to that of a single drive or 100 IOPS.

If this was a 8 disk RAID 5, we would have 7 drives (or 700 IOPS) trying to rebuild again to only a single drive or 100 IOPS.

There are multiple issues with this architecture.

  1. The restoration of resiliency of the entire RAID is constrained by the destination drive, in this case a SATA drive which can sustain less than 100 IOPS
  2. A single subsequent HDD failure within the RAID will cause data loss.
  3. The RAID rebuild is a high impact activity on the storage controllers which can impact all storage
  4. The RAID rebuild is an especially high impact activity on the virtual machines running on the RAID.
  5. The larger the RAID or the capacity drives in the RAID, the longer the rebuild takes and the higher the performance impact and chance of subsequent failures leading to data loss.

Now I’m sure most of you understand this concept, and have felt the pain of a RAID rebuild taking many hours or even days, but with new hyper converged technology this issue is no longer a problem, right?

Wrong!

It entirely depends on how data is recovered in the event of a drive failure. Lets look at an example of an hyper-converged solution using an object store.The below shows a simplified example of a Hyper-converged Object Based Storage with 4 objects represented by Object A,B,C and D in Black, and the 2nd replicated copy of the object represented Object A,B,C and D in Purple.

Note: Each object in the Object Store can be hundreds of GB in size.HyperconvergedObjectStoreNormal

Let’s take a look what happens in a disk failure scenario.

HyperconvergedObjectStoreFailure

From the above diagram we can see a drive has failed on Node 1, which means Object A and Object D’s replica have been lost. The object store will then replicate a copy of Object A to Node 4, and a replica of Object D to Node 2 to restore resiliency.

There are multiple issues with this architecture.

  1. Object based storage can lack granularity as Objects can be 200Gb+.
  2. The restoration of resiliency of any single object is constrained by the source drive or node.
  3. The restoration of resiliency of any single object is also constrained by the destination drive or node.
  4. The restoration of multiple objects (such as Object A & D in the above example) is constrained by the same drive or node which will result in contention and slow the process of restoring resiliency to both objects.
  5. The impact of the recovery is High on virtual machines running on the source and destination nodes.
  6. The recovery of an Object is constrained by the source and destination node per object.
  7. Object stores generally require a witness, which is stored on another node in the cluster. (Not illustrated above)

It should be pointed out, where SSDs are used for a write cache, this can help reduce the impact and speed up recovery in some cases, but where data needs to be recovered from outside of cache, i.e.: A SAS or SATA drive, the fact writes go to SSD makes no difference as the writes are constrained by the read performance.

Summary:

Traditional RAID used by SAN/NAS and newer Hyper-converged Object based storage both suffer similar issue when recovering from drive or node failures which include:

  1. The restoration of resiliency is constrained by the source drive or node
  2. The restoration of resiliency is constrained by the destination drive or node
  3. The restoration is high impact on the desination
  4. The recovery of one object is constrained by the network connectivity between just two nodes.
  5. The impact of the recovery is High on any data (such as virtual machines) running on the RAID or source/destination node/s
  6. The recovery of RAID or an Object is constrained by a single part of the infrastructure being a RAID controller / drive or a single node.

In Part 2, we will look at the Hyper-converged Distributed File Systems.

Example Architectural Decision – Jumbo Frames with IP Storage (Use Jumbo Frames)

Problem Statement

When using IP based storage over a converged 10GB network, should Jumbo Frames be used?

Requirements

1. Fully Supported storage

2. Maximum vSphere environment availability

3. Maximize performance where possible

Assumptions

1. Dedicated 10GB Storage Network which is highly available

2. Two 10GB connections per ESXi host dedicated to IP Storage

3. Storage array supports Jumbo Frames

4. Benefit of Jumbo Frames outweighs the complexity to implement/maintain/support

5. Network performance is constrained at an interrupt level

Constraints

1. Maximum of two connections per ESXi host for IP Storage

Motivation

1. Maximum performance and security

Architectural Decision

Use Jumbo Frames

Justification

1. There is a dedicated physical network for IP storage

2. All devices end to end support Jumbo Frames and this is enabled on all switches globally

3. As only IP storage traffic traverses the dedicated network, a larger MTU will not have any adverse effects on data network traffic.

4.  IP storage packets will not be fragmented or dropped as the storage network has been verified and configured to support Jumbo Frames. Thus avoiding costly re-transmits

5. No routing exists (or is required) for the IP storage network, as such the environment is flat and simple to support

6. IP Storage performance will not be constrained by MTU

7. A standard MTU of 1500 can optionally be configured at the VMKernel layer if performance is negatively impacted by Jumbo Frames without the need to modify the switch configuration which will support up to 9216 MTU

8. Increasing the MTU will decrease the number of packets required for the same bandwidth to help prevent IP storage network being constrained at an interrupt level

Implications

1. A dedicated network needs to be maintained for IP storage which reduces consolidation

2. Storage network needs to be configured for Jumbo Frames

3. The Storage controller needs to be configured for Jumbo Frames

4. The VMKernel/s need to be configured for Jumbo Frames

5. Where the networks becomes constrained at either an interrupt or throughput level, any benefit of Jumbo Frames may be reduced or lost and IP storage performance may degrade

Alternatives

1. Do not use Jumbo Frames

2. Use Jumbo Frames in a converged network (ie: No dedicated IP Storage switches)

Relates Articles

1. Example Architectural Decision – Jumbo Frames for IP Storage (Use Jumbo Frames)

 Contributors

Thanks to Rob McNab (IBM) and Peter McCrystal (IBM) for their input into this example architectural decision.