PART 2 – Problems with RAID and Object Based Storage for data protection

Posted on September 26, 2014 by Josh Odgers

Following on from Part 1, this post discusses hyper-converged Distributed File Systems (e.g. Nutanix) and compares them with traditional SAN/NAS RAID and hyper-converged solutions that use Object storage for data protection.

The diagram below shows a 4-node hyper-converged solution using a Distributed File System with the same 4 x 4TB SATA drives, with data protected by replication with 2 copies (Nutanix calls this Resiliency Factor 2).

The first difference you may have noticed is that the data is much more granular than in the Hyper-Converged Object store example in Part 1.

The second, less obvious difference is that the replicated copies of the data on node 1 (i.e. the data with purple letters) do not reside on a single other node, but are distributed throughout the cluster.

Now let's look at a drive failure example:

Here we see Node 1 has lost a drive hosting 8 granular pieces of data, each 1MB in size.

The Distributed File System now detects that the data represented by A, B, C, D, E, I, M and P has only a single copy within the cluster and starts the restoration process.

Let's walk through each step, although in practice these steps are completed concurrently.

1. Data “A” is replicated from Node 2 to Node 3
2. Data “B” is replicated from Node 2 to Node 4
3. Data “C” is replicated from Node 3 to Node 2
4. Data “D” is replicated from Node 4 to Node 2
5. Data “E” is replicated from Node 2 to Node 4
6. Data “I” is replicated from Node 3 to Node 2
7. Data “M” is replicated from Node 4 to Node 3
8. Data “P” is replicated from Node 4 to Node 3

Now the cluster has restored resiliency.

So what was the impact on each node?

The above table shows a simplified representation of the workload of restoring resiliency to the cluster. As we can see, the workload (being 8 granular pieces of data being replicated) was distributed across the nodes very evenly.
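As a rough cross-check of that distribution, the eight replication steps listed earlier can be tallied per node. This is only a sketch: the step list is copied from the walkthrough above, while the `tally` helper and the idea of counting one read per source and one write per destination are illustrative assumptions, not how Nutanix accounts for rebuild work.

```python
from collections import Counter

# (data, source node, destination node), copied from the eight steps above.
REBUILD_STEPS = [
    ("A", 2, 3), ("B", 2, 4), ("C", 3, 2), ("D", 4, 2),
    ("E", 2, 4), ("I", 3, 2), ("M", 4, 3), ("P", 4, 3),
]


def tally(steps):
    """Count one read on each source node and one write on each destination."""
    reads = Counter(src for _, src, _ in steps)
    writes = Counter(dst for _, _, dst in steps)
    return reads, writes


reads, writes = tally(REBUILD_STEPS)
for node in sorted(set(reads) | set(writes)):
    total = reads[node] + writes[node]
    # Each piece of data is 1MB, so the counts are also MB read/written.
    print(f"Node {node}: {reads[node]} reads, {writes[node]} writes, {total} total")
```

With these steps, each surviving node handles either five or six of the sixteen read/write operations, so no single node or drive carries the rebuild.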

Next, let's look at the advantages of a Hyper-Converged Solution with a Distributed File System (which Nutanix uses).

Highly granular distribution using 1MB extents, not large Objects.
The work required to restore resiliency after a drive (or node) failure is distributed across all drives and nodes in the cluster, leveraging the capability of every drive and node (i.e. it is not constrained to the <100 IOPS of a single drive).
The rebuild is a low impact activity, as the workload is distributed across the cluster and is not dependent on a single source/destination pair of drives or nodes.
The rebuild has a low impact on the virtual machines running on the distributed file system, so consistent performance is maintained.
The larger the cluster, the quicker and lower impact the rebuild is, as the workload is distributed across a higher number of drives/nodes for the same amount (GB) of data to restore.
With Nutanix, SSDs are used not only as a Read/Write cache but also as a persistent storage tier. Recovered data is therefore written to SSD, and where the data being recovered is not in cache (Memory or SSD tiers), it may still be in the persistent SSD tier, which will dramatically improve the performance of the recovery.
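The point about cluster size can be illustrated with a deliberately idealized model. It assumes the rebuild work is spread perfectly evenly and that every surviving node contributes the same fixed rebuild rate. The function name, the 1TB data size and the 50MB/s per-node rate are hypothetical values chosen for illustration, not Nutanix figures.

```python
def estimate_rebuild_seconds(data_mb, surviving_nodes, per_node_mb_per_s):
    """Idealized rebuild time: the data is split evenly across surviving nodes."""
    if surviving_nodes < 1 or per_node_mb_per_s <= 0:
        raise ValueError("need at least one surviving node and a positive rate")
    return data_mb / (surviving_nodes * per_node_mb_per_s)


DATA_MB = 1024 * 1024   # hypothetical: 1TB of data to re-protect
RATE_MB_PER_S = 50.0    # hypothetical: rebuild rate each node can contribute

for cluster_size in (4, 8, 16, 32):
    # One node (or one of its drives) has failed, so the rest share the work.
    seconds = estimate_rebuild_seconds(DATA_MB, cluster_size - 1, RATE_MB_PER_S)
    print(f"{cluster_size:>2} nodes: ~{seconds / 60:.0f} minutes")
```

In this model the rebuild time falls in proportion to the number of surviving nodes, whereas a rebuild tied to a single source/destination pair would stay constant regardless of cluster size.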

Summary:

As discussed in Part 1, Traditional RAID used by SAN/NAS and Hyper-converged solutions using Object based storage both suffer similar issues when recovering from drive or node failure.

In contrast, the Nutanix Hyper-converged solution using the Nutanix Distributed File System (NDFS) can restore resiliency following a drive or node failure faster and with lower impact, thanks to its highly granular and distributed architecture. This means more consistent performance for virtual machines.

Can I use my existing SAN/NAS storage with Nutanix?

Posted on September 22, 2014 by Josh Odgers

A question I get regularly is, “Can I use my existing SAN/NAS storage with Nutanix?”

The short answer is, as always, “It depends”.

iSCSI, NFS & SMB 3.0 storage can be presented to Nutanix nodes just as it is to existing non-Nutanix nodes.
FC based storage cannot be used, as Nutanix does not support FC HBAs.

The diagram below shows a Nutanix NX-3460 block with 4 nodes, which have Nutanix Containers presented to them as well as iSCSI LUNs, SMB 3.0 or NFS mount points connected from the centralized SAN/NAS.

Note: SMB 3 is not supported for ESXi hosts & NFS is not supported for Hyper-V.
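The protocol rules above can be summarized as a small lookup. This is a sketch of the rules as stated in this post only: the `SUPPORTED` table and the `can_present` helper are hypothetical names, and the combinations marked as supported are inferred from the statements above rather than taken from a compatibility matrix.

```python
# Protocols that can be presented to a Nutanix node, per hypervisor,
# as described in this post. FC is absent because FC HBAs are not supported.
SUPPORTED = {
    "ESXi": {"iSCSI", "NFS"},          # SMB 3 is not supported for ESXi hosts
    "Hyper-V": {"iSCSI", "SMB 3.0"},   # NFS is not supported for Hyper-V
}


def can_present(protocol, hypervisor):
    """Return True if existing SAN/NAS storage using `protocol` can be presented."""
    return protocol in SUPPORTED.get(hypervisor, set())


for hypervisor in SUPPORTED:
    for protocol in ("iSCSI", "NFS", "SMB 3.0", "FC"):
        answer = "yes" if can_present(protocol, hypervisor) else "no"
        print(f"{protocol:>7} on {hypervisor}: {answer}")
```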

So what are the use cases for this style of deployment?

If you’re not ready to do an entire infrastructure refresh, for whatever reason, you may wish to transition to Nutanix over time while maximizing the ROI and lifespan of your existing storage.

Here are some examples of what I recommend customers do:

1. Migrate Business Critical Applications (BCAs) to Nutanix

There are many benefits of doing this including:

Improving resiliency / performance for vBCAs
Simplifying storage management for vBCAs
Freeing up capacity and reducing the workload on legacy SAN
Increasing ease of scalability for critical workloads
Use legacy SAN/NAS for high capacity low IOPS workloads which are better suited to centralized storage than vBCAs

Another great option is:

2. Migrate Virtual Desktops (VDI) to Nutanix which shares similar benefits to migrating vBCAs including:

Separating non-complementary VDI workloads from Server & vBCAs, as these workloads do not mix well in centralized storage deployments
Improving resiliency / performance for VDI
Simplifying storage management for VDI
Reducing the workload on legacy SAN/NAS which will give an effective increase in performance for workloads remaining on the SAN/NAS
Providing linear scalability for VDI if/when the environment grows
Use legacy SAN/NAS for high capacity low IOPS workloads which are better suited to centralized storage than VDI

The last example I wanted to point out is Management workloads.

3. Migrate Infrastructure Management workloads to Nutanix.

As has been recommended by many industry experts, separating Management VMs from customer (e.g.: vCAC / vCloud tenants) or production server/desktop workloads (at both the Compute & Storage layers) can dramatically simplify the datacenter and help improve performance, resiliency & recoverability.

Again doing this provides similar benefits to the previous two examples.

Separating Management workloads from Server / vBCAs / VDI, as these workloads should be separate from a security, resiliency, performance and recoverability perspective.
Improving resiliency / performance for all workloads in the datacenter
Simplifying storage management for Management
Reducing the workload on legacy SAN/NAS which will give an effective increase in performance for workloads remaining on the SAN/NAS
Increasing scalability for if/when the management demands increase.
Maximizes the life span / performance of the legacy SAN/NAS

In summary, where it is not possible for budgetary reasons to migrate all workloads to Nutanix, migrating some workloads such as VDI, vBCA or Management to Nutanix will help alleviate the impact of scalability, performance and/or resiliency issues with your existing centralized SAN/NAS.

Nutanix also provides a solution which can start (very) small and continue to be scaled in a granular fashion over time until the SAN/NAS goes End of Life and/or when budget exists. At this time all workloads can then be migrated to Nutanix!

Data Locality & Read Cache – Why it’s critical for high performance Horizon View environments (Part 2)

Posted on September 24, 2013 by Josh Odgers

In Part 1 of this series, we discussed how Nutanix “Extent Cache” dramatically improves read performance in not only Horizon View environments, but all Virtual machines.

In Part 2, we will discuss how Nutanix further enhances Horizon View performance using a Nutanix feature which is known as “Shadow Clone”.

So the first question is “What is a Shadow Clone” and “How does it improve performance”?

To answer this question, let's first discuss the issue.

All the Linked Clones in a desktop pool access a shared “replica” disk. This creates large amounts of read I/O to the shared storage.

The below diagram shows what this looks like in a traditional storage architecture.

So when a Virtual Desktop in a Desktop Pool using Linked Clones needs to read data it has to exit the ESXi host, traverse the Storage Network, go via a Storage Controller and access the “replica” from either disk or cache.

As we discussed in Part 1, VMware have helped address this problem with CBRC, but not all of the replica can fit within the CBRC, which is limited to 2GB.

Enter Nutanix with “Extent Cache”. The size of the Extent Cache is configurable, so the maximum amount of the “replica” can be served from cache. So why do we need “Shadow Clones”?

The only issue with Extent Cache is that it is RAM assigned to the CVM, so the bigger the Extent Cache, the more RAM is being used on the ESXi host, so you want to aim for a balance between Cache capacity (and therefore % of cache hits) and Virtual Machine consolidation ratio on the ESXi host.
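The trade-off can be made concrete with some simple arithmetic. Every number below is hypothetical (host RAM, base CVM RAM and RAM per desktop), and the model ignores hypervisor overhead and memory overcommit. It only shows how each extra GB of Extent Cache comes out of the RAM available for desktops.

```python
def desktops_per_host(host_ram_gb, cvm_base_ram_gb, extent_cache_gb, ram_per_desktop_gb):
    """RAM left after the CVM (base + Extent Cache), divided among desktops."""
    available_gb = host_ram_gb - cvm_base_ram_gb - extent_cache_gb
    if available_gb < 0:
        raise ValueError("CVM RAM exceeds host RAM")
    return available_gb // ram_per_desktop_gb


HOST_RAM_GB = 256        # hypothetical host
CVM_BASE_RAM_GB = 16     # hypothetical CVM RAM excluding Extent Cache
RAM_PER_DESKTOP_GB = 2   # hypothetical virtual desktop size

for cache_gb in (4, 8, 16, 32):
    count = desktops_per_host(HOST_RAM_GB, CVM_BASE_RAM_GB, cache_gb, RAM_PER_DESKTOP_GB)
    print(f"Extent Cache {cache_gb:>2}GB: up to {count} desktops per host")
```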

Enter Shadow Clones, and we have the best of both worlds: 100% of the replica read I/O will be served locally, via either Extent Cache or Shadow Clones.

So how do “Shadow Clones” work?

The feature intelligently analyses the I/O access pattern at the storage layer to identify which files are shared read-only disks (e.g. a Linked Clone replica).

When a 100% read only disk is discovered, Nutanix will take a snapshot at the storage layer on each Controller VM (CVM) and redirect all read I/O to the local copy.
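The behaviour described above can be sketched as a toy model. This is not Nutanix's implementation: the `VDiskStats` class, the `min_reader_nodes` threshold and the `serving_node` helper are all assumptions made for illustration, and the real detection logic is not described in this post.

```python
from dataclasses import dataclass, field


@dataclass
class VDiskStats:
    """Hypothetical per-disk I/O counters kept at the storage layer."""
    reads_by_node: dict = field(default_factory=dict)  # node id -> read count
    writes: int = 0


def is_shadow_clone_candidate(stats, min_reader_nodes=2):
    """A disk qualifies if it is 100% read-only and read from several nodes."""
    return stats.writes == 0 and len(stats.reads_by_node) >= min_reader_nodes


def serving_node(stats, requesting_node, owner_node):
    """Reads of a qualifying disk are served from the requester's local copy;
    all other reads go to the node that owns the disk."""
    if is_shadow_clone_candidate(stats):
        return requesting_node
    return owner_node


replica = VDiskStats(reads_by_node={1: 500, 2: 480, 3: 510})
print(serving_node(replica, requesting_node=3, owner_node=1))  # local copy on node 3

user_disk = VDiskStats(reads_by_node={1: 40, 2: 5}, writes=12)
print(serving_node(user_disk, requesting_node=2, owner_node=1))  # owner, node 1
```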

The diagram below shows what Shadow Clones look like.

The above is a dramatically simpler and more scalable solution than a traditional architecture, as the solution will scale indefinitely without degrading performance.

Some of the benefits of Nutanix Shadow Clones are:

1. Replica data is always served locally to the ESXi host (via Extent Cache and Shadow Clones)
2. Does not require the use of CBRC and is not limited to 2GB
3. Reduced overhead on the Storage Network (IP Network) as read I/O is serviced locally
4. During boot storms, login storms and antivirus scans all replica data can be served locally and NO read I/O is forced to be served by a single storage controller. This not only improves Read performance but makes more I/O available for Write operations which are generally >=65% in VDI environments
5. The solution can scale while maintaining linear performance (performance does not taper off at scale)
6. When the base image is updated, Nutanix detects the file has been written to and automatically creates a new snapshot, which is replicated out to all nodes.
7. The feature is enabled once and does not require ongoing configuration or maintenance


A special Thank you to Jason Langone VCDX#54 (@langonej) for reviewing this post and Tabrez Memon one of the brilliant Engineers at Nutanix who has worked on features discussed in this post and provided valuable input into this series.

CloudXC

By Josh Odgers – VMware Certified Design Expert (VCDX) #90
