Cloning VMs – Why less (I/O & throughput) is better!

I’ve seen the picture below floating around Twitter and LinkedIn, which shows a 32GB VM being cloned in just 7 seconds on an All Flash Array (AFA), and it has attracted a lot of attention.

The AFA peaked at over 7000MB/s during this time, showing it is capable of some serious throughput!

At this stage some people may be thinking I’m talking about Nutanix, so I would like to point out the above AFA is not a Nutanix NX-9000 All Flash Node.

So why did I write this post?

I am still surprised that technical people find this sort of test and result impressive, because to me the fact the AFA needed 7000MB/s of bandwidth to perform the clone means the clone was not performed intelligently: the process consumed additional capacity while potentially having a significant impact on the other workloads using the storage.

At this stage I guess I should explain what I mean by an intelligent clone.

An intelligent clone in my mind is where:

a) The clone takes a few seconds to occur
b) The clone is offloaded to the storage layer
c) Uses almost zero I/O & bandwidth to perform the clone
d) Uses almost zero additional space

So in the above example, the solution has cloned the VM in a few seconds, so a) has been satisfied. Since there is no information provided, I’m going to give it the benefit of the doubt and say the clone was offloaded to the storage layer, so I’m assuming (rightly or wrongly) that b) is also satisfied.

But what about c) and d)?

If the clone uses 7000MB/s of bandwidth, that must have some impact (if not a significant impact) on other workloads running on the storage, even if it is only for 7 seconds.

The clone was also writing data throughout the 7 seconds, so it is also duplicating the data.

So the net result is a fast yet high-impact (capacity / performance) clone.
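To put some rough numbers on that, here is a back-of-the-envelope comparison in Python. It uses only the 32GB size and 7-second duration quoted above, ignores any inline deduplication or compression the array may apply, and the intelligent clone figures simply reflect the behaviour described later in this post.

    # Back-of-the-envelope look at what a full-copy clone costs compared to an
    # intelligent clone. The 32GB / 7 second / 7000MB/s figures are simply those
    # quoted above; everything else is illustrative.

    vm_size_gb = 32
    clone_seconds = 7
    peak_mb_per_s = 7000

    # A full copy must write out the entire VM, so the average write rate alone is:
    avg_write_mb_per_s = vm_size_gb * 1024 / clone_seconds
    print(f"Average write rate for a full copy: ~{avg_write_mb_per_s:.0f} MB/s")   # ~4681 MB/s
    print(f"Peak bandwidth observed: {peak_mb_per_s} MB/s")

    # Capacity consumed by the full copy (before a single byte of new data exists):
    print(f"Additional capacity consumed: {vm_size_gb} GB")

    # An intelligent (pointer-based) clone of the same VM:
    print("Intelligent clone: ~0 MB/s of data movement and ~0 GB of additional capacity")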

Back in 2012, when I worked at IBM, I wrote this post (Netapp Edge VSA – Rapid Cloning Utility) about intelligent cloning, as a customer was suffering terrible VDI recompose times due to using a big dumb storage solution which had no intelligent cloning capabilities. The post shows that even an old IBM x3850 M2 with slow 4-core processors, running a Virtual Storage Appliance on 3 pieces of spinning rust (146GB SAS disks), still completes the task in just 4.73 seconds per clone, in full compliance with the 4 aspects of intelligent cloning I identified (repeated below).

a) The clone takes a few seconds to occur
b) The clone is offloaded to the storage layer
c) Uses almost zero I/O & bandwidth to perform the clone
d) Uses almost zero additional space

The reason intelligent cloning is so much faster is that there is no need to duplicate the VM: the intelligent cloning process simply creates pointers back to the original file (which remains Read Only) and only uses I/O & capacity when new data is written.

The process is actually mostly dependent on vCenter registering the new VM, which is why it takes a couple of seconds; almost no time is spent at the storage layer. The size of the VM being cloned is irrelevant. (Note: In my post from 2012 it was a 10GB VM, although again the size has no impact on the speed of an intelligent clone.)
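To make the mechanism concrete, here is a minimal copy-on-write sketch in Python. It is purely illustrative: the class and method names are hypothetical and this is not how VAAI-NAS or any particular array implements cloning, but it shows why creating the clone itself moves almost no data and consumes almost no capacity.

    # Minimal copy-on-write sketch of why an intelligent clone needs almost no
    # I/O or capacity. Illustrative only -- NOT how VAAI-NAS or any specific
    # array implements cloning; names and structures are hypothetical.

    class BaseImage:
        """The original VM's data, kept read-only once cloned."""
        def __init__(self, blocks):
            self.blocks = blocks          # e.g. {block_number: data}

    class IntelligentClone:
        """A clone is just a pointer to the base plus an overlay for new writes."""
        def __init__(self, base: BaseImage):
            self.base = base              # pointer only -- no data is copied
            self.overlay = {}             # consumes space only when data changes

        def read(self, block):
            # New data (if any) wins, otherwise fall back to the shared base.
            return self.overlay.get(block, self.base.blocks.get(block))

        def write(self, block, data):
            # Only now is I/O and capacity consumed, and only for the changed block.
            self.overlay[block] = data

    base = BaseImage({0: b"boot", 1: b"os", 2: b"app"})
    clone = IntelligentClone(base)        # "cloning" the VM is this cheap, regardless of its size
    clone.write(2, b"app-v2")             # divergent data is the only new capacity used
    print(clone.read(0), clone.read(2))   # b'boot' b'app-v2'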

In the post from 2012, I made the following observation:

Even if you have the world’s fastest array (insert your favorite vendor here), storage connectivity and the biggest and most powerful ESXi hosts, the process of cloning a large number of virtual machines will still:

1. Take more time to complete than an intelligent cloning process like RCU

2. Impact the performance of your ESXi hosts and, more than likely, production VMs

3. Impact the performance of your storage network & array (and anything that uses it, physical or virtual).

So fast forward to 2015: we have lots of really fast all-flash storage solutions, but for tasks like cloning, even these super-fast solutions can’t outperform the intelligent cloning a single controller (2vCPU) Virtual Storage Appliance achieved on an old IBM x3850 M2 server in my test lab back in 2012.

I also recently wrote this article (Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?) explaining the benefits of VAAI-NAS and how it supports intelligent cloning even with Virtual Storage Appliance solutions.

In Summary:

I find a clone taking a few seconds and using next to no throughput and capacity to be impressive. This is a perfect example of less I/O and throughput (to perform the same task) being better!

It’s great if a storage array has the capability to drive many GB/s of throughput, but it’s totally unnecessary for cloning and only demonstrates the lack of intelligent cloning capabilities of the storage solution.

In my opinion it’s much better for a storage solution to use its high performance capability for driving I/O to virtual machines servicing business applications than for tasks like cloning, which can be done intelligently.

To show off more real-world performance capabilities of a storage solution (especially an All-Flash array), the example really has to include multiple workloads with different I/O characteristics. This is something the storage industry (all vendors) continues to fail to provide, and it’s something I would like to be a part of changing, as “peak” performance is nowhere near as important as “consistent” performance.

Back on topic though: if cloning is something you or your customers require, say for a VDI or cloud deployment, or just for rapid provisioning of test & development VMs, consider a storage solution which has intelligent cloning capabilities such as VAAI-NAS, which integrates with products like Horizon View (VCAI Clones) and vCloud Director (Fast Provisioning).

How to Architect a VSA, Nutanix or VSAN solution for >=N+1 availability.

How to architect a VSA, Nutanix or VSAN solution for the desired level of availability (e.g. N+1, N+2, etc.) is a question I am asked regularly by customers and contacts throughout the industry.

This needs to be addressed in two parts.

1. Compute
2. Storage

Firstly, compute-level resiliency. As a cluster grows, the chance of a failure increases, so the percentage of resources reserved for HA should increase with the size of the cluster.

My rule of thumb (which is quite conservative) is as follows:

1. N+1 for clusters of up to 8 hosts
2. N+2 for clusters of >8 hosts but <=16
3. N+3 for clusters of >16 hosts but <=24
4. N+4 for clusters of >24 hosts but <=32

The above is discussed in more detail in: Example Architectural Decision – High Availability Admission Control Setting and Policy

The table below highlights in green my recommended HA percentage configuration based on cluster size, up to the current vSphere limit of 32 nodes.

[Image: HA percentage recommendations by cluster size]
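As a quick way to apply the rule of thumb, here is a small Python sketch. The function name is my own and the output simply reproduces the conservative guidance above; tune it to your own availability requirements.

    # A sketch of the (conservative) rule of thumb above: how many hosts to reserve
    # for HA and the matching "percentage of cluster resources" admission control
    # value. Illustrative only -- adjust to your own availability requirements.

    def ha_reservation(cluster_size: int):
        if not 1 <= cluster_size <= 32:
            raise ValueError("vSphere currently supports up to 32 hosts per cluster")
        reserved_hosts = -(-cluster_size // 8)        # ceiling(size / 8): one extra host per 8
        percentage = round(reserved_hosts / cluster_size * 100, 1)
        return reserved_hosts, percentage

    for size in (4, 8, 12, 16, 24, 32):
        hosts, pct = ha_reservation(size)
        print(f"{size:>2} hosts -> N+{hosts}, reserve {pct}% of CPU & RAM for HA")
    # e.g. 8 hosts -> N+1 (12.5%), 12 hosts -> N+2 (16.7%), 32 hosts -> N+4 (12.5%)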

Some of you may be thinking: if my Nutanix or VSAN cluster is only configured for RF2 (or FT1 for VSAN), I can only tolerate one node failure, so why am I reserving more than N+1?

In the case of Nutanix, after a node failure the cluster can restore itself to a fully resilient state and then tolerate subsequent failures. In fact, with “Block Awareness” a full 4-node block can be lost (an N-4 situation); if this is a requirement, it needs to be factored into the HA admission control reservation to ensure compute resources are available to restart the VMs.

Next, let’s talk about the issue perceived to be more complicated: storage redundancy.

Storage redundancy for VSA, Nutanix or VSAN is actually not as complicated as most people think.

The following is my rule of thumb for sizing.

For N+1, ensure you have enough capacity remaining in the cluster to tolerate the largest node failing.

For N+2, ensure you have enough capacity remaining in the cluster to tolerate the largest TWO nodes failing.

The examples below discuss Nutanix nodes and their capacity, but the same is applicable to any VSA or VSAN solution where multiple copies of data are kept for data protection, as opposed to RAID.

Example 1: If you have 4 x Nutanix NX3060 nodes configured with RF2 (FT1 in VSAN terms) with 2TB usable per node (as shown below), in the event of a node failure 2TB is no longer available. So the maximum storage utilization of the cluster should be <75% (6TB of the 8TB total) to ensure that, in the event of any node failure, the cluster can be restored to a fully resilient state.

[Image: 4 x NX3060 cluster with 2TB usable per node]

Example 2: If you have 2 x Nutanix NX3060 nodes configured with RF2 (FT1 in VSAN terms) with 2TB usable per node and 2 x Nutanix NX6060 nodes with 8TB usable per node (as shown below), in the event of an NX6060 node failure, 8TB is no longer available. So the maximum storage utilization of the cluster should be 12TB (of the 20TB total) to ensure that, in the event of any node failure (including the 8TB nodes), the cluster can be restored to a fully resilient state.

[Image: mixed cluster of 2 x NX3060 (2TB usable each) and 2 x NX6060 (8TB usable each)]
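The same sizing can be expressed in a couple of lines of Python. This is only a sketch of the rule of thumb above (the function name is mine); real sizing should also allow for metadata, snapshots and growth.

    # A sketch of the storage sizing rule of thumb above: for N+x resiliency, the
    # maximum safe utilization is the cluster's usable total minus the x largest
    # nodes. Illustrative only.

    def max_safe_capacity(node_capacities_tb, failures_to_tolerate=1):
        """Return (total usable TB, maximum safe utilization TB) for N+x."""
        largest = sorted(node_capacities_tb, reverse=True)[:failures_to_tolerate]
        total = sum(node_capacities_tb)
        return total, total - sum(largest)

    # Example 1: 4 x NX3060 with 2TB usable each, sized for N+1
    print(max_safe_capacity([2, 2, 2, 2]))      # (8, 6)  -> stay below 75% utilization

    # Example 2: 2 x NX3060 (2TB each) + 2 x NX6060 (8TB each), sized for N+1
    print(max_safe_capacity([2, 2, 8, 8]))      # (20, 12) -> stay below 12TB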

For environments using Nutanix RF3 (3 copies of data) or VSAN (FT2) the same rule of thumb applies but the usable capacity per node would be lower due to the increased capacity required for data protection.

Specifically for Nutanix environments, the PRISM UI shows whether a cluster has sufficient capacity available to tolerate a node failure, and if not, the following is displayed on the HOME screen (and alerts can be sent if desired).

[Image: PRISM HOME screen showing capacity status after a node failure]

In this case, the cluster has suffered a node failure, and because it was sized suitably, it shows “Rebuild Capacity Available” as “Yes” and advises “Auto Rebuild in progress”, meaning the cluster is performing a fully automated self-heal. Importantly, no admin intervention is required!

If the cluster status is normal, the following will be shown in PRISM.

[Image: PRISM HOME screen showing normal capacity status]

In summary: the smaller the cluster, the higher the percentage of capacity that needs to remain unused so resiliency can be restored in the event of a node failure, just as the percentage of resources reserved for HA is higher for smaller clusters in a traditional compute-only environment.

The larger the cluster from both a storage and compute perspective, the lower the percentage of unused capacity required for HA. So, as has been a virtualization recommended practice for years… scale out!

Related Articles:

1. Scale Out Shared Nothing Architecture Resiliency by Nutanix

2. PART 1 – Problems with RAID and Object Based Storage for data protection

3. PART 2 – Problems with RAID and Object Based Storage for data protection

Cost vs Reward for the Nutanix Controller VM (CVM)

I hear a lot of FUD (Fear Uncertainty and Doubt) getting thrown around about the Nutanix Controller VM (CVM) being a resource (vCPU/vRAM) hog.

So I thought I would address this perceived issue.

For those of you who are car people, you will understand how a supercharger increases the performance of an engine.

The supercharger does this via a belt and pulley driven by the engine, which spins the supercharger to force more air into the combustion chambers. This allows more fuel to be added to the mix, producing higher horsepower from the same engine displacement (engine capacity, e.g. 2.0 litres).

What is the downside of a supercharger?

Driving the supercharger via the belt and pulley can itself consume a significant amount of horsepower, in some cases hundreds. As such, a 300HP engine may have to spend a sizeable portion of its power just driving the supercharger.

So for example, a 300HP engine less 60HP (20%) to drive the supercharger equates to only 240HP remaining. But, as a result of the supercharger forcing more air into the engine, the engine now produces an additional 200HP.

So the “cost” of running the supercharger is 60HP, but the overall benefit is 200HP, resulting in the engine now producing 440HP.

Let’s now relate this back to the Nutanix Controller VM (CVM).

The CVM provides the storage features, functionality, excellent scalability and performance for the Virtual Machines. For example, it reduces latency thanks to Data Locality, which keeps data local to the compute node running the VM for faster reads and writes.

The faster the reads and writes, the less time VMs spend in a “CPU wait” state waiting for I/Os to be acknowledged by the storage, which means the CPUs are utilized more efficiently. This is just a small part of the value the Nutanix CVM provides.

In summary, the CVM does use some compute resources from the host (how much depends on the node type and the performance required), but like a supercharger on an engine, the Nutanix CVM delivers significantly more value to the VMs than the resources it uses.

Related Articles:

1. Rule of Thumb: Sizing for Storage Performance in the new world.

2. Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?

3. PART 1 – Problems with RAID and Object Based Storage for data protection