How to Architect a VSA, Nutanix or VSAN solution for >=N+1 availability.

How to architect a VSA, Nutanix or VSAN solution for the desired level of availability (e.g.: N+1, N+2, etc.) is a question I am asked regularly by customers and contacts throughout the industry.

This needs to be addressed in two parts.

1. Compute
2. Storage

Firstly, compute-level resiliency: as a cluster grows, the chance of a failure increases, so the percentage of resources reserved for HA should increase with the size of the cluster.

My rule of thumb (which is quite conservative) is as follows:

1. N+1 for clusters of up to 8 hosts
2. N+2 for clusters of >8 hosts but <=16
3. N+3 for clusters of >16 hosts but <=24
4. N+4 for clusters of >24 hosts but <=32

The above is discussed in more detail in: Example Architectural Decision – High Availability Admission Control Setting and Policy

The below table highlights in Green my recommended HA percentage configuration based on the cluster size, up to the current vSphere limit of 32 nodes.

[Image: Recommended HA admission control percentages by cluster size]
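To make the rule of thumb concrete, below is a minimal Python sketch that maps a cluster size to the number of hosts reserved and the equivalent HA admission control percentage. The function name and the ceiling-based rounding are my own illustration of the table above, not a formula from VMware or Nutanix.

```python
def ha_reservation(cluster_size):
    """Return (hosts_reserved, ha_percentage) for a vSphere cluster, based on the
    conservative rule of thumb above: N+1 up to 8 hosts, N+2 up to 16,
    N+3 up to 24 and N+4 up to 32."""
    if not 2 <= cluster_size <= 32:
        raise ValueError("expected a cluster of 2 to 32 hosts (current vSphere limit)")
    hosts_reserved = -(-cluster_size // 8)                    # ceil(cluster_size / 8)
    ha_percentage = -(-hosts_reserved * 100 // cluster_size)  # round up to a whole %
    return hosts_reserved, ha_percentage

for size in (4, 8, 16, 24, 32):
    n, pct = ha_reservation(size)
    print(f"{size} hosts -> N+{n} -> reserve {pct}% for HA admission control")
```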

Some of you may be thinking: if my Nutanix cluster is only configured for RF2 (or FT1 for VSAN), I can only tolerate one node failure, so why reserve more than N+1?

In the case of Nutanix, after a node failure the cluster can restore itself to a fully resilient state and tolerate subsequent failures. In fact, with “Block Awareness” a full 4-node block can be lost (an N-4 situation); if this is a requirement, it needs to be factored into the HA admission control reservation to ensure compute resources are available to restart VMs.

Next let’s talk about the issue perceived to be more complicated: storage redundancy.

Storage redundancy for VSA, Nutanix or VSAN is actually not as complicated as most people think.

The following is my rule of thumb for sizing.

For N+1, ensure you have enough capacity remaining in the cluster to tolerate the largest node failing.

For N+2, ensure you have enough capacity remaining in the cluster to tolerate the largest TWO nodes failing.

The examples below discuss Nutanix nodes and their capacity, but the same is applicable to any VSA or VSAN solution where multiple copies of data are kept for data protection, as opposed to RAID.

Example 1: If you have 4 x Nutanix NX3060 nodes configured with RF2 (FT1 in VSAN terms) with 2TB usable per node (as shown below), in the event of a node failure 2TB is no longer available. So the maximum storage utilization of the cluster should be <75% (6TB) to ensure that, in the event of any node failure, the cluster can be restored to a fully resilient state.

[Image: 4 x NX3060 node cluster with 2TB usable per node]

Example 2: If you have 2 x Nutanix NX3060 nodes configured with RF2 (FT1 in VSAN terms) with 2TB usable per node and 2 x Nutanix NX6060 nodes with 8TB usable per node (as shown below), in the event of an NX6060 node failure, 8TB is no longer available. So the maximum storage utilization of the cluster should be <=12TB (of the 20TB total) to ensure that, in the event of any node failure (including the 8TB nodes), the cluster can be restored to a fully resilient state.

[Image: Mixed cluster of 2 x NX3060 (2TB) and 2 x NX6060 (8TB) nodes]
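The same sizing logic can be expressed as a quick calculation. The sketch below is my own illustration of the rule of thumb, not a Nutanix sizing tool: it reserves the capacity of the largest node(s) and returns the maximum the cluster should consume while still being able to fully self-heal.

```python
def max_safe_utilization_tb(node_capacities_tb, n_failures=1):
    """Maximum capacity (TB) the cluster should consume so resiliency can be
    restored after the largest `n_failures` nodes fail.
    node_capacities_tb: usable capacity per node AFTER RF2/RF3 overhead."""
    reserve = sum(sorted(node_capacities_tb, reverse=True)[:n_failures])
    return sum(node_capacities_tb) - reserve

# Example 1: 4 x NX3060 with 2TB usable each, sized for N+1
print(max_safe_utilization_tb([2, 2, 2, 2]))   # 6 TB, i.e. <75% of the 8TB total

# Example 2: 2 x NX3060 (2TB) + 2 x NX6060 (8TB), sized for N+1
print(max_safe_utilization_tb([2, 2, 8, 8]))   # 12 TB of the 20TB total
```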

For environments using Nutanix RF3 (3 copies of data) or VSAN (FT2) the same rule of thumb applies but the usable capacity per node would be lower due to the increased capacity required for data protection.

Specifically for Nutanix environments, the PRISM UI shows whether a cluster has sufficient capacity available to tolerate a node failure; if not, the following is displayed on the HOME screen and alerts can be sent if desired.

[Image: PRISM capacity critical warning]

In this case, the cluster has suffered a node failure, and because it was sized suitably, it shows “Rebuild Capacity Available” as “Yes” and advises an “Auto Rebuild in progress”, meaning the cluster is performing a fully automated self-heal. Importantly, no admin intervention is required!

If the cluster status is normal, the following will be shown in PRISM.

[Image: PRISM showing normal capacity status]

In summary: the smaller the cluster, the higher the percentage of capacity that needs to remain unused so that resiliency can be restored in the event of a node failure, just like the percentage of resources reserved for HA in a traditional compute-only cluster.

The larger the cluster from both a storage and compute perspective, the lower the percentage of unused capacity required for HA. So, as has been a recommended virtualization practice for years… scale out!

Related Articles:

1. Scale Out Shared Nothing Architecture Resiliency by Nutanix

2. PART 1 – Problems with RAID and Object Based Storage for data protection

3. PART 2 – Problems with RAID and Object Based Storage for data protection

Cost vs Reward for the Nutanix Controller VM (CVM)

I hear a lot of FUD (Fear Uncertainty and Doubt) getting thrown around about the Nutanix Controller VM (CVM) being a resource (vCPU/vRAM) hog.

So I thought I would address this perceived issue.

For those of you who are car people, you will understand the benefit a supercharger provides in increasing the performance of an engine.

The supercharger does this by attaching a belt to a pulley connected to the motor, which spins the supercharger to force more air into the combustion chambers. This allows more fuel to be added to the mix to produce higher horsepower from the same engine displacement (engine capacity, i.e.: 2.0 litres).

What is the downside of a supercharger?

The supercharger belt connected to the pulley can require tens or even hundreds of horsepower simply to drive the supercharger. As such, a 300HP engine may have to use a significant portion of its power just to drive the supercharger.

So for example, a 300HP engine less 60HP (20%) to drive the supercharger equates to only 240HP remaining. But, as a result of the supercharger forcing more air into the engine, the engine now produces an additional 200HP.

So the “cost” of running the supercharger is 60HP, but the overall benefit is 200HP, resulting in the engine now producing 440HP.

Let’s now relate this back to the Nutanix Controller VM (CVM).

The CVM provides the storage features, functionality, excellent scalability and performance for the virtual machines. For example, it reduces latency thanks to Data Locality, which keeps data local to the compute node running the VM for faster reads and writes.

The faster the reads and writes, the less time VMs spend in a “CPU wait” state waiting for I/Os to be acknowledged by the storage which means the CPUs are being more efficiently utilized. This is a small part of the value the Nutanix CVM provides.

In summary, the CVM does use some compute resources from the host (how much depends on the node type and the performance required), but like a supercharger on an engine, the Nutanix CVM delivers significantly higher value to the VMs than the resources it uses.

Related Articles:

1. Rule of Thumb: Sizing for Storage Performance in the new world.

2. Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?

3. PART 1 – Problems with RAID and Object Based Storage for data protection

The Impact of Transparent Page Sharing (TPS) being disabled by default

Recently VMware announced via the VMware Security Blog, that Transparent Page Sharing (TPS) will be disabled by default in an upcoming update of ESXi.

Since this announcement I have been asked how will this impact sizing vSphere solutions and as a result I’ve been involved in discussions about the impact of this on Business Critical Application, Server and VDI solutions.

Firstly, what benefits does TPS provide? In my experience, with large memory pages essentially not being compatible with TPS, the benefits in recent times have been minimal, generally <20% if that, even in VDI environments where all VMs are running the same OS.

Memory overcommitment in general cannot achieve significant savings, because memory is much harder to overcommit than CPU. Overcommitment can be achieved, but only where memory is not all being used by the VM/OS and applications, in which case simply right-sizing VMs will give similar memory savings and likely result in better overall VM and cluster performance.

So to begin, in my opinion TPS is in most cases overrated.

Next Business Critical Applications (vBCA):

In my experience, Business Critical Applications such as MS Exchange, MS SQL and Oracle generally have memory reservations, and in most cases the memory reservation would be 100% (All Memory Locked).

As a result, in most environments running vBCAs, TPS already provides no benefit, so TPS being disabled has no significant impact for these workloads.

Next End User Computing (EUC) Solutions:

There are a number of EUC solutions, such as Horizon View, Citrix XenDesktop and Citrix PVS, which all run very well on vSphere.

One common issue with EUC solutions is that architects fail to consider the vSwap storage requirements for virtual servers (for Citrix PVS) or VDI such as Horizon View.

As a result, a huge amount of Tier 1 storage can be wasted on vSwap files. This can be up to the amount of vRAM allocated to the VMs, less any memory reservations!

The last part is a bit of a hint: how can we reduce or eliminate the need for Tier 1 storage for vSwap? By using memory reservations!
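To illustrate the point, here is a minimal sketch of the vSwap arithmetic. The .vswp file for a VM is sized at its configured vRAM less its memory reservation, so a 100% reservation eliminates it entirely; the 500-desktop figures below are hypothetical.

```python
def vswap_tb(vms):
    """Total Tier 1 storage consumed by vSwap (.vswp) files for a set of VMs.
    Each file is sized at (configured vRAM - memory reservation).
    vms: list of (vram_gb, reservation_gb) tuples."""
    return sum(max(vram - reserved, 0) for vram, reserved in vms) / 1024

# 500 desktops with 4GB vRAM each: no reservation vs 100% memory reservation
print(vswap_tb([(4, 0)] * 500))   # ~1.95 TB of Tier 1 storage just for vSwap
print(vswap_tb([(4, 4)] * 500))   # 0.0 TB -- the vSwap requirement is eliminated
```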

While TPS can provide some memory savings, I would invite you to consider whether the cost saving of eliminating the need for vSwap space on your storage solution, plus the guarantee of consistent performance (at least from a memory perspective), outweighs the benefits of TPS.

Next Virtual Server Solutions:

Let’s say we’re talking about general production servers, excluding vBCAs (discussed earlier). These servers are providing applications and functions to your end users, so consistent performance is something the business is likely to demand.

When sizing your cluster/s, architects should size for at least N+1 redundancy and aim for memory utilization around the 1:1 mark in a host failure scenario (i.e.: size your cluster assuming one host has failed or is undergoing maintenance).

As a result, any reasonable architectural assumption around TPS savings would be minimal.

As with EUC solutions, I would again invite you to consider whether the cost saving of eliminating the vSwap storage requirement and the guarantee of consistent performance outweigh the benefits of TPS.

Next Test/Dev Environments:

This is probably the area where TPS will provide the most benefit, as memory overcommitment ratios can be much higher and memory-saving techniques such as swapping/ballooning should not impact the applications (VMs), and therefore the business, as severely as they would for vBCA, EUC or server workloads.

However, what is Test/Dev for? In my opinion, Test/Dev should where possible simulate production conditions so the operational verification of an application can be accurately conducted before putting the workloads into production. As such, the Test/Dev VMs should be configured the same way as they are intended to be put into production, including Memory Reservations and CPU overcommitment.

So, can more compute overcommitment be achieved in Test/Dev? Sure. But again, is the impact of vSwap space, potentially inconsistent performance and the increased risk of operational verification not properly simulating production worth the minimal benefits of TPS?

Summary

If VMware believes TPS is a significant enough security issue to disable it by default, this is something architects should consider. However, I would argue there are many other areas where security is a much larger issue, but that’s a different topic.

TPS being disabled by default is likely to impact only a small percentage of virtual workloads. With RAM being one of the most inexpensive components in the datacenter, ensuring consistent performance by using memory reservations, and eliminating the architectural considerations and potentially high storage costs of vSwap, make leaving TPS disabled an attractive option regardless of whether it is truly a security advantage or not.

Related Articles:

1. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

2. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)

The VCDX candidate’s advantage over the panellists.

As the candidate, you submit a VCDX application based on a project you have worked on from start to finish and, in most cases, led from a technical/architectural perspective.

You therefore should have:

  • Had initial discussions with the customer about requirements
  • Led or been involved in design workshops
  • Considered design decisions
  • Documented the detailed design along with implementation & test plans etc.
  • Either overseen or been actively involved with the implementation

As a result of the above, you have spent many hours on the project (potentially hundreds or thousands, depending on its size) and you will have intimate knowledge of the design and solution.

On the other hand, the VCDX panellists, while they are experts in the field, have been given your application, including the design, and only a very limited amount of time to review it and prepare for the defence.

As a result, the VCDX candidate has a HUGE advantage over the panellists!

So in a defence, who is the expert in the room on the design? The Candidate!

As a result, the candidate should be an expert in the design being presented and answering questions from the panel about the design should not be intimidating.

To all VCDX candidates, understand that you have a major advantage and go defend your design with confidence!

Microsoft Exchange on Nutanix Best Practice Guide

I am pleased to announce that the Best Practice guide for Microsoft Exchange on Nutanix is released and can be found here.

For me, deploying MS Exchange on Nutanix with vSphere combines best-of-breed application-level resiliency (in the form of Exchange Database Availability Groups) with infrastructure and hypervisor technologies to provide not only high performance, but industry-leading scalability, no silos and very high efficiency and resiliency.

All of this leads to overall lower CAPEX/OPEX for customers.

In summary, by virtualizing MS Exchange on Nutanix, customers realize several key benefits, including:

  • Ability to use a standard platform for all workloads in the datacenter, allowing the removal of legacy silos, resulting in lower overall cost and increased operational efficiencies.
    • An example of this is no disruption to MS Exchange users when performing Nutanix, hypervisor or HW maintenance
  • A highly resilient, scalable and flexible MS Exchange deployment.
  • Reducing the number of Exchange Mailbox servers required to maintain 4 copies of Exchange data thanks to the combination of NDFS + DAG (2 copies at the NDFS layer / 2 copies at the DAG layer).
  • Eliminating the need for large/costly HW refresh cycles, as individual nodes can be added and removed non-disruptively.
  • Simplified architecture: no need for complex sizing exercises or risk of oversizing on day 1; start small and scale VMs, compute or storage if/when required.
  • No dependency on specific HW; Exchange VMs can be migrated to/from any Nutanix node and even to non-Nutanix nodes.
  • Full support from Nutanix at the Exchange, hypervisor and storage layers, with support from Microsoft via Premier Support contracts or via TSANet.
  • Lower CAPEX/OPEX as Exchange can be combined with a new or existing Nutanix/virtualization deployment.
  • Reduced datacenter costs including power, cooling and space (RU)

I hope you enjoy the Best Practice guide and look forward to hearing about your MS Exchange on Nutanix questions & experiences.

PART 2 – Problems with RAID and Object Based Storage for data protection

Following on from Part 1, this post will discuss hyper-converged Distributed File Systems (i.e.: Nutanix) and compare them with traditional SAN/NAS RAID and hyper-converged solutions using Object storage for data protection.

The below diagram shows a 4-node hyper-converged solution using a Distributed File System with the same 4 x 4TB SATA drives, with data protection using replication with two copies (Nutanix calls this Resiliency Factor 2).

[Image: 4-node Distributed File System cluster in a healthy state]

The first difference you may have noticed is that the data is much more granular than in the Hyper-Converged Object store example in Part 1.

The second, less obvious difference is that the replicated copies of the data on Node 1 (i.e.: the data with purple letters) do not reside on a single other node, but are distributed throughout the cluster.

Now let’s look at a drive failure example:

Here we see Node 1 has lost a drive hosting 8 granular pieces of data, each 1MB in size.

[Image: 4-node Distributed File System cluster recovering from a drive failure]

Now the Distributed File System detects that the data represented by A, B, C, D, E, I, M and P has only a single copy within the cluster and starts the restoration process.

Let’s walk through each step, although in practice these steps are completed concurrently.

1. Data “A” is replicated from Node 2 to Node 3
2. Data “B” is replicated from Node 2 to Node 4
3. Data “C” is replicated from Node 3 to Node 2
4. Data “D” is replicated from Node 4 to Node 2
5. Data “E” is replicated from Node 2 to Node 4
6. Data “I” is replicated from Node 3 to Node 2
7. Data “M” is replicated from Node 4 to Node 3
8. Data “P” is replicated from Node 4 to Node 3

Now the cluster has restored resiliency.

So what was the impact on each node?

[Image: Table of read/write I/O per node during recovery]

The above table shows a simplified representation of the workload of restoring resiliency to the cluster. As we can see, the workload (being 8 granular pieces of data being replicated) was distributed across the nodes very evenly.
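The even distribution can be seen by simply tallying the replication steps listed above. The short Python sketch below is purely illustrative: it counts the 1MB reads and writes each surviving node performs during the rebuild.

```python
from collections import Counter

# Replication tasks from the example above: (extent, source_node, destination_node)
rebuild_tasks = [
    ("A", 2, 3), ("B", 2, 4), ("C", 3, 2), ("D", 4, 2),
    ("E", 2, 4), ("I", 3, 2), ("M", 4, 3), ("P", 4, 3),
]

reads = Counter(src for _, src, _ in rebuild_tasks)    # 1MB read per extent
writes = Counter(dst for _, _, dst in rebuild_tasks)   # 1MB write per extent

for node in (2, 3, 4):
    print(f"Node {node}: {reads[node]}MB read, {writes[node]}MB written")
# The rebuild workload is spread roughly evenly across the surviving nodes,
# rather than a single source/destination pair carrying all of it.
```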

Next let’s look at the advantages of a hyper-converged solution with a Distributed File System (which Nutanix uses).

  1. Highly granular distribution using 1MB extents, not large Objects.
  2. The work required to restore resiliency after one drive (or node) failure was distributed across all drives and nodes in the cluster, leveraging the capability of every drive/node (i.e.: not constrained to the <100 IOPS of a single drive).
  3. The restoration rebuild is a low-impact activity as the workload is distributed across the cluster and not dependent on a source/destination pair of drives or nodes.
  4. The rebuild has a low impact on the virtual machines running on the distributed file system and consistent performance is maintained.
  5. The larger the cluster, the quicker and lower impact the rebuild is, as the workload is distributed across a higher number of drives/nodes for the same amount (GB) of restoration.
  6. With Nutanix, SSDs are used not only as a read/write cache but as a persistent storage tier, meaning recovering data is written to SSD; and where the data being recovered is not in cache (memory or SSD tiers), it may still reside in the persistent SSD tier, which dramatically improves the performance of the recovery.

Summary:

As discussed in Part 1, Traditional RAID used by SAN/NAS and Hyper-converged solutions using Object based storage both suffer similar issues when recovering from drive or node failure.

Whereas the Nutanix hyper-converged solution, using the Nutanix Distributed File System (NDFS), can restore resiliency following a drive or node failure faster and with lower impact thanks to its highly granular and distributed architecture, meaning more consistent performance for virtual machines.

PART 1 – Problems with RAID and Object Based Storage for data protection

I regularly get asked to compare the resiliency of traditional centralized storage with converged, as well as newer technologies such as hyper-converged.

So this post will discuss the problems with RAID and newer hyper-converged solutions using Object based storage for data protection.

This post will discuss two examples below, with Part 2 discussing Hyper-converged solutions using Distributed File Systems.

1. Traditional RAID

2. Hyper-converged Object Based Storage

Starting with Traditional shared storage, and the most common RAID level in my experience, RAID 5.

The below diagram shows 3 x 4TB SATA drives in a RAID 5 with a hot spare.

[Image: 3-disk RAID 5 with a hot spare in a healthy state]

Now let’s look at a drive failure scenario. The hot spare activates and starts rebuilding, as shown below.

[Image: 3-disk RAID 5 with the hot spare rebuilding]

So this all sounds fine, we’ve had a drive failure, and a spare drive has automatically taken its place and started rebuilding the data.

The problem now is that even in this simplified/small example we have 2 drives (or say 200 IOPS of drives) trying to rebuild onto just a single drive. So the maximum rate at which the RAID 5 can restore resiliency is limited to that of a single drive or 100 IOPS.

If this was an 8-disk RAID 5, we would have 7 drives (or 700 IOPS) trying to rebuild, again onto only a single drive capable of ~100 IOPS.
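As a back-of-the-envelope illustration (the figures are my own assumptions, not from any vendor), assume the destination SATA hot spare can sustain roughly 100MB/s of sequential writes. No matter how many surviving drives feed the rebuild, the rebuild time is bounded by that single destination drive:

```python
def raid5_rebuild_hours(drive_capacity_tb, dest_write_mbps=100):
    """Best-case RAID 5 hot spare rebuild time, capped by the single destination
    drive regardless of how many surviving drives supply data.
    dest_write_mbps is an assumed sustained write rate for a SATA drive."""
    capacity_mb = drive_capacity_tb * 1024 * 1024
    return capacity_mb / dest_write_mbps / 3600

print(f"{raid5_rebuild_hours(4):.1f} hours")   # ~11.7 hours for a 4TB drive, best case
```

And that is the best case: in practice the rebuild shares the destination drive with production I/O, so real-world rebuilds take considerably longer.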

There are multiple issues with this architecture.

  1. The restoration of resiliency of the entire RAID is constrained by the destination drive, in this case a SATA drive which can sustain less than 100 IOPS
  2. A single subsequent HDD failure within the RAID will cause data loss.
  3. The RAID rebuild is a high impact activity on the storage controllers which can impact all storage
  4. The RAID rebuild is an especially high impact activity on the virtual machines running on the RAID.
  5. The larger the RAID or the capacity drives in the RAID, the longer the rebuild takes and the higher the performance impact and chance of subsequent failures leading to data loss.

Now I’m sure most of you understand this concept, and have felt the pain of a RAID rebuild taking many hours or even days, but with new hyper-converged technology this issue is no longer a problem, right?

Wrong!

It entirely depends on how data is recovered in the event of a drive failure. Let’s look at an example of a hyper-converged solution using an object store. The diagram below shows a simplified example of Hyper-converged Object Based Storage with four objects, represented by Objects A, B, C and D in black, and the second replicated copy of each object, represented by Objects A, B, C and D in purple.

Note: Each object in the Object Store can be hundreds of GB in size.

[Image: Hyper-converged object store in a healthy state]

Let’s take a look at what happens in a disk failure scenario.

[Image: Hyper-converged object store after a drive failure on Node 1]

From the above diagram we can see a drive has failed on Node 1, which means the replicas of Object A and Object D have been lost. The object store will then replicate a copy of Object A to Node 4, and a replica of Object D to Node 2, to restore resiliency.

There are multiple issues with this architecture.

  1. Object based storage can lack granularity as Objects can be 200GB+.
  2. The restoration of resiliency of any single object is constrained by the source drive or node.
  3. The restoration of resiliency of any single object is also constrained by the destination drive or node.
  4. The restoration of multiple objects (such as Object A & D in the above example) is constrained by the same drive or node which will result in contention and slow the process of restoring resiliency to both objects.
  5. The impact of the recovery is High on virtual machines running on the source and destination nodes.
  6. The recovery of an Object is constrained by the source and destination node per object.
  7. Object stores generally require a witness, which is stored on another node in the cluster. (Not illustrated above)

It should be pointed out that where SSDs are used as a write cache, this can help reduce the impact and speed up recovery in some cases. But where data needs to be recovered from outside the cache (i.e.: from a SAS or SATA drive), the fact that writes go to SSD makes no difference, as the recovery is constrained by the read performance of the source drive.

Summary:

Traditional RAID used by SAN/NAS and newer hyper-converged Object based storage both suffer similar issues when recovering from drive or node failures, which include:

  1. The restoration of resiliency is constrained by the source drive or node
  2. The restoration of resiliency is constrained by the destination drive or node
  3. The restoration is high impact on the destination
  4. The recovery of one object is constrained by the network connectivity between just two nodes.
  5. The impact of the recovery is High on any data (such as virtual machines) running on the RAID or source/destination node/s
  6. The recovery of RAID or an Object is constrained by a single part of the infrastructure being a RAID controller / drive or a single node.

In Part 2, we will look at the Hyper-converged Distributed File Systems.

Rule of Thumb: Sizing for Storage Performance in the new world.

In the new world, where storage performance is decoupled from capacity thanks to new read/write caching and hyper-converged solutions, I always get asked:

How do I size the caching or hyper-converged solution to ensure I get the storage performance I need?

Obviously I work for Nutanix, so this question comes from prospective or existing Nutanix customers, but it’s also relevant to other products in the market, such as PernixData or any hybrid (SSD+SAS/SATA) solution.

So for indicative sizing (i.e.: Presales) where definitive information is not available, I use the following Rule of Thumb.

Take your last two monthly full backups, and take the delta between them and multiply that by 3.

So if my full backup from August was 10TB and my full backup from September is 11TB, my delta is 1TB. I then multiply that by 3 and get 3TB, which is our assumption of the “Active Working Set”, or in basic terms, the data which needs performance (because cold or inactive data can sit on any tier without causing performance issues).

Now I size my SSD tier for 3TB of usable capacity.

The next question is:

Why multiply the backup data delta by 3?

This is based on an assumption (since we don’t have any hard data to go on) that the Read/Write ratio is 70% Read, 30% write.

Now, those of you familiar with this thing called maths would argue 70/30 is 2.3333, which is true. So rounding up to 3 essentially adds a buffer.

I have found this rule of thumb works very well, and customers I have worked with have effectively had All Flash Array performance because the “Active Working Set” all resides within the SSD tier.
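For completeness, the rule of thumb reduces to a one-liner. The sketch below simply codifies the steps above; the function name and error handling are my own.

```python
def ssd_tier_tb(previous_full_tb, latest_full_tb, multiplier=3):
    """Indicative 'Active Working Set' sizing: the delta between the last two
    monthly full backups multiplied by 3 (roughly 70/30 read/write, rounded up)."""
    delta = latest_full_tb - previous_full_tb
    if delta <= 0:
        raise ValueError("delta is zero or negative -- average several months instead")
    return delta * multiplier

print(ssd_tier_tb(10, 11))   # 3 TB usable SSD tier for the August -> September example
```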

Caveats to this rule of thumb.

1. If a customer does a significant amount of deletions during the month, the delta may be smaller and result in an undersized SSD tier.

Mitigation: Review several months of full backup logs and average the delta.

2. If the environment’s Read/Write ratio is much higher than 70/30, then the delta from the backup multiplied by 3 may again result in an undersized SSD tier.

Mitigation: Perform some investigation into your most critical workloads and validate or correct the assumption of multiplying by 3

3. This rule of thumb is for Server workloads, not VDI.

The VDI read/write ratio is generally almost the opposite of server workloads, at around 30/70 read/write. However, the SSD tier for VDI should be sized taking into account the benefits of VAAI/VCAI cloning and features like deduplication (for memory and SSD tiers) which some products, like Nutanix, offer.

Summary / Disclaimer

This rule of thumb works for me 90% of the time when designing Nutanix solutions, but your results may vary depending on the platform you use.

I welcome any feedback or suggestions for alternative sizing strategies, which I will update the post with where appropriate.

Can I use my existing SAN/NAS storage with Nutanix?

A question I get regularly is, “Can I use my existing SAN/NAS storage with Nutanix?”.

The short answer is, as always “It depends”.

  • iSCSI, NFS & SMB 3.0 can be presented to Nutanix nodes just like existing non Nutanix nodes.
  • FC based storage cannot be used as Nutanix does not support FC HBAs

The below diagram shows a Nutanix NX-3460 block with 4 nodes having both Nutanix Containers presented to the nodes as well as iSCSI LUNs, SMB 3.0 shares or NFS mount points connected from the centralized SAN/NAS.

Note: SMB 3 is not supported for ESXi hosts & NFS is not supported for Hyper-V.

[Image: Nutanix block with external iSCSI, NFS and SMB storage presented]

So what are the use cases for this style of deployment?

If you’re not ready to do an entire infrastructure refresh for whatever reason/s, you may wish to transition to Nutanix over time while maximizing the ROI and lifespan of your existing storage.

Here are some examples of what I recommend customers do:

1. Migrate Business Critical Applications (BCAs) to Nutanix

There are many benefits of doing this including:

  • Improving resiliency / performance for vBCAs
  • Simplifying storage management for vBCAs
  • Freeing up capacity and reducing the workload on legacy SAN
  • Increasing ease of scalability for critical workloads
  • Use legacy SAN/NAS for high capacity low IOPS workloads which are better suited to centralized storage than vBCAs

Another great option is

2. Migrate Virtual Desktops (VDI) to Nutanix, which shares similar benefits with migrating vBCAs, including:

  • Separating non-complementary VDI workloads from Server & vBCA workloads, as these workloads do not mix well in centralized storage deployments
  • Improving resiliency / performance for VDI
  • Simplifying storage management for VDI
  • Reducing the workload on legacy SAN/NAS which will give an effective increase in performance for workloads remaining on the SAN/NAS
  • Providing linear scalability for VDI if/when the environment scales
  • Use legacy SAN/NAS for high capacity low IOPS workloads which are better suited to centralized storage than VDI

The last example I wanted to point out is Management workloads.

3. Migrate Infrastructure Management workloads to Nutanix.

As has been recommended by many industry experts, separating Management VMs from customer (e.g.: vCAC / vCloud tenants) or production server/desktop workloads (at both the Compute & Storage layers) can dramatically simplify the datacenter and help improve performance, resiliency & recoverability.

Again doing this provides similar benefits to the previous two examples.

  • Separating Management workloads from Server / vBCA / VDI workloads, as these workloads should be separated from a security, resiliency, performance and recoverability perspective.
  • Improving resiliency / performance for all workloads in the datacenter
  • Simplifying storage management for Management workloads
  • Reducing the workload on the legacy SAN/NAS, which will give an effective increase in performance for workloads remaining on the SAN/NAS
  • Increasing scalability for if/when the management demands increase.
  • Maximizing the lifespan / performance of the legacy SAN/NAS

In summary, where it is not possible for budgetary reasons to migrate all workloads to Nutanix, migrating some workloads such as VDI, vBCA or Management to Nutanix will help alleviate the impact of scalability, performance and/or resiliency issues with your existing centralized SAN/NAS.

Nutanix also provides a solution which can start (very) small and continue to be scaled in a granular fashion over time until the SAN/NAS goes end of life and/or budget becomes available, at which time all workloads can be migrated to Nutanix!

Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?

I saw a tweet recently (below) which inspired me to write this post as there is still a clear misunderstanding of the benefits VAAI provides (even with Virtual Storage Appliances).

[Image: Tweet about VAAI and Virtual Storage Appliances]

I have removed the identity of the individual who wrote the tweet and the people who retweeted it, as the goal of this post is solely to correct what I believe is misinformation.

My interpretation of the tweet was (and remains) if a solution uses a Virtual Storage Appliance (VSA) which resides on the ESXi host then VAAI is not providing any benefits.

My opinion on this topic is:

Compared to a traditional centralised NAS (such as a Netapp or EMC Isilon) providing NFS storage with VAAI-NAS support, a Nutanix or VSA solution has exactly the same benefits from VAAI!

My 1st reply to the tweet was:

[Image: My first reply to the tweet]

The test I was referring to with Netapp OnTap Edge can be found here. It was posted in Jan 2013, well prior to my joining Nutanix, when I was working for IBM, where I had been evangelising VAAI/VCAI based solutions for a long time as VAAI/VCAI provides significant value to VMware customers.

The following shows the person’s initial reply to my tweet.

[Image: The person’s reply to my tweet]

I responded with the below, mentioning I would write a blog post, which is what you’re reading now.

I went on to provide some brief replies, as shown below.

[Image: Further replies in the Twitter conversation]

The main comments from this person’s tweets I would summarize (rightly or wrongly) as follows:

  • VAAI is designed only to offload functions externally (or off the ESXi host)
  • He/She had not seen any proof of performance advantages from VAAI on VSAs
  • It’s broken logic to use VAAI with a VSA

Firstly, I would like to comment on VAAI being designed to offload functions externally (or off the ESXi host). I don’t disagree that VAAI has some functions designed to offload to the (centralised) array, but VAAI also has numerous functions which are designed to bring other efficiencies to a vSphere environment.

An example of a feature designed to offload to a central array is the “XCOPY” primitive.

A simple example of what “XCOPY” or Extended Copy provides is offloading a Storage vMotion on block based storage (i.e.: VMFS over iSCSI, FC or FCoE, not NFS) to the array so the ESXi host does not have to process the data movement.

This VAAI primitive would likely be of little benefit in a VSA environment where the storage presented is block based and Storage DRS, for example, was used. The data movement would be offloaded from ESXi to the VSA running on ESXi, so the host would still be burdened with the SvMotion.

However, XCOPY is only one of the many primitives of VAAI, and VAAI does a lot more than just offload Storage vMotions.

For the purpose of this post, I will be discussing VAAI with Nutanix, whose software-defined storage solution runs in a VM on every ESXi host in a Nutanix cluster.
Note: This information is also relevant to other VSAs which support VAAI-NAS.

So what benefit does VAAI provide to Nutanix or a VSA solution running NFS?

Nutanix deploys by default with NFS and supports the VAAI-NAS primitives which are:

  • Full File Clone
  • Fast File Clone
  • Reserve Space
  • Extended Statistics

Note: XCOPY is not supported on NFS; importantly, and speaking specifically for Nutanix, it is not required, as SvMotion will rarely if ever be used with Nutanix solutions.

See my post “Storage DRS and Nutanix – To use, or not to use, that is the question?” for more details on why SvMotion is rarely needed when using Nutanix.

For more details of VAAI primitives, Cormac Hogan (@CormacJHogan) wrote an excellent post which can be found here.

Now here is an example of the significant performance benefits of VAAI with Nutanix.

Let’s look at a clone of a VM on a Nutanix platform; the VM’s details are below.

[Image: Admin01 VM summary]

The VM I have used for this test resides on a datastore called “Management” (as per the above image), which is presented via NFS and has VAAI (Hardware Acceleration) enabled, as shown below.

[Image: Management datastore summary]

Now, if I do a simple clone of a VM (as shown below) while the VM is powered on, VAAI-NAS is bypassed, as the “Fast File Clone” primitive only works on VMs which are powered off.

[Image: Clone Virtual Machine wizard]

So a simple way to test the performance benefits of VAAI on any platform (including hyper-converged such as Nutanix, a Virtual Storage Appliance (VSA) such as Netapp Ontap Edge, or a traditional centralised SAN or NAS) is to clone a VM while it is powered on, then shut down the VM and clone it again.

I performed this test and the first clone with the VM powered on started at 1:17:23 PM and finished at 1:26:12 PM, so a total of 8 mins 49 seconds.

Next I shut down the VM and repeated the clone operation.

[Image: Recent tasks showing the results of both clone operations]

As we can see in the above screen capture, the 2nd clone started at 1:26:49 PM and finished at 1:26:54 PM, a total of 5 seconds.

The reason for the huge difference in speed between the two clones is that the VAAI-NAS “Fast File Clone” primitive offloaded the 2nd clone to the Nutanix platform (which runs as a VM on the ESXi host), which cloned the VM intelligently using metadata, resulting in almost zero data creation. In the 1st clone, where VAAI-NAS was not used, the hypervisor and storage solution had to read 11.18GB of data (the source VM, Admin01) and write a full copy of the same data, resulting in effectively >22GB of data movement in the environment.
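To put rough numbers on that, here is a trivial sketch (my own simplification, not a VMware or Nutanix formula) comparing the data movement of a full copy with a metadata-only VAAI-NAS Fast File Clone for the Admin01 example:

```python
def clone_data_moved_gb(vm_used_gb, fast_file_clone):
    """Approximate data movement for one VM clone.
    Without VAAI-NAS the source is read and a full copy written (~2x used data);
    with the Fast File Clone primitive the clone is metadata only (~0 GB moved)."""
    return 0.0 if fast_file_clone else 2 * vm_used_gb

print(clone_data_moved_gb(11.18, fast_file_clone=False))   # ~22.4 GB moved, ~8m49s
print(clone_data_moved_gb(11.18, fast_file_clone=True))    # ~0 GB moved, ~5 seconds
```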

Now from a capacity savings perspective, a simple way to demonstrate the capacity savings of VAAI on any platform is to clone a VM multiple times and compare the before and after datastore statistics.

Before I performed this test I captured a baseline of the Management datastore as shown below.

[Image: Management datastore summary before cloning]

The above highlighted areas show:

  • Virtual Machines and Templates as 83
  • Capacity 8.49TB
  • Provisioned Space 7.09TB
  • Free Space 7.01TB

I then cloned the Admin01 VM a total of 7 times.

[Image: Recent tasks showing the 7 clone operations]

Immediately following the last clone completing I took the below screen shot of the Management datastores statistics.

[Image: Management datastore summary after cloning]

The above highlighted areas in the updated datastore summary show:

  • Virtual Machines and Templates INCREASED by 7 to 90 (as I cloned 7 VMs)
  • Capacity remained the same at 8.49TB
  • Provisioned Space INCREASED to 7.29TB as we cloned 7 x ~40GB VMs (a total of ~280GB)
  • Free Space REMAINED THE SAME at 7.01TB due to VAAI-NAS Fast File Clone primitive working with the Nutanix Distributed File System.

So VAAI-NAS allowed a VM with ~11GB of used storage (~40GB provisioned) to be cloned without using any significant additional disk space, and each clone completed in between 5 and 7 seconds.

So some of the benefits VAAI-NAS provides to Nutanix (which some people would term as a VSA type solution) include:

  • Near instant VM cloning via vSphere Client/s (as shown above)
  • Near instant Horizon View Linked Clone deployments (VCAI) – Similar to example shown.
  • Near instant vCloud Director clones (via FAST Provisioning) – Similar to example shown.
  • Major capacity savings by using Intelligent cloning rather than Full Clones (As shown above)
  • Lower CPU overhead for both ESXi hosts AND Nutanix Controller VM (CVM)
  • Ability to create EagerZeroThick VMDKs on NFS (e.g.: To support Fault Tolerance & clustered workloads such as Oracle RAC)
  • Enhanced ability to get statistics on file sizes, capacity usage etc. on NFS

In Summary:

Overall, I would say that VMware has developed an excellent API in VAAI, and Nutanix, along with the VSA providers that support VAAI, provides major advantages and value to our joint customers with VMware.

It would be broken logic NOT to leverage the advantages of VAAI regardless of storage type (VSA, Nutanix or traditional centralized SAN/NAS), and for the vast majority of vSphere deployments, any storage solution not supporting VAAI (or having issues/bugs with it) will have significant downsides.

I am looking forward to ongoing developments from VMware such as vVols and VASA 2.0 to continue to enhance storage of vSphere solutions in the future.

I hope customers and architects now have the correct information to make the most effective design and purchasing recommendations to meet/exceed customer requirements.