How to Architect a VSA , Nutanix or VSAN solution for >=N+1 availability.

How to architect a VSA, Nutanix or VSAN solution for the desired level of availability (i.e.: N+1 , N+2 etc) is a question I am asked regularly by customers and contacts throughout the industry.

This needs to be addressed in two parts.

1. Compute
2. Storage

Firstly, Compute level resiliency, As a cluster grows, the chances of a failure increases so the percentage of resources reserved for HA should increase with the size of the cluster.

My rule of thumb (which is quite conservative) is as follows:

1. N+1 for clusters of up to 8 hosts
2. N+2 for clusters of >8 hosts but <=16
3. N+3 for clusters of >16 hosts but <=24
4. N+4 for clusters of >24 hosts but <=32

The above is discussed in more detail in : Example Architectural Decision – High Availability Admission Control Setting and Policy

The below table highlights in Green my recommended HA percentage configuration based on the cluster size, up to the current vSphere limit of 32 nodes.

HApercentages

Some of you may be thinking, if my Nutanix or VSAN cluster is only configured for RF2 or FT1 for VSAN, I can only tolerate one node failure, so why am I reserving more than N+1.

In the case of Nutanix, after a node failure, the cluster can restore itself to a fully resilient state and tolerate subsequent failures. In fact, with “Block Awareness” a full 4 node block can be lost (so an N-4 situation) which if this is a requirement, needs to be considered for HA admission control reservations to ensure compute level resources are available to restart VMs.

Next lets talk about the issue perceived to be more complicated, Storage redundancy.

Storage redundancy for VSA, Nutanix or VSAN is actually not as complicated as most people think.

The following is my rule of thumb for sizing.

For N+1 , Ensure you have enough capacity remaining in the cluster to tolerate the largest node failing.

For N+2, Ensure you have enough capacity remaining in the cluster to tolerate the largest TWO nodes failing.

The examples below discuss Nutanix nodes and their capacity, but the same is applicable to any VSA or VSAN solution where multiple copies of data is kept for data protection, as opposed to RAID.

Example 1 , If you have 4 x Nutanix NX3060 nodes configured with RF2 (FT1 in VSAN terms) with 2TB usable per node (as shown below), in the event of a node failure, 2TB is no longer available. So the maximum storage utilization of the cluster should be <75% (6TB) to ensure in the event of any node failure, the cluster can be restored to a fully resilient state.

4node3060

Example 2 , If you have 2 x Nutanix NX3060 nodes configured with RF2 (FT1 in VSAN terms) with 2TB usable per node and 2 x Nutanix NX6060 nodes with 8TB usable per node (as shown below), in the event of a NX6060 node failure, 8TB is no longer available. So the maximum storage utilization of the cluster should be 12TB to ensure in the event of any node failure (including the 8TB nodes), the cluster can be restored to a fully resilient state.

4nodemixed

For environments using Nutanix RF3 (3 copies of data) or VSAN (FT2) the same rule of thumb applies but the usable capacity per node would be lower due to the increased capacity required for data protection.

Specifically for Nutanix environments, the PRISM UI shows if a cluster has sufficient capacity available to tolerate a node failure, and if not the following is displayed on the HOME screen and alerts can be sent if desired.

CapacityCritical

In this case, the cluster has suffered a node failure, and because it was sized suitably, it shows “Rebuild Capacity Available” as “Yes” and advises an “Auto Rebuild in progress” meaning the cluster is performing a fully automated self heal. Importantly no admin intervention is required!

If the cluster status is normal, the following will be shown in PRISM.

CapacityOK

In summary: The smaller the cluster the higher the amount of capacity needs to remain unused to enable resiliency to be restored in the event of a node failure, the same as the percentage of resources reserved for HA in a traditional compute only cluster.

The larger the cluster from both a storage and compute perspective, the lower the unused capacity is required for HA, so as has been a virtualization recommended practice for years….. Scale-out!

Related Articles:

1. Scale Out Shared Nothing Architecture Resiliency by Nutanix

2. PART 1 – Problems with RAID and Object Based Storage for data protection

3. PART 2 – Problems with RAID and Object Based Storage for data protection

Virtual Machine Swap File Location & Capacity Usage on Nutanix

The Location of the Virtual Machine swap file can be critical when deploying vSphere with traditional centralized storage solutions, or legacy solutions which acknowledge “zeros” or “White-space” as the Virtual Machine swap file can be as large as the VMs configured vRAM where Memory Reservations are not used.

The below shows the default configuration.
VMswapFileLocation

If a VM resides on Tier 1 storage for example, and the VM does not have a memory reservation set (or a reservation of less than 100%), the Swap-file will take up valuable Tier 1 storage capacity.

This can be avoided by specifying a Swap-file datastore however this introduces complexity and in the event the Swap-file datastore is on a low tier of storage, performance in the event of swapping will degrade significantly.

Some platforms recommend having different datastores for VM swap files to minimize the overheads on de duplication or replication for environments using SRM as discussed in Example Architectural Decision – Virtual Machine Swap-file location for SRM Protected VMs.

The Nutanix Distributed File System does not write “White space” to disk, as a result the impact of Virtual Machine swap files is negligible which makes the issue of swap file placement much less of an issue.

The only time when Virtual machine swap files will use storage capacity in the Nutanix Distributed File System is when host memory utilization is >100% and swapping needs to occur.

As such, the default vSphere configuration of “Virtual Machine Directory” is ideal for Nutanix environments and valuable storage capacity is not unnecessarily wasted resulting in increased usable space, reduced complexity by removing the requirement for dedicated swap-file datastores without compromising the benefits of de-duplication and compression.

Competition Example Architectural Decision Entry 4 – vCloud Allocation Pool Usable Memory

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5,VCAP-DCA4/5,VCAP-CIA,vExpert 2012/2013

Problem Statement

When using an Allocation Pool with 100% memory reservation, due to the VM memory overhead, the usable memory is less than what is expected by the users. What is the best way to ensure users can use the entire memory assigned to the Allocation pool.

Assumptions

1. vCD 5.1.2 is in use

2. vSphere 5.1 or later is in use

3. Org VDC created with Allocation Pool

Constraints

1. vCD 5.1.2 has to be used

2. Allocation Model only VDCs are affected

Motivation

1. Need to use 100% memory allocated to the VDC with Allocation Pool model

2. Optimal use of Memory assigned to Org VDC and then to the VM

Architectural Decision

Due to the “by design” fact of VM memory overhead, we cannot use the entire allocated memory and this will be solved by enabling Elastic Allocation Pool in the vCloud System level and then set a lower vCPU Speed value (260 MHz). This will allow VMs to use the entire allocated memory (100% guarantees) in the Org VDC.

Alternatives

1. Over allocate resources to the customer but only reserve the amount they purchased.

Historically VM overhead ranges in between <=5% to 20%. Most configurations have an overhead of less than 5%, if you assume such you could over allocate resources by 5% but only reserve ~95%. The effect would be that the customer could consume up to the amount of vRAM they purchased and if they created VMs with low overhead (high vRAM allocations, low vCPU) they could possibly actually consume more than they “purchased”. In the case of a 20GHz/20GB purchase we would have to set the Allocation to 21GHz but set the reservation to 95%.

Justification

VM memory overhead is calculated with so many moving targets like the model of the CPU in the ESXi host the VM will be running on, whether 3D is enabled for MKS, etc. So you cannot use the entire allocated memory at any point in time.

By selecting the Elastic VDC, we are overwriting this behavior and still not allowing more VMs to power on from what they have entitled to. Also Elastic VDC gives us an opportunity to set a custom vCPU speed and lowering the vCPU speed will allow you to deploy more vCPUs without being penalized. Without setting this flag, you cannot overcommit the vCPU, which is really bad.

260MHz is the least vCPU speed we can set and thus this has been taken to allow system administrators to overcommit the vCPUs in a VDC with Allocation Pool.

 

Implications

1. One of the caveat is not having any memory reservation for any VMs. Due to the nature of OrgVDCs, it does not allow an Org Admin to set the resource reservation for the VMs (unlike Reservation Pool) and thus any VMs with Elasticity on will not have any reservation which will be marked as overkill for the customer’s high I/O VMs (like DB or Mail Server).

You can easily overwrite the resource reservation using the vSphere but that is not the intent. Hence, we flag it as RISK as it will hamper customer’s VM performance for sure.

If we say we are reserving 100% memory and thus spawning the VMs will get equal memory and can’t oversubscribe the memory as the limit is still what the customer has bought, then also if there is a contention of memory within those VMs, I don’t have an option to prefer those VMs which are resource hungry. In a nutshell all of the VMs will get equal share.

Equal shares will distribute the resource in a RP equally and thus there will not be any guarantee that a hungry VM can get more resource on demand.

Back to Competition Main Page or Competition Submissions