Nutanix Resiliency – Part 3 – Node failure rebuild performance with RF3

In Part 1 we discussed the ability of Nutanix AOS to rebuild from a node failure under Resiliency Factor 2 (RF2) in a fast and efficient manner thanks to the Acropolis Distributed Storage Fabric (ADSF), while Part 2 showed how a storage container can be converted from RF2 to RF3 to further improve resiliency, and how quickly that process completed.

As with Part 2, we’re using a 12 node cluster and the breakdown of disk usage per node is as follows:

[Image: Capacity usage per node across the 12 node RF3 cluster]

The node I’ll be simulating a failure on has 5TB of disk usage, which is very similar to the capacity usage in the node failure testing in Part 1. It should be noted that as the cluster is now only 12 nodes, there are fewer controllers to read from and write to compared with Part 1.

Next I accessed the IPMI interface of the node and performed a “Power Off – Immediate” operation to simulate the node failure.

The following shows the storage pool throughput during the node rebuild, which completed re-protecting the 5TB of data in approximately 30 minutes.

[Image: Rebuild performance and capacity usage, RF3, 12 node cluster]

At first glance, re-protecting 5TB of data in around 30 minutes, especially on 5 year old hardware, is pretty impressive compared to SANs and other HCI products, but I felt it should have been faster, so I did some investigation.

I found that the cluster was in an imbalanced state at the time I simulated the node failure, so not all nodes could contribute to the rebuild (from a read perspective) as they would under normal circumstances, because some had little or no data on them.

The cluster was unbalanced because I had been performing frequent, repeated node failure simulations and had not waited for disk balancing to complete after adding nodes back to the cluster before simulating the next failure.

Usually a vendor would not post sub-optimal performance results, but I strongly feel transparency is key. While unlikely, it is possible to get into situations where a cluster is unbalanced, and if a node failure occurred in that scenario it is important to understand how it may impact resiliency.

So I ensured the cluster was in a balanced state, re-ran the test, and the result is shown below:

[Image: RF3 node failure test, ~4.5TB node]

We can now see over 6GBps of throughput compared to around 5GBps previously, an improvement of over 1GBps, and a duration of approximately 12 minutes. We can also see there was no drop in throughput as we saw in the unbalanced environment. This is because all nodes were able to participate for the duration of the rebuild, as they all held an even amount of data.
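As a back-of-the-envelope sanity check (simple arithmetic of my own, not Nutanix tooling), the average throughput implied by re-protecting a given amount of data in a given time is easy to calculate. The ~4.5TB in ~12 minutes balanced result lines up with the 6GBps+ seen in the chart, while the lower implied average for the unbalanced run reflects the drop in throughput described above.

```python
# Back-of-the-envelope check: average throughput required to re-protect
# a given amount of data within a given rebuild time (decimal units assumed).
def avg_rebuild_throughput_gbps(data_tb: float, minutes: float) -> float:
    """Average throughput in GB/s to re-protect data_tb terabytes in `minutes` minutes."""
    data_gb = data_tb * 1000       # TB -> GB
    seconds = minutes * 60
    return data_gb / seconds

print(f"{avg_rebuild_throughput_gbps(4.5, 12):.1f} GB/s")  # balanced test   -> ~6.2 GB/s
print(f"{avg_rebuild_throughput_gbps(5.0, 30):.1f} GB/s")  # unbalanced test -> ~2.8 GB/s
```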

Summary:

  • Nutanix RF3 is vastly more resilient than RAID6 (or N+2) style architectures
  • ADSF performs continual disk scrubbing to detect and resolve underlying issues before they can cause data integrity issues
  • Rebuilds from drive or node failures are an efficient distributed operation using all drives and nodes in a cluster
  • A recovery from a >4.5TB node failure (in this case, the equivalent of 6 concurrent SSD failures) completed in around 12 minutes
  • Unbalanced clusters still perform rebuilds in a distributed manner and can recover from failures in a short period of time
  • Clusters running in a normal, balanced configuration can recover from failures even faster thanks to the distributed storage fabric’s built-in disk balancing, intelligent replica placement and even distribution of data.

Index:
Part 1 – Node failure rebuild performance
Part 2 – Converting from RF2 to RF3
Part 3 – Node failure rebuild performance with RF3
Part 4 – Converting RF3 to Erasure Coding (EC-X)
Part 5 – Read I/O during CVM maintenance or failures
Part 6 – Write I/O during CVM maintenance or failures
Part 7 – Read & Write I/O during Hypervisor upgrades
Part 8 – Node failure rebuild performance with RF3 & Erasure Coding (EC-X)
Part 9 – Self healing
Part 10 – Disk Scrubbing / Checksums

Example Architectural Decision – Advanced Power Management for vSphere Clusters with Business Critical Applications

Problem Statement

In a vSphere environment where Business Critical Applications have been successfully virtualized, should Advanced Power Management be used to help reduce data center costs?

Requirements

1. Fully Supported solution

2. Reduce data center costs where possible

3. Business Critical Application performance must not be significantly degraded

Assumptions

1. Supported Hardware

2. vSphere 5.0 or later

3. Admission Control is enabled with >= N+1 redundancy

Constraints

1. None

Motivation

1. Reduce Datacenter costs where possible with minimal/no impact to performance

Architectural Decision

Configure the BIOS to “OS Controlled”

Set ESXi Advanced Power Management to “Balanced”
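As a rough illustration only (not part of the original decision), the host power policy could be reported and applied programmatically. The sketch below uses pyVmomi; the vCenter address, credentials and the mapping of the “Balanced” policy to the shortName “dynamic” are my assumptions, so verify the property and method names against the vSphere API reference for your version before using anything like this.

```python
# Hedged pyVmomi sketch: report each host's current power policy and apply
# "Balanced" (shortName "dynamic") where it is not already in use.
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; use valid certificates in production
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.HostSystem], True)
    for host in view.view:
        power = host.configManager.powerSystem
        current = power.info.currentPolicy.shortName
        print(f"{host.name}: current policy = {current}")
        if current != "dynamic":
            for policy in power.capability.availablePolicy:
                if policy.shortName == "dynamic":        # "Balanced"
                    power.ConfigurePowerPolicy(key=policy.key)
    view.Destroy()
finally:
    Disconnect(si)
```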

Justification

1. Power savings can be realized with almost no impact to performance

2. The performance difference between the “High performance” and “Balanced” options is insignificant, however power savings can be achieved, reducing cost and environmental impact

3. In the unlikely event of performance issues as a result of using the “Balanced” option, the BIOS is set to OS Controlled so the ESXi power policy can be changed without downtime during troubleshooting

4. Advanced Power Management Options (other than “High Performance” & “Balanced”) have proven to have excellent power savings but at a high cost to performance which is not suitable for Business Critical Applications

5. As HA Admission Control is used to provide >=N+1 redundancy, the ESXi hosts will generally not be fully utilized, which gives Advanced Power Management opportunities to conserve power (see the sketch after this list)

6. The workloads in the cluster/s run 24/7, however demand is generally higher during business hours and some low demand or idle time exists

7. Even where only a small power saving is realized, if performance is not significantly impacted then a faster ROI can be achieved due to cost savings
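To illustrate point 5 with some hypothetical numbers (mine, not from the original decision), the sketch below shows the average per-host utilization ceiling that N+1 (or greater) admission control implies, which is exactly the headroom the “Balanced” policy can exploit.

```python
# Hypothetical illustration: average per-host CPU utilization ceiling implied
# by reserving capacity to tolerate one or more host failures.
def max_avg_host_utilisation(hosts: int, failures_tolerated: int = 1) -> float:
    """Percentage of each host usable on average while reserving failover capacity."""
    return (hosts - failures_tolerated) / hosts * 100

print(f"{max_avg_host_utilisation(8):.1f}%")     # 8 host cluster, N+1 -> 87.5%
print(f"{max_avg_host_utilisation(4):.1f}%")     # 4 host cluster, N+1 -> 75.0%
print(f"{max_avg_host_utilisation(8, 2):.1f}%")  # 8 host cluster, N+2 -> 75.0%
```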

Implications

1. Where performance issues exist when using “Balanced”, a vSphere administrator may need to change Advanced Power Management to “High Performance”

Alternatives

1. Use “High Performance”

2. Use “BIOS Controlled”

3. Do not use Advanced Power Management

4. Use Advanced Power Management in conjunction with DPM

Related Articles

1. Power Management and Performance in ESXi 5.1 – By Rebecca Grider (@RebeccaGrider)


High CPU Ready with Low CPU Utilization?

I have noticed an increasing number of search engine terms resulting in people accessing my blog, similar to:

* High CPU Ready Low CPU usage
* CPU ready and Low utilization
* CPU ready relationship to utilization

So I wanted to try and clear this issue up.

First, let’s define CPU Ready and CPU utilization.

CPU ready (percentage) is the percentage of time a virtual machine is waiting to be scheduled onto a physical (or HT) core by the CPU scheduler.

CPU utilization measures the amount of MHz or GHz that is being used.

Next, to find out how much CPU Ready is acceptable, check out my post How Much CPU ready is OK?
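As a rough worked example (an assumption about a typical vCenter setup, not something from the original post): the real-time performance charts report CPU Ready as a summation counter in milliseconds over a 20 second interval, which converts to the percentage figures used below roughly as follows.

```python
# Convert the vCenter "CPU Ready" summation counter (milliseconds per sample
# interval) into a percentage; real-time charts use a 20 second interval.
def cpu_ready_percent(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Percent of the interval spent ready-but-not-scheduled, averaged per vCPU."""
    return ready_ms / (interval_s * 1000 * vcpus) * 100

print(cpu_ready_percent(2000))           # 2000ms on a 1 vCPU VM -> 10.0 (high)
print(cpu_ready_percent(2000, vcpus=4))  # same raw value on 4 vCPUs -> 2.5
```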

CPU Ready and CPU utilization have very little to do with each other; high CPU utilization does not mean you will have high CPU Ready, and vice versa.

So it is entirely possible to have either of the below scenarios:

Scenario 1: An ESXi host has 20% CPU utilization and its VMs suffer high CPU Ready (>10%).
Scenario 2: An ESXi host has 95% CPU utilization and its VMs have little or no CPU Ready (<2.5%).

How are the above two scenarios possible?

Scenario 1 may occur when

* One or more VMs are oversized (i.e. not utilizing the resources they are assigned)
* The host (or cluster) is heavily overcommitted (either with or without right-sized VMs)
* Power management settings are set to Balanced, Low Power or Custom

Scenario 2 may occur when

* VMs are correctly sized
* The ESXi hosts are well sized for the virtual machine workloads
* The VM to host ratio has been well architected

So, the question on everyone’s lips: how can high CPU Ready with low CPU utilization be addressed or avoided?

If you are experiencing high CPU Ready with low ESXi host utilization, the following steps should be taken:

* Right size your VMs

This is by far the most important thing to do. I recommend using a tool such as vCenter Operations to assist with determining the correct size for VMs.

* Ensure your hosts/clusters are not excessively overcommitted

I generally find 4:1 vCPU overcommitment is achievable with right-sized VMs where the average VM size is <4 vCPUs. The higher the vCPU per VM average, the lower the CPU overcommitment you will achieve.
If you have an average VM size of 8 vCPUs then you may only see <1.5:1 overcommitment before suffering contention (CPU Ready).
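As a quick illustration with hypothetical cluster numbers (mine, not from the post), the overcommitment ratio discussed above is simply total assigned vCPUs divided by physical cores.

```python
# Hypothetical example: cluster-wide vCPU:pCore overcommitment ratio.
def vcpu_overcommitment(total_vcpus: int, hosts: int, cores_per_host: int) -> float:
    """Ratio of assigned vCPUs to physical cores across the cluster."""
    return total_vcpus / (hosts * cores_per_host)

# e.g. 400 vCPUs of mostly small VMs on 4 hosts, each with 2 x 12 core CPUs
print(f"{vcpu_overcommitment(400, 4, 24):.1f}:1")   # ~4.2:1
```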

* Use DRS affinity rules to keep complementary workloads together
VMs with high CPU utilization and VMs with very low CPU utilization can work well together. You may also have an environment where some servers are busy overnight and others are only busy during business hours; these are examples of workloads to keep together.

* Use DRS anti-affinity rules to keep non-complementary workloads apart

VMs with very high CPU utilization (assuming the high utilization is at the same time) can be spread over a number of hosts to avoid stress on the CPU scheduler.

* Ensure your ESXi hosts are chosen with the virtual machine workloads in mind
If your VMs are >=8 vCPUs, choose a CPU with >=8 cores per socket and consider more sockets per host, such as 4 socket hosts as opposed to 2 socket hosts. If the bulk of your VMs are 1 or 2 vCPUs, then even older 2 socket, 4 core processors should generally work well.

* Use Hyperthreading
Assuming you have a mix of workloads and not all VMs require large amounts of cores and GHz, using hyperthreading increases the efficiency of the CPU scheduler by effectively doubling the scheduling opportunities. Note: an HT core will generally give much less than half the performance of a pCore.

* Use “High Performance” for your Power Management Policy

The above seven (7) steps should resolve the vast majority of issues with CPU ready.

For an example of the benefits of right sizing your VMs, check out my earlier post – VM Right Sizing, An example of the benefits.

Also please note, using CPU reservations does not solve CPU Ready. I have written an article on this topic as well – Common Mistake – Using CPU reservations to solve CPU ready.

I hope this helps clear up this issue.