Example Architectural Decision – VMware HA – Percentage of Cluster resources reserved for HA

Problem Statement

The decision has been made to use the “Percentage of cluster resources reserved for HA” admission control setting, with strict admission control enabled to ensure the N+1 minimum redundancy level is maintained. However, as most virtual machines do not use “Reservations” for CPU and/or memory, the default reservation is only 32MHz and 0MB + overhead per virtual machine. In the event of a failure, this level of resources is unlikely to provide sufficient compute to operate production workloads. How can the environment be configured to ensure a minimum level of performance is guaranteed in the event of one or more host failures?

Requirements

1. All Clusters have a minimum requirement of N+1 redundancy
2. In the event of a host failure, a minimum level of performance must be guaranteed

Assumptions

1. vSphere 5.0 or later (Note: this is significant, as the default CPU reservation dropped from 256MHz to 32MHz, while RAM remained at 0MB + overhead)

2. Percentage of Cluster resources reserved for HA is used and set to a value as per Example Architectural Decision – High Availability Admission Control

3. Strict admission control is enabled

4. Target overcommitment ratios are <=4:1 vCPU / physical cores and <=1.5:1 vRAM / physical RAM

5. Physical CPU core speed is >=2.0GHz

6. Virtual machine sizes in the cluster will vary

7. A limited number of mission critical virtual machines may be set with reservations

8. Average VM size uses >2GB RAM

9. Cluster compute resources will be utilized at >=50%
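
The overcommitment targets in assumption 4 are simple ratios, so they can be checked with a few lines of arithmetic. The following is a minimal sketch only: the `ClusterTotals` type, the function names and the example figures are hypothetical and are not taken from any VMware tool or API.

```python
from dataclasses import dataclass


@dataclass
class ClusterTotals:
    """Hypothetical cluster-wide totals, gathered however suits your environment."""
    physical_cores: int
    physical_ram_gb: float
    vcpus: int
    vram_gb: float


def overcommit_ratios(cluster: ClusterTotals) -> tuple[float, float]:
    """Return (vCPU per physical core, vRAM per physical RAM)."""
    return (cluster.vcpus / cluster.physical_cores,
            cluster.vram_gb / cluster.physical_ram_gb)


def within_targets(cluster: ClusterTotals,
                   max_cpu_ratio: float = 4.0,
                   max_ram_ratio: float = 1.5) -> bool:
    """True if both ratios are at or below the targets from assumption 4."""
    cpu_ratio, ram_ratio = overcommit_ratios(cluster)
    return cpu_ratio <= max_cpu_ratio and ram_ratio <= max_ram_ratio


# Made-up example: 64 cores and 1024GB RAM hosting 240 vCPUs and 1400GB vRAM.
example = ClusterTotals(physical_cores=64, physical_ram_gb=1024,
                        vcpus=240, vram_gb=1400)
print(overcommit_ratios(example), within_targets(example))
```

With these example figures the CPU ratio is 3.75:1 and the RAM ratio is roughly 1.37:1, so both targets are met.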

Constraints

1. All compute requirements must be provided to virtual machines during business as usual (BAU) operation

Motivation

1. Meet/Exceed availability requirements
2. Minimize complexity
3. Ensure the target availability and performance are maintained without significantly compromising overcommitment ratios

Architectural Decision

Ensure all clusters remain configured with the HA admission control setting
“Enable – Do not power on virtual machines that violate availability constraints”

and

Use “Percentage of Cluster resources reserved for HA” for the admission control policy, with the percentage value based on the following: Architectural Decision – High Availability Admission Control

Configure the following HA Advanced Settings

1. “das.vmMemoryMinMB” with a value of “1024”
2. “das.vmCpuMinMHz” with a value of “512”
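
To illustrate what these two settings change, here is a minimal sketch of the per-VM resources HA admission control would count. It is a simplified model, not VMware code: it assumes the advanced settings act as defaults that apply only when a VM has no reservation (or a reservation of zero), and the function name and the 100MB memory overhead figure are hypothetical.

```python
# Defaults described in this article for vSphere 5.0 or later.
DEFAULT_CPU_MIN_MHZ = 32
DEFAULT_MEM_MIN_MB = 0


def ha_counted_resources(cpu_reservation_mhz: int,
                         mem_reservation_mb: int,
                         mem_overhead_mb: int,
                         cpu_min_mhz: int = DEFAULT_CPU_MIN_MHZ,
                         mem_min_mb: int = DEFAULT_MEM_MIN_MB) -> tuple[int, int]:
    """Return (CPU MHz, memory MB) counted for one VM in this simplified model.

    Assumption: the das.* values are used only when the VM's own
    reservation is unset or zero; memory overhead is always added.
    """
    cpu = cpu_reservation_mhz if cpu_reservation_mhz > 0 else cpu_min_mhz
    mem = mem_reservation_mb if mem_reservation_mb > 0 else mem_min_mb
    return cpu, mem + mem_overhead_mb


OVERHEAD_MB = 100  # hypothetical per-VM memory overhead

# VM with no reservations, before and after the advanced settings.
print(ha_counted_resources(0, 0, OVERHEAD_MB))
print(ha_counted_resources(0, 0, OVERHEAD_MB, cpu_min_mhz=512, mem_min_mb=1024))

# Mission critical VM with explicit reservations keeps its own values.
print(ha_counted_resources(2000, 4096, OVERHEAD_MB, cpu_min_mhz=512, mem_min_mb=1024))
```

In this model a VM without reservations goes from (32MHz, 100MB) to (512MHz, 1124MB), while a VM with explicit reservations is unaffected.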

Justification

1. Enabling admission control is critical to ensure the required level of availability.
2. The “Percentage of cluster resources reserved for HA” setting allows a suitable percentage of cluster resources to be reserved, depending on the size of each cluster, to maintain N+1
3. The potentially inefficient slot size calculation used with “Host Failures cluster tolerates” does not suit clusters where virtual machine sizes vary and/or where some mission critical VMs require reservations

4. Using advanced settings “das.vmCpuMinMHz” & “das.vmMemoryMinMB” allows a minimum level of performance (per VM) to be guaranteed in the event of one or more host failures
5. Advanced settings have been configured to ensure the target overcommitment ratios are still achieved while ensuring a minimum level of resources in the event of a host failure
6. Maintains an acceptable minimum level of performance in the event of a host failure without requiring the administrative overhead of setting and maintaining “reservations” at the virtual machine level
7. Where no reservations are used and the advanced settings are not configured, the default reservation of 32MHz and 0MB + memory overhead is used. This would likely result in degraded performance in the event of a host failure.

Alternatives

1. Use “Specify a fail over host” and have one or more hosts specified
2. Use “Host Failures cluster tolerates” and set it to an appropriate value depending on hosts per cluster, without using advanced settings
3. Use higher percentage values
4. Use Higher / Lower values for “das.vmMemoryMinMB” and “das.vmCpuMinMHz”
5. Set Virtual machine level reservations on all VMs

Implications

1. The “das.vmCpuMinMHz” advanced setting applies on a per VM basis, not a per vCPU basis, so VMs with multiple vCPUs will still only be guaranteed 512MHz in a HA event

2. This will reduce the number of virtual machines that can be powered on within the cluster (in order to enforce the HA requirements)
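
Both implications can be put into rough numbers. The sketch below is illustrative only: it assumes a hypothetical 4-host cluster (16 cores at 2.0GHz and 256GB RAM per host), 25% of resources reserved for HA, no VM-level reservations and a made-up 100MB memory overhead per VM. It also treats admission control as a plain sum of per-VM values against the unreserved capacity, which simplifies what vSphere actually does.

```python
def max_powered_on_vms(total_cpu_mhz: int, total_mem_mb: int, reserved_pct: int,
                       per_vm_cpu_mhz: int, per_vm_mem_mb: int) -> int:
    """Upper bound on identical VMs that fit in the capacity not reserved for HA."""
    usable_cpu = total_cpu_mhz * (100 - reserved_pct) // 100
    usable_mem = total_mem_mb * (100 - reserved_pct) // 100
    return min(usable_cpu // per_vm_cpu_mhz, usable_mem // per_vm_mem_mb)


# Hypothetical cluster: 4 hosts, each 16 cores x 2000MHz and 256GB RAM.
HOSTS = 4
TOTAL_CPU_MHZ = HOSTS * 16 * 2000
TOTAL_MEM_MB = HOSTS * 256 * 1024
OVERHEAD_MB = 100  # assumed per-VM memory overhead

# Defaults (32MHz, 0MB + overhead) versus the values chosen in this decision.
default_limit = max_powered_on_vms(TOTAL_CPU_MHZ, TOTAL_MEM_MB, 25, 32, 0 + OVERHEAD_MB)
tuned_limit = max_powered_on_vms(TOTAL_CPU_MHZ, TOTAL_MEM_MB, 25, 512, 1024 + OVERHEAD_MB)

# The CPU minimum is per VM, so a 4 vCPU VM averages 128MHz per vCPU.
per_vcpu_mhz = 512 / 4

print(default_limit, tuned_limit, per_vcpu_mhz)
```

Under these assumptions the admission control ceiling drops from 3000 VMs to 187, with CPU as the limiting resource in both cases. The default figure is far beyond what the cluster could run at acceptable performance, which is the problem this decision addresses.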


8 thoughts on “Example Architectural Decision – VMware HA – Percentage of Cluster resources reserved for HA”

  1. One thing I didn’t quite get, Josh, is how performance will degrade in the event of a host failure when we have enabled Admission Control. With it enabled, the cluster won’t allow any more VMs to be powered on if it can’t successfully fail them over. So how can VM performance be impacted if a host fails? Thanks, Manny.

    • Performance will degrade, but it depends on how overcommitted your cluster is on resources and how your VMs are sized (ie: oversized, undersized or right sized).

      With a properly designed cluster with right sized VMs and normal levels of overcommitment (<=4:1 vCPU/pCore and <=1.5:1 vRAM/pRAM), a single host failure should not have a significant impact on performance. What you may see is slightly higher CPU ready numbers due to a reduced number of physical cores for the CPU scheduler to balance the VMs across, and if you’re right at the limit of RAM, then some swapping/ballooning, but in general, the larger the cluster the lower the impact of a single host failure.

        • Best of luck with the DCD. My tip is not to spend too long on the multiple choice questions and to leave plenty of time for the design scenario (a.k.a. Visio style) questions.

          • Thanks mate, I came very close, got up to 291. Maybe 2-3 multiple choice questions away. The scenario questions did me head in, 2 of them just didn’t make any sense. If you have a moment, can you please check out my latest blog post and advise on what I can do? Thanks again!

          • Sorry to hear you were not successful. I think your blog summed it up, “Lack of Experience”. The exam is tough, so no shame in not passing, especially if you don’t have a lot of experience. I have no doubt the secret to my passing all the exams first go is experience. I didn’t sit a VCAP until I had 4 years of virtualization experience in implementation/support/projects and architecture. If you haven’t already, I would suggest sitting the vSphere Design Workshop course, as this should help you think through design scenarios and give you a better chance on the Visio style questions. All up, I would encourage you to continue pursuing the VCAP path. Just don’t rush, get some more experience, do some more design work and ideally the course if possible, and I’m sure you’ll knock it off next time.

  2. Thanks Josh, I’ve attended the Design Workshop already. I didn’t get a whole lot out of it since other attendees were almost at the same level as me. As a result, there wasn’t a lot of design-related talk. Sure we talked about the various bits and pieces that vSphere offers, but not how they all kind of come together. Anyways, it wasn’t bad.

    I’m not going to give up on the DCD at all, far too much time put in already to be able to back out! I love virtualization and this exam is just a step towards knowing more about VMware’s suite of products. I’ll catch up with you at the next Melbourne VMUG. Thanks again!