High CPU Ready with Low CPU Utilization?

I have noticed an increasing number of search engine terms bringing people to my blog, similar to:

* High CPU Ready Low CPU usage
* CPU ready and Low utilization
* CPU ready relationship to utilization

So I wanted to try and clear this issue up.

First, let's define CPU Ready and CPU utilization.

CPU ready (percentage) is the percentage of time a virtual machine is waiting to be scheduled onto a physical (or HT) core by the CPU scheduler.

CPU utilization measures the amount of CPU (in MHz or GHz) that is actually being used.

Next, to find out how much CPU Ready is acceptable, check out my post How Much CPU ready is OK?
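vCenter's real-time charts report CPU Ready as a "summation" value in milliseconds rather than a percentage, so here is a minimal sketch (my own, not from the original post) of the commonly documented conversion, assuming the default 20-second real-time sample interval and dividing by vCPU count for a per-vCPU figure:

```python
def cpu_ready_percent(ready_summation_ms: float,
                      sample_interval_s: int = 20,
                      num_vcpus: int = 1) -> float:
    """Average CPU Ready % per vCPU over one sample interval."""
    interval_ms = sample_interval_s * 1000
    return (ready_summation_ms / num_vcpus) / interval_ms * 100

# Example: a 2 vCPU VM reporting 4000 ms of ready time in a 20 second real-time sample
print(round(cpu_ready_percent(4000, num_vcpus=2), 1))  # 10.0 (%)
```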

CPU Ready and CPU utilization have very little to do with each other: high CPU utilization does not mean you will have high CPU Ready, and vice versa.

So it is entirely possible to have either of the below scenarios:

Scenario 1: An ESXi host has 20% CPU utilization, yet its VMs suffer high CPU Ready (>10%).
Scenario 2: An ESXi host has 95% CPU utilization, yet its VMs have little or no CPU Ready (<2.5%).

How are the above two scenarios possible?

Scenario 1 may occur when:

* One or more VMs are oversized (i.e. not utilizing the resources they are assigned)
* The host (or cluster) is highly overcommitted (either with or without right-sized VMs)
* Power management settings are set to Balanced, Low Power, or a custom policy

Scenario 2 may occur when:

* VMs are correctly sized
* The ESXi hosts are well sized for the virtual machine workloads
* The VM to host ratio has been well architected

So, the question on everyone's lips: how can high CPU Ready with low CPU utilization be addressed or avoided?

If you have a situation where you are experiencing high CPU Ready and low ESXi host utilization, the following steps should be taken:

* Right size your VMs

This is by far the most important thing to do. I recommend using a tool such as vCenter Operations to assist with determining the correct size for VMs.

* Ensure your hosts/clusters are not excessively overcommitted

I generally find 4:1 vCPU overcommitment is achievable with right-sized VMs where the average VM size is <4 vCPUs. The higher the average vCPUs per VM, the lower the CPU overcommitment you will achieve.
If you have an average VM size of 8 vCPUs, then you may only achieve <1.5:1 overcommitment before suffering contention (CPU Ready).
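To make the ratios above concrete, here is a trivial sketch (mine, not from the post) of how vCPU-to-pCore overcommitment is typically calculated, counting physical cores only (not HT threads):

```python
def overcommitment_ratio(total_vcpus: int, physical_cores: int) -> float:
    """vCPU-to-physical-core overcommitment ratio for a host or cluster."""
    return total_vcpus / physical_cores

# Example: a host with 32 physical cores
print(overcommitment_ratio(total_vcpus=128, physical_cores=32))  # 4.0 -> 4:1, achievable with right-sized small VMs
print(overcommitment_ratio(total_vcpus=48, physical_cores=32))   # 1.5 -> ~1.5:1, a safer target for 8 vCPU VMs
```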

* Use DRS affinity rules to keep complementary workloads together
VMs with high CPU utilization and VMs with very low CPU utilization can work well together. You may also have an environment where some servers are busy overnight and others are only busy during business hours; these are examples of workloads to keep together.

* Use DRS anti-affinity rules to keep non-complementary workloads apart

VMs with very high CPU utilization (assuming the high utilization occurs at the same time) can be spread over a number of hosts to avoid stressing the CPU scheduler.

* Ensure your ESXi hosts are chosen with the virtual machine workloads in mind
If your VMs are >=8 vCPUs, choose a CPU with >=8 cores per socket and consider more sockets per host, such as 4-socket hosts as opposed to 2-socket hosts. If the bulk of your VMs are 1 or 2 vCPUs, then even older 2-socket, 4-core processors should generally work well.
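As a rough illustration of the guidance above (my own sketch, not from the original post), a simple sanity check is whether a VM's vCPU count fits within the physical cores of a single socket:

```python
def vm_fits_in_socket(vm_vcpus: int, cores_per_socket: int) -> bool:
    """True if the VM's vCPUs fit within one physical socket (HT not counted)."""
    return vm_vcpus <= cores_per_socket

print(vm_fits_in_socket(vm_vcpus=8, cores_per_socket=8))  # True  - 8 vCPU VM on an 8-core-per-socket host
print(vm_fits_in_socket(vm_vcpus=8, cores_per_socket=4))  # False - VM spans sockets, harder to schedule
```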

* Use Hyperthreading
Assuming you have a mix of workloads and not all VMs require large amounts of cores and GHz, using Hyperthreading increases the efficiency of the CPU scheduler by effectively doubling the scheduling opportunities. Note: a HT core will generally give much less than half the performance of a pCore.

* Use “High Performance” for your Power Management Policy

The above seven (7) steps should resolve the vast majority of issues with CPU ready.

For an example of the benefits of right-sizing your VMs, check out my earlier post – VM Right Sizing, An Example of the Benefits.

Also, please note that using CPU reservations does not solve CPU Ready; I have written an article on this topic as well – Common Mistake – Using CPU reservations to solve CPU ready.

I hope this helps clear up this issue.

How much CPU ready is OK?

I have noticed a lot of search results hitting my blog asking

Question: How much CPU ready is OK?

So I thought I would address this question with a quick post.

Of course, the answer is "it depends". For example, server workloads have a lower tolerance for CPU Ready than desktop workloads, but as a rule of thumb, here are my thoughts.

For Production server workloads

<2.5% CPU Ready
Generally No Problem!

2.5%-5% CPU Ready
Minimal contention that should be monitored during peak times

5%-10% CPU Ready
Significant Contention that should be investigated & addressed

>10% CPU Ready
Serious Contention to be investigated & addressed ASAP!

In my experience, the above has held up well as a rule of thumb.
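For convenience, here is a small helper (my sketch, not from the original post) that encodes the server-workload rule of thumb above:

```python
def classify_server_cpu_ready(ready_pct: float) -> str:
    """Map a CPU Ready percentage to the server-workload rule of thumb above."""
    if ready_pct < 2.5:
        return "Generally no problem"
    if ready_pct < 5:
        return "Minimal contention - monitor during peak times"
    if ready_pct < 10:
        return "Significant contention - investigate and address"
    return "Serious contention - investigate and address ASAP"

print(classify_server_cpu_ready(1.8))  # Generally no problem
print(classify_server_cpu_ready(7.2))  # Significant contention - investigate and address
```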

However, applications which are latency sensitive may be severely impacted even with low levels of CPU Ready. These types of VMs should run on clusters with lower CPU overcommitment, leverage DRS rules to separate contending workloads, or in extreme cases, be placed on dedicated clusters.

On the flip side, some servers are much more tolerant of CPU Ready, and 5%-10% CPU Ready or higher may not noticeably impact performance.

Keep in mind that setting CPU Reservations does not solve CPU Ready, see my post on the topic for more details.

VMware vCenter Operations is a tool which can help easily identify contention (including CPU) within your vSphere environment.

For virtual desktop workloads, the level of CPU Ready that is acceptable will largely depend on the individual user (ie: Power User versus Task Worker). Keep in mind virtual desktop deployments generally have high CPU consolidation ratios of around 6:1 all the way to >12:1.

I would suggest the following, again as a rule of thumb:

<5% CPU Ready
Generally No Problem!

5%-10% CPU Ready
Minimal contention that should be monitored during peak times

>10% CPU Ready
Contention to be investigated & addressed where the end user experience is being impacted.

Any higher CPU Ready will likely be impacting your users and should be investigated.

VMware have recently released vCenter Operations for View, which you could use to monitor your VMware View environment.

Example Architectural Decision – Supporting VMware View Infrastructure Servers

Problem Statement

When designing a VMware View environment, there are numerous management virtual machines required to run the environment, including but not limited to Domain Controllers, vCenter, VUM, View Connection Brokers, View Security Servers, View Transfer Servers and View Composer. These servers are typically heavily utilized in larger View deployments, and in the event of compute or storage contention they would likely impact the performance of the Virtual Desktop Infrastructure, especially where View Composer or virtual desktop power or provisioning operations are frequent.

How can the VDI environment be designed so management servers have a consistently high level of performance, while ensuring high consolidation ratios can be achieved for desktops and maintaining a consistent end user experience?

Assumptions

1.  One or more VMware View “Blocks”
2. ~2000 Users per Block
3. Using VMware View Linked Clones
4. Target overcommitment for virtual desktop vCPUs is >=6:1 – This is a conservative overcommitment ratio; >10:1 can be achieved
5. Target overcommitment for virtual desktop vRAM is >=1.5:1 – This is a reasonable overcommitment ratio, although higher can be achieved
6. vSphere 4.1 or later
7. VMware View 4.5 or later
8. ESXi hosts are large enough to support >200 users each (eg: at least 2-way / 256GB, assuming 1 vCPU / 1GB RAM VDI VMs) – see the sizing sketch after this list
9. An existing vSphere cluster supporting server workloads is not available or is at or near capacity
10. Antivirus has been optimized for virtual desktop environments, for example by using vShield Endpoint to offload AV scanning to the hypervisor
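As an illustration (my own sketch, not part of the original decision), the block sizing implied by assumptions 2 and 8 works out as follows:

```python
import math

def hosts_per_block(users_per_block: int, users_per_host: int, ha_spares: int = 1) -> int:
    """Number of ESXi hosts required per View Block, including N+1 for HA."""
    return math.ceil(users_per_block / users_per_host) + ha_spares

print(hosts_per_block(users_per_block=2000, users_per_host=200))  # 11 hosts (10 + 1 for HA)
```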

Motivation

1.  Ensure consistent & optimal performance for Virtual desktops and VMware View Infrastructure VMs
2. Achieve the best ROI for the solution

Architectural Decision

Create a three (3) node "Management Cluster" with a scale-out approach using 2-way servers (as opposed to 4-way servers like the VMware View Blocks) to ensure lower HA overhead (33% for N+1) and higher DRS efficiency than a two (2) node cluster. Have management virtual machines use separate underlying storage, being either dedicated RAID packs or aggregates, or for large environments, dedicated storage controllers. Have a vCenter dedicated to running the management infrastructure.
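To illustrate the HA overhead comparison in the decision above (a sketch of mine, not from the original post):

```python
def ha_overhead_pct(nodes: int, spares: int = 1) -> float:
    """Percentage of cluster capacity reserved for N+spares HA admission control."""
    return spares / nodes * 100

print(round(ha_overhead_pct(2)))  # 50 (%) - a two node cluster reserves half its capacity
print(round(ha_overhead_pct(3)))  # 33 (%) - the three node management cluster, as per the decision
```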

Justification

1.  The CPU overcommitment ratio for Virtual desktops is generally much higher than for server workloads
2. Server workloads are less tolerant to high CPU overcommitment ratios than virtual desktops
3. CPU contention (a.k.a CPU Ready) will likely have significant impact on infrastructure VMs
4. If Management VMs were hosted within the VMware View Blocks, the overcommitment would have to be lower to enable adequate performance, thus reducing the ROI of the solution
5. Server and desktop workloads have very different compute and storage profiles and generally are not good candidates to share the same ESXi host or cluster
6. During VMware View Linked Clone deployments, or maintenance activities such as a "recompose" of one or more Pools, Management VMs such as vCenter and View Composer should have minimal or no compute contention to ensure timely completion of maintenance. This does not fit well in a cluster with >6:1 CPU overcommitment.
7. Having a management cluster minimizes or removes the requirement for complexity/overheads of setting CPU or Memory reservations in an attempt to ensure performance for management VMs competing for compute resources with virtual desktops. (See “Common Mistake – Using CPU reservations to solve CPU ready” for more information)
8. Maximize the efficiency of the CPU scheduler, as the majority of virtual desktops should be 1 vCPU compared to management VMs such as vCenter / SQL / Connection Brokers, which will likely be 2 or 4 vCPU. Scheduling VMs with higher vCPU counts in an environment with >6:1 vCPU overcommitment is unlikely to result in acceptable performance for the management virtual machines.
9. Having a cluster/s dedicated to desktops will give more flexibility to use features such as Distributed Power Management (DPM) for VMware View Blocks which will help achieve a faster ROI
10. vCenter’s workload with virtual desktops is generally higher (compared to vCenter servers managing server workloads) due to increased frequency of things like power operations and provisioning operations from View Composer. One (1) vCenter should be used per Block, or up to 2000 users.
11. In the event of performance/stability issues in the View Block/s, if the management servers shared the cluster, the ability for vSphere/View administrators to access management servers will likely be impacted, which may delay the troubleshooting process and eventual resolution of the issue/s
12. Having a separate management cluster with dedicated storage (RAID packs/aggregates and/or storage controllers) prevents the IO load of the View Desktops impacting the ability to manage the environment, especially during recompose and provisioning operations.

Implications

1.  Hardware will be required for the Management cluster – although the ESXi hosts in the View Blocks (as they won't be hosting management workloads) should as a result achieve higher consolidation ratios, which should nearly, if not entirely, neutralize the cost of the management host hardware
2. The storage solution will need to provide storage for management virtual machines which is separate from the virtual desktops
3. The scale-out approach for the management cluster may not achieve as high memory savings from Transparent Page Sharing due to having fewer virtual machines per host
4. Having an additional cluster is an additional administrative overhead, albeit minimal; however, it should reduce the risk in the environment, leading to lower BAU effort/costs.

Alternatives

1. Run Management VMs in VMware View Blocks (with desktop workloads). – Not recommended
2. Run management VMs in an existing vSphere cluster running server workloads (if available)

A special Thanks to Michael Webster (VCDX#66) for his contribution to this example Architectural decision.