How much CPU ready is OK?

I have noticed a lot of search results hitting my blog asking

Question: How much CPU ready is OK?

so I thought I would address this question with a quick post.

Of course the answer is it depends, for example Server workloads have a lower tolerance to CPU ready than desktop workloads but as a rule of thumb, here is my thoughts.

For Production server workloads

<2.5% CPU Ready
Generally No Problem!

2.5%-5% CPU Ready
Minimal contention that should be monitored during peak times

5%-10% CPU Ready
Significant Contention that should be investigated & addressed

>10% CPU Ready
Serious Contention to be investigated & addressed ASAP!

In my experience, the above have been good for a rule of thumb.

However, applications which are latency sensitive may be severely impacted even with low levels of CPU ready, these types of VMs should be on clusters with lower CPU overcommitment, leverage DRS rules to separate the contending workloads or in extreme cases, dedicated clusters.

On the flip side, Some servers are much more tolerant to CPU ready, and 5%-10% CPU ready or higher may not noticeably impact performance.

Keep in mind that setting CPU Reservations does not solve CPU Ready, see my post on the topic for more details.

VMware vCenter Operations is a tool which can help easily identify contention (including CPU) within your vSphere environment.

For Virtual Desktop workloads, what level of CPU ready is acceptable will largely depend on the individual user (ie: Power User verses Task Worker). Keep in mind virtual desktop deployments generally have high CPU consolidation ratios of  around 6:1 all the way to >12:1.

I would suggest the following , again as a rule of thumb

<5% CPU Ready
Generally No Problem!

5%-10% CPU Ready
Minimal contention that should be monitored during peak times

>10% CPU Ready
Contention to be investigated & addressed where the end user experience is being impacted.

Any Higher CPU ready will likely be impacting your users, and should be investigated.

VMware have recently released vCenter Operations for View, which you could use to monitor your VMware View environment.

Example Architectural Decision – Supporting VMware View Infrastructure Servers

Problem Statement

When designing a VMware View environment, there are numerous management virtual machines which are required to run the environment, including but not limited to Domain Controllers, vCenter , VUM , View Connection Brokers , View Security Servers, View Transfer servers , View Composer. These servers are typically heavily utilized in larger View deployments and in the event of compute or storage contention, would likely impact the performance of the Virtual Desktop Infrastructure, especially where View Composer or virtual desktop power or provisioning operations are frequent.

How can the VDI environment be designed so management servers have a consistent high level of performance and ensure that high consolidation ratios can be achieved for desktops whilst maintaining a consistent end user experience?

Assumptions

1.  One or more VMware View “Blocks”
2. ~2000 Users per Block
3. Using VMware View Linked Clones
4. Target overcommitment for Virtual desktops vCPU is >=6:1 – This is a conservative overcommitment ratio, >10:1 can be achieved
5. Target overcommitment for Virtual desktops vRAM is >=1.5:1 – This is a reasonable overcommitment ratio,  although higher can be achieved
6. vSphere 4.1 or later
7. VMware View 4.5 or later
8. ESXi Hosts are large enough to support >200 users each (eg: At least 2 way / 256GB assuming 1vCPU/1GB RAM VDI VMs)
9. An existing vSphere cluster supporting server workloads is not available or is at or near capacity
10. Antivirus has been optimized for Virtual desktop environments, such as vShield Endpoint to offload AV scanning to the hypervisor

Motivation

1.  Ensure consistent & optimal performance for Virtual desktops and VMware View Infrastructure VMs
2. Achieve the best ROI for the solution

Architectural Decision

Create a three (3) node “Management Cluster” with a scale out approach using 2 Way servers (as opposed to Four way servers like the VMware View Blocks) to ensure lower HA overhead (33% for N+1) and higher DRS efficiency than a two (2) node cluster. Have management virtual machines use different underlying storage, being either dedicated RAID packs or aggregates or for a large environments, storage controllers. Have a vCenter dedicated to running the Management infrastructure.

Justification

1.  The CPU overcommitment ratio for Virtual desktops is generally much higher than for server workloads
2. Server workloads are less tolerant to high CPU overcommitment ratios than virtual desktops
3. CPU contention (a.k.a CPU Ready) will likely have significant impact on infrastructure VMs
4. If Management VMs we’re hosted within the VMware View Blocks, the overcommitment would have to be lower to enable adequate performance, thus reducing the ROI for the solution
5. Server and desktop workloads have very different compute and storage profiles and generally are not good candidates to share the same ESXi host or cluster
6. During VMware View Linked Clone deployments, or maintenance activities such as a “recompose”of one or more Pools, Management VMs such as vCenter and View Composer should have minimal or no compute contention to ensure timely completion of maintenance. This does not fit well in a cluster with >6:1 CPU overcommitment.
7. Having a management cluster minimizes or removes the requirement for complexity/overheads of setting CPU or Memory reservations in an attempt to ensure performance for management VMs competing for compute resources with virtual desktops. (See “Common Mistake – Using CPU reservations to solve CPU ready” for more information)
8. Maximize the efficiency of the CPU scheduler, as the majority of Virtual Desktops should be 1vCPU as compared to management VMs such as vCenter / SQL / Connection brokers which will likely be 2 and 4 vCPU. Scheduling VMs with higher vCPU numbers on an environment with >6:1 vCPU overcommitment is unlikely to result in acceptable performance for the management virtual machines.
9. Having a cluster/s dedicated to desktops will give more flexibility to use features such as Distributed Power Management (DPM) for VMware View Blocks which will help achieve a faster ROI
10. vCenter’s workload with virtual desktops is generally higher (compared to vCenter servers managing server workloads) due to increased frequency of things like power operations and provisioning operations from View Composer. One (1) vCenter should be used per Block, or up to 2000 users.
11. In the event of performance/stability issues in the View Block/s, if the management servers shared the cluster, the ability for vSphere/View administrators to access management servers will likely be impacted, which may delay the troubleshooting process and eventual resolution of the issue/s
12. Having a separate management cluster with dedicated storage (RAID packs/aggregates and/or storage controllers) prevents the IO load of the View Desktops impacting the ability to manage the environment, especially during recompose and provisioning operations.

Implications

1.  Hardware will be required for the Management cluster – Although as the ESXi hosts in View Blocks (as they wont be hosting management workloads) should as a result achieve higher consolidation ratios which should close to if not entirely neutralize the cost of the Management Host Hardware
2. The storage solution will need to provide storage for Management virtual machines which is separate to Virtual desktops
3. The scale out approach for the management cluster may not achieve as higher memory savings form transparent page sharing due to having less virtual machines per host
4. Having an additional cluster is an additional administrative overhead, albeit minimal however this should reduce the risk in the environment leading to lower BAU effort/costs.

Alternatives

1. Run Management VMs in VMware View Blocks (with desktop workloads). – Not recommended
2. Run management VMs in an existing vSphere cluster running server workloads (if available)

A special Thanks to Michael Webster (VCDX#66) for his contribution to this example Architectural decision.

VM Right Sizing – An example of the benefits

I thought this example may be useful to show the benefits of Right sizing a virtual machine.

The VM is an SQL Database server with 4 vCPUs on a cluster which is highly overcommitted with lots of oversized VMs.

As we can see by the below graph, the CPU ready was more or less averaging 10% and on the 24th of July most vCPUs spiked to greater than 30% CPU ready each. ie: 30% of the time the server is waiting to be scheduled onto the pCPU cores.

The performance of applications using databases hosted on the server were suffering serious issues during this time.

On the  24th the VM was dropped from 4 vCPUs, down to 2 vCPUs and the results are obvious.

CPU ready dropped immediately (even in a heavily over-committed environment) to around 1% and CPU utilization remained at around the same levels. Performance also improved for applications (for example vCenter) using the database server.

TIP: Right Sizing not only helps the VM you right size, but it helps relieve the contention on the ESXi host (and cluster), which will improve performance for all VMs.

It is also important to point out this VM is the first VM to be right sized, so as more VMs are right sized in the cluster, Ready time will drop further and performance will continue to improve.

This also results in opportunities for greater consolidation within the environment without compromising performance or redundancy.

I would like to point out that I believe this server may benefit from 4 vCPUs, but definitely not in this highly CPU contended environment.

As more virtual machines are Right Sized, then this environment would likely have the opportunity to consider increasing vCPUs in suitable VMs after monitoring performance for a suitable period of time. Products like VMware vCenter Operations is excellent for reporting on Oversized and undersized VMs.

Do you believe in right sizing now?