Common Mistake: Using CPU reservations to solve CPU Ready

One of the more common problems I see in virtual environments is oversized virtual machines, which typically result in lower performance and, you guessed it, high CPU Ready.

What is CPU Ready?

CPU Ready is essentially the time a VM spends waiting to be scheduled onto a physical core after it is placed in the CPU scheduling queue.

What is High CPU Ready?

In my opinion, during peak load, anything above 2% (or 400ms of the 20-second sampling interval) is a concern and should be monitored. Above 5% will impact performance (resulting in lower CPU utilization), and 10% or more should be considered a serious problem and remediated immediately.

The screenshot below shows CPU Ready from a recent test I conducted in my home lab.

To calculate the percentage of CPU Ready, we divide the VM's “Summation” value (in the screenshot above it's the “W2K8 CPU TEST VM 1” line) by 20,000ms, which is the statistics collection interval, then divide the result by the number of vCPUs in the VM.

So if we use the value from the “Latest” column: 7337 divided by 20,000 equals 0.36685; we then divide that by 2, as the VM has 2 vCPUs, and end up with 0.183425.

That's roughly 18% CPU Ready, which basically means that 18% of the time the VM is waiting for CPU and not doing anything!
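To make the maths repeatable, here is a minimal Python sketch of the same calculation (the function name and example values are just illustrative):

```python
def cpu_ready_percent(summation_ms, vcpus, interval_ms=20000):
    """Convert a CPU Ready summation value (ms) to a percentage.

    The real-time statistics collection interval is 20,000ms, and the
    summation counter accumulates ready time across all vCPUs, so we
    normalise by the vCPU count.
    """
    return summation_ms / interval_ms / vcpus * 100

# The example above: a 7337ms summation on a 2 vCPU VM.
print(cpu_ready_percent(7337, vcpus=2))  # ~18.34 (% CPU Ready)
```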

Note: CPU Ready % can be found using ESXTOP or RESXTOP via the vMA or on the ESXi host directly.

Now, to diagnose the performance/CPU Ready issue, we need to work out whether the VM is oversized and, if so, Right Size the VM.

What is an Oversized VM?

Basically, a VM which has more compute resources assigned than it requires; for example, a VM with 4 vCPUs which uses no more than 20% of its CPU.

What is Right Sizing?

In the above example, the VM is oversized as it uses no more than 1 vCPU's worth (or 25%) of its CPU resources, and it could therefore be reduced to 1 vCPU and run at around 80%.

So the VM is oversized and has high CPU Ready; what happens when we Right Size it from 4 vCPUs to 1 vCPU, and why does this help performance?

It's pretty simple: the fewer vCPUs a VM has, the easier it is for the CPU scheduler to find enough physical cores to schedule the VM onto. If a cluster has a lot of oversized VMs, they are all competing for the same physical cores, making the scheduler's job more and more difficult. (A sketch of the Right Sizing arithmetic follows below.)
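As a rough illustration of the logic above, here is a minimal Python sketch; the 20% headroom figure and the function itself are my own assumptions, not a formal sizing method:

```python
import math

def suggest_vcpus(current_vcpus, peak_utilisation_pct, headroom_pct=20):
    """Suggest a right-sized vCPU count from observed peak utilisation.

    peak_utilisation_pct is the peak CPU usage of the whole VM (0-100).
    We convert that to 'cores actually used', then add headroom so the
    right-sized VM is not running flat out. Assumed heuristic only.
    """
    cores_used = current_vcpus * peak_utilisation_pct / 100
    return max(1, math.ceil(cores_used * (1 + headroom_pct / 100)))

# The example above: a 4 vCPU VM peaking at 20% utilisation.
print(suggest_vcpus(4, 20))  # 1 vCPU, which would then run at ~80%
```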

But what about setting a CPU Reservation? Don’t reservations “guarantee” resources?

The answer is, Yes and No.

The reservation “reserves” CPU resources measured in MHz, but this has nothing to do with the CPU scheduler: even with a reservation, the VM's vCPUs still have to wait for enough physical cores to be free before they can run.

So setting a reservation will help improve performance for the VM you set it on, but it will not “solve” CPU Ready issues caused by oversized VMs or by too high an overcommitment ratio of CPU resources.

In my testing I set a reservation of 80% of a VM's 2 vCPUs' worth of MHz; prior to setting the reservation, CPU Ready was ~20%, and afterwards it dropped to around 10%. Note: This test was performed with only 25% overcommitment (5 vCPUs on 4 physical cores), using CPUBUSY to keep the CPUs running at 100% (as measured within the guest by Windows Task Manager).

I then set a reservation of 100% of the VM's 2 vCPUs' worth of MHz; CPU Ready was ~10% beforehand and did not get below 2.5%, even with the full reservation.

The result would have been significantly worse had I tested with 50% or 100% overcommitment, which is generally easily achieved with VMware and a well-architected cluster. (I have seen overcommitment well above these numbers with no CPU Ready issues.)

Reducing CPU Ready to 2.5% may sound like a pretty good result, but when we look at the other 3 x 1 vCPU VMs on the host (a 4-core test ESXi 5 host), they had CPU Ready of 40%! Not to mention that 2.5% is still not good.

If you have poor performance and you discover high CPU Ready, the best solution is to Right Size your VMs!

I have recommended exactly that countless times, and customers never believe that performance can increase with fewer vCPUs until after the Right Sizing exercise.

If, after Right Sizing, you still have high CPU Ready, your CPU overcommitment is simply too high for the workloads within your cluster.

You can address this by:

1. Adding additional compute to the cluster. (Duh!)

2. Using Affinity rules to locate complementary workloads together (lots of small 1 vCPU VMs which don't have high CPU utilization will generally work well alongside a limited number of higher-vCPU VMs).

3. Using Anti-Affinity rules to separate non-complementary workloads (e.g. don't place all your 8 vCPU VMs on one host with 300% CPU overcommitment and expect them to work well).

4. Scaling out (not up) your VMs, i.e. instead of one 8 vCPU SQL DB server, use four smaller 2 vCPU VMs.

So now you know better than to use reservations to solve CPU contention.

It's time to go Right Sizing!

This simple task is about the best bang for buck you will get in your data center since virtualizing on VMware in the first place.

VMware Clusters – Scale up or out?

I get asked this question all the time: is it better to Scale Up or Scale Out?

The answer is, of course, it depends. 🙂

First, let's define the two terms. Put simply:

Scale Up is having larger hosts, and fewer of them.

Scale Out is having more, smaller hosts.

What are the Pros and Cons of each?

Scale Up 

* PRO – More RAM per host will likely achieve higher transparent memory sharing (higher consolidation ratio!)

* PRO – Greater CPU scheduling flexibility as more physical cores are available (less chance for CPU contention!)

* PRO – Ability to support larger VMs (ie: The 32vCPU monster VM w/ 1TB RAM)

* PRO – Larger NUMA node sizes for better memory performance. Note: For those of you not familiar with NUMA, I recommend you check out Sizing VMs and NUMA nodes | frankdenneman.nl

* PRO – Fewer ports used in the Data and Storage networks

* PRO – Less complex DRS simulations (which take place every 5 mins)

* CON – Potential for network or I/O bottlenecks due to the larger number of VMs per host

* CON – When a host fails, a larger number of VMs are impacted and have to be restarted on the surviving hosts

* CON – Fewer hosts per cluster leads to a higher HA overhead or “waste”

* CON – Fewer hosts for DRS to effectively load balance VMs across

Scale Out

* CON – Less RAM per host will likely achieve lower transparent memory sharing (thus reducing overcommitment)

* CON – Fewer physical cores may limit CPU scheduling flexibility (which may lead to contention – CPU Ready)

* CON – Unable to support larger VMs (ie: 8vCPU VMs or the 32vCPU monster VM w/ 1TB RAM)

* CON – More ports used in the Data and Storage networks – i.e. cost!

* PRO – Less likely to hit network or I/O bottlenecks due to the smaller number of VMs per host

* PRO – When a host fails, a smaller number of VMs are impacted and have to be restarted on the surviving hosts

* PRO – More hosts per cluster may lead to a lower HA overhead or “waste”

* PRO – Greater flexibility for DRS to load balance VMs

Overall, both Scale Out and Scale Up have their advantages, so how do you choose?

When doing your initial capacity planning exercise, determine how many VMs you will have on day 1 (and their vCPU/RAM/Disk/Network/IOPS) and try to start with a cluster size which gives you the minimum HA overhead.

Example: If you have 2 large hosts with heaps of CPU/RAM, your HA overhead is 50%; if you have 8 smaller hosts, your overhead is 12.5% (both with N+1). The sketch below shows the arithmetic.
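For N host failures tolerated across H hosts, the overhead is simply N/H; here is a minimal Python sketch (the function name is just illustrative):

```python
def ha_overhead_pct(hosts, host_failures_tolerated=1):
    """HA overhead: the share of cluster compute reserved for failover."""
    return host_failures_tolerated / hosts * 100

print(ha_overhead_pct(2))      # 50.0 -> 2 large hosts, N+1
print(ha_overhead_pct(8))      # 12.5 -> 8 smaller hosts, N+1
print(ha_overhead_pct(32, 4))  # 12.5 -> a 32-node cluster with 4 spare hosts
```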

As a general rule, I believe the ideal cluster would be large 4-way hosts with a bucketload of RAM and around 16-24 hosts. This would, in my opinion, be the best of both worlds. Sadly, few environments meet the requirements (or have the budget) for this type of cluster.

I believe a cluster should ideally start with enough hosts to minimize the initial HA overhead (say <25%) and to ensure DRS can load balance effectively, then scale up (e.g. add RAM) to cater for additional VMs. If more compute power is required in future, scale out, and then scale up (add RAM) further. I would generally suggest not designing to the maximum, so up to 24-node clusters.

From an HA perspective, I feel that in a 32-node cluster, 4 hosts' worth of compute should be reserved for HA, or 1 in 8 (a 12.5% HA reservation) – similar to NetApp's RAID-DP concept of 14+2 disks in a RAID pack.

Tip: Choose hardware which can be upgraded (scaled up). Avoid designing a cluster with the hosts' hardware specs maxed out on day 1.

There are exceptions to this, such as management clusters, which may only have (and need) 2 or 3 hosts over their lifespan (e.g. environments where vCloud Director is used), or environments with static or predictable workloads.

To achieve the above, the chosen hardware needs to be upgradable, i.e. if a server's maximum RAM is 1TB, you may consider only half-populating it (being careful to choose DIMMs that allow you to expand) so you can scale up as the environment's compute requirements grow.

Tip: Know your workloads! Use tools like Capacity Planner so you understand what you're designing for.

It is very important to consider the larger VMs and ensure the hardware you select has a suitable number of physical cores.

Example: Don't expect 2 x 8 vCPU VMs (highly utilized) to run well together on a 2-way, 4-core host – that's 16 vCPUs competing for 8 physical cores.

When designing a new cluster or scaling an existing one, be sure to consider the CPU-to-RAM ratio, so that you don't end up with a cluster with heaps of available CPU and maxed-out memory, or vice versa. This is a common mistake I see.

Note: In the environments I have seen over many years, memory is almost always the bottleneck.

The following is an example where a Scale Out and a Scale Up approach end up with very similar compute power in their respective clusters, but would likely have very different performance characteristics and consolidation ratios.

Example Scenario: A customer has 200 VMs on day one; let's say the average VM size is 1 vCPU / 4GB RAM, plus they have 4 highly utilized 8 vCPU / 64GB RAM VMs running database workloads.

The expected consolidation ratios are 2.5:1 vCPUs to physical cores and 1.5:1 vRAM to physical RAM.

The customer expects to increase the number of VMs by 25% per year, for the next 3 years.

So our total compute requirements are as follows (a quick sanity check of these figures appears after the list):

Day one: 92.8 CPU cores and 704GB RAM.

End of Year 1: 116 CPU cores and 880GB RAM.

End of Year 2: 145 CPU cores and 1100GB RAM.

End of Year 3: 181 CPU cores and 1375GB RAM.
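To show where these numbers come from, here is a minimal Python sketch of the sizing arithmetic (variable names are my own):

```python
# Workload assumptions from the scenario above.
small_vms, small_vcpu, small_ram = 200, 1, 4   # 200 x 1 vCPU / 4GB VMs
big_vms, big_vcpu, big_ram = 4, 8, 64          # 4 x 8 vCPU / 64GB DB VMs
vcpu_ratio, vram_ratio = 2.5, 1.5              # consolidation ratios
growth = 1.25                                  # 25% VM growth per year

total_vcpus = small_vms * small_vcpu + big_vms * big_vcpu  # 232 vCPUs
total_vram = small_vms * small_ram + big_vms * big_ram     # 1056GB vRAM

for year in range(4):  # Day one is year 0
    cores = total_vcpus / vcpu_ratio * growth ** year
    ram = total_vram / vram_ratio * growth ** year
    print(f"Year {year}: {cores:.1f} cores, {ram:.0f}GB RAM")
# Year 0: 92.8 cores, 704GB RAM ... Year 3: 181.2 cores, 1375GB RAM
```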

The day 1 requirements could be achieved in a number of ways; see the two examples below.

Option 1 (Scale Out) – Use 9 hosts, each 2-way / 6-core / 96GB RAM (12 cores per host), with an HA reservation of ~12% (~N+1)

Total Cluster Resources = 108 Cores & 864GB RAM

Usable assuming N+1 HA = 96 cores & 768GB RAM

Option 2 (Scale Up) – Use 4 hosts, each 4-way / 8-core / 256GB RAM (32 cores per host), with an HA reservation of 25% (N+1)

Total Cluster Resources = 128 Cores & 1024GB RAM

Usable assuming N+1 HA = 96 cores & 768GB RAM
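As a quick check of the headline numbers for both options, here's a small Python sketch (the function and variable names are illustrative):

```python
def cluster_resources(hosts, cores_per_host, ram_per_host, ha_reserve_pct):
    """Return (total cores, total RAM, usable cores, usable RAM)."""
    total_cores = hosts * cores_per_host
    total_ram = hosts * ram_per_host
    usable = 1 - ha_reserve_pct / 100
    return total_cores, total_ram, total_cores * usable, total_ram * usable

# Option 1: 9 x 12-core / 96GB hosts, ~N+1 HA (1 host in 9, ~11.1%)
print(cluster_resources(9, 12, 96, 100 / 9))  # ~(108, 864, 96.0, 768.0)
# Option 2: 4 x 32-core / 256GB hosts, N+1 HA (25%)
print(cluster_resources(4, 32, 256, 25))      # (128, 1024, 96.0, 768.0)
```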

Both Option 1 and Option 2 appear to meet the Day 1 compute requirements of the customer, right?

Well, yes, at the high level, both scale out and up appear to provide the required compute resources.

Now let's review how the clusters will scale to meet the end of Year 3 requirements; after all, we don't design just for day 1, do we? 🙂

End of Year 3 requirements: 181 CPU cores and 1375GB RAM.

Option 1 (Scale Out) would require 16 hosts (2RU per host) based on CPU (181 cores / 12 cores per host) and 15 hosts based on RAM (1375GB / 96GB per host), plus ~12% HA capacity (N+2, as the cluster is >8 hosts), taking the total required to 18 hosts.

Total Cluster Resources = 216 Cores & 1728GB RAM

Usable assuming N+2 HA = 190 cores & 1520GB RAM

Note: At between 16 and 24 hosts, N+3 should be considered (this equates to 1 spare host of compute per 8 hosts).

Option 2 (Scale Up) would require 6 hosts (4RU per host) based on CPU (181 cores / 32 cores per host) and 6 hosts based on RAM (1375GB / 256GB per host), plus ~15% HA capacity (N+1, as the cluster is <8 hosts), taking the total required to 7 hosts.

Total Cluster Resources = 224 Cores & 1792GB RAM

Usable assuming N+1 HA = 190 cores & 1523GB RAM
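The host counts above can be sanity-checked with a few lines of Python (a minimal sketch; the ceiling on host counts and the HA spares follow the reasoning in the text):

```python
import math

def hosts_required(cores_needed, ram_needed, cores_per_host, ram_per_host,
                   ha_spares):
    """Hosts needed to satisfy both CPU and RAM demands, plus HA spares."""
    by_cpu = math.ceil(cores_needed / cores_per_host)
    by_ram = math.ceil(ram_needed / ram_per_host)
    return max(by_cpu, by_ram) + ha_spares

# End of Year 3 requirements: 181 cores and 1375GB RAM.
print(hosts_required(181, 1375, 12, 96, ha_spares=2))   # Option 1 -> 18
print(hosts_required(181, 1375, 32, 256, ha_spares=1))  # Option 2 -> 7
```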

So, on the raw compute numbers, we have two viable options which scale from day 1 to the end of Year 3 and meet the customer's compute requirements.

Which option would I choose, I hear you asking? Good question.

I think I could easily defend either option, but I believe Option 2 would be more economically viable and result in better performance. Below are a few reasons for my conclusion.

* Option 2 would give significantly more transparent page sharing than Option 1, therefore achieving a higher consolidation ratio.

* Option 2 would likely be much cheaper from a network/storage connectivity point of view (fewer connections).

* Option 2 is better suited to hosting the 4 highly utilized 8 vCPU VMs (as each fits within a NUMA node and will only use 1/4 of the host's CPU, as opposed to 2/3 of a 2-way, 12-core host).

* The 4-way (32-core) hosts would provide better CPU scheduling due to the larger number of cores.

* From a data center perspective, Option 2 would only use 28RU, compared to 36RU for Option 1.

Note: A cluster of 7 hosts is not really ideal, but in my opinion it is large enough to get both HA and DRS efficiencies. The 18-node cluster (Option 1) is right in the sweet spot for cluster sizing, but its CPUs did not suit the 8 vCPU workloads; had Option 1 used 8-core processors, that would have made Option 1 more attractive.

Happy to hear everyone’s thoughts on the topic.