Example Architectural Decision – Memory Reservation for Virtual Desktops

Problem Statement

In a VMware View (VDI) environment with a large number of virtual desktops, the potential Tier 1 storage requirement for vswap files (*.vswp) can make the solution less attractive from an ROI perspective and carry a high upfront cost for storage. What can be done to minimize the storage requirements for the vswap files, thus reducing the storage requirements for the VMware View (VDI) solution?

Assumptions

1. vSwap files are placed on Tier 1 shared storage with the Virtual machine (default setting)

Motivation

1. Minimize the storage requirements for the virtual desktop solution
2. Reduce the upfront cost of storage for VDI
3. Ensure the VDI solution gets the fastest ROI possible without compromising performance

Architectural Decision

Set the VMware View Master Template with a 50% memory reservation so all VDI machines deployed have a 50% memory reservation

Justification

1. Setting 50% reservation reduces the storage requirement for vSwap by half
2. Setting only 50% ensures some memory overcommitment and transparent page sharing can still be achieved
3. Memory overcommitment is generally much lower than CPU overcommitment (around 1.5:1 for VDI)
4. Reserving 50% of a VDI machine’s RAM is cheaper than the equivalent shared storage
5. A memory reservation will generally provide increased performance for the VM
6. Reduces/Removes the requirement/benefit for a dedicated datastore for vSwap files
7. Transparent page sharing (TPS) will generally only give up to 30-35% memory savings
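
The storage arithmetic behind justification 1 can be sketched as follows. ESXi sizes each .vswp file as the VM's configured memory minus its memory reservation, so a 50% reservation halves the file. The desktop count and RAM size below are hypothetical figures chosen purely for illustration, and the function names are made up for this sketch.

```python
def vswap_gb_per_vm(configured_gb: float, reservation_pct: float) -> float:
    """Size of one .vswp file: configured memory minus reserved memory."""
    if not 0 <= reservation_pct <= 100:
        raise ValueError("reservation_pct must be between 0 and 100")
    return configured_gb * (1 - reservation_pct / 100)


def total_vswap_gb(desktops: int, configured_gb: float, reservation_pct: float) -> float:
    """Total .vswp capacity needed across all desktops."""
    return desktops * vswap_gb_per_vm(configured_gb, reservation_pct)


# Hypothetical environment: 1000 desktops with 2 GB RAM each.
for pct in (0, 25, 50, 75, 100):
    print(f"{pct:>3}% reservation -> {total_vswap_gb(1000, 2, pct):.0f} GB of vswap")
```

Running the same function across several reservation levels makes it easy to compare the chosen 50% against other reservation percentages.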

Implications

1. Less memory overcommitment will be achieved

Alternatives

1. Set a higher memory reservation of 75% – This would further reduce the shared storage requirement while still allowing for 1.25:1 memory overcommitment
2. Set a 100% memory reservation – This would eliminate the vSwap file but prevent memory overcommitment
3. Set a lower memory reservation of 25% – This would not provide significant storage savings, and as transparent page sharing generally only achieves up to 30-35%, there would still be a sizable requirement for vSwap storage with minimal benefit
4. Create a dedicated datastore for vSwap files on lower Tier storage

 

Common Mistake: Using CPU reservations to solve CPU Ready

One of the more common problems I see in virtual environments is oversized virtual machines, which typically results in lower performance and, you guessed it, high CPU Ready.

What is CPU Ready?

CPU Ready is basically the time a VM waits to be scheduled onto a physical core after it is placed in the CPU scheduling queue.

What is High CPU Ready?

In my opinion, during peak load, anything above 2% (or 400ms) is a concern and should be monitored. Above 5% will be impacting performance (resulting in lower CPU utilization), and 10% or more should be considered a serious problem and remediated immediately.
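
As a rough sketch of those thresholds, the helper below labels a CPU Ready percentage and converts it to milliseconds. The 20000ms interval is an assumption (it is the interval of the real-time performance charts, and it is what makes 2% equal 400ms). The function names and labels are hypothetical, purely for illustration.

```python
REALTIME_INTERVAL_MS = 20000  # assumed: 20-second real-time chart interval


def ready_pct_to_ms(ready_pct: float, interval_ms: int = REALTIME_INTERVAL_MS) -> float:
    """Convert a per-vCPU CPU Ready percentage into milliseconds per interval."""
    return ready_pct * interval_ms / 100


def classify_cpu_ready(ready_pct: float) -> str:
    """Label a peak-load CPU Ready percentage using the thresholds above."""
    if ready_pct >= 10:
        return "serious problem, remediate immediately"
    if ready_pct > 5:
        return "impacting performance"
    if ready_pct > 2:
        return "concern, monitor"
    return "ok"


for pct in (1, 2.5, 7, 18):
    print(f"{pct}% ({ready_pct_to_ms(pct):.0f}ms): {classify_cpu_ready(pct)}")
```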

The screenshot below shows CPU Ready from a recent test I conducted in my home lab.

To calculate the percentage of CPU Ready, we divide the VM’s “Summation” value (in the screenshot above, it’s the “W2K8 CPU TEST VM 1” line) by 20000 (ms), which is the statistics collection interval, then divide the result by the number of vCPUs in the VM.

So if we use the value from the “latest” column, it’s 7337 divided by 20000, which equals 0.36685. We then divide that by 2, as the VM has 2 vCPUs, and we end up with 0.183425.

That’s 18% CPU Ready, which basically means that 18% of the time the VM is waiting to be scheduled instead of doing any work!
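
The same calculation can be written as a small function. This is a minimal sketch: the function name is hypothetical, and the 20000ms default only applies to the real-time statistics interval used in this example, so pass the correct interval if your summation value comes from a different chart.

```python
def cpu_ready_percent(summation_ms: float, vcpus: int, interval_ms: int = 20000) -> float:
    """CPU Ready % per vCPU from a chart 'Summation' value in milliseconds."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    # Fraction of the interval spent waiting, averaged across the VM's vCPUs.
    return summation_ms / interval_ms / vcpus * 100


# Values from the example above: 7337ms summation, 2 vCPUs, 20s interval.
ready = cpu_ready_percent(7337, vcpus=2)
print(f"CPU Ready: {ready:.1f}%")  # prints "CPU Ready: 18.3%"
```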

Note: CPU Ready % can be found using ESXTOP or RESXTOP via the vMA or on the ESXi host directly.

Now to try and diagnose the Performance/CPU ready issue, we need to work out if the VM is oversized and if so, Right Size the VM.

What is an Oversized VM?

Basically a VM which has more compute resources assigned than it requires, for example, a VM which uses no more than 20% of its CPU and has 4 vCPUs.

What is Right Sizing?

In the above example, the VM is oversized as it doesn’t use more than 1 vCPU’s worth (or 25%) of its CPU resources and therefore could be reduced to 1 vCPU and run at 80%.

So the VM is oversized and has High CPU Ready. What happens when we right size it from 4 vCPUs to 1 vCPU, and why does this help performance?

It’s pretty simple: the fewer vCPUs a VM has, the easier it is for the CPU scheduler to find enough physical cores to schedule the VM onto. If a cluster has a lot of oversized VMs, they are all competing for the same physical cores, making the scheduler’s job more and more difficult.
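
The right sizing arithmetic from the example (4 vCPUs at 20% becoming 1 vCPU at 80%) can be sketched like this. It is a simplification that assumes CPU demand scales linearly with vCPU count and ignores headroom for spikes; the function name is hypothetical.

```python
import math


def right_size(current_vcpus: int, peak_util_pct: float) -> tuple[int, float]:
    """Return (suggested vCPU count, projected utilization %) for a VM.

    peak_util_pct is the VM's peak CPU utilization across all current vCPUs.
    """
    # Demand expressed in "percent of one vCPU", e.g. 4 vCPUs at 20% = 80.
    demand_pct_of_one_vcpu = current_vcpus * peak_util_pct
    new_vcpus = max(1, math.ceil(demand_pct_of_one_vcpu / 100))
    projected_util_pct = demand_pct_of_one_vcpu / new_vcpus
    return new_vcpus, projected_util_pct


vcpus, util = right_size(current_vcpus=4, peak_util_pct=20)
print(f"Right size to {vcpus} vCPU(s), running at about {util:.0f}%")
```

In practice you would likely leave some headroom rather than sizing to the exact peak, so treat the output as a starting point for the exercise, not a final answer.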

But what about setting a CPU Reservation? Don’t reservations “guarantee” resources?

The answer is, Yes and No.

The reservation “reserves” CPU resources measured in MHz, but this has nothing to do with the CPU scheduler.

So setting a reservation will help improve performance for the VM you set it on, but will not “solve” CPU ready issues caused by “oversized” VMs, or by too high an overcommitment ratio of CPU resources.

In my testing I set a reservation equal to 80% of the MHz of the VM’s 2 vCPUs. Before setting the reservation, CPU Ready was ~20%; afterwards it dropped to around 10%. Note: This test was performed with only 25% overcommitment (5 vCPUs on 4 physical cores), using CPUBUSY to keep the CPUs running at 100% (measured within the guest by Windows Task Manager).

I then set a reservation equal to 100% of the MHz of the VM’s 2 vCPUs. Before this change CPU Ready was ~10%, and it did not get below 2.5% even with the 100% reservation.

The result would have been considerably worse had I tested with 50% or 100% overcommitment, which is generally easily achieved with VMware and a well architected cluster. (I have seen well above these overcommitment numbers with no CPU Ready issues.)

Reducing CPU Ready down to 2.5% may sound like a pretty good result, but when we look at the other 3 x 1vCPU VMs on the host (4 core test ESXi 5 host) they had CPU ready of 40%!! Not to mention 2.5% is still not good!
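
For reference, the overcommitment figure quoted for this test (5 vCPUs on 4 physical cores being 25%) works out as below. This is a minimal sketch with a hypothetical function name; it counts physical cores only and ignores hyperthreading.

```python
def cpu_overcommit_pct(total_vcpus: int, physical_cores: int) -> float:
    """CPU overcommitment as a percentage of vCPUs above the physical core count."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return (total_vcpus - physical_cores) / physical_cores * 100


# Test host from this post: one 2 vCPU VM plus three 1 vCPU VMs on 4 cores.
test_host_vcpus = 2 + 3 * 1
print(f"{cpu_overcommit_pct(test_host_vcpus, 4):.0f}% overcommitment")  # prints "25% overcommitment"
```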

If you have poor performance, and you discover you have High CPU Ready, the best solution is to Right Size Your VMs!

I have recommended exactly that countless times, and customers never believe that performance can increase with fewer vCPUs until after the Right Sizing exercise.

If after Right Sizing you still have CPU Ready, your overcommitment on CPU is simply too high for the workloads within your cluster.

You can address this by:

1. Adding additional compute to the cluster. (Duh!)

2. Using Affinity rules to locate complementary workloads together (Lots of small 1vCPU VMs which don’t have high CPU utilization will generally work well with a limited number of higher vCPU VMs)

3. Using Anti-Affinity rules to separate non-complementary workloads (e.g. Don’t place all your 8vCPU VMs on one host with 300% overcommitment on CPU and expect them to work well).

4. Scaling out (not up) your VMs, i.e. Don’t have one 8 vCPU SQL DB server, use 4 smaller 2vCPU VMs

So now you know better than to use reservations to solve CPU contention.

It’s time to go Right Sizing!

This simple task is about the best bang for buck you will get in your data center since virtualizing on VMware in the first place.