Example Architectural Decision – Advanced Power Management for vSphere Clusters with Business Critical Applications

Problem Statement

In a vSphere environment where Business Critical Applications have been successfully virtualized, should Advanced Power Management be used to help reduce data center costs?

Requirements

1. Fully Supported solution

2. Reduce data center costs where possible

3. Business Critical Application performance must not be significantly degraded

Assumptions

1. Supported Hardware

2. vSphere 5.0 or later

3. Admission Control is enabled with >= N+1 redundancy

Constraints

1. None

Motivation

1. Reduce data center costs where possible with minimal or no impact to performance

Architectural Decision

Configure the BIOS to “OS Controlled”

Set ESXi Advanced Power Management to “Balanced”
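
The “Balanced” setting could also be applied programmatically rather than host by host in the client. The following is a minimal sketch, not a definitive implementation. It assumes a pyVmomi-style host object whose power system exposes `capability.availablePolicy` and `ConfigurePowerPolicy`, and it assumes the “Balanced” policy carries the short name `dynamic`. The function name and the label mapping are hypothetical; check the short names reported by your own hosts before relying on them.

```python
# Assumed mapping of vSphere Client labels to power policy short names.
# Verify against the policies your hosts actually report.
POLICY_SHORT_NAMES = {
    "High performance": "static",
    "Balanced": "dynamic",
    "Low power": "low",
    "Custom": "custom",
}


def set_power_policy(host, label="Balanced"):
    """Apply the power policy with the given label and return its key.

    `host` is expected to behave like a pyVmomi vim.HostSystem, i.e. it
    exposes host.configManager.powerSystem (assumption).
    """
    short_name = POLICY_SHORT_NAMES[label]
    power_system = host.configManager.powerSystem
    for policy in power_system.capability.availablePolicy:
        if policy.shortName == short_name:
            # Takes effect on the running host when the BIOS is set to
            # "OS Controlled", so no reboot is expected.
            power_system.ConfigurePowerPolicy(key=policy.key)
            return policy.key
    raise ValueError(f"Host does not offer a '{label}' power policy")
```

The same call with `label="High performance"` covers the fallback described under Implications.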

Justification

1. Power savings can be realized with almost no impact to performance

2. The performance difference between the “High performance” and “Balanced” options is insignificant; however, the power savings achieved reduce both cost and environmental impact

3. In the unlikely event of performance issues as a result of using the “Balanced” option, the BIOS is set to “OS Controlled” so the ESXi power management policy can be changed without downtime during troubleshooting

4. Advanced Power Management options other than “High Performance” and “Balanced” have proven to deliver excellent power savings, but at a high cost to performance, which is not suitable for Business Critical Applications

5. As HA Admission Control is used to provide >=N+1 redundancy, the ESXi hosts will generally not be fully utilized which will give Advanced Power Management opportunities to conserve power

6. The workloads in the cluster/s run 24/7; however, demand is generally higher during business hours, and some low-demand or idle time exists

7. Even where only a small power saving is realized, if performance is not significantly impacted then a faster ROI can be achieved due to cost savings
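
The cost argument in point 7 can be roughed out with simple arithmetic. The sketch below is illustrative only: the host count, the watts saved per host and the electricity price are hypothetical placeholders, not measurements, and should be replaced with figures from your own environment.

```python
def annual_power_saving(hosts, watts_saved_per_host, price_per_kwh,
                        hours_per_year=24 * 365):
    """Return (kWh saved per year, cost saved per year).

    All inputs are estimates supplied by the reader; the function only
    converts an average per-host saving in watts into yearly totals.
    """
    kwh_saved = hosts * watts_saved_per_host * hours_per_year / 1000
    return kwh_saved, kwh_saved * price_per_kwh


# Hypothetical example: 8 hosts, each saving an average of 30 W,
# at an assumed price of 0.15 per kWh.
kwh, cost = annual_power_saving(hosts=8, watts_saved_per_host=30,
                                price_per_kwh=0.15)
print(f"{kwh:.1f} kWh/year, {cost:.2f} saved/year")
```

Cooling savings are not included, so a figure calculated this way would tend to understate the total.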

Implications

1. Where performance issues exist using “Balanced”, a vSphere administrator may need to change Advanced Power Management to “High Performance”

Alternatives

1. Use “High Performance”

2. Use “BIOS Controlled”

3. Do not use Advanced Power Management

4. Use Advanced Power Management in conjunction with DPM

Related Articles

1. Power Management and Performance in ESXi 5.1 – By Rebecca Grider (@RebeccaGrider)


Example Architectural Decision – Distributed Power Management (DPM) for Virtual Desktop Clusters

Problem Statement

In a VMware View (VDI) environment where the bulk of the workforce work between 8am and 6pm daily, how can vSphere be configured to minimize the power consumption without significant impact to the end user experience?

Assumptions

1. The bulk of the workforce work between 8am and 6pm daily
2. Most users log in during a two-hour window between 7:30am and 9:30am daily
3. Most users log off during a two-hour window between 4:30pm and 6:30pm daily
4. VMware View cluster maintains at least N+1 redundancy
5. VMware View cluster only runs desktop workloads
6. VMware View cluster size is >=5 hosts
7. VMware View cluster/s are configured with HA admission control policy of “Percentage of cluster resources reserved for HA” to avoid the potentially inefficient slot size calculation preventing hosts going into standby mode
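
As a rough illustration of assumptions 4, 6 and 7 together, the percentage to reserve for HA can be derived from the cluster size. This is a sketch under the assumption that all hosts in the cluster are the same size; the function name is hypothetical, and the result is rounded up so that at least the capacity of the failed host(s) is held back.

```python
def ha_reserved_percent(hosts, host_failures_to_tolerate=1):
    """Percentage of cluster resources to reserve for N+x redundancy.

    Assumes equally sized hosts. Uses integer ceiling division so the
    reservation never falls below the capacity of the failed hosts.
    """
    if hosts <= host_failures_to_tolerate:
        raise ValueError("Cluster must have more hosts than tolerated failures")
    return (100 * host_failures_to_tolerate + hosts - 1) // hosts


print(ha_reserved_percent(5))   # 5-host cluster, N+1 -> 20
print(ha_reserved_percent(8))   # 8-host cluster, N+1 -> 13 (12.5 rounded up)
```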

Motivation

1. Reduce the power consumption
2. Align with Green IT strategies
3. Reduce the datacenter costs
4. Reduce the carbon footprint

Architectural Decision

Configure and enable DPM for all ESXi hosts in the cluster, with power management set to “Automatic” and the DPM threshold set to “Apply priority 3 or higher recommendations”. Set hosts 1, 2 and 3 in the cluster not to enter standby mode.

Justification

1. As the bulk of the users are inactive outside of normal business hours, a significant power saving can be achieved
2. The users do not all log in at once, which allows DPM to gradually power on ESXi hosts (which were previously put into standby mode by DPM)
3. In the event the workload is unusually low on a given day, power savings can be realized without significant impact to the end user experience
4. Where a large number of users log in unexpectedly early one morning, the impact to users will be minimal
5. DPM is configured to ensure a minimum of three (3) ESXi hosts remain on at all times. This number is expected to be able to support all desktops within the environment under low load (i.e. 80% of desktops at idle). This number can be adjusted if required.

Implications

1. In the unlikely event a large number of users log on unexpectedly early one morning, users may be impacted for the time it takes for one or more ESXi hosts to exit standby mode. This is generally less than 10 minutes for most servers.
2. Out of band interfaces such as DRAC / iLO / RSA or IMM interfaces (depending on host hardware type) will need to be configured and be accessible to vCenter and the ESXi hosts to enable DPM to function
3. As the “Percentage of cluster resources reserved for HA” setting is static (not dynamically adjusted by DPM), in the event of a host failure while one or more hosts are in standby mode, a VM may fail to power on in the unlikely event that it attempts to power on before a host has successfully exited standby mode.
4. Where large percentages of memory reservations are used (see Example AD – Memory Reservation for VDI), the ability for DPM to put one or more hosts into standby will be reduced. Where DPM is expected to be used, no more than 50% memory reservation should be configured, to ensure maximum memory overcommitment can be achieved without placing a significant overhead on the shared storage for vSwap files.
5. Monitoring solutions may need to be customized/modified not to trigger an alarm for a host that is put into standby mode
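
The trade-off in implication 4 can be made concrete with a small calculation. The sketch below assumes each desktop's vSwap file is sized at its configured memory minus its memory reservation, and that reserved memory has to be backed by physical RAM on powered-on hosts. The desktop count and memory size are hypothetical.

```python
def reservation_tradeoff_gb(desktops, vm_memory_mb, reservation_percent):
    """Return (reserved memory in GB, total vSwap file size in GB).

    Reserved memory limits how many hosts DPM can place in standby;
    vSwap files consume space on the shared storage.
    """
    reserved_mb = vm_memory_mb * reservation_percent / 100
    vswap_mb = vm_memory_mb - reserved_mb
    return desktops * reserved_mb / 1024, desktops * vswap_mb / 1024


# Hypothetical pool: 500 desktops with 2048 MB of configured memory each.
for percent in (0, 50, 100):
    reserved, vswap = reservation_tradeoff_gb(500, 2048, percent)
    print(f"{percent:>3}% reservation: {reserved:.0f} GB reserved, "
          f"{vswap:.0f} GB of vSwap files")
```

At 50% the two costs are balanced: half of the configured memory must stay backed by powered-on hosts, and the other half is provisioned as vSwap on shared storage.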

Alternatives

1. Set a lower number of hosts to remain on to maximize power savings – This may result in higher impact to users first thing in the morning in the event of high concurrent logins
2. Set a higher number of hosts to remain on – This will minimize power savings and give less value to the added complexity of setting up DPM (and associated out of band management interfaces)
3. Set the DPM threshold to be more aggressive to maximize power savings – This would likely result in some impact to VMs, as fewer physical cores would be available to the CPU scheduler and less physical memory would be available for VMs, which may result in swapping