Example Architectural Decision – DRS Automation Level

Problem Statement

What is the most suitable DRS automation level and migration threshold for a vSphere cluster running an IaaS offering with a self-service portal and unpredictable workloads?

Assumptions

1. Workload types and sizes are unpredictable in an IaaS environment; workloads may vary greatly and without notice
2. The solution needs to be as automated as possible without introducing significant risk

Motivation

1. Prevent unnecessary vMotion migrations, which would impact host and cluster performance
2. Keep the standard deviation of host load across the cluster to a minimum
3. Reduce administrative overhead of reviewing and approving DRS recommendations

Alternatives

1. Use Fully Automated and Migration threshold 1 – Apply priority 1 recommendations
2. Use Fully Automated and Migration threshold 2 – Apply priority 1 and 2 recommendations
3. Use Fully Automated and Migration threshold 4 – Apply priority 1, 2, 3 and 4 recommendations
4. Use Fully Automated and Migration threshold 5 – Apply priority 1, 2, 3, 4 and 5 recommendations
5. Set DRS to manual and have a VMware administrator assess and apply recommendations

Justification

1. Prevent excessive vMotion migrations that do not provide significant benefit to cluster balance, because each vMotion itself uses cluster and network resources
2. Ensure the cluster remains in a reasonably load-balanced state without resources being wasted on load balancing for minimal improvement
3. DRS is a low-risk, proven technology that has been used in large production environments for many years
4. Setting DRS to manual would create significant administrative overhead and introduce additional risk from human error
5. Setting a more aggressive DRS migration threshold would put additional load on the cluster and would likely not result in significantly better balance

Architectural Decision

Use DRS in Fully Automated mode with Migration threshold 3 – Apply priority 1, 2 and 3 recommendations
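
To make the relationship between the migration threshold and the recommendations DRS applies concrete, here is a minimal Python sketch. It is only a model of the rule described in this post (threshold N applies priority 1 through N recommendations) and does not talk to vCenter. The names DrsSetting, applied_priorities and auto_applies are hypothetical and exist only for this illustration; they are not part of any VMware API.

```python
from dataclasses import dataclass

# Labels used only in this sketch; they are not taken from any VMware API.
FULLY_AUTOMATED = "fully_automated"
MANUAL = "manual"


@dataclass(frozen=True)
class DrsSetting:
    """Hypothetical model of a cluster's DRS automation level and threshold."""
    automation_level: str
    migration_threshold: int  # 1 (most conservative) to 5 (most aggressive)

    def __post_init__(self):
        if not 1 <= self.migration_threshold <= 5:
            raise ValueError("migration threshold must be between 1 and 5")

    def applied_priorities(self):
        # Threshold N covers recommendations of priority 1 through N.
        return list(range(1, self.migration_threshold + 1))

    def auto_applies(self, priority):
        # Recommendations are applied without an administrator only in
        # Fully Automated mode, and only if the priority is within the threshold.
        return (
            self.automation_level == FULLY_AUTOMATED
            and priority in self.applied_priorities()
        )


# The decision in this document: Fully Automated, threshold 3.
decision = DrsSetting(FULLY_AUTOMATED, 3)
print(decision.applied_priorities())  # [1, 2, 3]
print(decision.auto_applies(4))       # False
```

Under this model, priority 4 and 5 recommendations are never acted on with the chosen setting, which is the behaviour the implications below describe.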

Implications

1. DRS will not migrate workloads via vMotion where the migration would achieve only a moderate improvement in cluster balance
2. At times, including after ESXi hosts have been updated (via VUM), the cluster may appear unevenly balanced because DRS may calculate minimal benefit from migrations. Setting DRS to Fully Automated with Migration threshold 5 for a short period of time following maintenance should result in a more evenly balanced cluster.
