What’s .NEXT 2016 – Acropolis X-Fit

Now that Acropolis Hypervisor (AHV) has been generally available for approximately 18 months (with many customers using it in production well before official GA), Nutanix has had a lot of positive feedback about its ease of deployment, management, scaling and performance. However, there has been a common theme: customers have wanted the ability to create rules to separate VMs and to keep VMs together, much like vSphere's DRS functionality.

Since the GA of AHV, it has supported some basic DRS-style functionality, including initial placement of VMs and the ability to restore a VM's data locality by migrating the VM to the node holding the most of its data locally.

These features work very well, so affinity and anti-affinity rules were the main pain point. While AHV is not designed or intended to reach feature parity with ESXi or Hyper-V, DRS-style rules are one area where similar functionality makes sense, whereas in many other areas AHV is and will remain very different to legacy hypervisors.

It should come as no surprise, then, that the AHV scheduler now provides VM/host affinity and anti-affinity rule capabilities which (similar to vSphere DRS) allow for "Should" and "Must" rules to control how the cluster enforces them.

[Image: DRS-style affinity and anti-affinity rule configuration]

Rule types which can be created:

  • VM-VM affinity: Keep VMs on the same host.
  • VM-VM anti-affinity: Keep VMs on separate hosts.
  • VM-Host affinity: Keep a given VM on a group of hosts.
  • VM-Host anti-affinity: Keep a given VM out of a group of hosts.
  • Affinity and Anti-affinity rules are cross-cluster policies.
  • Users can specify MUST as well as SHOULD enforcement of DRS rules (illustrated in the sketch below).
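
The difference between "should" and "must" enforcement can be illustrated with a small placement sketch. The Python below is purely conceptual: the names (Host, VmHostRule, pick_host) and figures are hypothetical and it is not the AHV scheduler's actual API, just one way a hard versus soft VM-host rule could constrain where a VM is placed.

```python
# Conceptual sketch only: hard (MUST) vs soft (SHOULD) VM-host affinity rules.
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    cpu_used_pct: float

@dataclass
class VmHostRule:
    allowed_hosts: set   # hosts the VM should/must run on
    must: bool           # True = hard constraint, False = preference

def pick_host(hosts, rule):
    """Return the least-loaded host that satisfies the rule.

    A MUST rule is a hard filter: if no allowed host is available the VM
    is not placed. A SHOULD rule prefers allowed hosts but falls back to
    any host rather than failing the placement.
    """
    preferred = [h for h in hosts if h.name in rule.allowed_hosts]
    candidates = preferred if rule.must else (preferred or hosts)
    if not candidates:
        return None  # MUST rule cannot be satisfied
    return min(candidates, key=lambda h: h.cpu_used_pct)

hosts = [Host("node-1", 70.0), Host("node-2", 35.0), Host("node-3", 20.0)]
rule = VmHostRule(allowed_hosts={"node-1", "node-2"}, must=False)
print(pick_host(hosts, rule).name)  # node-2: preferred hosts win over node-3
```

With must=True and none of the allowed hosts available, pick_host returns None and the placement fails rather than falling back, which is exactly the behavioural difference a MUST rule implies.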

In addition to matching the capabilities of vSphere DRS, the Acropolis X-Fit functionality is tightly integrated with both the compute and storage layers. It works to proactively identify and resolve storage and compute contention by automatically moving virtual machines while ensuring data locality is optimised.
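
As a rough illustration of that idea (not Nutanix code), the sketch below shows one way a scheduler could combine a compute hotspot check with data locality: find a host above a CPU threshold, pick its largest consumer to move, and prefer a target that already holds most of that VM's data. All names, thresholds and figures are assumptions for the example.

```python
# Illustrative only: resolve a CPU hotspot while preferring the target host
# that already holds the most of the migrating VM's data (data locality).
def propose_migration(hosts, vms, cpu_threshold=85.0):
    """hosts: {name: cpu_pct}; vms: dicts with 'name', 'host', 'cpu_pct'
    (host CPU the VM accounts for) and 'local_bytes' per candidate host."""
    hot = [h for h, cpu in hosts.items() if cpu > cpu_threshold]
    for host in hot:
        # Move the largest consumer off the hot host.
        vm = max((v for v in vms if v["host"] == host),
                 key=lambda v: v["cpu_pct"], default=None)
        if vm is None:
            continue
        # Candidate targets that stay below the threshold after the move.
        targets = [h for h, cpu in hosts.items()
                   if h != host and cpu + vm["cpu_pct"] < cpu_threshold]
        if not targets:
            continue
        # Prefer the target already holding the most of this VM's data.
        best = max(targets, key=lambda h: vm["local_bytes"].get(h, 0))
        return vm["name"], host, best
    return None

hosts = {"node-1": 92.0, "node-2": 40.0, "node-3": 50.0}
vms = [{"name": "vm-a", "host": "node-1", "cpu_pct": 30.0,
        "local_bytes": {"node-2": 50e9, "node-3": 120e9}},
       {"name": "vm-b", "host": "node-1", "cpu_pct": 10.0,
        "local_bytes": {"node-2": 10e9}}]
print(propose_migration(hosts, vms))  # ('vm-a', 'node-1', 'node-3')
```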

[Image: AHV scheduling]

There are many other exciting load balancing capabilities to come so stay tuned as the AHV scheduler has plenty more tricks up its sleeve.

Related .NEXT 2016 Posts

Nutanix Acropolis Hypervisor (AHV) Intelligent VM Load Balancing at Power On

I thought I would share a quick demonstration showing how Nutanix Acropolis Hypervisor (AHV) load balances VMs at power on to ensure optimal performance.

This is just one of the many seemingly simple functions of AHV to ensure optimal performance throughout the cluster. It's pretty simple: when AHV issues a power-on request, it queries the Acropolis Master, which continually receives compute (CPU/RAM) and storage (capacity/performance) data from the Acropolis Slaves (the other nodes) in the cluster.

When the Acropolis Master receives the power-on request, it simply looks for the node with the lowest utilization and issues the command to power on the VM.
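
A minimal sketch of that behaviour (illustrative only, not the Acropolis implementation) might look like the following, where each node reports CPU and RAM utilisation and the least-utilised node wins. The weighting and figures are assumptions for the example.

```python
# Illustrative sketch: at power-on, place the VM on the node currently
# reporting the lowest combined CPU/RAM utilisation.
def place_vm(node_stats, cpu_weight=0.5, ram_weight=0.5):
    """node_stats: {node_name: (cpu_pct, ram_pct)} as reported by each node.
    Returns the node with the lowest weighted utilisation."""
    def score(stats):
        cpu_pct, ram_pct = stats
        return cpu_weight * cpu_pct + ram_weight * ram_pct
    return min(node_stats, key=lambda n: score(node_stats[n]))

stats = {"node-1": (80.0, 65.0), "node-2": (35.0, 50.0), "node-3": (60.0, 90.0)}
print(place_vm(stats))  # node-2: lowest combined CPU/RAM utilisation
```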

This is a simple, proactive way of minimising the chance of compute contention within a cluster.

Related Articles:

  1. Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor

Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for Mixed Production Servers (1 of 2)

Problem Statement

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for an environment running mixed production server workloads?

Assumptions

1. TPS is disabled by default
2. Storage is expensive
3. Two-socket ESXi hosts have been chosen to align with a scale-out methodology.
4. The average server VM has 2-4 vCPUs and 4-8GB RAM, with some larger.
5. Memory is the first compute level constraint.
6. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
7. vSphere 5.5 or earlier

Requirements

1. The environment must deliver consistent performance
2. Minimize the cost of shared storage

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure

Architectural Decision

Leave TPS disabled (default) and leave Large Memory pages enabled (default).

Justification

1. Setting 100% memory reservations ensures consistent performance by eliminating the possibility of swapping.
2. The 100% memory reservation also eliminates the capacity consumed by the vswap file, which saves space on the shared storage as well as reducing the impact on the storage in the event of swapping.
3. RAM is cheaper than Tier 1 storage (which is recommended for vSwap storage to ensure minimal performance impact during swapping) so the increased cost of memory in the hosts is easily offset by the saving in Tier 1 shared storage.
4. Simplicity. Leaving default settings is advantageous from both an architectural and operational perspective. For example, ESXi patching can cause settings to revert to default, which could negate TPS savings and put a sudden high demand on storage where TPS savings were expected.
5. TPS savings for server workloads are typically much lower than for desktop workloads and as a result are less attractive.
6. The decision has been made to use two-socket ESXi hosts and scale out, so the TPS savings per host will be lower than on a four-socket server with double the RAM.
7. When using "Percentage of Cluster Resources Reserved for HA", admission control calculates fail-over requirements based on the full RAM reservation of every VM, so performance will be approximately the same in the event of a fail-over, giving more consistent performance under a wider range of circumstances.
8. Lower core count (and lower cost) CPUs will likely be viable as RAM will likely be the first constraint for further consolidation.
9. Removes the real or perceived security risk of sensitive information being gathered from other VMs using TPS, as described in VMware KB 2080735.

Implications

1. Using 100% memory reservations requires the ESXi hosts and the cluster to be sized at a 1:1 ratio of vRAM to pRAM (physical RAM), and should include N+1 so a host failure can be tolerated (a worked sizing example follows this list).
2. Increased RAM costs
3. No memory overcommitment can be achieved
4. Potential for lower CPU utilization / overcommitment as RAM may become the first constraint.
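
To make implication 1 concrete, here is a worked sizing example using assumed figures (8 hosts with 512GB RAM each, 150 VMs averaging 6GB): with 100% reservations there is no overcommitment, so the reserved vRAM must fit within the physical RAM of N-1 hosts to tolerate a single host failure.

```python
# Worked N+1 sizing example with 100% memory reservations (assumed figures).
hosts = 8
ram_per_host_gb = 512
vm_count = 150
avg_vm_ram_gb = 6

usable_ram_gb = (hosts - 1) * ram_per_host_gb  # N+1: hold one host's RAM back
required_vram_gb = vm_count * avg_vm_ram_gb    # 1:1 vRAM:pRAM, fully reserved

print(f"Usable pRAM with N+1: {usable_ram_gb} GB")     # 3584 GB
print(f"Reserved vRAM demand: {required_vram_gb} GB")  # 900 GB
print("Fits" if required_vram_gb <= usable_ram_gb else "Undersized")
```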

Alternatives

1. Use a 50% reservation and enable TPS
2. Use no reservation, enable TPS and disable large pages

Related Articles:

1. The Impact of Transparent Page Sharing (TPS) being disabled by default @josh_odgers (VCDX#90)

2. Example Architectural Decision – Transparent Page Sharing (TPS) Configuration for Production Servers (2 of 2)

3. Future direction of disabling TPS by default and its impact on capacity planning – @FrankDenneman (VCDX #29)

4. Transparent Page Sharing Vulnerable, Yet Largely Irrelevant – @ChrisWahl (VCDX#104)