What’s .NEXT 2016 – Acropolis X-Fit

Now that Acropolis Hypervisor (AHV) has been GA for approximately 18 months (with many customers using it in production well before official GA), Nutanix has had a lot of positive feedback about its ease of deployment, management, scaling and performance. However, a common theme has been that customers want the ability to create rules to separate VMs and to keep VMs together, much like vSphere’s DRS functionality.

Since GA, AHV has supported some basic DRS-style functionality, including initial placement of VMs and the ability to restore a VM’s data locality by migrating the VM to the node holding the most of its data locally.

These features work very well, so affinity and anti-affinity rules were the main pain point. While AHV is not designed or intended to reach feature parity with ESXi or Hyper-V, DRS-style rules are one area where similar functionality makes sense, whereas in many other areas AHV is, and will remain, very different from legacy hypervisors.

So it is no surprise that the AHV scheduler now provides VM/host affinity and anti-affinity rule capabilities which, similar to vSphere DRS, allow for “Should” and “Must” rules to control how the cluster enforces them.

[Image: DRS affinity/anti-affinity rules]

Rule types which can be created:

  • VM-VM affinity: Keep VMs on the same host.
  • VM-VM anti-affinity: Keep VMs on separate hosts.
  • VM-Host affinity: Keep a given VM on a group of hosts.
  • VM-Host anti-affinity: Keep a given VM out of a group of hosts.
  • Affinity and Anti-affinity rules are cross-cluster policies.
  • Users can specify MUST as well as SHOULD enforcement of DRS rules (see the sketch below).
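
To make the Should vs Must semantics concrete, here is a minimal sketch of how a placement engine might treat the two enforcement levels. This is purely illustrative: the names (`Rule`, `candidate_hosts`, etc.) are my own and the actual AHV scheduler internals are not public.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    kind: str           # "vm_vm_affinity" or "vm_vm_anti_affinity"
    vms: set            # names of the VMs the rule applies to
    enforcement: str    # "MUST" or "SHOULD"

def violates(rule, vm, host, placements):
    """True if placing `vm` on `host` would break `rule`.
    `placements` maps already-placed VM name -> host name."""
    if vm not in rule.vms:
        return False
    peers_here = any(p != vm and placements.get(p) == host for p in rule.vms)
    peers_elsewhere = any(p != vm and p in placements and placements[p] != host
                          for p in rule.vms)
    if rule.kind == "vm_vm_anti_affinity":
        return peers_here          # a peer is already on this host
    if rule.kind == "vm_vm_affinity":
        return peers_elsewhere     # peers are pinned to another host
    return False

def candidate_hosts(vm, hosts, rules, placements):
    """Rank hosts for `vm`: MUST violations exclude a host outright,
    SHOULD violations merely push it down the preference order."""
    ranked = []
    for host in hosts:
        broken = [r for r in rules if violates(r, vm, host, placements)]
        if any(r.enforcement == "MUST" for r in broken):
            continue  # hard constraint: host is not eligible
        soft = sum(r.enforcement == "SHOULD" for r in broken)
        ranked.append((soft, host))
    return [h for _, h in sorted(ranked, key=lambda t: t[0])]

# Example: web1 and web2 MUST stay apart; web2 already runs on host A.
rules = [Rule("vm_vm_anti_affinity", {"web1", "web2"}, "MUST")]
print(candidate_hosts("web1", ["A", "B"], rules, {"web2": "A"}))  # ['B']
```

The key difference: a MUST rule removes hosts from consideration entirely, while a SHOULD rule only changes the order in which eligible hosts are tried.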

In addition to matching the capabilities of vSphere DRS, the Acropolis X-Fit functionality is also tightly integrated with both the compute and storage layers and works to proactively identify and resolve storage and compute contention by automatically moving virtual machines while ensuring data locality is optimised.
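
As a rough illustration of that trade-off (again, my own toy model, not Nutanix code), a contention-aware scheduler can score each candidate migration by weighing the contention it relieves against the data locality the VM would give up:

```python
# Toy scoring model: all weights and field names are my own assumptions.
# Pick the migration that best relieves contention while sacrificing
# the least data locality.

def move_score(vm, src, dst, w_cpu=1.0, w_locality=0.5):
    """Higher is better. `src`/`dst` are dicts of host stats; `vm`
    carries the fraction (0..1) of its data resident on each host."""
    contention_relief = src["cpu_util"] - dst["cpu_util"]
    locality_loss = vm["locality"][src["name"]] - vm["locality"].get(dst["name"], 0.0)
    return w_cpu * contention_relief - w_locality * locality_loss

vm = {"name": "db1", "locality": {"hostA": 0.9, "hostB": 0.3}}
src = {"name": "hostA", "cpu_util": 0.95}   # hot host
dst = {"name": "hostB", "cpu_util": 0.40}
print(move_score(vm, src, dst))  # positive => the move is worthwhile
```

In a real system the locality loss is also temporary, since data locality is rebuilt on the destination node over time, which is exactly why a scheduler integrated with the storage layer can afford moves a compute-only scheduler could not evaluate.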

[Image: AHV scheduling]

There are many other exciting load balancing capabilities to come, so stay tuned; the AHV scheduler has plenty more tricks up its sleeve.


What’s .NEXT 2016 – All Flash Everywhere!

I am pleased to say Nutanix and our OEMs are now offering even more flexibility with our “Configure To Order” option (a.k.a. CTO) by allowing any node type, yes ANY node type, to be configured with all flash.

Why is this so cool? Well, Nutanix and our OEMs (Dell XC & Lenovo HX) have a wide range of models for customers to choose from, and for customers who require a large usable capacity of high-performance storage, this is a simple way to get a pre-certified solution with all the flexibility of build-your-own without the risks.

[Image: All Flash Everywhere]

With this increased level of flexibility, the argument for BYO/HCL is all but moot in my opinion.

So let’s think about what this means.

The NX-8150, a one-node-per-2RU product (which I was heavily involved in designing), will now support 24 x SSDs!

Even with the currently supported SSDs (1.92TB each), this would mean >46TB of RAW SSD capacity along with dual Broadwell CPUs and up to 768GB RAM.
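
That >46TB figure is easy to verify with a quick back-of-envelope calculation. Note the RF2 usable estimate below is a deliberate simplification of mine; real usable capacity also depends on metadata, filesystem overheads and data reduction:

```python
# Back-of-envelope capacity for a 24 x 1.92TB all-flash node.
ssds_per_node = 24
ssd_tb = 1.92

raw_tb = ssds_per_node * ssd_tb
print(f"Raw SSD capacity: {raw_tb:.2f} TB")        # 46.08 TB

# Simplified: with replication factor 2, usable capacity is roughly
# half of raw, before overheads and data reduction are applied.
print(f"Approx. usable at RF2: {raw_tb / 2:.2f} TB")
```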

Note: Higher capacity SSDs are coming soon to provide even more capacity!

Now with 24 x SSDs that is some serious power!

What’s also exciting is that this doesn’t just mean higher flash capacity, it also means higher performance. This is because the Nutanix persistent write buffer (OpLog) is striped across all SSDs in a node, so write performance can benefit from every SSD in the node; in the case of the NX-8150, that’s 24 drives!
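
A simplified way to see why striping the write buffer helps is the idealised model below. The per-SSD throughput and node-level ceiling are invented numbers of mine; real throughput is bounded by controllers, network and replication, and only the OpLog name comes from the post:

```python
# Idealised model: aggregate write throughput scales with the number
# of SSDs the persistent write buffer is striped across, until some
# other bottleneck (CPU, NIC, replication) caps it.

def oplog_write_throughput(n_ssds, per_ssd_mbps=400, ceiling_mbps=6000):
    return min(n_ssds * per_ssd_mbps, ceiling_mbps)

for n in (2, 4, 24):
    print(n, "SSDs ->", oplog_write_throughput(n), "MB/s")
# 2 SSDs ->  800 MB/s
# 4 SSDs -> 1600 MB/s
# 24 SSDs -> 6000 MB/s (capped by the assumed node-level ceiling)
```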

Combine this with the fact that Nutanix now supports any node as a storage-only node, and customers get near-unlimited flexibility without the risk/complexity of BYO/HCL options.

After all, the hardware is commodity; all the value is in the software, so who cares what HW it runs on as long as it’s reliable.

Summary:

  • Configure to Order (CTO) now allows any node type to be configured with All Flash
  • All Flash nodes can also be Storage Only nodes
  • Write Performance takes advantage of all SSDs in a node
  • Nutanix Configure to Order (CTO) option makes the argument for BYO/HCL options all but moot.


What’s .NEXT 2016 – Metro Availability Witness

In 2014, Nutanix introduced Metro Availability, which gives Virtual Machines mobility between sites as well as failover in the event of a site failure.

The goal of the Metro Availability (MA) Witness is to automate failovers in the case of inter-site network failures or site failures. By virtue of running in a different location from the two Metro sites, the Witness provides the ‘outside’ view that can determine whether a site is actually down or whether the network connection between the two sites is down, avoiding the split-brain scenario that can occur without an external Witness.

The main functions of a Witness include:

  • Making the failover decision in the event of a site or inter-site network failure
  • Avoiding split brain, where the same container is active on both sites
  • Handling situations where a single storage or network domain fails

For example, in the case of a Metro Availability (MA) relationship between clusters, a Witness residing in a separate failure domain (e.g. a 3rd site) decides which site should be activated when a split brain occurs due to a WAN failure, or when a site goes down. For that reason, it is a requirement that there are independent network connections for inter-site connectivity *and* for connections to the Witness.
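
Conceptually, the arbitration looks like the sketch below, which is my own simplification (the real Witness protocol is Nutanix internal). Each site that loses contact with its peer races to take a lock at the Witness, and only the winner may promote its container:

```python
# Simplified witness arbitration. Because the lock can only ever be
# held by one site, the container can never be active on both sites
# at once, even if both request promotion simultaneously.

class Witness:
    def __init__(self):
        self._holder = None

    def try_acquire(self, site):
        """First site to ask wins; the decision is sticky."""
        if self._holder is None:
            self._holder = site
        return self._holder == site

def on_peer_unreachable(site, witness):
    if witness.try_acquire(site):
        return f"{site}: won arbitration -> promote container, run VMs"
    return f"{site}: lost arbitration -> keep container inactive"

w = Witness()
# WAN failure: both sites notice at nearly the same time.
print(on_peer_unreachable("site1", w))
print(on_peer_unreachable("site2", w))
```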

How Metro works without a Witness:

In the event of a primary site failure (the site where the Metro container is currently active), or the links between the sites going offline, the Nutanix administrator is required to manually Disable Metro Availability and Promote the Container to Active on the site where the VMs should run. This is a quick and simple process, but it is not automated, which may impact the Recovery Time Objective (RTO).

In case of a communication failure with the secondary site (either due to the site going down or the network link between the sites going down), the Nutanix administrator can configure the system in two ways:

  • Automatic: the system automatically disables Metro Availability on the primary site’s container after a short pause if the connection to the secondary site does not recover within that time (sketched below)
  • Manual: wait for the administrator to manually take action
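
In pseudocode terms, the two modes behave roughly as follows. The 30-second default below is illustrative only, not the actual product setting:

```python
import time

def handle_secondary_unreachable(mode, recovered, timeout_s=30):
    """Sketch of primary-site behaviour when the secondary stops
    responding. `recovered` is a callable polled for reconnection;
    the timeout value is illustrative, not the real product setting."""
    if mode == "manual":
        return "wait for administrator to disable Metro manually"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if recovered():
            return "secondary back: replication resumes, Metro stays enabled"
        time.sleep(1)
    return "timeout: Metro disabled on container, writes continue locally"

print(handle_secondary_unreachable("automatic", recovered=lambda: False, timeout_s=2))
```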

[Image: Metro Availability without a Witness]

How Metro Availability (MA) works with the Witness:

With the new Witness capability, the process of disabling Metro Availability and promoting the container in case of a site outage or a network failure is fully automated, which ensures the fastest possible RTO. The Witness functionality is only used in case of a failure, meaning a failure of the Witness itself will not affect VMs running on either site.

[Image: Metro Availability with the Witness]

Failure Scenarios Addressed by the MA Witness:

There are a number of scenarios which can occur, and Metro Availability responds differently depending on whether MA is configured in “Witness mode”, “Automatic Resume mode” or “Manual mode”.

The following table details the scenarios and the behaviour based on the configuration of Metro Availability.

[Image: Metro Availability failure scenarios table]

In all cases except a failure at both Site 1 and Site 2, the MA Witness automatically handles the situation and ensures the fastest possible RTO.

The following videos show how each of the above scenarios functions.

Deployment of the Metro Availability (MA) Witness:

The Witness capability is deployed as a stand-alone Virtual Machine that can be imported onto any hypervisor in a separate failure domain, typically a 3rd site. This VM can run on non-Nutanix hardware. The witness site is expected to have dedicated network connections to Site 1 and Site 2 to avoid a single point of failure.

As a result, MA Witness is quick and easy to deploy, resulting in lower complexity and risk compared to other solutions on the market.

Summary:

The Nutanix Metro Witness completes the Nutanix Metro Availability functionality by providing completely automatic failover in case of site or networking failures.
