What’s .NEXT 2016 – Metro Availability Witness

In 2014, Nutanix introduced Metro Availability which allows Virtual Machines to have mobility between sites as well as to provide failover in the event of a site failure.

The goal of the Metro Availability (MA) Witness is to automate failovers in case of inter-site network failures or site failures. By the virtue of running the Witness in a different location than the two Metro Sites, it provides the ‘outside’ view that can determine whether a site is actually down or whether the network connection between the two sites is down, avoiding a split-brain scenario that can occur without an external Witness.

The main functions of a Witness include:

  • Making failover decision in the event of site or inter-site network failure
  • Avoiding split brain where the same container is active on both sites
  • Design to handle situations where a single storage or network domain fails

For example, in the case of a Metro Availability (MA) relationship between clusters, a Witness residing in a separate failure domain (e.g.: 3rd site) decides which site should be activated when a split brain occurs due to a WAN failure, or in the situation where a site goes down. For that reason, it is a requirement that there are independent network connections for inter-site connectivity *and* for connections to the witness.

How Metro works without a Witness:

In the event of the primary site failure (the site where the Metro container is currently active) or the links between the sites going offline, the Nutanix administrator is required to manually Disable Metro Availability and Promote the Container to Active on the site where VMs are desired to be ran. This is a quick and simple process, but it is not automated which may impact the Recovery Time Objective (RTO).

In case of a communication failure with the secondary site (either due to the site going down or the network link between the sites going down), the Nutanix administrator can configure the system in two ways:

  • Automatic: the system will automatically disable Metro Availability on the container on the primary site after a short pause if the secondary site connection doesn’t recover within that time
  • Manual: wait for the administrator to manually take action

MetroNoWitness

How Metro Availability (MA) works with the witness:

With the new Witness capability, the process of disabling Metro Availability and Promoting the Container in case of a site outage or a network failure is fully automated which ensures the fastest possible RTO.  The Witness functionality is only used in case of a failure, meaning a Witness failure itself will not affect VMs running on either site.

MetroWithWitness

Failure Scenarios Addressed by MA Witness.

There are a number of scenarios which can occur and Metro Availability responds differently depending on if MA is configured in “Witness mode”, “Automatic Resume mode” or in “Manual mode”.

The following table details the scenarios and the behaviour based on the configuration of Metro Availability.

MetroFailureScenarios3

In all cases except a failure at both Site 1 and Site 2, the MA Witness automatically handles the situation and ensures the fastest possible RTO.

The following videos show how each of the above scenarios function.

Deployment of the Metro Availability (MA) Witness:

The Witness capability is deployed as a stand-alone Virtual Machine, that can be imported on any hypervisor in a separate failure domain, typically a 3rd site.  This VM can run on non-Nutanix hardware.  This site is expected to have dedicated network connections to Site 1 and Site 2 to avoid a single point of failure.

As a result, MA Witness is quick and easy to deploy, resulting in lower complexity and risk compared to other solutions on the market.

Summary:

The Nutanix Metro Witness completes the Nutanix Metro Availability functionality by providing completely automatic failover in case of site or networking failures.

Related .NEXT 2016 Posts

Metro Availability Witness Failure Scenario 5 – Complete Network outage Site 1

Related Posts

Failure Scenario 1 – Site 1 Failure

Failure Scenario 2 – Site 2 Failure

Failure Scenario 3 – Network Partition

Failure Scenario 4 – Witness Failure

Failure Scenario 5 – Network Outage Site 1

Failure Scenario 6 – Network Outage Site 2

Failure Scenario 7 – Complete Network outage

Failure Scenario 8 – Site 1 Network outage to Witness

Failure Scenario 9 – Network Partition + Site Failure

Metro Availability Witness Failure Scenario 3 – Network Partition

Related Posts