Acropolis: VM High Availability (HA)

This past week at Nutanix .NEXT, Acropolis was officially announced, although it has actually been available and running in many customer environments (1,200+ nodes globally) for a long time.

One of the new features is VM High Availability.

As with everything Nutanix, VM HA is a simple yet effective feature. Let’s go through how to configure it via the Acropolis/PRISM HTML5 interface.

As shown below, the “Options” menu, represented by the cog, contains an option called “Manage VM High Availability”.

[Image: “Manage VM High Availability” in the PRISM Options menu]

The Manage VM High Availability dialog has two simple options, shown below:

1. Enable VM High Availability (On/Off)
2. Best Effort / Reserve Space

Best Effort works as you might expect: in the event of a node failure, VMs are powered on throughout the cluster wherever resources are available.

If resources (e.g. memory) are not available, some or all VMs may not be powered on.
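
To make that concrete, below is a minimal Python sketch of best-effort placement. Everything here is hypothetical (the greedy most-free-memory strategy, the host names and the VM sizes); it simply illustrates that VMs restart only where free memory exists, and anything that doesn’t fit stays powered off:

```python
# Conceptual sketch of best-effort HA placement, NOT Nutanix's actual
# scheduler. Host names, VM sizes and the greedy strategy are invented
# purely to illustrate the behaviour described above.

def best_effort_restart(failed_vms, hosts):
    """Greedily power on VMs from a failed node wherever memory allows.

    failed_vms: list of (vm_name, memory_gb) from the failed node
    hosts: dict of surviving host_name -> free_memory_gb
    Returns (placements, not_powered_on).
    """
    placements, skipped = [], []
    # Place the largest VMs first so they aren't starved by small ones.
    for vm, mem in sorted(failed_vms, key=lambda x: -x[1]):
        target = max(hosts, key=hosts.get)  # host with most free memory
        if hosts[target] >= mem:
            hosts[target] -= mem
            placements.append((vm, target))
        else:
            skipped.append(vm)  # best effort: this VM stays powered off
    return placements, skipped

placed, skipped = best_effort_restart(
    [("app01", 16), ("db01", 64), ("web01", 8)],
    {"node-b": 48, "node-c": 72, "node-d": 24},
)
print(placed)   # [('db01', 'node-c'), ('app01', 'node-b'), ('web01', 'node-b')]
print(skipped)  # [] here; non-empty when the cluster lacks free memory
```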

[Image: HA enabled with the Best Effort option selected]

Reserve Space also works as you might expect, reserving enough compute capacity within the cluster to tolerate either one or two node failures: if RF2 is configured, one node’s capacity is reserved; if RF3 is in use, two nodes’ capacity is reserved.
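
Here is a back-of-the-envelope sketch of the reservation maths (hypothetical node sizes; reserving the largest node(s) is an illustrative assumption, not necessarily Nutanix’s exact algorithm):

```python
# Illustrative only: reserve enough memory to tolerate N node failures,
# where N follows the redundancy factor (RF2 -> 1 node, RF3 -> 2 nodes).

def reserved_memory_gb(node_memory_gb, redundancy_factor):
    failures_to_tolerate = 1 if redundancy_factor == 2 else 2
    # Reserving the largest node(s) guarantees any failed node's VMs
    # can be restarted elsewhere.
    largest = sorted(node_memory_gb, reverse=True)[:failures_to_tolerate]
    return sum(largest)

nodes = [256, 256, 512, 256]         # a four-node cluster, one larger node
print(reserved_memory_gb(nodes, 2))  # 512 -> one node's capacity held back
print(reserved_memory_gb(nodes, 3))  # 768 -> two nodes' capacity held back
```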

Pretty simple, right?

[Image: HA enabled with the Reserve Space option selected]

The best part about Reserve Space is that it’s like “Host failures cluster tolerates” in vSphere, but without the potentially inefficient slot-size algorithm.
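
A worked example of why slot sizing can be inefficient: in vSphere the slot is derived from the largest VM reservation, so with mixed VM sizes one big VM inflates the slot for every VM in the cluster. The numbers below are made up purely for illustration:

```python
# Hypothetical numbers showing slot-size fragmentation: one large VM
# inflates the slot used for admission control across the whole cluster.

vm_memory_gb = [4] * 20 + [64]   # twenty small VMs plus one 64 GB VM
host_free_gb = 128               # usable memory per host

slot_gb = max(vm_memory_gb)           # slot sized by the largest VM: 64 GB
print(host_free_gb // slot_gb)        # 2 -> admission control says only
                                      #      2 VMs "fit" per host

print(host_free_gb // min(vm_memory_gb))  # 32 small VMs genuinely fit
```

Reserving whole-node capacity, as Reserve Space does, sidesteps this fragmentation entirely.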

Once HA is enabled, it appears on the Home screen of PRISM and gives a summary of the VMs which are On, Off and Suspended, as shown below.

[Image: HA summary on the PRISM Home screen]

HA can also be enabled/disabled on a per-VM basis via the VMs tab. Simply highlight the VM and click “Update”, as shown below.

[Image: The Update button for a selected VM in the VMs tab]

The “Update VM” popup will then appear, where you can simply enable HA.

[Image: Enabling HA in the Update VM popup]

In the above screenshot you can see that the popup also warns you if HA is disabled at the cluster level and allows you to jump straight to the Manage VM High Availability configuration menu.
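
Conceptually, the cluster-level and per-VM settings combine as a simple AND, which is why that warning matters. A trivial sketch, with invented argument names:

```python
# Toy illustration: a VM is only protected when HA is enabled at the
# cluster level AND for the VM itself (argument names are made up).

def vm_is_protected(cluster_ha_enabled: bool, vm_ha_enabled: bool) -> bool:
    return cluster_ha_enabled and vm_ha_enabled

# Enabling HA on a VM has no effect while cluster-level HA is off,
# hence the popup's warning and shortcut to the cluster-wide setting.
print(vm_is_protected(cluster_ha_enabled=False, vm_ha_enabled=True))  # False
```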

So there you have it, Acropolis VM High Availability, simple as that.

Related Articles:

1. Acropolis: Scalability
2. What’s .NEXT? – Acropolis!
3. What’s .NEXT? – Erasure Coding!

Acropolis: Scalability

One of the major focuses for Nutanix, both for our Distributed Storage Fabric (part of the Nutanix Xtreme Computing Platform, or XCP) and for the Acropolis management layer, has been scalability with consistent performance.

Predictable scalability is critical to any distributed platform, and that applies just as much to the management layer as to the data path.

This is one of the many strengths of the Acropolis management layer.

All components which are required to Configure, Manage, Monitor, Scale and Automate are fully distributed across all nodes within the cluster.

As a result, there is no single point of failure with the Nutanix/Acropolis management layer.

Let’s take a look at a typical four-node cluster:

Below we see four Controller VMs (CVMs) which service one node each. In the cluster we have an Acropolis Master along with multiple Acropolis Slave instances.

[Image: Four-node cluster with one Acropolis Master and three Acropolis Slaves]

In the event the Acropolis Master becomes unavailable for any reason, an election will take place and one of the Acropolis Slaves will be promoted to Master.

This can be achieved because Acropolis data is stored in a fully distributed Cassandra database which is protected by the Distributed Storage Fabric.
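
To make the failover concrete, here is a toy model (hypothetical CVM names; the lowest-id rule is a stand-in, not the real election protocol):

```python
# Toy model of master failover, not Nutanix's actual election code.
# Every CVM runs an Acropolis instance; when the master disappears,
# the surviving slaves elect a replacement (lowest CVM id stands in
# for a real election). Because Acropolis state lives in the shared
# distributed database, the new master can resume immediately.

def elect_master(cvms, failed):
    survivors = sorted(set(cvms) - {failed})
    return survivors[0]  # deterministic stand-in for a real election

cvms = ["cvm-1", "cvm-2", "cvm-3", "cvm-4"]
print(elect_master(cvms, failed="cvm-1"))  # cvm-2 is promoted to Master
```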

When an additional Nutanix node is added to the cluster, an Acropolis Slave is also added, which allows the workload of managing the cluster to be distributed, therefore ensuring management never becomes a point of contention.

[Image: Five-node cluster after adding a node, with its new Acropolis Slave]

Performance monitoring, stats collection and Virtual Machine console proxy connections are just a few of the management tasks serviced by Master and Slave instances.

Another advantage of Acropolis is that the management layer never needs to be sized or scaled manually. There are no vApps, database servers or Windows instances to deploy, install, configure, manage or license, therefore reducing cost and simplifying management of the environment.

Summary:

Acropolis management is automatically scaled as nodes are added to the cluster, therefore increasing consistency, resiliency and performance while eliminating the potential for architectural (sizing) errors which may impact manageability.

Note: For non-Acropolis deployments, PRISM also scales in the same manner as described above; however, the scalability of hypervisor management layers such as vCenter or SCVMM needs to be considered separately when not using Acropolis.

What’s .NEXT? – Acropolis! Part 1

By now many of you will probably have heard about Project “Acropolis”, the code name for the development effort in which Nutanix set out to create an Uncompromisingly Simple management platform for an optimized KVM hypervisor.

Drawing on Nutanix’s extensive skills and experience with products such as vSphere and Hyper-V, we took on the challenge of delivering similar enterprise-grade features on a platform as scalable and performant as NDFS.

Acropolis therefore had to be built into PRISM. The screenshot below shows the Home screen for PRISM in a Nutanix Acropolis environment; it looks pretty much the same as any other Nutanix solution, right? Simple!

[Image: PRISM Home screen in an Acropolis environment]

So let’s talk about how you install Acropolis. Since it’s the management platform for your Nutanix infrastructure, it’s a critical component, so do I need a management cluster? No!

Acropolis is built into the Nutanix Controller VM (CVM), so it is installed by default when loading the KVM hypervisor (which is actually shipped by default).

Because it’s built into the CVM, Acropolis (and therefore all the management components) automatically scales with the Nutanix cluster, so there is no need to size the management infrastructure. There is also no need to license or maintain operating systems for management tools, further reducing cost and operational expense.

The following diagram shows a four-node Nutanix NDFS cluster running the Nutanix KVM hypervisor with Acropolis. One CVM per cluster is elected the Acropolis Master and the rest of the CVMs are Acropolis Slaves.

[Image: Acropolis Master and Slaves across a four-node cluster]

The Acropolis Master is responsible for the following tasks:

  1. Scheduler for HA
  2. Network Controller
  3. Task Executors
  4. Collector/Publisher of local stats from Hypervisor
  5. VNC Proxy for VM Console connections
  6. IP address management

Each Acropolis Slave is responsible for the following tasks:

  1. Collector/Publisher of local stats from Hypervisor
  2. VNC Proxy for VM Console connections
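
The division of labour above can be summarised in a small conceptual sketch; the task names and structure are illustrative only, not Nutanix internals:

```python
# Toy model of the task split described above: every CVM services its
# own node, and whichever CVM is currently master takes on the
# cluster-wide duties as well. Names are invented for illustration.

PER_NODE_TASKS = {"stats collection", "VNC console proxy"}
MASTER_ONLY_TASKS = {"HA scheduling", "network control",
                     "task execution", "IP address management"}

def tasks_for(cvm: str, master: str) -> set:
    duties = set(PER_NODE_TASKS)        # every slave services its node
    if cvm == master:
        duties |= MASTER_ONLY_TASKS     # the master adds cluster-wide work
    return duties

print(sorted(tasks_for("cvm-2", master="cvm-1")))  # per-node tasks only
print(sorted(tasks_for("cvm-1", master="cvm-1")))  # per-node + master duties
```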

Acropolis is a truly distributed management platform with no dependency on external database servers, and it is fully resilient with in-built self-healing capabilities, so that in the event of node or CVM failures, management continues without interruption.

What does Acropolis do? Put simply, the things 95% of customers need, including but not limited to:

  • High Availability (Think vSphere HA)
  • Load Balancing / Virtual Machine Migrations (Think DRS & vMotion)
  • Virtual machine templates
  • Cloning (Instant and space efficient like VAAI-NAS)
  • VM operations / snapshots / console access
  • Centralized configuration of nodes (think vSphere Host Profiles)
  • Centralized management of virtual networking (think vSphere Distributed Switch)
  • Performance monitoring of physical HW, hypervisor & VMs (think vRealize Operations Manager)

Summary: Acropolis combines a best-of-breed hyperconverged platform with an enterprise-grade KVM management solution, dramatically simplifying the design, deployment and ongoing management of datacenter infrastructure.

In the next few parts of this series I will explore the above features and the advantages of the Acropolis solution.