Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 5 – Resiliency

When discussing resiliency, it is common to make the mistake of only looking at data resiliency and not considering resiliency of the storage controllers and the management components required to service the business applications.

Legacy technologies such as RAID and hot spare drives may in some cases provide high resiliency for data; however, if they are backed by a dual-controller setup which cannot scale out and self-heal, the data may be unavailable, or performance/functionality severely degraded, following even a single component failure. Infrastructure that is dependent on hardware replacement to restore resiliency following a failure is fundamentally flawed, as I have discussed in: Hardware support contracts & why 24×7 4 hour onsite should no longer be required.

In addition, if the management application layer is not resilient, then data layer high availability/resiliency may be irrelevant, as the business applications may not be functioning properly (i.e.: at normal speeds) or at all.

The Acropolis platform provides high resiliency for both the data and management layers at a configurable N+1 or N+2 level (Resiliency Factor 2 or 3), which can tolerate up to two concurrent node failures without losing access to management or data. Beyond that, with “Block Awareness”, an entire block (up to four nodes) can fail and the cluster still maintains full functionality. This takes the resiliency of data and management components on XCP up to N+4.

In addition, the larger the XCP cluster, the lower the impact of a node/controller/component failure. For a four node environment, N-1 is a 25% impact, whereas for an eight node cluster N-1 is just a 12.5% impact. In contrast, when a dual controller SAN suffers a single controller failure, the impact is in many cases a 50% degradation, and a subsequent failure would result in an outage. Nutanix XCP environments self-heal, so even in an environment only configured for N-1, subsequent failures can be tolerated following a self-heal without causing high impact or outages.
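To make those impact figures concrete, below is a minimal Python sketch (my own illustration, not Nutanix code) that calculates the share of cluster resources lost when a single node fails and applies a rough capacity check for whether the cluster can self-heal after that failure. The 70% utilisation figure and the evenly-spread-data assumption are purely illustrative.

    # Illustrative only: impact of a single node failure on an N-node cluster,
    # plus a rough check of whether the survivors can absorb the lost node's data.
    def node_failure_impact(nodes: int) -> float:
        """Share of cluster resources lost when one node fails."""
        return 1.0 / nodes

    def can_self_heal(nodes: int, used_capacity_pct: float, failures: int = 1) -> bool:
        """Assume data is evenly spread; after 'failures' nodes are lost,
        the data must still fit within the surviving nodes' capacity."""
        surviving = nodes - failures
        return surviving > 0 and (used_capacity_pct * nodes) / surviving < 100.0

    for n in (4, 8, 16, 32):
        print(f"{n} nodes: one failure = {node_failure_impact(n):.1%} impact, "
              f"self-heal possible at 70% used = {can_self_heal(n, 70.0)}")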

In the event the Acropolis Master instance fails, full functionality returns to the environment after an election which completes in under 30 seconds. This equates to management availability greater than “six nines” (99.9999%). Importantly, AHV has this management resiliency built in; it requires zero configuration!
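As a quick sanity check on the “six nines” claim, the arithmetic below shows what a 30-second election outage equates to over a year, assuming (purely for illustration) one such failover per year.

    # Illustrative availability math: one 30-second master re-election per year.
    seconds_per_year = 365 * 24 * 60 * 60      # 31,536,000 seconds
    downtime_seconds = 30
    availability = (seconds_per_year - downtime_seconds) / seconds_per_year
    print(f"{availability:.5%}")               # ~99.99990%, i.e. roughly six nines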

For more information see: Acropolis: Scalability

As for data availability, regardless of hypervisor, the Nutanix Distributed Storage Fabric (DSF) maintains two or three copies of data/parity, and in the event of an SSD/HDD or node failure, the configured RF is restored by all nodes within the cluster.

Data Resiliency

While we have just covered why resiliency of data is not the only important factor, it is still key. After all, if a solution which provides shared storage loses data, it's not fit for purpose in any datacenter.

As data resiliency is such a foundation of the Nutanix Distributed Storage Fabric, the data resiliency status is displayed on the Prism Home screen. In the screenshot below we can see that both steady-state resiliency and the capacity required to rebuild after a failure (Rebuild Capacity) are tracked.

In this example, all data in the cluster is compliant with the configured Resiliency Factor (RF2 or 3) and the cluster has at least N+1 available capacity to rebuild after the loss of a node.

[Screenshot: dataresiliency1 – Data Resiliency status on the Prism Home screen]

To dive deeper into the resiliency status, simply click on the above box and it will expand to show more granular detail of the failures which can be tolerated.

The screenshot below shows that components such as Metadata, the OpLog (persistent write cache) and back-end functions such as Zookeeper are also monitored, with alerts raised when required.

[Screenshot: resiliency2 – expanded Data Resiliency detail]

In the event any of these components is not in a normal or “Green” state, Prism will alert the administrator. If the alert is caused by a node failure, Prism automatically notifies Nutanix support (via Pulse), which dispatches the required part/s, although typically an XCP cluster will self-heal long before the hardware is delivered, even with an aggressive hardware maintenance SLA such as 4hr onsite.
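To illustrate why self-healing typically completes well within a 4-hour hardware SLA, here is a rough back-of-the-envelope model in Python. The per-node rebuild throughput and the amount of data to re-protect are assumptions chosen purely for the example; real figures depend on node type, drive mix and cluster load.

    # Rough, illustrative estimate of self-heal time after losing one node.
    # Assumed figures only; not measured Nutanix performance data.
    def rebuild_minutes(data_to_reprotect_tb: float, surviving_nodes: int,
                        per_node_mbps: float = 200.0) -> float:
        """All surviving nodes participate in the rebuild, so aggregate
        throughput grows with cluster size."""
        aggregate_mbps = surviving_nodes * per_node_mbps
        seconds = (data_to_reprotect_tb * 1_000_000) / aggregate_mbps  # TB -> MB
        return seconds / 60

    for nodes in (4, 8, 32):
        print(f"{nodes}-node cluster, 5 TB to re-protect: "
              f"~{rebuild_minutes(5.0, nodes - 1):.0f} minutes")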

This is yet another example of Nutanix not being dependent on Hardware (replacement) for resiliency.

Data Integrity

A write I/O should only be acknowledged to a guest operating system once the data has been written to persistent media, because until that point data loss is possible even when the storage is protected by battery-backed cache and uninterruptible power supplies (UPS).

The only advantage to acknowledging writes before this has occurred is performance, but what good is performance when your data lacks integrity or is lost?

Another commonly overlooked requirement of any enterprise grade storage solution is the ability to detect and recover from silent data corruption. Acropolis performs checksums in software for every write AND every read. Importantly, Nutanix is in no way dependent on the underlying hardware or any 3rd party software to maintain data integrity; all checksumming and remediation (where required) is handled natively.

Pro tip: If a storage solution does not perform checksums on Write AND Read, DO NOT use it for production data.

In the event of Silent Data Corruption (which can impact any storage device from any vendor), the checksum will fail and the I/O will be serviced from another replica which is stored on a different node (and therefore physical SSD/HDD). If a checksum fails in an environment with Erasure Coding, EC-X recalculates the data the same way as if a HDD/SSD failed and services the I/O.

In the background, the Nutanix Distributed Storage Fabric will discard the corrupted data and restore the configured Resiliency Factor from the good replica or stripe where EC-X is used.
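The following Python sketch illustrates the general verify-on-read pattern described above. It is a conceptual illustration only, not Nutanix code; the replica list and CRC32 checksum are stand-ins for however DSF actually stores and validates its copies.

    import zlib

    # Conceptual verify-on-read with replica fallback (not Nutanix code).
    def read_with_checksum(extent_id, replicas):
        """replicas: list of (data_bytes, stored_checksum) tuples, one per copy
        held on a different node/drive. Return the first copy that verifies."""
        for node, (data, stored_checksum) in enumerate(replicas):
            if zlib.crc32(data) == stored_checksum:
                return data  # healthy copy found, serve the I/O from it
            # Checksum mismatch: silent corruption detected on this copy.
            print(f"extent {extent_id}: copy on node {node} failed checksum, trying next copy")
        raise IOError(f"extent {extent_id}: no healthy copy found")

    good = b"application data"
    corrupted = b"applicatXon data"
    print(read_with_checksum("extent-42", [(corrupted, zlib.crc32(good)),
                                           (good, zlib.crc32(good))]))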

This checksum-and-repair process is completely transparent to the virtual machine and end user, but it is a critical component of XCP's resiliency. The underlying Distributed Storage Fabric (DSF) also automatically protects all Acropolis management components; this is one of the many advantages of the Acropolis architecture, where all components are built together, not bolted on afterwards.

An Acropolis environment with a container configured with RF3 (Replication Factor 3) provides N+2 management availability. As a result, it would take three concurrent node failures, an extraordinarily unlikely event, before a management outage could potentially occur. Luckily, XCP has an answer for this admittedly unlikely scenario as well: Block Awareness. With three or more blocks, the cluster can tolerate the failure of an entire block (up to four nodes) without data or management going offline.

Part of the Acropolis resiliency story goes back to the lack of complexity. Acropolis enables rolling 1-click upgrades and includes all functionality. There is no single point of failure; in the worst-case scenario, if the node running the Acropolis Master fails, the Master role will restart on a surviving node within 30 seconds and initiate power-on of the affected VMs. Again, this is in-built functionality, not an additional or 3rd party solution which needs to be designed, installed and maintained.

The above points are largely functions of the XCP rather than AHV itself, so I thought I would highlight AHV's load balancing and failover capabilities.

Unlike traditional 3-tier infrastructure (i.e.: SAN/NAS), Nutanix solutions do not require multi-pathing, as all I/O is serviced by the local controller. As a result, there is no multi-pathing policy to choose, which removes another layer of complexity and a potential point of failure.

However, in the event the local CVM is unavailable for any reason, I/O for all VMs on that node still needs to be serviced in the most efficient manner. AHV does this by redirecting I/O at a per-vDisk level to a random remote Stargate instance, as shown below.

[Diagram: pervmpathfailover – per-vDisk path failover]

AHV can do this because every vDisk is presented via iSCSI as its own target/LUN, which means it has its own TCP connection. As a result, a business critical application such as MS SQL, Exchange or Oracle with multiple vDisks will be serviced by multiple controllers concurrently.

As a result all VM I/O is load balanced across the entire Acropolis cluster which ensures no single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or maintenance scenario.
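The per-vDisk redirection described above can be modelled with a few lines of Python: when the local CVM is unavailable, each vDisk independently picks a remote Stargate, so a multi-vDisk VM ends up spread across several controllers. This is a conceptual sketch with made-up CVM names, not the actual AHV data path implementation.

    import random

    # Conceptual model of per-vDisk I/O path selection (not the real AHV code).
    def select_paths(vdisks, local_cvm, remote_cvms, local_cvm_healthy):
        """Map each vDisk to the CVM that will service its iSCSI connection."""
        if local_cvm_healthy:
            # Normal operation: all I/O is serviced by the local CVM (data locality).
            return {vdisk: local_cvm for vdisk in vdisks}
        # Local CVM down (e.g. during an upgrade): each vDisk independently
        # picks a random remote Stargate, spreading the VM's I/O across the cluster.
        return {vdisk: random.choice(remote_cvms) for vdisk in vdisks}

    vdisks = ["sql-os", "sql-data1", "sql-data2", "sql-logs"]
    print(select_paths(vdisks, "cvm-a", ["cvm-b", "cvm-c", "cvm-d"], local_cvm_healthy=False))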

For more information see: Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

Summary:

  1. Out-of-the-box self-healing capabilities for:
    1. SSD/HDD/Node failure/s
    2. Acropolis and Prism (management layer)
  2. In-built data integrity with software-based checksums
  3. The ability to tolerate up to four concurrent node failures
  4. Management availability of >99.9999% (six “nines”)
  5. No dependency on hardware for data or management resiliency

For more information see: Ensuring Data Integrity with Nutanix – Part 2 – Forced Unit Access (FUA) & Write Through

Back to the Index

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 3 – Scalability

Scalability is not just about the number of nodes that can form a cluster or the maximum storage capacity. The more important aspect of scalability is how an environment expands from many perspectives, including management, performance, capacity and resiliency, and how scaling affects operations.

Let's start with scalability of the components required to manage/administer AHV:

Management Scalability

AHV automatically sizes all Management components during deployment of the initial cluster, or when adding node/s to the cluster. This means there is no need to do initial sizing or manual scaling of XCP management components regardless of the initial and final size of the cluster/s.

Where a Resiliency Factor of 3 (N+2) is configured, the Acropolis management components are automatically scaled to meet the N+2 requirement. Let's face it, there is no point having N+2 at one layer and not at another, because availability, like a chain, is only as good as its weakest link.

Storage Capacity Scaling

The Nutanix Distributed Storage Fabric (DSF) has no maximum storage capacity. Additionally, storage capacity can be scaled separately from compute with “storage-only” nodes such as the NX-6035C.

Storage-only nodes run AHV (which is interoperable with the other supported hypervisors), allowing customers to scale capacity regardless of hypervisor. Storage-only nodes do not require hypervisor licensing or separate management, and they fully support all one-click upgrades for the Acropolis Base Software and AHV, just like compute+storage nodes. As a result, storage-only nodes are effectively invisible, apart from the increased capacity and performance they deliver.

Nutanix storage-only nodes help eliminate the problems of scaling capacity with traditional storage; for more information see: Scaling problems with traditional shared storage.

One of the scaling problems with traditional storage is adding shelves of drives without scaling data services/management. This leads to issues such as lower IOPS/GB and a higher impact on workloads in the event of component failures such as storage controllers.

Scaling storage-only nodes is remarkably simple. For example, a customer added 8 x NX-6035C nodes to his vSphere cluster from his laptop on the showroom floor of vForum Australia in October this year.

https://twitter.com/josh_odgers/status/656999546673741824

As each storage-only node is added to the cluster, a lightweight Nutanix CVM joins the cluster to provide data services, ensuring management and performance capabilities scale linearly and avoiding the scaling problems which plague traditional storage.

For more information on Storage only nodes, see: http://t.co/LCrheT1YB1

Compute Scalability

Enabling HA within a cluster requires reserving the equivalent of one or more nodes for HA, which can create unnecessary inefficiencies when the hypervisor limits the maximum cluster size. AHV has no limit on the number of nodes within a cluster, so it helps avoid the unnecessary silos that lead to inefficient use of infrastructure because one or more nodes per cluster must be reserved for HA. AHV nodes are also automatically configured with all required settings when joining an existing cluster; all the administrator needs to provide is basic IP address information, then press Expand Cluster and Acropolis takes care of the rest.
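To illustrate the silo point with some simple numbers, the sketch below compares the HA reservation overhead of splitting an environment into several capped clusters against running one large cluster, each reserving the equivalent of one node per cluster for HA. The node counts and the 32-node cap are arbitrary examples, not limits of any particular hypervisor.

    import math

    # Illustrative HA reservation overhead when an environment must be split
    # into clusters no larger than 'max_cluster_size' (example figures only).
    def ha_overhead(total_nodes, max_cluster_size, reserved_per_cluster=1):
        clusters = math.ceil(total_nodes / max_cluster_size)
        return (clusters * reserved_per_cluster) / total_nodes

    print(f"96 nodes as 32-node clusters: {ha_overhead(96, 32):.1%} of nodes reserved for HA")
    print(f"96 nodes as one cluster:      {ha_overhead(96, 96):.1%} of nodes reserved for HA")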

See the below demo showing how to expand a Nutanix cluster:

Analytics Scalability

AHV includes built-in analytics and, as with the other Acropolis management components, the analytics components are sized automatically during initial deployment and scale automatically as nodes are added.

This means there is never a tipping point where an administrator is required to scale or deploy new analytics instances or components. The analytics functionality and its performance remain linear regardless of scale.

As a result, AHV eliminates the requirement for separate software instances and database/s to provide analytics.

Resiliency Scalability

As Acropolis uses the Nutanix Distributed Storage Fabric, in the event drive/s or node/s fail, all nodes within the cluster participate in restoring the configured Resiliency Factor (RF) for the impacted data. This occurs regardless of hypervisor; however, AHV also includes fully distributed management components, so the larger the cluster, the more resilient the management layer becomes.

For example, the loss of a single node in a 4-node cluster would have potentially a 25% impact on the performance of the management components. In a 32-node cluster, a single node failure would have a much lower potential impact of only 3.125%. As an AHV environment scales, the impact of a failure decreases and the ability to self-heal increases in both speed to recover and number of subsequent failures which can be supported.

For information about increasing resiliency of large clusters, see: Improving Resiliency of Large Clusters with EC-X

Performance Scalability

Regardless of hypervisor, as XCP clusters grow, performance improves. The moment new node(s) are added to the cluster, the additional CVM/s start participating in management and data resiliency tasks, even when no VMs are running on those nodes. Adding new nodes allows the Storage Fabric to distribute RF traffic among more controllers, which enhances write I/O and resiliency while helping decrease latency.
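The sketch below gives a rough feel for that effect: each RF2 write keeps one copy local and places the second copy on a randomly chosen peer, so the share of replication traffic any single remote node receives shrinks as the cluster grows. It is a simplified model for illustration only, not the actual DSF placement algorithm.

    import random
    from collections import Counter

    # Simplified model of RF2 replica traffic distribution (not the real DSF logic).
    def busiest_peer_share(nodes: int, writes_per_node: int = 1000) -> float:
        """Simulate writes where the second copy lands on a random remote node,
        then return the busiest node's share of all replica traffic."""
        counts = Counter()
        for source in range(nodes):
            peers = [n for n in range(nodes) if n != source]
            for _ in range(writes_per_node):
                counts[random.choice(peers)] += 1
        return max(counts.values()) / sum(counts.values())

    for n in (4, 8, 32):
        print(f"{n} nodes: busiest peer receives ~{busiest_peer_share(n):.1%} of replica traffic")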

The advantage AHV has over the other supported hypervisors is that the performance of the management components (many of which have been discussed previously) dynamically scales with the cluster. As with analytics, AHV management components scale out; there is never a tipping point requiring a manual scale-out of management or the deployment of new instances of management components or their dependencies.

Importantly, for all components, XCP distributes data and management functions across all nodes within the cluster. Acropolis does not use “mirrored” components/hardware or objects, which ensures no node or component becomes a bottleneck or single point of failure.

Back to the Index

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 2 – Simplicity

Let me start by saying I believe complexity is one of the biggest, and potentially the most overlooked, issues in modern datacenters.

Virtualization has enabled increased flexibility and solved countless problems within the datacenter, but over time I have observed an increase in complexity, especially around the management components, which for many customers is a major pain point.

Complexity leads to things like increased cost (both CAPEX & OPEX) and risk, which commonly leads to reduced availability/performance.

In Part 10, I will cover Cost in more depth so let’s park it for the time being.

When architecting solutions for customers, my number one goal is to meet/exceed all my customers’ requirements with the simplest solution possible.

[Image: anyfool]

Acropolis is where web-scale technology delivers enterprise grade functionality with consumer-grade simplicity, and with AHV the story gets even better.

Removing Dependencies

A great example of the simplicity of the Nutanix Xtreme Computing Platform (XCP) is its lack of external dependencies. There is no requirement for any external databases when running Acropolis Hypervisor (AHV) which removes the complexity of designing, implementing and maintaining enterprise grade database solutions such as Microsoft SQL or Oracle.

This is even more of an advantage when you take into account the complexity of deploying these platforms in highly available configurations such as AlwaysOn Availability Groups (SQL) or Real Application Clusters (Oracle RAC), where SMEs need to be engaged for design, implementation and maintenance. As a result of not being dependent on 3rd party database products, AHV reduces or removes complexity around product interoperability and the need to call multiple vendors if something goes wrong. It also means no more investigating Hardware Compatibility Lists (HCLs) and interoperability matrices when performing upgrades.

Management VMs

Only a single management virtual machine (Prism Central) needs to be deployed, even for multi-cluster, globally distributed AHV environments. Prism Central is an easy to deploy appliance and, since it is stateless, it does not require backing up. In the event the appliance is lost, an administrator simply deploys a new Prism Central appliance and connects it to the clusters, which can be done in a matter of seconds per cluster. No historical data is lost, as the data is maintained on the clusters being managed.

Because Acropolis requires no additional components, it all but eliminates the design/implementation and operational complexity for management compared to other virtualization / HCI offerings.

Other supported hypervisors commonly require multiple management VMs and backend databases even for relatively small scale/simple deployments just to provide basic administration, patching and operations management capabilities.

Acropolis has zero dependencies during the installation phase; customers can implement a fully featured AHV environment without any existing hardware/software in the datacenter. Not only does this make initial deployment easy, it also removes the complexity around interoperability when patching or upgrading in the future.

Ease of Management

Nutanix XCP clusters running any hypervisor can be managed individually using Prism Element or centrally via Prism Central.

Prism Element requires no installation; it is available and performs optimally out-of-the-box. Administrators can access Prism Element via the XCP Cluster IP address or via any Controller VM IP address.

Administrators of legacy virtualization products often need to use hypervisor-specific tools to complete various tasks, requiring design, deployment and management of these components and their dependencies. With AHV, all hypervisor-level functionality is completed via Prism, providing a true single pane of glass for everything from storage, compute, backup and data replication to hardware monitoring and more.

The image below shows the PRISM Central Home Screen that provides a high-level summary of all clusters in the environment. From this screen, you can drill down to individual clusters to get more granular information where required.

[Screenshot: PRISMcentraloverview – Prism Central Home screen]

Administrators perform all upgrades from PRISM without the requirement for external update management applications/appliances/VMs or supporting back end databases.

Prism performs one-click, fully automated rolling upgrades of all components including the hypervisor, the Acropolis Base Software (formerly known as NOS), firmware and Nutanix Cluster Check (NCC).

For a demo of Prism Central see the following YouTube video:

Further Reduced Storage Complexity

Storage has long been, and continues for many customers to be, a major hurdle to successful virtual environments. Nutanix has essentially made storage invisible over the past few years by removing the requirement for dedicated Storage Area Networks, Zoning, Masking, RAID and LUNs. When combined with AHV, XCP has taken this innovation yet another big step forward by removing the concepts of datastores/mounts and virtual SCSI controllers.

For each Virtual Machine disk, AHV presents the vDisk directly to the VM, and the VM simply sees the vDisk as if it were a physically attached drive. There is no in-guest configuration. It just works.

This means there is no complexity around how many virtual SCSI controllers to use or where to place a VM or vDisk, and as such Acropolis has eliminated the requirement for advanced features, such as vSphere's Storage DRS, to manage virtual machine placement and capacity.

Don’t get me wrong, Storage DRS is a great feature which helps solve serious problems with traditional storage.  With XCP these problems just don’t exist.

For more details see:  Storage DRS and Nutanix – To use, or not to use, that is the question?

The following screen shot shows just how simple vDisks appear under the VM configuration menu in Prism Element. There is no need to assign vDisks to Virtual SCSI controllers which ensures vDisks can be added quickly and perform optimally.

[Screenshot: VMdisks – vDisks in the VM configuration menu in Prism Element]

Node Configuration

Configuring an AHV environment via Prism automatically applies all changes to each node within the cluster. Critically, Acropolis Host Profiles functionality does not need to be enabled or configured, nor do Administrators have to check for compliance or create/apply profiles to nodes.

In AHV all networking is fully distributed similar to the vSphere Distributed Switch (VDS) from VMware. AHV network configuration is automatically applied to all nodes within the cluster without requiring the administrator to attach nodes/hosts to the virtual networking. This helps ensure a consistent configuration throughout the cluster.

The reason the above points are so important is that each dramatically simplifies the environment by removing (not just abstracting) many complicated design/configuration items, such as:

  • Multipathing
  • Deciding how many datastores are required and what size each should be
  • Considering how many VMs should reside per datastore/LUN
  • Configuration maximums for datastores/paths
  • Managing consistent configuration across nodes/hosts
  • Managing network configuration

Administrators can optionally connect Acropolis' built-in authentication to an Active Directory domain, removing the requirement for additional Single Sign-On components. All Acropolis components include high availability out-of-the-box, removing the requirement to design (and license) HA solutions for individual management components.

Data Protection / Replication

The Nutanix CVM includes built-in data protection and replication components, removing the requirement to design/deploy/manage one or more Virtual Appliances. This also avoids the need to design, implement and scale these components as the environment grows.

All of the data protection and replication features are also available via Prism and, importantly, are configured on a per VM basis making configuration easier and reducing overheads.

Summary

In summary, the simplicity of AHV eliminates:

  1. Single points of failure for all management components, out of the box
  2. The requirement for dedicated management clusters for Acropolis components
  3. Dependency on 3rd party operating systems and database platforms
  4. The requirement for design, implementation and ongoing maintenance of virtualization management components
  5. The need to design, install, configure and maintain a web or desktop type management client
  6. Complexity such as:
    1. The requirement to install software or appliances to allow patching/upgrading
    2. The requirement for an SME to design a solution to make management components highly available
    3. The requirement to follow complex hardening guides to achieve security compliance
    4. The requirement for additional appliances/interfaces and external dependencies (i.e.: database platforms)
  7. The requirement to license features to allow centralised configuration management of nodes

Back to the Index