Nutanix AHV I/O path efficiency

The I/O path in AHV is unlike that of other hypervisors and is remarkably simple. Each VM is made up of one or more vDisks, with each vDisk presented directly to the VM via iSCSI. To the guest OS, a vDisk appears just like a physical disk (or a VMDK in a vSphere environment) and requires no special in-guest configuration.

The I/O path for each vDisk bypasses the underlying QEMU storage stack and uses a direct TCP connection to the iSCSI target on the local Controller VM. This bypasses any and all queues at the hypervisor layer and allows Stargate to manage the one and only queue.

Importantly, every single vDisk has its own TCP connection to Stargate, which means vDisks do not share any queues until they hit the storage controller (Stargate). This reduces the points of contention to Stargate itself, and because every AHV node runs a Stargate instance (within the CVM), only VMs on the same node share that node's Stargate queue, further reducing the chances of contention.
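To make the queuing model concrete, here is a minimal Python sketch (purely illustrative; the class names are stand-ins, not Nutanix code) of the idea that each vDisk holds its own dedicated connection to the local Stargate, so the only queue an I/O ever sits in is Stargate's own.

```python
# Illustrative model only: Stargate and VDisk here are stand-ins, not real APIs.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Stargate:
    """Stands in for the Stargate service inside the local CVM."""
    queue: deque = field(default_factory=deque)  # the single queue per node

    def submit(self, io):
        self.queue.append(io)  # all queuing happens here, not in QEMU


@dataclass
class VDisk:
    """Each vDisk is its own iSCSI target with a dedicated TCP connection."""
    name: str
    stargate: Stargate  # direct connection to the local Stargate

    def write(self, payload):
        # No hypervisor-level queue sits between the vDisk and Stargate.
        self.stargate.submit((self.name, payload))


local_stargate = Stargate()
vdisks = [VDisk(f"vdisk-{i}", local_stargate) for i in range(3)]
for vd in vdisks:
    vd.write(b"data")

print(len(local_stargate.queue))  # 3 I/Os, queued only at Stargate
```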

For those of you who are not familiar with the underlying Nutanix architecture, check out the below video describing what Stargate does.

Because each vDisk is presented as a LUN via iSCSI, no SCSI protocol emulation is required; the guest simply sends native SCSI commands.

The below diagram shows a VM with 3 vDisks and how they connect to Stargate. You will note QEMU is completely bypassed, which optimises the I/O path.

[Diagram: AHV I/O path, with the VM's vDisks connecting directly to Stargate and bypassing QEMU]

If a virtual machine has more than 3 vDisks, each additional vDisk will have its own TCP connection.

In the event the local Stargate instance is offline for any reason (e.g. a rolling One-Click upgrade or CVM failure), each TCP connection will be redirected in a round-robin manner across all the CVMs within the Nutanix cluster, as described in Acropolis Hypervisor (AHV) I/O Failover & Load Balancing.
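As a rough picture of that redirection, here is a hedged sketch (the CVM names and vDisk count are made up, and this is not Nutanix code): each vDisk's connection is simply re-pointed to the remaining CVMs in turn once the local Stargate goes offline.

```python
# Hypothetical example: the CVM names and the vDisk count are made up.
from itertools import cycle

local_cvm = "CVM-A"
remote_cvms = cycle(["CVM-B", "CVM-C", "CVM-D"])  # remaining cluster members

# Under normal operation every vDisk connects to the local Stargate on CVM-A.
vdisk_targets = {f"vdisk-{i}": local_cvm for i in range(6)}

# The local Stargate goes offline (e.g. a rolling One-Click upgrade of CVM-A):
# each vDisk's TCP connection is redirected round-robin across the remote CVMs.
for vdisk in vdisk_targets:
    vdisk_targets[vdisk] = next(remote_cvms)

for vdisk, target in vdisk_targets.items():
    print(vdisk, "->", target)
# vdisk-0 -> CVM-B, vdisk-1 -> CVM-C, vdisk-2 -> CVM-D, vdisk-3 -> CVM-B, ...
```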

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Advanced Storage Performance Monitoring with Nutanix

3. Why AHV is the next generation hypervisor – 10 Part Series

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 10 – Cost

You may be surprised that cost is so far down the list, but as you have probably realized from reading the previous 9 parts, AHV is in many ways a superior virtualization platform to other products on the market. In my opinion, it would be a mistake to think AHV is a “low-cost option” or “a commodity hypervisor with limited capabilities” just because it happens to be included with Starter Edition (making it effectively free for all Nutanix customers).

Apart from the obvious removal of hypervisor and associated management component licensing/ELA costs, the real cost advantage of using AHV is the dramatic reduction in effort required in the design, implementation and operational verification phases, as well as in ongoing management.

This is due to many factors:

Simplified Design Phase

As all AHV management components are built in, highly available and auto-scaling, there is no need to engage a Subject Matter Expert (SME) to design the management solution. As someone who has designed countless highly available virtualization solutions over the years, I can tell you that AHV out of the box is what I have all but dreamed of creating for customers with other products in the past.

Simplified Implementation Phase

All management components (with the exception of Prism Central) are deployed automatically, removing the requirement for an engineer to install, patch and harden these components.

Building Acropolis and all management components into the CVM means there are fewer moving parts that can go wrong and that therefore need to be verified.

In my experience, Operational Verification is one of the areas regularly overlooked, and infrastructure is put into production without having proven it meets the design requirements and outcomes. With AHV management components deployed automatically, the risk of components not delivering is all but eliminated, and where Operational Verification is performed, it can be completed much faster than with traditional products due to there being far fewer moving parts.

Simplified ongoing operations

Acropolis provides One-Click fully automated rolling upgrades for Acropolis Base Software (formerly known as NOS), the Acropolis Hypervisor, firmware and Nutanix Cluster Check (NCC). In addition, upgrades can be automatically downloaded, removing the risk of installing incompatible versions and the requirement to check things such as Hardware Compatibility Lists (HCLs) and interoperability matrices before upgrades.

AHV dramatically simplifies capacity management by requiring it only at the Storage Pool layer; there is no requirement for administrators to manage capacity across LUNs, NFS mounts or Containers. This capability also eliminates the requirement for well-known hypervisor features such as vSphere’s Storage DRS.

Reduced 3rd party licensing costs

All management components are included with AHV or, in the case of Prism Central, come as a prepackaged appliance. There is no need to license any operating systems. The highly resilient management components on every Nutanix node eliminate the requirement for 3rd party database products such as Microsoft SQL or Oracle, or, at best, the deployment of virtual appliances which may not be highly available and which need to be backed up and maintained.

Reduced Management infrastructure costs

It is not uncommon for virtualization solutions to require a dozen or more management components (each potentially on a dedicated VM), even for small deployments, to provide functionality such as centralized management, patching and performance/capacity management. As deployments grow or have higher availability requirements, the number of management VMs and their compute requirements tend to increase.

As all management components run within the Nutanix Controller VM (CVM), which resides on each Nutanix node, there is no need to have a dedicated management cluster. The amount of compute/storage resources required is also reduced.

The indirect cost savings for the reduced management infrastructure include:

  1. Less rack space (RU)
  2. Less power/cooling
  3. Fewer network ports
  4. Fewer compute nodes
  5. Lower storage capacity & performance requirements

Last but not least, what about the costs associated with maintenance windows or outages?

Because Acropolis provides fully non-disruptive one-click upgrades and removes numerous points of failure (e.g. 3rd party databases) while providing an extremely resilient platform, AHV also reduces the cost of maintenance and outages to the customer.

Summary:

  1. No design required for Acropolis management components
  2. No ongoing maintenance required for management components
  3. Reduced complexity reduces the chance of downtime as a result of human error

Back to the Index

Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

Many customers and partners have expressed interest in Acropolis since it was officially launched at .NEXT in June earlier this year, and since then lots of questions have been asked about resiliency and availability.

In this post I will cover how I/O failover occurs and how AHV load balances in the event of I/O failover to ensure optimal performance.

Let’s start with an Acropolis node under normal circumstances. The iSCSI initiator for QEMU connects to the iSCSI redirector, which directs all I/O to the local Stargate instance running within the Nutanix Controller VM (CVM), as shown below.

[Diagram: normal operation, with QEMU connected to the local Stargate via the iSCSI redirector]

I/O will always be serviced by the local Stargate unless a CVM upgrade, shutdown or failure occurs. In the event one of the above occurs, QEMU will lose its connection to the local Stargate as shown below.

[Diagram: QEMU loses its connection to the failed local Stargate]

When this loss of connectivity to Stargate occurs, QEMU reconnects to the iSCSI redirector and establishes a connection to a remote Stargate as shown below.

[Diagram: QEMU redirected to a remote Stargate]

The process of re-establishing an iSCSI connection is near instantaneous, and you will likely not even notice it has occurred.

Once the local Stargate is back online (and stable for 300 seconds), I/O will be redirected back locally to ensure optimal performance.

[Diagram: I/O failing back to the local Stargate]

In the unlikely event that the remote Stargate goes down before the local Stargate is back online, the iSCSI redirector will redirect traffic to another remote Stargate.
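Putting the failover and failback rules above together, here is a simplified sketch of the decision the redirector effectively makes per connection. The 300-second stability window comes from the behaviour described above; the function and names are illustrative assumptions rather than the actual implementation.

```python
# Simplified, illustrative stand-in for the redirector's failover/failback choice.
import random
import time

FAILBACK_STABILITY_SECONDS = 300  # local Stargate must be healthy this long


def choose_target(local_healthy_since, remote_cvms, now=None):
    """Return where a vDisk's iSCSI connection should point.

    local_healthy_since: timestamp when the local Stargate came back online,
                         or None if it is currently down.
    """
    now = time.time() if now is None else now
    if local_healthy_since is not None and (
        now - local_healthy_since >= FAILBACK_STABILITY_SECONDS
    ):
        return "local Stargate"            # fail back for optimal performance
    return random.choice(remote_cvms)      # otherwise use a remote Stargate


remotes = ["CVM-B", "CVM-C", "CVM-D"]
print(choose_target(None, remotes))                 # local down -> remote
print(choose_target(time.time() - 30, remotes))     # back only 30s -> remote
print(choose_target(time.time() - 600, remotes))    # stable for 600s -> local
```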

Next, let’s talk about load balancing.

Unlike traditional 3-tier infrastructure (i.e. SAN/NAS), Nutanix solutions do not require multipathing, as all I/O is serviced by the local controller. As a result, there is no multipathing policy to choose, which removes another layer of complexity and a potential point of failure.

However, in the event the local CVM is unavailable for any reason, I/O for all the VMs on the node still needs to be serviced in the most efficient manner. Acropolis does this by redirecting I/O on a per-vDisk basis to a random remote Stargate instance, as shown below.

[Diagram: per-vDisk I/O failover spread across remote Stargate instances]

Acropolis can do this because every vDisk is presented via iSCSI as its own target/LUN, which means it has its own TCP connection. This means a business-critical application such as MS SQL, Exchange or Oracle with multiple vDisks will be serviced by multiple controllers concurrently.
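To illustrate how those per-vDisk connections spread a single VM's I/O during a local CVM outage, here is a small hedged example (the CVM and vDisk names are made up): each vDisk independently lands on a remote Stargate, so a multi-vDisk VM is serviced by several controllers at once.

```python
# Made-up CVM and vDisk names; this only models the per-vDisk spread of I/O.
import random

random.seed(7)  # fixed seed so the example output is repeatable

remote_cvms = ["CVM-B", "CVM-C", "CVM-D"]
sql_vm_vdisks = ["os", "sql-data-1", "sql-data-2", "sql-logs", "tempdb"]

# With the local CVM down, each vDisk's iSCSI connection independently lands
# on a remote Stargate, so the VM's I/O is served by several controllers.
placement = {vdisk: random.choice(remote_cvms) for vdisk in sql_vm_vdisks}

for vdisk, cvm in placement.items():
    print(f"{vdisk:<11} -> {cvm}")
print("controllers servicing this VM:", len(set(placement.values())))
```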

As a result, all VM I/O is load balanced across the entire Acropolis cluster, which ensures no single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or maintenance scenario.

As I’m sure you can now see, Acropolis provides excellent resiliency and performance even during maintenance or failure scenarios.

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Advanced Storage Performance Monitoring with Nutanix

3. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

4. Nutanix – Erasure Coding (EC-X) Deep Dive

5. Acropolis: VM High Availability (HA)

6. Acropolis: Scalability

7. NOS & Hypervisor Upgrade Resiliency in PRISM