Nutanix Scalability – Part 4 – Storage Performance for Monster VMs with AHV!

In Part 3 we learned a number of ways to scale storage performance for a single VM, including but not limited to:

  • Using multiple PVSCSI controllers
  • Using multiple virtual disks
  • Spreading large workloads (like databases) across multiple vDisks/Controllers
  • Increasing the CVM’s vCPUs and/or vRAM
  • Adding storage only nodes
  • Using Acropolis Block Services (ABS)

Here at Nutanix, especially in the Solutions/Performance engineering team, we’re never satisfied and we’re always pushing for more efficiency, which leads to greater performance.

A colleague of mine, Michael Webster (NPX#007 and VCDX#66), was a key part of the team that designed and developed what is now known as “Volume Group Load Balancer”, or VG LB for short.

Volume Group Load Balancer is a capability exclusive to the Acropolis Hypervisor (AHV) which combines the IO path efficiencies of AHV Turbo Mode with the benefits of the Acropolis Distributed Storage Fabric (ADSF) to create a simpler and more dynamic version of Acropolis Block Services (ABS).

One major advantage of VG LB over ABS is its simplicity.

There is no requirement for in-guest iSCSI, which removes the potential for driver and configuration issues, and VG LB is configured through the Prism UI using the Update VM option, making it a breeze to set up.

[Screenshot: Prism “Update VM” dialog showing a Volume Group being attached]

The only complexity with VG LB currently is that enabling the load balancing functionality must be done via the Acropolis CLI (acli) using the following command:

acli vg.update Insert_vg_name_here load_balance_vm_attachments=true
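
To verify the change has taken effect, the Volume Group’s configuration can be inspected from any CVM with the standard acli vg.get command (the exact output fields vary by AOS version):

acli vg.get Insert_vg_name_here

The output should include the load_balance_vm_attachments flag set to true.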

In the event you do not wish all Controller VMs to provide IO for VG LB, one or more CVMs can be excluded from load balancing. However, I recommend leaving the cluster to sort itself out, as the Acropolis Dynamic Scheduler (ADS) will move virtual disk sessions if CVM contention is discovered.

iSCSI sessions are also dynamically balanced when the workload on an individual CVM exceeds 85% utilisation, ensuring hot spots are quickly alleviated. This is another reason CVMs should not be excluded: doing so is likely to constrain performance for the VG LB VM unnecessarily.

VG LB is how Nutanix has achieved >1 MILLION 8k random read IOPS at just 0.11ms latency from a single VM as shown below.

This was achieved using just a 10-node cluster; imagine what can be achieved when you scale out the cluster further.

A frequently asked question relating to high-performance VMs is: what happens when you vMotion?

The link above shows this in detail, including a YouTube demonstration, but in short: IO dropped below 1 million IOPS for approximately 3 seconds during the vMotion, with the lowest value recorded at 956k IOPS. I’d say an approximately 10% drop for 3 seconds is pretty reasonable, as the performance dip is caused by the migration stunning the VM and not by the underlying storage.

The next question is: “What about mixed read/write workloads?”

Again the link above shows this in detail, including a YouTube demonstration, but at this stage you’re probably not surprised by the result: from a baseline of 436k random read and 187k random write IOPS, performance dipped to 359k read and 164k write IOPS immediately following the migration, before exceeding the original baseline at 446k read and 192k write IOPS within a few seconds.

So not only can Nutanix VG LB achieve fantastic performance, it can do so during normal day-to-day operations such as VM live migrations.

The VG LB capability is unique to Nutanix and is only achievable thanks to the true Distributed Storage Fabric.

With Nutanix’ highly scalable software-defined storage and unique capabilities like storage only nodes, AHV Turbo and VG LB, the question “Why?” seriously needs to be asked of anyone recommending a SAN.

I’d appreciate any constructive questions/comments on use cases which you believe Nutanix cannot handle, and I’ll follow up with a blog post explaining how it can be done, or confirm if it’s not currently supported/recommended.

Summary:

Part 3 taught us that Nutanix provides excellent scalability for virtual machines, and provides ABS for niche workloads which may require more performance than a single node can offer. Part 4 explains how Nutanix’ next generation hypervisor (AHV) provides further enhanced and simplified performance for monster VMs, with Volume Group Load Balancing leveraging Turbo Mode.

Back to the Scalability, Resiliency and Performance Index.

Nutanix Scalability – Part 1 – Storage Capacity

It never ceases to amaze me that analysts, as well as prospective and existing customers, are frequently unaware of the storage scalability capabilities of the Nutanix platform.

When I joined back in 2013, a common complaint was that Nutanix had to scale in fixed building blocks of NX-3050 nodes, with compute and storage combined, regardless of what the actual requirement was.

Not long after that, Nutanix introduced the NX-1000 and NX-6000 series, which offered lower and higher CPU/RAM and storage capacity options and gave more flexibility, but there were still some use cases where Nutanix had significant gaps.

In October 2013 I wrote a post titled “Scaling problems with traditional shared storage” which covers why simply adding shelves of SSD/HDD to a dual controller storage array does not scale an environment linearly, can significantly impact performance and adds complexity.

At .NEXT 2015, Nutanix announced the ability to scale storage separately to compute, which allowed customers to scale capacity by adding the equivalent of a shelf of drives, as they could with their legacy SAN/NAS, but with the added advantage of a storage controller (the Nutanix CVM) providing additional data services, performance and resiliency.

Storage only nodes are supported with any hypervisor, and the good news is they run Nutanix’ Acropolis Hypervisor (AHV), which means no additional hypervisor licensing even if you run VMware ESXi. Storage only nodes still support all the 1-click rolling upgrades, so they add no additional management overhead.

Advantages of Storage Only Nodes:

  1. Ability to scale capacity separately to CPU/RAM, like a traditional disk shelf on a storage array
  2. Ability to start small and scale capacity if/when required, i.e.: No oversizing day 1
  3. No hypervisor licensing or additional management when scaling capacity
  4. Increased data services/resiliency/performance thanks to the Nutanix Controller VM (CVM)
  5. Ability to increase capacity for hot and cold data (i.e.: All Flash and Hybrid/Storage heavy)
  6. True storage only nodes, and the way data is distributed to them, are unique to Nutanix

Example use cases for Storage Only Nodes

Example 1: Increasing capacity requirement:

MS Exchange Administrator: I’ve been told by the CEO to increase our mailbox limits from 1GB to 2GB but we don’t have enough capacity.

Nutanix: Let’s start small and add storage only nodes as the Nutanix cluster (storage pool) reaches 80% utilisation.
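
As a rough way to keep an eye on that threshold, storage pool utilisation can be checked in the Prism dashboard, or from any CVM via ncli. A minimal example (assuming the standard storagepool entity; output fields vary by AOS version):

ncli storagepool ls

The used and free capacity values reported give a quick view of how close the pool is to 80%.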

Example 2: Increasing flash capacity:

MS SQL DBA: We’re growing our mission critical database and now we’re hitting SATA for some day-to-day operations; we need more flash!

Nutanix: Let’s add some all flash storage only nodes.

Example 3: Increasing resiliency

CEO/CIO: We need to be able to tolerate failures and have the infrastructure self heal, but we have a secure facility which is difficult and time consuming to get access to. What can we do?

Nutanix: Let’s add some storage only nodes to ensure you have sufficient capacity (All Flash and/or Hybrid) to tolerate “n” number of failures and rebuild the environment back to a fully resilient and performant state.
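
As a simplified illustration of the capacity maths involved (hypothetical numbers, assuming RF2 and ignoring CVM and metadata overhead): a 5 node cluster with 20TB raw per node provides 100TB raw, or roughly 50TB usable. For the cluster to fully self heal after a single node failure, the surviving 4 nodes must be able to hold all of the data, so usage should be kept below approximately (5-1)/5 x 50TB = 40TB. Each storage only node added increases both the usable capacity and the self heal headroom.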

Example 4: Implementing Backup / Long Term Retention

CEO/CIO: We need to be able to keep 7 years of data for regulatory requirements and we need to be able to access it within 1hr.

Nutanix: We can either add storage only nodes to one or more existing clusters OR create a dedicated Backup/Retention cluster. Let’s start with enough capacity for Year 1, and then as capacity is required, add more storage only nodes as the cost per GB drops over time. Nutanix allows mixing of hardware generations so you’ll never be in a situation where you need to rip & replace.

Example 5: Supporting one or more Monster VMs

Server Administrator: We have one or more VMs with storage capacity requirements of 100TB each, but the largest Nutanix node we have only supports 20TB. What do we do?

Nutanix: The Acropolis Distributed Storage Fabric (ADSF) allows a VM’s data set to be distributed throughout a Nutanix cluster, ensuring any storage requirement can be met. Adding storage only nodes will ensure sufficient capacity while adding resiliency/performance to all other VMs in the cluster. Cold data will be distributed throughout the cluster, while frequently accessed data will remain local where possible, within the local storage capacity of the node where the VM runs.

For more information on this use case see: What if my VMs storage exceeds the capacity of a Nutanix node?

Example 6: Performance for infrequently accessed data (cold data).

Server Administrator: We have always stored our cold data on SATA drives attached to our SAN because we have a lot of data and flash is expensive. Once or twice a year we need to do a bulk read of our data for auditing/accounting purposes, but it’s always been so slow. How can we solve this problem and get good performance while keeping costs down?

Nutanix: Hybrid storage only nodes are a cost effective way to store cold data, and combined with ADSF, Nutanix is able to deliver optimum read performance from SATA by reading from the replica (copy of data) with the lowest latency.

This means if an HDD or even a node is experiencing heavy load, ADSF will dynamically redirect read I/O throughout the cluster to Deliver Increased Read Performance from SATA. This capability was released in 2015, and storage only nodes, by adding more spindles to a cluster, are very complementary to it.

Frequently asked questions (FAQ):

  1. How many storage only nodes can a single cluster support?
    1. There is no hard limit; typically cluster sizes are kept below 64 nodes, as it’s important to limit the size of a single failure domain.
  2. How many Compute+Storage nodes are required to use Storage Only nodes?
    1. Two. This also allows N+1 failover for the nodes running VMs, so in the event a compute+storage node fails, VMs can be restarted. Technically, you can create a cluster with only storage only nodes.
  3. How does adding storage only nodes increase capacity for my monster VM?
    1. By distributing replicas of data throughout the cluster, thus freeing up local capacity for the running VM/s on the local node. Where a VM’s storage requirement exceeds the local node’s capacity, storage only nodes add capacity and performance to the storage pool. Note: one VM, even with only one monster vDisk, can use the entire capacity of a Nutanix cluster without any special configuration.

Summary:

For many years Nutanix has supported and recommended the use of Storage only nodes to add capacity, performance and resiliency to Nutanix clusters.

Back to the Scalability, Resiliency and Performance Index.

Free Training – Virtualizing Business Critical Applications (vBCA)

Just found another great free self-paced training course offered by VMware. This one is focused on one of my favourite topics, Virtualizing Business Critical Applications.

The course covers things like which business critical applications can be virtualized efficiently, as well as common customer objections, some of which are FUD or fiction.

It also covers the use cases, best practices and value propositions for virtualizing each business-critical application.

One important area the course covers (which can be hard to find reliable information on) is the licensing requirements for applications such as Oracle databases, SAP and the Microsoft suite, e.g.: SQL / Exchange / SharePoint.

Kudos to VMware for releasing this training free of charge. The link to access the course is below.

Virtualizing Business-Critical Applications [V5.X] – Customer