The value of the hyperscaler + hypervisor model

Public cloud offerings from “hyperscalers” such as AWS EC2, Microsoft Azure and Google GCP provide a lot of value when it comes to standing up and running virtual workloads quickly, and they provide various capabilities for creating globally resilient solutions.

All of these offerings also boast a wide range of native services which can complement or replace services running in traditional virtual machines.

As I’ve previously stated in a post from August 2022, Direct to Cloud Value – Part 1, the hyperscalers have two major advantages customers can benefit from:

  1. A well-understood architecture
  2. Global availability

Designing, deploying and maintaining “on-premises” infrastructure, on the other hand, is often far less attractive from a time-to-value perspective and requires significant design effort by highly qualified, experienced (and well paid) individuals to get anywhere close to the scalability, reliability and functionality of the hyperscalers.

On-premises infrastructure may also not be cost effective for smaller customers/environments that don’t have the quantity of workloads/data to justify it, so at a high level “native” public cloud solutions are often a great choice for these customers.

The problem for many customers is that they’re established businesses with a wide range of applications from numerous vendors, many of which are not easy to simply migrate to a public cloud provider.

Workload refactoring is often a time-consuming and complex task that cannot always be completed in a timely manner, and in many cases cannot be completed at all.

Customers also rarely have the luxury of starting fresh and/or building a greenfield environment, due to the overall cost and/or the need to get a return on investment (ROI) from existing infrastructure.

Customers often need to burst during peak periods, which isn’t easily achievable on-premises; instead, they must significantly oversize their on-premises infrastructure just to support end of month, end of quarter or peak periods such as “Black Friday” for retailers.

This oversizing does help mitigate risk and deliver business outcomes, but it comes at a high capital (CAPEX) cost.
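
To put some rough numbers on this (purely hypothetical ones, for illustration only), the cost of sizing for a short peak can be sketched as follows:

```python
# Hypothetical example: on-premises capacity sized for peak vs. average demand.
# All numbers below are illustrative only, not from a real customer environment.

peak_demand_cores = 2000      # cores needed during Black Friday / end of quarter
average_demand_cores = 1100   # cores needed during a typical week
peak_weeks_per_year = 6       # weeks per year the environment actually runs at peak

# On-premises, the environment must be purchased (CAPEX) for the peak.
purchased_cores = peak_demand_cores

idle_cores = purchased_cores - average_demand_cores
idle_fraction = idle_cores / purchased_cores
normal_weeks = 52 - peak_weeks_per_year

print(f"{idle_fraction:.0%} of purchased capacity sits idle "
      f"for roughly {normal_weeks} weeks of the year")
# -> 45% of purchased capacity sits idle for roughly 46 weeks of the year
```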

Enter the “Hyperscaler + Hypervisor” model.

The hyperscaler + hypervisor model is where the hyperscaler (AWS/Azure/Google) provides bare metal servers (a.k.a. instances) on which a hypervisor (in the above example, VMware ESXi) runs along with Virtual SAN (a.k.a. “vSAN”) to provide the entire VMware technology stack for running Virtual Machines (VMs).

Nutanix has a similar offering called “Nutanix Cloud Clusters” or “NC2” using its own hypervisor, AHV.

Both the VMware and Nutanix offerings give customers the same look and feel they have today on-premises.

The advantages of the hyperscaler + hypervisor model are enormous from both a business and a technical perspective; the following are just a few examples.

  • Ease of Migration

A migration of VMware-based workloads from an existing on-premises environment can be achieved using a variety of methods, including VMware native tools such as HCX as well as third-party tools from backup vendors such as Commvault, without having to refactor workloads.

This avoids the cost, complexity and delay of refactoring workloads.

  • Consistent look and feel

The hyperscaler + hypervisor options give customers access to the same management tools they’re used to on-premises, meaning minimal adjustment is required for IT teams.

  • Built-in Cloud exit strategy / No Cloud Vendor “Lock in”

The hypervisor layer allows customers to quickly move from one hyperscaler to another, again without refactoring, giving customers real bargaining power when negotiating commercial arrangements.

It also enables a move off public cloud back to on-premises.

  • Faster Time to value

The ability to stand up net new environments, typically within a few hours, lets customers respond to unexpected situations as well as new projects without the time and complexity of procuring, designing and implementing new environments from the ground up.

One very important value here is the ability to respond to critical situations such as ransomware by standing up an entirely isolated, net new infrastructure to restore known good data. This is virtually impossible to do quickly on-premises.

  • Lower Risk

In the event of a significant commercial/security/technical issue, a hyperscaler + hypervisor environment can be scaled up, migrated to a new environment/provider or isolated.

This model also mitigates the delays caused by under-sizing or failure scenarios where new hardware needs to be added, as this can typically occur within an hour or so as opposed to days, weeks or months.

As in the next example, workloads can simply be “lifted and shifted” minimising the number of changes/risks involved with a public cloud migration.

In the event of hardware failures, new hardware can be added back to the environment(s) straight away without waiting for replacement hardware to be shipped, arrive and be installed. This greatly reduces the chance of double or subsequent failures impacting the environment.

In the case of a disaster such as a region failure, a new region can be scaled up to restore production, whereas standing up or scaling up a new on-premises environment is unlikely to occur in a timely manner.

  • Avoiding the need to “re-factor” workloads

Simply lifting and shifting workloads “as-is” on the same underlying hypervisor ensures the migration can occur with as few dependencies (and risks) as possible.

  • Provides excellent performance

The hardware provided by these offerings varies, but it is often all-NVMe storage with latest or close-to-latest generation CPU/memory, ensuring customers are not stuck with older-generation hardware.

Having all workloads share a pool of NVMe storage also avoids the issue where some instances (VMs) are assigned to a lower tier of storage due to commercial cost constraints which can have significant downstream effects on other workloads/applications.

The all NVMe option in hyperscalers + hypervisor solutions becomes cost effective due to the economies of scale and elimination of “Cloud waste” which I will discuss next.

In many cases customers will be moving from hardware and storage solutions that are several years old; simply having an all-NVMe storage layer can reduce latency and in turn make more efficient use of CPU/memory, often resulting in significant performance improvements even before newer generation CPUs are taken into account.

  • Economies of scale

In many cases, purchasing on a per instance (VM) basis may be attractive in the beginning, but once you reach a certain number of workloads it makes more sense to buy in bulk (i.e.: a bare metal instance) and run the workloads on top of a hypervisor.

This gives the customer the benefit of the hypervisor’s ability to efficiently and effectively oversubscribe CPU, and with a hyper-converged (HCI) storage layer (Virtual SAN a.k.a. vSAN, or Nutanix AOS) customers benefit from native data reduction capabilities such as compression, deduplication and erasure coding.
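
As a rough sketch of why buying in bulk can pay off, the following calculation compares how many VMs a single bare metal node could host once CPU oversubscription and data reduction are factored in. The oversubscription and data reduction ratios are assumptions for illustration only; real results depend entirely on the workload mix.

```python
# Rough sketch of the "buy in bulk" economics. The oversubscription and data
# reduction ratios are assumptions for illustration only; real-world results
# depend heavily on the workload mix.

node_physical_cores = 48       # cores on one bare metal instance
node_raw_storage_tb = 30.0     # raw local storage on that instance

vm_vcpus = 4                   # a typical VM profile in this example
vm_disk_tb = 0.5

cpu_oversubscription = 3.0     # assumed safe vCPU:pCore ratio for this workload mix
data_reduction_ratio = 2.0     # assumed combined compression/dedup/erasure coding saving

vms_by_cpu = (node_physical_cores * cpu_oversubscription) // vm_vcpus
vms_by_storage = (node_raw_storage_tb * data_reduction_ratio) // vm_disk_tb

vms_per_node = int(min(vms_by_cpu, vms_by_storage))
print(f"VMs per bare metal node (limited by the first resource to run out): {vms_per_node}")
# -> 36 VMs in this example; compare that with the cost of 36 individually purchased native instances
```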

  • Avoids native cloud instance constraints a.k.a “Cloud waste”

Virtual Machine “right-sizing” remains one of the most under-rated tasks to this day, yet it can provide not only lower cost but significant performance improvements for VMs. Cloud waste occurs when workloads are forced into pre-defined instance sizes and resources such as vCPUs or vRAM are assigned to the VM but not actually required or used.

With the hypervisor layer in place, instance sizes can be customised to the exact requirements, eliminating cloud waste, which I’ve personally observed in many customer environments to be in the range of 20-40%.

Credit: Steve Kaplan for coining the term “Cloud Waste”.
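
A minimal sketch of how a figure like that 20-40% can be estimated for a VM inventory (the inventory and numbers below are entirely hypothetical):

```python
# Minimal sketch of estimating "cloud waste": resources allocated to VMs (or
# forced on them by fixed instance sizes) versus what the workloads actually
# need. The inventory below is hypothetical.

vms = [
    # (allocated_vcpus, required_vcpus, allocated_gb_ram, required_gb_ram)
    (8, 4, 32, 20),
    (16, 8, 64, 48),
    (4, 4, 16, 8),
]

allocated_vcpus = sum(v[0] for v in vms)
required_vcpus = sum(v[1] for v in vms)
allocated_ram = sum(v[2] for v in vms)
required_ram = sum(v[3] for v in vms)

vcpu_waste = (allocated_vcpus - required_vcpus) / allocated_vcpus
ram_waste = (allocated_ram - required_ram) / allocated_ram

print(f"vCPU waste: {vcpu_waste:.0%}, RAM waste: {ram_waste:.0%}")
# -> vCPU waste: 43%, RAM waste: 32%
```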

  • Increased Business Continuity / Disaster Recovery options

The cost and complexity involved in building business continuity and disaster recovery (BC/DR) solutions often leads to customers having to accept, and try to mitigate, significant risks to their businesses.

The hyperscaler + hypervisor model provides a number of options for very cost-effective BC/DR solutions, including across multiple providers to mitigate against large global provider outages.

  • An OPEX commercial model

The ability to commit to a minimum monthly spend to get the most attractive rates, while having the flexibility to burst when required (albeit at a less attractive price), means customers don’t have to fund large CAPEX projects and can scale in a “just in time” fashion.
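
As a simple illustration of the commit-plus-burst commercial model (the rates and node counts are made-up example values, not any provider’s actual pricing), a monthly bill could be modelled like this:

```python
# Simple model of a "commit + burst" monthly bill. Rates and node counts are
# made-up example values, not any provider's actual pricing.

committed_nodes = 10
committed_rate_per_node = 4_000    # discounted monthly rate for committed capacity
on_demand_rate_per_node = 6_500    # higher rate for burst / on-demand nodes

def monthly_cost(nodes_needed: int) -> int:
    """Committed nodes are paid for regardless; anything above them bursts at on-demand rates."""
    burst_nodes = max(0, nodes_needed - committed_nodes)
    return (committed_nodes * committed_rate_per_node
            + burst_nodes * on_demand_rate_per_node)

print(monthly_cost(10))   # normal month -> 40000
print(monthly_cost(14))   # peak month   -> 66000 (4 burst nodes at the higher rate)
```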

Cost

This sounds too good to be true, so what about cost?

On face value, these offerings can appear expensive compared to on-premises equivalents, but from the numerous assessments I’ve conducted I am confident the true cost is comparable to, or even lower than, on-premises, especially when a proper Total Cost of Ownership (TCO) analysis is performed.

Compared with “native cloud”, i.e. running workloads without the hypervisor layer, the hyperscaler + hypervisor solution will typically save customers 20-40% while providing equal or better performance and resiliency.

One other area which can make costs higher than necessary is a lack of workload optimisation. For both the on-premises and hyperscaler models, I highly recommend customers engage an experienced architect to review their environment thoroughly.

The performance benefits of a right-sizing exercise are typically significant, AND it frees up valuable IT resources (CPU/RAM). It also means less hardware is required to achieve the same or an even better outcome, thereby lowering costs.

Summary

The hyperscaler + hypervisor model has many advantages both commercially and technically, and with the ease of setup, migration and scaling in public cloud, I expect this model to become extremely popular.

I would strongly recommend that anyone looking at replacing their on-premises infrastructure in the near future do a thorough assessment of these offerings against their business goals.

End-2-End Enterprise Architecture (@E2EEA) has multiple highly experienced staff certified at the highest level with both VMware (VCDX) and Nutanix (NPX) technologies, and can provide expert-level services to help you assess the hyperscaler + hypervisor options as well as design and deliver the solution.

E2EEA can be reached at sales@e2eea.com

Nutanix Resiliency – Part 7 – Read & Write I/O during Hypervisor upgrades

If you haven’t already reviewed Parts 1 through 4, please do so, as they cover critical resiliency factors around speed of recovery from failures, increasing resiliency on the fly by converting from RF2 to RF3, and using Erasure Coding (EC-X) to save capacity while providing the same resiliency level.

Parts 5 and 6 covered how read and write I/O function during CVM maintenance or failure, and in this Part 7 of the series we look at the impact hypervisor (ESXi, Hyper-V, XenServer and AHV) upgrades have on read and write I/O.

This post will refer to Parts 5 and 6 heavily so they are required reading to fully understand this post.

As covered in Part 5 and 6, no matter what the situation with the CVM, read and write I/O continues to be served and data remains in compliance with the configured Resiliency Factor.

In the event of a hypervisor upgrade, Virtual Machines are first migrated off the node and continue normal operations. In the event of a hypervisor failure, the Virtual Machines would be restarted by HA and then resume normal operations.

Whether it be a hypervisor (or node) failure or a hypervisor upgrade, both scenarios ultimately result in the VM(s) running on a new node while the original node (Node 1 in the diagram below) is offline, with the data on its local drives unavailable for a period of time.

[Diagram: Write I/O during a host failure or hypervisor upgrade]

Now how does read I/O work in this scenario? The same way as described in Part 5: reads are serviced remotely, OR if the second replica happens to be on the node the Virtual Machine migrated (or was restarted by HA) onto, the read is serviced locally. If a remote read occurs, the 1MB extent is localised to ensure subsequent reads are local.

How about write I/O? Again, as per Part 6, all writes are always in compliance with the configured Resiliency Factor, regardless of whether it’s a hypervisor upgrade OR a CVM, hypervisor, node, network, disk or SSD failure. One replica is written to the local node and the subsequent one or two replicas are distributed throughout the cluster based on each node’s current performance and capacity.
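
The read and write behaviour described in Parts 5 and 6 can be summarised with a small conceptual sketch. This is a toy model of the placement rules only, not Nutanix’s actual implementation; the node names and capacity figures are made up.

```python
# Toy model of the read/write behaviour described above. This is NOT Nutanix
# code; it only illustrates the placement rules conceptually.

free_capacity_tb = {"node1": 3.1, "node2": 2.4, "node3": 5.0, "node4": 4.2}  # made up

def place_write(local_node, rf=2):
    """One replica is always written to the local node; the remaining rf-1 replicas
    go to other nodes, here picked by free capacity as a stand-in for ADSF's
    capacity/performance based selection."""
    remote = sorted((n for n in free_capacity_tb if n != local_node),
                    key=lambda n: free_capacity_tb[n], reverse=True)
    return [local_node] + remote[:rf - 1]   # write acknowledged only once all replicas commit

def read_extent(vm_node, replica_nodes):
    """Serve the read locally if a replica exists on the VM's node; otherwise read
    remotely and localise the 1MB extent so subsequent reads are local."""
    if vm_node in replica_nodes:
        return "local read"
    replica_nodes.append(vm_node)           # extent localised after the remote read
    return "remote read, extent localised"

replicas = place_write("node2", rf=2)       # -> ['node2', 'node3']
print(read_extent("node4", replicas))       # remote read, extent localised
print(read_extent("node4", replicas))       # local read
```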

It really is that simple, and this level of resiliency is achieved thanks to the Acropolis Distributed Storage Fabric (ADSF).

Summary:

  1. A hypervisor failure never impacts the write path of ADSF
  2. Data integrity is ALWAYS maintained even in the event of a hypervisor (node) failure
  3. A hypervisor upgrade is completed without disruption to the read/write path
  4. Reads continue to be served either locally or remotely regardless of upgrades, maintenance or failure
  5. During hypervisor failures, Data Locality is maintained with writes always keeping one copy locally where the VM resides for optimal read/write performance during upgrades/failure scenarios.

Index:
Part 1 – Node failure rebuild performance
Part 2 – Converting from RF2 to RF3
Part 3 – Node failure rebuild performance with RF3
Part 4 – Converting RF3 to Erasure Coding (EC-X)
Part 5 – Read I/O during CVM maintenance or failures
Part 6 – Write I/O during CVM maintenance or failures
Part 7 – Read & Write I/O during Hypervisor upgrades
Part 8 – Node failure rebuild performance with RF3 & Erasure Coding (EC-X)
Part 9 – Self healing
Part 10 – Disk Scrubbing / Checksums

What’s .NEXT 2017 – AHV Turbo Mode

Back in 2015 I wrote a series titled “Why Nutanix Acropolis Hypervisor (AHV) is the next generation hypervisor” which covered off many reasons why AHV was and would become a force to be reckoned with.

In short, AHV is the only purpose built hypervisor for hyper-converged infrastructure (HCI) and it has continued to evolve in terms of functionality and maturity while becoming a popular choice for customers.

How popular you ask? Nutanix officially reported 23% adoption as a percentage of nodes sold in our recent third quarter fiscal year 2017 financial highlights.

Over the last couple of years I have personally worked with numerous customers who have adopted AHV, especially for business critical applications such as MS SQL and MS Exchange.

One such example is Shinsegae, a major retailer running 50,000 MS Exchange mailboxes on Nutanix with AHV as the hypervisor. Shinsegae also runs MS SQL workloads on the same platform, which has now become its standard platform for all workloads.

This is just one example of AHV being proven in the field, at scale, to have the functionality, resiliency and performance to support business critical workloads.

But at Nutanix we’re always striving to deliver more value to our customers, and one area where there is a lot of confusion and misinformation is around the efficiency of the storage I/O path for Nutanix.

The Nutanix Controller VM (CVM) runs on top of multiple hypervisors and delivers excellent performance, but there is always room for improvement. With our extensive experience with in-kernel and virtual machine based storage solutions, we quickly learned that the biggest bottleneck is the hypervisor itself.

[Diagram: The traditional hypervisor I/O path bottleneck]

With technology such as NVMe becoming mainstream and 3D XPoint not far behind, we looked for a way to give customers the best value from these premium storage technologies.

That’s where AHV Turbo mode comes into play.

[Diagram: AHV Turbo Mode]

AHV Turbo mode is a highly optimised I/O path (shortened and widened) between the User VM (UVM) and Nutanix stargate (I/O engine).

These optimisations have been achieved by moving the I/O path in-kernel.

V

V

V

V

V

V

V

V

V

V

V

Just kidding! “In-kernel is better for performance” is just a myth; Nutanix has achieved major performance improvements by doing the heavy lifting of the I/O data path in user space, which is the opposite of the much-hyped “in-kernel” approach.

The diagram below shows that the UVM’s I/O path now goes via Frodo (a.k.a. Turbo Mode), which runs in user space (not in-kernel), and on to stargate within the Controller VM.

[Diagram: The Frodo (Turbo Mode) data path]

Another benefit of AHV with Turbo Mode is that it eliminates the need for administrators to configure multiple PVSCSI adapters and spread virtual disks across those controllers. When virtual disks are added to an AHV virtual machine, they automatically benefit from Nutanix SCSI and block multi-queue, ensuring enhanced I/O performance for both reads and writes.

The multi-queue I/O flow is handled by multiple Frodo (Turbo Mode) threads and passed on to stargate.

[Diagram: Frodo multi-queue I/O path to stargate]

As the above diagram shows, Nutanix with Turbo Mode eliminates the bottlenecks associated with legacy hypervisors. One such example is VMFS datastores, which required VAAI Atomic Test and Set (ATS) to minimise the impact of locking as the number of VMs per datastore increased (e.g. >25). With AHV and Turbo Mode, every vdisk has always had its own queue (not one per datastore or container), and Frodo enhances this further by adding a per-vCPU queue at the virtual controller level, as the sketch below illustrates.
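
Here is a conceptual sketch of that queueing difference (illustrative only, not the actual Frodo/stargate implementation); the VM, vCPU and vdisk identifiers are made up:

```python
# Conceptual sketch of the queueing model described above. Illustrative only;
# this is not the actual Frodo/stargate implementation.
from collections import defaultdict

# Legacy model (for contrast): one queue per datastore shared by every vdisk on it,
# which is where locking/contention builds as VM counts per datastore grow.

# AHV model: every vdisk has always had its own queue...
vdisk_queues = defaultdict(list)            # key: vdisk_id

# ...and Turbo Mode (Frodo) adds per-vCPU queues in front of them, so a multi-vCPU
# VM can issue I/O in parallel without the admin configuring multiple SCSI adapters.
vcpu_queues = defaultdict(list)             # key: (vm_id, vcpu_id)

def submit_io(vm_id, vcpu_id, vdisk_id, io):
    vcpu_queues[(vm_id, vcpu_id)].append(io)   # queued per vCPU (Turbo Mode / Frodo)
    vdisk_queues[vdisk_id].append(io)          # then per vdisk, on to stargate

submit_io("vm1", 0, "vdisk-a", "write 8k")
submit_io("vm1", 1, "vdisk-a", "read 32k")     # a second vCPU queues I/O in parallel
print(len(vcpu_queues), "vCPU queues,", len(vdisk_queues), "vdisk queue")
# -> 2 vCPU queues, 1 vdisk queue
```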

How much of a performance improvement, you ask? Well, I ran a quick test which showed amazing performance improvements even on a more than four-year-old Ivy Bridge (IVB) NX3450 which only has 2 x SATA SSDs per node, and with the in-memory read cache disabled (i.e.: no reads from RAM).

A quick summary of the findings:

  1. 25% lower CPU usage for similar sequential write performance (2929MBps vs 2964MBps)
  2. 27.5% higher sequential read performance (9512MBps vs 7207MBps)
  3. A 62.52% increase in random read IOPS (510121 vs 261265)
  4. A 33.75% increase in random write IOPS (336326 vs 239193)

So with Turbo Mode, Nutanix is using less CPU and RAM to drive higher IOPS & throughput and doing so in user space.

Intel published “Code Sample: Hello World with Storage Performance Development Kit and NVMe Driver” which states “When comparing the SPDK userspace NVMe driver to an approach using the Linux Kernel, the overhead latency is up to 10x lower”.

This is just one of many examples showing that user space is clearly not the bottleneck some people/vendors have tried to claim with the “in-kernel is faster” nonsense I have previously written about.

With Turbo mode, AHV is the highest performance (throughput / IOPS) and lowest latency hypervisor supported by Nutanix!

But wait, there’s more! Not only is AHV now the highest performing hypervisor, it’s also used by our largest customer, who has more than 1,750 nodes running 100% AHV!

[Image: node count of our largest 100% AHV customer]