What’s .NEXT 2017 – AHV Turbo Mode

Posted on June 29, 2017 by Josh Odgers

Back in 2015 I wrote a series titled “Why Nutanix Acropolis Hypervisor (AHV) is the next generation hypervisor” which covered off many reasons why AHV was and would become a force to be reckoned with.

In short, AHV is the only purpose built hypervisor for hyper-converged infrastructure (HCI) and it has continued to evolve in terms of functionality and maturity while becoming a popular choice for customers.

How popular you ask? Nutanix officially reported 23% adoption as a percentage of nodes sold in our recent third quarter fiscal year 2017 financial highlights.

Over the last couple of years I have personally worked with numerous customers who have adopted AHV especially when it comes to business critical applications such as MS SQL, MS Exchange.

One such example is Shinsegae who is a major retailer running 50,000 MS Exchange mailboxes on Nutanix using AHV as the hypervisor. Shinsegae also runs MS SQL workloads on the same platform which has now become the standard platform for all workloads.

This is just one example of AHV proven in the field and at scale to have the functionality, resiliency and performance to support business critical workloads.

But at Nutanix we’re always striving to deliver more value to our customers, and one area where there is a lot of confusion and misinformation is around the efficiency of the storage I/O path for Nutanix.

The Nutanix Controller VM (CVM) runs on top of multiple hypervisors and delivers excellent performance, but there is always room for improvement. With our extensive experience with in-kernel and virtual machine based storage solutions, we quickly learned that the biggest bottleneck is the hypervisor itself.

With technology such as NVMe becoming mainstream and 3D XPoint not far behind, we looked for a way to give customers the best value from these premium storage technologies.

That’s where AHV Turbo mode comes into play.

AHV Turbo mode is a highly optimised I/O path (shortened and widened) between the User VM (UVM) and Nutanix stargate (I/O engine).

These optimisation have been achieved by moving the I/O path in-kernel.

Just kidding! In-kernel being better for performance is just a myth, Nutanix has achieved major performance improvements by doing the heavy lifting of the I/O data path in User Space, which is the opposite of the much hyped “In-kernel”.

The below diagram show the UVM’s I/O path now goes via Frodo (a.k.a Turbo Mode) which runs in User Space (not In-kernel) and onto stargate within the Controller VM).

Another benefit of AHV and Turbo mode is that it eliminates the requirement for administrators to configure multiple PVSCSI adapters and spread virtual disks across those controllers. When adding virtual disks to an AHV virtual machine, disks automatically benefit from Nutanix SCSI and block multi-queue ensuring enhanced I/O performance for both reads and writes.

The multi-queue I/O flow is handled by multiple frodo threads (Turbo mode) threads and passed onto stargate.

As the above diagram shows, Nutanix with Turbo mode eliminates the bottlenecks associated with legacy hypervisors, one such example is VMFS datastores which required VAAI Atomic Test and Set (ATS) to minimise the impact of locking when the numbers of VMs per datastore increased (e.g. >25). With AHV and Turbo mode, every vdisk has always had it’s own queue (not one per datastore or container) but frodo enhances this by adding a per-vcpu queue at the virtual controller level.

How much performance improvement you ask? Well I ran a quick test which showed amazing performance improvements even on a more than four year old IVB NX3450 which only has 2 x SATA SSDs per node and with the memory read cache disabled (i.e.: No reads from RAM).

A quick summary of the findings were:

25% lower CPU usage for the similar sequential write performance (2929MBps vs 2964MBps)
27.5% higher sequential read performance (9512MBps vs 7207MBps)
A 62.52% increase in random read IOPS (510121 vs 261265)
A 33.75% increase in random write IOPS (336326 vs 239193)

So with Turbo Mode, Nutanix is using less CPU and RAM to drive higher IOPS & throughput and doing so in user space.

Intel published “Code Sample: Hello World with Storage Performance Development Kit and NVMe Driver” which states “When comparing the SPDK userspace NVMe driver to an approach using the Linux Kernel, the overhead latency is up to 10x lower”.

This is just one of many examples which shows userspace is clearly not the bottleneck that some people/vendors have tried to claim with the “in-kernel” is faster nonsense I have previously written about.

With Turbo mode, AHV is the highest performance (throughput / IOPS) and lowest latency hypervisor supported by Nutanix!

But wait there’s more! Not only is AHV now the highest performing hypervisor, it’s also used by our largest customer who has more than 1750 nodes running 100% AHV!

Competition Example Architectural Decision Entry 3 – Scalable network architecture for VXLAN

Posted on October 8, 2013 by Josh Odgers

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5,VCAP-DCA4/5,VCAP-CIA,vExpert 2012/2013

Problem Statement

You are moving towards scalable network architecture for your large scale Virtualized Datacenter and want to configure VXLAN in your environment. You want to make sure that Teaming Policy for VXLAN transport is configured optimally for better performance and reduce operational complexity around it.

Assumptions

1. vSphere 5.1 or greater
2. vCloud Networking & Security 5.1 or greater
3. Core & Edge Network topology is in place

Constraints

1. Should have switches that support Static Etherchannel or LACP (Dynamic Etherchannel)
2. Have to use only IP Hash Load balancing method if using vSphere 5.1
3. Cannot use Beacon Probing as Failure Detection mechanism

Motivation

1. Optimize performance for VXLAN

2. Reduce complexity where possible

3. Choosing best teaming policy for VXLAN Traffic for future scalability

Architectural Decision

LACP – Passive Mode will be chosen as the teaming policy for the VXLAN Transport.

At least two or more physical links will be aggregated using LACP in the upstream Edge switches.

Two Edge switches will be connected to each other.

ESXi host will be cross connected to these two Physical upstream switches for forming a LACP group.

LACP will be configured in Passive mode in Edge switches so that the participating ports responds to the LACP packets that it receives but does not initiate LACP negotiation.

Alternatives

1. Use LACP – Active Mode and make sure you are using IP Hash algorithm for the load balancing in your vDS if using vSphere 5.1.

2. Use LACP – Active Mode and use any of the 22 available load balancing algorithm in your vDS if using vSphere 5.5.

3. Use LACP – Active Mode and use Cisco Nexus 1000v virtual switch and use any of the 19 available load balancing algorithm.

4. Use Static Etherchannel and make sure you are using IP Hash *Only* algorithm in your vDS.

5. If using Failover then have at least one 10G NIC to handle the VXLAN traffic.

Justification

1. Fail Over teaming policy for VXLAN vmkernel NIC uses only one uplink for all VXLAN traffic. Although redundancy is available via the standby link, all available bandwidth is not used.
2. Static Etherchannel requires IP Hash Load Balancing be configured on the switching infrastructure, which uses a hashing algorithm based on source and destination IP address to determine which host uplink egress traffic should be routed through.

3. Static Etherchannel and IP Hash Load Balancing is technically very complex to implement and has a number of prerequisites and limitations, such as, you can’t use beacon probing, you can’t configure standby or unused link etc.

4. Static Etherchannel does not do pre check both the terminating ends before forming the Channel Group. So, if there are issues within two ends then traffic will never pass and vSphere will not see any acknowledgement back in it’s Distributed Switches

5. Active LACP mode places a port into an active negotiating state, in which the port initiates negotiations with other ports by sending LACP packets. If using vSphere prior to 5.5 where only IP Hash algorithm is supported then LACP will not pass any traffic if vSphere uses any other algorithm other than IP Hash (such as Virtual Port ID)

6. The operational complexity is reduced

7. If using vSphere 5.5 then can use 22 different algorithm for load balancing and also Beacon Probing can be used for Failure Detection.

Implications

1. Initial setup has a small amount of additional complexity however this is a one time task (Set & Forget)

2. Only IP Hash algorithm is supported if using vSphere 5.1

3. Only one LAG can be supported for the entire vSphere Distributed Switches if using vSphere 5.1

4. IP Hash calculation if not done manually by taking VM’s vNIC and Physical NIC then there is no guarantee that it will balance the traffic across physical links

Back to Competition Main Page or Competition Submissions

vSphere 5.1 Single Sign On (SSO) Configuration – Architectural Decision flowchart

Posted on June 22, 2013 by Josh Odgers

The below is the second architectural decision flowchart in my new series and covers a new feature in vSphere 5.1, Single Sign On.

There has been a lot of discussion around “Best Practices” for SSO, I have taken the approach of creating this flowchart with as many scenarios as possible.

I would recommend that you validate any configuration the flowchart results in is suitable for your environment prior to implementing into production.

The flowchart is designed to be used as a guide only, not a definitive decision making resource.

This also compliments some of my previous example architectural decisions which are shown in the related topics section below.

A special thanks to Michael Webster (VCDX#66) @vcdxnz001 & James Wirth (VCDX#83)@JimmyWally81 for their review and contributions to this flowchart.