VSAN overheads

Posted on May 5, 2016 by Josh Odgers

For a long time now, VMware & EMC, along with other vendors, have been leading the charge in spreading FUD about the Nutanix Controller VM (CVM), claiming that it uses a lot of resources to drive storage I/O and that, being a Virtual Machine (a.k.a. Virtual Storage Appliance / VSA), it is inefficient and slower than running In-Kernel.

A recent example of the FUD comes from an article written by the President of VCE himself, Mr Chad Sakac, who wrote:

… it (VxRAIL) is the also the ONLY HCIA that has a fully integrated SDS stack that is embedded into the kernel – specifically VSAN because VxRail uses vSphere. No crazy 8vCPU, 16+ GB of RAM for the storage stack (per “storage controller” or even per node in some cases with other HCIA choices!) needed.

So I thought I would put together a post covering what the Nutanix CVM provides and comparing it to what Chad referred to as a fully integrated SDS stack.

Let’s compare the resources required by the Nutanix suite, which is made up of the Acropolis Distributed Storage Fabric (ADSF) & Acropolis Hypervisor (AHV), and VMware’s suite, made up of vCenter, ESXi, VSAN and associated components.

This should help those not familiar with the Nutanix platform understand the capabilities and value the CVM provides, and correct the FUD being spread by some competitors.

Before we begin, let’s address the default size for the Nutanix CVM.

As it stands today, the CVM is assigned 8 vCPUs and 16GB RAM by default.

What CPU resources the CVM actually uses obviously depends on the customer’s use case(s), so if the I/O requirements are low, the CVM won’t use 8 vCPUs, or even 4 vCPUs, even though it is assigned 8 vCPUs.

With the improvement in ESXi CPU scheduling over the years, the impact of having more than the required vCPUs assigned to a limited number of VMs (such as the CVM) in an environment is typically negligible, but the CVM can also be right-sized, which is common.

The recommended RAM allocation is 24GB when using deduplication, and for workloads which are very read-intensive, the RAM can be increased to provide more read cache.

However, increasing the CVM RAM for read cache (Extent Cache) is more of a legacy recommendation as the Acropolis Operating System (AOS) 4.6 release achieves outstanding performance even with the read cache disabled.

In fact, the >150K 4k random read IOPS per node which AOS 4.6 achieves on the NX-9040-G4 nodes was achieved without the in-memory read cache, as part of engineering testing to see how hard the SSD drives can be pushed. As a result, even for extreme levels of performance, increasing the CVM RAM for Read Cache is no longer a requirement. As such, 24GB RAM will be more than sufficient for the vast majority of workloads, and reducing RAM levels is also on the cards.

Thought: Even if it were true that in-kernel solutions provided faster outright storage performance (which is not the case, as I showed here), this is only one small part of the equation. What about management? VSAN management is done via the vSphere Web Client, which runs in a VM in user space (i.e.: not “In-Kernel”) and connects to vCenter, which also runs as a VM in user space and commonly leverages an SQL/Oracle database, which also runs in user space.

Now think about replication: VSAN uses vSphere Replication, which, you guessed it, runs in a VM in user space. For capacity/performance management, VSAN leverages vRealize Operations Manager (vROM), which also runs in user space. What about backup? The vSphere Data Protection appliance is yet another service which runs in a VM in user space.

All of these products require the data to move from kernel space into user space, so for almost every function apart from basic VM I/O, VSAN is dependent on components which run in user space (i.e.: not In-Kernel).

Let’s take a look at the requirements for VSAN itself.

According to the VSAN design and sizing guide (page 56), VSAN uses up to 10% of host CPU and requires 32GB RAM for full VSAN functionality. Now, the RAM required doesn’t mean VSAN is using all 32GB, and the same is true for the Nutanix CVM: if it doesn’t need/use all the assigned RAM, it can be downsized, although 12GB is the recommended minimum and 16GB is typical. For a node with even 192GB RAM, which is small by today’s standards, 16GB is <10%, which is minimal overhead for either VSAN or the Nutanix CVM.
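The RAM percentages above are simple ratios. A minimal sketch using only the 16GB, 32GB and 192GB figures from this paragraph (the function name is illustrative, not from any tool):

```python
def ram_overhead_pct(reserved_gb: float, host_ram_gb: float) -> float:
    """Share of host RAM (in %) reserved for the storage stack."""
    return reserved_gb / host_ram_gb * 100

HOST_RAM_GB = 192  # the "small by today's standards" node from the text

cvm_typical = ram_overhead_pct(16, HOST_RAM_GB)  # typical CVM allocation
vsan_full = ram_overhead_pct(32, HOST_RAM_GB)    # VSAN "full functionality" figure

print(f"16GB of {HOST_RAM_GB}GB = {cvm_typical:.1f}%")  # 16GB of 192GB = 8.3%
print(f"32GB of {HOST_RAM_GB}GB = {vsan_full:.1f}%")    # 32GB of 192GB = 16.7%
```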

In my testing, VSAN is not limited to 10% CPU usage, and this can be confirmed in VMware’s own official testing of SQL in: VMware Virtual SAN™ Performance with Microsoft SQL Server

In short, the performance testing was conducted with 3 VMs, each with 4 vCPUs, on hosts containing dual-socket Intel Xeon Processor E5-2650 v2 CPUs (16 cores, 32 threads, @2.6GHz).

So assuming the VMs were at 100% utilisation, they would only be using 75% of the total cores (12 of 16). As we can see from the graph below, the hosts were almost 100% utilized, so something other than the VMs is using the CPU. Best case, VSAN is using ~20% CPU, with the hypervisor using ~5%; in reality the VMs won’t be pegged at 100%, so the overhead of VSAN will be higher than 20%.
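The best-case CPU arithmetic can be written out as a short sketch. It counts physical cores only (not hyperthreads) and uses the author's ~5% hypervisor estimate; treating the hosts as exactly 100% busy is a simplifying assumption read off the graph.

```python
HOST_CORES = 16          # 2 x E5-2650 v2, 8 cores each; hyperthreads not counted
VM_VCPUS = 3 * 4         # three SQL Server VMs with 4 vCPUs each
HOST_UTILISATION = 1.00  # hosts shown as almost 100% busy (rounded up here)
HYPERVISOR_SHARE = 0.05  # author's ~5% estimate for ESXi itself

# Best case for VSAN: assume the VMs consume every vCPU they were given.
vm_share = VM_VCPUS / HOST_CORES
vsan_share = HOST_UTILISATION - vm_share - HYPERVISOR_SHARE

print(f"VMs at 100%: {vm_share:.0%} of host cores")
print(f"Left for VSAN: ~{vsan_share:.0%}, i.e. ~{vsan_share * HOST_CORES:.1f} cores")
```

About 3.2 cores on a 16-core host, which the post rounds up to roughly 4 vCPUs.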

Now, I understand I/O requires CPU, and I don’t have a problem with VSAN using 20% or even more CPU. What I have a problem with is VMware telling customers it only uses 10% AND spreading FUD that other vendors’ virtual appliances, such as the Nutanix CVM, are resource hogs.

Don’t take my word for it: do your own testing and read their documents, like the one above, where simple maths shows the claim of a 10% maximum is a myth.

So that’s roughly 4 vCPUs (on a typical dual-socket, 8-core system) and up to 32GB RAM required for VSAN, but let’s assume just 16GB RAM on average, as not all systems are scaled to 5 disk groups.

The above testing was not on the latest VSAN 6.2, so things may have changed. One such change is the introduction of software checksums into VSAN. This actually reduces performance (as you would expect) because it adds a layer of data integrity to every I/O. As such, the above performance is still a fair comparison, because Nutanix has always had software checksums, which are essential for any production-ready storage solution.

Now keep in mind, VSAN is really only providing the storage stack, so it’s using ~20% CPU under heavy load for just the storage stack. The Nutanix CVM, on the other hand, also provides a highly available management layer with functionality comparable to (and in many cases better functionality/availability/scalability than) vCenter, VUM, vROM, vSphere Replication, vSphere Data Protection, vSphere Web Client, Platform Services Controller (PSC) and the supporting database platform (e.g.: SQL/Oracle/PostgreSQL).

So comparing VSAN CPU utilization to a Nutanix CVM is about as far from apples/apples as you could get, so let’s look at the resource requirements of all the vSphere management components and make a fairer comparison.

vCenter Server

Resource Requirements:

Small | Medium | Large

4 vCPUs | 8 vCPUs | 16 vCPUs
16GB RAM | 24GB RAM | 32GB RAM

Reference: http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.install.doc/GUID-D2121DC5-1FC8-48DC-A4BA-C3FD72D0BE77.html

Platform Services Controller

Resource Requirements:

2vCPUs
2GB RAM

Reference: http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.install.doc/GUID-D2121DC5-1FC8-48DC-A4BA-C3FD72D0BE77.html

vCenter Heartbeat (Deprecated)

If we were to compare apples to apples, vCenter would need to be fully distributed and highly available, which it’s not. The now deprecated vCenter Heartbeat used to be able to provide this to some extent, which meant 2x the resources of vCenter, VUM etc., but since it’s deprecated we’ll give VMware the benefit of the doubt and not count the resources needed to make their management components highly available.

What about vCenter Linked Mode?

I couldn’t find its resource requirements in the documentation, so let’s give VMware the benefit of the doubt and say it doesn’t add any overheads. But regardless of overheads, it’s another product to install/validate and maintain.

vSphere Web Client

The Web Client is required for full VSAN management/functionality and has its own resource requirements:

4vCPUs
2GB RAM (at least)

Reference: https://pubs.vmware.com/vsphere-50/index.jsp#com.vmware.vsphere.install.doc_50/GUID-67C4D2A0-10F7-4158-A249-D1B7D7B3BC99.html

vSphere Update Manager (VUM)

VUM can be installed on the vCenter server (if you are using the Windows installation) to avoid having another management VM and OS to manage; if you are using the Virtual Appliance, then a separate Windows instance is required.

Resource Requirements:

2vCPUs
2GB RAM

The Nutanix CVM provides the ability to do Major and Minor patch updates for ESXi and of course for AHV.

vRealize Operations Manager (vROM)

Nutanix provides built-in analytics similar to what vROM provides in PRISM Element, as well as centrally managed capacity planning/management and “what if” scenarios for adding nodes to the cluster. As such, including vROM in the comparison is essential if we want to get close to apples/apples.

Resource Requirements:

Small | Medium | Large

4 vCPUs | 8 vCPUs | 16 vCPUs
16GB RAM | 32GB RAM | 48GB RAM
14GB Storage

Remote Collectors: Standard | Large

2 vCPUs | 4 vCPUs
4GB RAM | 16GB RAM

Reference: https://pubs.vmware.com/vrealizeoperationsmanager-62/index.jsp#com.vmware.vcom.core.doc/GUID-071E3259-625A-437B-AB34-E6A58B87C65B.html

vSphere Data Protection

Nutanix also has built in backup/recovery/snapshotting capabilities which include application consistency via VSS. As with vROM we need to include vSphere Data Protection in any comparison to the Nutanix CVM.

vSphere Data Protection can be deployed in sizes from 0.5 to 8TB, as shown below:

Reference: http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vmware-data-protection-administration-guide-61.pdf

The minimum size is 4 vCPUs and 4GB RAM, but that only supports 0.5TB; for even an average-size node which supports, say, 4TB, 4 vCPUs and 8GB RAM are required.

So, best case scenario, we need to deploy one VDP appliance per 8TB, which is smaller than the capacity of some Nutanix (or VSAN Ready) nodes (e.g.: NX6035 / NX8035 / NX8150). That could potentially mean one VDP appliance per node when running VSAN, since the backup capabilities are not built in like they are with Nutanix.
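The appliance count is a ceiling division against the 8TB maximum. This sketch assumes (best case, as in the text) that protected capacity can be pooled across appliances with no other limits; the per-node capacities are hypothetical, not taken from any datasheet.

```python
import math

VDP_MAX_TB = 8  # largest VDP deployment size quoted in this post

def vdp_appliances_needed(protected_tb: float) -> int:
    """Best case: each VDP appliance protects at most VDP_MAX_TB."""
    return math.ceil(protected_tb / VDP_MAX_TB)

# Hypothetical 4-node clusters with different amounts of data per node.
print(vdp_appliances_needed(4 * 4))   # 4TB per node  -> 2 appliances
print(vdp_appliances_needed(4 * 8))   # 8TB per node  -> 4 (one per node)
print(vdp_appliances_needed(4 * 12))  # 12TB per node -> 6
```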

Now what about if I want to replicate my VMs or use Site Recovery Manager (SRM)?

vSphere Replication

As with vROM and vSphere Data Protection, vSphere Replication provides functionality for VSAN which Nutanix has built into the CVM, so we also need to include vSphere Replication resources in any comparison to the CVM.

While vSphere Replication has fairly light resource requirements, if all my replication needs to go via the appliance, one VSAN node will be a hotspot for storage and network traffic, potentially saturating the network/node and being a noisy neighbour to any virtual machines on the node.

Resource Requirements:

2vCPUs
4GB RAM
14GB Storage

Limitations:

1 vSphere replication appliance per vCenter
Limited to 2000 VMs

Reference: http://pubs.vmware.com/vsphere-replication-61/index.jsp?topic=%2Fcom.vmware.vsphere.replication-admin.doc%2FGUID-E114BAB8-F423-45D4-B029-91A5D551AC47.html

So scaling beyond 2000 VMs requires another vCenter, which means another VUM, another Heartbeat VM (if it was still available), potentially more databases on SQL or Oracle.

Nutanix doesn’t have this limitation, but again we’ll give VMware the benefit of the doubt for this comparison.
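The scaling rule above reduces to a ceiling division: with one vSphere Replication appliance per vCenter and a 2000 VM limit, each further block of 2000 replicated VMs implies another vCenter (and its supporting components). A minimal sketch; the VM counts are hypothetical.

```python
import math

VR_VMS_PER_VCENTER = 2000  # 1 vSphere Replication appliance per vCenter, 2000 VMs max

def vcenters_for_replication(replicated_vms: int) -> int:
    """Minimum number of vCenters needed to replicate this many VMs."""
    return max(1, math.ceil(replicated_vms / VR_VMS_PER_VCENTER))

for vms in (500, 2000, 2001, 5000):
    print(vms, "VMs ->", vcenters_for_replication(vms), "vCenter(s)")
```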

Supporting Databases

The size of even a small SQL server is typically at least 2vCPUs and 8GB+ RAM and if you want to compare apples/apples with Nutanix AHV/CVM you need to make the supporting database server/s highly available.

So even in a small environment we would be talking 2 VMs @ 2 vCPUs and 8GB+ RAM each just to support the back end database requirements for vCenter, VUM, SRM etc.

As the environment grows so does the vCPU/vRAM and Storage (Capacity/IOPS) requirements, so keep this in mind.

So what are the approx. VSAN overheads for a small 4 node cluster?

The table below shows the minimum vCPU/vRAM requirements for the various components I have discussed previously to get VSAN comparable (not equivalent) functionality to what the Nutanix CVM provides.

The above only covers the minimum requirements for a small (say 4-node) environment. Things like vSphere Data Protection will require multiple instances, SQL should be made highly available using an Always On Availability Group (AAG), which requires a 2nd SQL server, and as the environment grows, so do the vCPU/vRAM requirements for vCenter, vRealize Operations Manager and SQL.

A Nutanix AHV environment on the other hand looks like this:

So just 32 vCPUs and 64GB RAM for a 4 node cluster, which is 8 vCPUs and 54GB RAM LESS than the comparable vSphere/VSAN 4 node solution.

If we add Nutanix Scale out File Server functionality into the mix (which is optionally enabled), this increases to 48 vCPUs and 100GB RAM: just 8 vCPUs more and still 18GB RAM LESS than vSphere/VSAN, while Nutanix provides MORE functionality (e.g.: Scale out File Services) and comes out of the box with a fully distributed, highly available, self-healing, FULLY INTEGRATED management stack.
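Because the comparison table is not reproduced here, a quick consistency check helps: the sketch derives the vSphere/VSAN totals from the stated 8 vCPU / 54GB RAM difference (an inference, not a figure quoted directly) and confirms the Scale out File Server deltas follow from it.

```python
NODES = 4
CVM_VCPUS, CVM_RAM_GB = 8, 16  # default CVM size per node

nutanix_vcpus = NODES * CVM_VCPUS    # 32
nutanix_ram_gb = NODES * CVM_RAM_GB  # 64

# vSphere/VSAN totals implied by the stated difference (derived, not quoted)
vsan_vcpus = nutanix_vcpus + 8       # 40
vsan_ram_gb = nutanix_ram_gb + 54    # 118

# Totals stated for Nutanix with Scale out File Server enabled
afs_vcpus, afs_ram_gb = 48, 100

print("extra vCPUs vs VSAN:", afs_vcpus - vsan_vcpus)       # 8
print("RAM saved vs VSAN (GB):", vsan_ram_gb - afs_ram_gb)  # 18
```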

The Nutanix vCPU count assumes all vCPUs are in use, which is VERY rarely the case, so this comparison is well and truly in favour of VSAN. Even so, vSphere/VSAN still has higher overheads for a typical/comparable solution, while Nutanix provides additional built-in features such as Scale out File Server (another distributed and highly available solution) for only slightly more resources than vSphere/VSAN, which does not provide comparable native file serving functionality.

What if you don’t use all those vSphere/VSAN features and therefore don’t deploy all those management VMs? VSAN overheads are lower then, right?

It is a fair argument to say not all vSphere/VSAN features need to be deployed, so this will reduce the vSphere/VSAN requirements (or overheads).

The same however is true for the Nutanix Controller VM.

Where customers don’t run all features and/or have lower I/O requirements, it’s not uncommon for the CVM to be downsized to 6 vCPUs. I personally did this earlier this week for a customer running SQL/Exchange, and the CVM is still only running at ~75%, or approx. 4.5 vCPUs, and that’s running vBCA with in-line compression.

So the overheads depend on the workloads, and the default sizes can be changed for both vSphere/VSAN components and the Nutanix CVM.

Now back to the whole In-Kernel nonsense.

VMware also like to spread FUD that their own hypervisor has such high overheads it’s crazy to run any storage through it. I’ve always found this funny, since VMware has been telling the market for years that the hypervisor has a low overhead (which it does), but they change their tune like the weather to suit their latest slideware.

One such example of this FUD comes from VMware’s Chief Technologist, Duncan Epping who tweeted:

The tweet is trying to imply that going through the hypervisor to another Virtual Machine (in this case a Nutanix CVM) is inefficient, which is interesting for a few reasons:

If going from one VM to another via the kernel has such high overheads, why do VMware themselves recommend virtualizing business critical, high I/O applications, where applications access data across VMs (and ESXi hosts) all the time? e.g.: a Web Server VM accesses an Application Server VM, which accesses data from a Database VM. All of this traffic goes from one VM, through the kernel and into another VM.
Because VSAN has to do exactly this to leverage many of the features it advertises, such as:

Replication (via vSphere Replication)
vRealize Operations Manager (vROM)
vSphere Data Protection (vDP)
vCenter and supporting components

Another example of FUD from VMware, in this case Principal Engineer, Jad El-Zein is implying VSAN has low(er) overheads compared to Nutanix (Blocks = Nutanix “Blocks”):

I guess he forgot about the large number of VMs (and resources) required to provide VSAN functionality and basic vSphere management. Any advantage of being In-Kernel (assuming you still believe it is in fact an advantage) is well and truly eliminated by the constant traffic across the hypervisor to and from the management VMs, none of which are In-Kernel, as shown below.

I’d say it’s #AHVisTheOnlyWay and #GoNutanix, since the overheads of AHV are lower than vSphere/VSAN!

Summary:

The Nutanix CVM provides a fully integrated, preconfigured and highly available, self healing management stack. vSphere/VSAN requires numerous appliances and/or software to be installed.
The Nutanix AHV Management stack (provided by the CVM) using just 8vCPUs and typically 16GB RAM provides functionality which in many cases exceeds the capabilities of vSphere/VSAN which requires vastly more resources and VMs/Appliances to provide comparable (but in many cases not equivalent) functionality.
The Nutanix CVM provides these capabilities built in (with the exception of PRISM Central, which is a separate Virtual Appliance) rather than being dependent on multiple virtual appliances, VMs and/or 3rd party database products for various functionality.
The Nutanix management stack is also more resilient/highly available than competing products, such as all the VMware management components, and comes this way out of the box. As the cluster scales, the Acropolis management stack continues to automatically scale management capabilities to ensure linear scalability and consistent performance.
Next time VMware/EMC try to spread FUD about the Nutanix Controller VM (CVM) being a resource hog or similar, ask them what resources are required for all functionality they are referring to. They probably haven’t even considered all the points we have discussed in this post so get them to review the above as a learning experience.
Nutanix/AHV management is fully distributed and highly available. Ask VMware how to make all the vSphere/VSAN management components highly available and what the professional services costs will be to design/install/validate/maintain that solution.
The next conversation to have would be “How much does VSAN cost compared to Nutanix?” Now we understand all the resource overheads and the complexity in design/implementation/validation of the vSphere/VSAN environment, not to mention that most management components will not be highly available beyond vSphere HA. But cost is a topic for another post, as the ELA / licensing costs are the least of your worries.

To our friends at VMware/EMC, the Nutanix CVM says,

“Go ahead, underestimate me”.

Benchmark(et)ing Nonsense IOPS Comparisons, if you insist – Nutanix AOS 4.6 outperforms VSAN 6.2

Posted on February 16, 2016 by Josh Odgers

As many of you know, I’ve taken a stand with many other storage professionals to try to educate the industry that peak performance is vastly different to real world performance. I covered this in a post titled: Peak Performance vs Real World Performance.

I have also given a specific example of Peak Performance vs Real World Performance with a Business Critical Application (MS Exchange), where I demonstrate that the first and most significant constraining factor for Exchange performance is compute (CPU/RAM), so achieving more IOPS is unnecessary to achieve the business outcome (which is supporting a given number of Exchange mailboxes/messages per day).

However, vendors (all of them) who offer products which provide storage, whether as a component such as in HCI or as a fully focused offering, continue to promote peak performance numbers. They do this because the industry as a whole has promoted, and continues to promote, these numbers as if they are relevant, with vendors trying to one-up each other with nonsense comparisons.

VMware and the EMC federation have made a lot of noise around In-Kernel delivering better performance than Software Defined Storage running within a VM, which some refer to as a VSA (Virtual Storage Appliance). At the same time, the same companies/people are recommending business critical applications (vBCA) be virtualized. This is a clear contradiction, as I explain in an article I wrote titled In-Kernel verses Virtual Storage Appliance, which in short concludes by saying:

…a high performance (1M+ IOPS) solution can be delivered both In-Kernel or via a VSA, it’s simple as that. We are long past the days where a VM was a significant bottleneck (circa 2004 w/ ESX 2.x).

I stand by this statement and the in-kernel vs VSA debate is another example of nonsense comparisons which have little/no relevance in the real world. I will now (reluctantly) cover off (quickly) some marketing numbers before getting to the point of this post.

VMware VSAN 6.2

Firstly, Congratulations to VMware on this release. I believe you now have a minimally viable product thanks to the introduction of software based checksums which are essential for any storage platform.

VMW Claim One: For the VSAN 6.2 release, “delivering over 6M IOPS with an all-flash architecture”

The basic math for a 64 node cluster gives ~93,750 IOPS / node, but as I have seen this benchmark from Intel showing 6.7 Million IOPS for a 64 node cluster, let’s give VMware the benefit of the doubt and assume it’s an even 7M IOPS, which equates to 109,375 IOPS / node.
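The per-node figures are just the cluster-wide claim divided by the 64-node cluster size used in the claim. A minimal sketch covering the three cluster totals mentioned above:

```python
CLUSTER_NODES = 64  # cluster size used in VMware's claim

def iops_per_node(cluster_iops: int, nodes: int = CLUSTER_NODES) -> float:
    """Average IOPS each node must deliver for the cluster-wide figure."""
    return cluster_iops / nodes

for label, cluster_iops in [("VMware 6M claim", 6_000_000),
                            ("Intel 6.7M benchmark", 6_700_000),
                            ("rounded up to 7M", 7_000_000)]:
    print(f"{label}: {iops_per_node(cluster_iops):,.0f} IOPS/node")
```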

Reference: VMware Virtual SAN Datasheet

VMW Claim Two: Highest Performance >100K IOPS per node

The graphic below (pulled directly from VMware’s website) shows their performance claims of >100K IOPS per node and >6 Million IOPS per cluster.

Reference: Introducing you to the 4th Generation Virtual SAN

Now what about Nutanix Distributed Storage Fabric (NDSF) & Acropolis Operating System (AOS) 4.6?

We’re now at the point where the hardware is becoming the bottleneck as we are saturating the performance of physical Intel S3700 enterprise-grade solid state drives (SSDs) on many of our hybrid nodes. As such we have moved onto performance testing of our NX-9460-G4 model which has 4 nodes running Haswell CPUs and 6 x Intel S3700 SSDs per node all in 2RU.

With AOS 4.6 running ESXi 6.0 on a NX9460-G4 (4 x NX-9040-G4 nodes), Nutanix are seeing in excess of 150K IOPS per node, which is 600K IOPS per 2RU (Nutanix Block).

The below graph shows performance per node and how the solution scales in terms of performance up to a 4 node / 1 block solution which fits within 2RU.

So Nutanix AOS 4.6 provides approx. 36% higher performance than VSAN 6.2.

(>150K IOPS per NX9040-G4 node compared to <=110K IOPS for All Flash VSAN 6.2 node)
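Both headline numbers follow from the per-node figures: four nodes per 2RU block, and a relative difference measured against the VSAN figure (the ~109K estimate rounded up to 110K). A minimal sketch:

```python
NODES_PER_BLOCK = 4            # NX-9460-G4 = 4 x NX-9040-G4 nodes in 2RU
nutanix_iops_per_node = 150_000  # ">150K" (lower bound)
vsan_iops_per_node = 110_000     # "<=110K" (upper bound)

block_iops = nutanix_iops_per_node * NODES_PER_BLOCK
advantage_pct = (nutanix_iops_per_node - vsan_iops_per_node) / vsan_iops_per_node * 100

print(f"{block_iops:,} IOPS per 2RU block")      # 600,000 IOPS per 2RU block
print(f"~{advantage_pct:.0f}% higher per node")  # ~36% higher per node
```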

It should be noted the above Nutanix performance numbers have already been improved upon in upcoming releases going through performance engineering and QA, so this is far from the best you will see.

Enough with the nonsense marketing numbers! Let’s get to the point of the post:

These 4k 100% random read IOPS (and similar) tests are totally unrealistic.

Even if the 4k IOPS tests were realistic, to quote my previous article:

Peak performance is rarely a significant factor for a storage solution.

More importantly, SO WHAT if Vendor A (in this case Nutanix) has higher peak performance than Vendor B (in this case VSAN)!

What matters is customer business outcomes, not benchmark(eting)!

Wait a minute, the vendor with the higher performance is telling you peak performance doesn’t matter, instead of bragging about it and trying to make it sound important?

Yes you are reading that correctly, no one should care who has the highest unrealistic benchmark!

I wrote “Things to consider when choosing infrastructure” a while back to highlight that choosing the “Best of Breed” for every workload may not be a good overall strategy, as it will require management of multiple silos, which leads to inefficiency and increased costs.

The key point is, if you can meet all the customer requirements (e.g.: performance) with a standard platform while working within constraints such as budget, power, cooling, rack space and time to value, you’re doing yourself (or your customer) a disservice by not considering a standard platform for your workloads. So if Vendor X has 10% faster performance (even for your specific workload) than Vendor Y, but Vendor Y still meets your requirements, performance shouldn’t be a significant consideration when choosing a product.

Both VSAN and Nutanix are software defined storage, and I expect both will continue to rapidly improve performance through tuning done completely in software. If we were talking about a product which is dependent on offloading to hardware, then sure, performance comparisons will be relevant for longer, but VSAN and Nutanix are both 100% software and can/do improve performance in software with every release.

In 3 months, VSAN might be slightly faster. Then 3 months later Nutanix will overtake them again. In reality, peak performance rarely if ever impacts real world customer deployments and with scale out solutions, it’s even less relevant as you can scale.

If a solution can’t scale, or does so in 2 node mirror type configurations then considering peak performance is much more critical. I’d suggest if you’re looking at this (legacy) style of product you have bigger issues.

Not only does performance in the software defined storage world change rapidly, so does the performance of the underlying commodity hardware, such as CPUs and SSDs. This is why it’s important to consider products (like VSAN and Nutanix) that are not dependent on proprietary hardware, as hardware eventually becomes a constraint. This is why the world is moving towards software defined for storage, networking etc.

If more performance is required, the ability to add new nodes, form a heterogeneous cluster and distribute data evenly across the cluster (like NDSF does) is vastly more important than the peak IOPS difference between two products.
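One generic way to distribute data evenly across a heterogeneous cluster is to place data in proportion to each node's capacity, so every node ends up at the same utilisation. The sketch below illustrates that idea only; it is not NDSF's actual placement algorithm, and the node sizes and function name are hypothetical.

```python
def capacity_weighted_placement(data_tb: float, node_capacity_tb: list[float]) -> list[float]:
    """Split data_tb across nodes in proportion to each node's capacity."""
    total = sum(node_capacity_tb)
    return [data_tb * cap / total for cap in node_capacity_tb]

# Hypothetical cluster: two older 10TB nodes plus one newer 20TB node.
nodes = [10.0, 10.0, 20.0]
placed = capacity_weighted_placement(20.0, nodes)
utilisation = [p / cap for p, cap in zip(placed, nodes)]

print(placed)       # [5.0, 5.0, 10.0]
print(utilisation)  # [0.5, 0.5, 0.5] -> every node equally full
```

Weighting by capacity, rather than splitting equally per node, avoids filling the smaller, older nodes first when newer, larger nodes are added.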

While you might think that this blog post is a direct attack on HCI vendors, the principle analogy holds true for any hardware or storage vendor out there. It is only a matter of time before customers stop getting trapped in benchmark(et)ing wars. They will instead identify their real requirements and readily embrace the overall value of dramatically simple on-premises infrastructure.

In my opinion, Nutanix is miles ahead of the competition in terms of value, flexibility, operational benefits, product maturity and market-leading customer service, all of which matter way more than peak performance (where Nutanix is the fastest anyway).

Summary:

Focus on what matters and determine whether or not a solution delivers the required business outcomes. Hint: This is rarely just a matter of MOAR IOPS!
Don’t waste your time in benchmark(et)ing wars or proof of concept bake offs.
Nutanix AOS 4.6 outperforms VSAN 6.2
A VSA can outperform an in-kernel SDS product, so let’s put that in-kernel vs VSA nonsense to rest.
Peak performance benchmarks still don’t matter even when the vendor I work for has the highest performance. (a.k.a. My opinion doesn’t change based on my employer’s current product capabilities)
ALL storage vendors should stop with the peak IOPS nonsense marketing.
Software-defined storage products like Nutanix and VSAN continue to rapidly improve performance, so comparisons are outdated soon after publication.
Products dependent upon proprietary hardware are not the future.
Put a high focus on the quality of the vendor’s support.


In-Kernel verses Virtual Storage Appliance

Posted on March 30, 2015 by Josh Odgers

Let me start by asking: what’s all this “In-Kernel verses Virtual Storage Appliance” debate all about?

It seems to me to be total nonsense yet it is the focus of so called competitive intelligence and twitter debates. From an architectural perspective I just don’t get why it’s such a huge focus when there are so many other critical areas to focus on, like the benefit of Hyper-Converged vs SAN/NAS!!!

Saying In-Kernel or VSA is faster than the other (just because of where the software runs) is like saying my car with 18″ wheels is faster than your car with 17″ wheels. In reality there are so many other factors to consider, the wheel size is almost irrelevant, as is whether or not storage is provided “In-Kernel” or via a “Virtual Appliance”.

If something is In-Kernel, it doesn’t mean it’s efficient. It could be In-Kernel and really inefficient code, and therefore much worse than a VSA solution; equally, a VSA could be really inefficient and an In-Kernel solution more efficient.

In addition to this, Hyper-converged solutions are by design scale-out solutions; as a result, the performance capabilities are the sum of all the nodes, not one individual node.

As long as a solution can provide enough performance (IOPS) per node for individual (or scaled up) VMs, and enough scale-out to support all the customer’s VMs, it doesn’t matter if Solution A is In-Kernel or VSA, or that it can do 20% or even 100% more IOPS per node than Solution B. The only thing that matters is that the customer’s requirements are met/exceeded.
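To illustrate why per-node IOPS often matters less in a scale-out design, here is a minimal sizing sketch: the cluster size is set by whichever requirement needs more nodes. All inputs (VM count, VMs per node, IOPS targets) are hypothetical.

```python
import math

def nodes_required(vm_count: int, vms_per_node: int,
                   target_iops: int, iops_per_node: int) -> int:
    """Cluster size is set by whichever requirement needs more nodes."""
    nodes_for_compute = math.ceil(vm_count / vms_per_node)
    nodes_for_iops = math.ceil(target_iops / iops_per_node)
    return max(nodes_for_compute, nodes_for_iops)

# Hypothetical customer: 400 VMs, ~100 VMs per node (CPU/RAM bound), 200K IOPS total.
solution_a = nodes_required(400, 100, 200_000, iops_per_node=100_000)
solution_b = nodes_required(400, 100, 200_000, iops_per_node=50_000)  # half the IOPS/node

print(solution_a, solution_b)  # 4 4 -> same cluster size either way
```

Only when the IOPS requirement, rather than compute, drives the node count does the per-node difference change the outcome, and even then the fix is adding nodes.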

Let’s shift focus for a moment and talk about the performance capabilities of the ESX/ESXi hypervisor, as this seems to be argued to be a significant overhead which prevents a VSA from being high performance. In my experience, ESXi has never been a significant I/O bottleneck, even for large customers with business critical applications, as the focus on Biz Critical Apps really took off around the VI3 days or later, when the hypervisor could already deliver ~100K IOPS per host.

The below is a chart showing VMware’s tested capabilities from ESX 1, through to vSphere 5 which was released in July 2011.

What we can clearly see is vSphere 5.0 can achieve 1 Million IOPS (per host), and even back in the VI3 days, 100,000 IOPS.

In 2011, VMware wrote a great article “Achieving a Million I/O Operations per Second from a Single VMware vSphere® 5.0 Host” which shows how the 1 million IOPS claim has been validated.

In 2012 VMware published “1 million IOPS On 1VM” which showed not only could vSphere achieve a million IOPS, but it could do it from 1 VM.

I don’t know about you, but it’s pretty impressive VMware has optimized the hypervisor to the point where a single VM can get 1 million IOPS, and that was back in 2012!

Now in both articles, the 1 million IOPS was achieved using a traditional centralised SAN. The first article used an EMC VMAX with 8 engines, and I have summarized the setup below.

4 quad-core processors and 128GB of memory per engine
64 front-end 8Gbps Fibre Channel (FC) ports
64 back-end 4Gbps FC ports
960 * 15K RPM, 450GB FC drives

The IO profile for this test was 8K, 100% read, 100% random.

For the second 1 million IOPS per VM test, the setup used 2 x Violin Memory 6616 Flash Memory Arrays with the below setup.

Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HyperThreading disabled
Memory: 256GB
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Memory Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Config: 4K IO size w/ 16 workers

For both configurations, all I/O needs to traverse from the VM, through the hypervisor, out HBAs/NICs, across a storage area network, through central controllers and then make the return journey back to the VM.

There are so many places where additional latency or contention can be introduced in the storage stack, it’s amazing VMs can produce the level of storage performance they do, especially 3 years ago.

Chad Sakac wrote a great article back in 2009 called “VMware I/O Queues, Microbursting and Multipathing“, which has the below representation of the path I/O takes between a VM and a centralized SAN.

As we can see, Chad shows 12 steps for I/O to get to the disk queues, and once the I/O is completed, the I/O needs to traverse all the way back to the VM, so all in all you could argue it’s a 24 step round trip for EVERY I/O!

The reason I am pointing this out is because the argument around “In-kernel” verses “Virtual Storage Appliance” is only about 1 step in the I/O path, when Hyper-Converged solutions like Nutanix (which uses a VSA) eliminate most of the steps in an overcomplicated I/O path which has been proven to achieve 1 million IOPS per VM.

Andre Leibovici recently wrote the article “Nutanix Traffic Routing: Setting the story straight” where he shows the I/O path for VMs using Nutanix.

The diagram below, created by Andre, shows that the I/O path (for Read I/O) goes from the VM, across the ESXi hypervisor, to the Controller VM (CVM), which then uses DirectPath I/O to directly access the locally attached SSD and SATA drives.

Consider if the VM in the above diagram were a Web Server and the CVM were a database server, both running in an environment with a SAN/NAS. The Web Server would communicate with the DB Server over the network (via the hypervisor), but the DB Server would then have to fetch its data (the data the Web Server requested) from the centralized SAN. So in the vast majority of environments today (which use SAN/NAS), data travels a much longer path than it would with a VSA solution: in many cases it traverses from one VM to another across the hypervisor, goes out to the SAN/NAS, and comes back through a VM before being served to the VM that requested it.

Now back to the diagram: for Nutanix, Read I/O under normal circumstances is served locally around 95% of the time, thanks to data locality and the way Write I/O is handled.

For Write I/O, one copy of each piece of data is written locally where the VM is running, which means all subsequent Read I/O can be served locally (and freshly written data is also typically “Active data”), while the 2nd copy is replicated throughout the Nutanix cluster. So even though half of the Write I/O (one of the two copies) must traverse the LAN, it doesn’t hit a choke point like a traditional SAN, because Nutanix scales out controllers on a 1:1 ratio with ESXi hosts and distributes writes throughout the cluster in 1MB extents.
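Here is a deliberately simplified Python model of that write path, assuming a replication factor of 2 (one local copy plus one remote copy) and 1MB extents. The round-robin choice of remote node and the function names are hypothetical illustrations, not how Nutanix actually selects replica locations.

```python
from collections import Counter

EXTENT_MB = 1  # write granularity quoted in the text


def place_extents(write_mb: int, local_node: int, num_nodes: int) -> list[tuple[int, int]]:
    """Return (local_copy_node, remote_copy_node) for each 1MB extent written (RF2)."""
    others = [n for n in range(num_nodes) if n != local_node]
    placements = []
    for i in range(write_mb // EXTENT_MB):
        # Hypothetical placement rule: rotate the remote copy across the other nodes.
        placements.append((local_node, others[i % len(others)]))
    return placements


def local_read_fraction(placements: list[tuple[int, int]], reader_node: int) -> float:
    """Fraction of extents a VM on reader_node can read without crossing the LAN."""
    hits = sum(1 for copies in placements if reader_node in copies)
    return hits / len(placements)


# A VM on node 0 of an 8-node cluster writes 1,000MB.
placements = place_extents(write_mb=1000, local_node=0, num_nodes=8)
remote_copies = Counter(remote for _, remote in placements)
lan_mb = len(placements) * EXTENT_MB  # only the second copy crosses the LAN

print("local read fraction on node 0:", local_read_fraction(placements, reader_node=0))
print("MB crossing the LAN:", lan_mb, "of", 2 * lan_mb, "MB written in total")
print("remote copies per node:", dict(sorted(remote_copies.items())))
```

With 8 nodes and 1,000MB written from node 0, every extent can be read locally, half of the written bytes cross the LAN, and the 1,000 remote copies land on the seven other nodes within one extent of each other, so no single node becomes the choke point. The model ignores everything that could make a read remote, which is why it reports 100% local reads rather than the ~95% quoted above.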

So, looking back at Chad’s (awesome!) diagram, Hyper-converged solutions like Nutanix and VSAN are only concerned with steps 1, 2, 3 and 12 (4 in total) for Read I/O, and for Write I/O with steps 1, 2, 3 and 12 plus one step for the NIC at the source host and one step for the NIC at the destination host.

So overall, it’s 4 steps for Read and 6 steps for Write, compared to 12 for Read and 12 for Write with a traditional SAN.
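The step counting can be written down as a small sketch. The step numbers come from Chad’s diagram; the NIC labels are names made up for illustration, and doubling the one-way count for a round trip applies the same reasoning as the 24-step figure earlier to every path, as a simplifying assumption.

```python
from fractions import Fraction

# Step numbers from Chad Sakac's 2009 I/O path diagram (one way, VM to disk queues).
SAN_STEPS = list(range(1, 13))  # a traditional SAN: all 12 steps
HCI_READ_STEPS = [1, 2, 3, 12]  # hyper-converged read, served locally
# Hyper-converged write adds the NICs for the remote replica (labels are illustrative).
HCI_WRITE_STEPS = HCI_READ_STEPS + ["source NIC", "destination NIC"]


def round_trip(steps: list) -> int:
    """Simplifying assumption: the completed I/O retraces the same path back to the VM."""
    return 2 * len(steps)


def fraction_removed(hci_steps: list, san_steps: list) -> Fraction:
    """Share of the SAN's one-way steps that the hyper-converged path avoids."""
    return 1 - Fraction(len(hci_steps), len(san_steps))


for name, steps in [("SAN read/write", SAN_STEPS),
                    ("HCI read", HCI_READ_STEPS),
                    ("HCI write", HCI_WRITE_STEPS)]:
    print(f"{name:15} one-way={len(steps):2} round-trip={round_trip(steps):2}")

print("read steps removed:", fraction_removed(HCI_READ_STEPS, SAN_STEPS))    # 2/3
print("write steps removed:", fraction_removed(HCI_WRITE_STEPS, SAN_STEPS))  # 1/2
```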

So Hyper-converged solutions, whether In-Kernel or VSA based, remove many of the potential points of failure and contention compared to a traditional SAN/NAS and, as a result, have MUCH more efficient data paths.

On Twitter recently, I responded to a tweet in which the person claimed “Hyperconverged is about software, not hardware”.

I disagree. To me (and the folks at Nutanix), Hyper-converged is all about the customer experience. It should be simple to deploy, manage, scale etc., all of which make up the customer’s experience. Everything in the datacenter runs on HW, so I don’t get the fuss over the Software-only vs Appliance / OEM solution debate either, but that is a topic for another post.

I agree that doing things in software is a great idea, and that is what Nutanix and VSAN do: provide a solution in software which combines with commodity hardware to create a Hyper-converged solution.

Summary:

A great customer experience (which is what I believe matters) along with a high performance (1M+ IOPS) solution can be delivered either In-Kernel or via a VSA; it’s as simple as that. We are long past the days when a VM was a significant bottleneck (circa 2004 w/ ESX 2.x).

I’m glad VMware has led the market in pushing customers to virtualize Business Critical Apps, because it works really really well and delivers lots of value to customers.

As a result of countless best practice guides, white papers and case studies from VMware and VMware Storage Partners such as Nutanix, we know that highly compute, network and storage intensive applications can easily be virtualized. So anyone saying a Virtual Storage Appliance can’t (or shouldn’t) be virtualized simply doesn’t understand how efficient the ESXi hypervisor is and/or hasn’t had the industry experience of deploying storage intensive Business Critical Applications.

To all Hyper-converged vendors: Can we stop this ridiculous debate and get on with the business of delivering a great customer experience and focus on the business at hand of taking down traditional SAN/NAS? I don’t know about you, but that’s what I’ll be doing.

CloudXC

By Josh Odgers – VMware Certified Design Expert (VCDX) #90

