In-Kernel versus Virtual Storage Appliance

Let me start by asking: what's all this "In-Kernel versus Virtual Storage Appliance" debate about?

It seems to me to be total nonsense, yet it is the focus of so-called competitive intelligence and Twitter debates. From an architectural perspective I just don't get why it's such a huge focus when there are so many other critical areas to discuss, like the benefits of Hyper-Converged versus SAN/NAS!

Saying an In-Kernel solution is faster than a VSA (or vice versa) just because of where the software runs is like saying my car with 18″ wheels is faster than your car with 17″ wheels. In reality there are so many other factors to consider that the wheel size is almost irrelevant, as is whether storage is provided "In-Kernel" or via a "Virtual Appliance".

Just because something runs In-Kernel doesn't mean it's efficient: the In-Kernel code could be very inefficient and perform much worse than a VSA solution, and equally a VSA could be inefficient while an In-Kernel solution is more efficient.

In addition, Hyper-converged solutions are by design scale-out solutions; as a result, performance capability is the sum of all the nodes, not that of any individual node.

As long as a solution can provide enough performance (IOPS) per node for individual (or scaled-up) VMs, and enough scale-out to support all of the customer's VMs, it doesn't matter whether Solution A is In-Kernel or VSA based, or whether it can do 20% or even 100% more IOPS per node than Solution B. The only thing that matters is that the customer's requirements are met or exceeded.
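
To illustrate the point, here is a minimal sketch in Python; the node counts, per-node IOPS figures and requirement are hypothetical numbers invented for the example, not benchmark results for any product. It shows why a per-node IOPS advantage rarely changes the outcome once the aggregate, scale-out capability of the cluster is compared against the customer's requirement:

```python
# Hypothetical comparison of two scale-out HCI clusters.
# All node counts and per-node IOPS figures are made up purely to illustrate
# the argument; they are not benchmark results for any product.

def cluster_iops(nodes: int, iops_per_node: int) -> int:
    """In a scale-out cluster, aggregate capability is roughly the sum of the nodes."""
    return nodes * iops_per_node

requirement = 300_000  # assumed customer requirement across all VMs (IOPS)

solution_a = cluster_iops(nodes=8, iops_per_node=60_000)  # e.g. a VSA-based solution
solution_b = cluster_iops(nodes=8, iops_per_node=72_000)  # e.g. 20% more IOPS per node

for name, capability in (("Solution A", solution_a), ("Solution B", solution_b)):
    print(f"{name}: {capability:,} IOPS - requirement met: {capability >= requirement}")

# Both solutions comfortably exceed the requirement, so the 20% per-node
# difference (and where the controller software runs) is not the deciding factor.
```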

Let's shift focus for a moment and talk about the performance capabilities of the ESX/ESXi hypervisor, as this is often argued to be a significant overhead which prevents a VSA from delivering high performance. In my experience, ESXi has never been a significant I/O bottleneck, even for large customers with business-critical applications, as the focus on Business Critical Apps really took off around the VI3 days or later, when the hypervisor could already deliver ~100K IOPS per host.

Below is a chart showing VMware's tested capabilities from ESX 1 through to vSphere 5, which was released in July 2011.

[Chart: VMware tested IOPS per host, ESX 1 through vSphere 5]

What we can clearly see is that vSphere 5.0 can achieve 1 million IOPS per host, and that even back in the VI3 days the hypervisor could deliver 100,000 IOPS.

In 2011, VMware wrote a great article “Achieving a Million I/O Operations per Second from a Single VMware vSphere® 5.0 Host” which shows how the 1 million IOPS claim has been validated.

In 2012 VMware published "1 million IOPS On 1VM", which showed that not only could vSphere achieve a million IOPS, it could do so from a single VM.

I don't know about you, but I find it pretty impressive that VMware has optimized the hypervisor to the point where a single VM can drive 1 million IOPS, and that was back in 2012!

In both articles, the 1 million IOPS was achieved using a traditional centralized SAN. The first article used an EMC VMAX with 8 engines, and I have summarized the setup below.

  • 4 quad-core processors and 128GB of memory per engine
  • 64 front-end 8Gbps Fibre Channel (FC) ports
  • 64 back-end 4Gbps FC ports
  • 960 x 15K RPM, 450GB FC drives

The I/O profile for this test was 8K, 100% read, 100% random.

For the second test (1 million IOPS from a single VM), the configuration used 2 x Violin Memory 6616 Flash Memory Arrays, detailed below.

  • Hypervisor: vSphere 5.1
  • Server: HP DL380 Gen8
  • CPU: 2 x Intel Xeon E5-2690, HyperThreading disabled
  • Memory: 256GB
  • HBAs: 5 x QLE2562
  • Storage: 2 x Violin Memory 6616 Flash Memory Arrays
  • VM: Windows Server 2008 R2, 8 vCPUs and 48GB RAM
  • Iometer configuration: 4K I/O size with 16 workers (see the throughput sketch below)
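
To put these benchmark figures in perspective, here is a quick back-of-the-envelope throughput calculation using nothing more than the I/O sizes quoted above; it is simple arithmetic on the published numbers, not a new test result:

```python
# Throughput implied by 1 million IOPS at the I/O sizes quoted above.
# This is simple arithmetic on the published figures, not a new benchmark.

iops = 1_000_000

for label, io_size_kib in (("VMAX test, 8K reads", 8), ("Violin test, 4K", 4)):
    throughput_gib_s = iops * io_size_kib / (1024 * 1024)  # KiB per second -> GiB per second
    print(f"{label}: ~{throughput_gib_s:.1f} GiB/s")

# Roughly 7.6 GiB/s for the 8K profile and 3.8 GiB/s for the 4K profile,
# all of which traverses the hypervisor, HBAs and storage fabric described below.
```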

For both configurations, all I/O needs to traverse from the VM, through the hypervisor, out through the HBAs/NICs, across a storage area network, through central controllers and then make the return journey back to the VM.

There are so many places where additional latency or contention can be introduced in the storage stack that it's amazing VMs can produce the level of storage performance they do, especially three years ago.

Chad Sakac wrote a great article back in 2009 called “VMware I/O Queues, Microbursting and Multipathing“, which has the below representation of the path I/O takes between a VM and a centralized SAN.

[Diagram: Chad Sakac's representation of the I/O path from a VM to a centralized SAN]

As we can see, Chad shows 12 steps for I/O to get to the disk queues, and once the I/O is completed it needs to traverse all the way back to the VM, so all in all you could argue it's a 24-step round trip for EVERY I/O!

The reason I am pointing this out is that the argument around "In-Kernel" versus "Virtual Storage Appliance" is only about one step in that I/O path, whereas Hyper-Converged solutions like Nutanix (which uses a VSA) eliminate roughly three quarters of the steps in an overcomplicated I/O path, a path which, as shown above, can still deliver 1 million IOPS to a single VM.

Andre Leibovici recently wrote the article “Nutanix Traffic Routing: Setting the story straight” where he shows the I/O path for VMs using Nutanix.

The diagram below, which Andre created, shows that the Read I/O path goes from the VM, across the ESXi hypervisor to the Controller VM (CVM), which then uses DirectPath I/O to access the locally attached SSD and SATA drives directly.

[Diagram: Nutanix Read I/O path from the VM via the CVM to local SSD/SATA drives]

Consider if the VM in the above diagram was a web server and the CVM was a database server, and they were running in an environment with a SAN/NAS. The web server would communicate with the DB server over the network (via the hypervisor), but the DB server would have to fetch the data the web server requested from the centralized SAN. So in the vast majority of environments today (which use SAN/NAS), the data travels a much longer path than it would with a VSA solution, in many cases traversing from one VM to another across the hypervisor before going to the SAN/NAS and then back through a VM to be served to the VM that requested it.

Now back to the diagram: for Nutanix, Read I/O will under normal circumstances be served locally around 95% of the time, thanks to data locality and the way Write I/O is handled.

For Write I/O, one copy of each piece of data is written locally on the node where the VM is running, which means all subsequent Read I/O can be served locally (and freshly written data is also typically "active" data), while the second copy is replicated across the Nutanix cluster. This means that even though half of the Write I/O (one of the two copies) needs to traverse the LAN, it doesn't hit a choke point like a traditional SAN, because Nutanix scales out controllers at a 1:1 ratio with ESXi hosts and writes are distributed throughout the cluster in 1MB extents.
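
To make the write path easier to picture, below is a heavily simplified, conceptual sketch of a two-copy write: one copy lands on the node hosting the VM and a second copy is replicated to another node in the cluster. This is illustrative only; the actual Nutanix implementation (extent handling, replica placement, metadata and acknowledgements) is far more sophisticated.

```python
import random

# Conceptual sketch of a two-copy write path in a scale-out cluster:
# copy 1 is written locally on the node hosting the VM (enabling local reads),
# copy 2 is replicated to another node over the LAN. Illustrative only.

class Node:
    def __init__(self, name: str):
        self.name = name
        self.local_storage: list[bytes] = []  # stands in for locally attached SSD/SATA

    def write(self, extent: bytes) -> None:
        self.local_storage.append(extent)

def write_two_copies(data: bytes, local: Node, cluster: list[Node],
                     extent_size: int = 1024 * 1024) -> None:
    """Split data into extents; write each locally and replicate it to one other node."""
    for offset in range(0, len(data), extent_size):
        extent = data[offset:offset + extent_size]
        local.write(extent)                                      # copy 1: stays local
        remote = random.choice([n for n in cluster if n is not local])
        remote.write(extent)                                     # copy 2: traverses the LAN

nodes = [Node(f"node-{i}") for i in range(4)]
write_two_copies(b"x" * (3 * 1024 * 1024), local=nodes[0], cluster=nodes)
print({n.name: len(n.local_storage) for n in nodes})  # node-0 holds every extent locally
```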

So if we look back to Chad's (awesome!) diagram, Hyper-converged solutions like Nutanix and VSAN are only concerned with steps 1, 2, 3 and 12 (4 in total) for Read I/O, and for Write I/O those same four steps plus one step for the NIC at the source host and one for the NIC at the destination host.

So overall it's 4 steps for Read and 6 steps for Write, compared to 12 for Read and 12 for Write with a traditional SAN.

So Hyper-converged solutions, regardless of whether they are In-Kernel or VSA based, remove many of the potential points of failure and contention compared with a traditional SAN/NAS and, as a result, have MUCH more efficient data paths.

On Twitter recently, I responded to a tweet in which the person claimed "Hyperconverged is about software, not hardware".

I disagree. Hyper-converged, to me (and the folks at Nutanix), is all about the customer experience. It should be simple to deploy, manage and scale, all of which make up the customer's experience. Everything in the datacenter runs on hardware, so I don't get the fuss about the software-only versus appliance/OEM software debate either, but that is a topic for another post.

[Screenshot: tweet about the customer experience]

I agree doing things in software is a great idea, and that is what Nutanix and VSAN do: provide a solution in software which combines with commodity hardware to create a Hyper-converged solution.

Summary:

A great customer experience (which is what I believe matters most), along with a high-performance (1M+ IOPS) solution, can be delivered either In-Kernel or via a VSA; it's as simple as that. We are long past the days when running in a VM was a significant bottleneck (circa 2004 with ESX 2.x).

I'm glad VMware has led the market in pushing customers to virtualize Business Critical Apps, because it works really, really well and delivers a lot of value to customers.

As a result of countless best practice guides, white papers and case studies from VMware and VMware storage partners such as Nutanix, we know that highly compute-, network- and storage-intensive applications can easily be virtualized. So anyone saying a Virtual Storage Appliance can't (or shouldn't) be virtualized simply doesn't understand how efficient the ESXi hypervisor is, and/or hasn't had the industry experience of deploying storage-intensive Business Critical Applications.

To all Hyper-converged vendors: can we stop this ridiculous debate, get on with delivering a great customer experience, and focus on the business at hand of taking down traditional SAN/NAS? I don't know about you, but that's what I'll be doing.

Nutanix support on vSphere – No Bull!

Recently it seems the spreading of Fear, Uncertainty and Doubt (FUD) about Nutanix has ramped up, probably due to Nutanix's ongoing success and enormous growth.

Still, it's unfortunate when large companies, and especially people in senior positions at those companies, try to bully smaller companies. Luckily, at Nutanix we're like honey badgers: we don't care. We Hyper-converge anyway!

[Image: "honey badger don't care" meme]

This sort of attention is to be expected when you work for a disruptive start-up which, according to independent sources such as IDC, is the hyperconverged market leader with 52% market share.

However, when the FUD creates confusion for customers, it crosses the line and I need to correct the misinformation for the benefit of those customers, because the customer experience is what Nutanix's focus is all about.

The specific FUD I am talking about, which was chucked around the blog/Twittersphere in the last 24 hours, is as follows:

  • “… Nutanix is not entitled to support VMware customers”
  • “Any support you (Nutanix) provide is not ‘Official’”
  • “Your (support) model puts customers in a grey area”
  • “Nutanix (is) providing unofficial customer support for both VMware and Nutanix”
  • “Nutanix should be transparent to customers regarding what services they are entitled to provide, and which ones they aren’t”.

How would I describe the above comments? Simple: Total Bull (shit)!

Now, that made me think of an Australian company (Pedders, logo below) which made the term "No Bull" famous with a series of TV commercials about car servicing and suspension. The ads made light of car mechanics who overcharge people for unnecessary work on vehicles, making the point that Pedders gives the right advice with "No Bull".

[Image: Pedders "No Bull" logo]

So this post is inspired by Pedders and is about giving you the facts, and No Bull (Shit)!

So is Nutanix a supported platform for vSphere? YES!

How do I know this? I personally completed the hands-on certification work and submitted the successful certification logs to VMware prior to their approval. In fact, I am still involved in keeping our certifications up to date.

But if you don't believe me, it's easy enough to verify. All you need to do is check the official VMware Hardware Compatibility List (HCL).

To do this, simply visit http://www.vmware.com/go/hcl and select Storage/SAN as shown below.
[Screenshot: VMware HCL, selecting Storage/SAN]

Next select the following:

  • Product Release Version : All
  • Partner Name : Nutanix
  • Storage Virtual Appliance Only : Yes
  • Features Category : All

[Screenshot: VMware HCL search criteria]
Hit "Update and View Results" and you will get a view like the one below.

[Screenshot: VMware HCL results showing Nutanix node types]

The above shows all Nutanix node types, from the NX-1000 series all the way to our all-flash NX-9000 series, with support for vSphere 5.1 through vSphere 6.0.

If you click on one of the node types you will see Nutanix is also supported for VAAI-NAS, and as I highlighted in my post "Not all VAAI-NAS solutions are created equal", Nutanix supports all 4 VAAI-NAS primitives, not a subset of them, which is common especially in the hyperconverged market.

Note: Some HCI solutions have no VAAI support at all!

[Screenshot: VMware HCL showing Nutanix VAAI-NAS support]

So with a quick check of the VMware HCL, we can all see that Nutanix is a fully certified solution supported by VMware Global Support Services (GSS).

What does this mean?

Put simply, any Nutanix customer with up-to-date Support and Subscription (SnS) can call VMware GSS directly and get support.

Nutanix customers are also welcome and encouraged to contact Nutanix directly. As a result, customers get the best of both worlds and can choose which vendor to call based on the quality of support.

What does Nutanix support provide?

End-to-end support including (but not limited to):

  • The Hypervisor (ESXi, Hyper-V or KVM)
  • The Nutanix layer
  • Performance troubleshooting
  • Networking support
  • Application support such as MS SQL / MS Exchange / Oracle etc

So Nutanix support is really a “One throat to choke” service.

Not only is our service "one throat to choke", Nutanix also has one of, if not the, highest Net Promoter Scores (NPS) in the IT industry, with a score of +88 on a scale of -100 to +100.
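
For anyone unfamiliar with how a Net Promoter Score is calculated, it is simply the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). The sketch below uses made-up survey responses purely to show the arithmetic:

```python
# Net Promoter Score = % promoters (9-10) minus % detractors (0-6),
# giving a value between -100 and +100. The responses below are invented
# solely to demonstrate the calculation.

def nps(responses: list[int]) -> float:
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100 * (promoters - detractors) / len(responses)

sample = [10, 9, 10, 9, 9, 10, 8, 9, 10, 7]  # hypothetical survey scores
print(f"NPS: {nps(sample):+.0f}")  # -> NPS: +80 for this made-up sample
```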

[Image: Omega award]

I challenge anyone to show me a company with a better NPS in the IT industry!

Nutanix System Reliability Engineers (SREs) are more often than not Level 3 engineers. Nutanix does not hire the typical Level 1 engineer who essentially just takes a phone call and needs to escalate most calls to another engineer.

Nutanix has numerous VCAP-level certified support engineers, as well as the ability to call on one or more of our 12 VMware Certified Design Experts (VCDXs) if and when required. I have personally been involved in numerous escalations; in some cases I have travelled to customer sites to investigate and drive non-Nutanix issues, such as hypervisor bugs, to successful resolution. These escalations were handled free of charge, to ensure our customers have the best experience possible.

In the event Nutanix support finds a problem with something which is not Nutanix, such as a hypervisor bug, we don't hand you off to another vendor; we log the bug on your behalf, manage the case with the hypervisor vendor and follow it through to conclusion.

To be clear, Nutanix is not an OEM partner of VMware, nor do we ship ESXi with our platform. Does this matter? Not at all. What it does mean is that the support costs customers pay to VMware are not shared with Nutanix; that's all.

Nutanix can and does provide support for vSphere, just as Systems Integrators (SIs), managed service providers and other non-OEM VMware partners do.

To cover unusual situations, Nutanix is also a member of TSAnet, a multi-vendor support network whose members include Microsoft and VMware. This ensures that even if your problem is not strictly covered by a support contract, or is not a supported configuration, Nutanix will make every effort to ensure the problem is resolved directly with the other vendor(s) via TSAnet.

This post was brought to you by my two favourite Chucks, and "No Bull"!

[Images: the two Chucks]

The new standard in Enterprise Architecture certifications

I am very proud to have been selected to be part of a team of absolute superstars who, in the last few months, have developed what I believe will be the new standard in Enterprise Architecture certifications: the Nutanix Platform Expert (NPX).

The NPX was developed under the guidance of Lisa O'Leary, a PhD psychometrician and recognized authority in the development of expert-level panel-based assessments for the IT industry. This was a real eye-opener for me into how to create a scoring rubric and how to ensure different examiners score as consistently as possible so that results are comparable.

The NPX certification (along with Nutanix nu.School education) is designed to produce and certify the best of the best enterprise architects, with the main goal of ensuring customers get architects who can design and deliver solutions that solve real-world business problems while maximizing value and reducing ongoing costs.

During the development of NPX, the other members of the group and I basically decided that none of us should be able to achieve NPX without putting in significant time and effort to improve our skills, especially as candidates are required to demonstrate expertise both architecturally and hands-on across multiple hypervisors and vendor software stacks. Considering the talent in the group, this was a big call!

I am personally enjoying the challenge of preparing my submission for the NPX based on a large-scale project I am working on at the moment, and I look forward to submitting my application and hopefully being invited to the NPX Design Review (NDR) to defend it. I can already tell you this is more comprehensive than any single design I have done to date, and it will be a blast to defend.

So what will being an NPX mean?

Certified graduates of the NPX Program will have a unique set of skills, including the demonstrated ability to deliver enterprise-class Web-scale solutions using multiple hypervisors and vendor software stacks on the Nutanix platform (VMware® vSphere®, Microsoft® Hyper-V®, and KVM).

This hypervisor-agnostic certification for Enterprise Architects is a first in the industry; our groundbreaking approach allows an NPX the freedom to design cutting-edge Web-scale solutions for customers based solely on their business needs.

The depth and breadth of the solution design and delivery skills validated through our peer-vetted program make NPX the new standard for excellence. In accordance with program goals, every NPX will be a superb technologist, a visionary evangelist for Web-scale, and a true Enterprise Architect, capable of designing and delivering a wide range of cutting-edge solutions custom built to support the business goals of the Global 2000 and government agencies in every region of the world.

So what’s required to achieve NPX?

The first prerequisite is the Nutanix Platform Professional (NPP) certification. The NPP is really the entry-level certification demonstrating core Nutanix knowledge.

As per the NPX Application, the NPX certification is a two-stage process:

Stage 1 is a review of the candidate's NPX Program Application.

If a candidate's application is accepted, they will be invited to participate in the NPX Design Review (NDR).

Now at this stage you’re probably saying, this doesn’t seem that hard, right?

Well, here is an idea of the required documentation:

  • A current state and operational readiness assessment
  • A Web-scale migration and transition plan
  • Documentation of specific business requirements driving the solution design
  • Documentation of assumptions that impacted the solution design
  • Documentation of design constraints that impacted the design and delivery of the solution
  • Documentation describing risks identified in the design and delivery of the solution and how those risks were mitigated
  • A solution architecture including a conceptual/logical and physical design, with appropriate diagrams and descriptions of all functional components of the solution
  • Documentation of operational procedures and verification

The documentation set goes well beyond any certification I am aware of, but more importantly it demonstrates a candidate's ability to produce documentation which ensures the solution can be implemented, validated and operated even if the lead architect is unavailable. This is a very high standard of documentation which I've rarely seen in my career.

In addition, three professional references are required to validate the candidate's experience.

Stage 2, the NDR, is modeled on an academic viva voce defense (a live, oral exam) and requires candidates to present their solution to, and answer questions posed by, NPX-Certified Examiners (NCEs). The NDR also includes a series of hands-on exercises which must be completed by the candidate. Successful completion of both stages is required to earn the NPX credential.

The NPX has a strict policy regarding fictitious solution designs.

NPX candidates may not submit wholly fictitious designs.

I pushed for this during the development of the certification as, in my opinion, an enterprise architect should have a portfolio of work to choose from, which negates the need to create a fictitious design.

That said, partially fictitious designs are permitted when an existing design requires additions or enhancements in order to demonstrate competence in required knowledge areas (e.g., a backup or DR solution may be added if this component was outside the scope of the original design).

Adapting an existing 3-tier solution design to the Nutanix platform is also permitted. In either case the submitted design should contain a majority of solution components architected to support applications with service level agreements specified by actual business stakeholders.

The NDR itself requires the completion of an exercise involving a live Nutanix environment and completion of a design scenario. Both exercises will require demonstration of NPX-level solution design and delivery skills with a second solution stack/hypervisor.

An NPX candidate is permitted to choose the hypervisor they will be tested on during their NDR (it must be different from the hypervisor utilized in the submitted solution design). The hypervisor selected will be used for the hands-on and design scenarios during the NDR.

The Hypervisor choices are:

  • VMware® vSphere®
  • Microsoft® Hyper-V®
  • KVM

What next?

I would encourage all enterprise architects to stay tuned for the release of more NPX details via the Nutanix nu.School website, take on the challenge of NPX, and become better architects in the process.

The Nutanix Platform Expert Official Certification Guide is currently being written and should be released at Nutanix .NEXT this coming June.

Summary:

I really enjoyed working with such a talented group of people in developing NPX, and I look forward to being a part of the program, first as a candidate and later as a certified examiner, to ensure the quality of Enterprise Architects in the industry only gets better!

Here is a group shot from the final day of NPX development in San Jose.

Names (Left to right): Derek Seaman , Steven Poitras, Jon Kohler, Ray Hassan, Bas Raayman, Raymon Epping, Josh Odgers, Michael Webster, Artur Krzywdzinski, Samir Roshan, Lane Laverett, Mark Brunstad and Richard Arsenian.

Absent from photo: Magnus Andersson, Lisa O'Leary (PhD psychometrician).

[Photo: NPX development team]