Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again! – Part 1

I recently wrote a post called Fight the FUD: Nutanix scale limitations, which corrected some misinformation about Nutanix scalability stated by VCE COO Todd Pavone in the article COO: VCE converged infrastructure not affected by Dell-EMC.

In the same interview, Todd makes several comments (see quote below) which I can only trust are accurate for VSPEX Blue. However, as he refers more generally to hyper-converged systems, I have to disagree with many of them from a Nutanix perspective, and I thought it would be good to discuss where I see Nutanix.

Where does VSPEX Blue fit into the portfolio?

Hyper-converged by definition is where you use software to find technology to manage what people like to call a commoditized infrastructure, where there is no external storage. So, the intelligence is in the software, and you don’t require the intelligence in the infrastructure. In the market, everyone has had an appliance, which is just a server with embedded storage or some marketed software, and ideal for edge locations or for single use cases. But you’re not going to put SAP and run your mission-critical business on an appliance. They have scaling challenges, right? You get to a certain number of nodes, and then the performance degrades; you have to then create another cluster, another cluster. It’s just not an ideal way to go run your mission-critical x86 workloads. [It’s] good for an edge, good for a simple form factors, good for single use cases or what I’ll call more simplified workloads.

In this post I will specifically discuss the Nutanix HCI solution. While I have experience with, and opinions about, other products in the market, I will let other vendors speak for themselves.

The following quotes are not in the order Todd mentioned them in the interview above; they have been grouped and reordered to avoid overlapping or repeated comments and to make this blog flow better (hopefully). As such, if any comments appear to be taken out of context, that is not my intention.

So let’s break down what Todd has said:

  • Todd: In the market, everyone has had an appliance, which is just a server with embedded storage or some marketed software, and ideal for edge locations or for single use cases.

I agree that hyper-converged systems such as Nutanix run on commodity servers with embedded storage. I also agree that Nutanix is ideal for edge locations and can be used successfully for single use cases. But, as my next response will show, I strongly disagree with any implication that Nutanix (the market’s most innovative leader in HCI according to Gartner, with 52% market share according to IDC) is limited to edge or single use cases.

  • Todd: “It’s just not an ideal way to go run your mission-critical x86 workloads” & “But you’re not going to put SAP and run your mission-critical business on an appliance.”

Interestingly, Nutanix is the only certified HCI platform for SAP.

As an architect designing for mission-critical workloads, I want a platform that can:

a) Start small and scale as required (for example, as vBCA demands increase)
b) Be highly resilient, with automated self-healing
c) Support fully automated, non-disruptive (and low-impact) maintenance
d) Be easy to manage and scale
e) Deliver the required levels of performance

In addition, the fewer dependencies the better, as there is less to go wrong, less to troubleshoot and fewer potential bottlenecks.

Nutanix HCI delivers all of the above, so why wouldn’t you run vBCA on Nutanix? In fact, the question I would ask is, “Why would you run vBCA on legacy 3-tier platforms?”

In my experience, with legacy 3-tier it’s more difficult to start small and scale. 3-tier solutions typically have only two controllers, which cannot self-heal in the event of a failure; they have complex and time-consuming patching/upgrade procedures; they typically have multiple points of management (not a single pane of glass like Nutanix with Acropolis Hypervisor); and they are typically much more difficult to scale (requiring rip and replace).

The only thing most monolithic 3-tier products provide (if architected correctly) is reasonable performance.

Here is a typical example of a Nutanix customer upgrade experience compared to a legacy 3-tier product.

[Image: tweet comparing a Nutanix customer’s upgrade experience with a legacy 3-tier upgrade]

Think the above isn’t a fair comparison? I agree! Nutanix vs Legacy is no contest.

When I joined Nutanix in 2013, I was immediately involved in testing mission-critical workloads, and I have no problem saying performance was not good enough for some of them. Since then, Nutanix has built a large team focused on business-critical applications (three of whom are VCDXs with years of vBCA experience), and applications like SQL, Oracle (including RAC deployments), MS Exchange and SAP are now becoming common workloads for customers who originally started with Test/Dev or VDI.

Think of Nutanix like VMware in 2005: everyone was concerned about performance and resiliency and didn’t run business-critical applications on VI3 (later renamed vSphere), but over time everyone (including myself) learned that virtualization was in fact not only suitable for vBCA, but an ideal platform for it. I’m here to tell everyone: don’t make the same mistake we all made with virtualization by assuming Nutanix isn’t suitable for vBCA and waiting five years to realise the value. Nutanix is more than ready (and has been for a while) for mission-critical applications.

Regarding Todd’s second statement “But you’re not going to put SAP and run your mission-critical business on an appliance.”

If not on an appliance, then what are we supposed to put mission-critical applications on? Regardless of what you think of traditional converged products, the fact is they are just a single SKU for multiple pre-existing products (generally from several different vendors) which have been pre-architected and configured. They are not radically different, nor do they eliminate ongoing operational complexity, which is something HCI solutions such as Nutanix do well.

If anything, putting mission-critical applications on a simple, highly performant and scalable HCI appliance-based solution (especially Nutanix) makes more sense than converged/3-tier products. Nutanix is no longer the new kid on the block; it is well proven across all industries and on different workloads, including mission-critical ones. Hell, most US federal agencies, including the Pentagon, use Nutanix; how much more critical do you want? (Also, anyone saying VDI isn’t mission critical has rocks in their head! If all your users are offline, how productive is your company, and how much use are all your servers?)

Imagine if a traditional converged solution is sized wrong, or a mission-critical application outgrows it before its scheduled end of life. With Nutanix, you add one or more nodes (no rip and replace) and vMotion the workload(s), and you’ve scaled completely non-disruptively. In fact, with Nutanix you should intentionally start small and scale in as close to a just-in-time fashion as possible, so your mission-critical application can take advantage of newer hardware over the following 3-5 years! Lower CAPEX and better long-term performance sounds like a WIN/WIN to me!
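To make the just-in-time idea concrete, here is a small Python sketch that compares sizing a cluster up front with adding nodes each year as demand grows. The demand forecast, the “workload units” per node and the three-node minimum cluster size are all hypothetical assumptions for illustration, not Nutanix sizing guidance.

```python
import math
from typing import List


def cluster_size_by_year(demand: List[float], capacity_per_node: float,
                         min_nodes: int = 3) -> List[int]:
    """Smallest cluster size each year that covers that year's forecast demand.

    Nodes are only ever added (scale-out), never removed. min_nodes is an
    assumed minimum cluster size for this illustration.
    """
    sizes: List[int] = []
    nodes = 0
    for year_demand in demand:
        needed = max(min_nodes, math.ceil(year_demand / capacity_per_node))
        nodes = max(nodes, needed)  # never shrink the cluster
        sizes.append(nodes)
    return sizes


def nodes_purchased_by_year(sizes: List[int]) -> List[int]:
    """Convert cumulative cluster sizes into the number of nodes bought each year."""
    previous = 0
    purchases: List[int] = []
    for size in sizes:
        purchases.append(size - previous)
        previous = size
    return purchases


# Hypothetical five-year forecast, in arbitrary "workload units".
forecast = [100, 150, 220, 300, 400]
jit = cluster_size_by_year(forecast, capacity_per_node=50)
print(jit)                           # [3, 3, 5, 6, 8]
print(nodes_purchased_by_year(jit))  # [3, 0, 2, 1, 2]
# Sizing up front for year five means buying every node in year one.
print(cluster_size_by_year([max(forecast)], capacity_per_node=50))  # [8]
```

In this made-up example, sizing for year five means buying all eight nodes on day one, whereas scaling just in time buys three and defers the other five to later years, when newer hardware generations would be available.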

Even if it were true that converged (or any other product) had higher peak performance than a Nutanix HCI solution (which in the real world has minimal value), so what? Do you really want point solutions (a.k.a. silos) for every different workload? No. I wrote a post covering things to consider when choosing infrastructure, including why you want to avoid silos, which I encourage you to read when considering any new infrastructure.

  • Todd: They have scaling challenges, right? You get to a certain number of nodes, and then the performance degrades; you have to then create another cluster, another cluster.

My previous post Fight the FUD: Nutanix scale limitations covers this FUD in detail. In short, Nutanix has proven numerous times that we can scale linearly; see Scaling to 1 Million IOPS and beyond linearly! for an example (and that video is from October 2013). Note: ignore the actual IO number. The important factor is the linear scalability, not the peak benchmark number, which has little value in the real world, as I discuss in “Peak Performance vs Real World Performance”.
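Linear scalability can be checked with simple arithmetic: divide each measured result by what perfectly linear scaling from the smallest cluster would predict. The Python sketch below does exactly that; the IOPS figures in it are hypothetical placeholders, not results from the Scaling to 1 Million IOPS video.

```python
from typing import Dict


def scaling_efficiency(iops_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Measured throughput divided by perfectly linear throughput.

    The smallest measured cluster provides the per-node baseline, so a value
    of 1.0 means the cluster scaled perfectly linearly from that baseline.
    """
    base_nodes = min(iops_by_nodes)
    per_node = iops_by_nodes[base_nodes] / base_nodes
    return {n: iops_by_nodes[n] / (per_node * n) for n in sorted(iops_by_nodes)}


# Hypothetical results for illustration only.
measured = {4: 100_000, 8: 200_000, 16: 392_000}
for nodes, eff in scaling_efficiency(measured).items():
    print(f"{nodes:>2} nodes: {eff:.2f} of linear")
```

The absolute IOPS value drops out of the result, which is the point of the note above: what matters is whether efficiency stays close to 1.0 as nodes are added.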

  • Todd: [It’s] good for an edge, good for a simple form factors, good for single use cases or what I’ll call more simplified workloads.

To be honest, I’m not sure what he means by “good for a simple form factors”, but I can only assume he is talking about how HCI solutions like Nutanix have compact 4-node-per-2RU form factors and use less rack space, power, cooling etc.?

As for single use cases, I recommend customers run mixed workloads for several reasons. Firstly, Nutanix is a truly distributed solution, which means the more nodes in a cluster, the more performant and resilient the cluster becomes. Secondly, scaling out a cluster helps eliminate silos, which reduces waste.
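One way to see why more nodes make a distributed cluster more resilient is to look at the rebuild work after a node failure. The Python sketch below assumes the failed node’s data replicas are spread evenly across all surviving nodes and that each node rebuilds at a fixed, hypothetical rate; it is a back-of-the-envelope model, not a description of how Curator actually schedules rebuilds.

```python
def rebuild_share_per_node(failed_node_data_tb: float, cluster_nodes: int) -> float:
    """TB each surviving node must re-replicate after one node fails,
    assuming the failed node's replicas are spread evenly over the survivors."""
    if cluster_nodes < 2:
        raise ValueError("need at least two nodes to rebuild")
    return failed_node_data_tb / (cluster_nodes - 1)


def rebuild_hours(failed_node_data_tb: float, cluster_nodes: int,
                  per_node_tb_per_hour: float) -> float:
    """Estimated rebuild time when all survivors work in parallel at the same
    (hypothetical) per-node rebuild rate."""
    share = rebuild_share_per_node(failed_node_data_tb, cluster_nodes)
    return share / per_node_tb_per_hour


# Hypothetical: 10TB on the failed node, 0.5TB/hour rebuild rate per node.
for n in (4, 8, 16):
    print(f"{n:>2} nodes: {rebuild_hours(10, n, 0.5):.2f} h")
```

Under these assumptions, every extra node shrinks each survivor’s share of the rebuild, whereas in a dual-controller array a single surviving controller has to carry the whole workload after a failure.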

I recently wrote a post, Heterogeneous Nutanix Clusters Advantages & Considerations, which covers how mixing node types works in a Nutanix environment. The Nutanix Distributed Storage Fabric has many back-end optimisations (run by Curator) which have been developed over the years to ensure heterogeneous clusters perform well. This is an example of technology whose value marketing slides can’t represent, but whose real-world value is huge.

I have been involved with numerous mission-critical application deployments, and there are heaps of case studies for these deployments available on the Nutanix website at http://www.nutanix.com/resources/case-studies/.

A final thought for Part 1: with Nutanix, you can build what you need today and have mission-critical workloads benefit from the latest-generation hardware on a frequent basis (e.g. annually) by adding new nodes over time and simply vMotioning mission-critical VMs to the newer nodes. So over, say, a five-year infrastructure life span, your mission-critical applications could benefit from the performance improvements of five generations of Intel chipsets, not to mention the ever-increasing efficiency of the Nutanix Acropolis base software (formerly known as NOS).

Try getting that level of flexibility/performance improvements with legacy 3 tier!

Next up, Part 2

 

VMworld then and now! (2013 vs 2014)

Last year I did an interview with Eric Sloof (@esloof) of VMworld TV (below), where we discussed the basics (the 101) of Nutanix, which was also the theme of questions from attendees throughout the Solutions Exchange.

Meet the team behind Nutanix VMworld 2013 – https://www.youtube.com/watch?v=T56KBaB3OUk

Jump forward to this year’s VMworld (2014), and I was lucky enough to get the opportunity to interview with VMworld TV again. Eric and I agreed that we didn’t want a simple repeat of last year’s interview, but rather to talk about more of the platform’s benefits (the 201 level).

Nutanix speaks to VMworld TV about their exciting new products – https://t.co/brA15Zgcql

The interesting part of VMworld 2014 and my time on the booth was that the theme of attendees’ questions was significantly different from last year; they focused largely on Business Critical Applications and server workloads, from both prospective and existing customers.

One of my focuses over the last year has been Business Critical Applications and improving the Nutanix platform for these workloads. I am proud to say we (Nutanix) have made significant improvements in this area, and we have a strong offering, especially with the new NX-8150 platform, which my team was responsible for designing.

I am looking forward to interviewing at next year’s VMworld and covering the advanced/expert level topics (301 level) with Eric and the fantastic VMworld TV crew.

Virtualizing Business Critical Applications – The Web-Scale Way!

Since joining Nutanix back in July 2013, I have been working on testing the performance and resiliency of a range of virtual workloads including Business Critical Applications on the Nutanix platform. At the time, Nutanix only offered a single form factor (4 nodes in 2RU) which was not always a perfect fit depending on customer requirements.

Fast forward to August 2014 and now Nutanix has a wide range of node types to meet most workload requirements which can be found here.

The only real gap in the node types was a node which would support applications with large capacity requirements and also have a very large active working set which requires consistent low latency and high performance regardless of tier.

So what do I mean when I say “Active Working Set”? I would define it as the data being regularly accessed by the VM(s). For example, a file server may have 10TB of data, but users only access 10% of it (1TB) on a regular basis. That 10% is what I would classify as the Active Working Set.
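The definition above matches the classic idea of a working set: the distinct data touched within a recent time window. As a rough illustration, the Python sketch below measures it from an access trace. The trace format, the 1MB block granularity and the seven-day window are hypothetical assumptions, and the trace is scaled down from the 10TB file server example to 1,000 blocks.

```python
from typing import Iterable, Tuple

BLOCK_SIZE_MB = 1  # hypothetical block granularity for this illustration


def active_working_set_mb(accesses: Iterable[Tuple[float, int]], now: float,
                          window: float) -> int:
    """Size of the active working set: distinct blocks touched within the
    last `window` time units up to `now`. Each access is (timestamp, block_id)."""
    recent = {block for ts, block in accesses if now - window <= ts <= now}
    return len(recent) * BLOCK_SIZE_MB


total_blocks = 1000
# Hypothetical trace (timestamps in days): every block was written 30 days ago,
# and users keep re-reading blocks 0-99 on each of the last seven days.
trace = [(-30.0, b) for b in range(total_blocks)]
trace += [(float(day), b) for day in range(1, 8) for b in range(100)]

aws = active_working_set_mb(trace, now=7.0, window=7.0)
print(aws, aws / (total_blocks * BLOCK_SIZE_MB))  # 100 0.1
```

Note that the result depends heavily on the window chosen: a long enough window would count every block ever written as “active”.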

Now, back to the topic at hand. The reason I am writing this post is that this has been a project I have been working towards for some time, and I am very excited about this product being released to the market. I have no doubt it will further increase the already fast uptake of Web-scale solutions and provide significant value and opportunities to new and existing customers wanting to simplify their datacenter(s) and standardize on Nutanix Web-scale architecture.

Along with many others at Nutanix, we proposed a new node type (the NX-8150), which has been undergoing thorough testing by my team (Solutions & Performance Engineering) for some time, and I am pleased to say it is being officially released (very) soon!


What is the NX-8150?

A 1-node-per-2RU platform with the following specifications:

* 2 CPU sockets with two CPU options (E5-2690v2 [20 cores per node / 3.0 GHz] or E5-2697v2 [24 cores per node / 2.7 GHz])
* 4 x Intel 3700 Series SSDs (ranging from 400GB to 1.6TB each)
* 20 x 1TB SATA HDDs
* Up to 768GB RAM
* Up to 4 x 10GbE NICs
* 4 x 1GbE NICs
* 1 x IPMI (out-of-band management)

What is the use case for the NX-8150?

Simply put, applications which have high CPU/RAM requirements with large active working sets and/or a requirement for consistently high performance over a large data set.

Some examples of these applications include:

* Microsoft Exchange including DAG deployments
* Microsoft SQL Server including AlwaysOn Availability Groups
* Oracle including RAC
* SAP
* Microsoft SharePoint
* Mixed Production Server Workloads with varying Capacity & I/O requirements

The NX-8150 is a great platform for the above workloads, as it not only has fast CPUs and up to a massive 768GB of RAM to provide substantial compute resources to VMs, but also up to 6.4TB of raw SSD capacity (4 x 1.6TB) for virtual machines with high IO requirements. For workloads where peak performance is not critical, the NX-8150 also provides solid, consistent performance across the “cold tier” provided by the 20 x 1TB HDDs.
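To relate the 6.4TB figure back to the active working set, here is a rough Python sketch that checks whether a working set fits in a cluster’s SSD tier. It assumes a replication factor of 2 (each block stored twice), a hypothetical three-node cluster of NX-8150s fully populated with 1.6TB SSDs, and it ignores metadata and any other reserved space, so real usable capacity will be lower.

```python
from typing import Tuple


def ssd_tier_fits(working_set_tb: float, nodes: int, ssds_per_node: int = 4,
                  ssd_tb: float = 1.6,
                  replication_factor: int = 2) -> Tuple[bool, float]:
    """Rough check of whether an active working set fits in the SSD tier once
    every block is stored `replication_factor` times.

    Assumption: metadata, reserved space and other overheads are ignored.
    Returns (fits, effective SSD capacity in TB rounded to 2 decimals).
    """
    raw_tb = nodes * ssds_per_node * ssd_tb
    effective_tb = raw_tb / replication_factor
    return working_set_tb <= effective_tb, round(effective_tb, 2)


# 10TB file server with a 10% active working set, on a hypothetical 3-node cluster.
print(ssd_tier_fits(10 * 0.10, nodes=3))  # (True, 9.6)
print(ssd_tier_fits(12.0, nodes=3))       # (False, 9.6)
```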

As with all Nutanix nodes, Intelligent Life-cycle Management (ILM) maximizes performance by dynamically migrating hot data to SSD and cold data to SATA, providing the best of both worlds: high IOPS and high capacity.
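The hot/cold idea behind ILM can be illustrated with a simple frequency-based tiering sketch in Python. This is a generic technique, not Nutanix’s actual ILM or Curator logic: it assumes equally sized extents, a fixed number of SSD slots, and that access counts alone decide placement.

```python
from collections import Counter
from typing import Dict, Iterable, List


def place_extents(accesses: Iterable[str], all_extents: List[str],
                  ssd_slots: int) -> Dict[str, str]:
    """Assign each extent to 'ssd' or 'hdd'.

    The most frequently accessed extents fill the (assumed) fixed number of
    SSD slots; everything else stays on HDD.
    """
    heat = Counter(accesses)  # access count per extent; missing extents count as 0
    # Hottest first; ties broken by extent name so the result is deterministic.
    ranked = sorted(all_extents, key=lambda e: (-heat[e], e))
    hot = set(ranked[:ssd_slots])
    return {e: ("ssd" if e in hot else "hdd") for e in all_extents}


# Hypothetical access pattern over five extents with room for two on SSD.
extents = ["a", "b", "c", "d", "e"]
recent_accesses = ["c"] * 5 + ["a"] * 3 + ["e"]
print(place_extents(recent_accesses, extents, ssd_slots=2))
# {'a': 'ssd', 'b': 'hdd', 'c': 'ssd', 'd': 'hdd', 'e': 'hdd'}
```

Re-running the placement as access patterns change is what lets data move between tiers over time; a real implementation would also weigh recency, extent size and the cost of migrating data.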

One of the many major advantages of Nutanix Web-Scale architecture is its simplicity and its ability to remove the requirement for application-specific silos! With the addition of the NX-8150, the vast majority of workloads, including Business Critical Applications, can be run successfully on Nutanix, meaning fewer silos are required, resulting in a simpler, more cost-effective, scalable and resilient datacenter solution.

With a number of customers already placing advance orders for NX-8150s to deploy Business Critical Applications, it won’t be long until the now common “Virtual 1st” policy within many organisations turns into a “Nutanix Web-Scale 1st” policy!

Stay tuned for upcoming case studies for NX-8150 based Web-Scale solutions!