Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again! – Part 1

I recently wrote a post called Fight the FUD: Nutanix scale limitations, which corrected some misinformation about Nutanix scalability that VCE COO Todd Pavone stated in the article COO: VCE converged infrastructure not affected by Dell-EMC.

In the same interview, Todd makes several comments (see quote below) which I can only trust to be accurate for VSPEX Blue. But as he speaks more generally about hyper-converged systems, I have to disagree with many of the comments from a Nutanix perspective, and thought it would be good to discuss where I see Nutanix.

Where does VSPEX Blue fit into the portfolio?

Hyper-converged by definition is where you use software to find technology to manage what people like to call a commoditized infrastructure, where there is no external storage. So, the intelligence is in the software, and you don’t require the intelligence in the infrastructure. In the market, everyone has had an appliance, which is just a server with embedded storage or some marketed software, and ideal for edge locations or for single use cases. But you’re not going to put SAP and run your mission-critical business on an appliance. They have scaling challenges, right? You get to a certain number of nodes, and then the performance degrades; you have to then create another cluster, another cluster. It’s just not an ideal way to go run your mission-critical x86 workloads. [It’s] good for an edge, good for a simple form factors, good for single use cases or what I’ll call more simplified workloads.

In this post I will be specifically discussing the Nutanix HCI solution, and while I have experience with and opinions about other products in the market, I will let other vendors speak for themselves.

The following quotes are not in the order Todd mentioned them in the above interview; they have been grouped together/ordered to avoid overlapping or repeated comments and to make this blog flow better (hopefully). As such, if any comments appear to be taken out of context, it is not my intention.

So let’s break down what Todd has said:

  • Todd: In the market, everyone has had an appliance, which is just a server with embedded storage or some marketed software, and ideal for edge locations or for single use cases.

I agree that hyper-converged systems such as Nutanix run on commodity servers with embedded storage. I also agree Nutanix is ideal for edge locations and can be successfully used for single use cases, but as my next response will show, I strongly disagree with any implication that Nutanix (as the market’s most innovative leader in HCI according to Gartner, with 52% market share according to IDC) is limited to edge or single use cases.

  • Todd: “It’s just not an ideal way to go run your mission-critical x86 workloads” & “But you’re not going to put SAP and run your mission-critical business on an appliance.”

Interestingly, Nutanix is the only certified HCI platform for SAP.

As an architect, when designing for mission critical workloads, I want a platform which:

a) Can start small and scale as required (for example as vBCA demands increase)
b) Is highly resilient and has automated self healing
c) Supports fully automated, non-disruptive (and low impact) maintenance
d) Is easy to manage / scale
e) Delivers the required levels of performance

In addition to the above, the fewer dependencies the better, as there is less to go wrong, troubleshoot, create bottlenecks and so on.

Nutanix HCI delivers all of the above, so why wouldn’t you run vBCA on Nutanix? In fact, the question I would ask is, “Why would you run vBCA on legacy 3 tier platforms?”

With legacy 3 tier, in my experience, it’s more difficult to start small and scale. Typical 3-tier solutions have only two controllers which cannot self heal in the event of a failure, have complex and time consuming patching/upgrading procedures, have multiple points of management (not a single pane of glass like Nutanix w/ Acropolis Hypervisor), and are much more difficult to scale (and require rip/replace).

The only thing most monolithic 3-tier products provide (if architected correctly) is reasonable performance.

Here is a typical example of a Nutanix customer upgrade experience compared to a legacy 3-tier product.

[Image: HdexTweetUpgrades]

Think the above isn’t a fair comparison? I agree! Nutanix vs Legacy is no contest.

When I joined Nutanix in 2013, I was immediately involved with testing of mission critical workloads, and I have no problems saying performance was not good enough for some workloads. Since then Nutanix has focused on building out a large team (3 of whom are VCDXs with years of vBCA experience) focusing on business critical applications. Now applications like SQL, Oracle (including RAC deployments), MS Exchange and SAP are becoming common workloads for our customers who originally started with Test/Dev or VDI.

Think of Nutanix like VMware in 2005: everyone was concerned about performance and resiliency and didn’t run business critical applications on VI3 (later renamed vSphere), but over time everyone (including myself) learned virtualization was in fact not only suitable for vBCA, it’s an ideal platform. I’m here to tell everyone: don’t make the same mistake we all did with virtualization, assuming Nutanix isn’t suitable for vBCA and waiting 5 years to realise the value. Nutanix is more than ready (and has been for a while) for mission critical applications.

Regarding Todd’s second statement “But you’re not going to put SAP and run your mission-critical business on an appliance.”

If not on an appliance, then what are we supposed to put mission-critical applications on? Regardless of what you think of traditional Converged products, the fact is they are actually just a single SKU for multiple different pre-existing products (generally from multiple different vendors) which have been pre-architected and configured. They are not radically different, nor do they eliminate ongoing operational complexity, which is a strength of HCI solutions such as Nutanix.

If anything, putting mission critical applications on a simple and highly performant/scalable HCI appliance based solution (especially Nutanix) makes more sense than Converged / 3 Tier products. Nutanix is no longer the new kid on the block; Nutanix is well proven across all industries and on different workloads, including mission critical. Hell, most US Federal agencies including the Pentagon use Nutanix, how much more critical do you want? (Also, anyone saying VDI isn’t mission critical has rocks in their head! Think about it: if all your users are offline, how productive is your company and how much use are all your servers?)

Imagine if the sizing of a traditional converged solution is wrong, or a mission critical application outgrows it before its scheduled end of life. Well, with Nutanix, add one or more nodes (no rip and replace) and vMotion the workload/s, and you’ve scaled completely non-disruptively. In fact, with Nutanix you should intentionally start small and scale as close to a just in time fashion as possible so your mission-critical application can take advantage of newer HW over the 3-5 years! Lower CAPEX and better long term performance, sounds like a WIN/WIN to me!

Even if it were true that Converged (or any other product) had higher peak performance (which in the real world has minimal value) than a Nutanix HCI solution, so what? Do you really want to have point solutions (a.k.a. silos) for every different workload? No. I wrote a post covering things to consider when choosing infrastructure, including why you want to avoid silos, which I encourage you to read when considering any new infrastructure.

  • Todd: They have scaling challenges, right? You get to a certain number of nodes, and then the performance degrades; you have to then create another cluster, another cluster.

My previous post Fight the FUD: Nutanix scale limitations covers this FUD in detail. In short, Nutanix has proven numerous times we can scale linearly; see Scaling to 1 Million IOPS and beyond linearly! for an example (and this video is from October 2013). Note: Ignore the actual IO number. The important factor is the linear scalability, not the peak benchmark numbers, which have little value in the real world as I discuss here: “Peak Performance vs Real World Performance”.
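To make "linear scalability" concrete, here is a minimal sketch of how you might check a set of benchmark results for it: divide total IOPS by node count and see whether the per-node figure stays roughly constant as the cluster grows. The function names, the 10% tolerance and all of the numbers are hypothetical and purely illustrative; they are not Nutanix benchmark results.

```python
def per_node_throughput(samples):
    """Return (node_count, IOPS per node) for each (node_count, total_iops) sample."""
    return [(nodes, total_iops / nodes) for nodes, total_iops in samples]


def is_roughly_linear(samples, tolerance=0.10):
    """True if per-node IOPS stays within `tolerance` of the smallest cluster's figure."""
    per_node = [value for _, value in per_node_throughput(samples)]
    baseline = per_node[0]
    return all(abs(value - baseline) / baseline <= tolerance for value in per_node)


# Hypothetical results, made up for illustration only.
linear_samples = [(4, 100_000), (8, 198_000), (16, 405_000)]
degrading_samples = [(4, 100_000), (8, 170_000), (16, 250_000)]

print(is_roughly_linear(linear_samples))
print(is_roughly_linear(degrading_samples))
```

The point of checking per-node throughput rather than the headline total is the same as in the note above: the shape of the curve matters more than the peak number.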

  • Todd:  [It’s] good for an edge, good for a simple form factors, good for single use cases or what I’ll call more simplified workloads.

To be honest I’m not sure what he means by “good for a simple form factors”, but I can only assume he is talking about how HCI solutions like Nutanix have compact 4 node per 2RU form factors and use less rack space, power, cooling etc.

As for single use cases, I recommend customers run mixed workloads for several reasons. Firstly, Nutanix is a truly distributed solution which means the more nodes in a cluster, the more performant & resilient the cluster becomes. Scaling out a cluster also helps eliminate silos which reduces waste.

I recently wrote this post: Heterogeneous Nutanix Clusters Advantages & Considerations which covers how mixing node types works in a Nutanix environment. The Nutanix Distributed Storage Fabric has lots of back end optimisations (run by Curator) which have been developed over the years to ensure heterogeneous clusters perform well. This is an example of technology whose value marketing slides can’t represent, but the real world value is huge.

I have been involved with numerous mission critical application deployments, and there are heaps of case studies for these deployments available on the Nutanix website at http://www.nutanix.com/resources/case-studies/.

A final thought for Part 1: with Nutanix, you can build what you need today and have mission critical workloads benefit from latest generation HW on a frequent basis (e.g., annually) by adding new nodes over time and simply vMotioning mission critical VMs to the newer nodes. So over, say, a 5 year life span of infrastructure, your mission critical applications could benefit from the performance improvements of 5 generations of Intel chipsets, not to mention the ever increasing efficiency of the Nutanix Acropolis base software (formerly known as NOS).

Try getting that level of flexibility/performance improvements with legacy 3 tier!

Next up, Part 2

 

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 4 – Security

Security is a major pillar of the XCP design. The use of innovative automation results in perhaps the most hardened, simple and comprehensive virtualization infrastructure in the industry.

AHV is not designed to work with a comprehensive HCL of hardware vendors, nor does it have countless bolt-on style products which need to be catered for. Instead Acropolis hypervisor has been optimized to work with the Nutanix Distributed Storage Fabric and approved appliances from Nutanix and OEM partners to provide all services/functionality in a truly Web scale manner.

This allows for much tighter and targeted quality assurance and dramatically reduces the attack surface compared to other hypervisors.

The Security Development Lifecycle (SecDL) is leveraged across the entire Acropolis platform ensuring every line of code is production ready. This design follows a defense-in-depth model that removes all unnecessary services for libvirt/QEMU (SPICE, unused drivers), leverages libvirt non-root group sockets for principle of least privilege, SELinux confined guests for VM escape protection, and an embedded intrusion detection system.

[Image: seclifecycle]

Acropolis hypervisor has a documented and supported security baseline (XCCDF STIG), and introduces the self-remediating hypervisor. On a customer defined interval, the hypervisor is scanned for any changes to the supported security baseline, and if any anomaly is detected it is reset back to the secure state in the background with no user intervention.
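The scan-and-reset idea can be sketched generically as a comparison between a known-good baseline and the current state, restoring any setting that has drifted. This is only an illustration of the concept, not the actual AHV implementation; the setting names and values below are hypothetical.

```python
def find_drift(baseline, current):
    """Return {setting: (actual, expected)} for every setting that differs from the baseline."""
    drift = {}
    for key, expected in baseline.items():
        actual = current.get(key)  # None if the setting is missing entirely
        if actual != expected:
            drift[key] = (actual, expected)
    return drift


def remediate(baseline, current):
    """Return a copy of `current` reset to the baseline, plus the drift that was found."""
    drift = find_drift(baseline, current)
    fixed = dict(current)
    for key, (_, expected) in drift.items():
        fixed[key] = expected
    return fixed, drift


# Hypothetical settings, for illustration only.
baseline = {"ssh_root_login": "disabled", "selinux_mode": "enforcing"}
current = {"ssh_root_login": "enabled", "selinux_mode": "enforcing"}

fixed, drift = remediate(baseline, current)
print(drift)
print(fixed)
```

In a real system a scheduler would run this kind of check on the configured interval; the sketch only shows the compare-and-reset step.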

The Acropolis platform also boasts a comprehensive list of security certifications/validations:

[Image: SecCerts2]

Summary

Acropolis provides numerous security advantages including:

  1. Built-in and self-auditing Security Technical Implementation Guides (STIGs)
  2. Hardened hypervisor out of the box without the requirement for administrators to apply hardening recommendations
  3. Reduced attack surface compared to other supported hypervisors

For more information on Nutanix security see:

Back to the Index

Why Nutanix Acropolis hypervisor (AHV) is the next generation hypervisor – Part 2 – Simplicity

Let me start by saying I believe complexity is one of the biggest, and potentially the most overlooked, issues in modern datacenters.

Virtualization has enabled increased flexibility and solved countless problems within the datacenter. But over time I have observed an increase in complexity, especially around the management components, which for many customers is a major pain point.

Complexity leads to things like increased cost (both CAPEX & OPEX) and risk, which commonly leads to reduced availability/performance.

In Part 10, I will cover Cost in more depth so let’s park it for the time being.

When architecting solutions for customers, my number one goal is to meet/exceed all my customers’ requirements with the simplest solution possible.

[Image: anyfool]

Acropolis is where web-scale technology delivers enterprise grade functionality with consumer-grade simplicity, and with AHV the story gets even better.

Removing Dependencies

A great example of the simplicity of the Nutanix Xtreme Computing Platform (XCP) is its lack of external dependencies. There is no requirement for any external databases when running Acropolis Hypervisor (AHV) which removes the complexity of designing, implementing and maintaining enterprise grade database solutions such as Microsoft SQL or Oracle.

This is even more of an advantage when you take into account the complexity of deploying these platforms in highly available configurations such as AlwaysOn Availability Groups (SQL) or Real Application Clusters (Oracle RAC) where SMEs need to be engaged for design, implementation and maintenance. As a result of not being dependent on 3rd party database products, AHV reduces/removes complexity around product interoperability or the need to call multiple vendors if something goes wrong. This also means no more investigating Hardware Compatibility Lists (HCLs) and Interoperability Matrices when performing upgrades.

Management VMs

Only a single management virtual machine (Prism Central) needs to be deployed, even for multi-cluster globally distributed AHV environments. Prism Central is an easy to deploy appliance and since it’s stateless, it does not require backing up. In the event the appliance is lost, an administrator simply deploys a new Prism Central appliance and connects it to the clusters, which can be done in a matter of seconds per cluster. No historical data is lost as the data is maintained on the clusters being managed.

Because Acropolis requires no additional components, it all but eliminates the design/implementation and operational complexity for management compared to other virtualization / HCI offerings.

Other supported hypervisors commonly require multiple management VMs and backend databases even for relatively small scale/simple deployments just to provide basic administration, patching and operations management capabilities.

Acropolis has zero dependencies during the installation phase; customers can implement a fully featured AHV environment without any existing hardware/software in the datacenter. Not only does this make initial deployment easy, but it also removes the complexity around interoperability when patching or upgrading in the future.

Ease of Management

Nutanix XCP clusters running any hypervisor can be managed individually using Prism Element or centrally via Prism Central.

Prism Element requires no installation; it is available and performs optimally out-of-the-box. Administrators can access Prism Element via the XCP Cluster IP address or via any Controller VM IP address.

Administrators of legacy virtualization products often need to use hypervisor-specific tools to complete various tasks, which requires design/deployment and management of these components and their dependencies. With AHV, all hypervisor level functionality is completed via Prism, providing a true single pane of glass interface for everything from Storage, Compute, Backup, Data Replication, Hardware monitoring and more.

The image below shows the PRISM Central Home Screen that provides a high-level summary of all clusters in the environment. From this screen, you can drill down to individual clusters to get more granular information where required.

[Image: PRISMcentraloverview]

Administrators perform all upgrades from PRISM without the requirement for external update management applications/appliances/VMs or supporting back end databases.

PRISM performs one-click fully automated rolling upgrades to all components including Hypervisor, Acropolis Base Platform (formerly known as NOS), Firmware and Nutanix Cluster Check (NCC).
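For readers unfamiliar with the term, a rolling upgrade generally means upgrading one node at a time so the rest of the cluster keeps serving workloads. The sketch below shows that general pattern only; it is not how PRISM implements it, and the node names, version strings and callback functions are all hypothetical.

```python
def rolling_upgrade(nodes, upgrade_node, is_healthy):
    """Upgrade nodes one at a time; stop early if any node reports unhealthy."""
    upgraded = []
    for node in nodes:
        # Only take a node out of service while every node reports healthy.
        if not all(is_healthy(n) for n in nodes):
            break
        upgrade_node(node)
        upgraded.append(node)
    return upgraded


# Hypothetical cluster state, for illustration only.
versions = {"node-a": "1.0", "node-b": "1.0", "node-c": "1.0"}


def upgrade_to_v2(node):
    versions[node] = "2.0"


def always_healthy(node):
    return True


done = rolling_upgrade(list(versions), upgrade_to_v2, always_healthy)
print(done)
print(versions)
```

Checking health before each step is the design choice that makes the process low risk: a problem halts the rollout instead of spreading across the cluster.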

For a demo of Prism Central see the following YouTube video:

Further Reduced Storage Complexity

Storage has long been, and continues for many customers to be, a major hurdle to successful virtual environments. Nutanix has essentially made storage invisible over the past few years by removing the requirement for dedicated Storage Area Networks, Zoning, Masking, RAID and LUNs. When combined with AHV, XCP has taken this innovation yet another big step forward by removing the concepts of datastores/mounts and virtual SCSI controllers.

For each Virtual Machine disk, AHV presents the vDisk directly to the VM, and the VM simply sees the vDisk as if it were a physically attached drive. There is no in-guest configuration. It just works.

This means there is no complexity around how many virtual SCSI controllers to use, or where to place a VM or vDisk and as such, Acropolis has eliminated the requirement for advanced features to manage virtual machine placement and capacity management such as vSphere’s Storage DRS.

Don’t get me wrong, Storage DRS is a great feature which helps solve serious problems with traditional storage.  With XCP these problems just don’t exist.

For more details see:  Storage DRS and Nutanix – To use, or not to use, that is the question?

The following screen shot shows just how simple vDisks appear under the VM configuration menu in Prism Element. There is no need to assign vDisks to Virtual SCSI controllers which ensures vDisks can be added quickly and perform optimally.

[Image: VMdisks]

Node Configuration

Configuring an AHV environment via Prism automatically applies all changes to each node within the cluster. Critically, Acropolis Host Profiles functionality does not need to be enabled or configured, nor do Administrators have to check for compliance or create/apply profiles to nodes.

In AHV all networking is fully distributed similar to the vSphere Distributed Switch (VDS) from VMware. AHV network configuration is automatically applied to all nodes within the cluster without requiring the administrator to attach nodes/hosts to the virtual networking. This helps ensure a consistent configuration throughout the cluster.

The reason the above points are so important is each dramatically simplifies the environment by removing (not just abstracting) many complicated design/configuration items such as:

  • Multipathing
  • Deciding how many datastores are required & what size each should be
  • Considering how many VMs should reside per datastore/LUN
  • Configuration maximums for Datastores / Paths
  • Managing consistent configuration across nodes/hosts
  • Managing Network Configuration

Administrators can optionally join Acropolis built-in authentication to an Active Directory domain, removing the requirement for additional Single Sign-On components. All Acropolis components include High Availability out-of-the-box, removing the requirement to design (and license) HA solutions for individual management components.

Data Protection / Replication

The Nutanix CVM includes built-in data protection and replication components, removing the requirement to design/deploy/manage one or more Virtual Appliances. This also avoids the need to design, implement and scale these components as the environment grows.

All of the data protection and replication features are also available via Prism and, importantly, are configured on a per VM basis making configuration easier and reducing overheads.

Summary

In summary, the simplicity of AHV eliminates:

  1. Single points of failure for all management components out of the box
  2. The requirement for dedicated management clusters for Acropolis components
  3. Dependency on 3rd Party Operating Systems & Database platforms
  4. The requirement for design, implementation and ongoing maintenance for Virtualization management components
  5. The need to design, install, configure & maintain a Web or Desktop type
  6. Complexity such as
    1. The requirement to install software or appliances to allow patching / upgrading
    2. The requirement for an SME to design a solution to make management components highly available
    3. The requirement to follow complex Hardening Guides to achieve security compliance.
    4. The requirement for additional Appliances/interfaces and external dependencies (e.g., Database Platforms)
  7. The requirement to license features to allow Centralised configuration management of nodes.

Back to the Index