Enterprise Architecture & Avoiding tunnel vision.

Recently I have read a number of articles and had several conversations with architects and engineers across various specialities in the industry and I’m finding there is a growing trend of SMEs (Subject Matter Experts) having tunnel vision when it comes to architecting solutions for their customers.

What I mean by “Tunnel Vision” is that the architect only looks at what is right in front of him/her (e.g. the current task or project), and does not consider how the decisions being made for this task may impact the wider I.T infrastructure and the customer from a commercial / operational perspective.

In my previous role I saw this all too often, and it was frustrating to know that the solutions being designed and delivered to customers were in some cases quite well designed when considered in isolation. But when taking into account the “Big Picture” (or what I would describe as the customer’s overall requirements), the solutions were adding unnecessary complexity, adding risk and increasing costs, when new solutions should be doing the exact opposite.

Let’s start with an example:

Customer “ACME” need an enterprise messaging solution and have chosen Microsoft Exchange 2013 and have a requirement that there be no single points of failure in the environment.

Customer engages an Exchange SME who looks at the requirements for Exchange, he then points to a vendor best practice or reference architecture document and says “We’ll deploy Exchange on physical hardware, with JBOD & no shared storage and use Exchange Database Availability Groups for HA.”

The SME then attempts to justify his recommendation with “because it’s Microsoft’s best practice”, which most people still seem to blindly accept, but this is a story for another post.

In fairness to the SME, in isolation the decision/recommendation meets the customer’s messaging requirements, so what’s the problem?

If the customer had no existing I.T, the messaging system was going to be the only I.T infrastructure, and they had no plans to run any other workloads, I would say the solution proposed could be an excellent solution. But how many customers only run messaging? In my experience, none.

So let’s consider that the customer has an existing virtual environment, running Test/Dev, Production and Business Critical applications, and adheres to a “Virtual First” policy.

The customer has already invested in virtualization & some form of shared storage (SAN/NAS/Web Scale) and has operational procedures and expertise in supporting and maintaining this environment.

If we were to add a new “silo” of physical servers, there are many disadvantages to the customer, including but not limited to:

1. Additional operational documentation for new Physical environment.

2. New Backup & Disaster Recovery strategy / documentation.

3. Additional complexity managing / supporting a new Silo of infrastructure.

4. Reduced flexibility / scalability with physical servers vs virtual machines.

5. Increased downtime and/or impact in the event of hardware failures.

6. Increased CAPEX due to having to size for future requirements due to scaling challenges with physical servers.

So what am I getting at?

The cost of deploying the MS Exchange solution on physical hardware could potentially be cheaper (CAPEX) on Day 1 than virtualizing the new workload on the existing infrastructure (which likely needs to be scaled, e.g. with disk shelves / nodes), BUT would likely result in an overall higher TCO (Total Cost of Ownership) due to the increased complexity & operational costs that come with creating a new silo of resources.

Either a physical or a virtual solution would likely meet or exceed the customer’s basic requirement to serve MS Exchange, but the two may have vastly different results in terms of the big picture.

Another example would be a customer with a legacy SAN which needs to be replaced and is causing issues for a large portion of the customer’s workloads, while the project being proposed only addresses the new enterprise messaging requirements. In my opinion a good architect should consider the big picture and try to identify where projects can be combined (or a project’s scope increased) to ensure a more cost effective yet better overall result for the customer.
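To make the CAPEX versus TCO distinction concrete, here is a minimal Python sketch. Every figure in it is hypothetical and chosen purely for illustration (none of them come from a real quote), and the function name `total_cost_of_ownership` is my own. It simply shows how a cheaper Day 1 purchase can still lose over a five year period once operational costs are included.

```python
def total_cost_of_ownership(capex: float, annual_opex: float, years: int) -> float:
    """Up-front spend plus operating cost over the whole period."""
    return capex + annual_opex * years


# Hypothetical figures for illustration only.
# The physical silo is cheaper on Day 1 but costs more to run each year
# (extra documentation, separate backup/DR, additional support effort).
physical_silo = total_cost_of_ownership(capex=80_000, annual_opex=40_000, years=5)
virtualized = total_cost_of_ownership(capex=100_000, annual_opex=25_000, years=5)

print(f"Physical silo 5-year TCO: {physical_silo:,.0f}")
print(f"Virtualized 5-year TCO:   {virtualized:,.0f}")
```

With these assumed numbers the silo saves 20,000 up front yet ends up more expensive overall. Real figures will vary, and the point is only that the comparison should be made on TCO and not on Day 1 CAPEX alone.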

If the architect only looked at Exchange and went Physical Servers w/ JBOD, there is zero chance of improvement for the rest of the infrastructure and the physical equipment for Exchange would likely be oversized and underutilized.

It will in many cases be much more economical to combine two or more projects, to enable the purchase of a new technology or infrastructure components and consolidate the workloads onto shared infrastructure, rather than building two or more silos which add complexity to the environment and will likely result in underutilized infrastructure and a solution which is inferior to what could have been achieved by combining the projects.

In conclusion, I hope that after reading this article, the next time you or your customers embark on a new project, that you as the Architect, Project Manager, or Engineer consider the big picture and not just the new requirement and ensure your customer/s get the best technical and business outcomes and avoid where possible the use of silos.

Scale Out Shared Nothing Architecture Resiliency by Nutanix

At VMware vForum Sydney this week I presented “Taking vSphere to the next level with converged infrastructure”.

Firstly, I wanted to thank everyone who attended the session. It was a great turnout, and during the Q&A there were a ton of great questions.

I got a lot of feedback at the session and when meeting people at vForum about how the Nutanix scale out shared nothing architecture tolerates failures.

I thought I would summarize this capability as I believe it’s quite impressive and should put everyone’s mind at ease when moving to this kind of architecture.

So let’s take a look at a 5 node Nutanix cluster. For this example, we have one running VM. The VM has all its data locally, represented by “A”, “B” and “C”, and this data is also distributed across the Nutanix cluster to provide data protection and resiliency.

Nutanix5NodeCluster

So, what happens when an ESXi host fails, which results in the Nutanix Controller VM (CVM) going offline and the storage which is locally connected to that CVM becoming unavailable?

Firstly, VMware HA restarts the VM onto another ESXi host in the vSphere Cluster and it runs as normal, accessing data both locally where it is available (in this case, the “A” data is local) and remotely (if required) to get data “B” and “C”.

Nutanix5nodecluster1failed

Secondly, when data which is not local (in this example “B” and “C”) is accessed via other Nutanix CVMs in the cluster, it will be “localized” onto the host where the VM resides for faster future access.

It is important to note that if data which is not local is not accessed by the VM, it will remain remote, as there is no benefit in relocating it and this reduces the workload on the network and cluster.

The end result is the VM restarts the same as it would using traditional storage, then the Nutanix cluster “Curator” detects if any data only has one copy, and replicates the required data throughout the cluster to ensure full resiliency.

The cluster will then look like a fully functioning 4 node cluster as shown below.

5NodeCluster1FailedRebuild
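The failure and repair sequence can be sketched as a toy simulation. This is not how the Nutanix Curator is actually implemented. It is a minimal Python model that assumes two copies of each piece of data and a simple "least loaded node" placement rule, and the names `fail_node`, `re_replicate` and `REPLICAS` are mine.

```python
REPLICAS = 2  # assumed number of copies kept of each piece of data


def fail_node(placement, live_nodes, node):
    """Remove a failed node and every data copy it held."""
    live_nodes.remove(node)
    for holders in placement.values():
        holders.discard(node)


def re_replicate(placement, live_nodes):
    """Copy under-replicated data to the least-loaded live node lacking it."""
    load = {n: 0 for n in live_nodes}
    for holders in placement.values():
        for n in holders:
            load[n] += 1
    for holders in placement.values():
        while len(holders) < REPLICAS:
            candidates = [n for n in live_nodes if n not in holders]
            target = min(candidates, key=lambda n: load[n])
            holders.add(target)
            load[target] += 1


nodes = ["node1", "node2", "node3", "node4", "node5"]
# The VM starts on node1, so node1 holds A, B and C locally;
# the second copies are spread across other nodes.
placement = {
    "A": {"node1", "node2"},
    "B": {"node1", "node3"},
    "C": {"node1", "node4"},
}

# Lose two hosts one after the other, repairing after each failure.
for failed in ("node1", "node2"):
    fail_node(placement, nodes, failed)
    re_replicate(placement, nodes)
    print(f"after losing {failed}:", {k: sorted(v) for k, v in placement.items()})
```

After each failure every piece of data is back to two copies on the surviving nodes, which is the property that matters here. The sketch needs at least `REPLICAS` live nodes to find a placement target.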

The process of repairing the cluster from a failure is commonly, and incorrectly, compared to a RAID pack rebuild. With a RAID rebuild, a small number of disks, say 8, are under heavy load re-striping data across a hot spare or a replacement drive. During this time the performance of everything on the RAID pack is significantly impacted.

With Nutanix, the data is distributed across the entire cluster, which even in a 5 node cluster means at least 20 SATA drives, with all data being written to SSD first and then sequentially offloaded to SATA.

The impact of this process is much less than a RAID rebuild, as all Nutanix controllers in the cluster participate and take a portion of the workload. As a result, the impact per disk, per controller, per node and, importantly, on production VMs running in the cluster is greatly reduced.

Essentially, the larger the cluster, the faster the cluster can repair itself, and the lower the impact on production workloads.
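As a rough illustration of why a larger cluster repairs faster, here is a small Python sketch. It assumes a deliberately simplified model in which every surviving node contributes the same rebuild throughput, and the amount of data (4000 GB) and per node rate (100 MB/s) are hypothetical values picked for easy arithmetic, not measured figures.

```python
def rebuild_seconds(data_gb: float, surviving_nodes: int, per_node_mb_s: float) -> float:
    """Estimated repair time if every surviving node shares the work equally."""
    total_mb = data_gb * 1000  # decimal units keep the arithmetic simple
    return total_mb / (surviving_nodes * per_node_mb_s)


# Hypothetical workload: 4000 GB to re-protect at 100 MB/s per node.
for n in (4, 8, 16):
    hours = rebuild_seconds(4000, n, 100) / 3600
    print(f"{n:>2} surviving nodes: {hours:.2f} h")
```

Under this model, doubling the number of participating nodes halves the repair time, and each node only carries a fraction of the total work.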

Now let’s talk about a subsequent ESXi host failure. Now we have two failed nodes and three surviving nodes, and only one copy of data “A”, “B” and “C”, as shown below.

Nutanix5NodeCluster2failures1copydata

Now the Nutanix “Curator” detects that only one copy of data “A”, “B” and “C” exists and starts to replicate copies of “A”, “B” and “C” across the cluster. The result is a fully functional and redundant cluster, capable of surviving yet another failure, as shown below.

Nutanix5NodeCluster2Failures

Even in this scenario, where two ESXi hosts are lost, the environment still has 60% of its storage controllers (and performance), as compared to a typical traditional storage product where the loss of just two (2) controllers can have your environment completely offline, and even if you only lost a single controller, you would only have 50% of the storage controllers (and performance) available.
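The percentages above are simple arithmetic, sketched here in Python. The helper name `surviving_fraction` is my own, and the model assumes one storage controller per node and that performance scales with the number of controllers still online.

```python
def surviving_fraction(total_controllers: int, failed: int) -> float:
    """Fraction of storage controllers still online after some have failed."""
    if not 0 <= failed <= total_controllers:
        raise ValueError("failed must be between 0 and total_controllers")
    return (total_controllers - failed) / total_controllers


# 5 node scale out cluster that has lost two hosts
print(f"5 controllers, 2 failed: {surviving_fraction(5, 2):.0%}")
# Traditional dual controller array that has lost one controller
print(f"2 controllers, 1 failed: {surviving_fraction(2, 1):.0%}")
# Traditional dual controller array that has lost both controllers
print(f"2 controllers, 2 failed: {surviving_fraction(2, 2):.0%}")
```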

I think this really highlights what VMware and players like Google, Facebook & Twitter have been saying for a long time: scaling out, not up, and shared nothing architecture are the way of the future. The only question is who will be dominant in bringing this technology to the mass market, and I think you know who I have my money on.

Scaling problems with traditional shared storage

At VMware vForum Sydney this week I presented “Taking vSphere to the next level with converged infrastructure”.

Firstly, I wanted to thank everyone who attended the session. It was a great turnout, and during the Q&A there were a ton of great questions.

One part of the presentation I got a lot of feedback on was when I spoke about Performance and Scaling and how this is a major issue with traditional shared storage.

So for those who couldn’t attend the session, I decided to create this post.

So let’s start with a traditional environment with two VMware ESXi hosts, connected via FC or IP to a storage array. In this example the storage controllers have a combined capability of 100K IOPS.

50kIOPS

As we have two (2) ESXi hosts, if we divide the performance capabilities of the storage controllers between the two hosts we get 50K IOPS per node.

This is an example of what I have typically seen at customer sites on day 1, and performance normally meets the customer’s requirements.

As environments tend to grow over time, the most common thing to expand is the compute layer, so the below shows what happens when a third ESXi host is added to the cluster, and connected to the SAN.

33KIOPS

The 100K IOPS is now divided by 3, and each ESXi host now has 33K IOPS.

This isn’t really what customers expect when they add additional servers to an environment, but in reality, the storage performance is further divided between ESXi hosts and results in fewer IOPS per host in the best case scenario. Worst case scenario is the additional workloads on the third host create contention, and each host may have even fewer IOPS available to it.

But wait, there’s more!

What happens when we add a fourth host? We further reduce the storage performance per ESXi host to 25K IOPS as shown below, which is HALF the original performance.

25KIOPS

At this stage, the customer’s performance is generally significantly impacted, and there is no easy or cost effective resolution to the problem.

….. and when we add a fifth host? We continue to reduce the storage performance per ESXi host to 20K IOPS which is less than half its original performance.

20KIOPS
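The per host figures in this walkthrough come from dividing the fixed controller capability by the number of hosts. A minimal Python sketch of that best case arithmetic follows. The function name `iops_per_host` is mine, and the model assumes the 100K IOPS is shared perfectly evenly with no contention.

```python
ARRAY_IOPS = 100_000  # combined controller capability from the example


def iops_per_host(hosts: int, array_iops: int = ARRAY_IOPS) -> int:
    """Best case share of the array's IOPS available to each ESXi host."""
    return array_iops // hosts


baseline = iops_per_host(2)
for hosts in range(2, 6):
    share = iops_per_host(hosts)
    print(f"{hosts} hosts: {share:,} IOPS each ({share / baseline:.0%} of the 2 host figure)")
```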

So at this stage, some of you may be thinking, “yeah yeah, but I would also scale my storage by adding disk shelves.”

So let’s add a disk shelf and see what happens.

20KIOPSAddDiskShelf

We still only have storage controllers capable of 100K IOPS, so we don’t get any additional IOPS to our ESXi hosts. The result of adding the additional disk shelf is REDUCED performance per GB!

Make sure when you’re looking at implementing, upgrading or replacing your storage solution that it can actually scale both performance (IOPS/throughput) AND capacity in a linear fashion, otherwise your environment will to some extent be impacted by what I have explained above. The only way to avoid the above is to oversize your storage on day 1, but even if you do this, over time your environment will appear to become slower (and your CAPEX will be very high).

Also, consider the scaling increments, as a solution’s ability to scale should not require you to replace controllers or disks, or be limited by a maximum number of controllers in the cluster. It should also scale in small, medium and large increments depending on the requirements of the customer.

This is why I believe scale out shared nothing architecture will be the architecture of the future and it has already been proven by the likes of Google, Facebook and Twitter, and now brought to market by Nutanix.

Traditional storage, no matter how intelligent, does not scale linearly or granularly enough. This results in complexity in the architecture of storage solutions for environments which grow over time, and leads to customers spending more money up front when the investment may not be realised for 2-5 years.

I’d prefer to be able to start small with as little as 3 nodes, and scale one node at a time (regardless of node model, i.e. NX1000, NX3000, NX6000) to meet my customer’s requirements and never have to replace hardware just to get more performance or capacity.
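The performance per GB effect can be shown with a couple of lines of Python. The shelf capacity of 20,000 GB is a hypothetical value I have chosen for illustration, while the 100K IOPS controller limit is the figure used throughout this example.

```python
CONTROLLER_IOPS = 100_000   # fixed controller capability from the example
SHELF_CAPACITY_GB = 20_000  # hypothetical usable capacity per disk shelf


def iops_per_gb(shelves: int) -> float:
    """Controller IOPS spread across the total capacity behind the controllers."""
    return CONTROLLER_IOPS / (shelves * SHELF_CAPACITY_GB)


for shelves in (1, 2, 3):
    print(f"{shelves} shelf/shelves: {iops_per_gb(shelves):.2f} IOPS per GB")
```

Because the controllers are the bottleneck in this model, every extra shelf adds capacity but spreads the same IOPS over more GB.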

Here is a summary of the Nutanix scaling capabilities, where you can scale Compute heavy, storage heavy or a mix of both as required.

ScaingSolution