Enterprise Architecture & Avoiding Tunnel Vision

Recently I have read a number of articles and had several conversations with architects and engineers across various specialities in the industry, and I’m finding a growing trend of SMEs (Subject Matter Experts) having tunnel vision when it comes to architecting solutions for their customers.

What I mean by “Tunnel Vision” is that the architect only looks at what is right in front of him/her (e.g. the current task/project) and does not consider how the decisions being made for this task may impact the customer and the wider IT infrastructure from a commercial/operational perspective.

In my previous role I saw this all too often, and it was frustrating to know that the solutions being designed and delivered to customers were in some cases quite well designed when considered in isolation, but when taking into account the “Big Picture” (or what I would describe as the customer’s overall requirements) they were adding unnecessary complexity, adding risk and increasing costs, when new solutions should be doing the exact opposite.

Let’s start with an example:

Customer “ACME” needs an enterprise messaging solution, has chosen Microsoft Exchange 2013, and requires that there be no single points of failure in the environment.

The customer engages an Exchange SME who looks at the requirements for Exchange, then points to a vendor best practice or reference architecture document and says “We’ll deploy Exchange on physical hardware, with JBOD & no shared storage and use Exchange Database Availability Groups for HA.”

The SME then attempts to justify his recommendation with “because it’s Microsoft’s best practice”, which most people still seem to blindly accept, but this is a story for another post.

In fairness to the SME, in isolation the decision/recommendation meets the customer’s messaging requirements, so what’s the problem?

If the customer had no existing IT, the messaging system was going to be the only IT infrastructure, and they had no plans to run any other workloads, I would say the solution proposed could be an excellent solution, but how many customers only run messaging? In my experience, none.

So let’s consider that the customer has an existing virtual environment running Test/Dev, Production and Business Critical applications, and adheres to a “Virtual First” policy.

The customer has already invested in virtualization & some form of shared storage (SAN/NAS/Web Scale) and has operational procedures and expertise in supporting and maintaining this environment.

If we were to add a new “silo” of physical servers, there are many disadvantages to the customer, including but not limited to:

1. Additional operational documentation for new Physical environment.

2. New Backup & Disaster Recovery strategy / documentation.

3. Additional complexity managing / supporting a new Silo of infrastructure.

4. Reduced flexibility / scalability with physical servers vs virtual machines.

5. Increased downtime and/or impact in the event of hardware failures.

6. Increased CAPEX, as scaling challenges with physical servers force sizing for future requirements up front.

So what am I getting at?

The cost of deploying the MS Exchange solution on physical hardware could potentially be cheaper (CAPEX) on Day 1 than virtualizing the new workload on the existing infrastructure (which likely needs to be scaled, e.g. disk shelves / nodes), BUT would likely result in an overall higher TCO (Total Cost of Ownership), because creating a new silo of resources increases complexity and operational costs.

Either a physical or a virtual solution would likely meet/exceed the customer’s basic requirement to serve MS Exchange, but the two may have vastly different results in terms of the big picture.
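To make the Day 1 CAPEX versus TCO argument concrete, here is a minimal sketch in Python. All of the dollar figures are hypothetical and chosen purely for illustration, and the `Option` class and its simple "CAPEX plus yearly operating cost" model are assumptions of this sketch, not a costing method taken from any vendor or real project.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One candidate design and its (hypothetical) cost profile."""
    name: str
    capex: float          # up-front purchase cost on Day 1
    opex_per_year: float  # yearly cost: support, documentation, backup/DR, admin effort

    def tco(self, years: int) -> float:
        """Total cost of ownership over the given number of years."""
        return self.capex + self.opex_per_year * years


# Hypothetical figures for illustration only.
physical_silo = Option("Physical servers + JBOD (new silo)",
                       capex=100_000, opex_per_year=60_000)
virtual_shared = Option("Virtualized on scaled shared infrastructure",
                        capex=140_000, opex_per_year=35_000)

if __name__ == "__main__":
    for years in (0, 1, 3, 5):
        print(f"Year {years}:")
        for option in (physical_silo, virtual_shared):
            print(f"  {option.name}: {option.tco(years):,.0f}")
```

With these made-up numbers the physical silo is cheaper on Day 1, but the virtualized option overtakes it before the end of year two. The real inputs will differ for every customer; the point is only that the comparison should be made over the life of the solution, not at purchase time.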

Another example would be a customer with a legacy SAN which needs to be replaced and is causing issues for a large portion of the customer’s workloads, where the project being proposed only addresses the new Enterprise messaging requirements. In my opinion a good architect should consider the big picture and try to identify where projects can be combined (or a project’s scope increased) to ensure a more cost effective yet better overall result for the customer.

If the architect only looked at Exchange and went Physical Servers w/ JBOD, there is zero chance of improvement for the rest of the infrastructure and the physical equipment for Exchange would likely be oversized and underutilized.

It will in many cases be much more economical to combine two or more projects to enable the purchase of new technology or infrastructure components and consolidate the workloads onto shared infrastructure, rather than building two or more silos which add complexity to the environment and will likely result in underutilized infrastructure and a solution inferior to what could have been achieved by combining the projects.

In conclusion, I hope that the next time you or your customers embark on a new project, you as the Architect, Project Manager or Engineer consider the big picture and not just the new requirement, ensure your customers get the best technical and business outcomes, and avoid the use of silos where possible.

Competition Example Architectural Decision Entry 6 – Improve Performance for BCAs on Cisco UCS

Name: Anuj Modi
Title: Unified Computing & Virtualization Consultant @ Cisco
Twitter: @vConsultant
Blog: http://anujmodi.wordpress.com

Problem Statement

Most companies are migrating application workloads to virtual infrastructure to take advantage of virtual computing. Despite the benefits of virtualizing the environment, applications still face I/O performance issues, and end users are unhappy enough with response times to want applications moved to physical servers. What are the ways to improve the performance of business critical applications in such environments?

Assumptions

1.      Cisco Unified Computing System
2.      VMware vSphere 5.x
3.      Cisco Virtual Interface Card M81/1240/1280
4.      Critical applications/databases

Constraints

1.      No impact on the applications production data
2.      Benefits of Virtual infrastructure features
3.      High Availability of Applications

Motivation

1.      Better performance and response time for business critical applications
2.      Reduce CPU cycles on ESXi Servers and offload the I/O load to hardware level.
3.      Improved I/O throughput for applications

Architectural Decision

Use Cisco VN-Link in Hardware with VMDirectPath to get better I/O performance for network traffic. All traffic is passed directly to the physical interface card, bypassing the vmkernel. This provides better I/O performance because network traffic no longer has to traverse the hypervisor kernel layer to reach the physical interface card.

VN-Link in Hardware with VMDirectPath

Alternatives

Cisco provides three different options for virtual machine traffic on the hypervisor. These options are listed below:

1.      VN-Link in Software
2.      VN-Link in Hardware
3.      VN-Link in Hardware with VMDirectPath

The other two options can also be used to improve the performance of virtual machine traffic.

In option 1, the Nexus 1000V switch is used for network traffic forwarding. Each virtual machine NIC connects directly to the Nexus 1000V switch, and the Nexus 1000V uplinks connect to the Cisco virtual interface card. With this option, you get the benefits of Nexus 1000V advanced network features like ERSPAN and NetFlow, and standardization of network switch management.

In option 2, UCSM is used as a distributed switch and is integrated with vCenter Server to control the virtual machine traffic. Each virtual machine NIC maps to a different virtual interface (VIF) on the UCS Fabric Interconnect and passes traffic directly through it. This gives better I/O performance for network traffic and directs the I/O load to the physical interface card.

Justification

Option 3 is selected for this solution to provide higher I/O performance for network traffic. Hypervisor bypass is the ability for a virtual machine to access PCIe adapter hardware directly in order to reduce the overhead on the host CPU. Cisco UCS provides this feature with the VN-Link in Hardware with VMDirectPath option, which helps to reduce the host CPU/memory overhead of I/O virtualization. The virtual machine talks directly to the Cisco virtual interface card and bypasses the vmkernel to provide higher performance for network traffic. The current virtual interface card can scale up to 256 virtual interfaces, which means most of the virtual machines on a single host can get their own PCIe adapter.
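As a rough sanity check on the scaling claim, the sketch below estimates whether the virtual machines planned for one host fit within the card's interface limit. The 256 figure is the one quoted in this entry; the function names and the `host_reserved_vifs` parameter (interfaces assumed to be consumed by the host's own vNICs/vHBAs) are hypothetical and only for illustration. The real usable limit depends on the card model and configuration, so check the vendor documentation before sizing.

```python
def vifs_required(vm_count: int, vnics_per_vm: int, host_reserved_vifs: int = 0) -> int:
    """Number of virtual interfaces needed on one host.

    host_reserved_vifs is an assumed allowance for interfaces the host
    itself consumes; it is not a figure from the article.
    """
    return vm_count * vnics_per_vm + host_reserved_vifs


def fits_on_card(vm_count: int, vnics_per_vm: int,
                 card_vif_limit: int = 256, host_reserved_vifs: int = 0) -> bool:
    """True if every VM vNIC can be given its own interface on the card."""
    return vifs_required(vm_count, vnics_per_vm, host_reserved_vifs) <= card_vif_limit


if __name__ == "__main__":
    # Hypothetical host: 60 VMs with 2 vNICs each, 8 interfaces kept for the host.
    print(vifs_required(60, 2, host_reserved_vifs=8))
    print(fits_on_card(60, 2, host_reserved_vifs=8))
    # Hypothetical denser host: 130 VMs with 2 vNICs each.
    print(fits_on_card(130, 2, host_reserved_vifs=8))
```

A host that fails this check would need fewer direct-path VMs, or a mix of option 3 for the most I/O-sensitive workloads and one of the other options for the rest.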

Implications

1. The disadvantage is that vMotion support on the VMware hypervisor is currently limited.


Data Centre Migration Strategies – Part 1 – Overview

After a recent Twitter discussion, I felt Data Centre migration strategies would make a good blog series to help people understand what the options are, along with the pros and cons of each strategy.

This guide is not intended to be a step-by-step on how to set up each of these solutions, but a guide to assist you in making the best decision for your environment when considering a data centre migration.

So what are some of the options when migrating virtual machines from one data centre to another?

1. Lift and Shift

Summary: Shut down your environment and physically relocate all the required equipment to the new location.

2. VMware Site Recovery Manager (SRM)

Summary: Using SRM with either Storage Replication Adapters (SRAs) or vSphere Replication (VR) to perform both test and planned migration/s between the data centres.

3. vSphere Metro Storage Cluster (vMSC)

Summary: Using an existing vMSC, or setting up a new vMSC for the migration, vMotion virtual machines between the sites.

4. Stretched vSphere Cluster / Storage vMotion

Summary: Present your storage at one or both sites to ESXi hosts at one or both sites and use vMotion and Storage vMotion to move workloads between sites.

5. Backup & Restore

Summary: Take a full backup of your virtual machines, transport the backup data to a new data centre (physically or by data replication) and restore the backup onto the new environment.

6. Vendor Specific Solutions

Summary: There are countless vendor specific solutions which range from Storage layer, to Application layer and everything in between.

7. Data Replication and re-register VMs into vCenter (or ESXi) inventory

Summary: The poor man’s SRM solution. Set up data replication at the storage layer and re-register VMs, manually or via scripts, into the inventory of vCenter, or of ESXi for sites with no vCenter.
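For option 7, the scripted part usually starts with working out which VMs exist on the replicated storage. The following is a minimal sketch of that first step only, assuming the replicated datastore is mounted at a path the script can read and that each VM has a `.vmx` configuration file in its folder. The function names are hypothetical, and the registration call itself (via the vSphere API or host command line) is deliberately left out, as it depends on your tooling.

```python
from pathlib import Path


def find_vmx_files(datastore_root: str) -> list[str]:
    """Return the sorted paths of all .vmx files under a datastore mount point."""
    root = Path(datastore_root)
    return sorted(str(p) for p in root.rglob("*.vmx") if p.is_file())


def build_registration_plan(datastore_root: str) -> list[dict]:
    """Build a list of VMs to re-register, one entry per .vmx file found.

    Each entry holds the VM name (taken from the .vmx file name, which is
    an assumption; the display name may differ) and the full .vmx path
    that a registration step would need.
    """
    plan = []
    for vmx in find_vmx_files(datastore_root):
        plan.append({"vm_name": Path(vmx).stem, "vmx_path": vmx})
    return plan
```

Calling `build_registration_plan()` with the mount point of the replicated datastore gives a list that can be reviewed by hand before anything is registered, which is a sensible safeguard when there is no SRM recovery plan to test against.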

Each of the above topics will be discussed in detail over the coming weeks so stay tuned, and if you work for a vendor with a specific solution you would like featured please leave a comment and I will get back to you.