Free Training – Virtualizing Business Critical Applications (vBCA)

Just found another great free self-paced training course offered by VMware. This one is focused on one of my favourite topics, Virtualizing Business Critical Applications.

The course covers things like which business critical applications can be virtualized efficiently, as well as common customer objections, some of which are FUD or fiction.

In addition, it covers use cases, best practices and value propositions for virtualizing each business-critical application.

One important area the course covers (which can be hard to find reliable information on) is the licensing requirements for applications such as Oracle databases, SAP and the Microsoft suite, e.g. SQL Server / Exchange / SharePoint.

Kudos to VMware for releasing this training free of charge. The link to access the course is below.

Virtualizing Business-Critical Applications [V5.X] – Customer

Virtualizing Business Critical Applications – The Web-Scale Way!

Since joining Nutanix back in July 2013, I have been working on testing the performance and resiliency of a range of virtual workloads including Business Critical Applications on the Nutanix platform. At the time, Nutanix only offered a single form factor (4 nodes in 2RU) which was not always a perfect fit depending on customer requirements.

Fast forward to August 2014 and now Nutanix has a wide range of node types to meet most workload requirements which can be found here.

The only real gap in the node types was a node to support applications which have large capacity requirements as well as a very large active working set, and which require consistent low latency and high performance regardless of tier.

So what do I mean when I say “Active Working Set”? I would define this as the data being regularly accessed by the VM/s. For example, a file server may have 10TB of data, but users only access 10% of it on a regular basis. This 10% I would classify as the Active Working Set.

Now back to the topic at hand. The reason I am writing this post is that this is a project I have been working towards for some time, and I am very excited about this product being released to the market. I have no doubt it will further increase the already fast uptake of Web-scale solutions and provide significant value and opportunities to new and existing customers wanting to simplify their datacenter/s and standardize on Nutanix Web-scale architecture.

Along with many others at Nutanix, I proposed a new node type (the NX-8150), which has been undergoing thorough testing in my team (Solutions & Performance Engineering) for some time and, I am pleased to say, is being officially released (very) soon!


What is the NX-8150?

A 1 Node per 2RU platform with the following specifications:

* 2 CPU Sockets with two CPU options (E5-2690v2 [20 cores / 3.0 GHz] OR E5-2697v2 [24 cores / 2.7 GHz])
* 4 x Intel 3700 Series SSDs (ranging from 400GB to 1.6TB ea)
* 20 x 1TB SATA HDDs
* Up to 768GB RAM
* Up to 4 x 10Gb NICs
* 4 x 1Gb NICs
* 1 x IPMI (Out of band Management)

What is the use case for the NX-8150?

Simply put, applications which have high CPU/RAM requirements with large active working sets and/or the requirement for consistent high performance over a large data set.

Some examples of these applications include:

* Microsoft Exchange including DAG deployments
* Microsoft SQL Server including AlwaysOn Availability Groups
* Oracle including RAC
* SAP
* Microsoft SharePoint
* Mixed Production Server Workloads with varying Capacity & I/O requirements

The NX-8150 is a great platform for the above workloads as it not only has fast CPUs and up to a massive 768GB of RAM to provide substantial compute resources to VMs, but also up to a massive 6.4TB of raw SSD capacity (4 x 1.6TB) for virtual machines with high I/O requirements. For workloads where peak performance is not critical, the NX-8150 also provides solid, consistent performance across the “Cold Tier” provided by the 20 x 1TB HDDs.
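To make the capacity numbers concrete, here is a minimal sketch that adds up the raw SSD and HDD capacity of a maximally configured node and compares the file server example from earlier (10TB of data, 10% active) against the raw SSD tier. The class and function names are hypothetical, 1TB is treated as 1000GB, and the figures are raw capacity only: replication, metadata and other overheads are deliberately ignored, so this is not a sizing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStorage:
    """Raw drive configuration of a single node (hypothetical helper)."""
    ssd_count: int
    ssd_size_gb: int
    hdd_count: int
    hdd_size_gb: int

    @property
    def raw_ssd_gb(self) -> int:
        return self.ssd_count * self.ssd_size_gb

    @property
    def raw_hdd_gb(self) -> int:
        return self.hdd_count * self.hdd_size_gb


def active_working_set_gb(total_gb: int, active_percent: int) -> int:
    """Portion of the data set that is accessed on a regular basis."""
    return total_gb * active_percent // 100


# Largest SSD option from the spec list: 4 x 1.6TB SSD plus 20 x 1TB HDD.
nx8150_max = NodeStorage(ssd_count=4, ssd_size_gb=1600,
                         hdd_count=20, hdd_size_gb=1000)

# File server example: 10TB of data, 10% accessed regularly.
file_server_aws_gb = active_working_set_gb(total_gb=10_000, active_percent=10)

# Raw comparison only; usable capacity will be lower than raw capacity.
fits_in_raw_ssd = file_server_aws_gb <= nx8150_max.raw_ssd_gb

print(f"Raw SSD tier: {nx8150_max.raw_ssd_gb} GB")
print(f"Raw HDD tier: {nx8150_max.raw_hdd_gb} GB")
print(f"Active working set: {file_server_aws_gb} GB")
print(f"Fits within raw SSD capacity: {fits_in_raw_ssd}")
```

Swapping in the smallest SSD option (4 x 400GB) shows how quickly the headroom for a large active working set shrinks.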

As with all Nutanix nodes, Information Lifecycle Management (ILM) maximizes performance by dynamically migrating hot data to SSD and cold data to SATA HDD, providing the best of both worlds: high IOPS and high capacity.

One of the many major advantages of Nutanix Web-Scale architecture is simplicity and its ability to remove the requirement for application-specific silos! Now with the addition of the NX-8150, the vast majority of workloads including Business Critical Applications can be run successfully on Nutanix, meaning fewer silos are required, resulting in a simpler, more cost-effective, scalable and resilient datacenter solution.

Now with a number of customers already placing advance orders for NX-8150s to deploy Business Critical Applications, it won’t be long until the now common “Virtual 1st” policies within many organisations turn into “Nutanix Web-Scale 1st” policies!

Stay tuned for upcoming case studies for NX-8150 based Web-Scale solutions!

Example Architectural Decision – HA Admission Control Policy with Software licensing constraints

High Availability Admission Control Setting & Policy with a Software Licensing Constraint

Problem Statement

The customer has a requirement to virtualize “Application X” which is currently running on physical servers. The customer is licensed for a maximum of 32 cores and the software vendor has strict licensing restrictions which do not recognize the use of DRS rules to restrict virtual machines to a sub-set of hosts within a cluster.

The application is Tier 1, and requires maximum availability. A capacity planner assessment has been conducted and found 32 cores and 256GB RAM is sufficient to run all servers.

The servers’ requirements vary greatly, from 1vCPU/2GB RAM to 8vCPU/64GB RAM, with the bulk of the VMs having 2vCPUs or fewer and varying RAM sizes.

What is the most suitable hardware configuration and HA admission control policy / setting that complies with the licensing restrictions while ensuring N+1 redundancy and minimizing the chance of poor application performance?

Assumptions

1. None

Constraints

1. Software vendor has strict licensing requirements
2. Only 32 cores are licensed and the customer has no budget for further licenses
3. DRS rules cannot be used to isolate VMs onto one or more hosts due to software licensing agreement

Motivation

1. Ensure maximum availability for the Tier 1 application/s
2. Ensure optimal performance for Tier 1 application/s

Architectural Decision

Purchase a total of three (3) x two (2) way servers, with 8 core CPUs and 128GB RAM each, and form a cluster of three nodes.

For the HA Admission control setting use “Enable – Do not power on virtual machines that violate availability constraints”

For the HA admission control policy use “Specify a Failover Host” and select the third host in the cluster. (Leaving two active hosts in the cluster).

Justification

1. Enabling strict admission control is critical to ensure the required level of availability for the Tier 1 application
2. Ensure maximum CPU scheduling efficiency by having two hosts active within the cluster running virtual machines as opposed to a single large host
3. Having 2 active hosts in the cluster allows DRS some flexibility to load balance to resolve contention compared to using a single large 32 core host
4. N+1 redundancy is achieved as one host can fail and the “fail-over” host will become active and be able to take the failed host’s workloads without performance degrading
5. As only 32 cores (2 servers with 16 cores each) are active at any one time, the solution complies with the licensing constraint
6. Using CPUs with smaller numbers of cores (such as 5 x 2 way servers with 4 cores per socket) would result in larger VMs not fitting within a NUMA node, potentially impacting memory performance, although with vNUMA in vSphere 5.0 this would be less of an issue
7. All VMs will fit within a NUMA node thus giving the VMs maximum performance without the requirement for vNUMA which is only available in vSphere 5.0 or later
8. The compute resource supplied by the proposed cluster is sufficient to run the workloads as per the capacity planner assessment.
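The arithmetic behind the justification above can be sketched as follows. This is a minimal illustration, not a licensing tool: the class and function names are hypothetical, and it assumes (as justification 5 does) that only the cores on the active, non-failover hosts count against the license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterDesign:
    """Hypothetical description of a cluster with dedicated failover hosts."""
    hosts: int
    sockets_per_host: int
    cores_per_socket: int
    ram_gb_per_host: int
    failover_hosts: int = 1

    @property
    def cores_per_host(self) -> int:
        return self.sockets_per_host * self.cores_per_socket

    @property
    def active_hosts(self) -> int:
        return self.hosts - self.failover_hosts

    @property
    def active_cores(self) -> int:
        return self.active_hosts * self.cores_per_host

    @property
    def active_ram_gb(self) -> int:
        return self.active_hosts * self.ram_gb_per_host


def evaluate(design: ClusterDesign, licensed_cores: int,
             required_cores: int, required_ram_gb: int) -> dict:
    """Check the design against the license cap and the capacity assessment."""
    return {
        # Assumption: only cores on active hosts are counted for licensing.
        "license_compliant": design.active_cores <= licensed_cores,
        "capacity_sufficient": (design.active_cores >= required_cores
                                and design.active_ram_gb >= required_ram_gb),
        # A dedicated failover host of the same size can absorb one host failure.
        "n_plus_1": design.failover_hosts >= 1 and design.active_hosts >= 1,
    }


# Proposed design: 3 x 2-way servers, 8 cores per socket, 128GB RAM each.
proposed = ClusterDesign(hosts=3, sockets_per_host=2,
                         cores_per_socket=8, ram_gb_per_host=128)

result = evaluate(proposed, licensed_cores=32,
                  required_cores=32, required_ram_gb=256)

print(f"Active cores: {proposed.active_cores}, "
      f"active RAM: {proposed.active_ram_gb} GB")
print(result)
```

With two active hosts of 16 cores and 128GB RAM each, the design lands exactly on the 32 licensed cores and the 256GB RAM identified by the capacity planner assessment.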

Implications

1. Additional networking and storage ports for three hosts as opposed to a two host cluster
2. If additional compute is required in the cluster, additional software licenses would need to be purchased. Alternatively, if the application servers were redesigned to use a scale-out methodology (especially for VMs with 4-8vCPUs), it would likely result in higher overcommitment ratios without significant contention and better utilization of the existing licensed cores
3. One host is sitting as a hot standby not servicing customer workloads and may be considered to be “waste”

Alternatives

1. Use 2 x 4 way 8 core ESXi hosts (32 cores per host) and set HA admission control to specify a failover host
2. Use 5 x 2 way 4 core ESXi hosts (8 cores per host) and set HA admission control to specify a failover host
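As a rough comparison of the chosen design against the two alternatives, the sketch below works out the active hosts, active cores and whether the largest VM (8 vCPUs) fits within a single NUMA node. It assumes one NUMA node per socket and one dedicated failover host in every option, and it only looks at core counts; the names are hypothetical and RAM is left out because the post does not state RAM sizes for the alternatives.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    hosts: int
    sockets_per_host: int
    cores_per_socket: int


def summarize(opt: Option, largest_vm_vcpus: int = 8,
              failover_hosts: int = 1) -> dict:
    """Core-count view of an option with dedicated failover host(s)."""
    active_hosts = opt.hosts - failover_hosts
    return {
        "active_hosts": active_hosts,
        "active_cores": active_hosts * opt.sockets_per_host * opt.cores_per_socket,
        # Assumption: one NUMA node per socket, so a VM fits in a node
        # when its vCPU count does not exceed the cores per socket.
        "largest_vm_fits_numa_node": largest_vm_vcpus <= opt.cores_per_socket,
    }


options = [
    Option("Chosen: 3 x 2-way, 8-core", 3, 2, 8),
    Option("Alternative 1: 2 x 4-way, 8-core", 2, 4, 8),
    Option("Alternative 2: 5 x 2-way, 4-core", 5, 2, 4),
]

summaries = {opt.name: summarize(opt) for opt in options}

for name, summary in summaries.items():
    print(name, summary)
```

All three options keep 32 cores active, so they differ on the other points raised in the justification: Alternative 1 leaves a single active host with no room for DRS to load balance, while Alternative 2 cannot place an 8 vCPU VM within one NUMA node.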

Below is a basic diagram of the proposed solution.

FailoverHost

*Post updated February 11th to correct an error.