Heterogeneous Nutanix Clusters Advantages & Considerations

Let's start with a simple example. The figure below shows a 4 node cluster mixing 2 x NX-3060 nodes with 2 x NX-8035 nodes. Both node types share the same Haswell CPU types, but the NX-3060 has ~2TB usable and the NX-8035 has ~8TB usable.

3060and8035Mixed

Assuming the cluster capacity was 50% utilized the NDSF layer would look similar to this:

3060and8035cluster50percentused

The above shows the NDSF having a total Storage Pool capacity of 20TB with 50% used (10TB). As this is a heterogeneous cluster, we have 2 different node types with vastly different usable capacity.

Nutanix Disk Balancing automatically balances the storage to ensure the utilization percentage of all SSDs/HDDs within the cluster is within ±15%. This means administrators do not have to worry about capacity management on a per node basis; capacity management only needs to be performed at the storage pool (cluster) layer.
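To make the arithmetic concrete, here is a minimal Python sketch that totals the storage pool from the example node capacities and checks whether each node's utilization sits within 15 percentage points of the pool-wide figure. The node names, the function names and this reading of the 15% band are my assumptions for illustration only. This is not how NDSF implements disk balancing.

```python
# Usable capacity per node in TB, taken from the 4 node example above.
# Node names are hypothetical labels.
NODE_CAPACITY_TB = {
    "NX-3060-A": 2.0,
    "NX-3060-B": 2.0,
    "NX-8035-A": 8.0,
    "NX-8035-B": 8.0,
}


def pool_capacity_tb(capacities):
    """Total storage pool capacity is simply the sum of all nodes."""
    return sum(capacities.values())


def utilization_pct(used_tb, capacities):
    """Per-node utilization as a percentage of that node's capacity."""
    return {node: 100.0 * used_tb[node] / capacities[node] for node in capacities}


def is_balanced(used_tb, capacities, tolerance_pct=15.0):
    """True if every node is within tolerance_pct points of pool utilization.

    Assumption: the 15% band is measured against the pool-wide percentage.
    """
    pool_pct = 100.0 * sum(used_tb.values()) / pool_capacity_tb(capacities)
    per_node = utilization_pct(used_tb, capacities)
    return all(abs(pct - pool_pct) <= tolerance_pct for pct in per_node.values())


# 10TB used in both cases, but distributed differently across the nodes.
even_usage = {"NX-3060-A": 1.0, "NX-3060-B": 1.0, "NX-8035-A": 4.0, "NX-8035-B": 4.0}
skewed_usage = {"NX-3060-A": 2.0, "NX-3060-B": 2.0, "NX-8035-A": 3.0, "NX-8035-B": 3.0}

print(pool_capacity_tb(NODE_CAPACITY_TB))
print(is_balanced(even_usage, NODE_CAPACITY_TB))
print(is_balanced(skewed_usage, NODE_CAPACITY_TB))
```

In the skewed case the small nodes are full while the large nodes are under 40% used, which is the kind of imbalance disk balancing is there to correct without administrator involvement.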

Advantage 1: No silos of storage capacity in heterogeneous environments

Advantage 2: NDSF disk balancing ensures the data is evenly distributed throughout the cluster

Advantage 3: There is no requirement for hypervisor level storage capacity management such as Storage DRS (SDRS).

For more information on why Storage DRS is not required see: Storage DRS and Nutanix – To use, or not to use, that is the question?

In a heterogeneous environment, it is likely you will have multiple workloads with different capacity and performance requirements. The below diagram shows the same 4 node cluster, with a single storage pool and 4 containers with different data protection and reduction settings to suit a wide range of application requirements.

Note: The RF3 container shown below would only be possible in clusters of 5 nodes or more, but is shown to illustrate the flexibility/capabilities of NDSF.

HetroClusterCapacity

The storage pool itself has up to 20TB usable (assuming RF2 and excluding data reduction savings). In the Pool we can see four Containers which can be thought of as policies which can be applied to Virtual Machines or Virtual Disks.

Container01 is configured with RF2 and In-Line compression and reports 10TB free space as the underlying storage pool (where capacity is managed) is 50% utilised. Therefore the Container reports free space as all the available capacity within the Storage Pool based on its configured RF.

Container02 has RF2, In-Line compression and EC-X enabled, but you will note it also reports 10TB free space. Capacity is not assigned to a container; it is shared between all containers within a Storage Pool.

Container03 is configured with RF3, which is different to Containers 01 and 02. As such, the container reports free space based on its configured RF of 3, so it shows 13.3TB usable and 6.66TB free space, as that is the maximum data that can be supported in that container based on its storage policies.

Container04 reports the same free space as Containers 01 and 02, as it is configured with the same RF. While Container04 has all data reduction technologies enabled, the Container reports actual free space; as data reduction takes effect, the usable capacity will change.
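The container figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name and parameters are hypothetical, and it assumes the 20TB pool figure is quoted at RF2 (so raw capacity is twice that) and ignores data reduction savings, as the text does.

```python
def container_capacity_tb(pool_usable_rf2_tb, pool_used_fraction, rf):
    """Return (usable_tb, free_tb) a container would report for a given RF.

    Assumptions: the pool figure is quoted at RF2, so raw capacity is twice
    that value, and data reduction savings are excluded.
    """
    raw_tb = pool_usable_rf2_tb * 2          # undo the RF2 overhead
    usable_tb = raw_tb / rf                  # apply the container's own RF
    free_tb = usable_tb * (1.0 - pool_used_fraction)
    return usable_tb, free_tb


# Containers 01, 02 and 04 (RF2) on the 50% utilised 20TB pool.
rf2_usable, rf2_free = container_capacity_tb(20.0, 0.5, rf=2)

# Container03 (RF3) on the same pool.
rf3_usable, rf3_free = container_capacity_tb(20.0, 0.5, rf=3)

print(f"RF2: {rf2_usable:.2f}TB usable, {rf2_free:.2f}TB free")
print(f"RF3: {rf3_usable:.2f}TB usable, {rf3_free:.2f}TB free")
```

The RF3 result works out to roughly 13.3TB usable and 6.67TB free, in line with the figures reported by Container03.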

It is possible to set capacity reservations on Containers where an application or tenant requires a guarantee as to the usable capacity available. It is also possible to set limits on containers to prevent workloads using more than a specified amount of capacity. However, for most use cases, I recommend not using Reservations or Limits and simply managing capacity at the Storage Pool layer.

Nutanix also supports VMs with more assigned/used capacity than the node they are running on. For more information see: What if my VMs storage exceeds the capacity of a Nutanix node?

Regardless of what node type/s reside within a Nutanix cluster, there are no advanced settings required to be configured, such as Queue Depths, VAAI and multi-pathing, which can be required when mixing legacy storage platforms in the same cluster. There is also no requirement for Storage DRS to manage either performance or capacity, as discussed earlier.

Advantage 3: No silos of storage capacity, all capacity is shared in the storage pool

Advantage 4: Storage policies such as RF and Data Reduction can be changed on the fly as required and multiple policies are supported within the same cluster.

For more information about Nutanix data reduction technologies, see: Nutanix Implementation of Data Avoidance & Reduction Technologies

Regardless of the mixture of node types and their respective capacity/performance characteristics, there is no advanced configuration required to achieve optimal performance.

Nutanix automatically manages I/O pathing. Data locality ensures most data is read locally, and writes are always written local to the VM with replicas then distributed throughout the cluster, which minimizes the chances of hot spots by default.

In the unlikely event one node's local SSD tier becomes saturated, NDSF will automatically write data across the shared SSD tier until the local node's SSD tier has sufficient capacity to resume local writes. This avoids the requirement for a storage admin to take any corrective actions.

Advantage 5: In the unlikely event of saturation of a node's SSD tier, NDSF automatically redirects new I/O until ILM (tiering) can free up capacity within the local tier.
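The redirect behaviour described above can be pictured with a deliberately simplified Python sketch. Everything here is hypothetical: the function, the node names and the "pick the remote node with the most free SSD" rule are my own simplification for illustration, not the actual NDSF placement logic.

```python
def place_write(local_node, ssd_free_gb, write_gb):
    """Pick the node whose SSD tier should accept a new write.

    Simplified assumption: prefer the local SSD tier and, if it cannot hold
    the write, fall back to the remote node with the most free SSD capacity.
    """
    if ssd_free_gb[local_node] >= write_gb:
        return local_node

    # Local tier is saturated: consider the rest of the shared SSD tier.
    candidates = {
        node: free
        for node, free in ssd_free_gb.items()
        if node != local_node and free >= write_gb
    }
    if not candidates:
        raise RuntimeError("no SSD tier in the cluster can hold this write")
    return max(candidates, key=candidates.get)


# Hypothetical free SSD capacity per node, in GB.
ssd_free = {"node-a": 0.5, "node-b": 120.0, "node-c": 300.0, "node-d": 80.0}

print(place_write("node-b", ssd_free, write_gb=1.0))  # local tier has room
print(place_write("node-a", ssd_free, write_gb=1.0))  # local tier saturated
```

Once tiering frees capacity on node-a, the first branch applies again and writes return to the local tier with no administrator action.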

NDSF natively distributes writes throughout all nodes within the cluster. This means all nodes within heterogeneous clusters increase the capacity, performance and resiliency of the entire cluster.

To increase the performance of a single VM, you have numerous options. All you need to do is migrate (vMotion for ESXi, Live Migration for Hyper-V or Migrate for AHV) to a node with higher spec physical processors, more SSD drives and/or more SATA spindles.

There is no requirement to Storage vMotion, or relocate the VM to a new Datastore/Container. NDSF manages the storage layer automatically and will localize hot data if/when required.

Advantage 6: No silos of storage capacity, all capacity is shared in the storage pool

Advantage 7: All nodes contribute to the capacity, performance and resiliency of the cluster

Heterogeneous clusters are managed by a single HTML 5 GUI called PRISM. There is no need to access multiple management interfaces for different storage types.

Advantage 8: Heterogeneous clusters are managed via a single HTML 5 GUI.

Nutanix also supports Pin to SSD which allows workloads requiring all flash to reside within a hybrid (SSD+SATA) cluster and be guaranteed all flash performance.

VMs or Virtual Disks can also be marked to be stored solely in Flash on the fly if/when required and vice versa.

Advantage 9: No silos required for workloads requiring All Flash performance

Nutanix eliminates the complexity around managing performance at a datastore layer. Nutanix supports up to the chosen hypervisor's limits, e.g.: the vSphere HA limit is 2048 VMs per datastore. As all controllers within a cluster actively service all datastores (Containers), performance isn’t constrained at a datastore layer like with traditional storage products.

For more information see: Unlimited VMs per datastore? Its not a myth with Nutanix!

Advantage 10: No performance concerns/constraints at the datastore level

What about Considerations for Heterogeneous Clusters?

From a performance perspective, always size your N+x (e.g.: N+1, N+2 etc.) node/s to be >= the largest node in the cluster. This ensures that in the event of a node failure, workloads benefiting from higher performance nodes can fail over to equivalent nodes.

From a capacity perspective, for NDSF to be able to restore the configured RF (RF2 or RF3) in the event of a node failure, sufficient capacity must exist within the storage pool. As such, when using high capacity nodes such as NX-8035s, NX-8150s or NX-6035C storage only nodes, ensure you have >= capacity of the largest node free within the storage pool.
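The capacity rule above is easy to check programmatically. The following Python sketch uses hypothetical function names and the node capacities from the earlier 4 node example. It only encodes the rule of thumb stated here (free capacity >= the largest node) and is not a Nutanix sizing tool.

```python
def can_restore_rf_after_node_failure(node_capacity_tb, pool_used_tb):
    """True if free pool capacity is >= the capacity of the largest node."""
    free_tb = sum(node_capacity_tb) - pool_used_tb
    return free_tb >= max(node_capacity_tb)


def max_safe_used_tb(node_capacity_tb):
    """Highest pool usage that still leaves the largest node's capacity free."""
    return sum(node_capacity_tb) - max(node_capacity_tb)


# 2 x NX-3060 (~2TB) and 2 x NX-8035 (~8TB) from the earlier example.
nodes_tb = [2.0, 2.0, 8.0, 8.0]

print(can_restore_rf_after_node_failure(nodes_tb, pool_used_tb=10.0))
print(can_restore_rf_after_node_failure(nodes_tb, pool_used_tb=14.0))
print(max_safe_used_tb(nodes_tb))
```

For the example cluster this means keeping pool usage at or below 12TB, so that losing an 8TB node still leaves room to re-protect the data.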

Advantage 11: Performance and availability sizing for heterogeneous clusters is simple.

Another consideration is mission-critical or high I/O applications: spread these evenly across the nodes and ideally ensure the active working set fits within the local SSD tier. Doing so will maximise performance, but in the event a very large workload cannot fit within the local SSD tier, its data will reside within the shared SSD tier and be actively serviced by multiple Controller VMs.

For more information about sizing see: Rule of Thumb: Sizing for Storage Performance in the new world.

Advantage 12: The NDSF shared SSD tier ensures in the event a workload exceeds the local SSD capacity that the application still enjoys all flash performance by distributing data intelligently across the cluster.

Over time, when adding new nodes, VMs can be quickly/easily migrated to newer, higher performance/capacity nodes without any preparation. The VMs will immediately benefit from the newer nodes' CPU, RAM and storage performance even if most of their data is still stored on older node types.

Older nodes can be non-disruptively removed once they are end of life, again without any preparation or administrator intervention.

Advantage 13: Workloads on NDSF benefit from newer generation nodes immediately without complex design/migration activities.

Summary:

  • Nutanix supports and recommends heterogeneous clusters
  • No complexity with multi-pathing, it’s optimal out of the box
  • No custom per datastore configuration
  • VAAI just works, no advanced configuration required due to mixed node types
  • No compromise required to mix node types
  • No silos of storage capacity, all capacity is shared in the storage pool
  • All nodes contribute to performance of the cluster
  • No balancing VMs across datastores/storage devices is required to improve performance/resiliency
  • NDSF disk balancing ensures the data is evenly distributed throughout the cluster helping avoid hotspots
  • The distribution of RF traffic throughout the cluster also helps avoid hotspots
  • No silos required for workloads requiring all flash performance
  • NDSF ensures VMs can immediately benefit from the addition of newer generation node types
  • Nodes can be added/removed without system administrator performing data migrations

SQL AlwaysOn Availability Group support in VMDKs on NFS Datastores

Recently a customer contacted me about running SQL Always-On Availability Groups on Nutanix. They were wondering if it was supported, because Nutanix recommends and runs NFS datastores by default.

The customer did the right thing and investigated and came across the following VMware KB:

Microsoft Clustering on VMware vSphere: Guidelines for supported configurations (1037959)

The KB has the following table and the relevant section to MS SQL AAGs is highlighted.

SQLsupportnfs

As you can see, the table indicates (incorrectly, I might add) that SQL Always On Availability Groups are not supported on NFS, when in fact the storage protocol is not relevant to non shared disk deployments.

The article goes on to provide further details about the supported clustering and vSphere versions as shown below, with no further (obvious) mention of storage protocols.

pix1

However, at the bottom of the article it states (as per the below screenshot):

3. In-Guest clustering solutions that do not use a shared-disk configuration, such as SQL Mirroring, SQL Server Always On Availability Group (Non-shared disk), and Exchange Database Availability Group (DAG), do not require explicit support statements from VMware.

pix2

As a result, SQL Always-On Availability Group non shared disk deployments are supported by VMware when deployed in VMDKs on NFS datastores (as are Exchange DAG deployments).

To ensure there is no further confusion, Michael Webster and I are currently working with VMware to have the KB updated so it is no longer confusing to customers with NFS storage.

For those of you wanting to learn more about Virtualizing SQL Server with vSphere, check out the VMware Press book below by my friend and colleague Michael Webster (VCDX#66).

virtualizing-sql-server-cover-small (1)

How to successfully Virtualize MS Exchange – Part 8 – Local Storage

As discussed in Part 7, Local Storage is probably the most basic form of storage we can present to ESXi and use for Exchange MBX/MSR VMs.

The below screen shot shows what local storage can look like to an ESXi host.

LocalStorage

As we can see above, the highlighted datastore is simply an SSD formatted with VMFS5. In this case it is a single drive not running RAID, and therefore in the event of the drive failing, any data on the drive would be permanently lost.

Note: The above image is simply an example. In reality multiple drives, most likely SAS or SATA, would be used, as SSD is unnecessary for Exchange.

In some ways this is very similar to a physical Exchange deployment on JBOD storage. I would like to echo the recommendations Microsoft give for JBOD deployments in the Exchange 2013 storage configuration options guide: for JBOD deployments, I strongly recommend at least 3 database copies.

As per the recommendation in Part 4 (DRS), MS Exchange MBX/MSR VMs should always run on separate ESXi hosts to ensure a single host failure does not potentially cause an issue for the DAG. This is especially important because if two Exchange servers shared the same ESXi host and local storage, a single ESXi host outage could cause data loss and downtime for part or all of the Exchange environment.

The below is a screen shot from the Exchange 2013 storage configuration options guide showing the recommendations based on RAID or JBOD deployments. In my opinion these recommendations also apply to virtualized Exchange deployments on Local storage.

JBODexchange

Another option is to use Local Storage in a RAID configuration to eliminate the Single Point of Failure (SPOF) of a single drive failure.

Again, I agree with Microsoft’s recommendations and suggest at least two database copies when using a RAID configuration and again, each Exchange VM must run on its own ESXi host on dedicated physical disks.

Note: The RAID controller itself is still a SPOF, which is why multiple copies are recommended from both an availability and data protection perspective.
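The database copy recommendations for JBOD and RAID can be captured in a small Python sketch. The names below are hypothetical, and the `raid_overhead` multiplier (2.0 for a mirrored layout) is my assumption for illustration, since the RAID level is not specified here.

```python
# Minimum database copies recommended in this post for each local storage type.
MIN_DB_COPIES = {"jbod": 3, "raid": 2}


def meets_copy_recommendation(storage_type, db_copies):
    """True if the DAG has at least the recommended number of copies."""
    return db_copies >= MIN_DB_COPIES[storage_type.lower()]


def raw_tb_required(db_size_tb, db_copies, raid_overhead=1.0):
    """Raw disk needed across all copies.

    raid_overhead is an assumed multiplier: 1.0 for JBOD, 2.0 for a mirror.
    """
    return db_size_tb * db_copies * raid_overhead


print(meets_copy_recommendation("JBOD", 2))   # below the recommended 3 copies
print(meets_copy_recommendation("RAID", 2))   # meets the recommended 2 copies

# Raw capacity for 1TB of databases under each approach.
print(raw_tb_required(1.0, db_copies=3))                     # JBOD, 3 copies
print(raw_tb_required(1.0, db_copies=2, raid_overhead=2.0))  # mirrored RAID, 2 copies
```

Under these assumptions JBOD with 3 copies needs less raw disk than mirrored RAID with 2 copies, but it also requires an additional Exchange MBX/MSR server.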

Let’s now discuss the pros and cons for using Local Storage with JBOD for your Virtualized Exchange Deployment.

PROS

1. Generally lower cost per GB than centralized storage (e.g.: SAN)
2. Higher usable capacity per drive compared to RAID or centralized storage configurations using RAID or other proprietary data protection techniques.
3. Local JBOD Storage formatted with VMFS is a fully supported configuration

CONS

1. No protection from data loss in the event of a JBOD drive failure. Note: For non DAG deployments, RAID and 3rd party backups should always be used!
2. Performance/Capacity in JBOD deployments is limited to the capabilities of a single drive.
3. Loss of Virtualization functionality such as HA / DRS and vMotion (without performing a Storage vMotion every time)
4. Can be difficult/costly to scale when nearing capacity.
5. Increased Management (Operational) overheads managing decentralized storage
6. At least 3 database copies are recommended, requiring more Exchange MBX/MSR servers.
7. Little/no protection against data corruption which may lead to all DAG copies suffering corruption. Note: If the corruption is not discovered in time, LAGGED copies can also be compromised.
8. Capacity cannot be shared between ESXi hosts which may lead to inefficient use of the available capacity.

Next here are some pros and cons for using Local Storage with RAID for your Virtualized Exchange Deployment.

PROS

1. Generally lower cost per GB than centralized storage (e.g.: SAN)
2. A single drive failure will not cause data loss or a DAG failover
3. Performance is not limited to a single drive's capabilities
4. Local Storage with RAID formatted with VMFS is a fully supported configuration
5. As there is no data loss with a single drive failure, fewer database copies are required (2 instead of >=3 for JBOD)

CONS

1. Increased Management (Operational) overheads managing decentralized storage
2. Performance/Capacity is limited to the capabilities of a single drive
3. Loss of Virtualization functionality such as HA / DRS and vMotion (without performing a Storage vMotion every time)
4. Little/no protection against data corruption which may lead to all DAG copies suffering corruption. Note: If the corruption is not discovered in time, LAGGED copies can also be compromised.
5. Capacity cannot be shared between ESXi hosts which may lead to inefficient use of the available capacity
6. Performance is constrained by a single RAID controller / set of drives and can be difficult/costly to scale when nearing capacity.

For more information about data corruption for JBOD or RAID deployments, see “Data Corruption”.

Recommendations:

1. When using local storage, (JBOD or RAID), as per Part 4, run only one Exchange MBX/MSR VM per ESXi host
2. Use dedicated physical disks for Exchange MBX/MSR VM (i.e.: Do not share the same disks with other workloads)
3. Store the Windows OS / Exchange application VMDK on local storage which is configured with RAID to ensure a single drive failure does not cause a VM outage.
4. Ensure ESXi itself is installed on local storage configured with RAID (and not a USB key), as the Exchange VM is dependent on that host and is not protected by vSphere HA. Nor is it easily/quickly portable, due to the storage not being shared.
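Recommendation 1 can be verified against an inventory export with a short Python sketch. The VM and host names are hypothetical, and the placement mapping is assumed to come from whatever inventory source you use. This does not call any vSphere API.

```python
from collections import defaultdict


def hosts_with_multiple_exchange_vms(placement):
    """Return {host: [vms]} for every ESXi host running more than one Exchange VM.

    placement maps an Exchange MBX/MSR VM name to the ESXi host it runs on.
    """
    vms_by_host = defaultdict(list)
    for vm, host in placement.items():
        vms_by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in vms_by_host.items() if len(vms) > 1}


# Hypothetical placement: mbx01 and mbx03 share a host, which breaks the rule.
placement = {"mbx01": "esxi01", "mbx02": "esxi02", "mbx03": "esxi01"}

violations = hosts_with_multiple_exchange_vms(placement)
print(violations)
```

An empty result means every Exchange MBX/MSR VM is on its own ESXi host, so a single host outage affects at most one DAG member.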

Summary:

Using Local Storage in either a JBOD or RAID configuration is fully supported by Microsoft and is a valid option for MS Exchange deployments.

In my opinion Local Storage deployments have more downsides than upsides and I would recommend considering other storage options for Virtualized Exchange deployments.

Other options along with my recommended options will be discussed in the next 3 parts of this series.

Back to the Index of How to successfully Virtualize MS Exchange.

~ Post Updated January 2nd 2015 Thanks to feedback from @zerszenyi ~