SQL AlwaysOn Availability Group support in VMDKs on NFS Datastores

Recently I had a customer contact me about doing SQL Always-On Availability Groups on Nutanix, and they were wondering whether it was supported, given that Nutanix recommends and runs NFS datastores by default.

The customer did the right thing and investigated and came across the following VMware KB:

Microsoft Clustering on VMware vSphere: Guidelines for supported configurations (1037959)

The KB has the following table, in which the section relevant to MS SQL AAGs is highlighted.

[Screenshot: VMware KB 1037959 support table with the MS SQL AAG row highlighted]

As you can see, the table indicates (incorrectly, I might add) that SQL Always On Availability Groups are not supported on NFS, when in fact the storage protocol is not relevant to non-shared-disk deployments.

The article goes on to provide further details about the supported clustering and vSphere versions, as shown below, with no further (obvious) mention of storage protocols.

[Screenshot: VMware KB 1037959 table of supported clustering solutions and vSphere versions]

However, at the bottom of the article it states (as per the below screenshot):

3. In-Guest clustering solutions that do not use a shared-disk configuration, such as SQL Mirroring, SQL Server Always On Availability Group (Non-shared disk), and Exchange Database Availability Group (DAG), do not require explicit support statements from VMware.

[Screenshot: VMware KB 1037959, note 3]

As a result, SQL Always-On Availability Group non-shared-disk deployments are supported by VMware when deployed in VMDKs on NFS datastores (as are Exchange DAG deployments).

To ensure there is no further confusion, Michael Webster and I are currently working with VMware to have the KB updated so it is no longer confusing to customers with NFS storage.

For those of you wanting to learn more about virtualizing SQL Server with vSphere, check out the VMware Press book below by my friend and colleague Michael Webster (VCDX #66).

[Image: Virtualizing SQL Server with VMware book cover]

How to successfully Virtualize MS Exchange – Part 11 – Types of Datastores

Datastores are a logical construct which allows DAS, SAN or NAS storage to be presented to ESXi. In the case of SAN and NAS storage, this is generally “shared storage”, which enables virtualization features such as HA, DRS and vMotion.

When storage is presented to ESXi from DAS or SAN (block-based) storage, it is formatted with VMFS (Virtual Machine File System); when storage is presented via file storage (NFS), it is mounted by ESXi as an NFS mount.

Regardless of whether datastores are presented via block (iSCSI, FC, FCoE) or file-based (NFS) protocols, they both host VMDKs (Virtual Machine Disks), which are block-based storage. In the case of NFS, the SCSI commands are emulated by the hypervisor. This process is explained in Emulation of the SCSI Protocol and can be compared to Hyper-V SMB 3.0 (file) storage with VHDX, which also emulates SCSI commands over file (SMB 3.0) storage.

The following diagram is courtesy of http://pubs.vmware.com and shows “host1” and “host2” running VMs across VMFS (block) and NFS (file) datastores. Note the VMs residing on datastore1 and datastore2 all have .vmx and .vmdk files and operate in exactly the same way from the perspective of the VM, Guest OS and applications.

[Diagram courtesy of pubs.vmware.com: host1 and host2 running VMs across VMFS and NFS datastores]
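To make this equivalence a little more concrete, here is a minimal pyVmomi sketch (the vCenter hostname and credentials are placeholders, and the pyvmomi package is assumed to be installed) that lists each datastore's reported type alongside the VMDK paths of the VMs on it. Whether the datastore reports VMFS or NFS, the VMDK path has the same "[datastore] vm/vm.vmdk" form.

```python
# Minimal pyVmomi sketch: list datastore types and the VMDKs they host.
# Hostname and credentials are placeholders for illustration only.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use proper certificates in production
si = SmartConnect(host="vcenter.lab", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

for dc in content.rootFolder.childEntity:
    for ds in dc.datastore:
        # summary.type reports "VMFS" for block-backed and "NFS" for file-backed datastores
        print(f"{ds.name}: type={ds.summary.type}, capacity={ds.summary.capacity // 2**30} GiB")
        for vm in ds.vm:
            for dev in vm.config.hardware.device:
                if isinstance(dev, vim.vm.device.VirtualDisk):
                    # The VMDK path format "[datastore] vm/vm.vmdk" is the same for VMFS and NFS
                    print(f"  {vm.name}: {dev.backing.fileName}")

Disconnect(si)
```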

The next paragraph is controversial and may be hotly debated, but to the best of my knowledge, and that of the countless industry experts (from several different vendors) I have investigated this with over the last year, including VMware’s formal position, it is completely true, and I welcome any credible and detailed evidence to the contrary! (I even asked this question of Microsoft here.)

Using either VMFS or NFS datastores meets the technical requirements for Exchange, namely Write Ordering, Forced Unit Access (FUA) and SCSI abort/reset commands. Because drives within Windows are formatted with NTFS, which is a journalling file system, the requirement to protect against Torn I/O is also met.
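To show what Forced Unit Access and write-through look like from a guest application's point of view, here is a small POSIX Python sketch (the file path is a placeholder). These are the semantics the hypervisor must honour on the application's behalf whether the VMDK resides on a VMFS or an NFS datastore; Windows applications achieve the same effect via FILE_FLAG_WRITE_THROUGH and FlushFileBuffers.

```python
# Two common ways a guest application forces data to stable media (POSIX example).
# The file path is a placeholder for illustration only.
import os

data = b"transaction log record\n"

# Option 1: synchronous (write-through) writes - each write() returns only once
# the data has been handed to stable storage, analogous to FUA / write-through.
fd = os.open("/tmp/tx.log", os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
os.write(fd, data)
os.close(fd)

# Option 2: buffered writes followed by an explicit flush, which is how
# journalling file systems such as NTFS order and harden their journal writes.
fd = os.open("/tmp/tx.log", os.O_WRONLY | os.O_APPEND)
os.write(fd, data)
os.fsync(fd)  # force data (and metadata) to stable media before continuing
os.close(fd)
```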

With that being said, Microsoft currently do not support Exchange running in VMDKs on NFS datastores.

The below is a quote from Exchange 2013 storage configuration options outlining the storage support statement for MS Exchange, with the last sentence (underlined in the original) applying to NFS datastores.

All storage used by Exchange for storage of Exchange data must be block-level storage because Exchange 2013 doesn’t support the use of NAS volumes, other than in the SMB 3.0 scenario outlined in the topic Exchange 2013 virtualization. Also, in a virtualized environment, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.

If you’re interested in finding out more about MS Exchange running in VMDKs on NFS datastores, see the links at the end of this post.

Now let’s discuss the limitations of datastores, what impact they have on vSphere environments with MS Exchange deployments, and why.

Number of LUNs / NFS Mounts : 256

This can be a significant constraint when using one or more datastores per Exchange MBX/MSR VM; however, in my opinion this is neither necessary nor recommended.

Generally, Exchange VMDKs can be mixed with other VMs in the same datastore provided there is no performance constraint. As such, keep high-I/O VMs (including other Exchange VMs) in different datastores.

As discussed in Part 10, if legacy per-LUN, snapshot-based backup solutions are being used, then in-guest iSCSI may have to be used; but for new deployments, especially where new storage will be purchased, per-LUN solutions should not be considered!

Number of Paths per ESXi host : 1024

If using VMFS datastores, the simple fact is that 4 paths per LUN is the maximum you can use if you plan to reach or get near the limit of 256 datastores (1024 paths / 256 LUNs = 4 paths per LUN). This is not a performance-limiting factor with any enterprise-grade storage solution.

Number of Paths per LUN : 32

If you configure 32 paths per LUN, you immediately restrict yourself to 32 LUNs per ESXi host (and vSphere cluster), so don’t do it! As mentioned earlier, 4 paths per LUN is the maximum if you plan to reach 256 datastores. Do the math and this limit is not a problem.
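For those who want to see the numbers, here is the simple arithmetic behind the two limits above (the per-host maximums quoted are the values used in this post):

```python
# Worked example of the path-budget maths behind the two limits above.
PATHS_PER_HOST = 1024   # maximum paths per ESXi host
LUN_LIMIT = 256         # maximum LUNs / NFS mounts per host

for paths_per_lun in (4, 8, 16, 32):
    max_luns = min(PATHS_PER_HOST // paths_per_lun, LUN_LIMIT)
    print(f"{paths_per_lun} paths/LUN -> at most {max_luns} LUNs per host")

# Output:
# 4 paths/LUN -> at most 256 LUNs per host
# 8 paths/LUN -> at most 128 LUNs per host
# 16 paths/LUN -> at most 64 LUNs per host
# 32 paths/LUN -> at most 32 LUNs per host
```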

Number of Paths per NFS mount : N/A

NFS mounts connect over IP to the storage controllers via vNetworking, so there is no maximum as such, although with NFS v3 only one physical NIC will be used at a time per IP subnet. This topic will be covered in a future post on vNetworking for MS Exchange.

VMFS Datastore Maximum Size : 64TB

256 LUNs x 64TB = 16,384TB per vSphere HA cluster. This is not a problem.

NFS Datastore Maximum Size : Varies depending on vendor

The limit depends on the vendor but it’s typically higher than the VMFS limit of 64TB per mount, with some vendors not having a limit at all.

So you’re safe to assume >=16,384TB per cluster, but always check with your current or potential storage vendor.

ESXi hosts per Volume (Datastore) : 64 (Note: HA cluster limit of 32)

As a vSphere cluster is currently limited to 32 hosts, this limitation isn’t really an issue. With vSphere 6.0 it is expected that the cluster size will increase to 64, but ESXi hosts per volume is not a maximum I have ever heard of being reached.

Recommendations:

1. If you require a fully supported configuration use VMFS datastores.
2. Maximum 4 paths per LUN to ensure maximum scalability (if required).
3. Consider the underlying storage configuration and type of a datastore before deploying MS Exchange.
4. Do not deploy MS Exchange VMDKs onto datastores with other high-I/O workloads.
5. When mixing workloads on a datastore, enable SIOC to ensure fairness between workloads in the event of storage contention.
6. Spread Exchange VMDKs across multiple datastores for maximum performance and resiliency, e.g. 12 VMDKs per Exchange MBX/MSR VM across 4 mixed-workload datastores (as sketched after this list).
7. Do not use dedicated datastore/s per MS Exchange database or VM. (This is unnecessary from a performance perspective)
8. If choosing to use NFS Datastores, purchase Premier Support from Microsoft and negotiate support for NFS. Microsoft do provide support for many customers with Premier Support with Exchange running on NFS datastores although it is not their preference.
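As a rough illustration of recommendation 6, the sketch below shows one simple round-robin layout of 12 VMDKs across 4 mixed-workload datastores. All VM and datastore names are made up for the example and are not a prescriptive design.

```python
# Hypothetical round-robin placement of Exchange VMDKs across datastores.
# VM and datastore names are illustrative only.
from itertools import cycle

datastores = ["DS-Mixed-01", "DS-Mixed-02", "DS-Mixed-03", "DS-Mixed-04"]
vmdk_roles = [f"EXCH01-DB{i:02d}" for i in range(1, 9)] + \
             [f"EXCH01-LOG{i:02d}" for i in range(1, 5)]  # 8 DB + 4 log disks = 12 VMDKs

placement = {}
for vmdk, ds in zip(vmdk_roles, cycle(datastores)):
    placement.setdefault(ds, []).append(vmdk)

for ds, vmdks in placement.items():
    print(f"{ds}: {', '.join(vmdks)}")
# Each datastore ends up with 3 of the 12 VMDKs, spreading both I/O and
# failure domains across the 4 datastores.
```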

On a final note, in future posts I will be discussing the underlying storage in detail from a performance and availability perspective, with Database Availability Groups in mind.

Thank you to @mattliebowitz for reviewing this post. I highly recommend his book Virtualizing Business Critical Applications by VMware Press. I purchased and reviewed this book in mid-2014; it is well worth a read!

Back to the Index of How to successfully Virtualize MS Exchange.

Articles on MS Exchange running in VMDK on NFS datastores

1. Support for Exchange Databases running within VMDKs on NFS datastores

2. Microsoft Exchange Improvements Suggestions Forum – Exchange on NFS/SMB

3. What does Exchange running in a VMDK on NFS datastore look like to the Guest OS?

4. Integrity of I/O for VMs on NFS Datastores Series

Part 1 – Emulation of the SCSI Protocol
Part 2 – Forced Unit Access (FUA) & Write Through
Part 3 – Write Ordering
Part 4 – Torn Writes
Part 5 – Data Corruption

Integrity of I/O for VMs on NFS Datastores – Part 5 – Data Corruption

This is the fifth part of a series of posts covering how the Integrity of Write I/O is ensured for Virtual Machines when writing to VMDK/s (Virtual SCSI Hard Drives) running on NFS datastores presented via VMware’s ESXi hypervisor as a “Datastore”.

This part will focus on Data Corruption.

As a reminder from the first post, this post is not talking about presenting NFS direct to Windows.

So why am I covering data corruption? Simply because there is a misconception that SCSI commands are not properly supported for VMs running on NFS datastores, which leads to corruption. That was covered in Part 1, so Part 5 will focus on data corruption that is not specific to NFS but can affect all storage platforms: how it occurs, and how storage solutions can mitigate the risk of data corruption issues.

The following data is a summary of the data provided in An analysis of data corruption in the storage stack.

NetApp conducted a large-scale study into data corruption, covering more than 1 million HDDs across tens of thousands of NetApp systems over 41 months (2004–2007). Long story short, NetApp detected a level of data corruption which surprised me and seems to disprove many things, such as the advertised MTBF for HDDs.

The following shows a breakdown of the problems found.

[Pie charts: breakdown of problems found, enterprise-grade disks (left) vs nearline disks (right)]

The first thing I noticed in the above pie charts is the vast difference between the percentage of failures in enterprise-grade disks (left) and nearline disks (right).

It also shows physical interconnects to be responsible for a large percentage of failures, which highlights the need for simplicity in the storage solution. In addition, one of the more surprising results is the level of storage protocol and performance based failures being a cause of corruption.

Note: In this study, the majority of systems deployed were FC (block storage) based. This highlights that a storage protocol itself, regardless of whether it is block or file based, can have issues if improperly implemented. So regardless of storage protocol, corruption can occur.

The below summary of corruption type and percentage of disks affected shows dramatically (roughly 10x) more issues with SATA drives compared to enterprise-grade drives.

[Table: corruption types and percentage of nearline vs enterprise disks affected]

The above also shows that bit corruptions and torn writes affect more disks than lost or misdirected writes, which highlights the importance of Torn I/O protection (covered in Part 4).

The article summarizes its findings in the following points:

[Screenshot: summary points from the paper]

The main takeaways from my perspective are:

1. The requirement to have corruption-handling mechanisms in any environment running workloads which require data integrity (a simplified illustration follows this list).
2. Data should be spread out (ideally across disks) to minimize the chance of issues.
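To make the first takeaway a little more concrete, below is a deliberately simplified Python sketch of per-block checksumming, the kind of corruption-handling mechanism referred to above. Real storage stacks store checksums out-of-band and use hardware-assisted codes, so treat this purely as an illustration of the principle.

```python
# Simplified illustration of per-block checksumming to detect silent corruption.
# Real storage stacks keep checksums out-of-band and use hardware-assisted codes.
import zlib

BLOCK_SIZE = 4096

def write_block(storage: dict, lba: int, data: bytes) -> None:
    """Store the block together with a checksum computed at write time."""
    assert len(data) == BLOCK_SIZE
    storage[lba] = (data, zlib.crc32(data))

def read_block(storage: dict, lba: int) -> bytes:
    """Verify the checksum on read; raise if the block was silently corrupted."""
    data, stored_crc = storage[lba]
    if zlib.crc32(data) != stored_crc:
        raise IOError(f"Checksum mismatch on LBA {lba}: possible silent corruption")
    return data

disk = {}
write_block(disk, 100, b"A" * BLOCK_SIZE)

# Simulate a bit flip on the media without updating the checksum.
corrupted = bytearray(disk[100][0])
corrupted[0] ^= 0x01
disk[100] = (bytes(corrupted), disk[100][1])

try:
    read_block(disk, 100)
except IOError as e:
    print(e)  # the corruption is detected rather than silently returned to the reader
```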

The article went on to form these conclusions:

[Screenshot: conclusions from the paper]

In Summary:

1. Data corruption can occur on JBOD, enterprise-grade storage solutions, and everything in between.
2. SATA drives have a much higher rate (~10x) of corruption.
3. Enterprise grade drives are much better from a data integrity perspective.
4. Corruption handling via sector and ideally block based checksums is essential on writes.
5. Using a checksum on Read helps detect corrupted data.
6. Corruption can occur even when no ECC errors are reported by a physical HDD.
7. Any storage protocol implementation can have bugs which can lead to corruption.
8. Backup / Recovery solutions are essential. Reliance solely on primary storage or application level backups using disks puts your data at risk.
9. Solutions solely dependent on application-level data protection on disk risk having corrupted data replicated to other active/passive or backup copies.

My final point: enterprise-grade storage solutions which use checksums to verify data integrity on writes and reads have a much lower risk of data corruption, regardless of media type and storage protocol.

JBOD-style deployments using SATA drives have a significantly higher risk of data corruption, due to SATA drives’ roughly 10x higher corruption rates and the lack of the enterprise-grade checksum features found in some shared storage (SAN/NAS) solutions.

Integrity of Write I/O for VMs on NFS Datastores Series

Part 1 – Emulation of the SCSI Protocol
Part 2 – Forced Unit Access (FUA) & Write Through
Part 3 – Write Ordering
Part 4 – Torn Writes
Part 5 – Data Corruption

Nutanix Specific Articles

Part 6 – Emulation of the SCSI Protocol (Coming soon)
Part 7 – Forced Unit Access (FUA) & Write Through (Coming soon)
Part 8 – Write Ordering (Coming soon)
Part 9 – Torn I/O Protection (Coming soon)
Part 10 – Data Corruption (Coming soon)

Related Articles

1. What does Exchange running in a VMDK on NFS datastore look like to the Guest OS?
2. Support for Exchange Databases running within VMDKs on NFS datastores (TechNet)
3. Microsoft Exchange Improvements Suggestions Forum – Exchange on NFS/SMB
4. Virtualizing Exchange on vSphere with NFS backed storage