vSphere | PVSCSI Adapters & striped/spanned NTFS volumes

A little while ago I wrote a post titled “Splitting SQL datafiles across multiple VMDKs for optimal VM performance” where I talked about how SQL databases can be split with minimal/no interruption to production to give better performance by spreading the IO load across multiple PVSCSI adapters and virtual machine disks (VMDKs).

In a follow-up post titled “SQL & Exchange performance in a Virtual Machine” I mentioned the above article and concluded:

If the DBA is not confident doing this, you can also just add multiple virtual disks (connected via multiple PVSCSI controllers) and create a stripe in guest (via Disk Manager) and this will also give you the benefit of multiple vdisks.

Both posts have been very popular, and one of the comments I received via Twitter was that creating striped or spanned NTFS volumes in guest was not supported by VMware when using PVSCSI.

This is stated in VMware KB “Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters (1010398)” as shown below:

[Screenshot: excerpt from VMware KB 1010398 regarding spanned volume support]

Prior to writing both posts I was aware of this KB, but after comprehensively testing this numerous times on different platforms over the years, and more recently on Nutanix, I concluded after liaising with many VMware experts (including several VCDXs) that this was either a legacy recommendation which needed to be updated, or simply a mistake by the author of the KB (which can happen as we’re all human).

As such, I followed up with VMware by raising an SR on August 14th, 2016.

After following up several times I had given up waiting for an answer, but I am pleased to say that today (2nd November 2016) I finally got a reply.

[Screenshot: VMware GSS response to the SR]

In summary, spanned volumes (and striped volumes, which were not mentioned in the KB) are supported and, to quote VMware GSS, “will have no issues”.

One strong recommendation I have is: DO NOT use VMDKs hosted in different failure domains (e.g. different LUNs or SAN/NAS systems) in the same spanned/striped volume, as this increases the size of the failure domain and the chances of the volume going offline.

So there you have it: if you need to increase performance for an application and you are not confident splitting databases at the application level, you can (typically) get increased IO performance by using striped volumes in guest, which are quick and easy to set up. The only downside is that you will need to take your DB offline to copy it to the new volume before bringing it back online.
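For anyone who prefers to script the in-guest steps rather than click through Disk Manager, below is a minimal sketch under assumptions of my own: the disk numbers, drive letter and label are hypothetical, the disks are assumed to be new and blank, and the script simply drives the built-in diskpart tool from Python, so it needs an elevated prompt and the disk numbers should be verified with "list disk" first.

```python
import subprocess
import tempfile

# Hypothetical values: disks 1-4 are the new data VMDKs (one per PVSCSI adapter)
# as seen in Disk Management, assumed new and blank. Verify with "list disk"
# in diskpart before running, and run from an elevated prompt.
DATA_DISKS = [1, 2, 3, 4]
DRIVE_LETTER = "E"
LABEL = "SQLData"

# Build a diskpart script: bring each disk online, convert it to dynamic,
# create one striped volume across all of them, then format it NTFS with a
# 64 KB allocation unit and assign a drive letter.
commands = []
for disk in DATA_DISKS:
    commands += [
        f"select disk {disk}",
        "online disk noerr",
        "attributes disk clear readonly noerr",
        "convert dynamic",
    ]
commands += [
    "create volume stripe disk=" + ",".join(str(d) for d in DATA_DISKS),
    f"format fs=ntfs unit=64k quick label={LABEL}",
    f"assign letter={DRIVE_LETTER}",
]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as script:
    script.write("\n".join(commands) + "\n")

# diskpart /s runs the script non-interactively.
subprocess.run(["diskpart", "/s", script.name], check=True)
```

The same result can of course be achieved interactively via the New Striped Volume wizard in Disk Manager, and the warning above about keeping all vDisks within the one failure domain applies either way.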

Hope this puts people’s minds at ease about striped volumes with PVSCSI.

Storage Performance: ReFS vs NTFS

I am regularly asked by customers if they should use NTFS or the newer ReFS when formatting drives for applications like Microsoft Exchange and SQL.

Most customers are asking in the context of performance, so I thought I would share some recent testing results using MS Exchange Jetstress.

Firstly, what is ReFS, and when/why would you use it?

What is ReFS?

Resilient File System (ReFS) is a new local file system. It maximizes data availability, despite errors that would historically cause data loss or downtime. Data integrity ensures that business critical data is protected from errors and available when needed. Its architecture is designed to provide scalability and performance in an era of constantly growing data set sizes and dynamic workloads.

The key features of ReFS are:

  • Integrity: ReFS stores data so that it is protected from many of the common errors that can cause data loss. File system metadata is always protected. Optionally, user data can be protected on a per-volume, per-directory, or per-file basis. If corruption occurs, ReFS can detect and, when configured with Storage Spaces, automatically correct the corruption. In the event of a system error, ReFS is designed to recover from that error rapidly, with no loss of user data.
  • Availability: ReFS is designed to prioritize the availability of data. With ReFS, if corruption occurs, and it cannot be repaired automatically, the online salvage process is localized to the area of corruption, requiring no volume down-time. In short, if corruption occurs, ReFS will stay online.
  • Scalability: ReFS is designed for the data set sizes of today and the data set sizes of tomorrow; it’s optimized for high scalability.
  • App Compatibility: To maximize AppCompat, ReFS supports a subset of NTFS features plus Win32 APIs that are widely adopted.
  • Proactive Error Identification: The integrity capabilities of ReFS are leveraged by a data integrity scanner (a “scrubber”) that periodically scans the volume, attempts to identify latent corruption, and then proactively triggers a repair of that corrupt data.

Source: Microsoft Technet – Resilient file system

From my perspective, ReFS makes sense when using physical servers with unintelligent storage such as JBOD or any storage which does not perform things such as checksums on both read and write IO and enforce Force Unit Access (FUA). However if you’re deploying MS Exchange / MS SQL etc on intelligent storage such as Nutanix Acropolis Distributed Storage Fabric (ADSF) then ReFS is not required as data integrity is already ensured by the storage layer. For example, in the event of silent data corruption, ADSF will detect the corruption on read and simply retrieve the data from the second copy which resides on a different physical drive on a different node within the cluster. This is also transparent to the Virtual Machine, OS and application and therefore compatible with any OS and application.

As a result, ReFS (at least in its current version) is not required for deployments of Microsoft operating systems and applications on Nutanix, or on other storage solutions with the same functionality.

Nonetheless, this is not supposed to be a post about Nutanix, so let’s now look at the test bed and the results of the performance comparison so you can make an informed decision about which file system to use.

Test Bed Setup

The test bed setup is as follows:

Hypervisor: ESXi 5.5 Rel: 3248547

2 Virtual Machines cloned from the same template:
Windows 2012 R2, 4 vCPUs, 24GB RAM
4 Paravirtual SCSI adapters
1 vDisk for OS, 4 vDisks for DB, 4 vDisks for Logs

Both VMs were running on the same node, with only one VM running Jetstress at a time. All test runs were performed back to back to ensure the comparison was fair and to check the consistency of the results.

The only difference between the two VMs is as follows:

VM1:

4 vDisks formatted with NTFS and 64k allocation size for Database
4 vDisks formatted with NTFS and 4k allocation size for Logs

VM2:

All 8 vDisks formatted with ReFS (64k)
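For reference, the layouts above could be applied in guest along the lines of the following sketch. This is a hypothetical illustration only: the drive letters, labels and the VM_ROLE switch are my own, the volumes are assumed to already be online and partitioned, and it drives the built-in Format-Volume cmdlet (Windows 2012 R2 and later) from Python for consistency with the other sketches on this page.

```python
import subprocess

def format_volume(drive, fs, alloc_unit=None, label=""):
    """Format an existing guest volume via the built-in Format-Volume cmdlet."""
    cmd = (f"Format-Volume -DriveLetter {drive} -FileSystem {fs} "
           f"-NewFileSystemLabel '{label}' -Confirm:$false")
    if alloc_unit:
        cmd += f" -AllocationUnitSize {alloc_unit}"
    subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=True)

# Hypothetical layout: drive letters E-H hold the databases, I-L hold the logs,
# and the underlying vDisks have already been brought online and partitioned.
VM_ROLE = "VM1"  # "VM1" = NTFS layout, "VM2" = ReFS layout

if VM_ROLE == "VM1":
    for drive in "EFGH":              # databases: NTFS, 64 KB allocation unit
        format_volume(drive, "NTFS", 64 * 1024, "ExchangeDB")
    for drive in "IJKL":              # logs: NTFS, 4 KB allocation unit
        format_volume(drive, "NTFS", 4 * 1024, "ExchangeLogs")
else:
    # VM2: all eight vDisks formatted with ReFS; on Windows 2012 R2 ReFS uses
    # 64 KB clusters, so no allocation unit size is passed here.
    for drive in "EFGHIJKL":
        format_volume(drive, "ReFS", None, "ExchangeReFS")
```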

Tests performed:

Three Jetstress runs were performed per VM, one after another, importantly with new databases created before each run to ensure a fair baseline. Doing this ensured the results were not skewed by having the Extent Cache (in-memory read cache) or the Medusa Cache (in-memory metadata cache) pre-warmed.

Each run used 16 threads and produced the following results.

ReFS Jetstress Instance:

Run One: 6697 IOPS
Run Two: 6896 IOPS
Run Three: 6796 IOPS

Average: 6796 IOPS (approx +-3% between runs)

NTFS Jetstress Instance:

Run One: 7328 IOPS
Run Two: 7240 IOPS
Run Three: 7296 IOPS

Average: 7288 IOPS (approx +-1% between runs)

Result:

NTFS delivered approximately 7% higher performance and more consistent results between runs.
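For those who like to check the working, the ~7% figure comes straight from the three-run averages listed above:

```python
# Reproduce the ~7% figure from the three Jetstress runs listed above.
refs_runs = [6697, 6896, 6796]   # ReFS instance, IOPS per run
ntfs_runs = [7328, 7240, 7296]   # NTFS instance, IOPS per run

refs_avg = sum(refs_runs) / len(refs_runs)    # ~6796 IOPS
ntfs_avg = sum(ntfs_runs) / len(ntfs_runs)    # 7288 IOPS
advantage = (ntfs_avg - refs_avg) / refs_avg  # ~0.072

print(f"NTFS advantage over ReFS: {advantage:.1%}")   # -> 7.2%
```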

Additional Tests:

Out of interest I repeated the tests with a lower thread count (8) to see if the results were consistent as we decreased the threads.

8 Threads:

ReFS: 3921 IOPS
NTFS: 4079 IOPS

The result again went in favour of NTFS by approx 4%. This makes sense as the advantage would diminish as the pressure on the storage layer reduces.

Autotune Result:

I then repeated the test with Jetstress set to Autotune with the following results.

ReFS: 16673 IOPS @ 91 threads (Autotuned)
NTFS: 17758 IOPS @ 96 threads (Autotuned)

The autotune results again show that NTFS has an advantage over ReFS of approx 7% which is in line with the results using 16 threads manually configured.

CPU Overhead Comparison:

ReFS Jetstress Instance:

Run One: Avg 39.293% (Min 23.725 / Max 44.127)
Run Two: Avg 40.28% (Min 37.785 / Max 44.366)
Run Three: Avg 40.175% (Min 36.520 / Max 43.843)

Average: 39.916%

NTFS Jetstress Instance:

Run One: Avg 39.390% (Min 36.746 / Max 42.651)
Run Two: Avg 39.719% (Min 23.613 / Max 45.960)
Run Three: Avg 39.844% (Min 37.347 / Max 42.400)

Average: 39.651%

So NTFS achieved approximately 7% better performance than ReFS at the same thread count, without using any more CPU, even with the data integrity features turned off for the ReFS volumes.

Summary:

Overall these tests demonstrate that NTFS consistently outperforms ReFS for MS Exchange type IO patterns. For intelligent storage, ReFS has no advantages and NTFS will provide better performance with roughly the same CPU overheads and without any risk of data integrity issues.

As the recommendation for ReFS is to disable its data integrity features for Exchange, I am yet to hear a good justification for why ReFS is recommended. I welcome comments from those in the know, and if the justifications are solid I will update this post to reflect them.

Related Articles:

1. Jetstress Testing with Intelligent Tiered Storage Platforms

2. MS Exchange on Nutanix Acropolis Hypervisor (AHV)

3. How to successfully Virtualize MS Exchange

4. Deduplication and MS Exchange

How to successfully Virtualize MS Exchange – Part 10 – Presenting Storage direct to the Guest OS

Let’s start by listing three common storage types which can be presented direct to a Windows OS:

1. iSCSI LUNs
2. SMB 3.0 shares
3. NFS mounts

Next let’s discuss these 3 options.

iSCSI LUNs are a common way of presenting storage direct to the Guest OS even in vSphere environments and can be useful for environments using storage array level backup solutions (which will be discussed in detail in an upcoming post).

The use of iSCSI LUNs is fully supported by VMware and Microsoft, as iSCSI meets the technical requirements for Exchange, namely Write Ordering, Forced Unit Access (FUA) and SCSI abort/reset commands. iSCSI LUNs presented to Windows are then formatted with NTFS, a journalling file system which also protects against Torn I/O.

In vSphere environments nearing the configuration maximum of 256 datastores per ESXi host (and therefore HA/DRS cluster) presenting iSCSI LUNs to applications such as Exchange can help ensure scalability even where vSphere limits may have been reached.

Note: I would recommend reviewing the storage design and trying to optimize VMs/LUN etc first before using iSCSI LUNs presented to VMs.

The problem with iSCSI LUNs is that they add complexity compared to using VMDKs on Datastores (discussed in Part 11). The complexity is not insignificant: typically multiple LUNs need to be created per Exchange VM, and things like iSCSI initiators and LUN masking need to be configured. Then, when the iSCSI initiator driver is updated (say, via Windows Update), you may find your storage disconnected and need to troubleshoot iSCSI driver issues. You also need to consider the vNetworking implications, as the VM now needs IP connectivity to the storage network.
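To illustrate the kind of per-guest configuration involved, here is a minimal, hypothetical sketch of discovering and logging in to an iSCSI target from inside the Windows guest using the built-in iscsicli initiator CLI. The portal IP and IQN below are made up, the MSiSCSI service must already be running, an elevated prompt is required, and the array-side LUN masking plus the extra vmNIC/IP addressing discussed below still need to be handled separately.

```python
import subprocess

# Hypothetical values: the array's iSCSI portal IP and the target IQN presented
# to this Exchange VM. In a real design these come from the storage team and sit
# on the dedicated iSCSI VLAN/portgroup discussed below.
PORTAL_IP = "192.168.10.50"
TARGET_IQN = "iqn.2010-06.com.example:exchange-mbx01"

def run(args):
    """Run a command from the built-in Windows iSCSI initiator CLI and echo it."""
    print("> " + " ".join(args))
    subprocess.run(args, check=True)

# Discover the portal and log in to the target (the MSiSCSI service must be
# running and an elevated prompt is required).
run(["iscsicli", "QAddTargetPortal", PORTAL_IP])
run(["iscsicli", "QLoginTarget", TARGET_IQN])

# Confirm the session; the LUNs should now appear as raw disks in Disk
# Management, ready to be brought online and formatted with NTFS.
run(["iscsicli", "SessionList"])
```

In production you would typically also configure the target as a persistent/favourite target so the LUNs reconnect after a reboot, and often set up MPIO, which is more work again compared to simply using a VMDK on a Datastore.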

I wrote this article (Example VMware vNetworking Design w/ 2 x 10GB NICs for IP Storage) a while ago showing an example vNetworking design that supports IP storage with 2 x 10GB NICs.

The above article shows NFS in the dvPortGroup names, but the same configuration is also optimal for iSCSI. Each Exchange VM would then need a second vmNIC connected to the iSCSI portgroup (or dvPortgroup), ideally with a static IP address.

IP addressing is another complexity added by presenting storage direct to VMs rather than using VMDKs on datastores.

Many system administrators, architects and engineers might scoff at the suggestion that iSCSI is complex. While I don’t find iSCSI at all difficult to design, install, configure and use, in my opinion it is significantly more complex and has many more points of failure than using a VMDK on a Datastore.

One of the things I have learned and seen benefit countless customers over the years is keeping things as simple as possible while meeting the business requirements. With that in mind, I recommend only considering the use of iSCSI direct to the Guest OS in the following situations:

1. When using a backup solution which triggers a storage-level snapshot that is not VM or VMDK based, i.e.: where snapshots are only supported at the LUN level (older storage technologies).
2. Where ESXi scalability maximums are going to be reached and creating a separate cluster is not viable (technically and/or commercially) following a detailed review and optimization of storage for the vSphere environment.
3. When using legacy storage architecture where performance is constrained at a datastore level. e.g.: Where increasing the number of VMs per Datastore impacts performance due to latency created from queue depth or storage controller contention.

Next let’s discuss SMB 3.0 / CIFS shares.

SMB 3.0 or CIFS shares are commonly used to present storage for Hyper-V and also file servers. However presenting SMB 3.0 directly to Windows is not a supported configuration for MS Exchange because SMB 3.0 presented to the Guest OS directly does not meet the technical requirements for Exchange, such as Write Ordering, Forced Unit Access (FUA) and SCSI abort/reset commands.

However SMB 3.0 is supported for MS Exchange when presented to Hyper-V and where Exchange database files reside within a VHD which emulates the SCSI commands over the SMB file protocol. This will be discussed in the upcoming Hyper-V series.

The below is a quote from Exchange 2013 storage configuration options outlining the storage support statement for MS Exchange.

All storage used by Exchange for storage of Exchange data must be block-level storage because Exchange 2013 doesn’t support the use of NAS volumes, other than in the SMB 3.0 scenario outlined in the topic Exchange 2013 virtualization. Also, in a virtualized environment, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.

The above statement is pretty confusing in my opinion, but what Microsoft means is that SMB 3.0 is supported when presented to Hyper-V, with Exchange running in a VM and its databases housed within one or more VHDs. To be clear, presenting SMB 3.0 direct to Windows for Exchange files is not supported.

NFS mounts can be used to present storage to Windows, although this is not that common. It’s important to note that presenting NFS directly to Windows is not a supported configuration for MS Exchange and, as with SMB 3.0, presenting NFS to Windows directly does not meet the technical requirements for Exchange, being Write Ordering, Forced Unit Access (FUA) and SCSI abort/reset commands. Unlike an iSCSI LUN, an NFS mount also cannot be formatted with NTFS, so the protection of a journalling file system against Torn I/O does not apply.

As such I recommend not presenting NFS mounts to Windows for Exchange storage.

Note: Do not confuse presenting NFS to Windows with presenting NFS datastores to ESXi, as these are different. NFS datastores will be discussed in Part 11.

Summary:

iSCSI is the only supported protocol for presenting storage direct to Windows for Exchange databases.

Let’s now discuss the Pros and Cons of presenting iSCSI storage direct to the Guest OS.

PROS

1. The ability to reduce the overheads of legacy LUN-based snapshot backup solutions by having MS Exchange use dedicated LUN/s, therefore reducing the delta changes that need to be captured/stored (e.g.: NetApp SnapManager for Exchange).
2. Does not impact ESXi configuration maximums for LUNs per ESXi host as storage is presented to the Guest OS and not the hypervisor
3. Dedicated LUN/s per MS Exchange VM can potentially improve performance depending on the underlying storage capabilities and design.

CONS

1. Complexity, e.g.: having to create, present and manage LUN/s per Exchange MBX/MSR VM
2. Having to manage and potentially troubleshoot iSCSI drivers within a Guest OS
3. Having to design for IP storage traffic to access VMs directly, which requires additional vNetworking considerations relating to performance and availability.

Recommendations:

1. When choosing to present storage direct to the Guest OS, only iSCSI is supported.
2. Where no requirements or constraints exist that require the use of storage presented to the Guest OS directly, use the VMDKs on Datastores option, which is discussed in Part 11.
3. Use a dedicated vmNIC on the Exchange VM for iSCSI traffic
4. Use NIOC to ensure sufficient bandwidth for iSCSI traffic in the event of network congestion. Recommended share values along with justification can be found in Example Architectural Decision – Network I/O Control Shares/Limits for ESXi Host using IP Storage.
5. Use a dedicated VLAN for iSCSI traffic
6. Do NOT present SMB 3.0 or NFS direct to the Guest OS and use for Exchange Databases!

Back to the Index of How to successfully Virtualize MS Exchange.