Scale out performance testing with Nutanix Storage Only Nodes

At Nutanix's inaugural user conference in 2015, Storage Only nodes were announced, allowing customers for the first time to scale capacity without having to add compute nodes. This gives customers more flexibility and eliminates the need to license the storage nodes for vSphere, as storage only nodes run the Acropolis Hypervisor (AHV) and are managed entirely through PRISM.

A common question from prospective and existing Nutanix customers is: what if a VM's storage exceeds the capacity of a Nutanix node? The answer is detailed in this blog post, but in short, as the Acropolis Distributed Storage Fabric (ADSF) distributes data throughout the cluster at 1MB granularity, a VM's storage can exceed the local node and performance even improves, including for reads from the capacity (SAS/SATA) tier.

Storage only nodes were previously limited to the NX-6035C (and Dell XC/Lenovo HX equivalents), but at the Nutanix .NEXT conference in Las Vegas in 2016 it was announced that any node (including all-flash) can be a storage only node.

This means even for high performance and/or high capacity environments, Nutanix clusters can be scaled without the need to add compute nodes or purchase additional licensing if you are running vSphere as the hypervisor.

However, to date Nutanix has yet to publish any performance data showing the value of storage only nodes, so I decided to run a few tests to demonstrate the value of the Acropolis Distributed Storage Fabric (ADSF) and storage only nodes.

Before we get to the performance data, to avoid competitors' inevitable attempts to create FUD about Nutanix performance, I will not be publishing the exact specifications of the node types, drive configurations or Jetstress configurations. I will be publishing the IOPS/latency and the database creation, duplication and checksumming durations of the direct comparisons, which clearly show the performance advantage of storage only nodes.

Jetstress was not configured to demonstrate the maximum performance of the underlying Nutanix solution; it was configured to achieve around 1000 IOPS, which is typically higher than even a large Exchange deployment requires per instance. This also allows the test to demonstrate how performance improves when the cluster is servicing real world levels of IO (at least in the case of Exchange for this example).

The performance advantage will vary between node types and based on how many storage only nodes are added to the cluster. But the point of this example is to show that ADSF is a truly distributed storage fabric, and that the storage only nodes and their additional Nutanix Controller VMs (CVMs) servicing replication (RF) traffic and remote reads significantly improve performance for VMs residing on the compute+storage nodes.

Test Overview:

The first test will be performed using four Jetstress VMs running on a four node cluster. The second test will be performed after an additional four storage only nodes are added to the cluster to form an eight node cluster. Before the second test, the cluster will be wiped of all data with the exception of the Windows 2012 R2 template, and all Jetstress DBs will be created from scratch so we can compare DB creation time as well as performance and DB checksumming durations. Wiping all data also ensures there is no pre-warming of the extent cache (in-memory read cache) or metadata cache.

Test Preparation:

I performed a cluster stop / cluster destroy / cluster create to ensure the cluster was totally clean and that we had a fair baseline for the test. The cluster was made up of four nodes.

I then created a base Windows 2012 R2 virtual machine with 4 PVSCSI adapters and 9 vDisks: one for the OS, 4 for the DBs and 4 for the logs. DB drives were formatted with a 64k allocation size and log drives with 4k, as the different allocation sizes and separate virtual disks have shown an approx 25% performance improvement in my testing, not to mention I recommend in-line compression and Erasure Coding (EC-X) for Exchange databases and no data reduction for logs.

Jetstress was configured to use 80% of the vDisks' capacity, which resulted in approx 80% of the Nutanix storage pool capacity being utilised for the test. I will point out these were not low capacity nodes such as NX-3060s, so the database creation time is significant because there was a lot of data to create.

I then cloned the VM three times and spread the four VMs across the four Nutanix nodes running ESXi 5.5 Update 3.

Test 1: Create Databases and run 2hr test

The database creation phase creates one database, then Jetstress duplicates it (in this case three times), and immediately after creation the performance test begins.

Note: No data reduction was used for this test as it would result in unrealistic data reduction and performance results, as I described in the post Jetstress Testing with Intelligent Tiered Storage Platforms.

I configured Jetstress in this way to ensure the extent cache (in-memory read cache) was not pre-warmed, so the results of the test would be fair and repeatable.

Once the performance test completed, I waited for all tests to finish before allowing the database checksum validation task to begin. (This is done by using the Multi-host option in Jetstress.)

The results for each of the four Jetstress VMs are shown below, including the average across the VMs for each of the different metrics.

Jetstress4NodesSummary

Observations from Test 1:

  1. We achieved the desired >1000 IOPS per VM
  2. Performance was consistent across all Jetstress instances
  3. Log writes were in the 1ms range as they were serviced by the ADSF Oplog (persistent write buffer)
  4. Database reads were on average just under 10ms which is well below the Microsoft recommended 20ms
  5. The Database creation time averaged 2hrs 24mins
  6. The duplication of 3 databases averaged 4hrs 17mins
  7. The database checksum took on average around 38mins

Test 2: Delete all data, Add four nodes to the cluster & repeat test 1

All Jetstress VMs were deleted and a full curator scan was manually initiated so that all data was fully removed from disk prior to beginning the next test, which ensured a fair baseline.

Four Jetstress VMs were then deployed from the same template, powered on and the saved Jetstress configuration was applied before beginning the test.

Note: The Jetstress thread count was not changed and remains the same as for Test 1.

As with Test 1, the database creation phase created one database, then Jetstress duplicated it 3 times, and immediately after creation the performance test began, running for the same 2hr duration.

The results for each of the four Jetstress VMs are shown below, including the average across the VMs for each of the different metrics.

Jetstress8NodesSummary

Observations from Test 2:

  1. Achieved IOPS jumped by almost 2x
  2. Log writes average latency was lower by 13%
  3. Database write latency dropped by >20%
  4. Database read latency dropped by almost 2x
  5. The Database creation time was just under 15 mins faster
  6. The duplication of 3 databases improved by almost 35 mins
  7. The database checksum was 40 seconds faster.

Without changing the Jetstress thread count, the achieved IOPS jumped by almost 2x purely due to the improved performance of the cluster!
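
To put those deltas in perspective, here is a quick back-of-the-envelope sketch of the percentage improvements implied by the figures above (the Test 2 durations are derived from the stated deltas, so treat them as approximations):

    # Rough percentage improvements from Test 1 (4 nodes) to Test 2 (8 nodes),
    # using the average durations reported above. Test 2 values are derived from
    # the stated deltas ("just under 15 mins faster", etc.), so these are
    # approximations rather than measured results.

    def pct_faster(before_mins: float, after_mins: float) -> float:
        """Return the percentage reduction in duration."""
        return (before_mins - after_mins) / before_mins * 100

    db_creation    = pct_faster(144, 144 - 15)       # 2hrs 24mins avg, ~15 mins faster
    db_duplication = pct_faster(257, 257 - 35)       # 4hrs 17mins avg, ~35 mins faster
    db_checksum    = pct_faster(38, 38 - (40 / 60))  # ~38 mins avg, ~40 secs faster

    print(f"DB creation:    ~{db_creation:.0f}% faster")     # ~10%
    print(f"DB duplication: ~{db_duplication:.0f}% faster")  # ~14%
    print(f"DB checksum:    ~{db_checksum:.0f}% faster")     # ~2%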

Summary:

These tests are a clear demonstration of the scalability advantage of the Acropolis Distributed Storage Fabric (ADSF) and storage only nodes for customers wanting to increase performance and/or capacity in their HCI environment.

The ability of ADSF to distribute write IO across all nodes within a cluster means write performance improves significantly with the addition of nodes (including storage only nodes) to the cluster, while read and write latency reduce due to the decreased workload on the compute+storage nodes servicing the VMs.

But data locality is lost with storage only nodes, right?

Wrong! Storage only nodes actually improve (yes, improve!) data locality by maximising the amount of available space on the compute+storage nodes. This is a direct result of storage only nodes accepting replication data for write IO, with the 2nd (or 3rd, in the case of RF3) copies being stored on the storage only nodes. This is also demonstrated by the lower read latency observed during this test.

Storage only nodes not only improve performance and capacity for virtual machines, but also for physical servers using Acropolis Block Services (ABS) and users of Acropolis File Services (AFS), both of which had enhancements announced at .NEXT 2016.

Storage Performance: ReFS vs NTFS

I am regularly asked by customers if they should use NTFS or the newer ReFS when formatting drives for applications like Microsoft Exchange and SQL.

Most customers are asking in the context of performance, so I thought I would share some recent testing results using MS Exchange Jetstress.

Firstly, what is ReFS, and when/why would you use it?

What is ReFS?

Resilient File System (ReFS) is a new local file system. It maximizes data availability, despite errors that would historically cause data loss or downtime. Data integrity ensures that business critical data is protected from errors and available when needed. Its architecture is designed to provide scalability and performance in an era of constantly growing data set sizes and dynamic workloads.

The key features of ReFS are:

  • Integrity: ReFS stores data so that it is protected from many of the common errors that can cause data loss. File system metadata is always protected. Optionally, user data can be protected on a per-volume, per-directory, or per-file basis. If corruption occurs, ReFS can detect and, when configured with Storage Spaces, automatically correct the corruption. In the event of a system error, ReFS is designed to recover from that error rapidly, with no loss of user data.
  • Availability: ReFS is designed to prioritize the availability of data. With ReFS, if corruption occurs, and it cannot be repaired automatically, the online salvage process is localized to the area of corruption, requiring no volume down-time. In short, if corruption occurs, ReFS will stay online.
  • Scalability: ReFS is designed for the data set sizes of today and the data set sizes of tomorrow; it’s optimized for high scalability.
  • App Compatibility: To maximize AppCompat, ReFS supports a subset of NTFS features plus Win32 APIs that are widely adopted.
  • Proactive Error Identification: The integrity capabilities of ReFS are leveraged by a data integrity scanner (a “scrubber”) that periodically scans the volume, attempts to identify latent corruption, and then proactively triggers a repair of that corrupt data.

Source: Microsoft Technet – Resilient file system

From my perspective, ReFS makes sense when using physical servers with unintelligent storage such as JBOD, or any storage which does not perform checksums on both read and write IO and enforce Force Unit Access (FUA). However, if you're deploying MS Exchange / MS SQL etc. on intelligent storage such as the Nutanix Acropolis Distributed Storage Fabric (ADSF), then ReFS is not required, as data integrity is already ensured by the storage layer. For example, in the event of silent data corruption, ADSF will detect the corruption on read and simply retrieve the data from the second copy, which resides on a different physical drive on a different node within the cluster. This is transparent to the virtual machine, OS and application, and therefore compatible with any OS and application.
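
To illustrate the pattern (and to be clear, this is a generic sketch of checksum-on-read with replica fallback, not Nutanix's actual implementation; the class and function names here are hypothetical):

    import hashlib

    class Replica:
        """Hypothetical replica handle. In a real system each replica would
        live on a different physical drive on a different node."""
        def __init__(self, blocks, checksums):
            self.blocks = blocks        # extent_id -> bytes on this replica
            self.checksums = checksums  # extent_id -> digest stored at write time

        def read(self, extent_id):
            return self.blocks[extent_id]

        def stored_checksum(self, extent_id):
            return self.checksums[extent_id]

    def read_with_integrity(extent_id, replicas):
        """Return the first copy whose data matches its stored checksum.
        A real implementation would also repair the corrupt replica."""
        for replica in replicas:
            data = replica.read(extent_id)
            if hashlib.sha1(data).hexdigest() == replica.stored_checksum(extent_id):
                return data  # healthy copy; corruption is transparent to the app
        raise IOError(f"all replicas of extent {extent_id} failed checksum verification")

    # Example: the first copy has suffered bit rot, so the read is
    # transparently served from the second copy.
    good = b"payload"
    digest = hashlib.sha1(good).hexdigest()
    r1 = Replica({1: b"corrupt"}, {1: digest})
    r2 = Replica({1: good}, {1: digest})
    assert read_with_integrity(1, [r1, r2]) == good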

As a result, ReFS (at least in its current version) is not required for deployments of Microsoft operating systems and applications on Nutanix, or on other storage solutions with the same functionality.

Nonetheless, this is not supposed to be a post about Nutanix, so let's now look at the test bed and the results of the performance comparison so you can make an informed decision about which to use.

Test Bed Setup

The test bed setup is as follows:

Hypervisor: ESXi 5.5 Rel: 3248547

2 Virtual Machines cloned from the same template:
Windows 2012 R2, 4 vCPUs, 24GB RAM
4 Paravirtual SCSI adapters
1 vDisk for OS , 4 vDisks for DB, 4 vDisks for Logs

Both VMs ran on the same node, with only one VM running Jetstress at a time. All test runs were performed back to back to ensure the results would be fair and to check their consistency.

The only difference between the two VMs is as follows:

VM1:

4 vDisks formatted with NTFS and 64k allocation size for Database
4 vDisks formatted with NTFS and 4k allocation size for Logs

VM2:

All 8 vDisks formatted with ReFS (64k)

Tests performed:

Three Jetstress runs per VM, one after another, importantly with new databases created before each run to ensure a fair baseline. Doing this ensured the results were not skewed by having the Extent Cache (in-memory read cache) or the Medusa Cache (in-memory metadata cache) pre-warmed.

Each run used 16 threads and produced the following results.

ReFS Jetstress Instance:

Run One: 6697 IOPS
Run Two: 6896 IOPS
Run Three: 6796 IOPS

Average: 6796 IOPS (approx ±3% between runs)

NTFS Jetstress Instance:

Run One: 7328 IOPS
Run Two: 7240 IOPS
Run Three: 7296 IOPS

Average: 7288 IOPS (approx ±1% between runs)

Result:

The result: approx 7% higher performance, and more consistency, when using NTFS.
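
A quick sketch reproduces those figures from the per-run results (the variation quoted above corresponds to the max-min range as a percentage of the mean):

    refs = [6697, 6896, 6796]
    ntfs = [7328, 7240, 7296]

    def summarise(runs):
        avg = sum(runs) / len(runs)
        spread = (max(runs) - min(runs)) / avg * 100  # run-to-run variation, % of mean
        return avg, spread

    refs_avg, refs_var = summarise(refs)  # ~6796 IOPS, ~2.9% (approx ±3%)
    ntfs_avg, ntfs_var = summarise(ntfs)  # ~7288 IOPS, ~1.2% (approx ±1%)

    advantage = (ntfs_avg - refs_avg) / refs_avg * 100
    print(f"NTFS advantage: ~{advantage:.1f}%")  # ~7.2%, i.e. the approx 7% above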

Additional Tests:

Out of interest, I repeated the tests with a lower thread count (8) to see if the results remained consistent as the thread count decreased.

8 Threads:

ReFS: 3921 IOPS
NTFS: 4079 IOPS

The result again went in favour of NTFS, by approx 4%. This makes sense, as the advantage would be expected to diminish as the pressure on the storage layer reduces.

Autotune Result:

I then repeated the test with Jetstress set to Autotune with the following results.

ReFS: 16673 IOPS @ 91 threads (Autotuned)
NTFS: 17758 IOPS @ 96 threads (Autotuned)

The autotune results again show that NTFS has an advantage over ReFS of approx 7%, which is in line with the results using 16 manually configured threads.

CPU overhead comparison

ReFS Jetstress Instance:

Run One: Avg 39.293% (Min 23.725 / Max 44.127)
Run Two: Avg 40.28% (Min 37.785 / Max 44.366)
Run Three: Avg 40.175% (Min 36.520 / Max 43.843)

Average: 39.916%

NTFS Jetstress Instance:

Run One: Avg 39.390% (Min 36.746 / Max 42.651)
Run Two: Avg 39.719% (Min 23.613 / Max 45.960)
Run Three: Avg 39.844% (Min 37.347 / Max 42.400)

Average: 39.651%

So NTFS achieved 7% better performance than ReFS at the same thread count, without using any more CPU, even with the data integrity features turned off for the ReFS volumes.

Summary:

Overall these tests demonstrate that NTFS consistently outperforms ReFS for MS Exchange type IO patterns. On intelligent storage, ReFS has no advantages, and NTFS will provide better performance with roughly the same CPU overhead and without any risk of data integrity issues.

As the recommendation for ReFS is to disable its data integrity features for Exchange, I am yet to hear a good justification for why ReFS is recommended, but I welcome any comments from those in the know, and if the justifications are solid I will update the post to reflect them.

Related Articles:

1. Jetstress Testing with Intelligent Tiered Storage Platforms

2. MS Exchange on Nutanix Acropolis Hypervisor (AHV)

3. How to successfully Virtualize MS Exchange

4. Deduplication and MS Exchange

Being called out on Exchange performance & scale. Close (well not really), but no cigar.

*UPDATE March 30th 6am EST*

Following an email from Howard Marks, the author of the Atlantis document, I would like to confirm a few points. Overall, the feedback and this response from Howard simply confirmed my original comments found in this post, but it also raised new concerns.

  1. It has been confirmed that the 60,000 user Exchange solution could not be supported on the hardware as described in the Atlantis document.
  2. The performance (IOPS) claims are based on a non-DAG configuration. This is clearly not a real world solution, as nobody would deploy a 60,000 user Exchange solution without a DAG configuration.
  3. The testing was purely storage based and did not take into account the requirements (such as CPU/RAM/Capacity) for a real world 60,000 user Exchange deployment.
  4. The Nutanix ESRP is not based on hero numbers. It's a solution sized and tested for a real world deployment; thus the latest claim from Howard's response of “Hero numbers vs Hero Numbers” and the original claim of “2.5 times the mailboxes and 5 times the IOPS” are both incorrect.
  5. The Nutanix ESRPs use 1GB and 1.5GB mailboxes respectively. The Atlantis product tested “would be stressed to hold the data for 60,000 mailboxes at 800MB each. Perhaps we should have set the mailbox size to 200MB”. This confirms the comparison is not apples/apples (or even tangerines and clementines).
  6. The industry standard for Exchange virtualization is to deploy no more than one DAG instance per host (or HCI node). Howard responded by stating “If I were trying to deploy Exchange on a four-node HCI appliance, as opposed to Jetstress, I would use twelve virtual Exchange servers, so each host would run three under normal conditions and four in the event of a node failure.” This would mean that if a DAG configuration was used (which it should be for any real world deployment, especially at this scale), a single node failure would bring down three Exchange instances and, as a result, the entire DAG. This is Exchange virtualization 101, and I would never recommend multiple Exchange servers (within a DAG) per host/node. I describe this in more detail in How to successfully Virtualize MS Exchange – Part 4 – DRS, where I explain “Setup a DRS “Virtual Machines to Hosts” rule with the policy “Should run on hosts in group” on a 1:1 basis with Exchange MSR or MBX VMs & ESXi hosts” and summarize by saying the DRS configuration described “Ensures two or more DAG members will not be impacted in the event of a single ESXi host failure.”
  7. Howard mentions “My best understanding of Nutanix’s and Atlantis’ pricing is that the all-flash Atlantis and the hybrid Nutanix will cost a customer about the same amount of cash. The Nutanix system may offer a few more TB of capacity but not a significant amount after Atlantis’ data reduction.” In the original document he states “Atlantis is assuming a very conservative data reduction factor of 1.67:1 when they pitch the CX-12 as having 12TB of effective capacity.” That is 12TB across 4 nodes, which equates to 3TB per node. The Nutanix NX-8150 can support ~16.5TB usable per node with RF2 without data reduction, which is above the CX-12 capacity including its data reduction. The Nutanix usable capacity is obviously a lot more with in-line compression and EC-X enabled, which is what we recommend for Exchange. Assuming the same compression ratio, the NX-8150 usable capacity jumps to ~27.5TB per node, with further savings available when enabling EC-X (see the sketch following this list). This usable capacity even eclipses the larger Atlantis CX-24 product. As such, the claim (above) by Howard is also incorrect.
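
As promised in point 7, here is the capacity maths made explicit (a sketch using the figures above; the compression ratio applied to the NX-8150 is Atlantis' own assumed 1.67:1, purely for an apples-to-apples view):

    # Per-node usable capacity comparison from point 7 above.
    cx12_effective_total = 12.0               # TB effective across 4 nodes (per Atlantis)
    cx12_per_node = cx12_effective_total / 4  # 3 TB/node, including data reduction

    nx8150_usable = 16.5                      # TB usable per node, RF2, no data reduction
    nx8150_compressed = nx8150_usable * 1.67  # ~27.5 TB/node with in-line compression

    print(f"CX-12:   {cx12_per_node:.1f} TB/node (with data reduction)")
    print(f"NX-8150: {nx8150_usable:.1f} TB/node (no reduction), "
          f"~{nx8150_compressed:.1f} TB/node at 1.67:1")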

My final thought on this topic is that Atlantis should take down the document, as it is misleading to customers and only provides Jetstress performance details without the context of the CPU/RAM requirements. The testing performed was essentially just an IOPS drag race, which, as I quoted in “Peak Performance vs Real World Performance“…

“Don’t perform Absurd Testing”. (Quote: Vaughn Stewart and Chad Sakac)

As a result, the document contains nothing which could realistically be used to help design an Exchange environment for a customer on Atlantis.

As such, it would make sense to redo the testing to a standard where it could be deployed in the real world, update the document and republish it.

Remember: real world Exchange solutions are typically constrained by RAM, then CPU, then capacity. This all but negates the need for IOPS (per node) beyond what was achieved in the Nutanix ESRP report, being 671.631. That level of IOPS would support 5535 users per node using Howard’s calculation, being “(for 150 messages/day, that’s 0.101) multiply by 1.2 to provide 20% headroom”. However, as Exchange is constrained by CPU/RAM first, the solution would have to scale out and use many more servers, meaning the IOPS requirement per node (which is what matters, as opposed to the “total IOPS” Howard provides) would be in the mid/high hundreds range (~600), not thousands, as the RAM constraint would prevent more users running per server and therefore, as I said earlier, negate the need for IOPS at the level Howard is talking about.
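
Here is that calculation worked through (a sketch using Howard's own per-mailbox IOPS factor and the ESRP figure quoted above; the small difference from the 5535 figure comes down to rounding):

    # Users-per-node sanity check. The 0.101 IOPS/mailbox (150 messages/day
    # profile) and the 1.2x headroom multiplier are Howard's inputs; the
    # achieved IOPS figure is from the Nutanix ESRP report.
    iops_per_mailbox = 0.101 * 1.2  # = 0.1212
    esrp_achieved_iops = 671.631    # per node, Nutanix ESRP

    users_per_node = esrp_achieved_iops / iops_per_mailbox
    print(f"~{users_per_node:.0f} users per node")  # ~5540, in line with ~5535 above

    # Conversely, if RAM/CPU cap a server at roughly 5,000 users, the per-node
    # IOPS requirement lands in the mid/high hundreds:
    print(f"~{5000 * iops_per_mailbox:.0f} IOPS for 5,000 users")  # ~606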

—— START OF ORIGINAL POST ——-

So Nutanix has been called out by Atlantis on Twitter and via a recently released Atlantis document regarding our MS Exchange ESRP. Atlantis, a little-known company not listed in the Gartner Magic Quadrant for Integrated Systems or the IDC MarketScape for HCI, attempted to attack Nutanix regarding our Exchange ESRP (or should I say, multiple ESRPs) in what I found to be a highly amusing way.

I was reading the document published by Atlantis, which is titled “Atlantis HyperScale Supports 60,000 Mailboxes With Microsoft Jetstress”, and I honestly thought it was an April Fools’ joke, but since it’s only March 25 (at least here in Australia), I concluded they must be serious.

I’ve got to be honest and say I don’t see Atlantis in the field, so one can only speculate they called Nutanix out to try to get some attention. Well, they have succeeded; I’m just not sure this exposure will be beneficial to them.

Let’s look at the summary of their document:

AtlantisBS

  1. Deduping Exchange DAGs
    This is widely advised against by Microsoft and Exchange MVPs.
  2. (Update) Claiming support for 60k mailboxes at 150 (originally stated as 200) messages/day/mailbox
    No issue here; a large-ish environment and a reasonably high messages/day profile.
  3. 2.5x the mailboxes of a leading HCI provider’s ESRP (i.e. Nutanix)
    Had a good chuckle here, as ESRP is not a competition to see which vendor has the most hardware lying around the office to perform benchmarks. But I appreciate that Atlantis want to be associated with Nutanix, so go you ankle biter you!
  4. Five times the IOPS of Nutanix
    This was even more of a laugh, as Nutanix has never published “peak” IOPS for Exchange because IOPS is almost never a limiting factor in the real world, as I have explained in Peak performance vs Real World – Exchange on Nutanix Acropolis Hypervisor (AHV). But hey, the claim, even though it’s not true, makes for good marketing.

Problem #1 – Atlantis document is not an ESRP, it’s a (paid for?) analyst’s report.

The article claims:

“2.5 times the mailboxes of a leading HCI provider’s ESRP report”

Off the bat, this is an apples/oranges comparison, as Atlantis doesn’t have an ESRP validated solution. This can be verified here: Exchange Solution Reviewed Program (ESRP) – Storage

More importantly, the number of mailboxes is irrelevant, as Nutanix scales linearly: our ESRP validates 30,000 users, so simply double the number of nodes, scale out the DAG, and you can support 60,000 users with the same performance. In fact, as the 30,000 user solution already has N+1, you don’t even need to double the number of nodes to get to 60,000 users.

The Nutanix ESRP is also a configuration which can be (and has been) deployed in the real world. As I will explain further in this post, the Atlantis solution as described in the document could not be successfully deployed in the real world.

Problem #2 – Five times more IOPS… lol!

The article claims:

“Five times the IOPS of that HCI provider’s system”

Another apples/oranges comparison, as Nutanix uses a hybrid deployment (SSD + SATA) whereas Atlantis used an all-flash configuration. Atlantis also used deduplication and compression, which I have previously blogged about in Jetstress Testing with Intelligent Tiered Storage Platforms, where I conclude by saying:

No matter what any vendor tells you, 8:1 dedupe for Exchange (excluding DAG copies) is not realistic for production data in my experience. As such, it should never be used for performance testing with Jetstress.

Solution: Disable dedupe when using Jetstress (and in my opinion for production DAGs)

So any result (including this one from Atlantis) achieved with dedupe or compression enabled is not realistic, regardless of whether it’s more or fewer IOPS than Nutanix.

But regarding Nutanix All-Flash performance, I have previously posted YouTube videos showing Nutanix performance which can be found here: Microsoft Exchange 2013/2016 Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV)

In Part 1 of the series “The Baseline Test” I explained:

Note: This demonstration is not showing the peak performance which can be achieved by Jetstress on Nutanix. In fact it’s running on a ~3 year old NX-3450 with Ivy Bridge processors and Jetstress is tuned (as the video shows) to a low thread count.

So let’s look at Nutanix performance on some 3-year-old hardware as a comparison.

At the 4:47 mark of the video in Part 1 we can see the Jetstress latency for read and write IO was between 1.3 and 1.4ms for all 4 databases. This is considerably lower, and critically much more consistent, than the latency Atlantis achieved (below), which varies significantly across databases.

AtlantisLatency

As you can see above, DB read latency varies between 3ms and 16ms, DB writes between 14ms and 105ms, and log writes between 2ms and 5ms. In 2013 when I joined Nutanix, we had similarly poor performance for Exchange to what the Atlantis Jetstress results show. This was something I was actively involved in addressing, and by early 2014 we had a fantastic platform for Exchange deployments, having done a lot of work in the back end to deal with very large working sets which exceeded the SSD tier.

In contrast, Nutanix NX-8150, which is a hybrid (SSD + SATA) node, achieved very consistent performance as shown below from our official ESRP which has been validated by Microsoft.

NutanixConsistentPerfJetstress

Obviously, as Nutanix is hybrid, the average latency for reads (the bulk of which are served from SATA) is higher than an all-flash configuration. Interestingly though, Nutanix latencies are much more consistent, and the difference between peak and minimum latency is low compared to Atlantis, which varies in some cases by 90ms!

In the real world, I recently visited a large customer who runs almost exactly the same configuration as our ESRP, and they average <5ms latency across their 3 Nutanix clusters running MS Exchange for 24k users (multi-site DAG w/ 3 copies).

At the 4:59 mark of the Baseline Test video we can see the Nutanix test achieved 2342 IOPS using just 8 threads; compare this to Atlantis, who achieved 2937.191 IOPS but required 3x the threads (24). For those of you not familiar with Jetstress, more threads = more potential IOPS. I have personally tested Nutanix with 64 threads and performance continued to increase.
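
Normalising those results by thread count makes the point clearer (a rough sketch; since Jetstress IOPS scale with thread count, IOPS-per-thread is a crude but useful comparison):

    # IOPS-per-thread comparison using the figures quoted above.
    nutanix_iops, nutanix_threads = 2342, 8         # ~3-year-old NX-3450, hybrid
    atlantis_iops, atlantis_threads = 2937.191, 24  # all-flash CX-12

    print(f"Nutanix:  ~{nutanix_iops / nutanix_threads:.0f} IOPS/thread")    # ~293
    print(f"Atlantis: ~{atlantis_iops / atlantis_threads:.0f} IOPS/thread")  # ~122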

I have also previously blogged about how Nutanix outperformed VSAN for Jetstress, where the Jetstress summary showed 4381 IOPS using a higher thread count. So Nutanix comfortably outperformed Atlantis even back in June 2015 when I wrote that post, and with the release of AOS 4.6 performance has improved significantly since.

So Nutanix Exchange performance isn’t limited by IOPS as I explained in Peak performance vs Real World – Exchange on Nutanix Acropolis Hypervisor (AHV).

Exchange scale/performance is always constrained by RAM first (96GB), then CPU (20 vCPUs), then (typically in HCI environments) storage capacity, with IOPS a distant fourth.

Problem #3 – The solution could not be deployed in the real world

Tony Redmond, a well known Microsoft Exchange MVP, has previously blogged about unrealistic ESRP submissions, such as: Deficient IBM and Hitachi 120,000 mailbox configurations.

Tony rightly points out:

“The core problem in IBM’s configuration is that the servers are deficient in memory”

The same is true for the Atlantis document. With just 3 servers, that’s 20,000 users per server. In the configuration they published, the mailbox servers would vastly exceed the recommended sizing for an Exchange instance of 20 vCPUs and 96GB RAM.

The servers Atlantis are using have Intel E5-2680 v3 processors, which have a SPECint base rate of between ~57 and 61 depending on the exact server vendor, so let’s use the higher number to give Atlantis the benefit of the doubt. The Intel E5-2680 v3 has 12 cores, so in a 2 socket host that’s 24 x 61 = a maximum SPECint rate of 1464.
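
For clarity, that estimate works out as follows (a quick sketch using the upper end of the ~57-61 base rate range):

    # Maximum SPECint rate estimate, giving Atlantis the benefit of the doubt.
    sockets = 2
    cores_per_socket = 12         # Intel E5-2680 v3
    spec_base_rate_per_core = 61  # upper end of ~57-61 across server vendors

    max_specint_rate = sockets * cores_per_socket * spec_base_rate_per_core
    print(max_specint_rate)       # 1464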

Again to give Atlantis the benefit of the doubt, let’s assume their VSA uses zero resources and see if we have enough compute available for the 20,000 users per server they tested in the real world.

The first issue is that the Exchange 2013 Server Role Requirements Calculator reports the following error:

AtlantisError1

I solved this by adding a fourth server to the sizing since I’m a nice guy.

The second issue is that the Exchange 2013 Server Role Requirements Calculator reports the following error (even with the fourth server added).

AtlantisIssue2

Ok, not the end of the world I guess. Oh wait… then comes the third issue:

AtlantisIssue3

Each of the four mailbox servers vastly exceeds the recommended maximums for Exchange 2013 servers (physical or virtual) of 20 vCPUs and 96GB RAM.

Reference: Exchange 2013 Sizing and Configuration Recommendations

The below is a screenshot from the above link highlighting the recommendation from MS as at the time/date this article was published (Friday March 25th 2016 AEST).

20vcpus96GBRAM

In fact, they exceed the resources of the tested equipment as well. The testing was performed on Atlantis HyperScale CX-12 systems, which have only 384GB RAM each (the document itself states 256GB per node; see the update below).

The below is from page 6 of the Atlantis document:

We performed our testing remotely on a HyperScale CX-12 system, with 256GB of RAM per node, in Atlantis’ quality-control lab.

Update: The solution has 384GB RAM per node, which is still insufficient for the number of users and messages per day in the real world.

In this case Atlantis, even with the benefit of the doubt of a fourth server and the maximum SPECint rate, would be running at 332% CPU utilisation even when using over 3 times the recommended maximum number of pCores, and would require almost 1TB of RAM per Exchange server. That’s almost 10x the recommended RAM per instance!

So as if the first three weren’t enough, the fourth issue is kind of funny.

The required capacity for the solution (60k users @ 800MB mailboxes w/ 2 DAG copies) is shown below. If you look at the Database Volume Space Required line item, it shows ~46TB required capacity per server, not including the restore volume.

AtlantisIssue4

Let’s keep giving Atlantis the benefit of the doubt and accept their claim (below) relating to data reduction.

Atlantis is assuming a very conservative data reduction factor of 1.67:1

So that’s 46TB / 1.67 = ~27.54TB required usable capacity per Exchange server, or ~110TB for the 4 Exchange servers.

The HyperScale CX-24 system is guaranteed to provide up to 24TB according to the document on Page 6, screenshot below.

Atlantis24tB

With a requirement of 27.54TB per node (and that’s with 4 servers, not 3 as per the document), the tested system has insufficient capacity for the solution, even when giving Atlantis the benefit of the doubt regarding its data reduction capabilities.

Prior to publishing this blog, I re-read the document several times and it turns out I made a mistake. On further analysis I discovered that the HyperScale CX-24 system provides just 24TB across ALL FOUR NODES, not per node, as per the document on Page 8.

AtlantisNodeSpecs

So in reality, my above comments were actually in favour of Atlantis, as the actual nodes have just 24/4 = 6TB usable each, which is just under 5x less storage capacity than is required, even assuming 100% capacity utilisation.
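
Pulling the capacity maths together (a sketch using the ~46TB per server figure from the sizing calculator, Atlantis' assumed 1.67:1 data reduction, and the corrected 24TB-across-four-nodes figure):

    # Capacity maths for 60k users @ 800MB mailboxes w/ 2 DAG copies.
    required_per_server = 46.0 / 1.67         # ~27.5 TB usable per Exchange server
    total_required = required_per_server * 4  # ~110 TB across 4 servers

    usable_per_node = 24.0 / 4                # 6 TB/node (24TB across all four nodes)
    shortfall = required_per_server / usable_per_node

    print(f"Required:  ~{required_per_server:.1f} TB/server (~{total_required:.0f} TB total)")
    print(f"Available: {usable_per_node:.0f} TB/node -> ~{shortfall:.1f}x short")  # ~4.6x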

The fifth issue, however, takes the cake!

The solution requires 4824 IOPS for the Databases and 1029 IOPS for Logs as shown by the sizing calculator.

AtlantisIssue5

Now, the Atlantis document shows they achieved 2937.191 IOPS on Page 10, so they are not achieving the required IOPS for the 60,000 users even with their all-flash configuration and 24 threads!

I would have thought a storage company would at least get the storage (capacity and IOPS) sizing correct, but neither capacity nor IOPS has been sized correctly.

Too harsh?

Ok, maybe I’m going a little honey badger on Atlantis, so let’s assume they meant 150 messages/day per mailbox, as the document states both 200 messages (Page 4) and the following on Page 11.

AtlantisIssue6

If I change the messages per day to 150 then the IOPS requirement drops to 3618 for DBs and 773 for Logs as shown below.

AtlantisIssue9

So… they still failed sizing 101, as they only achieved 2937 IOPS as per Page 10 of their document.
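
Comparing required and achieved IOPS at both message profiles makes the shortfall explicit (a quick sketch using the sizing calculator figures quoted above):

    # Required vs achieved IOPS per the sizing calculator and the Atlantis result.
    achieved = 2937.191

    for profile, db_iops, log_iops in [("200 msgs/day", 4824, 1029),
                                       ("150 msgs/day", 3618, 773)]:
        required = db_iops + log_iops
        print(f"{profile}: required {required}, achieved {achieved:.0f} "
              f"({achieved / required:.0%} of requirement)")
    # 200/day: 5853 required -> ~50% achieved
    # 150/day: 4391 required -> ~67% achieved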

What about CPU/RAM? That’s probably fine now, right? Wrong. Even with the lower messages per day, each of the 4 Exchange instances is still way over-utilised on CPU and 8x oversized on RAM.

AtlantisIssue8

Let’s drop to 50 messages per day per user. Surely that would work, right? Nope; we’re still 3x higher on RAM and over the recommended 80% CPU utilisation maximum for Exchange.

Atlantis50MsgsDayCompute

What about IOPS? We’ve dropped the user profile by 4x. Surely Atlantis can support the IOPS now?

Woohoo! By dropping the messages per day by 4x Atlantis can now support the required IOPS. (You’re welcome Atlantis!)

AtlantisIOPS

Too bad I had to include 33% more servers to even get them to this point, and the mailbox servers are still oversized.

Problem #4 – Is it a supported configuration?

No details are provided about the hypervisor storage protocol used for the tests, but Atlantis is known to use NFS for vSphere, so if VMDK on NFS was used, this configuration is not supported by MS (as much as I think VMDK on NFS should be). Nutanix has two ESRP validated solutions which are fully supported configurations using iSCSI.

UPDATE: I have been informed the testing was done using iSCSI datastores (not in-guest iSCSI).

However, this only confirms that it is not a supported configuration (at least not by VMware), as Atlantis is not certified for iSCSI according to the VMware HCL as of 5 minutes ago when I checked (see below, which shows only NFS).

AtlantisiSCSI

Problem #5 – Atlantis claims this is a real world configuration

AtlantisRealWorld

The only positive that comes out of this is that Atlantis follows the Nutanix recommendation and has an N+1 node to support failover of the Exchange VMs in the event of a node failure.

As I have stated above, unfortunately Atlantis has insufficient CPU, RAM, storage capacity and storage performance for the 60,000 user environment described in their document.

As the testing was actually performed on HyperScale CX-12 nodes, the usable capacity is 12TB for the four node solution. The issue is we need ~110TB in total for the 4 Exchange servers (~27TB per instance), so with only 4 nodes Atlantis has insufficient capacity for the solution (actual guaranteed usable is 12TB, almost 10x less than what is required) OR they are assuming >10x data reduction.

If this is what Atlantis recommends in the real world, then I have serious concerns for any of their customers who try to deploy Exchange as they are in for a horrible experience.

Nutanix ESRP assumes ZERO data reduction, as we want to show customers what they can expect in the worst case scenario AND not cheat ESRP by deduping and compressing data which results in unrealistic data reduction and increased performance.

Nutanix compression and Erasure Coding provide excellent data reduction. In a customer environment I reviewed recently, with 24k users, they had >2:1 data reduction using just in-line compression. As they are not capacity constrained, EC-X is currently not in use, but it would provide further savings and is planned to be enabled as they continue putting more workloads onto the Nutanix environment.

However, Nutanix sizes assuming no data reduction, and the savings from data reduction are considered a bonus. In cases where customers have a limited budget, I give them estimates on data reduction, but typically we just start small, size for smaller mailbox capacities, and allow the customer to scale capacity as required with additional storage only nodes OR additional compute+storage nodes where additional messages/day or users are required.

Rule of Thumb: Use data reduction savings as a bonus and not for sizing purposes. (Under promise, over deliver!)

Problem #6 – All Flash (or should I say, All Tier 0/1) for Exchange?

To be honest, I think a hybrid (or tiered) storage solution (HCI or otherwise) is the way to go, as only a small percentage of workloads require the fastest performance. Lots of applications, like Exchange, require high capacity and low IOPS, so lower cost, high capacity storage is better suited to this kind of workload. Using a small persistent write/read buffer for hot data gives Tier 0/1 type performance without the cost, all while providing much larger capacity for bigger mailboxes/archiving and things like lagged copies or snapshot retention on primary storage for fast recovery (but not backup, of course, since snapshots on primary storage are not backups).

As SSD prices come down and technology evolves, I’m sure we’ll see more all-SSD solutions, with faster technology such as NVMe as the persistent read/write buffer and commodity SSD for the capacity tier. But having the bulk of the data from workloads like Exchange on Tier 0/1 doesn’t make sense to me. The argument that data reduction makes the $/GB comparable to lower tier storage will fluctuate over time, while the fact remains that storage IOPS are of the least concern when sizing for Exchange.

Problem #7 – $1.50 per mailbox… think again.

The claim of $1.50 per mailbox is just simply wrong. With correct sizing, the Atlantis solution would require significantly more nodes and the price would be far higher. I’d do the math exactly, but it’s so far-fetched it’s not worth the time.

Summary:

I would be surprised if anyone gets to this summary, as even after Problem #1 or #2 Atlantis have been knocked out faster than Jose Aldo at UFC 194 (13 seconds). But nonetheless, here are a few key points.

  1. The Atlantis solution has insufficient CPU for the 60,000 users, even with 4x fewer messages per day than the document claims
  2. The Atlantis solution has insufficient RAM, again even with 4x fewer messages per day than the document claims
  3. Atlantis can only achieve the required IOPS by reducing the messages per day 4x, down to 50 messages per day (funnily enough, that’s lower than the Nutanix ESRP)
  4. The nodes tested do not have sufficient storage capacity for the proposed 60,000 users w/ 800MB mailboxes and 2 DAG copies, even with the assumption of 1.67:1 data reduction AND a fourth node.
  5. Atlantis does not have a supported configuration on VMware vSphere (Nutanix does, using “Volume Groups” over iSCSI)
  6. Atlantis does not have an ESRP validated solution. I believe this would be at least in part due to their only supporting NFS, and their configuration not being valid for ESRP submission due to having all DAG copies on the same underlying storage failure domain. Note: Nutanix uses iSCSI, which is fully supported, and our ESRP uses two failure domains (separate Nutanix ADSF clusters for each DAG copy).
  7. As for the $/mailbox claim on Twitter of $1.50 per mailbox: think again, Atlantis.