What’s .NEXT 2016 – Enhanced & Adaptive Compression

There are so many “under the covers” capabilities of the Acropolis Distributed Storage Fabric (ADSF) which have been designed and built not for short term marketing “checkboxes” but with a long term vision in mind.

As a result, Nutanix has been able to continually innovate and stay ahead of the HCI market while building a next generation platform (including the Acropolis Hypervisor, AHV) for the enterprise cloud.

Nutanix is also 100% software defined which makes adding new features and enhancing existing features possible even for hardware which is several years old.

The forward-looking development of ADSF has allowed Nutanix to lead in the SDS space with features like Compression, Deduplication and Erasure Coding (EC-X).

In-line Compression is recommended for most workloads, including business critical applications such as Oracle, SQL and Exchange. It typically provides not only excellent capacity savings but also increased effective SSD capacity, which results in higher performance. Compressing data on the capacity tier (not just the flash tier) also helps improve performance and lowers the cost per GB of storage.

As of the next release, the compression functionality has been enhanced to support compressed and uncompressed slices in the same extent group. For those of you not familiar with ADSF, an “Extent Group” is a group of “Extents” in which data is stored.

In previous generations of ADSF, regardless of whether good compression was achieved or not, all the data for a virtual disk (vdisk) residing in a container with compression enabled would be compressed. This can cause unnecessary overhead, especially in cases where compression savings are minimal, such as for already compressed data like video or image files (e.g.: JPG).

This is one reason why it’s important that data reduction features such as compression (and Dedupe/Erasure Coding) can be turned off for workloads where benefits are minimal.

Previously in ADSF, compressed and uncompressed data were not supported within the same extent group, which meant the cluster (Curator) had the added overhead of moving extents from one extent group to another even for data with little or no compression benefit.

This unnecessary overhead has now been removed which means less background tasks (overheads) resulting in lower CPU utilization by the Nutanix Controller VM (CVM) and better overall compression performance.

Secondly, Nutanix will be moving to the LZ4 family of algorithms, which has two variants, LZ4 and LZ4H. LZ4H is really exciting because it achieves nearly as much compression as Zlib at a similar CPU cost, yet decompresses at the speed of LZ4. LZ4 by itself is marginally superior to Snappy in the common case, but LZ4H makes this a very attractive choice.

This allows ADSF to do tiered compression – so cold data compressed with LZ4 can be further compressed with LZ4H giving higher compression ratios.
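To illustrate the concept (purely as a sketch, using the open source python-lz4 bindings rather than anything ADSF-specific), the same data can be compressed with the fast LZ4 mode while it is hot, then recompressed with the high-compression (LZ4HC) levels once it goes cold:

```python
# Illustrative sketch only: fast LZ4 for hot data vs. the high-compression
# (LZ4HC) levels for cold data, using the python-lz4 package. This is not
# the ADSF implementation, just the underlying trade-off.
import lz4.frame

# Hypothetical sample data with some redundancy in it.
data = b"mailbox item 12345 - mostly repetitive content\n" * 100_000

# Hot/new data: default fast LZ4, very cheap on CPU.
hot = lz4.frame.compress(data)

# Cold data: recompress with the high-compression levels for a better ratio,
# while decompression remains as fast as plain LZ4.
cold = lz4.frame.compress(data, compression_level=lz4.frame.COMPRESSIONLEVEL_MINHC)

print(f"original={len(data)}  lz4={len(hot)}  lz4hc={len(cold)}")
```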

Also, some good news for existing customers: this enhanced compression will be included in the next major AOS update, which can be deployed via One-Click upgrade without any downtime or the need to reformat drives. That’s true software defined storage.

Stay tuned for an upcoming blog showing the before and after compression savings on the same dataset.

Summary:

The upcoming releases of Acropolis OS (AOS) will provide:

  1. Higher compression savings
  2. Lower CVM overheads
  3. Dramatically reduced background file system maintenance tasks
  4. Enhanced compression will be included in the next major AOS one click upgrade!

Related .NEXT 2016 Posts

The truth about Storage Data efficiency ratios.

We’ve all heard the marketing claims from some storage vendors about how efficient their storage products are. Data efficiency ratios of 40:1, 60:1, even 100:1 continue to be thrown around as if they are amazing, somehow unique or achieved as a result of proprietary hardware.

Let’s talk about how vendors may try to justify these crazy ratios:

  • Counting snapshots/metadata copies as data efficiency

For many years, storage vendors have been able to take space efficient copies of LUNs, Datastores, Virtual Machines etc. which rely on snapshots or metadata. These are not full copies, and reporting this as data efficiency is quite misleading in my opinion, as this has been table stakes for many years.

Be wary of vendors encouraging (or requiring) you to configure more frequent “backups” (which are, after all, just snapshots or metadata copies) to achieve the advertised data efficiencies.

  • Reporting VAAI/VCAI clones as full copies

If I have a VMware Horizon View environment, it makes sense to use VAAI/VCAI space efficient clones as they provide numerous benefits, including faster provisioning and recompose operations, and they use less space, which leads to them being served from cache (making performance better).

So if I have an environment with just 100 desktops deployed via VCAI, you have a 100:1 data reduction ratio; with 1000 desktops you have 1000:1. But this is again table stakes… well, sort of, because some vendors don’t support VAAI/VCAI and others only have partial support, as I discuss in Not all VAAI-NAS storage solutions are created equal.

Funnily enough, one vendor even offloads what VAAI/VCAI can do (with almost no overhead, I might add) to proprietary hardware. Either way, while VAAI/VCAI clones are fantastic and can add lots of value, claiming high data efficiency ratios as a result is again misleading, especially if done in the context of being a unique capability.

  • Compression of Highly compressible data

Some data, such as logs or text files, is highly compressible, so ratios of >10:1 for this type of data are not uncommon or unrealistic. However, consider that if logs only use a few GB of storage, then 10:1 isn’t really saving you that much space (or money).

For example, a 100:1 data reduction ratio on 10GB of logs is only saving you ~9.9GB, which is good, but not exactly something to make a purchasing decision on.
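To make the arithmetic explicit, here’s a quick illustrative helper: the space saved is the logical size multiplied by (1 − 1/ratio), so a spectacular ratio on a tiny dataset still saves very little.

```python
def savings_gb(logical_gb: float, ratio: float) -> float:
    """Capacity saved by a data reduction ratio (logical size minus physical size)."""
    return logical_gb * (1 - 1 / ratio)

print(savings_gb(10, 100))    # 100:1 on 10GB of logs     -> ~9.9GB saved
print(savings_gb(10000, 2))   # 2:1 on 10TB of databases  -> 5000.0GB saved
```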

Databases with lots of white space also compress very well, so the larger the initial size of the DB, the more it will compress.

The compression technology used by storage vendors is not vastly different, which means that for the same data, they will all achieve a similar reduction ratio. As much as I’d love to tell you Nutanix has much better ratios than Vendors X, Y and Z, it’s just not true, so I’m not going to lie to you and say otherwise.

  • Deduplication of Data which is deliberately duplicated

An example of this would be MS Exchange Database Availability Groups (DAGs). Exchange creates multiple copies of data across multiple physical or virtual servers to provide application and storage level availability.

Deduplication of this is not difficult, and can be achieved (if indeed you want to dedupe it) by any number of vendors.

In a distributed environment such as HCI, you wouldn’t want to deduplicate this data as it would force VMs across the cluster to remotely access more data over the network which is not what HCI is all about.

In a centralised SAN/NAS solution, deduplication makes more sense than for HCI, but still, when an application is creating the duplicate data deliberately, it may be a good idea to exclude it from being deduplicated.

As with compression, for the same data, most vendors will achieve a similar ratio so again this is table stakes no matter how each vendor tries to differentiate. Some vendors dedupe at more granular levels than others, but this provides diminishing returns and increased overheads, so more granular isn’t always going to deliver a better business outcome.

  • Claiming Thin Provisioning as data efficiency

If you have a Thin Provisioned 1TB virtual disk and you only write 50GB to the disk, you would have a data efficiency ratio of 20:1. So the larger you create your virtual disk and the less data you write to it, the better the ratio will be. Pretty silly in my opinion as Thin Provisioning is nothing new and this is just another deceptive way to artificially improve data efficiency ratios.
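A trivial (and purely illustrative) calculation shows how easily this number is gamed, since the reported ratio is just provisioned capacity divided by data actually written:

```python
def thin_provisioning_ratio(provisioned_gb: float, written_gb: float) -> float:
    """The 'efficiency' some vendors report: provisioned capacity vs. data actually written."""
    return provisioned_gb / written_gb

print(thin_provisioning_ratio(1024, 50))    # 1TB disk, 50GB written  -> ~20:1
print(thin_provisioning_ratio(10240, 50))   # provision 10TB instead  -> ~205:1
```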

  • Claiming removal of zeros as data reduction

For example, if you create an Eager Zero Thick VMDK and then use only a fraction of it, as with the Thin Provisioning example (above), removal of zeros will obviously give a really high data reduction ratio.

However, intelligent storage doesn’t need Eager Zero Thick (EZT) VMDKs to give optimal performance, nor will it write zeros to begin with. Intelligent storage will simply store metadata instead of a ton of worthless zeros. So a data reduction ratio from a more intelligent storage solution would be much lower than from a vendor who has less intelligence and has to remove zeros. This is yet another reason why data efficiency (marketing) numbers have minimal value.

Two of the limited use cases for EZT VMDKs are Fault Tolerance (who uses that anyway?) and Oracle RAC, so removal of zeros with intelligent storage is essentially moot.

Summary:

Data reduction technologies have value, but they have been around for a number of years so if you compare two modern storage products, you are unlikely to see any significant difference between vendor A and B (or C,D,E,F and G).

The major advantage of data reduction is apparent when comparing new products with 5+ year old technology. If you are in this situation where you have very old tech, most newer products will give you a vast improvement; it’s not unique to just one vendor.

At the end of the day, there are numerous factors which influence what data efficiency ratio can be achieved by a storage product. When comparing between vendors, if done in a fair manner, the differences are unlikely to be significant enough to sway a purchasing decision as most modern storage platforms have more than adequate data reduction capabilities.

Beware: Dishonest and misleading marketing about data reduction is common, so don’t get caught up in long winded conversations about data efficiency or be tricked into thinking one vendor is amazing and unique in this area, because it just isn’t the case.

Data reduction is table stakes and really shouldn’t be the focus of a storage or HCI purchasing decision.

My recommendation is to focus on areas which deliver operational simplicity, remove complexity/dependencies within the datacenter and achieve real business outcomes.

Related Posts:

1. Sizing infrastructure based on vendor Data Reduction assumptions – Part 1

2. Sizing infrastructure based on vendor Data Reduction assumptions – Part 2

3. Deduplication ratios – What should be included in the reported ratio?

Jetstress Testing with Intelligent Tiered Storage Platforms

As virtualization of mission-critical applications is now commonplace, customers are increasingly looking to run mixed/multiple workloads on their chosen infrastructure. It’s now common that shared storage, be it SAN or Hyperconverged (HCI), is used, and these days most products have some form of storage tiering and/or read/write buffers.

It is also common for storage to have one or more data reduction technologies such as Deduplication, Compression & Erasure Coding.

A quick note on Exchange support requirements: You must have storage which enforces Forced Unit Access (FUA) / Write Through (when requested by the Guest OS) which means data must be written to persistent media (not a write cache) before being acknowledged to the guest OS/application.

For more information on how Nutanix is compliant (regardless of Hypervisor), see the following post: Ensuring Data Integrity with Nutanix – Part 2 – Forced Unit Access (FUA) & Write Through

Now back to Jetstress performance testing. When considering a storage platform, or migrating an existing workload onto your shared storage, it’s a no-brainer to run the tried and tested MS Exchange Jetstress tool to validate storage performance, right?

Well, not necessarily, and here’s why.

When using Jetstress, you typically create multiple databases, e.g.: 8, and spread them across multiple virtual disks. Jetstress then creates the 1st database and proceeds to duplicate it “X” number of times, in this example an additional 7 times.

Here is a screenshot showing Jetstress creating a 159.9GB database and then duplicating it 3 times.

[Screenshot: Jetstress database duplication]

Problem 1: Jetstress duplicates databases multiple times, leading to unrealistic deduplication ratios.

Arguably deduplication should not be used for DAG deployments, which I have discussed previously, but putting that issue to one side, what about for performance testing with Jetstress? Well think about it, if we have 8 databases, 7 of which are exact copies of the 1st, then of course we will see great deduplication ratios.

With, say, an 8:1 deduplication ratio, 8x more data will be served out of the cache/SSD tier/s, leading to unrealistically high performance and low latency.

No matter what any vendor tells you, 8:1 dedupe for Exchange (excluding DAG copies) is not realistic for production data in my experience. As such, it should never be used for performance testing with Jetstress.
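To see how badly duplicated databases skew the result, here is a rough sketch (the SSD tier size and database sizes are made-up example figures, not Nutanix numbers): once identical database copies are deduplicated, the whole working set fits in flash, which real mailbox data never would.

```python
def physical_footprint_gb(logical_gb: float, dedupe_ratio: float) -> float:
    """Physical capacity consumed after deduplication."""
    return logical_gb / dedupe_ratio

ssd_tier_gb = 800  # hypothetical per-node SSD tier, for illustration only

# Jetstress: 8 x 160GB databases, 7 of them byte-for-byte copies -> ~8:1 dedupe.
jetstress = physical_footprint_gb(8 * 160, 8)     # ~160GB, fits entirely in SSD
# Production: the same logical capacity of real mailbox data with little/no dedupe.
production = physical_footprint_gb(8 * 160, 1.1)  # ~1164GB, spills past the SSD tier

print(jetstress <= ssd_tier_gb, production <= ssd_tier_gb)  # True False
```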

Solution: Disable dedupe when using Jetstress (and in my opinion for production DAGs)

Problem 2: Jetstress databases contain lots of zeros which can be easily compressed.

In the real world, I personally recommend compression for Exchange databases (not logs), with or without DAG deployments, as compressing data can achieve excellent data reduction while not removing copies of data deliberately created by the DAG. It lowers the cost/GB and even increases performance in some storage systems, especially when writing to or accessing data on the slower cold tier. (In fact it can lead to more usable capacity than RAW, but caution: your mileage may vary.)

However, databases created through Jetstress are packed with a ton of zeros, which means compression ratios are also much higher than real world. I’ve seen >7:1 compression ratios for Jetstress databases, which as with dedupe, means more data will be served out of the cache/SSD tier/s leading again to unrealistically high performance and low latency.

Solution: Disable compression when using Jetstress

Problem 3: Jetstress performs random read/write I/O across the entire data set

This is a valid test for deployments using physical servers & JBOD, as the databases are spread across multiple drives (usually SATA) and there is no tiering between drives. As such, testing I/O across the entire data set concurrently is important.

It is also a reasonable test for shared storage if no tiering is being used, as with many legacy storage solutions.

However, when you have intelligent storage with tiering, such as the Nutanix Distributed Storage Fabric (NDSF), write I/O is always served by the SSD tier and the coldest data is tiered off to SATA. Cold data is then served by the SATA tier only if required.

As such, the larger the Exchange mailbox size, typically the higher the percentage of data that will be cold, which means an increasingly smaller percentage of total capacity needs to be SSD to give all-flash type performance the vast majority of the time. This also allows customers to maintain large mailboxes cost effectively and with consistent performance on SATA. As such, I believe hybrid storage (small SSD tier w/ large low cost capacity tier) is advantageous for Exchange, but that’s another topic.

Because Jetstress actively performs I/O as if all data is hot, it effectively negates the benefits of tiering, which means it does not demonstrate the real world performance of a tiered storage platform such as Nutanix. For Nutanix solutions the application will see close to all-flash performance even with TBs of mailbox databases sitting on SATA, since active I/O is predominantly serviced by SSD. The small percentage of I/O serviced by the SATA tier performs much better than JBOD, since the I/O to those drives is limited thanks to all new/active data being served by the SSD tier.

As such, to get an idea of real world performance, Jetstress tests need to be performed on carefully sized databases which fit within the (persistent) performance tier (i.e.: not a RAM-style cache, which Nutanix calls the Extent Cache and which is typically a few GB per node). This test should easily produce a passing result for Jetstress.

This style of test will show you close to what real world performance looks like, although I also recommend what I call a worst case scenario test, which I cover later in this post.

This 2nd Jetstress test is the one you want to make sure is under 20ms Database Read Latency and 10ms Log Write latency which are the Microsoft accepted thresholds for performance for Exchange.

Problem 4: Jetstress performs lots of overwrites

As Jetstress runs, it performs frequent random overwrites within the databases, which in my experience does not represent real world behaviour. So a Jetstress Pass result is really a strong indication the solution will perform well if the Achieved IOPS are >= the MS Exchange Server Role Requirements Calculator estimates (which is a good thing!)

But, Nutanix uses a technology called Erasure Coding (EC-X) for data reduction, which is designed specifically for use with cold data, that is, data such as older email in a large mailbox. EC-X is recommended for production Exchange environments as it provides more usable capacity and is complementary to Compression.

But when overwrites occur, NDSF re-stripes the data, which has a small write penalty. In the real world this is insignificant as it happens infrequently, but with Jetstress performing constant overwrites, EC-X provides limited/no data reduction and decreases performance.

As such, this is another case where benchmarks do not properly represent real world performance, so when using Jetstress, ensure EC-X is not enabled.

For non Nutanix storage platforms, large numbers of overwrites will typically also reduce Jetstress performance compared to real world where the percentage of overwrites will be much lower.

How to Test on Tiered Storage solutions.

If the vendor does not have a Microsoft ESRP certification (Nutanix ESRP can be found here), then you should validate the infrastructure is capable of supporting your requirements.

If the vendor does have ESRP, then you should still use Jetstress as an Operational Verification tool following initial implementation and prior to going into production.

In this example I will specifically cover the Nutanix Distributed Storage Fabric (NDSF). While the below may be applicable to other vendors’ products, please refer to each vendor’s recommendations, although all data reduction recommendations should be consistent across vendors in my opinion.

Solution: Perform two stages of Jetstress testing.

Stage 1: All flash performance test

If the SSD tier has 1TB usable, make the Jetstress databases total 75% of the usable capacity (in the case of Nutanix, 75% of the per node SSD usable capacity per Jetstress instance).

Run a short 15 min test and fine tune the threads, starting from 32 and reducing until you achieve at least 4x the required IO levels according to the MS Exchange Server Role Requirements Calculator (4x should be easy to achieve for all-flash testing at low latency), then run a 24 hour Stress test with all Jetstress instances concurrently (Multi-Host Test).

This result should be indicative (although not exactly) of the performance you should see under normal circumstances.
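To make the Stage 1 sizing concrete, here is a minimal sketch (the SSD capacity, database count and required IOPS are example inputs, not Nutanix guidance):

```python
def stage1_sizing(ssd_usable_gb_per_node: float, db_count: int, required_iops: float) -> dict:
    """Stage 1 (all-flash) Jetstress sizing: databases total ~75% of the per-node
    SSD usable capacity, with an achieved-IOPS target of 4x the calculator estimate."""
    total_db_gb = ssd_usable_gb_per_node * 0.75
    return {
        "total_db_gb": total_db_gb,
        "per_db_gb": round(total_db_gb / db_count, 1),
        "target_iops": required_iops * 4,  # tune threads down from 32 while staying above this
    }

# Example: 1TB usable SSD per node, 8 databases per Jetstress instance,
# 500 IOPS required per the MS Exchange Server Role Requirements Calculator.
print(stage1_sizing(1024, 8, 500))
# -> {'total_db_gb': 768.0, 'per_db_gb': 96.0, 'target_iops': 2000}
```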

Stage 2: Worst case scenario test (90% capacity)

If the usable capacity is 1TB (per node), then make the Jetstress databases total >90% of the usable capacity (in the case of Nutanix, per node usable capacity per Jetstress instance). Nutanix recommends N+1 for any mission-critical application, so the actual cluster utilisation for a 4 node cluster would be ~67.5% (100% – 25% for N+1, then creating DBs to use 90% of the remaining nodes’ capacity).

Note: Larger clusters equate to a higher usable percentage of capacity, e.g.: an 8 node cluster would be (100% – 12.5% for N+1) x 90% = ~78% usable capacity for the cluster.
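The capacity math above generalises as follows (a simple sketch assuming one node’s worth of capacity reserved for N+1 and a ~90% fill of the remainder, as described):

```python
def worst_case_fill_percent(nodes: int, fill_fraction: float = 0.90) -> float:
    """Effective cluster utilisation when one node's worth of capacity is reserved
    for N+1 and the remaining capacity is filled to ~90% with Jetstress databases."""
    remaining_after_n_plus_1 = (nodes - 1) / nodes
    return remaining_after_n_plus_1 * fill_fraction * 100

print(worst_case_fill_percent(4))  # 67.5  -> ~67.5% of total cluster capacity
print(worst_case_fill_percent(8))  # 78.75 -> ~78% of total cluster capacity
```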

Run a short 15 min test and fine tune the threads starting from 12 and reduce until you achieve >= the required IO levels according to the MS Exchange Server Role Requirements Calculator (which has 20% buffer built in), then run a 24 hour Stress test with all Jetstress instances concurrently (Multi-Host Test).

The worst case scenario test shows how the system will perform if the tiering/cache layers are totally saturated, hence the name worst case scenario. This is how Nutanix runs testing for Microsoft ESRP certification to ensure every Nutanix deployment for Exchange performs flawlessly in production.

Real World vs Jetstress

I will be publishing a case study on this topic in the future, but to give you a teaser, a 30k Seat Exchange deployment I designed and validated had roughly 700 IOPS @ 5-15ms Read/Write latency on the Jetstress worst case scenario test and the SSD only Jetstress report was ~4000 IOPS @ 1-2ms for Read and Write I/O. In production the average latency is 3-4ms and the number of messages per day is within +2% of the estimates in the MS Exchange Server Role Requirements Calculator.

The cluster average latency includes read and write I/O as well as other workloads sharing the cluster.

As you can see, a Jetstress result showing 15ms doesn’t sound very impressive, yet the SSD test is super impressive considering the thread count could have been increased to provide higher IOPS. But since the requirement was <500 IOPS, the 4000 IOPS achieved was well in excess of what was required, so no further testing was performed.

But now that you understand why Jetstress is not designed for modern tiered shared storage, you can use the above mentioned tests to ensure you get results which are indicative of real world performance, and not be fooled by data reduction (Dedupe/Compression) giving you unrealistically high performance.

Summary:

When using tiered storage with MS Exchange Jetstress, ensure:

  • Deduplication is disabled (as it should be for production DAGs)
  • Compression is disabled
  • Erasure Coding (EC-X) is disabled (Nutanix specific)

Once the above is complete, run the following Jetstress tests:

  1. All-flash performance tier test to see best case scenario performance (indicative of real world performance)
  2. 90% capacity performance test to show worst case scenario performance (which should rarely if ever be experienced)
