Nutanix Resiliency – Part 6 – Write I/O during CVM maintenance or failures

In Part 5 we covered how read I/O is serviced during CVM maintenance or failure, so now we need to cover the arguably more difficult and critical task of servicing write I/O during the same maintenance and failure scenarios.

For those of you who read Part 5, this next section will look familiar. For those who have not, I would ask that you please read Part 5 first, but let's quickly recap the basics of how Nutanix ADSF writes and protects data.

Looking at the following diagram, we see a three node cluster with a single virtual machine. The VM has written some data, represented by a, b, c and d. Under normal circumstances all writes have one replica written to the host running the VM (in this case Node 1), while the other replica (or replicas in the case of RF3) is distributed throughout the cluster based on disk fitness values. The disk fitness values (or what I call "Intelligent Replica Placement") ensure data is placed in the most optimal location the first time, based on capacity and performance.

[Diagram: RF2 overview – replica placement across a three node cluster]
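
To make this concrete, below is a minimal Python sketch of fitness-based replica placement. To be clear, the fitness formula, field names and weighting are my own illustrative assumptions for this example, not the actual ADSF implementation:

```python
# Illustrative sketch of fitness-based ("Intelligent") replica placement.
# The fitness formula, field names and 0.5 weighting are assumptions for
# this example, not the real ADSF logic.

def fitness(node):
    # Favour nodes with more free capacity and a lighter I/O load.
    capacity = node["free_bytes"] / node["total_bytes"]
    return capacity - 0.5 * node["load"]  # load normalised to 0..1

def place_replicas(local_node, nodes, rf=2):
    """One replica on the node running the VM, the rest by fitness."""
    remote = [n for n in nodes if n is not local_node]
    remote.sort(key=fitness, reverse=True)
    return [local_node] + remote[: rf - 1]

nodes = [{"name": "node1", "free_bytes": 500, "total_bytes": 1000, "load": 0.2},
         {"name": "node2", "free_bytes": 800, "total_bytes": 1000, "load": 0.1},
         {"name": "node3", "free_bytes": 300, "total_bytes": 1000, "load": 0.6}]
print([n["name"] for n in place_replicas(nodes[0], nodes)])  # ['node1', 'node2']
```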

If one or more nodes are added to the cluster, Intelligent Replica Placement will send proportionally more replicas to those nodes until the cluster is in a balanced state. In the very unlikely event no new writes are occurring, ADSF has a background disk balancing process which will balance the cluster as a low priority task.
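
Conceptually, the background balancer behaves something like the sketch below; the 10% imbalance threshold and the data structures are illustrative assumptions, not ADSF internals:

```python
# Illustrative sketch of low priority background disk balancing.
# The threshold and structures are assumptions, not ADSF internals.

def used_fraction(disk):
    return disk["used_bytes"] / disk["total_bytes"]

def balance_step(disks, chunk=50, threshold=0.10):
    """Move one chunk from the fullest disk to the emptiest disk."""
    by_usage = sorted(disks, key=used_fraction)
    least, most = by_usage[0], by_usage[-1]
    if used_fraction(most) - used_fraction(least) > threshold:
        most["used_bytes"] -= chunk
        least["used_bytes"] += chunk
        return True   # one balancing move performed
    return False      # already balanced to within the threshold

# Run as a background (low priority) task until balanced.
disks = [{"used_bytes": 900, "total_bytes": 1000},
         {"used_bytes": 100, "total_bytes": 1000}]
while balance_step(disks):
    pass
print([used_fraction(d) for d in disks])  # roughly balanced
```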

Now that we know the basics of how Nutanix protects data using multiple replicas (called “Resiliency Factor”) let’s talk about what happens during a Nutanix ADSF storage layer upgrade.

Upgrades are initiated by a one-click process and performed in a rolling style, one Controller VM (CVM) at a time, regardless of the configured Resiliency Factor and whether Erasure Coding (EC-X) is used or not. Put simply, the rolling upgrade takes one CVM offline at a time, performs the upgrade, performs a self-check, rejoins the cluster and then repeats the process on the next CVM.
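
The sequence is simple enough to express as a sketch. The class and method names below are illustrative stand-ins only, not Nutanix APIs:

```python
# Illustrative sketch of the rolling, one-at-a-time CVM upgrade flow.
# The CVM class and its methods are stand-ins, not Nutanix APIs.

class CVM:
    def __init__(self, name, version):
        self.name, self.version, self.online = name, version, True

    def self_check(self):
        return self.version is not None  # stand-in for real health checks

def rolling_upgrade(cvms, target_version):
    for cvm in cvms:
        cvm.online = False               # take exactly one CVM offline
        cvm.version = target_version     # perform the storage layer upgrade
        assert cvm.self_check()          # verify health before rejoining
        cvm.online = True                # rejoin the cluster, then repeat
        # While this CVM is offline, its node's I/O is serviced remotely.

rolling_upgrade([CVM("cvm-1", "5.0"), CVM("cvm-2", "5.0")], "5.1")
```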

One of the many advantages of Nutanix decoupling the storage from the hypervisor (i.e.: not embedding storage into the kernel) is that upgrades, and even storage layer failures, do not impact the running virtual machines.

VMs do not need to be restarted (i.e.: like an HA event), nor do they need to migrate (e.g.: via vMotion) to another node. VMs continue without interruption to storage traffic even when the local controller is offline for any reason.

If the local CVM is down for maintenance or due to a failure, write I/O is dynamically redirected throughout the cluster.

Let’s look at a Write I/O when the CVM local to a VM is offline (for any reason).

The local CVM being offline means the physical drives (NVMe, SSD, HDD etc.) are not available, meaning the local data (replicas) is unavailable.

All write I/O will continue to function and remain in compliance with the configured Resiliency Factor (RF); however, rather than one replica being written locally, it will be written to a remote CVM over the network, as will the other replica/s.

In the example below, we have a three node cluster, so the VM on Node 1 is writing both replicas for "E" over the network to Nodes 2 and 3. This is how new data is serviced.

[Diagram: New write I/O with the local CVM offline]
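
A minimal sketch of this write-path behaviour follows; the node structures are invented for the example, and the real ADSF logic also applies the disk fitness values discussed earlier:

```python
# Illustrative sketch: replica target selection when the local CVM is down.
# Structures are invented for this example; real ADSF placement also uses
# disk fitness values.

def select_write_targets(local_node, nodes, rf=2):
    online = [n for n in nodes if n["cvm_online"]]
    if local_node["cvm_online"]:
        targets = [local_node]  # normal case: one replica written locally
        remote = [n for n in online if n is not local_node]
    else:
        targets = []            # local CVM offline: all replicas go remote
        remote = online
    targets += remote[: rf - len(targets)]
    assert len(targets) == rf, "not enough online CVMs to satisfy RF"
    return targets

nodes = [{"name": "node1", "cvm_online": False},
         {"name": "node2", "cvm_online": True},
         {"name": "node3", "cvm_online": True}]
print([n["name"] for n in select_write_targets(nodes[0], nodes)])
# -> ['node2', 'node3']: both replicas written over the network
```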

If more nodes existed in the cluster, the write traffic would be distributed evenly using Intelligent Replica Placement across all nodes within the cluster as shown below.

[Diagram: Write I/O with the local CVM down in a five node cluster]

In the event data is being overwritten (as opposed to net new data) and the local replica is unavailable due to the CVM being offline, Nutanix ensures data integrity is maintained by overwriting the available replica and writing a second (or third for RF3) copy on another node in the cluster.

[Diagram: Overwrite when the local CVM is down]
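
Here is a minimal sketch of the overwrite behaviour, again with invented structures; the key point is that the write is only acknowledged once the configured RF is satisfied:

```python
# Illustrative sketch: overwriting data while one replica is unreachable.
# The reachable replica is updated and a new replica is written elsewhere,
# so RF compliance is restored before the write is acknowledged.

def overwrite(extent_id, new_data, replica_nodes, all_nodes, rf=2):
    live = [n for n in replica_nodes if n["cvm_online"]]
    for n in live:
        n["data"][extent_id] = new_data          # update reachable replicas
    spares = [n for n in all_nodes
              if n["cvm_online"] and n not in replica_nodes]
    for n in spares[: rf - len(live)]:
        n["data"][extent_id] = new_data          # re-protect on a new node
        live.append(n)
    assert len(live) == rf, "cannot satisfy RF"  # only now ack the write
    return live

n1 = {"name": "node1", "cvm_online": False, "data": {"E": "old"}}
n2 = {"name": "node2", "cvm_online": True, "data": {"E": "old"}}
n3 = {"name": "node3", "cvm_online": True, "data": {}}
print([n["name"] for n in overwrite("E", "new", [n1, n2], [n1, n2, n3])])
# -> ['node2', 'node3']
```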

This is critical because if data is not always kept in compliance with its Resiliency Factor (FTT for VMware vSAN), a subsequent drive or node failure would cause data loss.

A major resiliency advantage Nutanix has over vSAN is the fact we always remain in compliance with the configured Resiliency Factor, including during all failure and maintenance scenarios. vSAN, however, does not maintain its configured FTT level during all host maintenance and failure scenarios. For VMs on vSAN configured with FTT=1, in the event the host holding one vSAN disk "object" is offline for maintenance, new overwrites are not protected, so a single drive failure can result in data loss.

Chief Technologist at VMware, Duncan Epping, recently posted an article titled "VSAN 6.2 : Why going forward FTT=2 should be your new default", in which he recommended FTT=2 as the new default for vSAN customers.

I have to agree with Duncan, but I wouldn't say vSAN should be set to FTT=2; I would say it MUST be set to FTT=2, as FTT=1 creates a single point of failure for overwrites during maintenance or failures. This is unacceptable for most production workloads, with VDI being one of a potential few exceptions in some cases.

Nutanix, on the other hand, does not have the same architectural flaw as vSAN; as such, RF2 is extremely resilient and suitable for even the most critical environments, as explained in this series.

Combine that with the fact that ADSF is able to restore resiliency in such a timely manner, and RF2 delivers far superior resiliency compared to vSAN FTT=1.

In the next part we will cover the critically important topic of how VMs are impacted during hypervisor (ESXi, Hyper-V, XenServer and AHV) upgrades.

Summary:

  1. Write I/O continues uninterrupted if the local CVM is offline
  2. Write I/O is distributed throughout the cluster evenly thanks to Intelligent Replica Placement
  3. All new data is written in compliance with the configured Resiliency Factor
  4. Overwrites of existing data are always written in compliance with the configured Resiliency Factor by writing a new replica where the original replica is not available.
  5. Data integrity is ALWAYS maintained regardless of a CVM being under maintenance or failure.
  6. Nutanix RF2 is more resilient than vSAN FTT=1 despite each claiming to maintain two copies of data.

Index:
Part 1 – Node failure rebuild performance
Part 2 – Converting from RF2 to RF3
Part 3 – Node failure rebuild performance with RF3
Part 4 – Converting RF3 to Erasure Coding (EC-X)
Part 5 – Read I/O during CVM maintenance or failures
Part 6 – Write I/O during CVM maintenance or failures
Part 7 – Read & Write I/O during Hypervisor upgrades
Part 8 – Node failure rebuild performance with RF3 & Erasure Coding (EC-X)
Part 9 – Self healing
Part 10 – Disk Scrubbing / Checksums

What’s .NEXT 2016 – Enhanced & Adaptive Compression

There are so many "under the covers" capabilities of the Acropolis Distributed Storage Fabric (ADSF) which have been designed and built not for short term marketing "checkboxes" but with a long term vision in mind.

As a result, Nutanix has been able to continually innovate and stay ahead of the HCI market while building a next generation platform (including the Acropolis Hypervisor, AHV) for the enterprise cloud.

Nutanix is also 100% software defined which makes adding new features and enhancing existing features possible even for hardware which is several years old.

This forward looking development of ADSF has allowed Nutanix to lead the SDS space with features like Compression, Deduplication and Erasure Coding (EC-X).

In-line Compression is recommended for most workloads including business critical applications such as Oracle, SQL and Exchange and typically provides not only excellent capacity savings but an increased effective SSD capacity which results in higher performance. Compressing data on the capacity tier (not just flash tier) also helps improve performance and lowers the cost per GB of storage.
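
As a simple back-of-the-envelope illustration (the 2:1 ratio is an assumed example; real savings vary by workload and data type):

```python
# Illustrative arithmetic only: in-line compression increases effective
# SSD capacity. The 2:1 ratio is an assumption, not a guaranteed result.
ssd_tier_tb = 10
compression_ratio = 2.0
print(f"Effective SSD capacity: {ssd_tier_tb * compression_ratio:.0f} TB")  # 20 TB
```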

As of the next release, the compression functionality has been enhanced to support compressed and uncompressed slices in the same extent group. For those of you not familiar with ADSF, an "Extent Group" is a group of "Extents" in which data is stored.

In previous generations of ADSF, regardless of whether ADSF achieved good compression or not, all the data for a virtual disk (vdisk) residing in a container with compression enabled would be compressed. This can cause unnecessary overheads, especially in cases where compression savings are minimal, such as for already compressed data like video or image files (e.g.: JPG).

This is one reason why it’s important that data reduction features such as compression (and Dedupe/Erasure Coding) can be turned off for workloads where benefits are minimal.

Previously in ADSF, compressed and uncompressed data was not supported within the same extent group, which resulted in the cluster (Curator) having the added overhead of moving extents from one extent group to another, even for data with low or no compression benefits.

This unnecessary overhead has now been removed, which means fewer background tasks (overheads), resulting in lower CPU utilization by the Nutanix Controller VM (CVM) and better overall compression performance.
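
Conceptually, the per-slice decision looks something like the sketch below, using Python's zlib as a stand-in for the real algorithms; the 10% savings cut-off is an illustrative assumption, not a documented ADSF parameter:

```python
import os
import zlib

# Illustrative sketch of per-slice compression within one extent group.
# zlib stands in for the real algorithms; the 10% cut-off is an assumption.

def store_slice(data: bytes, min_saving=0.10):
    compressed = zlib.compress(data)
    if len(compressed) <= len(data) * (1 - min_saving):
        return ("compressed", compressed)
    # Poorly compressible data (e.g. JPG) stays uncompressed, yet can now
    # live in the same extent group as compressed slices, so Curator no
    # longer needs to migrate extents between extent groups.
    return ("raw", data)

print(store_slice(b"A" * 4096)[0])        # -> compressed
print(store_slice(os.urandom(4096))[0])   # -> raw
```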

Secondly, Nutanix will be moving to the LZ4 group of algorithms, which has two variants: LZ4 and LZ4H. LZ4H is really exciting because it achieves nearly as much compression as Zlib at a similar CPU cost, but can decompress at the speed of LZ4. LZ4 by itself is marginally superior to Snappy in the common case, but LZ4H makes this a very attractive choice.

This allows ADSF to do tiered compression – so cold data compressed with LZ4 can be further compressed with LZ4H giving higher compression ratios.
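
The following sketch illustrates the tiering principle, using zlib compression levels as a stand-in for LZ4 (fast) and LZ4H (higher ratio):

```python
import zlib

# Illustrative sketch of tiered compression. zlib level 1 stands in for
# LZ4 (fast) and level 9 for LZ4H (higher ratio); not the actual codecs.

def compress_hot(data: bytes) -> bytes:
    return zlib.compress(data, level=1)   # fast path for hot data

def recompress_cold(hot_blob: bytes) -> bytes:
    data = zlib.decompress(hot_blob)      # unpack the fast format
    return zlib.compress(data, level=9)   # recompress cold data harder

hot = compress_hot(b"example data " * 1000)
cold = recompress_cold(hot)
print(len(hot), len(cold))                # the cold copy is smaller
```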

Also, some good news for existing customers: this enhanced compression will be included in the next major AOS update, which can be deployed via One-Click upgrade without any downtime or the requirement to reformat drives – that's true software defined storage.

Stay tuned for an upcoming blog showing the before and after compression savings on the same dataset.

Summary:

The upcoming releases of Acropolis OS (AOS) will provide:

  1. Higher compression savings
  2. Lower CVM overheads
  3. Dramatically reduced background file system maintenance tasks
  4. Enhanced compression will be included in the next major AOS one click upgrade!


Nutanix Data Protection Capabilities

There is a lot of misinformation being spread in the HCI space about Nutanix data protection capabilities. One such example (below) was published recently on InfoStore.

Evaluating Data Protection for Hyperconverged Infrastructure

When I see articles like this, it really makes me wonder about the accuracy of content on these types of websites, as it seems articles are published without so much as a brief fact check from InfoStore.

Nonetheless, I am writing this post to confirm what data protection capabilities Nutanix provides.

  • Native In-Built Data Protection

Prior to my joining Nutanix in mid-2013, Nutanix already provided a hypervisor agnostic, integrated backup and disaster recovery solution with centralised, consumer-grade management through our PRISM GUI, which is HTML 5 based.

The built-in capabilities include flexible, VM-centric policies to protect virtualized applications with different RPOs and RTOs, with or without application consistency.

The solution also supports local, remote and cloud-based backups, as well as synchronous and asynchronous replication-based disaster recovery.

Currently supported cloud targets include AWS and Azure as shown below.

[Image: Supported cloud backup targets – AWS and Azure]

The video below shows in real time how to create application consistent snapshots from the Nutanix PRISM GUI.

Nutanix can also perform One to One, One to Many and Many to One replication of application consistent snapshots to onsite or offsite Nutanix clusters as well as Cloud providers (AWS/Azure), ensuring choice and flexibility for customers.
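
As an illustration only (the schedule format below is invented for this example, not Nutanix syntax), these topologies can be thought of as simple source-to-targets mappings:

```python
# Invented illustration of one-to-many and many-to-one replication
# topologies as source-to-targets mappings; not Nutanix configuration.

protection_domains = {
    "pd-sql": {                    # one-to-many: one source, two targets
        "source": "cluster-a",
        "targets": ["cluster-b", "aws-us-east-1"],
        "rpo_minutes": 60,
        "app_consistent": True,
    },
    "pd-branch-01": {              # many-to-one: each branch targets HQ
        "source": "branch-01",
        "targets": ["cluster-hq"],
        "rpo_minutes": 240,
        "app_consistent": False,
    },
}
for name, pd in protection_domains.items():
    print(name, pd["source"], "->", ", ".join(pd["targets"]))
```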

Nutanix native data protection can also replicate between and recover VMs to clusters of different hypervisors.

  • Commvault IntelliSnap Integration

Nutanix also provides integration with Commvault IntelliSnap, which allows existing Commvault customers to continue leveraging their investment in the market leading data protection product and to take advantage of other features where required.

The following shows how agentless backups of virtual machines are supported with Acropolis Hypervisor (AHV). Note: Commvault is also fully supported with Hyper-V and ESXi.

By calling the Nutanix Distributed Storage Fabric (NDSF) directly, Commvault ensures snapshots are taken quickly and efficiently without a dependency on the hypervisor.

  • Hypervisor specific support such as VMware vStorage APIs for Data Protection (VADP)

Nutanix also supports solutions which leverage VADP, allowing customers with existing investments in products such as Veeam & NetBackup to continue with their existing strategy until such time as they want to migrate to Nutanix native data protection or solutions such as Commvault.

  • In-Guest Agents

Nutanix supports the use of in-guest agents, which are typically very inefficient with centralised SAN/NAS storage. However, thanks to data locality and NDSF being a truly distributed platform, in-guest incremental-forever backups perform extremely well on Nutanix, as the traditional choke points such as the network, storage controllers and RAID packs have been eliminated.

Summary:

As one size does not fit all in the world of IT, Nutanix provides customers with choice to meet a wide range of market segments and requirements, with strong native data protection capabilities as well as 3rd party integration.