VADP or Agent Based Backups

In light of ongoing bugs with VMware’s API for Data Protection (VADP), I figured it worth re-visiting the topic of VADP or Agent Based backups.

VADP gives backup products the ability to kick off snapshots and use Changed Block Tracking (CBT) to allow incremental style backups which improve the efficiency of backup solutions by reducing the impact (performance, think storage, network and compute overheads) and duration (backup window).

But the problem is, there has now been several instances of VADP bugs in recent years which has meant incremental backups have lacked integrity due to the changed blocks not being correctly reported.

Here is a list of some of the VADP related issues/bugs:

  1. Backups with Changed Block Tracking can return incorrect changed sectors in ESXi 6.0 (2136854)
  2. Backing up a virtual machine with Changed Block Tracking (CBT) enabled fails after upgrading to or installing VMware ESXi 6.0 (2114076)
  3. Changed Block Tracking (CBT) on virtual machines (1020128)
  4. Enabling or disabling Changed Block Tracking (CBT) on virtual machines(1031873)
  5. Changed Block Tracking is reset after a storage vMotion operation in vSphere 5.x (2048201)
  6. When Changed Block Tracking is enabled in VMware vSphere 5.x, vMotion migration fails with error: The source detected that the destination failed to resume (2086670)
  7. QueryChangedDiskAreas API returns incorrect sectors after extending virtual machine VMDK file with Changed Block Tracking (CBT) enabled (2090639)

From the above (albeit a limited list of VADP related issues) we can see that there are issues related to integrity of VADP CBT as well as operational considerations (limitations) when using CBT, such as not being able to Storage vMotion and having vMotion operations fail.

So while VADP in theory has its advantages, should it be used in production environments?

At this stage I am highlighting the risks associated with using VADP with customers and where required/possible mitigating the issue.

But what about good ol’ agent based backups?

Agent based backups have a bad rap in my opinion mainly because of 3-Tier solutions and the fact backup windows take a long time due to the contention in the storage network, controllers and back end disk.

Now people ask me all the time, how can we do backups on Nutanix? The answer is, you have numerous (very good) options without using VADP (or for non vSphere customers).

Using a product like Commvault, In-Guest Agent’s can be deployed and managed centrally, removing much of the administrative overhead (downside) of agent based backups.

Then by configuring incremental forever backups, Commvault manages the change block tracking (regardless of hypervisor) and can even do source side deduplication and compression before sending the delta’s over the network to the Commvault Media Agent (ie.: The backup server).

Now since all new write I/O is written to Nutanix SSD tier, it is very likely that all changes will still be in the SSD tier when a daily incremental backup is started meaning the delta’s will be quickly read and send over the network. Why is this solving the problems of 3-Tier i discussed earlier, well its thanks to data locality and the fact Nutanix XCP is a highly distributed platform.

Because each Nutanix node has a local storage controller with local SSD, AND critically, Data Locality writes new data to the node where the VM is running, most data (under normal situations) will be read locally (without traversing a NIC/HBA or the storage network). This means there is no impact on other nodes from the backup of VMs on each node.

Due to these factors, the only traffic traversing the IP network to the backup server (Commvault Media Agent in this example), are the delta changes in a compressed and deduplicated format.

So a Commvault Agent Based backup solution on Nutanix XCP, on any hypervisor, avoids the dependancy on hypervisor APIs (which have proven in several cases not to be reliable) and ensures backup windows and the impact of backup jobs is minimal due to intelligent incremental forever style backups running on an intelligent distributed storage fabric.

In-Guest agent based backups may just be making a comeback!

Note: In y experience, Agent based backups typically provide more granularity/flexibility compared to VADP backups, for specifics speak with your preferred backup vendor.

Oh BTW, did I mention Nutanix XCP supports Commvault Intellisnap for storage level snapshots on the Distributed Storage Fabric… again just another option for Nutanix customers wanting to avoid further pain with VADP.

Data Centre Migration Strategies – Part 1 – Overview

After a recent twitter discussion, I felt a Data Centre migration strategies would be a good blog series to help people understand what the options are, along with the Pros and Cons of each strategy.

This guide is not intended to be a step by step on how to set-up each of these solutions, but a guide to assist you making the best decision for your environment when considering a data centre migration.

So what’s are some of the options when migrating virtual machines from one data centre to another?

1. Lift and Shift

Summary: Shut-down your environment and Physically relocate all the required equipment to the new location.

2. VMware Site Recovery Manager (SRM)

Summary: Using SRM with either Storage Replication Adapters (SRAs) or vSphere Replication (VR) to perform both test and planned migration/s between the data centres.

3. vSphere Metro Storage Cluster (vMSC)

Summary: Using an existing vMSC or by setting up a new vMSC for the migration, vMotion virtual machines between the sites.

4. Stretched vSphere Cluster / Storage vMotion

Summary: Present your storage at one or both sites to ESXi hosts at one or both sites and use vMotion and Storage vMotion to move workloads between sites.

5. Backup & Restore

Summary: Take a full backup of your virtual machines, transport the backup data to a new data centre (physically or by data replication) and restore the backup onto the new environment.

6. Vendor Specific Solutions

Summary: There are countless vendor specific solutions which range from Storage layer, to Application layer and everything in between.

7. Data Replication and re-register VMs into vCenter (or ESXi) inventory

Summary: The poor man’s SRM solution. Setup data replication at the storage layer and manually or via scripts re-register VMs into the inventory of vCenter or ESXi for sites with no vCenter.

Each of the above topics will be discussed in detail over the coming weeks so stay tuned, and if you work for a vendor with a specific solution you would like featured please leave a comment and I will get back to you.