Is VAAI beneficial with Virtual Storage Appliance (VSA) based solutions ?

I saw a tweet recently (below) which inspired me to write this post as there is still a clear misunderstanding of the benefits VAAI provides (even with Virtual Storage Appliances).

vaaionvsatweet2

I have removed the identity of the individual who wrote the tweet and the people who retweeted this as the goal of this post is solely to correct what I believe is mis-information.

My interpretation of the tweet was (and remains) if a solution uses a Virtual Storage Appliance (VSA) which resides on the ESXi host then VAAI is not providing any benefits.

My opinion on this topic is:

Compared to a traditional centralised NAS (such as a Netapp or EMC Isilon) providing NFS storage with VAAI-NAS support, a Nutanix or VSA solution has exactly the same benefits from VAAI!

My 1st reply to the tweet was:

vaaionvsatweetconvojoshreply2

The test I was referring to with Netapp OnTap Edge can be found here which was posted in Jan 2013, well prior to my joining Nutanix when I was working for IBM where I had been evangelising VAAI/VCAI based solutions for a long time as VAAI/VCAI provides significant value to VMware customers.

The following shows the persons initial reply to my tweet.

vaaionvsatweetconvo32

I responded with the below mentioning I will do a blog which is what you’re reading now.

I went onto provide some brief replies as shown below.

repliesdetail

The main comments from this persons tweets I would summarize (rightly or wrongly) below:

  • VAAI is designed only to offload functions externally (or off the ESXi host)
  • He/She had not seen any proof of performance advantages from VAAI on VSAs
  • Its broken logic to use VAAI with a VSA

Firstly, I would like comment on VAAI being designed to offload functions externally (or off the ESXi host). I don’t disagree VAAI has some functions designed to offload to the (centralised) array but VAAI also has numerous functions which are designed to bring other efficiencies to a vSphere environment.

An example of a feature designed to offload to a central array is the “XCOPY” primitive.

A simple example of what “XCOPY” or Extended Copy provides is offloading a Storage vMotion on block based storage (i.e.: VMFS over iSCSI,FC,FCoE not NFS) to the array so the ESXi host does not have to process the data movement.

This VAAI primitive would likely be of little benefit in a VSA environment where the storage is presented is block based and Storage DRS for example was used. The data movement would be offloaded from ESXi to the VSA running on ESXi and host would still be burdened with the SvMotion.

However XCOPY is only one of the many primitives of VAAI, and VAAI does alot more than just offload Storage vMotions.

For the purpose of this post, I will be discussing VAAI with Nutanix whos Software defined storage solution runs in a VM on every ESXi host in a Nutanix cluster.
Note: This information is also relevant to other VSAs which support VAAI-NAS.

So what benefit does VAAI provide to Nutanix or a VSA solution running NFS?

Nutanix deploys by default with NFS and supports the VAAI-NAS primitives which are:

  • Full File Clone
  • Fast File Clone
  • Reserve Space
  • Extended Statistics

Note: XCOPY is not supported on NFS, importantly and specifically speaking for Nutanix it is not required as SvMotion will be rarely if ever used with Nutanix solutions.

See my post “Storage DRS and Nutanix – To use, or not to use, that is the question?” for more details on why SvMotion is rarely needed when using Nutanix.

For more details of VAAI primitives, Cormac Hogan (@CormacJHogan) wrote an excellent post which can be found here.

Now here is an example of a significant performance benefits of VAAI with Nutanix.

Lets look at Clone of a VM on a Nutanix platform, the VMs details are below.admin01vm

The VM I have used for this test resides on a datastore called “Management” (as per the above image) which presented via NFS and has VAAI (Hardware Acceleration) enabled as shown below.datastore

Now if I do a simple clone of a VM (as shown below) if the VM is turned on, VAAI-NAS is bypassed as the “Fast File Clone” primitive only works on VMs which are powered off.

clone

So a simple way to test the performance benefits of VAAI on any platform (including Hyper-converged such as Nutanix, a Virtual Storage Appliance (VSA) such as Netapp Ontap Edge or traditional centralised SAN or NAS) is to clone a VM while powered on then shut-down the VM and clone it again.

I performed this test and the first clone with the VM powered on started at 1:17:23 PM and finished at 1:26:12 PM, so a total of 8 mins 49 seconds.

Next I shut down the VM and repeated the clone operation.cloneresults

As we can see in the above screen capture from the 2nd clone started at 1:26:49 PM and finished at 1:26:54 PM, so a total of 5 seconds.

The reason for the huge difference in the speed of the two clones is because VAAI-NAS “Fast File Clone” primitive offloaded the 2nd clone to the Nutanix platform (which runs as a VM on the ESXi host) which has intelligently cloned the VM (using metadata resulting in almost zero data creation) as opposed to 1st clone where VAAI-NAS was not used which resulted in the hypervisor and storage solution having to read 11.18GB of data (being the source VM – Admin01) and write a full copy of the same data resulting in effectively >22GB of data movement in the environment.

Now from a capacity savings perspective, a simple way to demonstrate the capacity savings of VAAI on any platform is to clone a VM multiple times and compare the before and after datastore statistics.

Before I performed this test I captured a baseline of the Management datastore as shown below.

BeforeCloningCapacityVMcount

The above highlighted areas show:

  • Virtual Machines and Templates as 83
  • Capacity 8.49TB
  • Provisioned Space 7.09TB
  • Free Space 7.01TB

I then cloned the Admin01 VM a total of 7 times.clone7vmsrecenttasks

Immediately following the last clone completing I took the below screen shot of the Management datastores statistics.

AfterCloningCapacityVMcount

The above highlighted areas in the updated datastore summary show:

  • Virtual Machines and Templates INCREASED by 7 to 90 (as I cloned 7 VMs)
  • Capacity remained the same at 8.49TB
  • Provisioned Space INCREASED to 7.29TB as we cloned 7 x ~40Gb VMs (Total of ~280GB)
  • Free Space REMAINED THE SAME at 7.01TB due to VAAI-NAS Fast File Clone primitive working with the Nutanix Distributed File System.

So VAAI-NAS allowed a VM of ~11GB of used storage (~40GB provisioned) to be cloned without using any significant additional disk space and the clones were each done in between 5 and 7 seconds each.

So some of the benefits VAAI-NAS provides to Nutanix (which some people would term as a VSA type solution) include:

  • Near instant VM cloning via vSphere Client/s (as shown above)
  • Near instant Horizon View Linked Clone deployments (VCAI) – Similar to example shown.
  • Near instant vCloud Director clones (via FAST Provisioning) – Similar to example shown.
  • Major capacity savings by using Intelligent cloning rather than Full Clones (As shown above)
  • Lower CPU overhead for both ESXi hosts AND Nutanix Controller VM (CVM)
  • Ability to create EagerZeroThick VMDKs on NFS (e.g.: To support Fault Tolerance & clustered workloads such as Oracle RAC)
  • Enhanced ability to get statistics on file sizes , capacity usage etc on NFS

In Summary:

Overall I would say that VMware have developed an excellent API in VAAI and Nutanix along with VSA providers having support for VAAI provides major advantages and value to our joint customers with VMware.

It would be broken logic NOT to leverage the advantages of VAAI regardless of storage type (VSA, Nutanix or traditional centralized SAN/NAS) and for the vast majority of vSphere deployments, any storage solution not supporting (or having issues/bugs with) VAAI will have significant downsides.

I am looking forward to ongoing developments from VMware such as vVols and VASA 2.0 to continue to enhance storage of vSphere solutions in the future.

I hope customers and architects now have the correct information to make the most effective design and purchasing recommendations to meet/exceed customer requirements.

My VCAP5-CIA Experience

Yesterday (21st July 2014) I sat and passed the VMware Certified Advanced Professional Cloud Infrastructure Administration (VCAP5-CIA) exam at my local test centre here in Melbourne, Australia.

As with the VCAP-DCA which I did as a prerequisite for VCDX back in 2011, the CIA exam is a live lab exam where VMware get you to demonstrate your hands on expertise with their products.

I find the value of the VCDX, is in part due to the fact it is a requirement to have not only “Design” but hand-on implementation/administration/troubleshooting experience as it is my opinion a person should not be an architect unless that person has the hands on experience and ability to implement and support the solution as designed.

So, enough rambling, what did I think of the VCAP-CIA?

As with all VMware certifications, the exams are generally well written and closely aligned to the blueprints which VMware provide. For VCAP-CIA the blueprint and exam registration can be found here.

The VCAP-CIA was no different, and aligned very well to the blueprint.

The exam is 210 minutes and has 32 questions some of which are simple 1 min tasks where others require a significant amount of work. One secret to all VCAP exams is you are challenged not only by the questions, but by the clock as time is the enemy. This makes time management essential. Do not get caught up of one question, if your unsure, do your best and move on.

Be ware some questions are dependant on successfully completion of earlier questions, but in saying that, a lot of questions are not, so don’t be afraid to skip questions if your struggling as you will still be able to complete many other questions.

The actual live lab in the exam consists of seven ESXi hosts, three vCenter Server virtual machine, four VMware vCloud™ Director (vCD) cells plus additional supporting resources. The lab has a number of pre-configured vApps and virtual machines will also be present for use with certain tasks. It is importaint to understand the lab environment is based on VMware vCloud Suite 5.1 and vCenter Chargeback Manager 2.5, not vCloud 5.5 so ensure you study and prepare using the correct versions of vCloud/vCB!

At this stage some of you may be thinking, I just breached the NDA telling the world about the exam? Well I haven’t and this is the beauty of how VMware does their exam blueprints, the above information is all available in the blueprint so there is not trickery or secrecy to the lab.

As for the questions in the VCAP-CIA, you will not get a brain dump out of me, but what I can tell you is the questions are in most cases very clear and what is asked of the candidates is vastly skills that anyone with any significant vCD experience would be familiar with. For example, the blueprint under Objection 1.2 – Configure vCloud Director for scalability, states under skills and abilities:

 Generate vCloud Director response files
 Add vCloud cells to an existing installation using response files
 Set up vCloud Director transfer storage space
 Configure vCloud Director load balancing

Its safe to say if you know the blueprint properly, you will be able to complete the tasks in the exam, and as a result, get a passing score.

Now the bad news!

Being based in Melbourne, Australia, and the live lab is being accessed by RDP to a location in Seattle, USA. So what does this mean, Latency!

I was only able to complete about 2/3rd’s of the questions in large part due to the delay in the screen refreshing after switching between for example the vCD web interface and production documentation, Putty etc.

On that point, all the PDF and HTML documentation is available in the exam, but I would highly recommend you don’t rely on it, because accessing the doco and searching/scrolling for things is very slow, at least it was for me.

I had numerous occasions where the screen would totally freeze which was a concern, but I soon accepted this was a latency issue, and the lab was fine, and waited out the freezes (which varied from a few seconds to around 20 seconds, which feels like hours when your against the clock!)

I have heard from numerous other VCAP-CIA who sat the exam in the Australia/NZ region that they experienced the same issues, so if you are A/NZ based, or any location a long way from the USA, be prepared for this.

Now being a live lab, the exam is not scored on the spot, and you have to wait for VMware to score the exam and then you will receive an electronic score report via email. The exam receipt says 15 business days, but I was very impressed that less than 24 hours after sitting the exam, I got my score report. Obviously VMware education have done a great job in automating the scoring process, which is a credit to them!

Overall, the experience of the VCAP-CIA was very good, the exam/questions are a solid test of vCloud related skills and experience, so great work VMware Education!

I am very pleased to have completed this exam and all prerequisites for VCDX-Cloud (VCP-Cloud, VCAP-CID and VCAP-CIA) and I will be submitting my application in the near future.

My VCAP5-CID (Cloud Infrastructure Design) Exam Experience

Yesterday (17th December 2013) I sat and passed my VMware Advanced Certified Professional 5 – Cloud Infrastructure Design exam, a.k.a VCAP5-CID.

Having sat 4 other VCAP exams, including 3 design exams (DCD4,DCD5 & DTD5) I was confident on what to expect in regards to the exam format, the visio style design tool and the fact that time management has always been key.

So the exam is (as per the blueprint which can be found here)

115 Questions including a mix of multiple-choice, drag-and-drop items and specialized design items

195 Minutes
So lets break this down a bit, 195 mins divide 115 questions is 1.6 mins (or 100 seconds) per question, that’s not a lot when you have 6 x visio style designs to create which can take 5-10 mins each.

So this brings me straight to the first Tip.

Tip # 1 – Time Management

As of yesterday you still cannot go back and review previous questions/answers, so you must move through the exam to be able get to & answer the valuable visio style design and also the drag/drop questions.

Allow for 5-10 mins per Visio style question (These count big on the score, DO NOT RUSH THEM!!)
Allow for 2-5 mins per Drag and Drop style question (maybe 10 in the exam)
Multiple Choice questions you should spent between 20-45 seconds on maximum – If you don’t know the answer, have an educated guess and move on, its not the end of the world if you get some multiple choice questions wrong.

I must say I always like getting visio questions early on, as these are well known to make up a significant part of the score (~50%) and I don’t like being in a position where I have to rush something I know is important.

In this case, my visio style questions where spread evenly throughout the exam, and the last of the 6 was in the last 10 questions, so make sure you manage your time so you can get to, and hopefully answer correctly ALL the visio style questions.

Tip # 2 – Know the Blueprint (properly!)

I found quite a few things I glossed over in the blueprint were covered fairly well in the exam so be prepared to be tested on a wide range of vCloud related topics.

So while you may have good experience in designing vCloud Environments, if you don’t for example work for a service provider, you may have not had much (or any) experience with Chargeback, but this is a part of a vCloud solution and is rightly covered on the exam.

These types of things may catch you off guard, at the depth of some of the questions, but hey, this is a VCAP level exam, not VCP level, so its no meant to be easy.

Tip # 3 – Create a Study Group

I’ll be honest, I felt I had a pretty good preparation for the exam, albeit with some significant distractions in my personal life, and this was because I worked in a study group with two great guys (@Grantorchard & @wheatcloud), who have years of industry experience which made for excellent debates throughout the study process.

Working in a study group is what I credit at least some of my being able to successfully achieve VCDX on the first attempt. In this case, it helped me identify my own weaknesses (yes even VCDXs have weaknesses!) so I could brush up on those areas.

So get a group of people together and work towards VCAP-CID over weeks or months depending on your groups level  of experience.

Tip # 4 – Whiteboard vCloud Solutions

I would recommend for anyone taking the VCAP-CID (or in fact the VCAP-DCD or VCAP-DTD) spend some time on a whiteboard, drawing things like

1. vApp / OrgVDC and External Networking
2. Highly available Chargeback solutions
3. vSphere to Provider VDC to OrgVDC solutions

Get the study group take turns to pose scenarios for one group member to whiteboard a possible solution and discuss what is drawn and the pros/cons and if the solution meets the requirements or not. This will help you practice turning scenarios into diagrams, which you need to be able to do quickly in the exam or you risk running out of time.

General Comments

Overall I would say the VCAP-CID was the least refined VMware exam I have sat, and in fairness this is probably due to the exam being quite new, and im sure a much lower number of participants than other VCAP exams like DCD and DCA.

I spoke with the team who develop the exam and they were very pleased to get feedback on the exam, and much to there credit, acknowledged that most of my feedback was at least in part justified. I hope my feedback will help make the VCAP-CID a better exam, like the rest of the VCAPs.

I found the visio style design tool in at least one case, could not do what I was trying to due which may be a bug with the tool or similar, but this I believe prevented me from completing the question & potentially scoring higher.

I found quite a number of questions (both visio style , drag/drop and multiple choice) appeared (and I say appeared as you don’t have time to re-read every question 5 times to clarify the question) not to have sufficient information to choose between say Option A and Option B – which led to my having to make an assumption, or simply guess.

I think as more and more people sit the exam, as long as feedback is captured by as many participants as possible, the exam could quickly be brought up to the high standard of the other VCAP exams.

While this exam was not the best exam experience I’ve had, I would still recommend anyone who is involved with architecture of vCloud solutions to challenge yourself, prepare for and sit this exam.

vCloud will be around for many years to come, and over time vCAC will creep into the exam, or maybe have its own exam, but there is plenty of value testing your skills and certifying your advanced level knowledge of a major VMware product.

If you are up for the challenge, Best of luck with your VCAP-CID preparations and exam!