Unlimited VMs per datastore? It's not a myth with Nutanix!

Over the years, I have been asked countless times how many VMs can (or should) be placed in one datastore.

In fact, just this morning I was asked this same question, and I decided to whip up a quick post.

I have previously posted an Example Architectural Decision relating to Datastore sizing for Block based storage. That example aimed to show how things like RPO/RTO and performance should be taken into consideration when choosing a datastore size.

The above example is not a hard and fast rule, but an example of one deployment which I was involved in.

There is a great article on this topic by VCDX Jason Boche (@jasonboche), titled “VAAI and the Unlimited VMs per Datastore Urban Myth”, which covers the topic in great detail as it relates to block based storage, being iSCSI, FC & FCoE.

But what about NFS, and what about with Hyper-converged solutions like Nutanix?

NFS has gained significant popularity in recent years, and in my opinion, people who know what they are talking about no longer refer to NFS as “Tier 3 Storage”, which was once common.

With traditional storage solutions, generally only a small number of controllers can actively serve IO to any one NFS mount. In my experience, the limiting factor preventing more virtual machines per NFS mount was therefore performance, although things like RPO/RTO were and still are important considerations.

NFS does not suffer from SCSI reservations, which resulted in increased latency and which VAAI, specifically the Atomic Test & Set (ATS) primitive, helped to all but eliminate for block based datastores.

LUNs are limited by their queue depth, which in most cases is 32 (sometimes 64). This is also a limiting factor, as all the VMs in a datastore (LUN) share the same queue, which can lead to contention. SIOC helps manage the contention by ensuring fairness based on share values, but it does not solve the issue.
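To put rough numbers on the shared queue, here is a minimal sketch in Python. It assumes every VM on the LUN is equally busy and simply divides the queue depth by the VM count, so it is a back-of-the-envelope model and not how ESXi actually schedules IO. The function name `average_slots_per_vm` is made up for this illustration.

```python
def average_slots_per_vm(queue_depth: int, vm_count: int) -> float:
    """Average queue slots per VM, assuming all VMs are equally busy (a simplification)."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return queue_depth / vm_count


# A LUN with the common queue depth of 32, shared by a growing number of VMs.
for vms in (8, 16, 32, 64):
    slots = average_slots_per_vm(32, vms)
    print(f"{vms:>2} VMs sharing a queue depth of 32: {slots:.2f} slots per VM")
```

Once the VM count passes the queue depth, each VM has less than one outstanding IO slot on average, which is where the contention comes from.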

NFS, on the other hand, has a much larger queue depth. In fact, it's basically unlimited, as shown below.

[Screenshot: NFS queue depth]

So as NFS does not suffer from SCSI reservations or queue depth issues, what is stopping us from having hundreds or more VMs per datastore?

It comes down to how many active storage controllers are able to service the NFS mount, and the performance of the storage controller/s. In addition to this, there are your business requirements around RPO/RTO. In other words, if an NFS mount is lost, how quickly can you recover?

Most traditional shared storage products:

1. Have only 1 or 2 active controllers, thus potentially limiting performance, which would lead to fewer VMs per NFS datastore.

2. Do snapshots at the NFS mount layer, so if you need to recover an entire NFS mount, the larger it is, the longer it may take.

For Nutanix, by default, NFS is used to present the Nutanix Distributed File System (NDFS) to vSphere. However, the key difference between Nutanix and traditional shared storage is that every controller in the Nutanix cluster can and does actively serve IO to any datastore in the cluster concurrently.

So the limit from a performance perspective is gone thanks to the Nutanix scale out, shared nothing architecture, with one virtual storage controller (CVM) per Nutanix node. The number of nodes that can be scaled to is also unlimited. An example of Nutanix's ability to scale can be found here – Scaling to 1 million IOPS and beyond, Linearly!

Next, what about the RPO/RTO issue? Well, Nutanix does not rely on LUNs or NFS mounts for data protection (or snapshots). This is all done at the VM layer, so your RPO/RTO is now per VM, which gives you much more flexibility.

With Nutanix, you can literally run hundreds or even thousands of VMs per NFS datastore, without performance or RPO/RTO problems thanks to scale out, shared nothing architecture and the Nutanix Distributed File System.

There are some reasons why you may choose to have multiple NFS datastores even in a Nutanix environment. One is if you want to enable Compression and/or De-duplication, which are enabled/disabled at a per container (or datastore) level. As some workloads don’t compress or dedupe well, these types of workloads should be excluded to reduce the overhead on the cluster.

It is important to note that Nutanix uses a concept called a “Storage Pool”, which contains all the storage for the Nutanix cluster. On top of a “Storage Pool” you create “Containers” (or datastores). This means that regardless of whether you have 1 or 100 datastores, they all still sit on top of the one “Storage Pool”, so you still have access to the same amount of storage capacity, with no silos, for maximum capacity utilization (and performance!).
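The relationship between the pool and its containers can be sketched as a small Python model. This is purely illustrative and is not a Nutanix API: the class names, field names and capacity figures are all made up for the example, and it ignores any per-container settings beyond the compression and de-duplication switches mentioned above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    """A container (datastore); compression/dedupe are toggled per container."""
    name: str
    compression: bool = False
    deduplication: bool = False


@dataclass
class StoragePool:
    """One pool holding all capacity for the cluster (hypothetical model)."""
    capacity_tb: float
    used_tb: float = 0.0
    containers: List[Container] = field(default_factory=list)

    def add_container(self, name: str, compression: bool = False,
                      deduplication: bool = False) -> Container:
        container = Container(name, compression, deduplication)
        self.containers.append(container)
        return container

    def free_tb_visible_to(self, container: Container) -> float:
        # Every container draws on the same pool, so the answer does not
        # depend on which container is asking.
        return self.capacity_tb - self.used_tb


pool = StoragePool(capacity_tb=100.0, used_tb=40.0)
general = pool.add_container("general", compression=True, deduplication=True)
databases = pool.add_container("databases")  # workload that compresses poorly
print(pool.free_tb_visible_to(general), pool.free_tb_visible_to(databases))
```

Adding a second container changes which data services apply to which workloads, but it does not carve the capacity into silos.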

Lastly, Nutanix does not suffer from the same availability concerns as traditional shared storage, where a single LUN could potentially be lost. This is due to the distributed architecture of the Nutanix solution. For more information on how Nutanix is more highly available than traditional shared storage, check out “Scale out, Shared Nothing Architecture Resiliency by Nutanix”.

Check out a screenshot of one cluster with ~800 VMs on a single datastore. Note the sub-millisecond latency and 14K IOPS with ~900MBps throughput. Not bad!

[Screenshot: ~800 VMs on a single datastore]

My VCAP5-CID (Cloud Infrastructure Design) Exam Experience

Yesterday (17th December 2013) I sat and passed my VMware Certified Advanced Professional 5 – Cloud Infrastructure Design exam, a.k.a. VCAP5-CID.

Having sat 4 other VCAP exams, including 3 design exams (DCD4, DCD5 & DTD5), I was confident about what to expect in regard to the exam format, the Visio style design tool and the fact that time management has always been key.

So the exam is (as per the blueprint which can be found here):

115 Questions including a mix of multiple-choice, drag-and-drop items and specialized design items

195 Minutes

So let's break this down a bit. 195 mins divided by 115 questions is roughly 1.7 mins (about 100 seconds) per question. That’s not a lot when you have 6 x Visio style designs to create, which can take 5-10 mins each.

So this brings me straight to the first Tip.

Tip # 1 – Time Management

As of yesterday you still cannot go back and review previous questions/answers, so you must keep moving through the exam to be able to get to & answer the valuable Visio style design questions and also the drag/drop questions.

Allow for 5-10 mins per Visio style question (These count big on the score, DO NOT RUSH THEM!!)
Allow for 2-5 mins per Drag and Drop style question (maybe 10 in the exam)
Spend between 20-45 seconds maximum on each Multiple Choice question – If you don’t know the answer, make an educated guess and move on. It's not the end of the world if you get some multiple choice questions wrong.
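To sanity check those allowances, here is a rough time budget in Python. It uses the figures from this post (195 minutes, 115 questions, 6 Visio style questions) and assumes exactly 10 drag and drop questions, which is only my estimate from above. The function name is made up for the example.

```python
TOTAL_MINUTES = 195
TOTAL_QUESTIONS = 115
VISIO_QUESTIONS = 6
DRAG_DROP_QUESTIONS = 10  # assumption: the "maybe 10" estimate from the list above


def multiple_choice_seconds(visio_minutes: float, drag_drop_minutes: float) -> float:
    """Seconds left per multiple choice question after the design and drag/drop items."""
    used = VISIO_QUESTIONS * visio_minutes + DRAG_DROP_QUESTIONS * drag_drop_minutes
    remaining_questions = TOTAL_QUESTIONS - VISIO_QUESTIONS - DRAG_DROP_QUESTIONS
    return (TOTAL_MINUTES - used) * 60 / remaining_questions


average_seconds = TOTAL_MINUTES * 60 / TOTAL_QUESTIONS
fast_case = multiple_choice_seconds(visio_minutes=5, drag_drop_minutes=2)
slow_case = multiple_choice_seconds(visio_minutes=10, drag_drop_minutes=5)

print(f"Average across all questions: {average_seconds:.0f} seconds")
print(f"Multiple choice, fast on design items: {fast_case:.0f} seconds each")
print(f"Multiple choice, slow on design items: {slow_case:.0f} seconds each")
```

Under these assumptions, even taking the full 10 mins per Visio style question and 5 mins per drag and drop leaves a little over 50 seconds per multiple choice question, so a 20-45 second target keeps a small buffer.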

I must say I always like getting Visio questions early on, as these are well known to make up a significant part of the score (~50%) and I don’t like being in a position where I have to rush something I know is important.

In this case, my Visio style questions were spread evenly throughout the exam, and the last of the 6 was in the last 10 questions, so make sure you manage your time so you can get to, and hopefully answer correctly, ALL the Visio style questions.

Tip # 2 – Know the Blueprint (properly!)

I found quite a few things I glossed over in the blueprint were covered fairly well in the exam so be prepared to be tested on a wide range of vCloud related topics.

So while you may have good experience in designing vCloud Environments, if you don’t for example work for a service provider, you may have not had much (or any) experience with Chargeback, but this is a part of a vCloud solution and is rightly covered on the exam.

The depth of some of these questions may catch you off guard, but hey, this is a VCAP level exam, not VCP level, so it's not meant to be easy.

Tip # 3 – Create a Study Group

I’ll be honest, I felt I had a pretty good preparation for the exam, albeit with some significant distractions in my personal life, and this was because I worked in a study group with two great guys (@Grantorchard & @wheatcloud), who have years of industry experience which made for excellent debates throughout the study process.

I credit working in a study group for at least some of my success in achieving VCDX on the first attempt. In this case, it helped me identify my own weaknesses (yes even VCDXs have weaknesses!) so I could brush up on those areas.

So get a group of people together and work towards VCAP-CID over weeks or months, depending on your group's level of experience.

Tip # 4 – Whiteboard vCloud Solutions

I would recommend that anyone taking the VCAP-CID (or in fact the VCAP-DCD or VCAP-DTD) spend some time at a whiteboard, drawing things like:

1. vApp / OrgVDC and External Networking
2. Highly available Chargeback solutions
3. vSphere to Provider VDC to OrgVDC solutions

Have the study group take turns posing scenarios for one group member to whiteboard a possible solution, then discuss what is drawn, the pros/cons, and whether the solution meets the requirements or not. This will help you practice turning scenarios into diagrams, which you need to be able to do quickly in the exam or you risk running out of time.

General Comments

Overall I would say the VCAP-CID was the least refined VMware exam I have sat, and in fairness this is probably due to the exam being quite new and, I'm sure, having a much lower number of participants than other VCAP exams like DCD and DCA.

I spoke with the team who develop the exam and they were very pleased to get feedback on it and, much to their credit, acknowledged that most of my feedback was at least in part justified. I hope my feedback will help make the VCAP-CID a better exam, like the rest of the VCAPs.

I found that in at least one case the Visio style design tool could not do what I was trying to do, which may be a bug with the tool or similar. I believe this prevented me from completing the question & potentially scoring higher.

I found that quite a number of questions (Visio style, drag/drop and multiple choice) appeared not to have sufficient information to choose between, say, Option A and Option B (and I say appeared, as you don’t have time to re-read every question 5 times to clarify it). This led to my having to make an assumption, or simply guess.

I think as more and more people sit the exam, as long as feedback is captured by as many participants as possible, the exam could quickly be brought up to the high standard of the other VCAP exams.

While this exam was not the best exam experience I’ve had, I would still recommend that anyone involved with the architecture of vCloud solutions challenge themselves, prepare for and sit this exam.

vCloud will be around for many years to come, and over time vCAC will creep into the exam, or maybe have its own exam, but there is plenty of value testing your skills and certifying your advanced level knowledge of a major VMware product.

If you are up for the challenge, best of luck with your VCAP-CID preparations and exam!

 

New FREE eLearning: VMware Horizon Workspace Fundamentals [V1.5]

I have always been impressed with the quality and quantity of free eLearning material that VMware provides, and they have just released eLearning: VMware Horizon Workspace Fundamentals [V1.5]

VMware Horizon Workspace is, in my opinion, a great product to complement the EUC vision from VMware, and as such I try to keep up to date with all things Horizon.

The course is a great way to get across the Horizon Workspace product. It is self-paced and is estimated to take around 2 hours.

The course can be found here – VMware Horizon Workspace Fundamentals [V1.5]

Below is the official overview from VMware.

Self-Paced (2 Hours)

Overview:
The Horizon Workspace Fundamentals free eLearning course will provide you with a fundamental understanding of how to install, configure, and use VMware Horizon Workspace.

In Module 1, you will learn how VMware Horizon Workspace works, the key industry challenges it solves, and you are provided with an overview of the Horizon Workspace end user interface and the Administrator Web interface.

In Module 2, you will learn about the Horizon Workspace architecture and components.

In Module 3, you will learn the main Horizon Workspace installation and configuration tasks.

In Module 4, you will learn how to access the Horizon Workspace interfaces and how to work with the Administrator Web interface to manage Horizon Workspace modules, users, groups, catalog of resources, policies, reports, and settings.

In Module 5, you will learn how users can sign in from the Horizon Workspace Web client and install Horizon Workspace on the appropriate devices. In addition, you will learn how users access their Horizon Workspace applications and how they use Horizon Files to manage and share their files and folders.