Nutanix Sizing for Resiliency w/ Design Brews

For those of you who are not aware, “DesignBrews” has been around for a long time and although it’s an unofficial sizing tool it’s one I use regularly and recommend.

I spoke with it’s creator Avinash Shetty recently and asked him to add the option to size with N+1 and N+2 as this is something EVERYONE should be doing.

Now DesignBrews has the option to “Plan for” N+1 and N+2 as shown below.

DesignBrewsNPlus

We will of course ignore the option for N+0 as nobody should ever size for N+0.

Sizing 101

If you’re using RF2, size for N+1 as a minimum and use N+2 if you want the ability for the environment to tolerate one full node failure, fully self heal and then be able to tolerate a subsequent failure. This is more common than you may think as the cost of an additional node is insignificant compared to the cost to the business of an outage and the extra node gives significantly more resiliency and even adds performance. Win/Win in my book!

For cluster sizes above 24, I recommend N+2 even if RF2 is used although RF3 is something worth considering at this scale.

If you’re using RF3, size for N+2 as a minimum.

Note: If you want RF3 and the ability to self heal from a dual node failure (either concurrent or subsequent failures) then you need a seven node cluster. If you only wish to self heal from a single node failure, six nodes is an option.

FAQ: Do I ever need more than N+2?

Short answer is YES, in the following scenarios:

If you’re using Block awareness with RF2 and your physical hardware is 4 nodes per block, if you want to be able to tolerate a block failure AND fully self heal you will need N+4 . as a block failure results in the loss of four nodes so you need the available capacity of up to four nodes to be able to fully recover from that.

If you’re using 2 nodes per block, then N+2 will suffice EXCEPT…..

If you’re using Block Awareness with RF3, then you should size for:

N+4 when using Two nodes per block, and N+8 when using Four nodes per block.

Block awareness with RF3 is an extremely resilient configuration (keeping in mind RF2 delivers around 99.999% measured across the Nutanix install base for planned/unplanned outages) and rarely required, but for life/death environments, it’s something worth considering.

For more information on the Resiliency of Nutanix, see my Scalability, Resiliency and Performance blog series which covers many scenarios such as the above.