Nutanix Scalability – Part 5 – Scaling Storage Performance for Physical Machines

Posted on June 26, 2018 by Josh Odgers

Part 3 and Part 4 has taught us that Nutanix provides excellent scalability for Virtual Machines and provides ABS and Volume Group Load Balancer (VG LB) for niche workloads which may require more performance than a single node can provide.

Now that we’ve learned how to scale a Virtual machines performance, let’s see how the same rules apply to physical servers.

So you’ve got your physical server and a Nutanix cluster, now what?

As Part 3 and Part 4 explained, more virtual disks increase the storage performance for virtual machine, the same is true for physical servers using ABS.

Virtual disks will be presented to the physical server via iSCSI (ABS), for optimal performance you should have at least one virtual disk per node in your cluster. The reason for this is each vDisk is managed by a stargate (Nutanix IO engine) instance which runs in every Controller VM (CVM).

If you have a four node cluster, you need to use at least four virtual disks to utilise the four node cluster optimally. For an eight node cluster, eight or more virtual disks is required to ensure all CVMs (stargate instances) can actively provide a boost in performance.

The following tweet shows how the pathing increased from four on the four node cluster and when an additional fours node were added the pathing dynamically changed to use all eight nodes.

How many #Nutanix CVMs service a single bare metal workload when using Acropolis Block Services?#HCI #FightTheFUD pic.twitter.com/yLUrSIaRYG

— Josh Odgers (@josh_odgers) August 7, 2016

Therefore when using ABS for physical workloads, especially those high end database servers, I recommend using a minimum of 8 vDisks however if your cluster size is greater than 8, match the number of vDisks with the cluster size as your starting point.

If you have an 8 node cluster, you could for example use 32 vDisks and these will spread evenly across the nodes, resulting in four per stargate instance which is perfectly fine.

Using more vDisks than your current cluster size also means when additional nodes are added, ABS can dynamically load balance the vDisks across the new and existing nodes to automatically scale your performance.

Let’s cover the same MS Exchange and MS SQL examples covered for Virtual Machines in Parts 3 and 4 but now specifically for physical servers using ABS.

Let’s say we have an MS Exchange server with 20 databases, the performance requirements for each database is typically in the range of hundred of IOPS, in which case I would recommend one virtual disk (e.g.: VMDK) per database and another for the logs.

In the case of a large MS SQL server which may require tens or hundreds of thousands of IOPS to a single database, I recommend using multiple vDisks per database which involves Splitting SQL datafiles across multiple VMDKs to optimise VM physical server performance.

Sound familiar? The above two paragraphs are literally a copy/paste from Part 3 because the exact same rules apply to physical servers and virtual machines. Simple right!

Still need more performance?

Again, the exact same rules apply to physical servers with ABS as they do to virtual machines. In no particular order, as we’ve learned from Part 3 & 4:

Increase the vCPU of the Nutanix Controller VM (CVM)
Increase the vRAM of the Nutanix Controller VM (CVM)
Add storage only nodes

Can’t get much easier than that!

Summary:

From Parts 3, 4 and 5 we have learned that Nutanix provides the ability to scale the performance of individual servers, be it physical or virtual using the same simple strategies of adding virtual disks, storage only nodes or Controller VM (CVM) resources and how doing so increases performance to meet virtually (pun intended) any performance requirements.

Is there any reason you couldn’t confidently say Nutanix is doing 3 tier better than the SAN vendors? I’d love to hear if you have any corner cases.

Back to the Scalability, Resiliency and Performance Index.

The All-Flash Array (AFA) is Obsolete!

Posted on July 25, 2016 by Josh Odgers

Over the last few years, I’ve had numerous customers ask about how Nutanix can support bare metal workloads. Up until recently, I haven’t had an answer the customers have wanted to hear.

As a result, some customers have been stuck using their exisiting SAN or worse still being forced to go out and buy a new SAN.

As a result many customers who have wanted to use or have already deployed hyperconverged infrastructure (HCI) for all other workloads are stuck managing an all flash array silo to service some bare metal workloads.

In June at .NEXT 2016, Nutanix announced Acropolis Block Services (ABS) which now allows bare metal workloads to be serviced by new or existing Nutanix clusters.

As Nutanix has both hybrid (SSD+SATA) and all-flash nodes, customers can chose the right node type/s for their workloads and present the storage externally for bare metal workloads while also supporting Virtual Machines and Acropolis File Services (AFS) and containers.

So why would anyone buy an all-flash array? Let’s discuss a few scenarios.

Scenario 1: Bare metal workloads

Firstly, what applications even need bare metal these days? This is an important question to ask yourself. Challenge the requirement for bare metal and see if the justifications are still valid and if so, has anything changed which would allow virtualization of the applications. But this is a topic for another post.

If a customer only needs new infrastructure for bare metal workloads, deploying Nutanix and ABS means they can start small and scale as required. This avoids one of the major pitfalls of having to size a monolithic centralised, dual controller storage array.

While some AFA vendors can/do allow for non-disruptive controller upgrades, it’s still not a very attractive proposition, nor is it quick or easy. and reduces resiliency during the process as one of two controllers are offline. Nutanix on the other hand performs one click rolling upgrades which mean the largest the cluster, the lower the impact of an upgrade as it is performed one node at a time without disruption and can also be done without risk of a subsequent failure taking storage offline.

If the environment will only ever be used for bare metal workloads, no problem. Acropolis Block Services offers all the advantages of an All Flash Array, with far superior flexibility, scalability and simplicity.

Advantages:

Start small and scale granularly as required allowing customers to take advantage of newer CPU/RAM/Flash technologies more frequently
Scale performance and capacity by adding node/s
Scale capacity only with storage-only nodes (which come in all flash)
Automatically scale multi-pathing as the cluster expands
Solution can support future workloads including multiple hypervisors / VMs / file services & containers without creating a silo
You can use Hybrid nodes to save cost while delivering All Flash performance for workloads which require it by using VM flash pinning which ensures all data is stored in flash and can be specified on a per disk basis.
The same ability as an all flash array to only add compute nodes.

Disadvantages:

Your all-flash array vendor reps will hound you.

Scenario 2: Mixed workloads inc VMs and bare metal

As with scenario 1, deploying Nutanix and ABS means customers can start small and scale as required. This again avoids the major pitfall of having to size a monolithic centralised, dual controller storage array and eliminates the need for separate environments.

Virtual machines can run on compute+storage nodes while bare metal workloads can have storage presented by all nodes within the cluster, including storage-only nodes. For those who are concerned about (potential but unlikely) noisy neighbour situations, specific nodes can also be specified while maintaining all the advantages of Nutanix one-click, non-disruptive upgrades.

Advantages:

Start small and scale granularly as required allowing customers to take advantage of newer CPU/RAM/Flash technologies more frequently
Scale performance and capacity by adding node/s
Scale capacity only with storage-only nodes (which also come in all flash)
Automatically scale multi-pathing for bare metal workloads as the cluster expands
Solution can support future workloads including multiple Hypervisors / VMs / file services & containers without creating a silo.

Disadvantages:

Your All-Flash array vendor reps will hound you.

What are the remaining advantages of using an all flash array?

In all seriousness, I can’t think of any but for fun let’s cover a few areas you can expect all-flash array vendors to argue.

Performance

Ah the age old appendage measuring contest. I have written about this topic many times, including in one of my most popular posts “Peak performances vs Real world performance“.

The fact is, every storage product has limits, even all-flash arrays and Nutanix. The major difference is that Nutanix limits are per cluster rather than per Dual Controller Pair, and Nutanix can continue to scale the number of nodes in a cluster and continue to increase performance. So if ultimate performance is actually required, Nutanix can continue to scale to meet any performance/capacity requirements.

In fact, with ABS the limit for performance is not even at the cluster layer as multiple clusters can provide storage to the same bare metal server/s while maintaining single pane of glass management through PRISM Central.

I recently completed some testing with where I demonstrated the performance advantage of storage only nodes for virtual machines as well as how storage-only nodes improve performance for bare metal servers using Acropolis Block Services which I will be publishing results for in the near future.

Data Reduction

Nutanix has had support for deduplication, compression for a long time and introduced Erasure Coding (EC-X) mid 2015. Each of these technologies are supported when using Acropolis Block Services (ABS).

As a result, when comparing data reduction with all-flash array vendors, while the implementation of these data reduction technologies varies between vendors, they all achieves similar data reduction ratios when applied to the same dataset.

Beware of some vendors who include things like backups in their deduplication or data reduction ratios, this is very misleading and most vendors have the same capabilities. For more information on this see: Deduplication ratios – What should be included in the reported ratio?

Cost

Here we should think about what are the age old problems are with centralized shared storage (like AFAs)? Things like choosing the right controllers and the fact when you add more capacity to the storage, you’re not (or at least rarely) scaling the controller/s at the same time come to mind immediately.

With Nutanix and Acropolis Block Services you can start your All Flash solution with three nodes which means a low capital expenditure (CAPEX) and then scale either linearly (with the same node types) or non-linearly (with mixed types or storage only nodes) as you need to without having to rip and replace (e.g.: SAN controller head swaps).

Starting small and scaling as required also allows you to take advantage of newer technologies such as newer Intel chipsets and NVMe/3D XPoint to get better value for your money.

Starting small and scaling as required also minimizes – if not eliminates – the risk of oversizing and avoids unnecessary operational expenses (OPEX) such as rack space, power, cooling. This also reduces supporting infrastructure requirements such as networking.

Summary:

As shown below, the Nutanix Acropolis Distributed Storage Fabric (ADSF) can support almost any workload from VDI to mixed server workloads, file, block , big data, business critical applications such as SAP / Oracle / Exchange / SQL and bare metal workloads without creating silos with point solutions.

In addition to supporting all these workloads, Nutanix ADSF scalability both from a capacity/performance and resiliency perspective ensures customers can start small and scale when required to meet their exact business needs without the guesswork.

With these capabilities, the All-Flash array is obsolete.

I encourage everyone to share (constructively) your thoughts in the comments section.

Note: You must sign in to comment using WordPress, Facebook, LinkedIn or Twitter as Anonymous comments will not be approved,

Related Articles:

Things to consider when choosing infrastructure.
Scale out performance testing with Nutanix Storage Only Nodes
What’s .NEXT 2016 – Acropolis Block Services (ABS)
Scale out performance testing of bare metal workloads on Acropolis Block Services (Coming soon)
What’s .NEXT 2016 – Any node can be storage only
What’s .NEXT 2016 – All Flash Everywhere!

Nutanix AHV I/O path efficiency

Posted on December 18, 2015 by Josh Odgers

The I/O path in AHV is unlike other hypervisors and is remarkably simple. Each VM is made up of one or more vDisks, with each vDisk presented directly to the VM via iSCSI. vDisks appear to the guest OS as if they were a physical disk or the same as a VMDK does in vSphere environments and do not require any special in guest configuration.

The I/O path for each vDisk bypasses the underlying QEMU storage stack and has a direct TCP connection to the iSCSI target on the local Controller VM. This bypasses any/all queues at the hypervisor layer and allows Stargate to manage the one and only queue.

Importantly, every single vDisk has its own TCP connection to stargate which means vdisks do not share any queues until they hit the storage controller (stargate). This reduces points of contention to Stargate itself and as every AHV node runs a stargate instance (within the CVM), only VMs on the same node share the queue for stargate, further reducing the chances of contention.

For those of you who are not familiar with the underlying Nutanix architecture, check out the below video describing what stargate does.

Because the vDisk is presented as a LUN via iSCSI the commands being sent do not require SCSI protocol emulation and simply send native SCSI commands.

The below diagram shows a VM with 3 vDisks and how they connect to Stargate. You will note QEMU is completely bypassed which optimises the I/O path.

If a Virtual machine has more than 3 vDisks, each additional vDisk will have its own TCP connection.

In the event the local Stargate instance is offline for any reason (e.g.: Rolling One-Click upgrade or CVM failure) each TCP connection will be redirected in a round robin manner across all the CVMs within the Nutanix cluster as described in Acropolis Hypervisor (AHV) I/O Failover & Load Balancing.

2. Advanced Storage Performance Monitoring with Nutanix

3. Why AHV is the next generation hypervisor – 10 Part Series

CloudXC

By Josh Odgers – VMware Certified Design Expert (VCDX) #90

Tag Archives: iSCSI

Nutanix Scalability – Part 5 – Scaling Storage Performance for Physical Machines

The All-Flash Array (AFA) is Obsolete!

Nutanix AHV I/O path efficiency

Share this:

Share this:

Share this: