Advanced Storage Performance Monitoring with Nutanix

Nutanix provides excellent performance monitoring and analytic capabilities through our HTML 5 based PRISM UI, but what if you want to delve deeper into the performance of a specific business critical application?

Nutanix also provides advanced storage performance monitoring and workload profiling through port 2009 on any CVM which shows very granular details for Virtual disks.

By default, Nutanix secures our CVM and the http://CVM_IP:2009 page is not accessible, but for advanced troubleshooting this can be enabled by using the following command.

sudo iptables -t filter -A WORLDLIST -p tcp -m tcp –dport 2009 -j ACCEPT

 

When accessing the 2009 page (which is part of the Nutanix process called “Stargate”) you will see things like Extent (In Memory Read) cache usages and hits as well as much more.

On the main 2009 page you will see a section called “Hosted VDisks” (shown below) which shows all the current VDisks (equivalent of a VMDK in ESXi) which are currently running on that node.

HostedvDisks

 

The Hosted VDisks shows high level details about the VDisk such as Outstanding Operations, capacity usage, Read/Write breakdown and how much data is in the OpLog (Persistent Write Cache).

If you need more information, you can click on the “VDisk Id” and you will get to a page titled “VDisk XXXXX Stats” where the XXXXX is the VDisk ID.

The below is some of the information which can be discovered in the VDisk Stats Page.

VDisk Working Set Size (WWS)

The working set size can be thought of as the data which you would ideally want to fit within the SSD tier of a Nutanix node, which would result in all-flash type performance.

In the below example, in the last 2mins, the VDisk had a combined (or Union) working set of 6.208GB and over the last 1hr over 111GB.

WSSExchange

 

 

VDisk Read Source

The Read Source is simply what tier of storage is servicing the VDisks IO requests. In the below example, 41% was from Extent Cache (In Memory), 7% was from the SSD Extent Store and 52% was from the SATA Extent Store.
ReadSource

 

In the above example, this was an Exchange 2013 workload where the total dataset was approx 5x the size of the SSD tier. The important point here is its not always possible to have all data in the SSD tier, but its critical to ensure consistent performance. If 90% was being served from SATA and performance was not acceptable, you could use this information to select a better node to migrate (vMotion) the VM too, or help choose to purchase a new node.

VDisk Write Destination

The Write Destination is fairly self explanatory, if its Oplog it means its Random IO and its being written to SSD, if its straight to the extent store (SSD) it means the IO is either sequential, OR in rare cases the OpLog is being bypassed if the SSD tier reached 95% full (which is generally prevented by Nutanix ILM tiering process).

WriteDestination

VDisk Write Size Distribution

The Write Size Distribution is key to determining things like the Windows Allocation Size when formatting drives as well as understanding the workload.

WriteSizeOverall

VDisk Read Size Distribution

The Read Size Distribution is similar to Write Size in that its key to determining things like the Windows Allocation Size when formatting drives as well as understanding the workload. In this case, a 64k allocation size would be ideal as both the Write (shown above) and the Read (below) are >32K and <64K 86% of the time. (Which is expected as this was an Exchange 2013 workload).

ReadSizeExchange

VDisk Write Latency

The Write Latency shows the percentage of Write I/O which are serviced within the latency ranges shown. In this case, 52% of writes are sub-millisecond. It also shows for this vDisk 1% of IO being outliers being served between 5-10ms. This is something that outside of a lab, if the outliers were a significant percentage that could be investigated to ensure the VM disk configuration (e.g.: PVSCSI and number of VMDKs) is optimal.

WriteLatency

VDisk Ops and Randomness

Here we see the number of IOPS, the Read/Write split, MB/s and the split between Random and Sequential.

vDisksOps

Summary

For any enterprise grade storage solution, it is important that performance monitoring be easy as it is with Nutanix via PRISM UI, but also to be able to quickly and easily dive deep into very granular details about a specific VM or VDisk. The above shows just a glimpse of the information which is tracked by default for all VDisks allowing customers , partners and Nutanix support to quickly and easily monitor & profile workloads.

Importantly these capabilities are hypervisor agnostic giving customers the same capabilities no matter what choice/s they make.

 

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

3. Advanced Storage Performance Monitoring with Nutanix

4. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

5. Nutanix – Erasure Coding (EC-X) Deep Dive

6. Acropolis: VM High Availability (HA)

7. Acropolis: Scalability

8. NOS & Hypervisor Upgrade Resiliency in PRISM

Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

As cluster sizes increase, it is important to understand the chance of multiple concurrent failure also increases and to architect solutions to ensure resiliency is maintained.

Because scalability is one of many strengths of the Nutanix Distributed Storage Fabric, Nutanix supported multiple data protection levels (RF2 and RF3) to ensure resiliency could be scaled with cluster size.

However using RF3 results in reducing the usable capacity to approximately 33% of the formatted capacity of the drives within the cluster which means it is sometimes considered undesirable.

But because some customers require the ability to support multiple concurrent node failures without the chance of data loss or unavailability, RF3 has been required.

Enter Nutanix Erasure Coding (EC-X)!

Now lets say you have a 32 node cluster where each node has 10TB RAW.

With RF3 we would have approx 3.33TB usable per node for a total of 106.56TB in the cluster.

With EC-X enabled (assuming EC-X has been applied to all data) the usable capacity would DOUBLE to 6.66TB per node and 213.12TB for the cluster.

Here’s how it works.

For RF3, the Nutanix Distributed Storage Fabric writes and maintains three copies of each piece of data. The below shows three copies of data “A” and “B”.

RF3

The below is a simplified example of what the Nutanix Distributed Storage Fabric looks like once EC-X is applied to RF3 data.

RF3plusECX

As you can see, we now support twice the amount of data as RF3 while still having dual parity. As a result, using RF3 + EC-X gives customers using large clusters MORE usable capacity than RF2 (~50% of RAW) while providing dual parity (which enables the loss of two nodes without data loss/unavailability).

Not bad for a software only upgrade!

So what do I recommend customers who are running 32 node or larger clusters?

1. For customers running RF3 already, Consider enabling EC-X.
2. For customers running RF2, consider enabling RF3 and EC-X

Acropolis: VM High Availability (HA)

This past week at Nutanix .NEXT, Acropolis was officially announced although it has actually been available and running in many customer environments (1200+ nodes globally) for a long time.

One of the new features is VM High Availability.

As with everything Nutanix, VM HA is a very simple yet effective feature. Let’s go through how to configure HA via the Acropolis/PRISM HTML 5 interface.

As shown below, using the “Options” menu represented by the cog, there is an option called “Manage VM High Availability”.

HAMenu

The Manage VM High Availability has 2 simple options shown below:

1. Enable VM High Availability (On/Off)
2: Best Effort / Reserve Space

Best Effort works as you might expect where in the event of a node failure, VMs are powered on throughout the cluster if resources are available.

In the event resources e.g.: Memory, are not available then some/all VMs may not be powered on.

HAonBestEffort

Reserve Space also works as you might expect by reserving enough compute capacity within the cluster to tolerate either one or two node failures. If RF2 is configured then one node is reserved and if RF3 is in use, two nodes are reserved.

Pretty simple right!

HAonReserveSpace

The best part about Reserve Space is its like “Host failures cluster tolerates” in vSphere, however without using the potentially inefficient slot size algorithm.

Once HA is enabled, it appears on the Home screen of PRISM and gives a summary of the VMs which are On,Off and Suspended as shown below.

HAHomeScreen

HA can also be enabled/disabled on a per VM basis via the VMs tab. Simply highlight the VM and click “Update” as shown below.

VMHAupdate

Then you will see the “Update VM” popup appear. Then simply Enable HA.

VMHA

In the above screenshot you can see that the popup also warns you if HA is disabled at the cluster level and allows you to jump straight to the Manage VM High Availability configuration menu.

So there you have it, Acropolis VM High Availability, simple as that.

Related Articles:

1. Acropolis: Scalability
2. What’s .NEXT? – Acropolis!
3. What’s .NEXT? – Erasure Coding!