How to view a VM's Active Working Set in PRISM

Knowing a virtual machine's Active Working Set is critical to ensuring all-flash performance in any hybrid storage solution (Flash + SAS or SATA).

Because this is so critical, Nutanix has long tracked this information via the hidden 2009 page. However, as this information has proven so popular, it has now been included in PRISM in the latest release, Nutanix Acropolis Base Version 4.5.

A virtual machine's active working set size can be viewed on a per-vdisk basis across all supported hypervisors, including ESXi, Hyper-V, KVM and the Acropolis Hypervisor (AHV).

To view this information, select “VM” from the “Home” menu in PRISM, as shown below:

Note: The following screenshots were taken from an environment running Acropolis Base Version 4.5 and Acropolis Hypervisor 20150921, but the same process applies to any hypervisor.

[Screenshot: PRISM “Home” menu with “VM” selected]

Next, highlight the virtual machine you wish to view details on. In the example below, VM “Jetstress01” has been highlighted.

[Screenshot: VM list with “Jetstress01” highlighted]

Below the VM list, you will see the VM summary. To view the working set size, select “Virtual Disks” and then the “Additional Stats” option, which shows the following display:

[Screenshot: “Virtual Disks” tab with “Additional Stats” showing per-vdisk working set sizes]

As we can see, the following information is displayed at per-vdisk granularity:

  1. Read / Write Latency
  2. Total IOPS
  3. Random IO percentage
  4. Read Throughput from Extent Cache / SSD and HDD
  5. Read Working set size
  6. Write Working set size
  7. Union Working set size

With the above information, it is easy to determine which node type and SSD capacity are most suitable for the virtual machine. I recommend that customers running business-critical applications check this out.
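As a rough illustration of that calculation, the sketch below sums the per-vdisk union working set sizes for a VM and picks the smallest node type whose SSD tier can hold it. Everything here is hypothetical: the vdisk figures, the node type names and capacities, and the 80% headroom factor are assumptions for illustration, and it simplifies by treating the whole SSD tier as available to this one VM. It is not a Nutanix sizing tool.

```python
from dataclasses import dataclass


@dataclass
class VDiskStats:
    """Hypothetical per-vdisk figures as read off the "Additional Stats" view (GiB)."""
    name: str
    read_wss_gib: float
    write_wss_gib: float
    union_wss_gib: float


def vm_working_set_gib(vdisks):
    """A VM's total working set: the sum of its per-vdisk union working sets."""
    return sum(v.union_wss_gib for v in vdisks)


def smallest_fitting_node(working_set_gib, node_ssd_gib, headroom=0.8):
    """Return the node type with the least SSD that still fits the working set.

    node_ssd_gib maps a hypothetical node type to its SSD capacity in GiB.
    headroom is an assumed safety factor so the working set never fills the
    tier. Returns None if no node type fits.
    """
    candidates = [
        (capacity, node)
        for node, capacity in node_ssd_gib.items()
        if working_set_gib <= capacity * headroom
    ]
    return min(candidates)[1] if candidates else None


vdisks = [
    VDiskStats("scsi0:0", read_wss_gib=20.0, write_wss_gib=5.0, union_wss_gib=22.0),
    VDiskStats("scsi0:1", read_wss_gib=150.0, write_wss_gib=60.0, union_wss_gib=180.0),
]
nodes = {"node-a": 200.0, "node-b": 400.0, "node-c": 800.0}
ws = vm_working_set_gib(vdisks)
print(ws, smallest_fitting_node(ws, nodes))  # prints: 202.0 node-b
```

The union working set is used rather than read plus write because the same blocks can be both read and written; summing across vdisks is safe because each vdisk is a separate address space.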

If “Read Source HDD” regularly shows throughput and performance is lower than desired, moving the VM to a node with a larger SSD capacity will help. Alternatively, if no nodes have a larger SSD tier, enabling in-line compression and/or Erasure Coding (EC-X) can increase the effective SSD tier capacity and allow a larger working set to be served from SSD.
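To show why compression and erasure coding help, here is a back-of-the-envelope calculation of how much logical data a given amount of raw SSD can hold. The model is deliberately simplified and is an assumption, not Nutanix's actual space accounting: replication factor 2 stores every byte twice, an erasure coding strip of (data, parity) blocks costs (data + parity) / data, and compression shrinks all data uniformly. The 800 GiB raw figure, the 1.5:1 compression ratio and the 4+1 strip are illustrative values only.

```python
def effective_ssd_capacity_gib(raw_ssd_gib, replication_factor=2,
                               ec_strip=None, compression_ratio=1.0):
    """Rough logical capacity of an SSD tier under a simplified model.

    - Without erasure coding, each logical byte is stored replication_factor times.
    - With erasure coding, ec_strip=(data, parity) gives an overhead of
      (data + parity) / data instead.
    - compression_ratio (e.g. 1.5 for 1.5:1) is assumed to apply uniformly.
    """
    if ec_strip is not None:
        data, parity = ec_strip
        overhead = (data + parity) / data
    else:
        overhead = replication_factor
    return raw_ssd_gib / overhead * compression_ratio


raw = 800.0  # hypothetical raw SSD capacity in GiB
print(effective_ssd_capacity_gib(raw))                          # 400.0
print(effective_ssd_capacity_gib(raw, compression_ratio=1.5))   # 600.0
print(effective_ssd_capacity_gib(raw, ec_strip=(4, 1),
                                 compression_ratio=1.5))        # 960.0
```

Under these assumptions the same SSD tier goes from roughly 400 GiB to 960 GiB of logical data, which is the effect the paragraph above describes. Real savings depend entirely on how compressible the data is and where EC-X is actually applied.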

If compression and EC-X are enabled and the SSD tier is still insufficient, additional nodes with a larger SSD tier can be non-disruptively added to the cluster and the virtual machine(s) migrated to them, regardless of hypervisor.

Acropolis Base Version 4.5 adds many enhancements such as this, so I recommend customers perform the one-click upgrade and start exploring and using this additional information.

Data Locality & why it is important for vSphere DRS clusters

A lot of people have reached out to me since VMworld SFO, where I was interviewed about Nutanix by Eric Sloof (@esloof) on VMworldTV (the interview can be seen here).

So I thought I would expand on the topic of Data Locality and why it is so important for maintaining consistently high performance and low latency in vSphere DRS clusters.

First, the diagram below shows three (3) Nutanix nodes and one (1) guest VM.

[Diagram: guest VM reading data from the local storage of its own Nutanix node]

The guest VM is reading data from local storage on its own Nutanix node, and as a result read access is very fast. Read I/O will be served from one of four places:

1. Extent Cache (DRAM – For “Active Working Set”)
2. Local SSD (For “Active Working Set”)
3. Local SATA (Only for “Cold” data)

and the fourth we will discuss in a moment.
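Conceptually, the order in which a read is satisfied can be sketched as a simple lookup from fastest to slowest tier. This is a toy model only, not Nutanix code: each tier is represented as a hypothetical set of extent IDs, and the final fallback stands in for the fourth place mentioned above.

```python
from enum import Enum


class Tier(Enum):
    EXTENT_CACHE = "Extent Cache (DRAM)"
    LOCAL_SSD = "Local SSD"
    LOCAL_SATA = "Local SATA"
    REMOTE = "Remote node"


def serve_read(extent_id, extent_cache, local_ssd, local_sata):
    """Return the tier a read of extent_id is served from.

    Checks the fastest local tier first. Each argument is a hypothetical
    set of extent IDs currently held in that tier on this node.
    """
    if extent_id in extent_cache:
        return Tier.EXTENT_CACHE
    if extent_id in local_ssd:
        return Tier.LOCAL_SSD
    if extent_id in local_sata:
        return Tier.LOCAL_SATA
    return Tier.REMOTE  # the fourth place: data held on another node


cache, ssd, sata = {1}, {1, 2}, {3}
print(serve_read(2, cache, ssd, sata))  # prints: Tier.LOCAL_SSD
```

Hot data (the Active Working Set) is expected to hit the first two branches, cold data the third.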

As a result, for read I/O:

1. There is no dependency on a Storage Area Network (FCoE, IP, FC, etc.)
2. Read I/O from one node does not contend with read I/O from another node
3. Read I/O has very low latency as it does not leave the ESXi host
4. More network bandwidth is available for virtual machine traffic, ESXi management, vMotion, FT, etc.

But wait, what happens if DRS (or a vSphere admin) vMotions a VM to another node? I'm glad you asked!

The diagram below shows what happens immediately after a vMotion:

[Diagram: data placement immediately after the VM is vMotioned to another node]

As you can see, only the purple data is local to the new node. If and when the VM requires remote data (i.e. the VM's “Active Working Set”), the Nutanix Controller VM (CVM) transparently fetches the requested data over the 10Gb network in 1MB extents. (It never does a bulk or “Storage vMotion”-style movement of all of the VM's data!)

In addition, all future write I/O is written locally, so future read I/O of that data will be local by default.
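The behaviour described in the last two paragraphs (on-demand, extent-granular localization plus local writes) can be modelled in a few lines. This is a simplified sketch, not Nutanix code: the class and method names are hypothetical, it localizes an extent on its first remote read, and it does not model the replica copies that writes also send to other nodes.

```python
EXTENT_SIZE = 1024 * 1024  # 1MB extents, as described above


class NodeLocality:
    """Toy model of a node's view of a VM's data after a vMotion.

    local_extents holds the IDs of extents already stored on this node.
    """

    def __init__(self, local_extents=()):
        self.local_extents = set(local_extents)
        self.remote_fetches = 0  # number of extents pulled over the network

    def read(self, offset):
        extent = offset // EXTENT_SIZE
        if extent in self.local_extents:
            return "local"
        # Fetch only the extent that was requested, then keep it locally
        self.remote_fetches += 1
        self.local_extents.add(extent)
        return "remote"

    def write(self, offset):
        # New writes land locally (replica copies elsewhere are not modelled)
        self.local_extents.add(offset // EXTENT_SIZE)


node = NodeLocality(local_extents={0})  # only extent 0 (the "purple" data) is local
node.read(100)                          # "local"
node.read(EXTENT_SIZE + 5)              # "remote": extent 1 is fetched once
node.read(EXTENT_SIZE + 500)            # "local": extent 1 is now local
node.write(10 * EXTENT_SIZE)            # written locally
print(node.read(10 * EXTENT_SIZE + 1), node.remote_fetches)  # prints: local 1
```

Only the extent that was actually read crossed the network; everything else stayed where it was.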

So the worst-case scenario for read I/O in a Nutanix environment is that the required data is not available locally and the CVM must access it over a 10Gb network.

This scenario only occurs in the following situations:

1. Maintenance is occurring and hosts are rebooted
2. A host failure (HA restarts the VM on another node)
3. Following a vMotion

Generally, in BAU (Business as Usual) operation, read I/O should be local in the high 90% range.

So the worst-case scenario for read I/O on a vSphere cluster running on Nutanix is actually the best-case scenario for a traditional storage array, because in a traditional array all data is accessed over some form of storage network, generally via a small number of controllers.

It is important to note that the Nutanix DFS (Distributed File System) only accesses data over the network when it is required by the VM, and does so at a granular (1MB extent) level. So only the “Active Working Set” is accessed over the 10Gb network before being copied locally, again in 1MB extents. If data is not “active”, having it on a remote node does not impact performance at all, so moving it would create overhead on the environment for no benefit.
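To put numbers on that overhead argument, the sketch below counts how many bytes cross the network when only the extents a VM actually reads are fetched, and compares this with moving the whole VM. The 100 GiB VM size and the read pattern (reads spread across the first 2 GiB) are hypothetical values chosen for illustration.

```python
EXTENT_SIZE = 1024 * 1024  # 1MB extents
GIB = 1024 ** 3


def on_demand_bytes(accessed_offsets):
    """Bytes pulled over the network if only the extents the VM reads are fetched.

    Each distinct extent touched costs one whole extent, however many reads hit it.
    """
    touched = {offset // EXTENT_SIZE for offset in accessed_offsets}
    return len(touched) * EXTENT_SIZE


vm_size = 100 * GIB                           # hypothetical total VM data
active_reads = range(0, 2 * GIB, 64 * 1024)   # hypothetical reads over the first 2 GiB
print(on_demand_bytes(active_reads) / GIB, vm_size / GIB)  # prints: 2.0 100.0
```

In this example, on-demand localization moves 2 GiB, while relocating every byte of the VM would move 100 GiB, most of which the VM is not reading.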

Even if 90% of a VM's data is on a remote node, as long as the “Active Working Set” is local, read performance will be at local speeds, again from Extent Cache (DRAM), local SSD or local SATA (for “cold” data).

Some vendors are working on, or already have, local caching capabilities. In my experience, however, these come with various caveats, such as Operating System version requirements and in-guest drivers, and in the vast majority of environments today they are not deployed.

The Nutanix DFS has data locality built in: it works with any hypervisor and guest OS, and does not require any configuration.

So now you know why keeping the Active Working Set (data) as close to the VM as possible is essential for consistently high performance and low latency.
