NOS 4.5 Delivers Increased Read Performance from SATA

In a recent post I discussed how NOS 4.5 increases the effective SSD tier capacity by performing up-migrations on only the local extent, as opposed to both RF copies within the Nutanix cluster. In addition to this significant improvement in usable SSD tier capacity, read performance from the SATA tier has also received a lot of attention from Nutanix engineers in NOS 4.5.

The Solutions and Performance Engineering team has been investigating and testing ways to improve SATA performance. Ideally the active working set for VMs will fit within the SSD tier, and the changes discussed in my previous post dramatically improve the chances of that happening.

But there are situations when reads to cold data still need to be serviced by the slow SATA drives. Nutanix uses Data Locality to keep hot data close to the application, delivering the lowest latency and overheads. In the case of SATA, however, the data is infrequently accessed and the drives themselves are slow, so reading from remote SATA drives can actually improve performance, especially where the number of local SATA drives is limited (in some cases to only 2 or 4 drives).

Most Nutanix nodes have 2 x SSD and 4 x SATA drives, so at best you will only see a few hundred random read IOPS from the SATA tier, as that is all those drives are physically capable of (in the order of 4 drives x ~100 IOPS each).

To get around this issue, NOS 4.5 introduces changes to the way in which we select a replica to read an egroup from the HDD tier. Periodically, NOS (re)calculates the average I/O latencies of all the replicas of a vdisk (i.e. the replicas which hold the vdisk's egroups). This information is used to choose a replica as follows (a simplified sketch in code follows the list):

  1. If the latency of the local replica is less than a configurable threshold, read from the local replica.
  2. If the latency of the local replica is more than a configurable threshold, and the latency of the remote replica is more than that of the local replica, prefer the local replica.
  3. If the latency of the local replica is more than the configurable threshold, and the latency of the remote replica is either below the threshold or lower than that of the local copy, prefer the remote replica.
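
To make the three rules above concrete, here is a minimal sketch in Python of how such latency-based replica selection could work. The names and values (choose_replica, avg_latency_ms, the 20 ms threshold) are illustrative assumptions of mine, not the actual NOS/Stargate implementation.

```python
# Illustrative sketch only: names, structure and the threshold value are
# assumptions, not the actual NOS/Stargate implementation.

def choose_replica(local, remote, latency_threshold_ms=20.0):
    """Pick the replica to read an egroup from, based on the periodically
    (re)calculated average HDD read latency of each replica.

    `local` and `remote` are dicts like {"node": "A", "avg_latency_ms": 12.3}.
    """
    local_lat = local["avg_latency_ms"]
    remote_lat = remote["avg_latency_ms"]

    # Rule 1: local replica is below the threshold -> stay local.
    if local_lat < latency_threshold_ms:
        return local

    # Rule 2: local is above the threshold, but the remote is doing even
    # worse -> reading remotely would not help, prefer the local replica.
    if remote_lat >= local_lat:
        return local

    # Rule 3: local is above the threshold and the remote is either below
    # the threshold or simply faster than the local copy -> go remote.
    return remote


# Example: the local SATA drives are busy (25 ms average) while the remote
# replica's drives are relatively idle (8 ms), so the remote copy is chosen.
local_replica = {"node": "A", "avg_latency_ms": 25.0}
remote_replica = {"node": "B", "avg_latency_ms": 8.0}
print(choose_replica(local_replica, remote_replica)["node"])  # -> "B"
```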

The diagram below shows an example where the VM on Node A performs random reads to data A and shortly thereafter to data C. When reading data A the latency is below the threshold, but when it requests data C, NOS detects that the latency of the local copy is higher than that of the remote copy and selects the remote replica to read from. As the diagram shows, one possible outcome when reading multiple pieces of data is that one read is served locally and the other remotely.

[Diagram: remotesatareads2 – remote SATA reads example]

Now the obvious next question is “What about Data Locality?”

Data Locality is still maintained for the hot data residing in the SSD tier, because SSD reads are fast and serving them locally keeps latency and CPU/network overheads as low as possible. For SATA reads, which are typically >5ms, the SATA drive itself is the bottleneck, not the network, so distributing reads across more SATA drives, even when they are not local, results in better overall performance and lower latency.
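
As a rough back-of-the-envelope illustration of why this helps, the sketch below compares the aggregate random read capability of the local drives alone against reads spread across both replicas. The per-drive IOPS, latency and network hop figures are assumptions I have chosen for illustration, not Nutanix benchmark numbers.

```python
# Back-of-the-envelope only: per-drive IOPS, latency and network figures
# are illustrative assumptions, not Nutanix benchmarks.

SATA_RANDOM_READ_IOPS = 100    # roughly what a single 7.2K RPM drive sustains
SATA_READ_LATENCY_MS = 8.0     # typical random read service time (>5 ms)
NETWORK_HOP_MS = 0.5           # approximate 10GbE round trip

def aggregate_iops(num_drives):
    """Aggregate random read ceiling across a set of SATA spindles."""
    return num_drives * SATA_RANDOM_READ_IOPS

# Local-only reads: limited to the node's own 4 SATA drives.
print("local only  :", aggregate_iops(4), "IOPS,",
      SATA_READ_LATENCY_MS, "ms per read")

# Reads spread across both replicas (e.g. 8 drives across two nodes):
# the ceiling doubles, while the remote hop adds well under 10% latency.
print("local+remote:", aggregate_iops(8), "IOPS,",
      SATA_READ_LATENCY_MS + NETWORK_HOP_MS, "ms per remote read")
```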

If the SSD tier has not reached 75% utilisation, all data will reside within the SSD tier and will be served locally. The above feature is for situations where the SSD tier is 75% full and data is being tiered down to SATA, AND random reads are occurring to cold data, OR to data which simply will not fit in the SSD tier, such as very large databases.

In addition, NOS 4.5 detects whether read I/O is random or sequential, and if it is sequential (which SATA performs much better at), the data must meet a higher threshold before being up-migrated to SSD.
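
A hedged sketch of how that distinction could look in code is shown below; the sequential-detection heuristic and the threshold values are my own assumptions for illustration, not the actual NOS ILM logic.

```python
# Illustrative sketch only: the sequential-read heuristic and the
# threshold values are assumptions, not the actual NOS ILM logic.

def is_sequential(prev_end_offset, offset):
    """Treat a read as sequential if it begins where the previous read ended."""
    return prev_end_offset is not None and offset == prev_end_offset

def should_up_migrate(access_count, sequential,
                      random_threshold=3, sequential_threshold=10):
    """Sequentially read data must be 'hotter' (accessed more often) before
    being promoted to SSD, because SATA handles sequential I/O well."""
    threshold = sequential_threshold if sequential else random_threshold
    return access_count >= threshold

# Example: 5 accesses is enough to promote randomly read data,
# but not sequentially read data.
print(should_up_migrate(5, sequential=False))  # True
print(should_up_migrate(5, sequential=True))   # False
```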

The result of these algorithm improvements (along with the increased effective SSD tier capacity discussed earlier) and Nutanix in-line compression is higher performance over larger working sets, even those which exceed the capacity of the SSD tier.

Effectively NOS 4.5 delivers a truly scale-out solution for read I/O from the SATA tier, meaning one VM can potentially read from all nodes in the cluster, ensuring SATA performance for things like Business Critical Applications is both high and consistent. Combined with NX-6035C storage-only nodes, this means SATA read I/O can be scaled out as shown in the diagram below without scaling compute.

[Diagram: ScaleOutRemoteReads – scaling out SATA read I/O across nodes]


As we can see above, the storage-only nodes (NX-6035C) deliver additional performance for read I/O from the SATA tier (as well as from the SSD tier).

What’s .NEXT? – Scale Storage separately to Compute on Nutanix!

Since I joined Nutanix, I have heard from customers that they want to scale storage (capacity) separately from compute, as they have done in traditional SAN/NAS environments.

I wrote an article a while ago about Scaling problems with traditional shared storage which discusses why scaling storage capacity separately can be problematic. As such I still believe scaling capacity separately is more of a perceived advantage than a real one in most cases, especially with traditional SAN/NAS.

However, here at Nutanix we have locked ourselves away and brainstormed how we can scale capacity without degrading performance and without losing the benefits of a Nutanix Hyper-Converged platform, such as Data Locality and linear scalability.

At the same time, we wanted to ensure doing so didn’t add any unnecessary cost.

Introducing the NX-6035c, a new “Storage only” node!

What is it?

The NX-6035c is a 2-node per 2RU block, consisting of two single-socket servers, each with 1 x SSD, 5 x 3.5″ SATA HDDs and 2 x 10Gb NICs for network connectivity.

How does it work?

As with all Nutanix nodes, the NX-6035c runs the Nutanix Controller VM (CVM) which presents the local storage to the Nutanix Distributed File System (NDFS).

The main difference between the NX-6035c and other Nutanix nodes is that it is not a member of the hypervisor cluster and as a result does not run virtual machines, but it is a fully functional member of the NDFS cluster.

The below diagram shows a 3 node vSphere or Hyper-V cluster with storage presented by a 5 node NDFS cluster using 3 x NX-8150s as Compute+Storage and 2 x NX-6035C nodes as Storage only.

[Diagram: 6035cinndfscluster – 3 x NX-8150 compute+storage nodes and 2 x NX-6035C storage-only nodes in one NDFS cluster]

Because the NX-6035c does not run VMs, it only receives data via write I/O replication from Resiliency Factor 2 or 3 and via Disk Balancing.

This means that for every NX-6035c added to an NDFS cluster, write performance for the cluster increases thanks to the additional CVM. This is how Nutanix avoids the traditional capacity scaling issues of SAN/NAS.
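
The sketch below illustrates the idea: writes keep one copy local to the VM for Data Locality, while the remaining RF copies can land on any other node's CVM, including storage-only nodes. The placement logic and node names are deliberately simplified assumptions of mine; the real NDFS replica placement weighs many more factors.

```python
import random

# Simplified illustration only: real NDFS replica placement also weighs
# disk fullness, fault domains and Disk Balancing, not random choice.

def place_replicas(local_node, cluster_nodes, rf=2):
    """Keep one copy local for Data Locality; the remaining RF-1 copies
    can land on any other node, including storage-only nodes."""
    remote_candidates = [n for n in cluster_nodes if n != local_node]
    return [local_node] + random.sample(remote_candidates, rf - 1)

cluster = ["NX-8150-A", "NX-8150-B", "NX-8150-C", "NX-6035c-1", "NX-6035c-2"]
print(place_replicas("NX-8150-A", cluster, rf=2))
# e.g. ['NX-8150-A', 'NX-6035c-2'] -> the storage-only node's CVM and
# disks absorb a share of the cluster's write replication traffic.
```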

Rule of thumb: Don’t scale capacity without scaling storage controllers!

The CVM running on the NX-6035c also provides data reduction capabilities just like other Nutanix nodes, so data reduction can occur with even lower impact on Virtual Machine I/O.

What about Hypervisor licensing?

The NX-6035c runs the CVM on a Nutanix optimized version of KVM which does not require any hypervisor licensing.

For customers using vSphere or Hyper-V, the NX-6035c provides storage performance and capacity to the NDFS cluster which serves the hypervisor.

This results in more storage capacity and performance with no additional hypervisor licensing costs.

Want more? Check out how Nutanix is increasing usable capacity with Erasure Coding!