SQL & Exchange performance in a Virtual Machine

The below is something I see far too often: an SQL or Exchange virtual machine using a single LSI Logic SAS virtual SCSI controller.

[Image: SQL/Exchange VM configured with a single LSI Logic SAS controller and a single virtual disk]

What is even worse is a virtual machine using a single LSI controller and a single virtual disk for one or more databases and logs (as shown above).

Why is this so common?

Probably because the LSI Logic SAS controller is the default for Windows 2008/2012 virtual machines and additional SCSI controllers are not automatically added until you have more than 16 virtual disks for a single VM.

Why is this a problem?

The LSI controller has a queue depth limit of 128, compared to a default of 256 for PVSCSI, which can be tuned up to 1024 for higher performance requirements.

As a result, a configuration with a single LSI controller and/or a limited number of virtual disks can artificially constrain the underlying storage and prevent it from delivering the performance it is capable of.
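To put rough numbers on this, consider the adapter-level ceiling on outstanding IOs for a single VM. The sketch below is a simplified back-of-envelope calculation using the queue depths mentioned above; real-world limits also depend on per-device queue depths, the guest OS and the underlying storage array.

```python
# Simplified illustration of the per-VM outstanding IO ceiling imposed by
# the virtual SCSI adapter queue depth (adapter level only; per-device and
# array-side limits are ignored here).

LSI_QUEUE_DEPTH = 128             # LSI Logic SAS adapter queue depth
PVSCSI_DEFAULT_QUEUE_DEPTH = 256  # PVSCSI default adapter queue depth
PVSCSI_TUNED_QUEUE_DEPTH = 1024   # PVSCSI tuned for high IO workloads

def max_outstanding_io(controllers: int, queue_depth: int) -> int:
    """Upper bound on in-flight IOs across all virtual SCSI adapters of one VM."""
    return controllers * queue_depth

print("1 x LSI:             ", max_outstanding_io(1, LSI_QUEUE_DEPTH))             # 128
print("4 x PVSCSI (default):", max_outstanding_io(4, PVSCSI_DEFAULT_QUEUE_DEPTH))  # 1024
print("4 x PVSCSI (tuned):  ", max_outstanding_io(4, PVSCSI_TUNED_QUEUE_DEPTH))    # 4096
```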

Another problem with the LSI controller is that it uses more CPU than the PVSCSI controller for the same IO levels. This means you’re wasting virtual machine (and underlying host) CPU resources unnecessarily.

Higher CPU usage can also lead to other problems such as CPU Ready, which further reduces performance.

A colleague and friend of mine, Michael Webster wrote a great post titled: Performance Issues Due To Virtual SCSI Device Queue Depths where he shows the performance difference between SATA, LSI and PVSCSI controllers. I highly recommend having a read of this post.

What is the solution?

Using multiple Paravirtual SCSI (PVSCSI) adapters with virtual disks evenly spread over the four controllers for Windows virtual machines is a no-brainer!

This results in:

  1. Higher default queue depth
  2. Lower CPU overheads
  3. Higher potential performance

How do I configure this?

It’s fairly straightforward, but don’t just change the LSI controller to PVSCSI, as the Guest OS may not have the driver installed, which will result in the VM failing to boot.

To avoid this, simply edit the virtual machine and add a new virtual disk of any size; for the virtual device node, select SCSI (1:0) and follow the prompts.

[Image: Adding a new virtual disk on virtual device node SCSI (1:0)]

Once the new virtual disk is added you should see that a new LSI Logic SAS SCSI controller has been added, as shown below.

[Image: New LSI Logic SAS SCSI controller added to the VM]

Next, highlight the new adapter, select “Change Type” in the top right hand corner of the window and select Paravirtual. Once this is complete you should see something similar to the below:

[Image: SCSI controller type changed to VMware Paravirtual]

Next hit “Ok” and the new Controller and virtual disk will be added to the VM.

Now open the console of the VM, open Computer Management and go to Device Manager. Under Storage controllers you should now see a VMware PVSCSI Controller, as shown below.

[Image: Device Manager showing the VMware PVSCSI Controller under Storage controllers]

Now it is safe to shut down the VM.

Once the VM is shut down, edit the VM settings, highlight SCSI Controller 0, select Change Type as we did earlier and select Paravirtual. Once this is done you will see the original controller replaced with a new controller.

[Image: SCSI Controller 0 changed from LSI Logic SAS to VMware Paravirtual]

Now that we have changed the boot drive to PVSCSI, we can balance the data drives across up to four PVSCSI controllers for maximum performance.

To do this, simply highlight a virtual disk, drop down the Virtual Device Node and select SCSI (1:0) or any other available slot on the SCSI (1:x) controller.

[Image: Changing a virtual disk’s Virtual Device Node to SCSI (1:0)]

After doing this you will see new SCSI controllers appear; these need to be changed to Paravirtual, as we did for the first controller.

[Image: Multiple virtual disks spread across additional SCSI controllers]

For each of the virtual disks, ensure they are placed evenly across the PVSCSI controllers. For example, a VM with eight data virtual disks plus the OS disk should look like this:

Virtual Disk 1 (OS) : SCSI (0:0)
Virtual Disk 2 : SCSI (0:1)
Virtual Disk 3 : SCSI (1:0)
Virtual Disk 4 : SCSI (1:1)
Virtual Disk 5 : SCSI (2:0)
Virtual Disk 6 : SCSI (2:1)
Virtual Disk 7 : SCSI (3:0)
Virtual Disk 8 : SCSI (3:1)
Virtual Disk 9 : SCSI (0:2)

This results in two data virtual disks per PVSCSI controller, which evenly distributes IO across all controllers, the only exception being that the first controller (SCSI 0) also hosts the OS drive.
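The placement logic is really just a round-robin across the four controllers. For those who like to plan the layout up front, below is a small illustrative sketch (my own helper, not a VMware tool or API) that prints a suggested SCSI (controller:unit) mapping; the unit numbers differ slightly from the table above, but the per-controller balance is the same.

```python
def pvscsi_layout(data_disks: int, controllers: int = 4):
    """Suggest a SCSI (controller:unit) placement for an OS disk plus a number
    of data disks, spread round-robin across the PVSCSI adapters."""
    layout = [("Virtual Disk 1 (OS)", 0, 0)]       # OS disk stays on SCSI (0:0)
    next_unit = [1] + [0] * (controllers - 1)      # next free unit per controller
    for i in range(data_disks):
        ctrl = i % controllers                     # round-robin across controllers
        unit = next_unit[ctrl]
        if unit == 7:                              # unit 7 is reserved for the adapter itself
            unit += 1
        next_unit[ctrl] = unit + 1
        layout.append((f"Virtual Disk {i + 2}", ctrl, unit))
    return layout

# Example: eight data disks plus the OS disk, as in the layout above.
for name, ctrl, unit in pvscsi_layout(8):
    print(f"{name} : SCSI ({ctrl}:{unit})")
```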

What if I have problems?

On occasion I have seen this process result in VMs not booting; however, these issues are easy to fix.

If your VM fails to boot with a message like “Operating System not found”, I suggest you panic! Just kidding, this typically just means the virtual machine’s boot order has been changed. Go into the BIOS and check that the boot order shows the PVSCSI controller and has the correct virtual disk as the first priority.

If the VM boots but BSODs or crashes and goes into a continuous reboot loop, power off the VM and change the SCSI controller hosting the boot disk back to LSI. Boot the VM and make sure the PVSCSI driver is present (if it’s not, the earlier instructions weren’t followed, so go back and complete them so the PVSCSI driver is loaded and working), then shut down, change the SCSI controller back to PVSCSI, and you should be fine.

If the VM boots and one or more drives do not show up in My Computer, go into Disk Management and you may see the drives marked as offline. Simply right click each drive, mark it as online, reboot, and you’re good to go.

Summary:

If you have made the intelligent move to virtualize your business-critical applications, firstly, congratulations! However, as with physical hardware, virtual machines also have optimal configurations, so make sure you use multiple PVSCSI controllers with multiple virtual disks and have your DBA span the database across those virtual disks for maximum performance.

The following post shows how to do this in detail:

Splitting SQL datafiles across multiple VMDKs for optimal VM performance

If the DBA is not confident doing this, you can also just add multiple virtual disks (connected via multiple PVSCSI controllers) and create a stripe in-guest (via Disk Management), which will also give you the benefit of multiple vdisks.

Related Articles:

1. Peak Performance vs Real World Performance

2. Enterprise Architecture & Avoiding tunnel vision

3. Microsoft Exchange 2013/2016 Jetstress Performance Testing on Nutanix Acropolis Hypervisor (AHV)

What’s .NEXT 2016 – Acropolis Block Services (ABS)

Acropolis Block Services or ABS (not to be confused with Anti-lock Braking Systems), is an extension of the In-Guest iSCSI Nutanix announced at .NEXT 2015.

The original goal of In-Guest iSCSI was to enable support for applications like MS Exchange, which is not supported on NFS, and for use cases such as SQL clustering quorum drives, and it has been very successful. However, customers have been telling us for a number of years that they want to make Nutanix the standard platform for their datacenters, but they have not been able to realise this vision for a number of reasons, including:

  • The desire/requirement to re-use existing servers
  • Applications which are not virtual (for many reasons, mostly political)
  • Performance / Scalability of externally connected servers
  • Complexity including operational considerations of external iSCSI

Let’s discuss each of these topics and how ABS solves these challenges.

Re-using existing servers

As it’s uncommon for customers to be at exactly the right point in the refresh cycle to replace all server and storage infrastructure at once, ABS allows customers to either get started with Nutanix by deploying a few nodes/blocks, or to scale their existing environment/s, while using the Acropolis Distributed Storage Fabric (ADSF) to provide storage to both HCI and non-HCI workloads.

A couple of key advantages of ABS compared to the existing In-Guest iSCSI support and traditional SAN/NAS are:

  • ABS load balances and optimizes paths so MPIO and ALUA are not needed
  • New storage is automatically added without requiring client-side changes

The downside to using ABS as a stopgap until the compute hardware refresh cycle is that it does add complexity, which I discuss in this article from July 2015.

Scaling Hyper-converged solutions – Compute only

However, if the goal is to maximise the return on investment (ROI) of existing infrastructure, ABS is in my opinion a better option than standing up another silo of storage to install, configure and manage, as it:

  • Removes the requirement for another silo
  • Increases performance/capacity/resiliency of an existing cluster
  • Allows customers to standardize their infrastructure
  • Gives customers the flexibility to quickly add/remove nodes from a cluster/s to meet requirements

Scalability:

ABS delivers linear, automated scalability by creating virtual targets, so performance is not limited by the iSCSI restriction of one session per initiator and target. This means a single LUN (or Volume Group in Nutanix speak) can be serviced by multiple virtual targets spread across all Nutanix CVMs, which also ensures multiple network threads are used and mitigates against any single thread becoming a bottleneck.

By default 32 virtual targets are used to ensure optimal performance for even the largest and most I/O intensive workloads.

This process is also transparent to the administrator and application to avoid any complexity in implementation and ongoing support.
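Conceptually, the scale-out works by spreading a Volume Group’s vDisks across the virtual targets, which in turn live on different CVMs. The sketch below is purely illustrative (the function and naming are mine, not Nutanix code or any published API) of how such a round-robin distribution avoids funnelling all IO through a single target or CVM.

```python
def distribute_vdisks(num_vdisks: int, num_cvms: int, virtual_targets: int = 32):
    """Illustrative round-robin mapping of a Volume Group's vDisks onto virtual
    iSCSI targets, which are themselves spread across the cluster's CVMs."""
    mapping = []
    for vdisk in range(num_vdisks):
        target = vdisk % virtual_targets   # spread vDisks over the virtual targets
        cvm = target % num_cvms            # virtual targets are spread over the CVMs
        mapping.append((f"vDisk-{vdisk}", f"target-{target}", f"CVM-{cvm}"))
    return mapping

# Example: 8 vDisks in a 4-node cluster land on 8 different targets across all 4 CVMs.
for vdisk, target, cvm in distribute_vdisks(num_vdisks=8, num_cvms=4):
    print(f"{vdisk} -> {target} ({cvm})")
```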

The following diagram shows how the data services IP sits in front of the virtual targets (which are on each CVM) and the vDisks are spread across all controllers for maximum performance.

[Image: Data services IP in front of the virtual targets, with vDisks spread across all CVMs]

At .NEXT 2015 Nutanix announced support for scaling storage separately from compute using “Storage Only” nodes, and this capability is fully compatible with ABS. This ensures capacity and performance can be scaled independently of compute for maximum flexibility.

[Image: ABS scale-out with storage-only nodes, without in-guest iSCSI MPIO configuration]

Resiliency:

If a vDisk’s active CVM goes offline due to failure or planned maintenance, any active sessions against that CVM are disconnected, which triggers a re-logon from the iSCSI client. The re-logon occurs through the external data services IP, which redirects the session to a healthy CVM.

This means things like One-Click rolling AOS upgrades can still be performed as they are with native Nutanix environments.

[Image: iSCSI session redirected to a healthy CVM after a CVM failure]

Functionality:

ABS supports SCSI-3 persistent reservations for shared storage-based Windows clusters, which are commonly used with Microsoft SQL Server and clustered file servers.

As of Acropolis OS (AOS) 4.7, ABS will be supported with physical servers or virtual machines. Support for connecting ESXi via iSCSI is expected to follow in a future release.

ABS supports several use cases, including:

  • iSCSI for Microsoft Exchange Server
  • Shared storage for Linux-based clusters
  • Windows Server Failover Clustering (WSFC)
  • SCSI-3 persistent reservations for shared storage-based Windows clusters
  • Shared storage for Oracle RAC environments
  • Bare-metal environments

[Image: Acropolis Block Services overview]

ABS enables server hardware separate from the Nutanix environment to consume the Acropolis DSF resources, so you can leverage existing server hardware investments against Nutanix storage resources. Workloads not targeted for virtualization can also use the DSF.

Supported Client OS & Qualified Applications

  • RHEL 6+
  • Windows 2008 R2 & Windows 2012 R2
  • Oracle RAC
  • Microsoft SQL Server
  • Microsoft Exchange Server

Summary:

Whether you have applications that require shared storage access or environments with separate storage and compute needs, Acropolis Block Services (ABS) simplifies deployment and highlights the dynamic scale out, extreme performance, and high availability of the Nutanix platform. ABS automatically load balances iSCSI clients to take advantage of all resources in the cluster, and failure events are managed seamlessly. The same upgrade, snapshot, and asynchronous replication workflows that customers leverage today work consistently whether you are using VMs or VGs. By enabling VM, file, and block services, Nutanix offers a single platform to consolidate workloads and ease administration, thus reducing risk and enabling organizations to simplify their infrastructure.

Related .NEXT 2016 Posts

Think HCI is not an ideal way to run your mission-critical x86 workloads? Think again! – Part 2

Continuing from Part 1, let’s look at another of VCE COO Todd Pavone’s statements from the article COO: VCE converged infrastructure not affected by Dell-EMC:

We believe that there was a major gap in the core data center for hyper-converged, where customers wanted hyper-converged architecture — they don’t want to invest in tier-one storage or tier-one servers. They want the intelligence in the software, but they also want massive scale. This is for globals, large service providers in a massive scale, like thousands of nodes. We have a large financial service company in New York that is using us for a platform-free application build-up. And they want to pilot it with 10,000 users, but it’s going to go to 10 million users. And so, can we give them an infrastructure for 10,000, but can scale simply and easily to 10 million — or 20 million?

You can’t do that on an appliance, right? But they want hyper-converged. When you get to 10 million users, you want an infrastructure that scales and is nonlinear, leading to a lower cost model. So, we said, “There’s a gap in that market,” and we created the rack.

Let’s again address these points:

  • Todd: “They don’t want to invest in tier-one storage or tier-one servers. They want the intelligence in the software, but they also want massive scale.”

If customers don’t want to invest in what I would call “traditional” tier-one storage and servers, then I’d have to agree with them: they need a very different solution, such as Nutanix, if they want to get to massive scale, especially with easy management and deployment.

Nutanix has customers ranging from 3 to thousands of nodes; in fact, many of our largest customers run Acropolis Hypervisor. So any question about Nutanix scalability is just laughable.

  • Todd: “And they want to pilot it with 10,000 users, but it’s going to go to 10 million users. And so, can we give them an infrastructure for 10,000, but can scale simply and easily to 10 million — or 20 million? You can’t do that on an appliance, right?”

Well, you can with Nutanix! In fact, that sounds like a common use case for Nutanix: we frequently design and pilot repeatable models and then scale as required.

  • Todd: “But they want hyper-converged. When you get to 10 million users, you want an infrastructure that scales and is nonlinear, leading to a lower cost model. So, we said, “There’s a gap in that market,” and we created the rack.”

It’s no surprise to me at all that customers want hyper-converged and the ability to scale both linearly and non-linearly. Nutanix can do this today and has been able to for a long time. Back in 2013, for example, you could mix NX3000 series nodes (compute heavy / storage light) with NX6000 nodes (compute light / storage heavy). This is an example of non-linear scaling, which reduces cost (e.g. cost/GB) over time.

Then in 2014 an even wider range of nodes was released (NX1000, NX3000, NX6000 & NX8000), which enhanced Nutanix’s ability to scale both up and out, linearly and non-linearly.

In 2015 Nutanix launched the NX-6035C “Storage Only” node, which allows customers to scale storage separately from compute, enabling non-linear scaling of storage versus compute for customers with high capacity requirements. Importantly, no hypervisor licensing is required to scale storage, as storage-only nodes run Acropolis Hypervisor (AHV), which is fully interoperable with ESXi and Hyper-V environments.

Remember the Rule of thumb: Don’t scale capacity without scaling storage controllers!

Nutanix storage-only nodes run a lightweight Controller VM (CVM) to ensure management, monitoring and data services (e.g. disk balancing, compression, deduplication, erasure coding) do not degrade even when compute and storage are scaled in a vastly non-linear manner. Storage-only nodes also help improve performance by participating in cluster replication (RF2/RF3) and disk balancing activities.

  • Todd: “So, we said, “There’s a gap in that market,” and we created the rack.”

There may have been a gap back in early 2013, but since then Nutanix has continued to innovate and lead the market with solutions that scale both linearly and non-linearly; I’d say the gap has long since been filled. Nutanix also scales management with a single HTML5 GUI called PRISM, with central management of multiple clusters/sites/geographical locations via PRISM Central.

Summary:

I’m sure it’s pretty obvious by now that VCE COO Todd Pavone and I have different opinions on what HCI is capable of. During my time at Nutanix I have seen countless successful small, medium and large-scale mission-critical application deployments, and the percentage of Nutanix business from these workloads continues to increase thanks to our investment in a dedicated vBCA team, which I am fortunate to be a part of.

Next time you’re considering new infrastructure for mission-critical applications, reach out and I’ll happily work with you to see if Nutanix is a good fit for your use case.

Let me finish by saying that, in the unlikely event the workload/s are not suitable for Nutanix, I guarantee I will be the first one to tell you, and I’ll help you find an alternative solution.

Back to Part 1.