Example Architectural Decision – Host Isolation Response for FC-Based Storage

Problem Statement

What are the most suitable HA / host isolation settings for an environment using IBM SVC storage with FC connectivity via a dedicated, highly available Storage Area Network (SAN) fabric, where ESXi Management and Virtual Machine traffic run over a highly available data network?

Requirements

1. Ensure that, in the event of one or more hosts becoming isolated, the environment responds in an automated manner to recover VMs where possible

Assumptions

1. The Network is highly available (>99.999% availability)
2. The Storage is highly available (>99.999% availability)
3. vSphere 5.0 or later
4. ESXi hosts are connected to the network via two physically separate switches using two physical NICs

Constraints

1. FC (Block) based storage

Motivation

1. Meet/Exceed availability requirements
2. Minimize the chance of a false positive isolation event

Architectural Decision

Turn off the default isolation address by setting the below advanced setting

“das.usedefaultisolationaddress” = False

Configure three (3) isolation addresses by setting the below advanced settings

“das.isolationaddress1” = 192.168.1.1 (Core Router)

“das.isolationaddress2” = 192.168.1.2 (Core Switch 1)

“das.isolationaddress3” = 192.168.1.3 (Core Switch 2)

Configure Datastore Heartbeating with “Select any of the cluster datastores”

Configure Host Isolation Response to: “Shutdown”
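For reference, the settings above could also be applied programmatically. Below is a minimal pyVmomi sketch, assuming a vCenter at a placeholder address, lab credentials and a cluster named “Prod-Cluster”; none of these names come from the original design, and the property names reflect my reading of the vSphere API rather than a tested implementation.

```python
# Hedged sketch: apply the HA decision above (disable the default isolation
# address, add three isolation addresses, allow any cluster datastore for
# heartbeating, set the isolation response to guest shutdown) via pyVmomi.
# Hostname, credentials and the cluster name are placeholders/assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
content = si.RetrieveContent()

# Locate the cluster (assumes it sits directly under the datacenter's host folder)
cluster = None
for dc in content.rootFolder.childEntity:
    for entity in dc.hostFolder.childEntity:
        if isinstance(entity, vim.ClusterComputeResource) and entity.name == "Prod-Cluster":
            cluster = entity

das = vim.cluster.DasConfigInfo()
das.option = [
    vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
    vim.option.OptionValue(key="das.isolationaddress1", value="192.168.1.1"),  # Core Router
    vim.option.OptionValue(key="das.isolationaddress2", value="192.168.1.2"),  # Core Switch 1
    vim.option.OptionValue(key="das.isolationaddress3", value="192.168.1.3"),  # Core Switch 2
]
das.hBDatastoreCandidatePolicy = "allFeasibleDs"  # "Select any of the cluster datastores"
das.defaultVmSettings = vim.cluster.DasVmSettings(isolationResponse="shutdown")

task = cluster.ReconfigureComputeResource_Task(vim.cluster.ConfigSpecEx(dasConfig=das),
                                               modify=True)
Disconnect(si)
```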

Justification

1. When using FC storage, it is possible for the Management and Virtual Machine networks to be unavailable while the storage network is working perfectly. In this case virtual machines may not be able to communicate with other servers, but can continue reading from and writing to disk. As they are likely not servicing customer workloads while isolated, shutting the VMs down gracefully and allowing HA to restart them on hosts which are not isolated gives them a greater chance of resuming service than remaining on an isolated host.
2. Datastore heartbeating allows HA to confirm whether the host is “isolated” or “failed”. In either case, shutting down the VM allows HA to recover it on a surviving host.
3. As all storage is presented via Active/Active IBM SVC controllers, there is no benefit in specifying particular datastores to be used for heartbeating.
4. The selected isolation addresses were chosen because they are highly available devices which are essential for network communication and cover the core routing and switching components in the network.
5. In an environment where the network is highly available, an isolation event is extremely unlikely. Where all three (3) isolation addresses cannot be contacted, either the network cannot be restored in a timely manner, or the host has suffered multiple concurrent failures (e.g. multiple network cards). Performing a controlled shutdown helps ensure that when the network is recovered the VMs are brought back up in a consistent state, and in the event the isolation impacts only a subset of ESXi hosts in the cluster, the VMs can be recovered by HA and resume normal operations.

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address

Implications

1. In the event the host cannot reach any of the isolation addresses, virtual machines will be shut down.
2. Using “Shutdown” as opposed to “Power off” ensures a graceful shutdown of the guest operating system; however, if VMware Tools is unable to perform a controlled shutdown, the HA restart of the VM will be delayed by up to 5 minutes (300 seconds), after which a “Power Off” is executed (see the sketch below).
3. In the unlikely event of network instability, VMs may be shut down prematurely.
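For what it is worth, the 300 second grace period referenced in implication 2 is governed by another HA advanced option, and the same OptionValue mechanism from the sketch above would apply if you wanted to tune it. The option name below is my assumption of the relevant setting, so verify it against your vSphere documentation.

```python
# Assumed advanced option controlling how long HA waits for a guest shutdown
# before forcing a power off (value in seconds; 300 is the commonly cited default).
from pyVmomi import vim

shutdown_grace = vim.option.OptionValue(key="das.isolationshutdowntimeout", value="300")
# Append this to DasConfigInfo.option alongside the isolation address settings above.
```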


VMware HA and IP Storage *Updated*

With IP storage (particularly NFS in my experience) becoming more popular over recent years, I have been designing more and more VMware solutions with IP Storage, both iSCSI and NFS.

The purpose of this post is not to debate the pros and cons of IP storage, or Block vs File, or even vendor vs vendor, but to explore how VMware environments (vSphere 4 and 5) using IP storage can be made as resilient as possible purely from a VMware HA perspective. (I will be writing another post on highly available vNetworking for IP Storage.)

So what are some considerations when using IP storage and VMware HA?

In many solutions I’ve seen (and designed), the ESXi Management VMKernel is on “vSwitch0” and uses two (2) x 1Gb NICs, while the IP storage (and data network) is on a dvSwitch/es and uses two or more 10Gb NICs which connect to different physical switches than the ESXi Management 1Gb NICs.

So does this matter? Well, while it is a good idea, there are some things we need to consider.

What happens if the 1Gb network is offline for whatever reason, but the 10Gb network is still operational?

Do we want this event to trigger a HA isolation event? In my opinion, not always.

So let’s investigate further.

1. Host Isolation Response.

Host Isolation response is important to any cluster, but for IP storage it is especially critical.

How does Host Isolation Response work? Well, in vSphere 5, three conditions must all be met:

1. The host fails to receive heartbeats from the HA master

2. The host does not receive any HA election traffic

3. Having failed conditions 1 & 2, the host attempts to ping the “isolation address/es” and is unsuccessful.

Only once all three conditions are met is the isolation response triggered (sketched below).
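To make the ordering concrete, here is a small illustrative Python sketch of that decision flow. The helper names are hypothetical stand-ins for what the host does internally, not real vSphere APIs.

```python
# Illustrative only: the order of checks a host works through before declaring isolation.
def host_declares_isolation(receives_master_heartbeats, receives_election_traffic,
                            isolation_addresses, can_ping):
    if receives_master_heartbeats:      # condition 1 not met: the master is reachable
        return False
    if receives_election_traffic:       # condition 2 not met: election traffic is seen
        return False
    # Condition 3: try each configured isolation address in turn
    if any(can_ping(addr) for addr in isolation_addresses):
        return False
    return True                         # all three conditions met -> isolation response fires
```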

So in the scenario I have provided, the goal is to ensure that if a host becomes isolated from the HA Primary nodes (or HA Master in vSphere 5) via the 1Gb network, the host does not unnecessarily trigger the “host isolation response”.

Now why would you want to stop HA restarting the VM on another host? Don’t we want the VMs to be restarted in the event of a failure?

Yes & No. In this scenario it’s possible the ESXi host still has access to the IP storage network, and the VMs still have access to their data network/s, via the 10Gb network. The 1Gb network may have suffered a failure, which may affect management, but it may be desirable to leave the VMs running to avoid outages.

If both the 1Gb and 10Gb networks go down to the host, the host will be isolated from the HA Primary nodes (or HA Master in vSphere 5), will not receive HA election traffic, and will suffer an “APD” (All Paths Down) condition. The HA isolation response will then rightly be triggered and VMs will be “Powered Off”. This is desirable, as the VMs can then be restarted on the surviving hosts, assuming the failure is not network wide.

Here is a screen grab (vSphere 5) of the “Host Isolation response” setting, which is located by right clicking your cluster and selecting “Edit Settings”, “vSphere HA” and “Virtual Machine Options”.

The host isolation response setting for environments with IP Storage should always be configured to “Power Off” (and not Shutdown). Duncan Epping explained this well in his blog, so no need to cover this off again.

But wait, there’s more! 😉

How do I avoid false positives which may cause outages for my VMs?

If using vSphere 5, we can use Datastore Heartbeating (which I will discuss later), but in vSphere 4 some more thought needs to go into the design.

So let’s recap step three of the isolation detection process we discussed earlier:

“3. Having failed conditions 1 & 2, the host attempts to ping the “isolation address/es” and is unsuccessful.”

What is the “isolation address”? By default, it’s the ESXi Management VMKernel default gateway.

Is this the best address to check for isolation? In an environment without IP storage it is normally suitable in my experience, although it is best to discuss this with your network architect, as the device you ping needs to be highly available. Note: it also needs to respond to ICMP!
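Since whatever you nominate has to answer ICMP, it is worth validating candidate addresses before committing them. A quick sketch (the addresses below are placeholders, not from any particular environment):

```python
# Sanity-check candidate isolation addresses with a single ICMP echo each.
# Uses the OS ping binary with Linux-style flags; addresses are examples only.
import subprocess

candidates = ["192.168.10.10", "192.168.10.11"]

for addr in candidates:
    result = subprocess.run(["ping", "-c", "1", "-W", "2", addr],
                            stdout=subprocess.DEVNULL)
    print(addr, "OK" if result.returncode == 0 else "NO RESPONSE")
```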

When using IP storage, I recommend overriding the default by setting the advanced setting “das.usedefaultisolationaddress” to “false”, then configuring “das.isolationaddress1” through “das.isolationaddress9” with the IP address/es of your IP storage (in this example, Netapp vFilers). The ESXi host will then ping your IP storage (assuming the HA Primaries, or “Master” in vSphere 5, are unavailable and no election traffic is being received) to check whether it is isolated or not.

If the host/s complete the isolation detection process and are unable to ping any of the isolation addresses (the IP storage), and therefore cannot access the storage, they will declare themselves isolated and trigger the HA isolation response (which, as we discussed earlier, should always be “Power Off”).
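In pyVmomi terms this is the same mechanism as the cluster sketch in the first section; only the option values change. The vFiler addresses below are hypothetical.

```python
# Sketch: isolation addresses pointed at the IP storage rather than network devices.
from pyVmomi import vim

storage_isolation_options = [
    vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
    vim.option.OptionValue(key="das.isolationaddress1", value="192.168.10.10"),  # vFiler 1
    vim.option.OptionValue(key="das.isolationaddress2", value="192.168.10.11"),  # vFiler 2
]
# These go into DasConfigInfo.option exactly as in the earlier cluster reconfiguration sketch.
```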

The below screen shot shows the Advanced options and the settings chosen.

In this case, the IP storage (Netapp vFilers) is connected to the same physical 10Gb switches as the ESXi hosts (one “hop”), so it is a perfect way to test both network connectivity and access to the storage.

In the event the IP storage (Netapp vFilers) is inaccessible, this alone will not trigger the HA isolation response, as connectivity to the HA Primary nodes (or HA Master in vSphere 5) may still be functional. If the storage is in fact inaccessible for more than ~125 seconds (using the default settings of an NFS “HeartbeatFrequency” of 12 seconds and “HeartbeatMaxFailures” of 10), the datastore/s will be marked as unavailable and an “APD” event may occur. See VMware KB 2004684 for details on APD events.
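The ~125 second figure works out as follows, assuming the default NFS “HeartbeatTimeout” of 5 seconds (not mentioned above) on top of the frequency and failure count:

```python
# Rough arithmetic behind the ~125 second window with default NFS heartbeat settings.
heartbeat_frequency = 12     # NFS.HeartbeatFrequency (seconds between heartbeats)
heartbeat_max_failures = 10  # NFS.HeartbeatMaxFailures (consecutive misses tolerated)
heartbeat_timeout = 5        # NFS.HeartbeatTimeout (seconds to wait for each reply), assumed default

print(heartbeat_frequency * heartbeat_max_failures + heartbeat_timeout)  # 125
```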

Below is a screen grab of a vSphere 5 host showing the advanced NFS settings discussed above.

Note: With Netapp Storage it is recommended to configure the VMs with a disk timeout of 190 seconds, to allow for intermittent network issues and/or total controller loss (which takes place in <180 seconds, usually much less), so the VMs can continue running and no outage is caused.
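On Windows guests this timeout is normally set via the well-known disk TimeoutValue registry entry; a sketch of doing that from inside the guest (run with administrative rights) is below.

```python
# Windows guests only: set the SCSI disk timeout to 190 seconds via the registry.
import winreg

key_path = r"SYSTEM\CurrentControlSet\Services\Disk"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0, winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "TimeoutValue", 0, winreg.REG_DWORD, 190)
```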

My advice is that modifying “das.usedefaultisolationaddress” and “das.isolationaddressX” is an excellent way in vSphere 4 (and 5) of confirming whether your host is isolated or not by checking that the IP storage is available; after all, the storage is critical to the ESXi host’s functionality! 😀

If for any reason the IP storage is not responding, and steps 1 & 2 of the HA isolation detection process have completed, an isolation event is triggered and HA will take swift action (powering off the VM) to ensure the VM can be restarted on another host (assuming the issue is not network wide).

Note: Powering Off the VM in the event of Isolation helps prevent a split brain scenario where the VM is live on two hosts at the same time.

While datastore heartbeating is an excellent feature, it is only used by the HA Master to verify whether a host is “isolated” or “failed”. The “das.isolationaddressX” setting is a very good way to ensure your ESXi host can check whether the IP storage is accessible, and in my experience (and testing) it works well.

Now, this brings me onto the new feature in vSphere 5…..

2. Datastore Heartbeating.

It provides that extra layer of protection against HA isolation “false positives”, but adds little value for IP storage unless the Management and IP Storage traffic run over different physical NICs (in the scenario we are discussing, they do).

Note: If the “Network Heartbeat” is not received and the “Datastore Heartbeat” is not received by the HA Master, the host is considered “Failed” and its VMs will be restarted. But if the “Network Heartbeat” is not received and the “Datastore Heartbeat” is received by the HA Master, the host is “Isolated” and HA will trigger the “Host Isolation Response”.
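Expressed as a tiny illustrative sketch (not a real API, just the decision table from the note above):

```python
# How the HA master classifies a host once network heartbeats stop arriving.
def classify_host(network_heartbeat, datastore_heartbeat):
    if network_heartbeat:
        return "healthy"    # the master still hears the host over the network
    if datastore_heartbeat:
        return "isolated"   # host is alive but cut off from the management network
    return "failed"         # neither channel -> HA restarts the host's VMs elsewhere
```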

The benefit here is that, in the scenario I have described, the “das.usedefaultisolationaddress” setting is “false”, preventing HA from pinging the VMKernel default gateway, and “das.isolationaddress1” and “das.isolationaddress2” have been configured so HA will ping the IP storage (vFilers) to check for isolation.

Datastore Heartbeating was configured to “Select any of the cluster datastores taking into account my preferences”. This allows a VMware administrator to specify a number of datastores, and these should be datastores critical to the operation of the cluster (yes, I know, almost every datastore will be important).

In this case, being a Netapp environment, the best practice is to separate OS / page-file / data / vSwap etc. onto different datastores.

Therefore I decided to select the Windows OS and swap file datastores, as without these none of the VMs would function, so they are the logical choice.

The below screen grab shows where Datastore heart-beating is configured, under the Cluster settings.
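For completeness, a hedged pyVmomi sketch of the same preference-based selection; the datastore names are placeholders and the property names reflect my reading of the API rather than a tested implementation.

```python
# Sketch: heartbeat datastore policy honouring admin preferences, nominating two datastores.
from pyVmomi import vim

def heartbeat_spec(cluster, preferred_names=("Win-OS-DS", "Swap-DS")):
    das = vim.cluster.DasConfigInfo()
    das.hBDatastoreCandidatePolicy = "allFeasibleDsWithUserPreference"
    das.heartbeatDatastore = [ds for ds in cluster.datastore if ds.name in preferred_names]
    return vim.cluster.ConfigSpecEx(dasConfig=das)

# Usage (cluster located as in the earlier sketch):
# cluster.ReconfigureComputeResource_Task(heartbeat_spec(cluster), modify=True)
```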

So what has this achieved?

We have the ESXi host pinging the isolation addresses (Netapp vFilers), and we have the HA Master checking datastore heartbeats, to accurately identify whether the host is failed, isolated or partitioned. In the event the HA Master receives neither network heartbeats nor datastore heartbeats, it is extremely likely there has been a total failure of the network (at least for this host) and the storage is no longer accessible, which obviously means the VMs cannot run; the host will therefore be considered “Failed” by the master, and the host will trigger the configured “host isolation response”, which for IP storage is “Power Off”.

QUOTE: Duncan Epping – Datastore Heartbeating “To summarize, the datastore heartbeat mechanism has been introduced to allow the master to identify the state of hosts and is not used by the “isolated host” to prevent isolation.”

I couldn’t have said it better myself.

If the failure is not affecting the entire cluster, the VM will be powered off and recovered by VMware HA shortly thereafter. If the network failure affects all hosts in the cluster, the VM will not be restarted until the network problem is resolved.