Considerations for Multi Site Clusters in Windows Server 2012 (Part 4)

Published on 20 Aug. 2013 / Last Updated on 20 Aug. 2013

This article concludes the discussion of multi-site clusters by walking you through some cleanup tasks that you can perform to make the cluster run a bit more smoothly.

Introduction

So far in this series, I have walked you through the process of building a geographically dispersed cluster using Windows Server 2012. Although the cluster is complete and should be functional at this point, there are a number of cleanup tasks that you may wish to perform on your new cluster. In this article, I will conclude the series by discussing some of these tasks.

Getting Rid of a Warning

One cleanup task that I highly recommend involves getting rid of a warning message that appears within the Failover Cluster Manager. The warning message doesn’t actually hurt anything, but I personally find it a bit distracting, and it may also confuse other administrators who are not as familiar with the cluster’s configuration.

As you may recall from the previous articles in this series, we constructed a Windows Server 2012-based geo-cluster in which no shared storage was used. Rather than attaching the cluster nodes to a cluster shared volume, as might be the case for an on-premises cluster, each cluster node uses direct-attached storage. The storage contents are replicated to each cluster node so that every node in the cluster has the same view of the underlying storage.

Because of this configuration, our cluster must be provisioned with multiple cluster disks – one cluster disk for each storage node. The problem with this configuration is that the Failover Cluster Manager will only recognize a single cluster disk as being online. The cluster disk that is associated with the active cluster node will be listed as being online. All of the other cluster disks will be displayed with an offline status. Furthermore, a message appears in the console’s Information column indicating that “cluster storage is not connected to the node”.

Again, this isn’t actually a problem. The condition exists by design. Even so, I have always thought that it is best to get rid of any false or misleading error messages whenever possible because they can distract from real problems that might exist. Thankfully, there is an easy way to clear this particular message.

To get rid of the message, right-click the cluster disk that displays it and select the Properties command from the resulting shortcut menu. When you do, the properties sheet for the selected cluster disk will appear. Go to the Dependencies tab, and then use the tab’s Resource column to select the IP address for the cluster node that is associated with the cluster disk that you have selected. This is basically a way of telling Windows that there is a relationship between that cluster node and the selected cluster disk. Click OK to save your changes.

You will have to repeat this procedure for each cluster disk that is used by the cluster. After doing so, the cluster will understand when each cluster disk should and should not be online. You won’t see the Failover Cluster Manager acknowledge the changes immediately, but when the next failover occurs (even if it is a manual failover), the warnings about cluster storage not being connected to the node will go away.

In case you are wondering why the warning message continues to be displayed until after the next failover, it is because the Failover Cluster Manager is really just displaying the most recent status message. The only way to clear the message is to cause a status change, and the easiest way to do that is to initiate a failover.
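The same dependency can also be set from Windows PowerShell, which is handy when there are several cluster disks to work through. The following is only a sketch of the procedure described above: the resource, role, and node names are hypothetical placeholders, and it assumes that the disk and the IP address resource belong to the same clustered role.

```powershell
Import-Module FailoverClusters

# Hypothetical names. List the real ones with Get-ClusterResource and Get-ClusterGroup.
$disk      = 'Cluster Disk 2'
$ipAddress = 'IP Address 10.1.0.50'
$role      = 'FileServerRole'
$target    = 'Node2'

# Show the dependency expression that the disk has right now.
Get-ClusterResourceDependency -Resource $disk

# Make the cluster disk depend on the IP address resource of its node.
Set-ClusterResourceDependency -Resource $disk -Dependency "[$ipAddress]"

# Trigger a manual failover so that the stale status message is refreshed.
Move-ClusterGroup -Name $role -Node $target
```

Repeat the Set-ClusterResourceDependency call for each cluster disk, pairing it with the IP address of the node that owns it.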

Securing Cluster Communications

Another task that I highly recommend performing is that of securing the communications between cluster nodes. By default, intra-cluster communications are signed but not encrypted. This might not be a problem if cluster traffic is flowing across a dedicated network backbone segment, but a multi-site cluster such as the one that we have been constructing throughout this article series could potentially pass cluster traffic through a public network. As such, it is a good idea to raise the security level of the cluster.

You can easily accomplish this through Windows PowerShell. The command used for raising the cluster’s security level is:

Get-Cluster | ForEach-Object {$_.SecurityLevel = 2}
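If you would like to see what the cluster is currently using before you change it, the property can be read back the same way. This is a minimal sketch that assumes it is run on a cluster node with the FailoverClusters module available.

```powershell
Import-Module FailoverClusters

$cluster = Get-Cluster

# SecurityLevel values: 0 = clear text, 1 = signed, 2 = encrypted.
"Current SecurityLevel: $($cluster.SecurityLevel)"

# Raise the level so that intra-cluster traffic is encrypted.
$cluster.SecurityLevel = 2

# Read the property back to confirm the change.
"New SecurityLevel: $((Get-Cluster).SecurityLevel)"
```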

Cluster Heartbeat Considerations

Another consideration that you should take into account for your multi-site cluster is the network latency between the two sites. This is important because Windows monitors the cluster’s health through the use of heartbeats. For all practical purposes, a heartbeat is a signaling beacon that cluster nodes send to one another as a way of proving that the various nodes are alive and responsive.

By default, Windows Server 2012 cluster nodes send a heartbeat once per second (every 1000 milliseconds). Windows fully expects that there may be situations in which higher latency times are momentarily experienced. As such, Windows won’t assume that a cluster node is dead simply because a heartbeat has been missed. In fact, by default Windows won’t initiate the failover process unless five heartbeats in a row go missing. This means that communication with a node could theoretically be interrupted for just under five seconds without a failover being triggered. The 500 millisecond round trip latency limit that is sometimes quoted was a requirement of Windows Server 2003 clusters and no longer applies.

In most situations the default heartbeat settings will be perfectly adequate. However, those who are attempting to operate a multi-site cluster across an extremely high latency network may find that they need to adjust the heartbeat settings in order to keep failovers from occurring as a result of slow WAN performance.

Before I show you how to adjust the heartbeat settings, I need to explain that you should only adjust them if it is absolutely necessary. The reason for this is that if you allow for higher latency then you also extend the amount of time that must elapse after a failure occurs before a failover is initiated. It is important to make sure that the values that you use accommodate your network latency without causing excessive delays in failover initiation.
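To put that trade-off into numbers, you can multiply the heartbeat interval by the number of heartbeats that may be missed. The product is roughly how long a node can stay silent before a failover begins. The sketch below reads the current cross-subnet values from the cluster and then works out the same figure for a pair of proposed values. The proposed numbers are purely illustrative and are not a recommendation.

```powershell
Import-Module FailoverClusters

$cluster = Get-Cluster

# Approximate detection window with the current settings (milliseconds).
$currentMs = $cluster.CrossSubnetDelay * $cluster.CrossSubnetThreshold
"Current:  {0} ms x {1} heartbeats = {2} ms" -f `
    $cluster.CrossSubnetDelay, $cluster.CrossSubnetThreshold, $currentMs

# Hypothetical values that you might be considering for a slow WAN link.
$proposedDelayMs   = 2000
$proposedThreshold = 10
$proposedMs        = $proposedDelayMs * $proposedThreshold
"Proposed: {0} ms x {1} heartbeats = {2} ms" -f `
    $proposedDelayMs, $proposedThreshold, $proposedMs
```

If the proposed figure is longer than your applications can tolerate being unavailable, the values are probably too generous.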

There are two ways that you can adjust the cluster to account for high latency networks. One option is to increase the number of heartbeats that can be missed before a failover is triggered. For instance, if you wanted to allow 10 heartbeats to be missed between nodes in different subnets, then you could use the following commands:

$cluster = Get-Cluster
$cluster.CrossSubnetThreshold = 10

The other option is to adjust the interval, in milliseconds, at which heartbeats are sent. For example, the following commands set the heartbeat interval to 1 second (1000 milliseconds). Substitute whatever value suits your network:

$cluster = Get-Cluster
$cluster.CrossSubnetDelay = 1000

It is worth noting that these changes take effect immediately. There is no need to restart the cluster nodes. Windows also replicates the changes throughout the cluster, so you don’t have to worry about making modifications to each individual cluster node.

If you want to check the values of the CrossSubnetDelay or the CrossSubnetThreshold properties, you can do so at any time by using the following command:

Get-Cluster | Format-List *
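Because the cluster object has a long list of properties, it can be easier to narrow the output to just the heartbeat settings. The following sketch uses a wildcard property filter, which should also show the SameSubnetDelay and SameSubnetThreshold properties that apply to nodes sharing a subnet.

```powershell
Import-Module FailoverClusters

# Show only the properties whose names contain "Subnet".
Get-Cluster | Format-List -Property *Subnet*
```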

Conclusion

In this article, I have explained that there are a number of special considerations that must be taken into account when building a multi-site cluster. As a general rule, it is best to avoid the use of shared storage in multi-site clusters, especially when the sites are separated by a slow WAN link.
