Strategies for Monitoring Failover Clusters (Part 4)

by [Published on 6 Dec. 2011 / Last Updated on 6 Dec. 2011]

This article demonstrates a method for displaying a cluster's health through SCOM. We will also explore some additional troubleshooting techniques for cluster monitoring.

Introduction

As you saw in the previous article, it can sometimes be tricky to get the System Center Operations Manager (SCOM) agents deployed on your clustered servers. Hopefully you found my troubleshooting hints helpful and the required agents are now up and running. With that said, I want to turn my attention to actually monitoring the cluster.

The Monitoring Tab

All of the monitoring-related tasks are performed through System Center Operations Manager’s Monitoring tab. When you first select this tab, SCOM will display a Monitoring Overview screen similar to the one that is shown in Figure A. As you can see in the figure, the Monitoring Overview is designed to allow you to gauge your network’s health at a glance. Here, for instance, you can see that there are three healthy computers on my network and that there are no critical errors or warnings at the moment.


Figure A: The Monitoring Overview lets you assess your network’s health at a glance.

Distributed Application Health

Just beneath the Computer Health section is a section related to distributed applications. In this case, SCOM is reporting one healthy distributed application and another distributed application that is in an unknown state. The distributed application that is in an unknown state is the Operations Manager Internal Connector. Its state is unknown because I am not monitoring it. If you need to check the health of distributed applications on your own network, you can do so by selecting the Distributed Applications option from the console tree.

Microsoft Windows Cluster

As you may recall, in Part 2 of this series I showed you how to download and install some cluster-related management packs. These management packs are enough to get you started, but depending on what type of cluster you need to manage, they may be insufficient. Let me show you what I mean.

If you look at Figure B, you can see that I have expanded the Microsoft Windows Cluster container. When I select the Cluster Service State container located beneath it, SCOM recognizes my cluster nodes and reports that both nodes are healthy. Although this is a good start, the Microsoft Windows Cluster container and its sub containers don’t really provide much granular monitoring information for the type of cluster that I am monitoring. That being the case, I want to show you how to add an additional management pack.


Figure B: SCOM displays a minimal amount of cluster data.

Adding a Management Pack

To add a supplementary management pack, go to the Administration tab, right click on the Management Packs container, and choose the Download Management Packs option from the shortcut menu. When the Select Management Packs dialog box appears, click on the Add button, followed by the Search button. The console should now display all of the available management packs. The management packs are grouped by Microsoft product. For example, if you look at Figure C, you can see that I have expanded the Windows Server container and that there is a collection of management packs related to failover clustering for Windows Server 2008.


Figure C: You may have to download management packs related to your specific cluster type.

Select the management packs that you want to install and then click the Add button, followed by the OK button. Click the Download button and SCOM will download all of the requested management packs. Management packs are small files, so the downloads usually complete within a few seconds.

Agent Proxying

Another issue that may stand in the way of monitoring failover clusters is that agent proxying is disabled by default. With some forms of clustering, a health service may need to discover an object on behalf of another computer, and that requires proxying. When proxying is disabled, SCOM is able to monitor the cluster nodes as individual servers, but the agents on the nodes themselves are not able to retrieve information about the cluster as a whole. For example, in my case I have two cluster nodes known as ExchNode1 and ExchNode2. If you look at Figure B, you will notice that the cluster also has a name. In this case it’s ProdCluster. Right now SCOM is able to determine the health of ExchNode1 and ExchNode2, but not the health of ProdCluster.
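To make the proxying rule a bit more concrete, here is a minimal Python sketch of the idea. It is only a toy model and does not call SCOM in any way: the Agent class and the accepts_discovery function are hypothetical names invented for this illustration, and the rule itself is a simplification of the behavior described above. The node and cluster names are the ones from my lab.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy stand-in for a SCOM agent (hypothetical, not a real SCOM type)."""
    computer: str
    proxy_enabled: bool = False  # proxying is off by default, as described above


def accepts_discovery(agent: Agent, target_computer: str) -> bool:
    """Return True if data about target_computer, submitted by agent, is accepted.

    Simplified rule (an assumption for illustration): an agent may always
    report on its own computer, but may only report on another computer,
    such as the cluster name, when proxying is enabled for that agent.
    """
    if target_computer.lower() == agent.computer.lower():
        return True
    return agent.proxy_enabled


node1 = Agent("ExchNode1")                      # proxying still disabled
node2 = Agent("ExchNode2", proxy_enabled=True)  # proxying enabled

for agent in (node1, node2):
    for target in (agent.computer, "ProdCluster"):
        verdict = "accepted" if accepts_discovery(agent, target) else "rejected"
        print(f"{agent.computer} reporting on {target}: {verdict}")
```

In this model, ExchNode1 can still report on itself, which is why the individual nodes show up as healthy, but anything it submits on behalf of ProdCluster is turned away until proxying is switched on for that agent.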

To determine whether or not agent proxying is required for your cluster, go to the Active Alerts container, shown in Figure D. If agent proxying is required then you should see a notification like the one that is shown in the figure.


Figure D: You may have to enable agent proxying in order to monitor your cluster.

To enable agent proxying, select the Administration tab and then select the Agent Managed container (which is located in the Device Management section). When you do, you should see a list of the computers that are being managed with an agent. Right-click on the listing for a cluster node and then select the Properties command from the shortcut menu. When the agent’s properties sheet appears, select the Security tab, select the Allow This Agent to Act as a Proxy check box, as shown in Figure E, and click OK. Now do the same thing for your other cluster nodes.


Figure E: Select the Allow This Agent to Act as a Proxy check box.

When Nothing Shows Up

Sometimes, even after you enable proxying, no cluster data shows up in the Monitoring tab. This is normal. SCOM takes a while to compile monitoring data, and the information about your cluster should show up eventually. Sometimes you can speed things up by selecting the Cluster Service State container, selecting your cluster nodes, and then clicking the Discover Windows Server 2008 Cluster Components link, found in the Actions pane.

Checking Cluster Health

SCOM is designed to provide granular cluster health information, and in the next article I am going to take the time to go through the various monitoring containers. For now, though, I want to show you a quick way to check the basic health of your cluster.

If you click on the Clusters container, you will see an icon labeled Windows Clusters. Assuming that SCOM has collected sufficient information about your cluster, there should be a plus icon just beneath the cluster icon. If you click this plus icon, the diagram will expand to reveal the individual clusters on your network. You can then expand an individual cluster to view the components that make up the cluster. As you can see in Figure F, each object on the diagram includes a health status indicator (in this case, a green check mark).


Figure F: You can use a diagram to quickly assess your cluster’s health.
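If it helps to picture how a collapsed icon on the diagram can summarize everything beneath it, the following Python sketch models one common approach: a "worst state wins" rollup. This is an assumption made for illustration only. The actual rollup behavior in SCOM is defined by the monitors in each management pack, and the Health, DiagramNode, and rollup names below are hypothetical and not part of any SCOM API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Health(IntEnum):
    """Health states ordered so that a larger value means a worse state."""
    UNKNOWN = 0   # not monitored, or no data collected yet
    HEALTHY = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class DiagramNode:
    """One object on the diagram (hypothetical model, not a SCOM type)."""
    name: str
    own_state: Optional[Health] = None  # None for pure containers
    children: List["DiagramNode"] = field(default_factory=list)


def rollup(node: DiagramNode) -> Health:
    """Return the worst state found in this node and everything beneath it."""
    states = [rollup(child) for child in node.children]
    if node.own_state is not None:
        states.append(node.own_state)
    # An empty container has nothing to report, so it stays UNKNOWN.
    return max(states, default=Health.UNKNOWN)


# Build a tree that mirrors my lab: one cluster with two nodes.
prod_cluster = DiagramNode("ProdCluster", children=[
    DiagramNode("ExchNode1", Health.HEALTHY),
    DiagramNode("ExchNode2", Health.HEALTHY),
])
windows_clusters = DiagramNode("Windows Clusters", children=[prod_cluster])

print(windows_clusters.name, "->", rollup(windows_clusters).name)
```

With both nodes healthy, the top-level icon in this model reports HEALTHY. If you change ExchNode2 to Health.WARNING, the warning bubbles up to ProdCluster and to Windows Clusters, which is the reason it is worth expanding the diagram whenever the top-level icon is anything other than green.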

Conclusion

In Part 5 of this series, I want to show you what types of cluster health information SCOM provides. I will also delve into the subject of application clusters. Eventually I also plan to discuss alerting as it pertains to cluster health.
