Developing custom Health Rules for SharePoint 2010

What are health rules?

SharePoint 2010 introduces a new feature called the Health Analyzer, which evaluates a set of Health Rules. Each rule checks a particular metric within the SharePoint farm, and Central Administration provides an interface for viewing problems, warnings, and rule definitions. Administrators can subscribe to rules so that they are alerted when a rule fails. Administrators cannot create custom rules, but developers can.

This is what the interface looks like:

Why should architects and developers consider them?

Administrators might ask you to create rules to monitor specific things on a server.

Beyond that, developers can create rules to monitor their own solutions. Certain aspects of a solution might become problematic and need attention over time, for example a list that keeps growing. A custom rule can monitor this list and notify the right people once a threshold is exceeded.

High-Level steps to create a custom Health Rule:

  1. Create a class that inherits from Microsoft.SharePoint.Administration.Health.SPHealthAnalysisRule
  2. Override the required members of SPHealthAnalysisRule. Check() is where the action happens, but also pay attention to the AutomaticExecutionParameters property (of type SPHealthAnalysisRuleAutomaticExecutionParameters)
  3. Create a feature to register the rule definition with SharePoint
  4. (Optional) Use a console application to test your rule code while developing it. This is much more convenient when you need to debug with Visual Studio 2010.
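
The steps above can be sketched roughly as follows. This is a minimal skeleton, not the code from the download: the namespace, the class name SampleHealthRule, the message strings and the placeholder check are all made up for illustration. The small Program class is one possible shape for the optional console test harness from step 4; in practice it would live in a separate console project that references the rule assembly.

```csharp
using System;
using Microsoft.SharePoint.Administration.Health;

namespace Community.SharePoint.Sketch // hypothetical namespace
{
    // Step 1: inherit from SPHealthAnalysisRule (class name is hypothetical).
    public class SampleHealthRule : SPHealthAnalysisRule
    {
        // Step 2: override the abstract members.
        // Summary is the title shown for the rule in Central Administration.
        public override string Summary
        {
            get { return "Sample rule: the placeholder check failed."; }
        }

        // Explanation describes what the rule found when it failed.
        public override string Explanation
        {
            get { return "Describe here what the rule detected."; }
        }

        // Remedy tells the administrator how to fix the problem.
        public override string Remedy
        {
            get { return "Describe here how an administrator can resolve it."; }
        }

        public override SPHealthCategory Category
        {
            get { return SPHealthCategory.Availability; }
        }

        public override SPHealthCheckErrorLevel ErrorLevel
        {
            get { return SPHealthCheckErrorLevel.Warning; }
        }

        // Check() does the actual work and reports Passed or Failed.
        public override SPHealthCheckStatus Check()
        {
            bool healthy = true; // placeholder: replace with a real measurement
            return healthy ? SPHealthCheckStatus.Passed : SPHealthCheckStatus.Failed;
        }
    }

    // Step 4 (optional): a console harness that calls Check() directly,
    // so the rule can be debugged without waiting for a timer job.
    internal static class Program
    {
        private static void Main()
        {
            SPHealthCheckStatus status = new SampleHealthRule().Check();
            Console.WriteLine("Check() returned: " + status);
        }
    }
}
```

Step 3 (registering the rule through a feature) is covered by the Feature Receiver code later in this post.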

Example Solution:

I have put together a template you can reuse to build custom health rules. My example pings each database server from each web front end (WFE) and fails if the round-trip time is greater than 1 ms. Download the code from the bottom of the post.

Here is my schedule and scope configuration (see notes below for more details):

  public override SPHealthAnalysisRuleAutomaticExecutionParameters AutomaticExecutionParameters
  {
      get
      {
          SPHealthAnalysisRuleAutomaticExecutionParameters execParams =
              new SPHealthAnalysisRuleAutomaticExecutionParameters();
          execParams.Schedule = SPHealthCheckSchedule.Hourly;
          execParams.RepairAutomatically = false;
          execParams.Scope = SPHealthCheckScope.All;
          execParams.ServiceType = typeof(SPWebService);
          return execParams;
      }
  }

I use the following code to get all Database servers:

  // Get the Database Service
  SPDatabaseService spdbservice =
      SPFarm.Local.Services.GetValue<SPDatabaseService>();
  // Get all instances of the Database Service
  SPServiceInstanceDependencyCollection dbServices = spdbservice.Instances;
  // Fail the test if no DB servers are found (network is dead)
  if (dbServices.Count == 0) return SPHealthCheckStatus.Failed;
  // The following will enumerate all instances so we can get the servers hostname
  foreach (SPDatabaseServiceInstance instance in dbServices)
  {
      SPServer dbServer = instance.Server; // each of these is a DB server
      if (!PingServer(dbServer.Name)) // ping it
      {
          failedPings++;
      }
  }

And here is my ping method:

  // Requires: using System.Net.NetworkInformation;
  private static bool PingServer(string serverHostName)
  {
      // Ping is disposable, so release it as soon as the reply is read
      using (Ping dbPing = new Ping())
      {
          PingReply reply = dbPing.Send(serverHostName);
          // Pass only if the host answered and the round trip took 1 ms or less
          return reply.Status == IPStatus.Success && reply.RoundtripTime <= 1;
      }
  }

Here is the Feature Receiver code to register the rule:

  public override void FeatureActivated(SPFeatureReceiverProperties properties)
  {
      Assembly currentAssembly = Assembly.GetExecutingAssembly();
      try
      {
          SPHealthAnalyzer.RegisterRules(currentAssembly);
          //SPHealthAnalyzer.UnregisterRules(currentAssembly);
      }
      catch (Exception ex)
      {
          // Pass the original exception along as the inner exception
          throw new Exception("Registering Health Rules from "
              + currentAssembly.FullName + " failed. " + ex.Message, ex);
      }
  }

The commented-out line shows how to unregister the rules. You should do this in the FeatureDeactivating method.
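
Unregistering in the feature's deactivating handler might look like the sketch below. The receiver class name is hypothetical. The sketch also inspects the dictionary that UnregisterRules returns (rule type mapped to the exception raised for it), which is something the original receiver code does not do; treat that part as a suggestion rather than a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration.Health;

// Hypothetical receiver class name, used only for this sketch.
public class HealthRulesFeatureReceiver : SPFeatureReceiver
{
    public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
    {
        Assembly currentAssembly = Assembly.GetExecutingAssembly();

        // Removes every rule defined in this assembly from the Health Analyzer.
        IDictionary<Type, Exception> failures =
            SPHealthAnalyzer.UnregisterRules(currentAssembly);

        if (failures == null || failures.Count == 0)
        {
            return; // every rule was unregistered cleanly
        }

        // Collect one line per rule that could not be unregistered.
        StringBuilder message = new StringBuilder(
            "Unregistering Health Rules from " + currentAssembly.FullName + " failed.");
        foreach (KeyValuePair<Type, Exception> failure in failures)
        {
            message.AppendLine();
            message.Append(failure.Key.FullName + ": " + failure.Value.Message);
        }
        throw new Exception(message.ToString());
    }
}
```

The same dictionary check can be applied to the return value of RegisterRules in FeatureActivated.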

Important points in configuring the scope and schedule of your rule:

  • The ServiceType property – this lets you specify a particular SharePoint service that is required on servers that are to run this rule. For example, you can use this property to specify that your rule should only run on machines running Excel Services.
  • The Scope property – this defines whether the rule will run on ALL servers, or ANY servers. If set to ANY, it will run on the first server that is running the SharePoint Service specified in the ServiceType property. If set to ALL, it will run on all servers running that service.
  • In my example I am pinging the DB servers from ALL servers running the SharePoint Web Service (web front ends). Any application server not running this service will not fire this rule. You might want to change this to SPTimerService or whatever suits your needs.
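
To illustrate the ANY scope, here is a sketch of the alternative configuration mentioned above: run the rule once a day on a single server that hosts the timer service. The class name is hypothetical, and the class is declared abstract only so that this fragment stands on its own without the other overrides; in a real rule the property would sit next to Check() in the concrete rule class.

```csharp
using Microsoft.SharePoint.Administration;
using Microsoft.SharePoint.Administration.Health;

// Hypothetical base class that carries only the schedule and scope settings.
public abstract class RunOnceOnTimerServiceRule : SPHealthAnalysisRule
{
    public override SPHealthAnalysisRuleAutomaticExecutionParameters AutomaticExecutionParameters
    {
        get
        {
            SPHealthAnalysisRuleAutomaticExecutionParameters execParams =
                new SPHealthAnalysisRuleAutomaticExecutionParameters();
            execParams.Schedule = SPHealthCheckSchedule.Daily;
            execParams.RepairAutomatically = false;
            // Any: one server running the service below executes the rule
            execParams.Scope = SPHealthCheckScope.Any;
            // Only servers running the SharePoint timer service qualify
            execParams.ServiceType = typeof(SPTimerService);
            return execParams;
        }
    }
}
```

ANY suits checks whose result is the same no matter which server runs them (a farm-wide setting, for instance), while ALL suits checks such as the ping example, where each server can see a different result.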

Other things of interest:

  • Central Administration has a “HealthRules” list, which has its own List Template type (SPHealthRulesList).
  • The SPHealthAnalyzer object maintains this feature.
  • The AddItem method of the SPHealthRulesList does two things: Creates an Item in the List, then registers an SPHealthAnalyzerJobDefinition Timer Job. There is one of these for each schedule, service and scope. This can be seen in the Timer Job definitions page:

  • The implementation of this can be seen by opening the AddItem method in Reflector.
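
If you would rather list these timer jobs from code than from the Timer Job definitions page, a small console program along these lines may do it. It is a sketch under two assumptions: that the health analyzer jobs are attached to the services they run for (so they show up in each service's JobDefinitions collection), and that matching on the type name is good enough to pick them out. It must run on a farm server under an account with access to the configuration database.

```csharp
using System;
using Microsoft.SharePoint.Administration;

internal static class ListHealthAnalyzerJobs
{
    private static void Main()
    {
        // Walk every service in the local farm and look at its timer jobs.
        foreach (SPService service in SPFarm.Local.Services)
        {
            foreach (SPJobDefinition job in service.JobDefinitions)
            {
                // Assumption: match by type name rather than referencing the type directly.
                if (job.GetType().Name == "SPHealthAnalyzerJobDefinition")
                {
                    Console.WriteLine("{0} | {1} | {2}",
                        service.TypeName, job.Name, job.Schedule);
                }
            }
        }
    }
}
```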

Further links:

Download source code: Community.SharePoint.zip (you must click the download button on the Live page…)

Hope this helps!