JAMA00

SCOM 2007 R2

Archive for June, 2012

Configure Custom Service Monitoring, an alternative method.

Posted by rob1974 on June 12, 2012

We are running SCOM in a “service provider” solution. When you do this you want to have standardized environments and tune everything in a generic manner. However, our main problem is that every customer is different and has different requirements. Simple changes such as adjusting a disk threshold or setting up service monitoring for specific services on specific computers can be quite time consuming. They can also have a big impact on the performance of the environment (a simple override to one Windows 2008 logical disk gets distributed to all Windows 2008 servers!).

Of course, we could have authors and advanced operators do this job, but there’s no way of auditing changes or forcing advanced operators to use certain override MPs only. We’re convinced this will lead to performance issues and dependency problems because overrides will be saved everywhere. So we believe this can only be done by a small number of people who really understand what they are doing.

To counter this we use an alternative way of setting common configuration values and overrides. We create a script that reads a registry key for a threshold value to use or, as I will show in this blog, a configuration value. An example of a threshold value: we’ve created a disk free space script that uses a default value (both a % and an MB value), but when an override is present in the registry it will use that value instead. The registry override in turn can be set with a SCOM task with just normal operator rights (you can restrict task access for operators in their SCOM profile so not every operator can do this). The change is instant and has no configuration impact on the SCOM servers.
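The "default unless a registry override exists" idea can be sketched as follows. This is only an illustration in Python, not the script from the management pack: the registry is modelled as a plain dictionary so the logic can run anywhere, and the value names (`percent_free`, `mb_free`) and the default numbers are hypothetical.

```python
# Hypothetical defaults; the actual values used by the disk script are not given here.
DEFAULT_THRESHOLDS = {"percent_free": 10.0, "mb_free": 500.0}


def resolve_thresholds(registry_values, defaults=DEFAULT_THRESHOLDS):
    """Return the thresholds to use, preferring registry overrides over defaults.

    registry_values is a dict standing in for the values read from the
    override registry key (an empty dict models "no override present").
    """
    resolved = dict(defaults)
    for name in defaults:
        raw = registry_values.get(name)
        if raw is None:
            continue  # no override for this value, keep the default
        try:
            resolved[name] = float(raw)
        except (TypeError, ValueError):
            # An unparsable override is ignored so the monitor keeps working.
            pass
    return resolved


# No override present: the defaults apply.
print(resolve_thresholds({}))
# Only the percentage is overridden locally; the MB default still applies.
print(resolve_thresholds({"percent_free": "5"}))
```

Falling back to the default when the override cannot be parsed is a design choice made for this sketch; a real script might prefer to raise an alert instead.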

Now to the service monitoring example of this principle. What we’ve done is create a class discovery that checks for a certain registry key being present. If it is present, 1 monitor and 1 rule are targeted to that class. The monitor runs a script every 5 minutes and checks the registry key for service names. Every service name in this registry key will be checked for a running state. If one or more services aren’t in a running state, the monitor will become critical and an alert will be generated. When all services are in a running state again, the monitor will reset to healthy and close the alert.
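The check the monitor performs on each run can be sketched roughly like this. It is a Python illustration under stated assumptions, not the actual data source script: the list of monitored names and the service states are passed in as plain data instead of being read from the registry and the service control manager, and a service that cannot be found is treated as failed (an assumption).

```python
def evaluate_services(monitored, service_states):
    """Decide the monitor state for one run.

    monitored:      service names as listed in the registry key.
    service_states: dict of service name -> state string, e.g. "Running".
    """
    failed = sorted(
        name
        for name in monitored
        # Assumption: a service that is missing from service_states counts as failed.
        if service_states.get(name, "NotFound").lower() != "running"
    )
    return {
        "state": "Critical" if failed else "Healthy",
        # The failed names are what would end up in the state change context.
        "failed_services": failed,
    }


states = {"Spooler": "Running", "W32Time": "Stopped"}
print(evaluate_services(["Spooler", "W32Time"], states))
print(evaluate_services(["Spooler"], states))
```

Because the list of names is read again on every run, adding a name to the registry changes what is checked on the next interval without any change on the SCOM side.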

By running a task from the Windows computer view you can set up the monitoring for the first time:

image

Overriding the task with the service name (not the display name) will add the service to the registry.

image 

When the discovery has run, the found instance will be visible in the jama Custom Service Monitoring view and more tasks will be available. If it really is the first service, it might take up to 24 hours before the instance is found, as we’ve set the discovery to a daily interval. But you can always restart the System Center Management service to speed up the discovery.

The new tasks are:

– Set monitored service. Basically the same task as the one available in the computer view, just with a different target. It can add additional services to monitor without any impact on the SCOM backend, and the new service will be monitored instantly as the data source checks the registry on each run.

– List monitored service. Reads the registry and lists all values.

– Remove monitored service. Removes a service from the registry key and deletes the key if the service was the last value. When the key is deleted, the class discovery removes the instance on the next discovery run. Overriding the task with “removeall” will also delete the key.

– The “list all services on the computer” task doesn’t have real value for this management pack; it was just added so a service name can be checked from the SCOM console.
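The behaviour of the set, list and remove tasks described above can be modelled in a few lines. This Python sketch uses an in-memory stand-in for the registry key, so the class and method names are hypothetical and nothing here touches a real registry. It also assumes that “removeall” is passed in the same place as a service name, and that names are compared case-insensitively.

```python
class MonitoredServicesKey:
    """In-memory stand-in for the registry key holding monitored service names."""

    def __init__(self):
        self.values = None  # None models "the key does not exist"

    def set_service(self, name):
        if self.values is None:
            self.values = []  # the first service creates the key
        if name.lower() not in (v.lower() for v in self.values):
            self.values.append(name)

    def list_services(self):
        return list(self.values or [])

    def remove_service(self, name):
        if self.values is None:
            return
        if name.lower() == "removeall":
            self.values = None  # delete the whole key
            return
        self.values = [v for v in self.values if v.lower() != name.lower()]
        if not self.values:
            self.values = None  # last value removed, so the key is deleted too

    def discovered(self):
        # The class discovery only checks whether the key is present.
        return self.values is not None


key = MonitoredServicesKey()
key.set_service("Spooler")
key.set_service("W32Time")
print(key.list_services())
key.remove_service("removeall")
print(key.discovered())
```

Only the transition between "key absent" and "key present" changes what the discovery finds; adding or removing further names stays local to the computer.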

See below for some screenshots of the tasks and health explorer.

image

Task output of “List monitored services”:

image

The health explorer has additional knowledge and shows which services have failed through the state change context:

image

 image

So what’s the benefit of all this?

– The SCOM admins/authors don’t have to make a central change, except for importing the management pack.

– The support organization doesn’t have to log a change and wait for someone to implement it. They can make the change themselves with just SCOM operator rights (or with rights to edit a registry key on a computer locally) and it takes effect pretty much instantly.

– From a performance view the first service that is added will have some impact on the discovery (config churn), but additional services don’t have any impact.

However, there’s still no auditing. We’ve granted access to this task to what we call “key users”, and we have 1 or 2 key users per customer. If you set up object access monitoring on the registry key, that could give an idea of who changed it.

The performance benefit for this monitor is probably minimal. However using this principle for disk thresholds gives a huge benefit as that’s a monitor that is always active on all systems and overriding values through a registry key on the local system removes all override distribution (I might be posting that mp as well).

If you want to check out this mp, you can download the management pack jamaCustomService here. I’ve uploaded it unsealed, but I recommend sealing it if you really want to put this into production.

Posted in general, management packs