JAMA00

SCOM 2007 R2

Archive for the ‘general’ Category

Website monitoring, another gotcha!

Posted by rob1974 on February 13, 2012

Recently I had an issue with website monitoring in a SCOM demo environment. I had configured a website test (through the template) every 2 minutes. I had created a DNS zone hosting the FQDN for this website. Then I paused the DNS zone and waited for the HTTP test to fail. I expected the HTTP test to fail and have an alert in SCOM within 2 minutes. However, after 10 minutes I had nothing… Not really what you want when you do a live demo.

So what was happening here? Some of you might already got it, it’s DNS caching of the client running the HTTP test. So how to stop this? Well there are 3 things you can do.

 

1. Default the DNS Client service will be started on a windows machine. Simply stopping the DNS Client service and the caching will stop (dns queries will still resolve).

2. Increase the frequency of the HTTP test. Anything more than 15 minutes will do…

3. Decrease the default cache time for the queries to something less than your test frequency.

 

As I was giving a demo, option 2 was not an option for me. But I would seriously thing about this when I do website tests in production. Obviously service level agreements should play a part in this, but a delay of max 30 minutes on a SLA of 8 hours would definitely be acceptable for me.

Option 1 was no option either. Beside caching the dns client service also registers domain joined hosts in dns, so not something I would recommend either. Besides caching helps performance wise, not sure if I ever wanted to disable this.

So left with option 3, but how to do this? In HKLM\SYSTEM\CurrentControlSet\services\Dnscache\parameters create the DWord MaxCacheTtl (and MaxNegativeCacheTtl if it’s not already on 0 with “NegativeCacheTime”) and give it a value of below the frequency of the website test. For a 2 minutes test I used a value of 90 (seconds).

dnssettings

Normally I think option 2 will be the best to go for. No use to run tests more than you would need to. However if you have a dedicated host for running website tests and you run those tests more often than every 15 minutes, consider reducing the max. cache time of the dns client.

Posted in general, troubleshooting | Tagged: , , | Leave a Comment »

SCOM’s “un-discovery”. What doesn’t work here… And how to correct it.

Posted by rob1974 on January 26, 2011

 

SCOM’s main benefit of monitoring imho is it’s ability to discover what is running on a server and based on that information start to monitor the server with the appropriate rules. When you follow Microsoft’s best practices you’ll first perform a lightweight discovery to create a very basic class and have the more heavy discoveries run against that basic class. This is pretty good stuff actually. it helps quite a lot for the performance of an agent as it will only run heavy discoveries if the server has an application role and never run on servers which have nothing to do with that application.

However, I’ve recently found out a drawback with this 2 step discovery, which I can probably explain the best with a real world example:

Discover the windows domain controllers on “windows computers” (the management pack from where this discovery runs in is an exception. usually it’s in the application mp itself; apparently MS thought of domain controllers being basic info. similar discoveries for workstation and member servers can be found in this mp as well). For this discovery a wmi query is used to determine if the “windows computer” is a domain controller as well (SELECT NumberOfProcessors FROM Win32_ComputerSystem WHERE DomainRole > 3; if this returns something, it’s a dc)

image

When it is a “windows domain controller” it will run a few other discoveries to determine more info.image

Just by looking at the classes you can imagine it’s not really lightweight anymore.

image

So far so good, on all my windows computer I run a simple query and if that query returns something SCOM will also run a script that founds more interesting stuff about the DC.

But here’s the catch with this kind of discovery. Suppose I don’t need a certain DC anymore, but I still need to keep the server as it’s running some application I still need to use and monitor. What will happen? The lightweight discovery will do its job. It will correctly determine that the server is not a “windows domain controller” anymore and as a result it won’t run the script-discovery anymore.

You might ask, why is that bad, we didn’t want that, did we? Yes you are correct, we didn’t want to run this discovery against servers that aren’t DC’s, but SCOM doesn’t unlearn the discovered classes automatically. Because this discovery never runs again SCOM never unlearns this server doesn’t have the “Active Directory Domain Controller Computer Role” anymore. And this is the class that is used for targetting rules and monitors. So allthough SCOM knows the server isn’t a “windows domain controller” anymore, it still is monitoring the “Active Directory Domain Controller Computer Role”. This will result in quite a lot of noise (script errors, ldap failures, etc).

For now, there’s just a workaround available. You will need to override the 2nd discovery for that particular server. As the first discovery doesn’t include this server as an object of class, you can’t override the discovery for a “specific object of class: Windows Domain Controller”. You’ll need to create a group and include the server object. Then use the override the object discovery “for a group…” and choose the group you’ve just created.

image

What’s the point of disabling a discovery that didn’t run anyway? Well now you can go to powershell and run the “Remove-DisabledMonitoringObject” cmdlet. This will remove the discovered object classes for this discovery and all of the monitoring attached to those classes.

Discoveries make SCOM stand out from other monitoring tools, but it needs to work both ways. Finding out this took me about 1 day. And that’s just 1 issue with 1 server (DNS was also installed on this server and had the same issue). Loads of servers might change role without me knowing about it and when it’s not being reported to me I’ll just have extra noise in SCOM. I’m just not sure if this can be picked up within SCOM itself or that the “un-discovery” needs to be done by the mp’s logic. For the AD part it needs to be picked up by Microsoft anyway, but if the logic is build in the management pack then it will have an impact on all the custom build mp’s by all you SCOM authors out there.

Posted in general, management packs, troubleshooting | 1 Comment »

 
Follow

Get every new post delivered to your Inbox.