JAMA00

SCOM 2007 R2

Archive for the ‘Management Servers’ Category

Microsoft Exchange Server 2013 discovery.

Posted by rob1974 on May 13, 2014

The Microsoft Exchange Server management pack has once again seen some big changes. Loads of people have already blogged about that. Personally I like the simplicity of it and the fact that Exchange admins are in control, a.k.a. they don't bother me anymore. However, Windows server admins started to bother me with script errors related to the Exchange Server 2013 discovery on servers that didn't run Exchange 2013 in the first place.

After some investigation it turned out the discovery for Microsoft Exchange 2013 is a PowerShell script targeted at the Windows Computer class. This is where it goes wrong: when you are monitoring Windows 2003 or Windows 2008 servers, chances are you don't have PowerShell installed on those servers. Furthermore, why is the Exchange 2013 discovery running on those servers at all, as they aren't supported operating systems for Exchange Server 2013?

So, easy enough, I decided to override the discovery for Windows 2003: simply choose override for a group, select the Windows 2003 computer group and set the "Enabled" value to false. Job done.

Next I wanted to disable the discovery for the Windows 2008 servers as well, but not for the Windows 2008 R2 computers. Windows 2008 R2 is a supported OS for Exchange 2013, and besides, PowerShell is installed by default, so there's no issue there: the discovery will run and return nothing (or not an Exchange server) if Exchange isn't installed, and it won't throw a script error about missing PowerShell.

The Windows 2008 computer group in the Windows Server 2008 (discovery) management pack also contains the Windows 2008 R2 computers, so it's not as easy as with Windows Server 2003. I needed to create a Windows Server 2008 group which doesn't contain the Windows 2008 R2 servers.

Luckily I remembered a blog post by Kevin Holman about creating a group of computers that excludes the members of another group (by the way, glad he's back on the support front; I really missed those deep dives in SCOM). I created a new group, edited the XML and set the override. The only difference between my group exclusion and Kevin Holman's is that I use a reference to another MP in the "NotContained" section, because I use the "Microsoft Windows Server 2008 R2 Computer Group" which already exists in the Windows Server 2008 (discovery) MP. This means the reference to that MP needs to be included in the XML below.

The result is below. Save this as ValueBlueOverrideMicrosoft.Exchange.Server.xml (remove the Windows Server 2003 override and the reference to "Microsoft.Windows.Server.2003" if you don't run that OS anymore):

<?xml version="1.0" encoding="utf-8"?>
<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>ValueBlueOverrideMicrosoft.Exchange.Server</ID>
      <Version>1.0.1.0</Version>
    </Identity>
    <Name>ValueBlueOverrideMicrosoft Exchange Server 2013</Name>
    <References>
      <Reference Alias="Exchange">
        <ID>Microsoft.Exchange.15</ID>
        <Version>15.0.620.18</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="MicrosoftSystemCenterInstanceGroupLibrary6172210">
        <ID>Microsoft.SystemCenter.InstanceGroup.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="SystemCenter">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.81</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="MicrosoftWindowsServer2008Discovery6066670">
        <ID>Microsoft.Windows.Server.2008.Discovery</ID>
        <Version>6.0.6667.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Server.2003</ID>
        <Version>6.0.6667.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <TypeDefinitions>
    <EntityTypes>
      <ClassTypes>
        <ClassType ID="ValueBlue.Microsoft.Server.2008.Only.Group" Accessibility="Public" Abstract="false" Base="MicrosoftSystemCenterInstanceGroupLibrary6172210!Microsoft.SystemCenter.InstanceGroup" Hosted="false" Singleton="true" />
      </ClassTypes>
    </EntityTypes>
  </TypeDefinitions>
  <Monitoring>
    <Discoveries>
      <Discovery ID="ValueBlue.Microsoft.Server.2008.Only.Group.DiscoveryRule" Enabled="true" Target="ValueBlue.Microsoft.Server.2008.Only.Group" ConfirmDelivery="false" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryRelationship TypeID="MicrosoftSystemCenterInstanceGroupLibrary6172210!Microsoft.SystemCenter.InstanceGroupContainsEntities" />
        </DiscoveryTypes>
        <DataSource ID="GroupPopulationDataSource" TypeID="SystemCenter!Microsoft.SystemCenter.GroupPopulator">
          <RuleId>$MPElement$</RuleId>
          <GroupInstanceId>$MPElement[Name="ValueBlue.Microsoft.Server.2008.Only.Group"]$</GroupInstanceId>
          <MembershipRules>
            <MembershipRule>
              <MonitoringClass>$MPElement[Name="MicrosoftWindowsServer2008Discovery6066670!Microsoft.Windows.Server.2008.Computer"]$</MonitoringClass>
              <RelationshipClass>$MPElement[Name="MicrosoftSystemCenterInstanceGroupLibrary6172210!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
              <Expression>
                <NotContained>
                  <MonitoringClass>$MPElement[Name="MicrosoftWindowsServer2008Discovery6066670!Microsoft.Windows.Server.2008.R2.ComputerGroup"]$</MonitoringClass>
                </NotContained>
              </Expression>
            </MembershipRule>
          </MembershipRules>
        </DataSource>
      </Discovery>
    </Discoveries>
    <Overrides>
      <DiscoveryPropertyOverride ID="OverrideForDiscoveryMicrosoftExchange15ServerDiscoveryRuleForContextMicrosoftWindowsServer2003ComputerGroupd5787b8329934b19ba24ca637d805307" Context="Windows!Microsoft.Windows.Server.2003.ComputerGroup" ContextInstance="cb87057c-606b-43e7-e861-8e5a0df201f6" Enforced="false" Discovery="Exchange!Microsoft.Exchange.15.Server.DiscoveryRule" Property="Enabled">
        <Value>false</Value>
      </DiscoveryPropertyOverride>
      <DiscoveryPropertyOverride ID="OverrideForDiscoveryMicrosoftExchange15ServerDiscoveryRuleForContextValueBlue.Microsoft.Server.2008.OnlyGroup4700b619d2184cb2af12302026deee09" Context="ValueBlue.Microsoft.Server.2008.Only.Group" ContextInstance="4cc2c8a2-918a-2ec3-f05e-fa1042b4a4db" Enforced="false" Discovery="Exchange!Microsoft.Exchange.15.Server.DiscoveryRule" Property="Enabled">
        <Value>false</Value>
      </DiscoveryPropertyOverride>
    </Overrides>
  </Monitoring>
  <LanguagePacks>
    <LanguagePack ID="NLD" IsDefault="false">
      <DisplayStrings>
        <DisplayString ElementID="ValueBlueOverrideMicrosoft.Exchange.Server">
          <Name>ValueBlueOverrideMicrosoft Exchange Server 2013</Name>
        </DisplayString>
        <DisplayString ElementID="ValueBlue.Microsoft.Server.2008.Only.Group">
          <Name>ValueBlueWindows 2008 servers only (no windows 2008 r2 servers)</Name>
        </DisplayString>
        <DisplayString ElementID="ValueBlue.Microsoft.Server.2008.Only.Group.DiscoveryRule">
          <Name>Populate ValueBlueWindows 2008 servers only (no windows 2008 r2 servers)</Name>
          <Description>This discovery rule populates the group 'ValueBlueWindows 2008 servers only (no windows 2008 r2 servers)'</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
    <LanguagePack ID="ENU" IsDefault="false">
      <DisplayStrings>
        <DisplayString ElementID="ValueBlueOverrideMicrosoft.Exchange.Server">
          <Name>ValueBlueOverrideMicrosoft Exchange Server 2013</Name>
        </DisplayString>
        <DisplayString ElementID="ValueBlue.Microsoft.Server.2008.Only.Group">
          <Name>ValueBlueWindows 2008 servers only (no windows 2008 r2 servers)</Name>
        </DisplayString>
        <DisplayString ElementID="ValueBlue.Microsoft.Server.2008.Only.Group.DiscoveryRule">
          <Name>Populate ValueBlueWindows 2008 servers only (no windows 2008 r2 servers)</Name>
          <Description>This discovery rule populates the group 'ValueBlueWindows 2008 servers only (no windows 2008 r2 servers)'</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>


SNMP port claimed, but not by the SNMP services.

Posted by rob1974 on October 24, 2013

 

I recently came across the following issue: SCOM was throwing errors every minute, and auto closing them fast, from this monitor: Network Monitoring SNMP Trap port is already in use by another program. Two things struck me when I looked at it. The first: it was a manual reset monitor, so how could it be auto closed? The second: it was about the management server itself and its inability to claim the SNMP trap receiver port (UDP 162), while I was sure I had disabled the SNMP services.

So what claimed the SNMP port? With two simple commands you can find out:

> netstat -ano -p UDP | find ":162"

This returned a PID. Now check that PID against tasklist /svc and we should have the process responsible for the claim:

> tasklist /svc /fi "pid eq <pidnumberfound>"

and it returned nothing.
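
For convenience, both steps can be chained in a small PowerShell sketch (my own wrapper, not part of any management pack; it assumes an elevated prompt):

# Find the PID that owns UDP port 162 and resolve it to a process/service.
$line = netstat -ano -p UDP | Select-String ':162\s' | Select-Object -First 1
$owningPid = ($line.Line.Trim() -split '\s+')[-1]
tasklist /svc /fi "pid eq $owningPid"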

After quite some rechecking of my results in different ways, I came to the conclusion that the UDP port really had been claimed by a process that didn't exist anymore. I feel this is some bug in Windows Server (it should always close the handles whenever a process dies), but let's be pragmatic about it: reboot the server. After the reboot I ran the commands above again and found that "monitoringhost.exe" had claimed the port. W00t, we solved the problem: SCOM can receive traps again.

As mentioned at the start of the post, the alerts were closing fast, and because of that, anyone who saw them wouldn't pay any attention to them. SCOM wasn't receiving any traps in the above condition, so why was the alert of a manual reset monitor being auto closed so fast?

The explanation is quite simple: the recovery task for this monitor. This recovery task runs, disables the Windows SNMP services and resets the monitor. The problem is that in this case the recovery didn't actually do anything (the SNMP services were already disabled), but it did reset the monitor.

I think the real solution would be a rewrite of the monitor, so that it checks whether SCOM itself has actually claimed the SNMP port. I could do this, but for now I will leave it. It might have just been one of those exotic errors you'll see once in a lifetime…


SCOM 2012 design for a service provider.

Posted by rob1974 on October 24, 2013

Over the last year we've created a (high level) design to monitor an unlimited number of Windows servers (initially around 20,000). However, halfway through the year it became clear that this company had decided on a new architecture altogether. Not only would SCOM 2012 become obsolete in this architecture, offshoring the monitoring would also make Marc and me obsolete (not to worry, we both have a new job). As I feel it's kind of a waste of our work, I'll blog about the two things that I feel were quite unique in this design.

There's still a SCOM 2007 environment running that was designed for around 10,000 servers. We designed it as a single instance and it is monitoring 50+ customers, separated by scoping on their computers (see our blog and/or look up the service provider chapter in Operations Manager 2012 Unleashed). However, we never reached 10,000 servers, due to a few factors. The first was the number of concurrent consoles used. The number of admins needing access increases with the number of servers you monitor, while Microsoft's "official" support numbers decrease the number of concurrent consoles as the number of supported agents grows (50 consoles with 6,000 servers, 25 with 10,000 servers). Now these aren't really hard numbers, but adding more hardware will just shift bottlenecks (from management servers to databases and from disk I/O to network I/O), and of course the price of hardware goes up exponentially.

The second issue was that we had to support a lot of management packs and versions. We supported all Microsoft applications from Windows 2000 and up, so we didn't only run Windows 2000, 2003 and 2008, but all versions of DHCP, DNS, SQL, Exchange, SharePoint, IIS, etc. Some of the older ones are still the dreaded converted MPs, in some of which we had to fix volatile discoveries ourselves. By the way, I've been quite negative about how Microsoft developed their MPs, and especially about the lack of support for them, but the most recent MPs are really done quite well. MS, keep that up!

In the new design we used a different approach. We decided not to aim for the biggest possible instance of SCOM, but to go for several smaller ones, initially plotted over the different kinds of customers my former company has to support. Some customers only needed OS support, some OS + middleware, and others required monitoring of everything.

Based on our experience we decided to create only one technical design for all the management groups and a support formula for the number of supported agents. Keep in mind this is not set in stone; it's just to give management an estimate of the costs and insight into the penalty for continually adding new functionality (MPs) to the environment. All the management groups could still hold more than one customer, but one customer should only be in one management group, to be able to create distributed applications (automatic discoveries should be possible).

This is what we wrote down in the design:

The number of agents a management group can support depends on a lot of different factors. The performance of the management servers and the database will be the bottleneck of the management group as a whole. A number of factors influence the performance, e.g. the number of concurrent consoles, the number of management packs and the health state of the monitored environments.

When a management group reaches its thresholds, a few things can be done to improve performance:

1. Reduce the amount of data in the management group (solving issues and tuning, both leading to fewer alerts and less performance data).

2. Reduce the console usage (reduces load on the management servers and databases).

3. Reduce the monitored applications (fewer management packs).

4. Increase the hardware specs (more CPUs, more memory, faster disk access, more management servers).

5. Move agents to another management group (which reduces load in terms of data and console usage).

We'll try to have one design for all management groups, so the hardware is the same; however, the number of management packs differs per management group. To give an estimate of how this affects the number of supported agents per management group, we've created a support formula.

The formula is based on two factors: the number of concurrent consoles and the number of management packs[1]. The health of the monitored environments is taken out of the equation. The hardware component is taken out as well, but assumed to be well in line with the sizing helpers in Microsoft's SCOM design guides. With fewer than 30 consoles and fewer than 10 monitored applications/operating systems, the sizing is set to 6,000 agent managed systems (no network, *nix or dedicated URL monitoring).

The penalty tables for the number of consoles and management packs, and the resulting formula, are given below. Please note that Microsoft's official support limit for the number of consoles is 50.

consoles   penalty (%)        management packs   penalty (%)
 0          0                  0                  0
 5          0                  5                  2
10          0                 10                  4
15          0                 15                  7
20          0                 20                 11
25          0                 25                 16
30          0                 30                 23
35          2.5               35                 32
40          5                 40                 44
45          7.5               45                 60
50         10                 50                 78
55         30                 55                 85
60         50                 60                 90
                              65                 95
                              70                 97
                              75                 99

Number of supported agents = (1-mppenalty/100)*(1-consolepenalty/100)*6000

This results in the following table, which shows the number of supported agents:

horizontal = concurrent consoles

vertical = number of management packs

MPs   <30    35    40    45    50    55    60
 0   6000  5850  5700  5550  5400  4200  3000
 5   5880  5733  5586  5439  5292  4116  2940
10   5760  5616  5472  5328  5184  4032  2880
15   5580  5441  5301  5162  5022  3906  2790
20   5340  5207  5073  4940  4806  3738  2670
25   5040  4914  4788  4662  4536  3528  2520
30   4620  4505  4389  4274  4158  3234  2310
35   4080  3978  3876  3774  3672  2856  2040
40   3360  3276  3192  3108  3024  2352  1680
45   2400  2340  2280  2220  2160  1680  1200
50   1320  1287  1254  1221  1188   924   660
55    900   878   855   833   810   630   450
60    600   585   570   555   540   420   300
65    300   293   285   278   270   210   150
70    180   176   171   167   162   126    90
75     60    59    57    56    54    42    30
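
For reference, here is a minimal PowerShell sketch of the formula and the penalty tables above (my own illustration; rounding intermediate counts up to the next table row is an assumption):

# Penalty tables, keyed on the table rows above.
$consolePenalty = @{35 = 2.5; 40 = 5; 45 = 7.5; 50 = 10; 55 = 30; 60 = 50}
$mpPenalty = @{0 = 0; 5 = 2; 10 = 4; 15 = 7; 20 = 11; 25 = 16; 30 = 23; 35 = 32; 40 = 44; 45 = 60; 50 = 78; 55 = 85; 60 = 90; 65 = 95; 70 = 97; 75 = 99}

function Get-SupportedAgents([int]$consoles, [int]$mps)
{
    # Round up to the next table row; 30 consoles or fewer carry no penalty.
    $cPen = if ($consoles -le 30) { 0 } else { $consolePenalty[[int]([math]::Ceiling($consoles / 5) * 5)] }
    $mPen = $mpPenalty[[int]([math]::Ceiling($mps / 5) * 5)]
    [math]::Round((1 - $mPen / 100) * (1 - $cPen / 100) * 6000)
}

Get-SupportedAgents 50 30   # returns 4158, matching the table above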

Again, we never put this to the test, but those figures resulted in our goals for this design: one management group for OS monitoring holding 6,000 servers, one for middleware holding around 3,500, and one management group with full blown monitoring for around 2,000 servers. Please note our first remark about tuning and solving alerts; those things are still very important for a healthy environment!

The next part that made our design stand out is the ability to move agents, or actually entire customers, from one management group to another. This is of course always possible, but we wanted to do it automated and keep all customizations where applicable. The reasons for such a move would be sizing/performance of the management group, MP support (e.g. the customer starts using a new OS which isn't supported in the management group they are in), or a contract change with the customer (e.g. from OS support to middleware support or the other way around).

In order to be able to keep all customizations, we created an MP structure/naming convention:

jamaOverride<mpname> = holds all generic overrides to <mpname>. This MP is sealed and should be kept in sync between all management groups holding <mpname>, as it contains the enterprise monitoring standards.

jama<customer>Override<mpname> = holds all customer specific overrides to <mpname> and/or changes to the generic sealed override MP. This is of course unsealed, as it holds GUIDs of specific instances.

jama<customer><name> = holds all specific monitoring rules and DAs for a customer. This is not limited to just one MP; normal distribution rules should apply. In other words, don't create one big "default" MP, but keep things grouped based on the "targets" of the rules/monitors.

Now when we need to move a customer, we can just move an agent to a new management group and import all the jama<customer> MPs into that management group. Of course this is only applicable if the new management group supports the same MPs as the old one. jamaOverride<mpname> should already exist, as it just contains enterprise wide overrides to classes. The jama<customer>Override<mpname> MPs would contain specific instances. However, specific instances get a GUID on discovery: if you remove an agent and reinstall the agent, you get a new GUID, so the overrides wouldn't apply anymore. Fortunately SCOM supports multihoming, and one of its big advantages is cookdown. Cookdown will only work when the same scripts use the same parameters, which means the agent's GUIDs must be the same when multihomed, as those GUIDs are parameters for many data sources.

To make sure the GUID is kept, a move between management groups needs to use this multihoming principle, so a move becomes: add the agent to the new management group, make sure all discoveries have run (basically time, maybe a few agent restarts) and finally remove the old management group. Because this is a relatively easy command (no actual install), the add and remove can be done via a SCOM task (add the "add new MG" task to the old MG and the "remove old MG" task to the new MG), as sketched below.
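
Such a task could drive the agent's own COM interface; a minimal sketch (the MG names and management server below are placeholders for your environment):

# Add the new management group on the agent; remove the old one once discoveries have run.
$cfg = New-Object -ComObject "AgentConfigManager.MgmtSvcCfg"
$cfg.AddManagementGroup("NEWMG", "newms.customer.local", 5723)
# ...after everything has been discovered in the new management group:
$cfg.RemoveManagementGroup("OLDMG")
$cfg.ReloadConfiguration()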

Again, we never built this, and we don't expect it to be as smooth as described here (there's always some unexpected reference in an override MP :)), but hopefully it is of some use to someone.


[1] One management pack equals one application or OS. E.g. Windows 2008 counts as one management pack, even though it consists of two management packs in SCOM (discovery and monitoring).


HP Proliant MP discovery is missing Networks, Storage and Management Processor

Posted by MarcKlaver on April 25, 2013

[screenshot]

Recognize the above situation?

When you install the HP ProLiant management pack and use the SNMP based agents, some discoveries may not seem to work. To solve this, you must also configure the SNMP settings on your server: in order for the SNMP based agents to read all required information, an SNMP community string must be created.

You should configure two things within the SNMP Service Properties:

[screenshot]

Under "Service", allow all.

 

[screenshot]

  • Under "Security", add a read-only community string and make sure that SNMP packets are accepted from the "localhost" host. Note that the "Send authentication trap" option is optional, not required.
  • Restart your SNMP Service
  • Restart the SCOM service
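
If you prefer to script these steps instead of clicking through the GUI, here is a hedged PowerShell sketch of the "Security" part (the community string is an assumption for your environment; the DWORD value 4 means READ ONLY):

# SNMP service settings live in the registry; write them and restart the services.
$params = "HKLM:\SYSTEM\CurrentControlSet\Services\SNMP\Parameters"
New-ItemProperty -Path "$params\ValidCommunities" -Name "public" -Value 4 -PropertyType DWord -Force
New-ItemProperty -Path "$params\PermittedManagers" -Name "1" -Value "localhost" -PropertyType String -Force
Restart-Service SNMP
Restart-Service HealthService   # the "System Center Management" service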

And after a few minutes the result should look like this:

[screenshot]

When the HP Agents start, they will read the SNMP settings and use them to access the SNMP based information from the agent 🙂


Configure Custom Service Monitoring, an alternative method.

Posted by rob1974 on June 12, 2012

We are running SCOM in a "service provider" solution. When you do this, you want to have standardized environments and tune everything in a generic manner. However, our main problem is that every customer is different and has different requirements. Simple changes like a disk threshold, or setting up service monitoring for specific services on specific computers, can be quite time consuming. Also, they can have a big impact on the performance of the environment (a simple override to one Windows 2008 logical disk gets distributed to all Windows 2008 servers!).

Of course, we could have authors and advanced operators do this job, but there's no way of auditing changes or forcing advanced operators to use only certain override MPs. We're convinced this would lead to performance issues and dependency problems, because overrides would be saved everywhere. So we believe this can only be done by a small number of people who really understand what they are doing.

To counter this, we use an alternative way of setting overrides for common configurations. We create a script that reads a registry key for a threshold value to use or, as I will show in this blog, a configuration value. An example of a threshold value: we've created a disk free space script that uses default values (both a % and an MB value), but when an override is present in the registry it will use that value instead. The registry override in turn can be set with a SCOM task with just normal operator rights (you can restrict task access for operators in their SCOM profile, so not every operator can do this). The change is instant and has no configuration impact on the SCOM servers.

Now to the service monitoring example of this principle. What we've done is create a class discovery that checks whether a certain registry key is present. If it is, one monitor and one rule are targeted at that class. The monitor runs a script every 5 minutes and checks the registry key for service names. Every service name in this registry key is checked for a running state. If one or more services aren't running, the monitor becomes critical and an alert is generated. When all services are running again, the monitor resets to healthy and closes the alert. The core of that check boils down to the sketch below.
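
A minimal PowerShell sketch of the monitor's logic (the registry path is made up for illustration; the actual MP defines its own):

# Check every service name stored under the key; any non-running service makes the state critical.
$key = "HKLM:\SOFTWARE\jama\CustomServiceMonitoring"   # assumed path
$failed = @()
foreach ($name in (Get-Item $key).GetValueNames())
{
    $svc = Get-Service -Name $name -ErrorAction SilentlyContinue
    if ($svc -eq $null -or $svc.Status -ne "Running") { $failed += $name }
}
if ($failed.Count -gt 0) { "Critical: $($failed -join ', ') not running" } else { "Healthy" }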

By running a task from the Windows Computer view you can set up the monitoring for the first time:

[screenshot]

Overriding the task with the service name (not the display name) will add the service to the registry.

[screenshot]

Once the discovery has run, the found instance will be visible in the jama Custom Service Monitoring view and more tasks will be available. When it really is the first service, it might take up to 24 hours before the instance is found, as we've set the discovery to a daily interval; but you can always restart the System Center Management service to speed up the discovery.

The new tasks are:

– Set monitored service. Basically the same task as the one available in the computer view, just with a different target. It can add additional services to monitor without any impact on the SCOM backend, and the service will be monitored instantly, as the data source checks the registry on each run.

– List monitored service. Reads the registry and lists all values.

– Remove monitored service. Removes a service from the registry key and deletes the key if the service was the last value. When the key is deleted, the class discovery removes the instance on the next discovery run. Overriding the task with "removeall" will also delete the key.

– The "list all services on the computer" task doesn't have real value for this management pack; it's just added for checking a service name from the SCOM console.

See below for some screenshots of the tasks and health explorer.

[screenshot]

Task output of “List monitored services”:

[screenshot]

The health explorer has additional knowledge and shows which services have failed through the state change context:

[screenshot]

[screenshot]

So what’s the benefit of all this:

– The SCOM admins/authors don’t have to make a central change, except for importing the management pack.

– The support organization doesn't have to log a change and wait for someone to implement it; they can make the change themselves with just SCOM operator rights (or with rights to edit a registry key on a computer locally), and it works pretty much instantly.

– From a performance point of view, the first service that is added will have some impact on the discovery (config churn), but additional services don't have any impact.

However, there's still no auditing. We've allowed this task for what we call "key users", and we have one or two key users per customer. Setting object access monitoring on the registry key would give an idea of who changed what.

The performance benefit of this particular monitor is probably minimal. However, using this principle for disk thresholds gives a huge benefit, as that's a monitor that is always active on all systems, and overriding values through a registry key on the local system removes all override distribution (I might post that MP as well).

If you want to check out this MP, you can download the management pack jamaCustomService here. I've uploaded it unsealed, but I recommend sealing it if you really want to put it into production.


Stop storing data (partially or temporarily) in the data warehouse database

Posted by MarcKlaver on January 19, 2012

In order to facilitate the use of the data warehouse database, there are three default overrides in an environment that has its data warehouse enabled.

[screenshot]

If you need to (partially or temporarily) stop storage to the data warehouse, you can just override the default overrides (again) and set the Drop Items parameter to true. This will, after propagation to the management servers, cause the items to be dropped (and not stored in the data warehouse database).

Note that while this is possible, I assume it is an unsupported configuration 🙂


Agent proxy

Posted by MarcKlaver on July 1, 2011

Until now we have set the agent proxy setting for an agent only when required, and we used a script to do this; see this link for more information. But now Microsoft has come up with something new in the Exchange 2010 management pack: it will not discover anything until you have enabled the agent proxy setting for the Exchange servers (so we can't do this afterwards anymore). This meant we needed to make a choice:

  1. Manually enable the agent proxy setting for all Exchange 2010 servers (now and in the future), which means an Exchange 2010 server will not be discovered until we actually do so.
  2. Enable the agent proxy setting for all agents.

At the moment around 60 percent of the agents already have the proxy functionality enabled. So what's the advantage of not enabling this setting by default for all agents? Looking at security, you already have to enable this setting for all security sensitive servers (AD, Exchange, ISA, Citrix, etc.). And since we have no knowledge of when an Exchange server will be connected to our environment, we decided to enable it for all agents.

This is the script to do it:

$rootMS="RMS.TEST.LOCAL"

#-------------------------------------------------------------------------------
# Add operations manager snapin and connect to the root management server.
#-------------------------------------------------------------------------------
add-pssnapin "Microsoft.EnterpriseManagement.OperationsManager.Client";
set-location "OperationsManagerMonitoring::";
new-managementGroupConnection -ConnectionString:$rootMS;

## set proxy enabled for all agents where it is disabled
$NoProxy = get-agent | where {$_.ProxyingEnabled -match "False"}
$NoProxy | foreach {$_.ProxyingEnabled = $true}
$NoProxy | foreach {$_.ApplyChanges()}

See also Kevin Holman’s blog


JDF: Jama Distribution Framework

Posted by MarcKlaver on March 28, 2011

This blog describes our framework for file distribution between a SCOM agent and our central SCOM environment. After this framework is installed, you will be able to transfer files between your SCOM agent and a central location. Before you continue, be warned: I cannot deliver a single file which will do all that I just promised. I will, however, guide you through the work that needs to be done to set up a file distribution framework for SCOM; you will need to change scripts, compile the management pack, and install and configure a secure copy server. Changes are minimized as much as possible and most actions are automated with scripts. But first things first, let's describe the general idea behind the framework…

 

Transferring files

The idea behind the framework is to write a set of scripts that make it possible to transfer files to and from the SCOM agent. The main trigger for this wish was our installed base, which consists of manually installed agents only. Updating these agents manually is a long job for every CU update; if we could just schedule an update from the operations console and sit back, that would be great. But in order to do that, we need to be able to get the required update files to the SCOM agent. Keep in mind that we do not have a single customer but multiple customers, all with (or without) their own file distribution method. Being able to distribute and update from the operations console itself would be ideal.

But transferring files should also be secure. Our solution: Secure Copy, based on private / public key pairs for data transfers.

And since we have multiple customers, we need to make sure that customer A is not able to see or change any data we retrieve from customer B (and vice versa). Therefore we have to implement a secure way to copy files to and from the SCOM agent and separate data per customer. Our solution: a secure copy implementation that is capable of creating virtual directory structures based on a customer (secure copy) account.

We did not want to change any firewall rules (with the exception of adding a new management server). So we basically wanted to use communication from the SCOM agent on TCP port 5723, and only initiate communication from the SCOM agent. Our solution: a secure copy server configured to listen on port 5723 for secure copy clients.

Finally, it needed to be implemented on all our agents, not just a subset. Our solution: a Microsoft Windows Script Component, written in VBScript, which will run on all Windows versions we support (Windows 2000 and higher).

So if we draw this at a high level, it looks like this:

[diagram]

And yes it looks very simple 🙂 So the first thing we need is a secure copy server.

 

Secure Copy server

The secure copy server can be any implementation, as long as it supports private/public key pairs and is able to create virtual directories. We use the Bitvise WinSSHD service. Before we start implementing this service, we need to set some rules:

  1. Only logins with private/public SSH key pairs are allowed.
  2. Only secure copy is allowed (no SFTP or SSH).
  3. There will be only one general (shared) account, and this account has read only access to its own virtual directories. We call the shared account jdfBaseDistribution. This prevents accounts from uploading data to a "shared" location. This is also the account that will be used to distribute the framework.
  4. Data will be compressed as much as possible, but the framework will not rely on the secure copy implementation and will "zip" the files to upload into a "package".
  5. When a "package" is downloaded, it is expected to be a "zip" file, which can be extracted at the SCOM agent side by the framework.
  6. Files (not "packages") that will be downloaded must be compressed files to reduce network traffic (this is not forced by the framework).
  7. Customer accounts will have the naming convention: jdfCustomer_<CustomerName>
  8. The root (virtual) directory for every account will always be empty. All directories will be read only, with the exception of the "upload" directory; however, it will not be possible to remove files from the "upload" directory. The following structure will be used for every customer:

[screenshot]

Of course you can implement your own directories and read/write rules, but these are the rules we use to explain the framework. If you start with these, the examples and code will work correctly.

 

Note: The shared account (jdfBaseDistribution) will not have a /upload directory!

A separate account for each customer

We also need an account for every customer we support. Each customer will have the same folder layout (as shown above), but these will be virtual directories. This results in the same "view" for every customer, while data is uploaded to and downloaded from customer specific locations on the secure copy server.

[screenshot]

Note: Files can not be deleted from the upload directory.

As can be seen, the account jdfCustomer_JAMA has four virtual mount paths. With the exception of the root path (which should never be used), all data is redirected to a directory specific to that account. So if we create a second account (jdfCustomer_JAMA00), you can see that it has its own redirected directories.

[screenshot]

This results in the same view for both customers (the virtual mount path), but data being stored and retrieved from physically different locations.

We only allow the secure copy protocol and only logins with private/public key pairs. For how to configure WinSSHD with private/public key pairs, see this link.

 

SSH key pairs

For each customer we need to create an SSH public/private key pair. We use puttygen.exe to create the key pairs. The public key is saved as "jdfCustomer_<CustomerName>.public" and the private key is saved as "jdfCustomer_<CustomerName>.private".

Note: You cannot use a passphrase for the private key file (not supported by the framework), so be sure to only distribute the private key file to the correct customer.

When creating the keys with puttygen.exe, you should use the default parameters, as shown below:

[screenshot]

When your implementation of the secure copy server is working correctly you can continue with the next step: configuring the framework. Just make sure you can use the secure copy server with private/public key pairs before you continue.
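
One quick way to test this is with PuTTY's pscp.exe (the host and file names below are placeholders; per the rules above, "upload" is the only writable directory):

pscp -i jdfCustomer_JAMA.private -P 5723 package.zip jdfCustomer_JAMA@scomserver.test.local:/upload/package.zip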

 

The JDF Framework

The JDF framework itself needs to be configured for use in your environment. What is required is fully described in the documentation of the framework. Although I tried to create a generic framework, some environment dependent settings are still required. But first you should download the framework 🙂

The framework can be downloaded here

The documentation can be downloaded here

 

What’s next

Once you have set up the framework, you can start testing it. The framework download includes two examples of how to use it. The first thing we created with this framework was the ability to update our manually installed agents from the console (we have this working in our test environment). In my next blog I will create a management pack which will update your (manually) installed agents to CU4. Just make sure you get the framework up and running 🙂 If you fail to do so, just write a comment and I will try to answer your questions.


SCOM’s “un-discovery”. What doesn’t work here… And how to correct it.

Posted by rob1974 on January 26, 2011

 

SCOM's main benefit, imho, is its ability to discover what is running on a server and, based on that information, start to monitor the server with the appropriate rules. When you follow Microsoft's best practices, you'll first perform a lightweight discovery to create a very basic class and have the heavier discoveries run against that basic class. This is pretty good stuff actually: it helps quite a lot with the performance of an agent, as it will only run heavy discoveries if the server has the application role, and never on servers which have nothing to do with that application.

However, I've recently found a drawback of this 2-step discovery, which I can probably explain best with a real world example:

Discovering the Windows domain controllers on "Windows Computers" (the management pack this discovery runs from is an exception: usually it's in the application MP itself, but apparently MS considered domain controllers to be basic info; similar discoveries for workstations and member servers can be found in this MP as well). For this discovery a WMI query is used to determine whether the "Windows Computer" is a domain controller as well: SELECT NumberOfProcessors FROM Win32_ComputerSystem WHERE DomainRole > 3. If this returns something, it's a DC.

[screenshot]
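
You can test that query on any server yourself; a one-liner (my own check, not part of the MP):

# Returns an object on a domain controller (DomainRole 4 or 5); nothing on member servers or workstations.
Get-WmiObject -Query "SELECT NumberOfProcessors FROM Win32_ComputerSystem WHERE DomainRole > 3"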

When it is a "Windows Domain Controller", it will run a few other discoveries to determine more info:

[screenshot]

Just by looking at the classes you can imagine it’s not really lightweight anymore.

[screenshot]

So far so good: on all my Windows computers I run a simple query, and if that query returns something, SCOM will also run a script that finds more interesting stuff about the DC.

But here's the catch with this kind of discovery. Suppose I don't need a certain DC anymore, but I still need to keep the server, as it's running some application I still need to use and monitor. What will happen? The lightweight discovery will do its job. It will correctly determine that the server is not a "Windows Domain Controller" anymore, and as a result it won't run the script discovery anymore.

You might ask: why is that bad? We didn't want that, did we? Yes, you are correct, we didn't want to run this discovery against servers that aren't DCs, but SCOM doesn't unlearn the discovered classes automatically. Because this discovery never runs again, SCOM never unlearns that this server doesn't have the "Active Directory Domain Controller Computer Role" anymore. And this is the class that is used for targeting rules and monitors. So although SCOM knows the server isn't a "Windows Domain Controller" anymore, it is still monitoring the "Active Directory Domain Controller Computer Role". This will result in quite a lot of noise (script errors, LDAP failures, etc.).

For now, there's just a workaround available: you will need to override the second discovery for that particular server. As the first discovery no longer includes this server as an object of the class, you can't override the discovery for a "specific object of class: Windows Domain Controller". You'll need to create a group and include the server object, then override the object discovery "for a group…" and choose the group you've just created.

[screenshot]

What's the point of disabling a discovery that didn't run anyway? Well, now you can go to PowerShell and run the "Remove-DisabledMonitoringObject" cmdlet. This will remove the discovered object classes for this discovery and all of the monitoring attached to those classes.
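
In the SCOM command shell that boils down to the following (the RMS name is a placeholder):

# Connect to the management group and purge objects whose discoveries are now disabled.
add-pssnapin "Microsoft.EnterpriseManagement.OperationsManager.Client"
set-location "OperationsManagerMonitoring::"
new-managementGroupConnection -ConnectionString:"RMS.TEST.LOCAL"
Remove-DisabledMonitoringObject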

Discoveries make SCOM stand out from other monitoring tools, but they need to work both ways. Figuring this out took me about a day, and that's just one issue with one server (DNS was also installed on this server and had the same issue). Loads of servers might change roles without me knowing about it, and when it's not reported to me, I'll just have extra noise in SCOM. I'm just not sure whether this can be picked up within SCOM itself or whether the "un-discovery" needs to be done by the MP's logic. For the AD part it needs to be picked up by Microsoft anyway, but if the logic is built into the management pack, then it will have an impact on all the custom built MPs by all you SCOM authors out there.


Distributing files with SCOM

Posted by MarcKlaver on October 29, 2010

Didn't you wish there was a way to distribute files using the SCOM environment, and not depend on others to get a file across? Well, we did, and we wrote a management pack that does just that: distribute files to the target servers. But hold on, don't get too excited; it is still SCOM and not a file distribution application, so our solution has a few disadvantages:

  1. Targeting – If you target the MP to a class, all servers in that class will get the MP (even if all rules in the MP are disabled by default).
  2. Scheduling – There is no way to schedule a delivery to a remote server. As soon as the MP is imported and the RMS detects the new configuration, the MP will be distributed.

Both issues can result in very high network traffic if not taken into account. The first one we can control slightly by targeting specific classes; the more specific the class, the better. Of course we wanted our files to be distributed to all computers 🙂 The second one can only be controlled by controlling the time of the MP import. Of course this is not ideal, but until now we had nothing.

What we do is create a management pack that holds a script. In that script, all the files we want to distribute are placed in a comment section at the bottom of the script, after being converted to hex notation. It looks something like this:

'<BEGIN_FILE>jamaMaintenanceMode.vbs
'272D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D
'0A272046696C6520202020202020203A206A616D614D61696E
'74206C6F6720656E74727920746F207075742074686520636F
'61696E74656E616E6365206D6F64652E0D0A2720246376735F
'652E7662732C7620312E3420323030392F30382F3331203130
'2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D

On the other side (the agent side) we reconstruct the files and place them in a fixed location. Now we have distributed the files using SCOM! The solution can distribute both text based and binary files, but binary files double in size when converted to hex notation. So distributing a 500K binary file ends up as a 1MB script to be distributed (and a 2MB script on the agent side, where it is stored in Unicode).
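
To give an idea of the packing step, here is a rough PowerShell sketch (the file name and the 50-character line width are assumptions; our real generator is the BuildTargetScript.cmd described below):

# Read a file, hex-encode it and emit it as a VBScript comment block.
$bytes = [System.IO.File]::ReadAllBytes("C:\input\jamaMaintenanceMode.vbs")
$hex = -join ($bytes | ForEach-Object { $_.ToString("X2") })
"'<BEGIN_FILE>jamaMaintenanceMode.vbs"
for ($i = 0; $i -lt $hex.Length; $i += 50) {
    "'" + $hex.Substring($i, [math]::Min(50, $hex.Length - $i))
}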

So in short, it is a nice method for distributing small files, or for distributing to a very limited set of targets. Distributing large files to all agents isn't a good idea with this method (but it will work).

 

What do you need:

[screenshot]

The above files and directory structure. You should create this directory structure before continuing:

include_binary – This is the location where you need to place the binary files that need to be distributed.

include_text – This is the location where you need to place the text files that need to be distributed.

output – This directory is used to store the generated code.

The files you need can be found here!

 

Importing these MPs without editing results in a management pack that does nothing. So if you want to distribute a file, this is how:

  1. You should edit the BuildTargetScript.cmd file for every change in your distribution that you want delivered:

    [screenshot]

    The script that generates the files will check this version number; if it is not equal to what is found in the registry, it will recreate the files. The STR_TARGET_DIR variable holds your target directory on the target machines. This variable is used to check whether the files can be generated correctly on your local desktop.

  2. You should also change jamaTextDistribution.vbs:

    [screenshot]

    These two constants will be combined to create your target directory for the files:

    %SystemDrive%\STR_BASE_DIR\STR_DISTRIBUTION_DIR

    All files will be placed inside the above directory (this combined directory must be equal to the STR_TARGET_DIR from the BuildTargetScript.cmd file above).

    Secondly, you should change where in the registry you would like to store the version number:

    [screenshot]

    You only have to do this once. If the key cannot be found, it will be created.

     

  3. Finally, you have to change the jamaDistribution.Distribution.xml file that will be distributed. Running the BuildTargetScript.cmd file will generate an output file in the output directory. The complete contents of that file need to be inserted into this .xml file. Below you can find the location inside the .xml file where you need to paste the contents of the output file:

    [screenshot]

  4. Now import the two management packs and your file(s) will be distributed. NOTE: By default the script runs every hour, but if all files are present, the impact is minimal.

 

What’s next?

Well, since we can now get any file to the agent side, we can start building a complete file distribution system (which we will :)), and after that we can finally automate the update of our manually installed agents and fully automate the manual installation!
