8.11 MachineDown

Use this Knowledge Script to detect whether the computer on which you run the script can communicate with one or more specified Windows computers.

This script does not require the AppManager agent to be installed on the remote computers you want to monitor.

To run this script on a Windows Vista computer, the Remote Registry service must be running on the agent computer so that the script can connect to the Windows registry on the remote computers you want to monitor. If the Remote Registry service is down when this script runs, the script raises an event indicating that the remote computer was unresponsive and the connection to the Windows registry failed.

You can select computers by browsing the AppManager repository, specifying a list of computers using the Computers to monitor parameter, or naming a file that contains a list of computer names or addresses. Selecting computers by browsing the AppManager repository also prevents event information from appearing in AppManager while a selected computer is in maintenance mode.

If you specify a list of computers, instead of browsing the repository for the computers you want, this script displays event information in AppManager even if the remote computer is in maintenance mode.

When typing a list of Windows computers, you can specify computers that are not currently in the Navigation pane or the TreeView pane.

When you run this script on a computer, the script tries to communicate with each of the computers you specified in the Computers to monitor parameter.

This script attempts to communicate by:

  • Checking name-to-IP-address resolution

  • Executing an Internet Control Message Protocol (ICMP) ping

  • Connecting to the Windows registry

This script raises an event if any of these attempts fails.
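The sequence of checks above can be sketched in Python as follows. This is only an illustration, not the Knowledge Script's actual implementation. The function names (resolve_name, ping_host, connect_registry, first_failed_check) are hypothetical, the ping flags assume the Windows ping command, and stopping at the first failed check is an assumption.

```python
import socket
import subprocess
import sys


def resolve_name(host):
    """Return True if the host name resolves to an IP address."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


def ping_host(host, timeout_seconds=3):
    """Return True if one ICMP echo request succeeds (Windows ping flags assumed)."""
    command = ["ping", "-n", "1", "-w", str(timeout_seconds * 1000), host]
    try:
        result = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return False
    return result.returncode == 0


def connect_registry(host):
    """Return True if the remote Windows registry accepts a connection."""
    if sys.platform != "win32":
        return False  # the winreg module exists only on Windows
    import winreg
    try:
        winreg.ConnectRegistry("\\\\" + host, winreg.HKEY_LOCAL_MACHINE).Close()
        return True
    except OSError:
        return False


def first_failed_check(host, checks=None):
    """Run the checks in order; return the name of the first failure, or None."""
    if checks is None:
        checks = [
            ("name resolution", resolve_name),
            ("ICMP ping", ping_host),
            ("registry connection", connect_registry),
        ]
    for name, check in checks:
        if not check(host):
            return name  # assumption: later checks are skipped after a failure
    return None
```

Calling first_failed_check("SERVER01") returns None when all three checks pass, or the name of the first check that failed, which is the kind of detail an event message could carry.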

You can also instruct the script to ping specific router IP addresses before attempting to communicate with any of the specified computers. This provides an additional test of the network connection between the computer on which the script is running and the monitored computers. If this test is successful, it eliminates one reason for a lack of communication between computers.
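As a rough Python sketch of this router pre-check, the function below returns no targets when any listed router is unreachable, which matches the note for the Router IP addresses parameter. The names targets_to_check and is_reachable are hypothetical, and the reachability test is passed in as a function rather than implemented here.

```python
def targets_to_check(targets, routers, is_reachable):
    """Return (targets to check, routers found down).

    If any router is down, no target computers are checked.
    """
    down_routers = [router for router in routers if not is_reachable(router)]
    if down_routers:
        return [], down_routers
    return list(targets), []
```

With no routers specified, the function simply returns every target, so the pre-check is optional.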

This script does not monitor the computer where the script itself is running. For example, if you run this script on a server named SERVER01 and use the Select computers from the Repository parameter to select the server SERVER01 (either explicitly or as a member of a group or view), the script automatically excludes SERVER01 at run time because it does not make sense to monitor the local computer’s availability. If the script is running, the computer must be available. If the script is not running, either the local computer is down or the script or agent has been stopped.
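The run-time exclusion of the local computer can be pictured with a short Python sketch. The function name exclude_local is hypothetical, and the case-insensitive name comparison is an assumption about how names are matched.

```python
import socket


def exclude_local(targets, local_name=None):
    """Return the targets without the computer that is running the job."""
    if local_name is None:
        local_name = socket.gethostname()
    local = local_name.lower()
    return [target for target in targets if target.lower() != local]
```

For example, exclude_local(["SERVER01", "SERVER02"], "server01") keeps only SERVER02.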

To monitor the local computer, create a second MachineDown job on a different computer and have that job monitor the local computer in question. For example, SERVER02 could run the script to monitor SERVER01 while SERVER01 runs the script to monitor SERVER02. If both jobs are collecting data, make sure the two jobs do not monitor the same computers. For example, SERVER01 and SERVER02 should not both monitor SERVERA. Doing so would result in two datastreams collecting uptime information for the same server (SERVERA), which can cause the ComputerAvailability report to miscalculate the uptime for SERVERA.

In some cases, this script may not be able to communicate with one or more remote computers because AppManager does not have sufficient privileges to access those remote machines. To avoid this problem, grant Admin privileges to the AppManager agent’s user account or use the PingMachine Knowledge Script to check connectivity.

If you select target computers by browsing the AppManager repository, the logon account for the agent on which the job is running must have sufficient privileges to query the AppManager repository.

If you include computers from the AppManager repository by View or Server Group, AppManager automatically picks up computers that are later added to that view or server group on the next iteration. If you include computers from the AppManager repository by Computer, AppManager monitors only the computers that were selected. However, if you delete a monitored computer from the AppManager repository, AppManager stops monitoring that computer until you add it back to the AppManager repository. AppManager also reads the server list file on every script iteration. If you remove a computer name from the server list file, AppManager no longer monitors that computer starting with the next script iteration.

This script can check connections to computers that are across a firewall from the AppManager repository so long as the script is running on a computer on the same side of the firewall as the computers to which it is checking connections. Keep in mind, however, that under these circumstances you cannot select computers by browsing the AppManager repository unless the SQL Server communication ports are open in the firewall and the agent can query the AppManager repository. If you are using an agent across a firewall from the AppManager repository, you are advised to use the Computers to monitor or Filename for computer list parameter to specify computers.

If the computer that is down has been discovered and is displayed in the Navigation pane or the TreeView, that computer’s icon blinks in the Navigation pane or the TreeView. If the computer that is down is not displayed, the icon of the computer where you ran the Knowledge Script blinks instead.

For computers running AppManager agents version .x and later where you want to use a monitoring policy, consider using the ConfigMachineDown and MachineDownLR Knowledge Scripts.

When configuring an action for this Knowledge Script, configure the Location to initiate the action on the MS (to run on the management server) or on a Proxy (to run on a particular managed client).

If you instead configure an action to run on the managed client (MC), when a remotely monitored computer is placed into machine maintenance mode (from AppManager) or scheduled maintenance mode (using the AMAdmin_SchedMaint Knowledge Script), any event conditions detected on the remote computer are ignored, but the action is not disabled. In this case, an action runs, but no event information appears on the Events tab.

Use the ReportAM_GeneralMachineDown Knowledge Script to generate a report about computers that were detected as down during a specified period.

If you are using the Web Console, the Select computers from the repository parameter is not supported. Instead, use the Computers to monitor parameter to specify the computers you want to monitor.

8.11.1 Using this Script to Monitor a Subnet

Run this script on a computer in the same subnet as the management server. When completing the Computers to monitor parameter, specify a limited number of computers that represent different subnets in your network.

You can then run additional MachineDown jobs on each of the computers specified in the first job to monitor the computers in each of their own subnets. This gives you coverage without stressing network bandwidth. It also ensures that, if a router or subnet is down, you receive only one event for the server being monitored from the agent on the management server’s subnet. The other servers in that subnet will not post duplicate “Computer Down” events.

As an example, assume:

  • The AppManager management server is installed on the computer TARZAN in subnet 1. Other servers in subnet 1 include TITO and BLUE.

  • Subnet 2 includes the servers PAOLO, BONN, and KENO.

  • Subnet 3 includes the servers TRISTE, VOILA, and TONTO.

You create a Knowledge Script job that runs on TITO (subnet 1, same as the management server) and set the Computers to monitor parameter to PAOLO (subnet 2) and TRISTE (subnet 3).

You then create a job (J-2) on PAOLO with the Computers to monitor parameter set to BONN and KENO, and a job (J-3) on TRISTE with the Computers to monitor parameter set to VOILA and TONTO. Also create a job that runs on the management server (for example, TARZAN) and lists the server in its own subnet (for example, TITO) in its Computers to monitor parameter as a reciprocal check.

HINT: If you want this Knowledge Script to raise an action when a connection is down, enable the Managed Client Action parameter in the Knowledge Script Properties dialog box for the job that monitors your subnets.
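The job layout in this example can be written down as a small Python mapping and checked for computers that are monitored by more than one job, which the guidance on duplicate datastreams advises against. This is an illustrative sketch only; the names jobs and find_duplicate_targets are hypothetical.

```python
from collections import defaultdict

# Computer running the job -> computers in its Computers to monitor parameter.
jobs = {
    "TITO": ["PAOLO", "TRISTE"],    # subnet 1 watches one server per remote subnet
    "PAOLO": ["BONN", "KENO"],      # job J-2 covers the rest of subnet 2
    "TRISTE": ["VOILA", "TONTO"],   # job J-3 covers the rest of subnet 3
    "TARZAN": ["TITO"],             # management server checks TITO
}


def find_duplicate_targets(jobs):
    """Return {computer: [job hosts]} for computers monitored by several jobs."""
    monitored_by = defaultdict(list)
    for host, targets in jobs.items():
        for target in targets:
            monitored_by[target].append(host)
    return {t: hosts for t, hosts in monitored_by.items() if len(hosts) > 1}
```

For the layout above, find_duplicate_targets(jobs) returns an empty dictionary because each computer appears in only one job.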

8.11.2 Resource Objects

Windows 2003 Server or later

8.11.3 Default Schedule

The default interval for this script is Every 5 minutes.

Be sure to schedule this job so that you allow enough time for the job to complete during the interval. As a general guideline, allow 20 to 30 seconds for each computer being monitored. This allows enough time for the connection to the registry on each computer.

You can use the following formula to calculate how many minutes are required for the job to complete:

(number of computers x 30 seconds)/60 = minutes for job to complete

For example:

(10 computers x 30 seconds)/60 = 5 minutes
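The same arithmetic can be expressed as a small Python helper for checking whether a job fits its interval. The function names are hypothetical, and the defaults simply restate the 30-second guideline and the 5-minute default interval.

```python
def minutes_for_job(computer_count, seconds_per_computer=30):
    """Estimate how many minutes the job needs to complete."""
    return computer_count * seconds_per_computer / 60


def fits_interval(computer_count, interval_minutes=5, seconds_per_computer=30):
    """Return True if the estimated run time fits within the schedule interval."""
    return minutes_for_job(computer_count, seconds_per_computer) <= interval_minutes
```

With these defaults, 10 computers need 5 minutes and just fit the default interval, while 11 computers need 5.5 minutes and call for a longer interval or a second job.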

8.11.4 Setting Parameter Values

Set the following parameters as needed:

Parameter

How to Set It

Event Notification

Raise event if a computer is down?

Select Yes to raise an event if a connection cannot be established to the target computer. The default is Yes.

NOTE: For AppManager agents version 6.x and later, events raised for computers in maintenance mode are suppressed.

Require Windows Registry connection?

Select Yes to require the script to attempt a connection to the registry after it has attempted an ICMP ping. The default is Yes.

This test is recommended because Windows can respond to ICMP ping requests even though the computer is in a blue screen state. A connection to the registry is further validation that the target computer is up.

If you are using this script to check the status of UNIX machines, you must disable this option.

NOTE: The account under which the AppManager agent is running must have sufficient privileges to connect to the registry.

Event severity when computer is down

Set the event severity level, from 1 to 40, to indicate the importance of an event in which a connection cannot be established to the target computer. The default is 5 (red event indicator).

Raise single event for all computers that are down?

Select Yes to raise only one event regardless of the number of computers that are down. The default is unselected.

If you choose to raise only a single event, the information about specific computers is contained in the event detail message. The same rules for the suppression of events that apply to the Raise event if a computer is down parameter also apply here.

Raise event if specified router is down?

Select Yes to raise an event if a router specified in the Router IP addresses parameter is down. The default is Yes.

Severity - Router down

Set the event severity level, from 1 to 40, to indicate the importance of an event in which a specified router is down. The default is 5 (red event indicator).

Raise event if default gateway is down?

Select Yes to raise an event when the default gateway is down. The default is Yes.

If the default gateway is down, the script might not be able to connect to any of the computers you identified, and false events can be raised.

Event severity when default gateway is down

Set the event severity level, from 1 to 40, to indicate the importance of an event in which the default gateway is down. The default is 5 (red event indicator).

Raise event if the computer list file is missing?

Select Yes to raise an event if the file containing the list of monitored computers cannot be found. The default is Yes.

Event severity when computer list file is missing

Set the event severity level, from 1 to 40, to indicate the importance of an event in which the list of monitored computers cannot be found. The default is 15 (yellow event indicator).

Event severity when job fails

Set the event severity level, from 1 to 40, to indicate the importance of an event in which the MachineDown job fails. The default is 5 (red event indicator).

Data Collection

Collect data for log server?

Select Yes to collect data for charts and reports. When enabled, data collection returns the availability, or status, of a specific computer you are monitoring. The default is unselected.

Collect single data point for number of servers down?

Select Yes to collect data for charts and reports. When enabled, data collection returns the number of unavailable, or down, computers for the monitored machines. The default is unselected.

Collect data for default gateway availability?

Select Yes to collect data for charts and reports. When enabled, data collection returns the availability, or status, of the default gateway of the computer that is running the job. The default is unselected.

Collect data for router availability?

Select Yes to collect data for charts and reports. When enabled, data collection returns the availability, or status, of one or more routers that you configured in the Router IP addresses parameter. By default, data is not collected.

Monitoring

Select computers from the repository

Click Browse [...] to search the AppManager repository for the computers you want to monitor. You can select computers by view (for example, Master or NT), by server group, or individually.

You can use this parameter as the sole selection method, or you can use it in conjunction with the Computers to monitor and Filename for computer list parameters.

NOTE: Once you specify a list of computers with this parameter, the script always monitors a list of computers generated by this parameter. You can modify the list, but you cannot delete it. If you want to subsequently specify monitored computers without using this parameter, you need to run a new monitoring job with this script and leave this parameter blank.

If you choose to select computers by server group, the server groups must actually contain agent computers. If you have a hierarchy of server groups where you want to choose a parent server group that contains child server groups, you must select the child server groups that have actual agent computers.

Computers to monitor

Specify a list of computers to monitor. Separate multiple names with commas and no spaces.

For example, to check whether the Sales1 server can communicate with the computers JOE, SAM, and PAT, run this script on the Sales1 computer and enter JOE,SAM,PAT in this field.

You can use this parameter as the sole selection method, or you can use it in conjunction with the Select computers from the repository and Filename for computer list parameters.

Filename for computer list

Specify the path to the file that contains a list of computers you want to monitor, or click Browse [...] and navigate to the file.

Use the local path to the file rather than the UNC path. For example, use D:\<path to file> rather than \\<server>\D$\<path to file>.

The file should contain the hostname or IP address for each computer in one or more lines. Each line can have multiple computer names, separated by commas and with no spaces.

For example:

NYC01,NYC02
SALES01,10.15.221.5,SFO01
LABMACH,QATEST

You can use this parameter as the sole selection method, or you can use it in conjunction with the Select computers from the repository and Computers to monitor parameters.
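The file format described above is simple enough to parse in a few lines of Python. The sketch below is only an illustration of the format, not the script's own parser. The function name parse_computer_list is hypothetical, and trimming stray spaces and dropping repeated names are assumptions.

```python
def parse_computer_list(text):
    """Return computer names and addresses in file order, without repeats."""
    computers = []
    seen = set()
    for line in text.splitlines():
        for entry in line.split(","):
            name = entry.strip()  # assumption: tolerate stray spaces
            if name and name.upper() not in seen:
                seen.add(name.upper())
                computers.append(name)
    return computers
```

Applied to the three example lines above, the function returns the seven entries NYC01, NYC02, SALES01, 10.15.221.5, SFO01, LABMACH, and QATEST in that order.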

Router IP addresses

Specify the IP addresses of the routers through which the computer running the script should communicate with the target computers.

NOTE: If one of the listed routers is down, none of the target computers will be monitored.

Number of seconds to wait for ping response

Set the maximum number of seconds to wait for a response from a target computer. The default is 3 seconds.