4.52 RunAwayProcs

Use this Knowledge Script to detect runaway processes on the specified computer by repeatedly sampling the CPU usage of each process. If a process exceeds the CPU threshold in every one of the specified number of consecutive samples (one sample is taken at each interval), AppManager raises an event.

For example, a process that exceeds the CPU threshold for five consecutive monitoring periods might be trapped in an infinite loop or might have encountered another problem. In addition to raising an event to notify you of the problem, this script can optionally kill any runaway processes it detects. The detail message shows the list of processes being sampled.
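The consecutive-sample check described above can be illustrated with a minimal Python sketch. It is not the Knowledge Script's actual implementation, which this guide does not document. The function name `find_runaways`, the sample layout (one dictionary of PID to CPU percentage per sample), and the reading of "exceeds" as strictly greater than the threshold are all assumptions made for illustration.

```python
from typing import Dict, List


def find_runaways(
    samples: List[Dict[int, float]],
    threshold: float = 90.0,
    consecutive: int = 3,
) -> List[int]:
    """Return PIDs whose CPU usage exceeded `threshold` in each of the
    last `consecutive` samples (hypothetical helper, not an AppManager API)."""
    if consecutive <= 0 or len(samples) < consecutive:
        return []
    window = samples[-consecutive:]
    # Only a process present in every sample of the window can qualify.
    candidates = set(window[0])
    for sample in window[1:]:
        candidates &= set(sample)
    return sorted(
        pid for pid in candidates
        if all(sample[pid] > threshold for sample in window)
    )


# Three samples of {pid: cpu_percent}; the values are made up.
samples = [
    {101: 97.0, 202: 12.0, 303: 95.0},
    {101: 99.5, 202: 91.0, 303: 40.0},
    {101: 98.2, 202: 93.0, 303: 96.0},
]
print(find_runaways(samples))  # prints [101]
```

In this sketch, process 303 is not reported even though it exceeds the threshold in two of the three samples, because the samples in which it exceeds the threshold are not consecutive.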

The UNIX agent must run under a root account for this script to kill runaway processes.

4.52.1 Resource Object

UNIX computer icon

4.52.2 Default Schedule

The default interval for this script is Every 30 minutes.

4.52.3 Setting Parameter Values

Set the following parameters as needed:

Description

How to Set It

Event? (y/n)

Set to y to raise events. The default is y.

Collect data? (y/n)

Set to y to collect data for charts and reports. The default is n.

Maximum CPU usage (%) for runaway processes

Enter a threshold for the maximum percentage of CPU any process should be using when sampled. This percentage is used to determine which processes are runaway processes. The default is 90%.

Number of consecutive samples to take

Enter the number of consecutive samples in which a process must exceed the CPU threshold before an event is raised. The default is 3 samples.

Number of runaway processes to show (0 = all)

Specify the number of processes you want displayed in the detail message for an event or data point. Enter 0 to display all processes. The default is 0.

Ignore these comma-separated processes

Enter the names of any processes you want to exclude from sampling. Separate the names with commas and do not include spaces.

Never kill these comma-separated processes

Enter the names of any processes that should never be killed. Separate the names with commas and do not include spaces.

The default processes are sched, init, pageout, fsflush, inetd, yp, and rpc.

Kill runaway process when detected? (y/n)

Set to y to automatically kill any runaway processes that are detected, except for the processes you have specified should never be killed. The default is n.

Event severity level for runaway process detected

Set the event severity level, from 1 to 40, to indicate the importance of an event reported when a runaway process is detected. The default is 5.

Event severity level for killed runaway process

Set the event severity level, from 1 to 40, to indicate the importance of an event reported when a runaway process is stopped. The default is 10.

Event severity level for failed to kill runaway process

Set the event severity level, from 1 to 40, to indicate the importance of an event reported when stopping a runaway process fails. The default is 10.

Event severity for internal failure

Set the event severity level, from 1 to 40, to indicate the importance of an event reported when this job encounters an internal error. The default is 5.

Enable debugging? (y/n)

Set to y to enable debugging. The default is n.
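How the ignore list, the never-kill list, the kill option, and the number of processes to show might interact can be sketched as follows. This is a hedged illustration only: the function names `parse_names` and `plan_actions` are hypothetical, exact matching on process names is an assumption, sorting by CPU usage before truncating the displayed list is an assumption, and so is the choice to apply the kill option to every detected process rather than only the displayed ones.

```python
from typing import List, Set, Tuple

# Default never-kill list as given in the parameter table above.
DEFAULT_NEVER_KILL = "sched,init,pageout,fsflush,inetd,yp,rpc"


def parse_names(value: str) -> Set[str]:
    """Split a comma-separated list (no spaces) into a set of names."""
    return {name for name in value.split(",") if name}


def plan_actions(
    runaways: List[Tuple[str, float]],
    ignore: str = "",
    never_kill: str = DEFAULT_NEVER_KILL,
    kill: bool = False,
    show: int = 0,
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Return (processes to display, process names to kill).

    `runaways` holds (process name, CPU percent) pairs already judged
    to be runaway. Hypothetical helper, not an AppManager API.
    """
    ignored = parse_names(ignore)
    protected = parse_names(never_kill)
    # Ignored processes are excluded entirely.
    reported = [(n, c) for n, c in runaways if n not in ignored]
    # Assumption: list the heaviest CPU consumers first.
    reported.sort(key=lambda item: item[1], reverse=True)
    # Protected processes are reported but never killed.
    to_kill = [n for n, _ in reported if kill and n not in protected]
    # A value of 0 means show all processes.
    shown = reported if show == 0 else reported[:show]
    return shown, to_kill


runaways = [("myapp", 97.0), ("init", 95.0), ("backup", 99.0)]
shown, to_kill = plan_actions(runaways, ignore="backup", kill=True, show=1)
print(shown)    # prints [('myapp', 97.0)]
print(to_kill)  # prints ['myapp']
```

Here `backup` is dropped because it is on the ignore list, and `init` is still counted as a runaway process but is left running because it is on the default never-kill list.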