4.4 CpuLoaded

Use this Knowledge Script to monitor total CPU usage and queue length to determine whether the CPU is overloaded. This script raises an event when CPU usage and CPU queue length values exceed the thresholds you set.

4.4.1 Resource Objects

CPU folder or any individual CPU icon (for multiprocessor systems)

4.4.2 Default Schedule

The default schedule for this script is Every 5 minutes.

4.4.3 Setting Parameter Values

Set the following parameters as needed:

Description

How to Set It

Event Notification

Event severity when job fails

Set the event severity level, from 1 to 40, to indicate the importance of an event in which the CpuLoaded job fails. The default is 5 (red event indicator).

Raise event if total system CPU exceeds threshold?

Select Yes to raise an event if total system CPU usage exceeds the threshold you set. The default is Yes.

This script raises an event when the following occur:

  • Total system CPU exceeds the threshold AND

  • Threshold - Maximum processor queue length is exceeded if you enabled the Use queue length in determining threshold crossings for events parameter.
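The two conditions above can be sketched as a single check. This is a minimal illustration, not the script's actual implementation. It assumes that "exceeds" means strictly greater than. The function name and its arguments are hypothetical, and the default values are taken from the Thresholds parameters in this section (95% CPU, 2 processes in the queue).

```python
def total_cpu_event(total_cpu_pct, queue_length, cpu_threshold=95.0,
                    max_queue_length=2, use_queue_length=True):
    """Hypothetical sketch: True if the total-CPU event condition holds."""
    # Condition 1: total system CPU usage exceeds its threshold
    # (assumption: "exceeds" means strictly greater than).
    cpu_exceeded = total_cpu_pct > cpu_threshold
    if not use_queue_length:
        # Queue length is ignored, so the CPU threshold alone decides.
        return cpu_exceeded
    # Condition 2: the processor queue length also exceeds its threshold.
    return cpu_exceeded and queue_length > max_queue_length
```

For example, `total_cpu_event(97, 1)` returns `False` because the queue is short, while `total_cpu_event(97, 1, use_queue_length=False)` returns `True`.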

When an event is raised, the event detail contains the top N processes by processor usage, along with their Windows counter values.

If you select this parameter AND the Use virtual machine performance counters if available? parameter, the job retrieves the total system CPU usage metric from the VMware virtual machine's total processor usage counter, and individual process detail for the top processor-consuming processes is included in event and datastream detail.

Event severity when total system CPU exceeds threshold

Set the event severity level, from 1 to 40, to indicate the importance of an event in which system CPU usage exceeds the threshold. The default is 10 (red event indicator).

Raise event if any individual CPU exceeds threshold?

Select Yes to raise an event if usage for any individual CPU exceeds the threshold you set. The default is unselected.

This script raises an event when the following occur:

  • Individual CPU exceeds the threshold AND

  • Threshold - Maximum processor queue length is exceeded if you enabled the Use queue length in determining threshold crossings for events parameter.
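A per-CPU version of the same check might look like the following sketch. The function name and arguments are hypothetical. It assumes that the processor queue length is a single system-wide value, which is how the Windows Processor Queue Length counter is reported. The defaults mirror the documented thresholds (98% per CPU, 2 processes in the queue).

```python
def cpus_over_threshold(per_cpu_pct, queue_length, cpu_threshold=98.0,
                        max_queue_length=2, use_queue_length=True):
    """Hypothetical sketch: indexes of CPUs that meet the event condition."""
    # Assumption: queue length is system-wide, so evaluate it only once.
    queue_ok = (not use_queue_length) or queue_length > max_queue_length
    if not queue_ok:
        return []
    # Each CPU is compared against the individual-CPU threshold on its own.
    return [i for i, pct in enumerate(per_cpu_pct) if pct > cpu_threshold]
```

With `per_cpu_pct=[99.5, 40.0, 98.5]` and a queue length of 3, CPUs 0 and 2 would qualify for an event.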

Event severity when individual CPU exceeds threshold

Set the event severity level, from 1 to 40, to indicate the importance of an event in which individual CPU usage exceeds the threshold. The default is 10 (red event indicator).

Monitoring

Cap processor usage values at 100 percent?

Select Yes to cap the overall and individual processor usage datastream values at 100 percent. When an overall or individual processor usage value exceeds 100 percent, the job stores and uses 100 percent as that value.

Values over 100 percent typically occur in the VMware virtual machine's overall processor usage counter when you select the Use virtual machine performance counters if available? parameter. Capping these values is useful when reporting on AppManager data with other Micro Focus reporting products.

The default is unselected.
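The capping behavior amounts to a simple clamp. The following is a minimal sketch under that reading. `cap_usage` is a hypothetical name and does not represent an AppManager API.

```python
def cap_usage(value_pct, cap_enabled=True):
    """Hypothetical sketch: the usage value to store, capped when enabled."""
    # VMware counters can report values above 100%; clamp them if requested.
    return min(value_pct, 100.0) if cap_enabled else value_pct
```

A reading of 118% would be stored as 100% when capping is enabled, and as 118% otherwise.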

Use virtual machine performance counters if available?

Select Yes to monitor the total system CPU usage metric retrieved from the VMware performance counter. The default is Yes.

If you select Yes and the VMware counters are not present on the first job iteration, the job stores that information and subsequent iterations do not attempt to use them.
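The "probe once, then remember" behavior can be sketched as follows. The class, its methods, and the `vm_counters_present` callable are hypothetical stand-ins for whatever check the job actually performs. The sketch only shows that the VMware counters are probed on the first iteration and the result is reused afterward.

```python
class CpuCounterSource:
    """Hypothetical sketch of the first-iteration VMware counter probe."""

    def __init__(self, use_vm_counters, vm_counters_present):
        # use_vm_counters mirrors the job parameter; vm_counters_present is
        # a hypothetical callable standing in for the real availability probe.
        self.use_vm_counters = use_vm_counters
        self._probe = vm_counters_present
        self._vm_available = None  # unknown until the first iteration
        self.probe_count = 0

    def source_for_iteration(self):
        """Return "vmware" or "os" as the source of total CPU usage."""
        if not self.use_vm_counters:
            return "os"
        if self._vm_available is None:
            # First iteration: probe once and store the result.
            self.probe_count += 1
            self._vm_available = self._probe()
        return "vmware" if self._vm_available else "os"
```

If the probe fails on the first iteration, every later iteration falls back to the operating system counters without probing again.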

The virtual machine performance counter always collects and creates data detail for the top N processes that are using the CPU.

NOTE: There are no VMware virtual machine performance counters for individual process usage; the operating system counters still retrieve those values.

Important: VMware allows virtual machines to report more than 100% of their CPU, so if you select Yes for this parameter, you might see CPU utilization data that is greater than 100%.

  • The %VM Processor Time counter value includes the % processor time for each virtual CPU plus 25% overhead for the virtual machine, so up to 125% could be returned for each processor.

  • If your monitoring environment cannot tolerate % processor time values greater than 100%, deselect the parameter for using the virtual machine counters, or enable the Cap processor usage values at 100 percent? parameter.

Use queue length in determining threshold crossings for events

Select Yes to combine the overall processor usage value with the processor queue length when determining whether to raise an event for total system CPU usage.

If this parameter is not selected, and you selected the Raise event if total system CPU exceeds threshold? parameter, only the overall CPU usage threshold is used to raise events.

The default is Yes.

Number of processes to include in detail when total CPU threshold crossed

Specify the number of processes, ranked by their % Processor Usage counter values, to include in event detail and datastream detail when the corresponding event and data collection parameters are selected in the job configuration.

To turn off inclusion of individual usage detail in events and datastreams, set this parameter to 0.
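Selecting the top N processes is a sort by usage followed by a slice. In the sketch below, a value of 0 disables the detail, as described above. The function name, the dictionary input, and the process names in the example are hypothetical.

```python
def top_n_processes(process_usage, n):
    """Hypothetical sketch: up to n (name, pct) pairs, highest usage first.

    process_usage maps a process name to its processor usage percentage.
    n <= 0 disables the detail and returns an empty list.
    """
    if n <= 0:
        return []
    # Rank processes by usage, highest first, and keep the first n.
    ranked = sorted(process_usage.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

For example, `top_n_processes({"sqlservr": 61.0, "w3wp": 22.5, "svchost": 3.0}, 2)` would return the two busiest processes.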

Thresholds

Threshold - Maximum total system CPU

Specify the maximum total system CPU usage allowed before an event is raised. The default is 95%.

Threshold - Maximum individual CPU

Specify the maximum individual CPU usage allowed before an event is raised. The default is 98%.

Threshold - Maximum processor queue length

Specify the maximum number of processes the CPU queue can contain before an event is raised. CPU queue length indicates how many processes are ready to run. The default is 2 processes.

Data Collection

Collect data for total system utilization?

Select Yes to collect data for charts and reports. If enabled, data collection returns the overall percentage of CPU time used. The default is unselected.

The detail data contains information about the percentage of CPU usage, the threshold for percentage of CPU usage, and the top N processes by processor usage along with their Windows counter values.

Collect data for individual processor utilization?

Select Yes to collect data for charts and reports. If enabled, data collection returns the percentage of CPU time used for each processor in one datastream per processor. The default is unselected.

The detail data contains information about the percentage of CPU usage and the threshold for percentage of CPU usage.

Collect data for processor queue length?

Select Yes to collect data for charts and reports. If enabled, data collection returns the number of threads waiting to execute on all processors. The default is unselected.

The detail data contains information about processor queue length and the threshold for processor queue length.

4.4.4 Example of How this Script Is Used

This script monitors both the percentage of CPU used and processor queue length. By itself, high CPU usage might not indicate a problem. Instead, consider the following factors:

  • Queue length

  • How you are using the computers monitored

  • Your overall strategy for the environment

For example, in a transactional environment, a computer might run at a consistent 90% CPU usage. The computer has no room for growth, but if the queue length remains low and stable (never more than two or three threads waiting), the computer might be sized correctly for maximum efficiency. If the queue length increases and threads are waiting, you might have a problem that needs to be addressed.

In a batch environment, however, you can set the script to run during off-peak hours when the batch jobs are not running. The script can raise an event if CPU usage is over 50% and any thread is waiting (that is, with the maximum processor queue length threshold set to 0), to ensure the computer has enough CPU headroom for batch jobs to run.
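The two scenarios can be expressed as threshold profiles evaluated by the same CPU-and-queue rule. The profile names and the transactional values (85% CPU, 3 waiting threads) are hypothetical illustrations. The batch values (over 50% CPU, any waiting thread) come from the example above.

```python
# Hypothetical threshold profiles for the two scenarios described above.
PROFILES = {
    # Transactional: steady high CPU is acceptable; a growing queue is not.
    "transactional": {"cpu_threshold": 85.0, "max_queue_length": 3},
    # Batch (off-peak check): flag >50% CPU if any thread is waiting.
    "batch_off_peak": {"cpu_threshold": 50.0, "max_queue_length": 0},
}


def raises_event(profile, cpu_pct, queue_length):
    """Apply the CPU-and-queue rule using the named profile's thresholds."""
    t = PROFILES[profile]
    return cpu_pct > t["cpu_threshold"] and queue_length > t["max_queue_length"]
```

With a queue threshold of 0, a single waiting thread is enough to trigger the batch profile, while the transactional profile tolerates 90% CPU as long as no more than three threads are waiting.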

Other factors to consider are long-range plans, such as the number of users you expect to support, how long you expect to support them, and how much room you need for growth. For example, you can set the CPU usage threshold lower so that it warns you when to off-load some processing or order new systems.

Monitoring Multi-Processor Systems

On a multi-processor system, the total CPU utilization is the average percentage of time that all the processors on the system are busy executing non-idle threads. For example:

  • If all processors are always busy, this is 100%.

  • If all processors are 50% busy, this is 50%.

  • If 25% of the processors are always busy and the rest are idle, this is 25%.
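The examples above follow from treating total utilization as the mean of each processor's busy percentage. The following is a minimal sketch of that arithmetic. The function name is hypothetical.

```python
def total_cpu_utilization(per_cpu_busy_pct):
    """Hypothetical sketch: mean busy percentage across all processors."""
    if not per_cpu_busy_pct:
        raise ValueError("need at least one processor")
    return sum(per_cpu_busy_pct) / len(per_cpu_busy_pct)
```

On a four-processor system, `[100, 0, 0, 0]` yields 25.0. Note that `[90, 10]` yields 50.0, which hides the imbalance between the two processors. That imbalance is the reason for monitoring each CPU individually, as described in the next topic.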

Monitoring Overall or Individual CPU Load

Monitor load for each CPU individually to gain more specific information about what is really happening on a system. For example, if you monitor overall load and see that CPU usage is 50%, you know less about the resource usage than if you see that CPU 0 is running at 90% and CPU 1 is running at 10%.

Handling Spikes

Because CPU and queue length are often subject to temporary spikes, set a short interval (two to five minutes), but raise an event only after thresholds are exceeded in three consecutive periods.
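The "three consecutive periods" rule can be sketched as a streak counter that resets on any non-breaching interval. This is a hypothetical illustration of the technique and not the script's built-in mechanism. As written, the filter keeps reporting on every interval after the third consecutive breach, until a non-breaching interval resets the streak.

```python
class ConsecutiveBreachFilter:
    """Hypothetical sketch: report only after `required` consecutive breaches."""

    def __init__(self, required=3):
        self.required = required
        self.streak = 0

    def update(self, breached):
        """Record one interval's result; return True if an event should fire."""
        # Extend the streak on a breach; any non-breaching interval resets it.
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.required
```

With a five-minute interval, a single spike followed by a normal reading never fires, whereas three breaching readings in a row (15 minutes) do.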

Collecting Data for Trend Analysis

This script can be set to collect data to help you identify usage trends for your servers. For example, if CPU usage increases, you can plan for growth. To perform this type of analysis, run a second job that collects data at a less frequent interval.