Use this Knowledge Script to monitor total CPU usage and queue length to determine whether the CPU is overloaded. This script raises an event when CPU usage and CPU queue length values exceed the thresholds you set.
Resource object: the CPU folder, or any individual CPU icon on multiprocessor systems.
The default schedule for this script is every 5 minutes.
Set the following parameters as needed:
| Description | How to Set It |
|---|---|
| Event Notification | |
| Event severity when job fails | Set the event severity level, from 1 to 40, to indicate the importance of an event in which the CpuLoaded job fails. The default is 5 (red event indicator). |
| Raise event if total system CPU exceeds threshold? | Select Yes to raise an event if total system CPU usage exceeds the threshold you set. The default is Yes. The script raises an event when total system CPU usage exceeds the Threshold - Maximum total system CPU value and, if the Use queue length in determining threshold crossings for events parameter is selected, the processor queue length also exceeds the Threshold - Maximum processor queue length value. When an event is raised, the event detail lists the top N processes by processor usage, with their Windows counter values, where N is the value of the Number of processes to include in detail when total CPU threshold crossed parameter. If you select this parameter and the Use virtual machine performance counters if available? parameter, the job retrieves the total system CPU usage metric from the VMware virtual machine's total processor usage counter, and individual process detail for the top processor-consuming processes is included in event and datastream detail. |
| Event severity when total system CPU exceeds threshold | Set the event severity level, from 1 to 40, to indicate the importance of an event in which total system CPU usage exceeds the threshold. The default is 10 (red event indicator). |
| Raise event if any individual CPU exceeds threshold? | Select Yes to raise an event if usage on any individual CPU exceeds the threshold you set. The default is unselected. The script raises an event when usage on any single processor exceeds the Threshold - Maximum individual CPU value. |
| Event severity when individual CPU exceeds threshold | Set the event severity level, from 1 to 40, to indicate the importance of an event in which individual CPU usage exceeds the threshold. The default is 10 (red event indicator). |
| Monitoring | |
| Cap processor usage values at 100 percent? | Select Yes to cap overall and individual processor usage values at 100 percent. Any value over 100 is then stored in the datastream, and used as the overall or individual processor usage value, as 100. Values over 100 percent typically come from the VMware virtual processor overall usage counter when you select the Use virtual machine performance counters if available? parameter. Capping is useful when you report on AppManager data with other Micro Focus reporting products. The default is unselected. |
| Use virtual machine performance counters if available? | Select Yes to monitor the total system CPU usage metric retrieved from the VMware virtual machine performance counter. The default is Yes. If you select Yes and the VMware counters are not present on the first job iteration, the job records that fact and does not try to use them on subsequent iterations. Detail for the top N processes using the CPU is always collected, even when the virtual machine counter is used. NOTE: VMware provides no virtual machine performance counters for individual process usage, so those values are still retrieved from the operating system counters. Important: VMware allows a virtual machine to report more than 100% of its CPU, so if you select Yes for this parameter, you might see CPU utilization data greater than 100%. |
| Use queue length in determining threshold crossings for events | Select Yes to combine the overall processor usage value with the processor queue length when deciding whether to raise a total system CPU event: an event is raised only when both values exceed their thresholds. If you do not select this parameter, only the total system CPU usage threshold is used, provided you selected the Raise event if total system CPU exceeds threshold? parameter. The default is Yes. |
| Number of processes to include in detail when total CPU threshold crossed | Specify the number of processes, ranked by their % Processor Usage counter value, to include in event detail and datastream detail when the corresponding event and data collection parameters are selected. To turn off individual process detail in events and datastreams, set this parameter to 0. |
| Thresholds | |
| Threshold - Maximum total system CPU | Specify the maximum total system CPU usage allowed before an event is raised. The default is 95%. |
| Threshold - Maximum individual CPU | Specify the maximum individual CPU usage allowed before an event is raised. The default is 98%. |
| Threshold - Maximum processor queue length | Specify the maximum number of threads that can wait in the processor queue before an event is raised. Processor queue length indicates how many threads are ready to run but are waiting for a processor. The default is 2. |
| Data Collection | |
| Collect data for total system utilization? | Select Yes to collect data for charts and reports. If enabled, data collection returns the overall percentage of CPU time used. The default is unselected. The detail data contains the percentage of CPU usage, the threshold for percentage of CPU usage, and the top N processes by processor usage with their Windows counter values. |
| Collect data for individual processor utilization? | Select Yes to collect data for charts and reports. If enabled, data collection returns the percentage of CPU time used for each processor in one datastream per processor. The default is unselected. The detail data contains the percentage of CPU usage and the threshold for percentage of CPU usage. |
| Collect data for processor queue length? | Select Yes to collect data for charts and reports. If enabled, data collection returns the number of threads waiting to execute on all processors. The default is unselected. The detail data contains the processor queue length and the threshold for processor queue length. |
This script monitors both the percentage of CPU used and processor queue length. By itself, high CPU usage might not indicate a problem. Instead, consider the following factors:
- Queue length
- How you are using the computers monitored
- Your overall strategy for the environment
For example, in a transactional environment you can have a computer with CPU usage consistently at 90%. The computer has no room for growth, but if the queue length remains low and stable (never more than two or three threads waiting), the computer may be sized correctly for maximum efficiency. If the queue length increases and threads are waiting, you may have a problem that needs to be addressed.
In a batch environment, however, you can set the script to run during off-peak hours when the batch jobs are not running. The script can then raise an event if CPU usage is over 50% and any thread is waiting (that is, with the queue length threshold set to 0), to ensure the computer has enough CPU headroom for batch jobs to run.
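To compare the two environments, the Python sketch below encodes the total CPU event rule (CPU above its threshold and, optionally, queue length above its threshold) and applies two threshold sets to it. The helper name and the transactional thresholds (85% CPU, 3 threads) are illustrative assumptions; the batch values (50% CPU, queue threshold 0) follow the batch example above.

```python
def total_cpu_event(cpu_pct: float, queue_length: int, max_cpu: float,
                    max_queue: int, use_queue: bool = True) -> bool:
    """Hypothetical helper: True when a total-CPU event would be raised."""
    if cpu_pct <= max_cpu:
        return False
    # When queue length is used, a CPU crossing alone is not enough.
    return (not use_queue) or queue_length > max_queue


# Transactional server: 90% CPU is normal; only a growing queue matters.
transactional = {"max_cpu": 85.0, "max_queue": 3}
# Batch server checked off-peak: above 50% CPU, any waiting thread is a warning.
batch = {"max_cpu": 50.0, "max_queue": 0}

print(total_cpu_event(90.0, 2, **transactional))  # False: queue is short
print(total_cpu_event(90.0, 5, **transactional))  # True: threads are piling up
print(total_cpu_event(60.0, 1, **batch))          # True: one thread is waiting
```

The same rule serves both environments; only the thresholds change, which is why the queue length threshold matters as much as the CPU threshold.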
Other factors to consider are long-range plans, such as the number of users you expect to support, how long you expect to support them, and how much room you need for growth. For example, you can set the CPU usage threshold lower so that you are warned in time to off-load some processing or order new systems.
On a multiprocessor system, the total CPU utilization is the average, across all processors, of the percentage of time each processor spends executing non-idle threads. For example:
- If all processors are always busy, this is 100%.
- If all processors are 50% busy, this is 50%.
- If 25% of the processors are always busy and the rest are idle, this is 25%.
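A minimal Python sketch of this averaging, assuming each processor's busy percentage is already available (the function name is hypothetical). The last call shows how a moderate total can hide an unbalanced load.

```python
from __future__ import annotations


def total_cpu_utilization(per_cpu_busy_pct: list[float]) -> float:
    """Average the busy percentage of each processor into one total value."""
    if not per_cpu_busy_pct:
        raise ValueError("at least one processor reading is required")
    return sum(per_cpu_busy_pct) / len(per_cpu_busy_pct)


print(total_cpu_utilization([100.0, 100.0, 100.0, 100.0]))  # 100.0
print(total_cpu_utilization([50.0, 50.0, 50.0, 50.0]))      # 50.0
print(total_cpu_utilization([100.0, 0.0, 0.0, 0.0]))        # 25.0
print(total_cpu_utilization([90.0, 10.0]))                  # 50.0
```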
Monitor load for each CPU individually to gain more specific information about what is really happening on a system. For example, if you monitor only overall load on a two-processor system and see that CPU usage is 50%, you know less about resource usage than if you see that CPU 0 is running at 90% and CPU 1 is running at 10%.
Because CPU usage and queue length are often subject to temporary spikes, set a short interval (two to five minutes), but raise an event only after the thresholds are exceeded in three consecutive intervals.
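The Python sketch below illustrates the "three consecutive intervals" idea as a simple streak counter. The class name is hypothetical, and the sketch explains the concept only; it does not describe how AppManager itself implements this rule.

```python
class ConsecutiveCrossingFilter:
    """Hypothetical helper: report a crossing only after a run of exceedances."""

    def __init__(self, threshold: float, required: int = 3):
        self.threshold = threshold
        self.required = required
        self._streak = 0  # number of consecutive samples above the threshold

    def update(self, value: float) -> bool:
        """Record one sample; True once `required` consecutive samples exceed."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # a single normal sample resets the run
        return self._streak >= self.required


spike_filter = ConsecutiveCrossingFilter(threshold=95.0)
samples = [96.0, 97.0, 50.0, 96.0, 97.0, 98.0, 99.0]
print([spike_filter.update(v) for v in samples])
# [False, False, False, False, False, True, True]
```

The isolated two-sample spike at the start is ignored; only the sustained run at the end is reported. This version keeps reporting while the run continues; a variant that returns True only when the streak first reaches `required` would report each run once.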
This script can be set to collect data to help you identify usage trends for your servers. For example, if CPU usage increases, you can plan for growth. To perform this type of analysis, run a second job that collects data at a less-frequent interval.
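One simple way to turn collected data points into a growth trend is a least-squares slope over evenly spaced samples. The Python sketch below assumes the samples come from the less-frequent data collection job and are evenly spaced in time; the function name and the sample values are hypothetical, and exporting the data from AppManager is not shown.

```python
from __future__ import annotations


def trend_slope(samples: list[float]) -> float:
    """Least-squares slope of evenly spaced samples, in percentage points per interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Daily average CPU usage (illustrative numbers) rising 2 points per day.
print(trend_slope([40.0, 42.0, 44.0, 46.0]))  # 2.0
```

A steadily positive slope suggests it is time to plan for growth; a slope near zero suggests usage is stable.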