6.2 Setting Up Metric Alarms

Metric alarms are issued for an element to report changes to an agreement or its objectives, including changes that can affect health, such as an element state change or a parameter value change.

6.2.1 Understanding Metric Alarms

Service level metric alarms are unique in that they are never saved for historical purposes and are not persisted when the Operations Center server is restarted. Metric alarms are always real time, and there is a limit to the number of alarms that can be issued per element, per objective, and per agreement.

This limit is set in the BSW settings (see Section 6.2.2, Setting the Service Level Metrics Alarm Limit). The default is 50, but it is advisable to set a lower number (such as 5 or 10). For example, with one agreement that has two objectives applied to five elements, the default value of 50 means that no more than 500 metric alarms (1 x 2 x 5 x 50) are held in memory. Older metric alarms are removed from memory when the measured interval ends or when a condition change produces a new metric alarm.
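The memory estimate above is a simple product, and a short sketch can help when sizing the limit for a larger deployment. The function name and parameters below are hypothetical and are not part of Operations Center; the sketch also assumes that every objective of every agreement applies to every element, which gives an upper bound rather than an exact count.

```python
def max_metric_alarms_in_memory(agreements: int, objectives_per_agreement: int,
                                elements: int, alarm_limit: int = 50) -> int:
    """Return an upper bound on the metric alarms held in memory.

    Hypothetical helper for illustration only. It assumes each objective
    of each agreement is applied to each element, and that alarm_limit is
    the Maximum Metric Alarms setting (default 50).
    """
    values = (agreements, objectives_per_agreement, elements, alarm_limit)
    if any(v < 0 for v in values):
        raise ValueError("all counts must be non-negative")
    return agreements * objectives_per_agreement * elements * alarm_limit


if __name__ == "__main__":
    # One agreement, two objectives, five elements, as in the example above.
    for limit in (50, 10, 5):
        bound = max_metric_alarms_in_memory(1, 2, 5, limit)
        print(f"limit={limit}: at most {bound} metric alarms in memory")
```

With the default limit of 50 the bound is 500, as stated above; lowering the limit to 10 or 5 reduces it to 100 or 50 for the same agreement.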

Metric alarms can be viewed in the Operations Center console in the Alarms view by selecting Service Level Metrics as the alarm type. They are also available in the Alarms portlet in the Operations Center dashboard.

Metric alarms can show the following information and changes:

Severity
Element
Date/Time
ID
Transition: Previous state and the current state
Compliance: Agreement health score, which indicates how well the element is doing with respect to the associated objective
Grade: Letter grade based on the compliance score; this letter grade is mapped to the agreement health score
Agreement
Objective
Applied agreement: Element path based on the point at which the agreement was evaluated to issue the metric alarm
Violation
Metric key: Current measurement based on objective type
Uptime: Amount of time available in the current interval for the calendar applied to the SLA
Downtime: Amount of time unavailable in the current interval for the calendar applied to the SLA
Element downtime
Unknown
Predict warning: Predicted amount of time until a breach warning occurs based on the prediction ratio (if performance continues at the same level)
Predict violation: Predicted amount of time until a breach or violation occurs based on the prediction ratio (if performance continues at the same level)
Worst warning: Predicted time of warning if a failure occurred now
Worst violation: Predicted time of violation if a failure occurred now
Time included: Total amount of time included in the interval period
Period start: Date and time when the interval period started (sets the interval for uptime and downtime)
Period end: Date and time when the interval period ended (sets the interval for uptime and downtime)
Prediction ratio: Ratio of the amount of time the element was in compliance vs. the amount of time it was noncompliant
Points captured: Number of data points measured and stored
Low: Minimum constraint if set by the objective
High: Maximum constraint if set by the objective
Reason: Description of why the metric was recorded

The only actions that can be performed on service level metric alarms are saving them to a file and adding a comment. To do either, right-click the alarm in the Alarms portlet in the Operations Center dashboard or in the Alarms view in the Operations Center console. Note that because metric alarms are not persisted after a server restart, comments on metric alarms are not persisted either.

6.2.2 Setting the Service Level Metrics Alarm Limit

To set the alarm limit for metric alarms:

In the Operations Center console, open Enterprise > Administration.
Right-click Data Warehouse, then select Edit Data Warehouse Settings.
Click the Service Level Settings tab.
Under Real-Time Alarms, select a number for Maximum Metric Alarms.