UniJob Alert Manager

On Windows and Linux, an Alert Manager for UniJob is included in UVMS.

Each Dollar Universe Application Server has its own alerting service (the ALERT engine). The section below therefore only applies to UniJob.

The UniJob Alert service applies Alerting Rules to filter local Events. Accumulated Events are uploaded to the UVMS database at every heartbeat cycle (default=60 seconds).

The UVMS Alert Manager sends a notification whenever it receives an Event indicating that:

A service has started or stopped
A Job status change matches an Alerting Rule

The Alert Manager is not started automatically when UVMS starts. To enable it, set Management Server Node Settings > Optional Services > Enable Alert management (refer to the Univiewer User Guide). The change takes effect only after UVMS is restarted.

When the Alert Manager starts, it connects to all UniJob IO servers that UVMS has been configured to supervise.

Only UniJob alerts are displayed in the Alert Dashboard detailed in the Univiewer User Guide. Events are filtered on each UniJob via Alerting Rules. Selected Events and the corresponding Alerts are stored in the UVMS database.

The UniJob Alert Service transmits heartbeat messages at the frequency set by the “Expected UniJob Heartbeat Frequency” parameter (Node List > Management Server Node Settings > Alerts Settings). The default heartbeat cycle is 60 seconds.
If UVMS receives no heartbeat for 1.5 x Heartbeat Cycle (default = 90 seconds), it considers the UniJob node unreachable.

Filtered UniJob events are accumulated and transmitted with the heartbeat message.
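
The unreachable threshold described above can be sketched as follows. This is an illustration only, not UVMS code: the function names are hypothetical, and treating a silence of exactly 1.5 cycles as still reachable is an assumption, since the text does not specify the boundary case.

```python
from datetime import datetime, timedelta

# Values taken from the text: default cycle of 60 seconds, timeout factor of 1.5.
DEFAULT_HEARTBEAT_CYCLE_S = 60
TIMEOUT_FACTOR = 1.5


def unreachable_after(cycle_s: float = DEFAULT_HEARTBEAT_CYCLE_S) -> float:
    """Return the silence, in seconds, after which a node is considered unreachable."""
    return cycle_s * TIMEOUT_FACTOR


def is_unreachable(last_heartbeat: datetime, now: datetime,
                   cycle_s: float = DEFAULT_HEARTBEAT_CYCLE_S) -> bool:
    """Hypothetical check: True once the silence exceeds the timeout window.

    Assumption: a silence of exactly 1.5 cycles still counts as reachable.
    """
    silence = now - last_heartbeat
    return silence > timedelta(seconds=unreachable_after(cycle_s))
```

With the default settings, unreachable_after() gives 90 seconds. Doubling the heartbeat cycle to 120 seconds would raise the threshold to 180 seconds.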

Alert Manager Performance Implications

For performance reasons, the number of Alerts in the database should be kept below 100 000. The database purge retention cycle determines how much data is kept in the database.

The default retention cycle is 14 days (set in the Node List > Management Server Node Settings > Purge > Alert retention cycle parameter). The way Alerting Rules are configured affects the number of Alerts that are generated. For example, selecting every Job status change generates far more Alerts than selecting the Aborted completion status only.

To keep the database size at an acceptable level, the administrator can adjust the Purge Retention Cycle and the Alerting Rules. The formula below helps determine the required Retention Cycle:

Purge Retention Cycle (days) = 100 000 / Average Daily Alerts

Refer to section On-line Purge for details of purge configuration.
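
As a worked example of the formula above, the following sketch computes a retention cycle from an observed daily alert volume. The function name is hypothetical, and rounding down to whole days is an assumption made so that the result stays on the safe side of the 100 000 limit.

```python
import math

MAX_ALERTS = 100_000  # recommended ceiling taken from the text


def retention_cycle_days(average_daily_alerts: float,
                         max_alerts: int = MAX_ALERTS) -> int:
    """Return the retention cycle in whole days for a given daily alert volume.

    Assumption: the result is rounded down so that the stored alerts do not
    exceed max_alerts.
    """
    if average_daily_alerts <= 0:
        raise ValueError("average_daily_alerts must be positive")
    return math.floor(max_alerts / average_daily_alerts)
```

For instance, an installation that generates 5 000 alerts per day on average would use a retention cycle of 20 days, and one that generates 10 000 per day would use 10 days.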

Number of Nodes	Retention Cycle (days)	Total Alerts per Day	Average Alerts per Node per Day
500	14	7143	14
500	7	14286	29
500	21	4762	10
250	14	7143	29
250	7	14286	57
250	21	4762	19
100	14	7143	71
100	7	14286	143
100	21	4762	48

The table above shows, for a given number of nodes and retention cycle, the maximum average number of alerts per node per day that keeps the database below 100 000 alerts. The number of alerts can be reduced by reducing the number and scope of the Alerting Rules.
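
The figures in the table can be reproduced with a short calculation, which is convenient for node counts or retention cycles that the table does not list. This is a sketch only: the function name is hypothetical, and rounding to the nearest whole alert is an assumption inferred from the table values.

```python
from typing import Tuple

MAX_ALERTS = 100_000  # recommended ceiling taken from the text


def alert_budget(nodes: int, retention_days: int,
                 max_alerts: int = MAX_ALERTS) -> Tuple[int, int]:
    """Return (total alerts per day, average alerts per node per day).

    Both values are the daily volumes at which the database reaches
    max_alerts over the retention cycle, rounded to the nearest integer.
    """
    if nodes <= 0 or retention_days <= 0:
        raise ValueError("nodes and retention_days must be positive")
    total_per_day = max_alerts / retention_days
    per_node_per_day = total_per_day / nodes
    return round(total_per_day), round(per_node_per_day)
```

Calling alert_budget(500, 14) returns (7143, 14), which matches the first row of the table. alert_budget(100, 7) returns (14286, 143).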

As of version 6.10.41, new documentation updates are posted on the Broadcom Techdocs Portal.
Look for Dollar Universe.