B.3 Understanding the Orchestration Server Events System

The Orchestration Server Event System integrates with the Job Scheduler. Event notifications can start jobs and can also invoke Event handler methods in long-running jobs. In turn, a job can react to the Event by starting other server actions, by modifying object attributes, or by executing another external process.

For example, an Event notification can occur when a VM Host has exceeded its configured load limits. This Event can start a job that migrates VMs off of the loaded VM Host or VM Hosts.

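The Orchestration Server supports two Event types:

  • Built-in Events

  • Rule-based Events

This section includes the following information:

  • Section B.3.1, Event Notification

  • Section B.3.2, Built-in Events

  • Section B.3.3, Rule-based Events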

B.3.1 Event Notification

An Event notifies two other Cloud Manager Orchestration services, the Job Scheduler and the Job Broker. The Job Scheduler starts jobs that are awaiting an Event to trigger them. The Job Broker invokes a callback on any long-running job that has registered for notification of an Event.

See Section 13.0, The Orchestration Server Job Scheduler, for more information about setting up a Job Schedule.

B.3.2 Built-in Events

Built-in Events occur when a managed object comes online, goes offline, or has a health status change.

The Orchestration Server uses the following built-in Events to keep managed objects synchronized.

Table B-1 The Orchestration Server Built-in Events

Event Name              Description
----------------------  -----------------------------------------------------
AGENT_VERSION_MISMATCH  Resource Agent version mismatch (agent needs upgrade)
REPOSITORY_HEALTH       Repository health status has changed
RESOURCE_HEALTH         Resource health status has changed
RESOURCE_OFFLINE        Resource Agent has logged out of the server
RESOURCE_ONLINE         Resource Agent has logged in to the server
SERVER_UP               Server has fully started
USER_HEALTH             User health status has changed
USER_ONLINE             User has logged in to the server
VMHOST_ADDED            VM Host has been added
VMHOST_HEALTH           VM Host health status has changed
VMHOST_NOT_AVAILABLE    No VM Host is available
VMHOST_REMOVED          VM Host has been removed

For example, when a resource comes online (that is, the agent connects to the server), the RESOURCE_ONLINE Event is fired and both scheduled jobs with a trigger for that Event and long-running jobs with Event handlers are notified.

The RESOURCE_ONLINE built-in Event is used by the embedded discovery jobs, such as the jobs that discover operating system and CPU information (the osInfo and cpuInfo jobs). Both the osInfo and cpuInfo job archives (.job) include a schedule file (.sched) specifying a trigger (.trig) that allows these jobs to be started when notification of the RESOURCE_ONLINE Event occurs.
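The notification path described above can be pictured with a small model. The following Python sketch is illustrative only and is not the Orchestration Server API: the EventBus class, its method names, the context key "resource", and the resource name "node1" are all hypothetical. It shows one Event fanning out to scheduled jobs (the Job Scheduler role) and to callbacks of long-running jobs (the Job Broker role).

```python
from collections import defaultdict


class EventBus:
    """Toy model of Event notification fan-out (not the Orchestration Server API)."""

    def __init__(self):
        self.scheduled_jobs = defaultdict(list)  # event name -> jobs waiting on a trigger
        self.handlers = defaultdict(list)        # event name -> callbacks of running jobs
        self.started = []                        # record of jobs "started" by the scheduler

    def add_trigger(self, event_name, job_name):
        self.scheduled_jobs[event_name].append(job_name)

    def register_handler(self, event_name, callback):
        self.handlers[event_name].append(callback)

    def notify(self, event_name, context):
        # Job Scheduler role: start every job whose schedule is triggered by this Event.
        for job_name in self.scheduled_jobs[event_name]:
            self.started.append((job_name, event_name))
        # Job Broker role: invoke the callback of each registered long-running job.
        for callback in self.handlers[event_name]:
            callback(context)


bus = EventBus()
bus.add_trigger("RESOURCE_ONLINE", "osInfo")
bus.add_trigger("RESOURCE_ONLINE", "cpuInfo")

seen = []  # resources reported to the hypothetical long-running job
bus.register_handler("RESOURCE_ONLINE", lambda ctx: seen.append(ctx["resource"]))

bus.notify("RESOURCE_ONLINE", {"resource": "node1"})
```

In this model a single notify call both starts the two discovery jobs and calls the registered handler once, which mirrors the two notification targets named in Section B.3.1.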

B.3.3 Rule-based Events

Rule-based Events are defined in an XML document. They are deployed to the Orchestration Server and managed through the Orchestration Console. A rule can be a simple object attribute (fact) equivalency check, or an Event ruleset can combine checks using AND, OR, IF, and ELSE logic, among other things.

The rules follow the same syntax as the constraints that are defined in XML policy files for all Grid Objects, such as Jobs, VM Hosts, etc.

The Orchestration Server Event Service evaluates the rules; if the rules pass, an Event notification occurs.
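To make the idea of rule evaluation concrete, here is a minimal Python sketch of how fact comparisons combined with AND/OR logic could be evaluated against an object's facts. It is a model under stated assumptions, not the Event Service implementation: the tuple-based rule format, the "eq" operator name, the "vmhost.health" fact, and the treatment of missing facts are all assumptions made for illustration.

```python
import operator

# "gt" and "lt" mirror the element names used in the example in this section;
# "eq" is an assumed name for an equivalency check.
COMPARATORS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def evaluate(rule, facts):
    """Return True if `rule` passes against the `facts` dictionary."""
    kind = rule[0]
    if kind == "and":
        return all(evaluate(child, facts) for child in rule[1:])
    if kind == "or":
        return any(evaluate(child, facts) for child in rule[1:])
    _, fact_name, expected = rule
    if fact_name not in facts:
        return False  # assumption: a missing fact makes the comparison fail
    return COMPARATORS[kind](facts[fact_name], expected)


# Hypothetical ruleset: the host runs at least one VM AND
# (its load average is above 2 OR it is reported unhealthy).
rule = ("and",
        ("gt", "vmhost.vm.count", 0),
        ("or",
         ("gt", "vmhost.resource.loadaverage", 2),
         ("eq", "vmhost.health", False)))

busy_host = {"vmhost.vm.count": 3, "vmhost.resource.loadaverage": 2.7, "vmhost.health": True}
idle_host = {"vmhost.vm.count": 0, "vmhost.resource.loadaverage": 0.1, "vmhost.health": True}

# Names of the hosts for which a notification would occur in this model.
passing = [name for name, facts in (("busy", busy_host), ("idle", idle_host))
           if evaluate(rule, facts)]
```

Only the objects for which the whole rule tree passes would produce an Event notification in this model.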

The XML Schema document specification can be found in <install dir>/doc/xsds/event_1_0_0.xsd.

The Event XML specification is composed of three sections.

  • <context>

  • <trigger>

  • <reset>

NOTE: Both the <context> and <trigger> sections are required.

<context> section

The <context> section defines the context in which the Event rules are evaluated, that is, which objects the rules can reference. The available objects are Job, Jobinstance, Resource, Repository, User, and VMHost. From these objects, you can specify one object set to iterate over and, optionally, a single instance of the object.

<trigger> section

The <trigger> section defines the rules for when an Event notification occurs. The <trigger> format is the same syntax as <constraints> used in policies.

<reset> section

The optional <reset> section defines the rules for when an Event is reset. If the <reset> rule is not used, an Event is reset based on a timeout. The <reset> format is also the same syntax as in <constraints> used in policies.

The resetInterval attribute is set on the <event> XML element. If neither resetInterval nor <reset> is used, the default timeout for resetting is 10 minutes.
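The timeout-based reset can be sketched as follows. This Python model is an interpretation, not the server's implementation: the TimeoutReset class and its method are hypothetical, and expressing the interval in seconds is an assumption, since this section does not state the unit of resetInterval. It models an object that, once it has caused a notification, cannot cause another until the interval has elapsed.

```python
DEFAULT_RESET_INTERVAL = 600  # 10 minutes; using seconds here is an assumption


class TimeoutReset:
    """Toy model: an object that fired cannot fire again until the interval elapses."""

    def __init__(self, reset_interval=DEFAULT_RESET_INTERVAL):
        self.reset_interval = reset_interval
        self.fired_at = {}  # object id -> time of the last notification

    def should_notify(self, object_id, trigger_passes, now):
        last = self.fired_at.get(object_id)
        if last is not None and now - last >= self.reset_interval:
            del self.fired_at[object_id]  # timeout reached: the Event is reset
            last = None
        if trigger_passes and last is None:
            self.fired_at[object_id] = now
            return True
        return False


tracker = TimeoutReset()
# The trigger rule keeps passing for "vmhost1" at each evaluation time (in seconds).
results = [tracker.should_notify("vmhost1", True, t) for t in (0, 60, 599, 600)]
# results == [True, False, False, True]
```

In this model the host notifies at time 0, stays silent while the interval is running, and notifies again once 600 seconds have passed.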

The following example (taken from vmhost.event in <install dir>/examples/events) defines an Event whose notification occurs when a VM host becomes overloaded.

 1  <event>
 2
 3     <context>
 4        <vmhost />
 5     </context>
 6
 7     <trigger>
 8        <gt fact="vmhost.vm.count" value="0" />
 9        <gt fact="vmhost.resource.loadaverage" value="2" />
10     </trigger>
11
12     <reset>
13        <lt fact="vmhost.resource.loadaverage" value=".5" />
14     </reset>
15
16  </event>

Lines 3-5: This section defines the context for the Event’s rule evaluation.

Line 4: The context specifies all VM host objects, so the Event Service iterates over all VM hosts. On each VM host, the <trigger> rule will be evaluated, so in this case, the Event context is composed of one or more VM hosts.

Lines 7-10: This section defines the trigger rule that determines whether this Event fires notifications. If the trigger rule does not pass, no Event notifications occur.

Line 8: Consider only VM hosts that have at least one VM instance running.

Line 9: Check whether the VM host’s running load average exceeds a threshold value. In this case, the check passes if the load average is greater than 2.

Lines 12-14: This section defines the reset rule that determines whether a previously triggered VM host can be reset and triggered again.

Line 13: Reset only if the VM host’s running load average drops below a threshold (0.5 in this example).

After a VM host passes the trigger rule, it cannot pass the trigger rule again until the reset rule passes (that is, until its load average drops below the reset threshold).
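The interplay of the trigger and reset rules in this example can be traced with a short Python sketch. It is a simplified model of the behavior described above, not the Event Service itself: the notifications function and the sample data are hypothetical, and the two trigger conditions are treated as both having to pass, as the line-by-line description implies.

```python
TRIGGER_LOAD = 2.0  # from <gt fact="vmhost.resource.loadaverage" value="2" />
RESET_LOAD = 0.5    # from <lt fact="vmhost.resource.loadaverage" value=".5" />


def notifications(samples):
    """Yield the index of each (vm_count, load) sample that causes a notification."""
    triggered = False
    for index, (vm_count, load) in enumerate(samples):
        if triggered:
            # Already triggered: only the reset rule can re-arm this host.
            if load < RESET_LOAD:
                triggered = False
        elif vm_count > 0 and load > TRIGGER_LOAD:
            triggered = True
            yield index


# Hypothetical (vm_count, load average) readings for one VM host over time.
samples = [(0, 5.0), (2, 1.0), (2, 2.5), (2, 3.0), (2, 1.0), (2, 0.3), (2, 2.2)]
fired = list(notifications(samples))
# fired == [2, 6]
```

The first sample does not notify because no VM is running. The host notifies at index 2, stays quiet while the load remains at or above 0.5 (including the reading of 3.0), is reset by the 0.3 reading, and notifies again at index 6.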

See the repository.event example (<install dir>/examples/events/repository.event) for an Event with a rule that evaluates the freespace fact on all repository objects.