6.4 Configuring Operations Center for Clustering

Before configuring Operations Center for a clustered environment, it is useful to understand ConfigStore (Section 6.4.1), how Operations Center handles persistence and synchronization in a clustered environment (Section 6.4.2), and what happens when the primary server, the configStore database, or the Data Warehouse is unavailable (Section 6.4.3).

Section 6.4.4 through Section 6.4.7 describe the requirements and configurations for running Operations Center in a clustered environment.

After Operations Center is initially configured, determine how the servers are to run in the cluster (such as hot, warm, or cold), because this determines how to administer the servers to keep their configurations up to date. For more information, see Section 6.2.2, Availability Levels on Servers.

6.4.1 About ConfigStore

The configStore database records various configuration settings, including:

  • All configuration changes made under the Administration hierarchy, including user/user group profiles, algorithms, automations, adapter profiles and properties, behavior models, custom properties, right-click operations, and scripts

  • Any service model objects created under the Service Models hierarchy, including SCM definitions, SVG drawings, and BSLM definitions

  • Session properties, including user preferences and last known location for users

Subdirectories in the /database directory are synchronized, and the files in the /database/shadowed directory are instrumental in communicating changes to the configStore. Manually editing these files is not recommended, but in some instances it might be necessary.

When operating Operations Center in a clustered environment, all servers share the same configStore. When a file is updated in the /database/shadowed directory, either manually or through the Operations Center console, the configStore pulls in any changes. All other servers in the cluster are notified that a change has occurred, and the servers synchronize their /database/shadowed files.

Files saved and synchronized by the configStore include the following:

  • Adapters.ini
  • Consoles.ini
  • Elements.ini
  • Operations.ini
  • Algorithms.xml
  • bindings.xml
  • containers.xml
  • Databases.xml
  • jobConfiguration.xml
  • performanceConfiguration.xml
  • ScriptHierarchy.xml
  • ScriptStyle.xsl
  • ScriptTesting.xml
  • standard.xml
  • timeManagementConfig.xml
  • CICAccountMap.properties
  • CPAccountMap.properties
  • IconAliases.properties
  • NetcoolAccountMap.properties
  • SmartsAlarmFilter.properties
  • SmartsRelationFilter.properties
  • SmartsSecurityFilter.properties
  • BDIConfig.rdfs
  • BDIConfigResources.rdfs
  • elementmail.template
  • element_and_alarm_mail.template
  • AclManager_2.0.dtd
  • Algorithms.dtd
  • hierarchy_2.0.dtd
  • msmomConfiguration_1.0.dtd
  • ServiceManagerConfiguration_1.0.dtd
  • ServiceCenterConfiguration_1.0.dtd
  • views_1.0.dtd
  • AlarmForm.form
  • AlarmNotifyForm.form
  • AlarmForm.java
  • AlarmNotifyForm.java
  • bemHierarchy.xml
  • BladeLogicOMHierarchy.xml
  • BMCRemedyHierarchy.xml
  • CICHierarchy.xml
  • CorbaNSHierarchy.xml
  • CPHierarchy.xml
  • DefaultNNMHierarchy.xml
  • DefaultOVOHierarchy.xml
  • DefaultPatrolHierarchy.xml
  • DefaultSnmpcHierarchy.xml
  • EveHierarchy.xml
  • EveTesting.xml
  • FoundationHierarchy.xml
  • FXAdapterHierarchy.xml
  • Mercury Application MappingHierarchy.xml
  • MCHierarchy.xml
  • MOSServiceCatalogHierarchy.xml
  • msmom.xml
  • msmomConfiguration.xml
  • NetcoolHierarchy.xml
  • PlateSpin ReconHierarchy.xml
  • RemedyHierarchy.xml
  • ServiceCenterConfiguration.xml
  • ServiceCenterHierarchy.xml
  • ServiceCenterRel61Configuration.xml
  • ServiceCenterRel62Configuration.xml
  • ServiceManagerRel62Configuration.xml
  • ServiceManagerConfiguration.xml
  • ServiceManagerHierarchy.xml
  • ServiceManagerRel61Configuration.xml
  • SolarWinds OrionHierarchy.xml
  • Symantec ClarityHierarchy.xml
  • TADDMHierarchy.xml
  • TecTesting.xml
  • TecHierarchy.xml
  • TecStyle.xsl
  • TecTerse.xml

6.4.2 Persistence and Synchronization in a Clustered Environment

When operating a clustered environment, the base server configuration is still unique to each Operations Center server. However, the Operations Center configStore database is instrumental in persisting and synchronizing various configurations and stored data. Table 6-2 describes these configurations.

Table 6-2 Configurations and stored data persisted and synchronized in Operations Center

Operations Center/Server Installation and Configuration

  Shared/Synchronized:

    • Files in the /database/shadowed directory, including Operations.ini, Adapters.ini, Algorithms.xml, Consoles.ini, and hierarchy XML files

    • Subdirectories in the /database directory, including scripts

  Not Shared/Synchronized:

    • License keys

    • JVM parameters

    • Files from the /config directory

Client Sessions

  Shared/Synchronized:

    • Session properties, including last known location and user preferences

Administration Configuration

  Shared/Synchronized (all configuration changes made under the Administration hierarchy):

    • Adapters and adapter properties

    • Algorithms

    • Automations

    • Behavior Models

    • Custom properties and classes

    • Right-click operations

    • Scripts

    • User/Group profile changes (such as password changes)

    • Enable/disable jobs

  Not Shared/Synchronized:

    • Management actions, such as start/stop adapter (enable/disable jobs is an exception to this)

Adapter Elements

  Not Shared/Synchronized:

    • Custom algorithms defined on adapter elements (these are not propagated to clustered servers)

Service Models

  Shared/Synchronized (any service model objects created under the Services hierarchy):

    • Custom algorithms

    • SCM definitions

    • SVG drawings

    • BSLM definitions

    • Service Model updates

Also note the following:

  • Real-time alarms and elements from adapters are not stored in the configStore

  • State is propagated based on real-time alarms and/or algorithms

  • State is not represented based on content of the configStore

6.4.3 When the Primary Server, the configStore Database, or the Data Warehouse is Unavailable

In the event of an outage on the primary server, one of the other servers takes over the role of writing to the data warehouse. All readers cache a small window of data in case they must take over as the writer, and Operations Center continues to function: new users can log in, and users can perform actions on alarms, create and modify Service Views, and so on.

When the backend database or configStore is not available, Operations Center continues to operate and users can remain in the system. However, specific capabilities or services within Operations Center might not be available to users.

When the backend database is not available, data stored in the Service Warehouse (such as historical alarms, performance data, persisted metrics, and so on) is not available.

Table 6-3 specifies whether actions are available when the warehouse, the configStore, or both are offline.

Table 6-3 Availability of Actions When Warehouse, configStore, or Both Are Offline

Action                                          Warehouse Offline   configStore Offline   Both Offline
Create/Edit Service Models                      Yes                 No                    No
Add/Update Views                                Yes                 No                    No
View Alarm History                              No                  Yes                   No
View Performance Charts                         No                  Yes                   No
View SLA                                        No                  Yes                   No
Actions on Alarms (Acknowledge, Close)          Yes                 Yes                   Yes
Modify Alarm Filters                            Yes                 Yes                   Yes
Log In                                          Yes                 No                    No
Stay Logged In                                  Yes                 Yes                   Yes
Add/Update User Profiles                        Yes                 No                    No
Auto-Reconnect Remembers Last Known Location    Yes                 No                    No

Be aware that when the configStore is down, users might encounter many error messages while working in this mode.

6.4.4 Requirements for Clustering

Before starting the configurations to run Operations Center servers in a clustered environment, note the following requirements:

  • When creating a new cluster, all of the initial setup of ConfigStore and the Service Warehouse must be performed while only a single node is running.

  • If you need to perform a Database Initialize operation, then only one member of the cluster should be running.

  • Start one member of the cluster before the other nodes and allow it to begin initialization. We recommend giving the first node a one-minute head start so that it becomes the cluster coordinator and the other nodes wait for it to complete initialization.

  • All cluster nodes must have JREs installed from the same Java vendor. Using JREs from different vendors across cluster nodes generates serialization errors.
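
To verify this requirement, you can compare the JVM vendor reported on each node. The following is a minimal sketch, assuming the Operations Center script engine permits Java interop through the java.* packages (an assumption, not documented behavior):

    // Minimal sketch (assumes the script engine exposes java.* interop).
    // Print this node's JVM vendor and version so the values can be
    // compared across all cluster nodes.
    var vendor = java.lang.System.getProperty("java.vendor");
    var version = java.lang.System.getProperty("java.version");
    java.lang.System.out.println("JVM vendor: " + vendor + ", version: " + version);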

6.4.5 Configuring the Operations Center Server, Database, and Data Warehouse

To configure Operations Center servers in a clustered environment, the following configurations are required:

  • License keys for Operations Center must be locked to all physical and logical TCP/IP addresses in the cluster

  • Configuration storage settings for each Operations Center server must be identical

  • If using a Service Warehouse, one server must be specified as the primary database writer (or cluster coordinator), which aggregates all updates from the clustered servers and updates the database

In a clustered environment, configure Configuration Storage settings after the installation of Operations Center is complete, not during installation.

To define Configuration Storage and BSW Writer settings for clustered servers, do the following for each server:

  1. Stop the Operations Center software.

  2. Open the Configuration Manager.

    The Configuration Manager launches with the Server section of the Components tab open.

  3. Specify the virtual cluster name (the name by which the cluster is known on the network) in the Host Name field and the actual IP address of the server in the IP Address field.

    Perform this step for all servers in the cluster.

  4. To define configuration storage, do the following on one server only:

    1. Click NOC Server to open the Configure NOC Server panel.

    2. For Configuration Storage, click Create to open the Create Configuration Storage dialog box.

    3. From the Cluster drop-down list, select one of the following options:

      None: Select only when not part of a cluster.

      IP multicast (dynamic discovery using UDP): Allows a server that has the same settings to be dynamically discovered and added to the cluster. Then, specify the Multicast Address, which is the IP address used for communication, and the Multicast Port used for communication. Note that some networks do not allow UDP discovery. Check with your network administrator to verify that the servers are on the same subnet.

      Fixed members using TCP/IP: Sets the server as a dedicated member of the cluster. Then, specify the TCP/IP Start Port for communication and the IP addresses of the servers that are cluster members. List the IP address of the cluster coordinator as the first cluster member.
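
      For example (all values are hypothetical), a two-node fixed-member cluster in which 10.1.1.10 is the coordinator might be entered as:

        TCP/IP Start Port: 7800
        Cluster members: 10.1.1.10, 10.1.1.11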

    4. Click OK.

  5. If using a Service Warehouse, specify a primary database writer (or cluster coordinator) by doing the following:

    1. Click Database on the Components tab to open the Database tab.

    2. Select one of the following Primary Warehouse Writer options to determine the server to function as the primary writer to the Data Warehouse:

      Memory-Queued: Sets the server as the primary writer. All other servers must be set as Disabled.

      Disk-Queued: Causes all servers to automatically select the “oldest” server as the primary writer.

      Disabled: Select this option when another server in the cluster is set to Memory-Queued, or when the server should not be eligible for selection as the writer in a Disk-Queued cluster.

      The two strategies are summarized as follows:

      Strategy: Optimize performance; select one server as the primary server.
      Set to: Memory-Queued
      Description: Set this value for the one server that is the primary server writing to the Service Warehouse (BSW). This server performs all bulk alarm history and Service Level Agreement (SLA) profile data collection. Performance is optimized, but delivery is not guaranteed. Set all other servers to Disabled.

      Strategy: Fault-tolerant operation; float the role of primary server among multiple servers.
      Set to: Disk-Queued
      Description: Set this value for all servers to have the role of primary server float among the cluster servers. The server selected by the “oldest member” heuristic becomes the equivalent of a cluster coordinator. When the oldest member leaves, the next oldest becomes the new coordinator and the writer to the BSW. When a server comes back online, it becomes the “newest” member. In the event that the “writer” cluster node is taken offline or fails, the other nodes queue 10 minutes (configurable) of rewind data to be pushed to the repository. The rewind buffer is consulted to “catch up” to the point of the last performed write operation; BSW writes then occur normally.

    3. Click Apply.

  6. Click Close to close the Configuration Manager.

  7. Open the config/Formula.custom.properties file (or create it if it does not already exist) to force host name resolution rather than numeric IP resolution, and add or edit the following settings:

    ooc.iiop.numeric=false
    ooc.fssl.numeric=false 
    ooc.iiop.host=OperationsCenterServerVirtualHostName 
    ooc.fssl.host=OperationsCenterServerHostName
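
    For example (both host names are hypothetical), if the cluster is reached through the virtual name opscenter-cluster.example.com and this server’s own host name is opsnode1.example.com, the file might read:

    # Use host names rather than numeric IP addresses
    ooc.iiop.numeric=false
    ooc.fssl.numeric=false
    # Virtual cluster host name (hypothetical value)
    ooc.iiop.host=opscenter-cluster.example.com
    # This server's own host name (hypothetical value)
    ooc.fssl.host=opsnode1.example.com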
    
  8. Start the Operations Center server.

  9. Do the following as required to set up other servers in the same cluster:

    1. Copy the /OperationsCenter_install_path/configstore/configurations.xml file from the server you configured above to all other servers in the cluster.

      Copying this file applies the configuration storage settings made in Step 4 to the other servers.

      IMPORTANT: All servers in a cluster must have the same settings for Configuration Storage.

    2. Stop the Operations Center server.

    3. Repeat Step 3 through Step 7 on each server to add it to the same cluster.

    4. Perform Step 5 on all servers in the same cluster to indicate how they are to participate (or not) in writing BSW information to the data warehouse.

    5. Restart the Operations Center server.

6.4.6 Additional Considerations and Configurations

The following products do not work in a clustered environment:

  • Operations Center Experience Manager

  • Operations Center Script Adapter

  • BMC BEM Adapter

The following products work in a clustered environment with the indicated considerations or customizations:

  • SCM: To use SCM within a high availability (HA) environment, you cannot use schedules defined within the SCM definition itself. Instead, we recommend defining schedules for SCM jobs via the Operations Center job scheduler. This ensures that each SCM definition runs on only one of the servers in the cluster, and runs only once.

    The top of the job script must check whether the job is being started on the primary server by calling the following method, which returns a Boolean true or false (see the sketch after this list):

    server.isCoordinator()
    
  • Operations Center Event Manager: While the Operations Center Event Manager can work in a clustered environment, certain actions are not synchronized or shared. For example, if an alarm is closed on one server, it is not closed on the other servers.

  • Operations Center F/X Adapter: While the F/X adapter can work in a clustered environment, certain actions are not synchronized or shared. For example, if an alarm is closed on one server, it is not closed on the other servers.
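
For SCM jobs, a guard like the following can be placed at the top of the job script. This is a minimal sketch: server.isCoordinator() is the method described above, while runScmDefinition() is a hypothetical placeholder for the job’s real work:

    // Run the SCM definition only on the cluster coordinator so that the
    // job executes on exactly one server in the cluster, and only once.
    if (server.isCoordinator()) {
        runScmDefinition();  // hypothetical helper holding the SCM job logic
    }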

6.4.7 Configuring Operations Center Clients

When Operations Center is configured in clustering mode, users log in through a generic network address (typically a load balancer), which redirects each user to an appropriate server. This process should be invisible to the user. In the event of a server outage, the load balancer detects the outage and users are moved to another Operations Center server, where they can continue working.

IMPORTANT: In a clustering environment, the Java security file for the Java Runtime Environment (JRE) on each client machine must be updated.

To configure an Operations Center client to access Operations Center in a clustered environment:

  1. Navigate to the Java Web Start installation path, usually located at:

    C:/Program Files/Java/jre1.x.x_**
    
  2. Change the directory to C:/Program Files/Java/jre1.x.x_**/lib/security.

  3. Edit the java.security file as follows:

    1. Locate the line:

      #networkaddress.cache.ttl=-1

    2. Uncomment the line, then change the value to 0 so that the JRE does not cache DNS lookups and re-resolves the cluster address on each connection:

      networkaddress.cache.ttl=0

  4. Launch the Java Control Panel and clear the Java cache.

    For instructions, see Java’s Web page on clearing the Java cache.

  5. Connect to Operations Center.