5.1 Types of Data

Sentinel receives two separate but similar data streams from the Collector Managers: the event data and the raw data. The data is moved from the local, compressed, file-based storage to a user-configured, compressed networked storage location on a regular basis.

5.1.1 Raw Data

Raw data consists of unprocessed events that the Connector receives and sends directly to the Sentinel message bus. This data is written to the Sentinel server. When an event is sent to the message bus, the following additional information is sent with it, without altering the original event:

  • SHA-256 hash of the event

  • Chaining indicator (which is reset to 0 whenever the Sentinel event source is restarted)

  • Raw Data ID (in s_RV25)

  • Event source, Connector, Collector, and Collector Manager node IDs

All raw data is sent to Sentinel without filtering. Because the raw data is not searched or used to generate reports, the data is not indexed.

Raw Data Storage

In Sentinel, raw data is always stored. Raw data is stored in partitions that are based on the time and the event source. Each raw data partition is an individual file. Partitions are created every hour and are closed within 10 minutes after the hour has elapsed. Older, inactive partitions are compressed.

The raw data files are stored in one of the following locations:

  • Local storage location: <Sentinel data directory>/rawdata/online

  • Networked storage location: <Sentinel archive directory>/rawdata_archive

Raw data files are renamed as they change state. Files in the open state have a .open extension. When a file is closed, it is renamed with a .log extension. At the configured interval after it is closed, the file is compressed and given a .zip extension. The compressed raw data files are moved from the local storage to the networked storage location.

The following table describes the directory structure of the raw data in the local storage under the installation directory:

Table 5-1 Raw Data Directory Structure

Directory Structure

Description

/data

The primary directory for all data storage.

/data/rawdata

The subdirectory where all raw data is stored.

/data/rawdata/online

The directory where all the raw data in the local storage is stored.

/data/rawdata/online/EventSource UUID

There is one subdirectory for each event source under the online subdirectory. That subdirectory contains all raw data received from that event source.

The subdirectory name is the universally unique identifier (UUID) of the event source (for example, E20D0840-1E0A-102C-9F30-000C2949BA91).

/data/rawdata/online/EventSource UUID/Month

Data in the event source subdirectory is partitioned by month. Each month has its own subdirectory.

The subdirectory name is in the yyyy-mm format. For example, 2009-05 indicates May 2009.

/data/rawdata/online/EventSource UUID/Month/1 Hour Data Files

Each file in the Month directory contains data received during a specific one-hour period. Most data in the file has a time stamp that is within the one-hour period.

The name of the file indicates the day of the month and the one-hour period that is represented.

The filename format is dd-hhmm.extension.

dd is the day of the month.

hh is the hour of the day.

mm is the minute of the hour.

The extension is .open, .log, or .zip (compressed).

For example:

A filename of 08-1300.open indicates that the file contains uncompressed data received on the 8th day of the month between 1:00 p.m. and 2:00 p.m.

A filename of 08-0900.log indicates that the file contains uncompressed data received on the 8th day of the month between 9:00 a.m. and 10:00 a.m. The file is closed, but not yet compressed.

A filename of 08-0000.zip indicates that the file contains compressed data received on the 8th day of the month between 12:00 a.m. and 1:00 a.m.

If the raw data files are stored in the local storage location, the full path name of the file is in the following format:

<Sentinel data directory>/rawdata/online/<event source UUID>/<Date>/<RawDataFile>

For example:

/var/opt/novell/sentinel/data/rawdata/online/A75CF6A0-4948-102D-A615-000C29A9C3DB/2010-05/24-0600.zip

In this example, /var/opt/novell/sentinel/data is the data directory for Sentinel.

If the raw data files are stored in the networked storage location, the full path name would be as follows:

<Sentinel archive directory>/rawdata_archive/<event source UUID>/<Date>/<RawDataFile>

For example:

/sentinel_archive_data/rawdata_archive/A75CF6A0-4948-102D-A615-000C29A9C3DB/2010-05/24-0600.zip

In this example, /sentinel_archive_data is the networked storage directory configured by the user.
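The naming scheme above can be decoded mechanically. The following Python sketch is illustrative only and is not part of Sentinel: the function name parse_raw_data_path, the RawDataFile tuple, and the state labels are assumptions introduced here. It splits a raw data file path into the event source UUID, the start of the one-hour period, and the file state implied by the extension.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple

# Extension-to-state mapping, following the description of .open, .log, and .zip.
# The state labels themselves are hypothetical names chosen for this sketch.
STATES = {".open": "open", ".log": "closed", ".zip": "compressed"}


class RawDataFile(NamedTuple):
    event_source_uuid: str
    hour_start: datetime  # start of the one-hour period; no time zone is attached
    state: str


def parse_raw_data_path(path: str) -> RawDataFile:
    """Split .../<event source UUID>/<yyyy-mm>/<dd-hhmm>.<ext> into its parts."""
    p = PurePosixPath(path)
    if p.suffix not in STATES:
        raise ValueError(f"unexpected extension: {p.suffix!r}")
    month_dir = p.parent.name        # for example "2010-05"
    uuid_dir = p.parent.parent.name  # event source UUID
    hour_start = datetime.strptime(f"{month_dir}-{p.stem}", "%Y-%m-%d-%H%M")
    return RawDataFile(uuid_dir, hour_start, STATES[p.suffix])
```

Because only the last three path components are inspected, the same function should work for files under both the local storage location and the networked storage location.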

Raw Data Representation

Each raw data event is represented as a single line in a raw data file. Each line is a JSON object with the following format (shown here on multiple lines for readability):

{
   "EventDate": "<date>",
   "EventRecordID": "<event record uuid>",
   "RawData": "<raw data>",
   "RawDataHash": "<SHA256 hash of raw data, in hex format>",
   "EventSourceManagerID": "<uuid of event source manager>",
   "CollectorID": "<uuid of collector>",
   "EventSourceGroupID": "<uuid of event source group>",
   "EventSourceID": "<uuid of event source>",
   "ChainID": "<chain ID>",
   "ChainSequence": "<Sequence number>"
}

The following table describes each of the fields in the raw data event:

Table 5-2 Raw Data Representation

Field Name

Description

EventDate

The date and time when Sentinel received this event, not the date and time when the event occurred.

Example: “05/07/2009 05:23.790”

EventRecordID

The unique ID identifying the raw data record.

Example: "595829C0-1C8F-102C-A922-000C2949BA91"

If an event was generated as a result of parsing a raw data record, this ID is set in the event RecordID field. Because of filtering, not all raw data records result in an event.

RawData

The original raw data received by the event source.

RawDataHash

The SHA-256 hash of the RawData value, represented as a hexadecimal string. The hash is calculated by converting the RawData value to a UTF-8 string and then computing the hash over that string.

To detect tampering, each raw data event is stored with a SHA-256 hash value.

Example: cc661009e2f3dc565c0c7fe25b705219004dcd8132c0b0a7e987bfdcb55e49cf

EventSourceID

The UUID of the event source from which the raw data originated.

Example: A2A0C600-1C6C-102C-A781-000C2949BA91

EventSourceGroupID

The UUID of the event source group (Connector) to which the event source was connected when the raw data was received.

Example: A2A0C600-1C6C-102C-A77A-000C2949BA91

Different raw events from the same event source can have different event source group IDs, because event sources can be moved from one Connector to another.

CollectorID

The UUID of the Collector that the Connector and event source were connected to when the raw data was received.

Different raw events from the same event source can have different Collector IDs, because event sources and event source groups can be moved from one Collector to another.

Example: A2A0C600-1C6C-102C-A779-000C2949BA91

EventSourceManagerID

The UUID of the Event Source Manager (Collector Manager) object where this raw data was received.

Example: C76D2820-C395-1029-BB86-001321B5C0B3

ChainID

A random number that identifies a raw data chain. Whenever an event source is stopped and restarted between generation of raw data events, a new ChainID number is generated.

To detect tampering, each raw data event is stored with a ChainID and a ChainSequence number.

Example: 1241630654754

ChainSequence

A sequence number within a particular raw data chain.

The raw data events in a given raw data chain must have an uninterrupted sequence of numbers starting with 0. In addition, all raw data events in a given raw data chain must appear sequentially in the files, with no other chains intermixed. A raw data chain can span files; in that case, the sequence should continue uninterrupted into the file that represents the next hour during which raw data was received.

Example: 4

If no raw data is received for the one-hour period, the file records only from the next arrival of raw data. Nonetheless, the raw data chain sequence should continue uninterrupted until a new raw data chain begins. A new raw data chain is signaled by a changed ChainID value, and a ChainSequence value of zero (0).
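The chaining rules can be checked with a short script. The following Python sketch is an illustration under stated assumptions, not a Sentinel tool: the function name check_chains is hypothetical, and the records are assumed to be already parsed into dictionaries and supplied in file order for a single event source, beginning at the start of a chain.

```python
from typing import Iterable, List, Mapping


def check_chains(records: Iterable[Mapping[str, str]]) -> List[str]:
    """Return a list of problems found; an empty list means the chains look intact."""
    problems: List[str] = []
    seen_chains = set()
    current_chain = None
    expected_seq = 0
    for n, rec in enumerate(records):
        chain, seq = rec["ChainID"], int(rec["ChainSequence"])
        if chain != current_chain:
            # A changed ChainID starts a new chain, which must begin at 0.
            if chain in seen_chains:
                problems.append(f"record {n}: chain {chain} is intermixed with another chain")
            seen_chains.add(chain)
            current_chain, expected_seq = chain, 0
        if seq != expected_seq:
            problems.append(f"record {n}: expected sequence {expected_seq}, found {seq}")
        expected_seq = seq + 1
    return problems
```

A non-empty result indicates only that records appear to be missing, reordered, or intermixed. It does not identify the cause.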

The following examples show three raw data records:


{
   "EventDate":"05\/24\/2010 06:15:06.676",
   "EventRecordID":"A75CF6A0-4948-102D-A61C-000C29A9C3DB",
   "RawData":"Sep 22 10:22:00 testhost Message #100",
   "RawDataHash":"7003c0e0be4ddf43a3b49026a37483f59c7f839950f581ec9fde5dea43da90f5",
   "EventSourceManagerID":"C76D2820-C395-1029-BB86-001321B5C0B3",
   "CollectorID":"A75CF6A0-4948-102D-A613-000C29A9C3DB",
   "EventSourceGroupID":"A75CF6A0-4948-102D-A614-000C29A9C3DB",
   "EventSourceID":"A75CF6A0-4948-102D-A615-000C29A9C3DB",
   "ChainID":"1274696106664",
   "ChainSequence":"0"
}
{
   "EventDate":"05\/24\/2010 06:15:07.358",
   "EventRecordID":"A75CF6A0-4948-102D-A624-000C29A9C3DB",
   "RawData":"Sep 22 10:22:00 testhost Message #99",
   "RawDataHash":"f5681ba965144d2d22b13188767d94540b5fe57904afcee5821854bde2afca72",
   "EventSourceManagerID":"C76D2820-C395-1029-BB86-001321B5C0B3",
   "CollectorID":"A75CF6A0-4948-102D-A613-000C29A9C3DB",
   "EventSourceGroupID":"A75CF6A0-4948-102D-A614-000C29A9C3DB",
   "EventSourceID":"A75CF6A0-4948-102D-A615-000C29A9C3DB",
   "ChainID":"1274696106664",
   "ChainSequence":"1"
}
{
   "EventDate":"05\/24\/2010 06:15:07.988",
   "EventRecordID":"A75CF6A0-4948-102D-A62A-000C29A9C3DB",
   "RawData":"Sep 22 10:22:00 testhost Message #98",
   "RawDataHash":"98435b5dba95633699b88d07782109876e8ceb4169d567602f2c92657118645d",
   "EventSourceManagerID":"C76D2820-C395-1029-BB86-001321B5C0B3",
   "CollectorID":"A75CF6A0-4948-102D-A613-000C29A9C3DB",
   "EventSourceGroupID":"A75CF6A0-4948-102D-A614-000C29A9C3DB",
   "EventSourceID":"A75CF6A0-4948-102D-A615-000C29A9C3DB",
   "ChainID":"1274696106664",
   "ChainSequence":"2"
}
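Records in this format can be read with any JSON parser. The following Python sketch is a minimal illustration, not Sentinel code: the function names parse_record and hash_matches are hypothetical, the EventDate pattern (month/day/year with milliseconds) is inferred from the sample records, and the hash check assumes that the hash covers exactly the RawData string as it appears after JSON decoding.

```python
import hashlib
import json
from datetime import datetime


def parse_record(line: str) -> dict:
    """Parse one line of a raw data file and decode its EventDate field."""
    rec = json.loads(line)  # the JSON escape "\/" decodes to a plain "/"
    # Assumed pattern, inferred from the samples: MM/DD/YYYY HH:MM:SS.mmm
    rec["EventDate"] = datetime.strptime(rec["EventDate"], "%m/%d/%Y %H:%M:%S.%f")
    return rec


def hash_matches(rec: dict) -> bool:
    """Recompute SHA-256 over the UTF-8 bytes of RawData and compare with RawDataHash."""
    digest = hashlib.sha256(rec["RawData"].encode("utf-8")).hexdigest()
    return digest == rec["RawDataHash"].lower()
```

A mismatch reported by hash_matches would suggest that either RawData or RawDataHash changed after the record was written, provided the assumptions above hold for the installation in question.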

Disabling Raw Data Collection

By default, raw data collection is enabled on the Sentinel server. Collecting raw data can affect the performance of the server. Perform the following procedure on each Collector Manager where you want to disable raw data collection:

  1. Open the /etc/opt/novell/sentinel/config/event-router.properties file in a text editor.

    This is the default location of the file.

  2. Change esecurity.router.event.rawdata.send=true to esecurity.router.event.rawdata.send=false.

  3. Save the file, then restart the Collector Manager.
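Where the change must be made on several Collector Managers, the edit in step 2 could be scripted. The following Python sketch is an illustration only: the function name set_rawdata_send is hypothetical, and the sketch assumes that the property appears on its own line in key=value form. It operates on the file contents as a string, so reading the file, writing it back, and restarting the Collector Manager remain separate steps.

```python
import re

# Property name taken from step 2 of the procedure.
PROPERTY = "esecurity.router.event.rawdata.send"


def set_rawdata_send(properties_text: str, enabled: bool) -> str:
    """Return the properties text with the raw data flag set to true or false."""
    value = "true" if enabled else "false"
    # Match the whole property line, keeping the key and any spacing around "=".
    pattern = re.compile(
        rf"^([ \t]*{re.escape(PROPERTY)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, properties_text)
    if count == 0:
        raise ValueError(f"{PROPERTY} not found")
    return new_text
```

The function raises an error rather than appending the property when it is missing, so an unexpected file layout is noticed instead of silently changed.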

5.1.2 Event Data

Event data is created when a Collector parses and normalizes raw data. For more information about event processing and parsing, see Section 6.0, Configuring Data Collection. Raw data is subject to filtering rules set up on the event source, Connector, and Collector, so events can be dropped if necessary. Events can also be dropped by the event routing rules that you define. For more information, see Section 8.0, Configuring Event Routing Rules.

Event data partitions are closed after two days, and no more events are written to them. Although each partition covers only one day, it is kept open for two days to accommodate events that arrive late. After a partition is closed, a compressed copy of the partition is moved to networked storage. When the local storage reaches its maximum usage, the copy in the local storage is deleted and the copy in the networked storage remains.

While the local copy exists, the search engine uses it for better performance instead of using the copy in the networked storage. The system uses the copy in the networked storage only after the local copy has been deleted.

Local storage partitions are stored in the /var/opt/novell/sentinel/data/eventdata directory, which is on the local file system. Partitions are created based on the dates and retention policies.

A central partition index is maintained in the database that keeps track of all the existing partitions and their location.

The following table describes the directory structure under the installation directory where event data is stored:

Table 5-3 Event Data Directory Structure

Directory Structure

Description

/data

The primary directory for all data storage.

/data/eventdata

The subdirectory where all event data is stored.

/data/eventdata/YYYYMMDD_<class_id>

A partition consists of the events for a single day (midnight to midnight UTC) within a given data retention class and is held within a subdirectory named YYYYMMDD_<class_id>.

YYYYMMDD is the UTC date stamp.

<class_id> is a UUID associated with the data retention class.

/data/eventdata/YYYYMMDD_<class_id>/events.evt

events.evt contains the binary event data for the partition. The binary event data is stored as a Reliable Persistent Random Access Compressed Stream.

/data/eventdata/YYYYMMDD_<class_id>/index

The index directory contains the Lucene index for the partition.
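The partition naming convention can be expressed in a few lines. The following Python sketch is illustrative and not part of Sentinel: the function names partition_name and parse_partition_name are hypothetical, and the class ID is handled as a plain string because the letter case used in directory names is not stated here.

```python
import uuid
from datetime import date, datetime, timezone
from typing import Tuple


def partition_name(event_time: datetime, class_id: str) -> str:
    """Directory name of the partition that would hold an event at event_time."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # Partitions run midnight to midnight UTC, so convert before taking the date.
    day = event_time.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{day}_{class_id}"


def parse_partition_name(name: str) -> Tuple[date, str]:
    """Split YYYYMMDD_<class_id> into a UTC date and the retention class ID."""
    day, class_id = name.split("_", 1)
    uuid.UUID(class_id)  # raises ValueError if the class ID is not a valid UUID
    return datetime.strptime(day, "%Y%m%d").date(), class_id
```

Converting to UTC first matters for events near midnight local time, which can fall into the partition for the adjacent UTC day.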