3.1 Data Storage Overview

Sentinel Log Manager receives two separate but similar data streams from the collector managers: event data and raw data. Both types of data are moved on a regular basis from the online, compressed, file-based storage to a user-configured, compressed archive storage location.

Data files are deleted from the local and archive storage locations on a configured schedule. Raw data retention is governed by a single raw data retention policy. Event data retention is governed by a set of event data retention policies. All these policies are configured by the Sentinel Log Manager administrator.

3.1.1 Raw Data

Raw data consists of the unprocessed events that are received by the connector, sent directly to the Sentinel Log Manager message bus, and then written to the Sentinel Log Manager server. The original event is not altered, but the following additional information is also sent to the message bus with each event:

  • SHA-256 hash of the event

  • Chaining indicator (which is reset to 0 whenever the Sentinel Log Manager event source is restarted)

All raw data are sent to the Sentinel Log Manager; there is no filtering on raw data.

Raw data files are time-based: each file is closed (changed to read-only) when its time period has elapsed, and no more events are written to it. After a file is closed, it is compressed and archived to the configured location.

Raw Data Storage

In Sentinel Log Manager, raw data is always stored. Raw data partitions are individual files. A new file is created every hour and is closed within 10 minutes after its hour ends. The file extension identifies the state of the file: a file that is still open has a .open extension, and it is renamed with a .log extension when it is closed. Some time after it is closed, the file is compressed and renamed with a .zip extension. After being compressed, the file is copied to archive storage if archiving is configured and enabled; the local copy is deleted only when local disk space runs low (see Section 3.1.3, Archiving).
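The file lifecycle described above can be modeled as a small state machine keyed by the file extension. The following Python sketch is illustrative only: the names RawFileState, NEXT_STATE, state_of, and next_name are hypothetical and are not part of Sentinel Log Manager, and the sketch assumes that only the extension changes as a file moves from one state to the next.

```python
from enum import Enum
from typing import Optional


class RawFileState(Enum):
    """Hypothetical model of a raw data file's local state, keyed by extension."""

    OPEN = "open"  # data is still being written to the file
    LOG = "log"    # closed (read-only) but not yet compressed
    ZIP = "zip"    # compressed; eligible for archiving


# Each state advances to the next one; .zip is the last local state.
NEXT_STATE = {
    RawFileState.OPEN: RawFileState.LOG,
    RawFileState.LOG: RawFileState.ZIP,
}


def state_of(filename: str) -> RawFileState:
    """Return the state implied by a raw data filename such as '08-1300.open'."""
    extension = filename.rsplit(".", 1)[-1]
    return RawFileState(extension)  # raises ValueError for unknown extensions


def next_name(filename: str) -> Optional[str]:
    """Return the filename after the next transition, or None if already final."""
    nxt = NEXT_STATE.get(state_of(filename))
    if nxt is None:
        return None
    stem = filename.rsplit(".", 1)[0]
    return f"{stem}.{nxt.value}"
```

For example, next_name("08-1300.open") returns "08-1300.log", and next_name("08-0000.zip") returns None because a compressed file has no further local state.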

The following table describes the directory structure of the online raw data under the installation directory:

Table 3-1 Raw Data Directory Structure

Directory structure

Description

/data

The primary directory for all data storage.

/data/rawdata

The subdirectory where all raw data is stored.

/data/rawdata/online

The directory where all the online raw data is stored.

/data/rawdata/online/EventSource UUID

The subdirectory name is the universally unique identifier (UUID) of the event source (for example, E20D0840-1E0A-102C-9F30-000C2949BA91).

There is one subdirectory for each event source under the online subdirectory. That subdirectory contains all raw data received from that event source.

/data/rawdata/online/EventSource UUID/Month

The subdirectory name is in the yyyy-mm format (for example, 2009-05 for May 2009).

Data in the event source subdirectory is partitioned by month. Each month has its own subdirectory.

/data/rawdata/online/EventSource UUID/Month/1 Hour Data Files

Each file in the Month directory contains data received during a specific one-hour period. Most events in the file have a time stamp within that one-hour period.

The name of the file indicates the day of the month and the one-hour period that is represented.

The filename format is dd-hhmm.extension.

Where:

dd is the day of the month.

hh is the hour of the day.

mm is the minute of the hour.

extension is open, log, or zip (compressed).

For example:

The filename 08-1300.open indicates that the file is still open and contains uncompressed data received on the 8th day of the month between 1:00 p.m. and 2:00 p.m.

The filename 08-0900.log indicates that the file contains uncompressed data received on the 8th day of the month between 9:00 a.m. and 10:00 a.m., and that the file is closed but not yet compressed.

The filename 08-0000.zip indicates that the file contains compressed data received on the 8th day of the month between 12:00 a.m. and 1:00 a.m.

The following examples show filenames as they might appear relative to the installation directory:

  • data/rawdata/online/E20D0840-1E0A-102C-9F30-000C2949BA91/2009-05/08-0000.zip: Compressed raw data received on May 8, 2009 between 12:00 a.m. and 1:00 a.m.

  • data/rawdata/online/E20D0840-1E0A-102C-9F30-000C2949BA91/2009-05/08-0100.open: Uncompressed raw data received on May 8, 2009 between 1:00 a.m. and 2:00 a.m.; the file is still open.
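Because the path encodes the event source, month, day, hour, and state, a script can recover these values from a raw data file path. The following Python sketch assumes the layout shown in the examples above (data/rawdata/online/<EventSource UUID>/<yyyy-mm>/<dd-hhmm>.<extension>); the function parse_raw_path and the RawFileInfo class are hypothetical helpers, not part of the product.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumed layout: data/rawdata/online/<EventSource UUID>/<yyyy-mm>/<dd-hhmm>.<ext>
_NAME = re.compile(r"^(\d{2})-(\d{2})(\d{2})\.(open|log|zip)$")
_MONTH = re.compile(r"^(\d{4})-(\d{2})$")


@dataclass(frozen=True)
class RawFileInfo:
    event_source_uuid: str
    year: int
    month: int
    day: int
    hour: int
    minute: int
    extension: str  # "open", "log", or "zip"


def parse_raw_path(path: str) -> RawFileInfo:
    """Split a raw data file path into its event source, period, and state."""
    p = PurePosixPath(path)
    name_match = _NAME.match(p.name)
    month_match = _MONTH.match(p.parent.name)
    if not name_match or not month_match:
        raise ValueError(f"not a raw data file path: {path}")
    day, hour, minute, ext = name_match.groups()
    year, month = month_match.groups()
    return RawFileInfo(
        event_source_uuid=p.parent.parent.name,  # directory above the month
        year=int(year), month=int(month),
        day=int(day), hour=int(hour), minute=int(minute),
        extension=ext,
    )
```

Applied to the second example path, the sketch reports event source E20D0840-1E0A-102C-9F30-000C2949BA91, May 8, 2009, hour 01, and extension open.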

Raw Data Representation

Each raw data event is represented as a single line in a raw data file. Each line is a JSON object that has the following fields:

Table 3-2 Raw Data Representation

Field Name

Description

EventDate

The date and time when Sentinel Log Manager received the event, not the date and time when the event occurred.

Example: "05/07/2009 05:23.790"

EventRecordID

The record ID of the corresponding event record in the event store.

NOTE: If no event record was ever created (for example, because the event was filtered out), this record ID might not point to anything.

Example: "595829C0-1C8F-102C-A922-000C2949BA91"

RawData

The original raw data received by the event source.

RawDataHash

The SHA-256 hash of the RawData value, represented as a hexadecimal string. The hash is calculated over the UTF-8 encoding of the RawData value.

To detect tampering, each raw data event is stored with a SHA-256 hash value.

Example: cc661009e2f3dc565c0c7fe25b705219004dcd8132c0b0a7e987bfdcb55e49cf

EventSourceID

The UUID of the event source the raw data came from.

Example: A2A0C600-1C6C-102C-A781-000C2949BA91

EventSourceGroupID

The UUID of the event source group (Connector) to which the event source was connected when the raw data was received.

Example: A2A0C600-1C6C-102C-A77A-000C2949BA91

NOTE: Different raw events from the same event source can have different event source group IDs, because event sources can be moved from one connector to another.

CollectorID

The UUID of the Collector that the Connector and event source were connected to when the raw data was received.

NOTE: Different raw events from the same event source can have different Collector IDs, because event sources and event source groups can be moved from one collector to another.

Example: A2A0C600-1C6C-102C-A779-000C2949BA91

EventSourceManagerID

The UUID of the Event Source Manager object where this raw data was received.

Example: C76D2820-C395-1029-BB86-001321B5C0B3

ChainID

A random number that identifies a raw data chain. Whenever an event source is stopped and restarted, a new chain ID is generated for the raw data events that follow.

To detect tampering, each raw data event is stored with a Chain ID and a Chain Sequence number.

Example: 1241630654754

ChainSequence

A sequence number within a particular raw data chain.

The raw data events in a given raw data chain must have an uninterrupted sequence of numbers starting with 0. In addition, all raw data events in a given raw data chain must appear sequentially in the files, with no other chains intermixed. If a raw data chain spans multiple files, the sequence continues uninterrupted from one hourly file into the next file in which raw data was received.

Example: 4

NOTE: If no raw data is received during a one-hour period, nothing is recorded for that period, and recording resumes with the next raw data that arrives. Nonetheless, the raw data chain sequence continues uninterrupted until a new raw data chain begins. A new raw data chain is signaled by a changed ChainID value and a ChainSequence value of zero (0).
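The RawDataHash, ChainID, and ChainSequence fields together make tampering detectable. The following Python sketch checks one stream of raw data lines against the rules in Table 3-2: each hash must match the SHA-256 of the UTF-8 encoded RawData, each chain must start at 0 and count up without gaps, and a chain must not reappear after another chain has started. It assumes that the lines of all files for one event source are passed in chronological file order; the function names are hypothetical, and hexadecimal hashes are compared case-insensitively because the case of the stored string is not specified. If older files have already been deleted by a retention policy, the first chain in the stream may legitimately start above 0, which this strict sketch would report.

```python
import hashlib
import json
from typing import Iterable, List


def raw_data_hash(raw_data: str) -> str:
    """SHA-256 of the UTF-8 encoding of RawData, as a lowercase hex string."""
    return hashlib.sha256(raw_data.encode("utf-8")).hexdigest()


def verify_raw_records(lines: Iterable[str]) -> List[str]:
    """Check hashes and chain continuity for raw data lines in file order.

    Returns a list of problems; an empty list means nothing was detected.
    """
    problems: List[str] = []
    seen_chains = set()
    current_chain = None
    expected_seq = 0
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)  # one JSON object per line
        if raw_data_hash(record["RawData"]) != record["RawDataHash"].lower():
            problems.append(f"line {n}: RawDataHash mismatch")
        chain, seq = record["ChainID"], record["ChainSequence"]
        if chain != current_chain:
            # A new chain must not have been seen before and must start at 0.
            if chain in seen_chains:
                problems.append(f"line {n}: chain {chain} resumes after another chain")
            if seq != 0:
                problems.append(f"line {n}: chain {chain} starts at {seq}, not 0")
            seen_chains.add(chain)
            current_chain = chain
        elif seq != expected_seq:
            problems.append(f"line {n}: expected sequence {expected_seq}, got {seq}")
        expected_seq = seq + 1
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets a single pass report every gap, intermixed chain, and hash mismatch in a file.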

3.1.2 Event Data

Event data is processed by the collector running on the collector manager. For more information about event processing and parsing, see Section 4.0, Configuring Data Collection. Event data is subject to the filtering rules set up on the event source, connector, and collector, so some event data may be dropped before it is stored.

Event data partitions are closed after two days, and no more events are written to them. Although each partition covers only one day, it is kept open for two days to accommodate late-arriving events. After the partitions are closed, they are compressed and archived.

Online partitions are stored in the install_directory/data/eventdata directory on the local file system. A separate partition is created for each date and retention policy.

A central partition index is maintained in the database that keeps track of all the existing partitions and their location.

The following table describes the directory structure under the installation directory where event data is stored:

Table 3-3 Event Data Directory Structure

Directory structure

Description

/data

The primary directory for all data storage.

/data/eventdata

The subdirectory where all event data is stored.

/data/eventdata/YYYYMMDD_<class_id>

A partition consists of the events for a single day (midnight to midnight UTC) within a given data retention class and is held in a subdirectory named YYYYMMDD_<class_id>.

Where:

YYYYMMDD is the UTC date stamp.

<class_id> is the UUID associated with the data retention class.

/data/eventdata/YYYYMMDD_<class_id>/events.evt

The events.evt directory contains the binary event data for the partition, stored in the Reliable Persistent Random Access Compressed Stream format.

/data/eventdata/YYYYMMDD_<class_id>/index

The index directory contains the Lucene index for the partition.
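The partition naming rule (UTC date plus retention class ID) can be sketched in Python as follows. The helpers partition_dir and parse_partition_name are hypothetical; the sketch assumes the caller already knows which timestamp decides the partition (this section does not say whether it is the event time or the receipt time) and that class IDs contain no underscore, which holds for UUIDs.

```python
from datetime import date, datetime, timezone
from pathlib import PurePosixPath
from typing import Tuple


def partition_dir(timestamp: datetime, class_id: str,
                  root: str = "data/eventdata") -> PurePosixPath:
    """Return the partition directory for a timestamp, keyed by its UTC date."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Partitions run midnight to midnight UTC, so convert before formatting.
    utc_day = timestamp.astimezone(timezone.utc).strftime("%Y%m%d")
    return PurePosixPath(root) / f"{utc_day}_{class_id}"


def parse_partition_name(name: str) -> Tuple[date, str]:
    """Split 'YYYYMMDD_<class_id>' into its UTC date and retention class ID."""
    date_part, class_id = name.split("_", 1)
    return datetime.strptime(date_part, "%Y%m%d").date(), class_id
```

Converting to UTC first matters near midnight: an event at 11:30 p.m. on May 7 in a UTC-2 time zone belongs to the 20090508 partition.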

3.1.3 Archiving

Archiving is the process of copying closed data files from the local storage location to the archive storage location. The original files are retained on Sentinel Log Manager to allow faster searches; however, if disk space usage on the Sentinel Log Manager server nears a user-defined threshold, the local copies of files that have already been archived are deleted from the server.
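The threshold rule can be illustrated with a short Python sketch. Everything here is an assumption for illustration: prune_archived_duplicates and disk_usage_fraction are hypothetical names, the threshold is expressed as a fraction of total disk space, and a local file counts as a duplicate only when a byte-identical copy exists at the same relative path in the archive. The actual product may select files differently (for example, oldest first until usage drops below the threshold).

```python
import filecmp
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def disk_usage_fraction(path: Path) -> float:
    """Fraction of the file system holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def prune_archived_duplicates(local_root: Path, archive_root: Path,
                              threshold: float,
                              usage: Optional[Callable[[Path], float]] = None
                              ) -> List[Path]:
    """Delete local files that have an identical archived copy, but only when
    local disk usage has reached `threshold` (0.0 to 1.0)."""
    usage = usage or disk_usage_fraction
    if usage(local_root) < threshold:
        return []  # enough space: keep local copies for faster searches
    deleted: List[Path] = []
    for local_file in sorted(p for p in local_root.rglob("*") if p.is_file()):
        archived = archive_root / local_file.relative_to(local_root)
        # Only delete when the archive holds a byte-for-byte identical copy.
        if archived.is_file() and filecmp.cmp(local_file, archived, shallow=False):
            local_file.unlink()
            deleted.append(local_file)
    return deleted
```

Comparing contents rather than names alone avoids deleting a local file whose archive copy is incomplete.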

Archiving processes are applied to both the raw data and event data.

Raw Data Archiving

A raw data file is in one of the following three states at the online location:

xx.open: A file to which data is currently being written.

xx.log: A file to which data is no longer being written. This type of file has not been compressed yet.

xx.zip: A file that has already been compressed. The compression process runs every 10 minutes by default. These files appear in both the online and archive locations if archiving is configured and enabled.

If data archiving is configured and enabled, compressed raw data files are copied to the configured archive location every 15 minutes.
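One pass of this copy job can be sketched in Python as follows. The function archive_compressed_raw_files is hypothetical, the sketch assumes the archive mirrors the online directory layout (this section does not specify the archive layout), and running it every 15 minutes is left to a scheduler. Only .zip files are copied, the online originals are kept, and each file is first copied under a temporary .part name so that an interrupted copy is never mistaken for a finished archive file.

```python
import shutil
from pathlib import Path
from typing import List


def archive_compressed_raw_files(online_root: Path, archive_root: Path) -> List[Path]:
    """Copy .zip raw data files that are not yet archived; keep the online originals."""
    copied: List[Path] = []
    for src in sorted(online_root.rglob("*.zip")):  # .open and .log files are skipped
        dest = archive_root / src.relative_to(online_root)
        if dest.exists():
            continue  # archived on an earlier pass
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Copy under a temporary name, then rename, so a partial copy
        # never appears as a complete archive file.
        partial = dest.with_name(dest.name + ".part")
        shutil.copy2(src, partial)
        partial.replace(dest)
        copied.append(dest)
    return copied
```

Because already-archived files are skipped, running the pass repeatedly is harmless: a second run right after the first copies nothing.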

For more information about raw data storage, see Raw Data Storage.

Event Data Archiving

Event data stored on the Sentinel Log Manager server is archived if data archiving is enabled and configured.

If archiving is enabled, closed files are archived whenever the server starts and every night at midnight UTC. These files are already compressed in the local storage location, but their indexes are compressed before being copied to the archive. If the archive location is not configured, or if any problem occurs while archiving, archiving is retried every 60 seconds until it succeeds.

3.1.4 Data Retention

The data retention policies control when data is deleted from the system. There is one policy for the raw data; there may be multiple policies that apply to the event data.
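As an illustration only, the following Python sketch selects expired event data partitions by name. It assumes, purely for the example, that each retention policy reduces to a retention period in days for one data retention class; the function expired_partitions and the retention_days mapping are hypothetical, and partitions whose class has no known policy are kept.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List


def expired_partitions(names: Iterable[str], retention_days: Dict[str, int],
                       today: date) -> List[str]:
    """Return partition names (YYYYMMDD_<class_id>) older than their class's
    assumed retention period in days."""
    expired: List[str] = []
    for name in names:
        date_part, class_id = name.split("_", 1)
        day = date(int(date_part[:4]), int(date_part[4:6]), int(date_part[6:8]))
        keep_days = retention_days.get(class_id)
        if keep_days is None:
            continue  # no known policy for this class: keep the partition
        if day < today - timedelta(days=keep_days):
            expired.append(name)
    return expired
```

For example, with a 30-day period for class A and 7 days for class B on June 1, 2009, the partitions 20090501_A and 20090520_B are selected, while 20090520_A is kept.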