3.1 Data Storage Overview
Sentinel Log Manager receives two separate but similar data streams from the collector managers: event data and raw data. On a regular basis, both types of data are moved from the online, compressed, file-based storage to a user-configured, compressed archive storage location.
Data files are deleted from the local and archive storage locations on a configured schedule. Raw data retention is governed by a single raw data retention policy. Event data retention is governed by a set of event data retention policies. All these policies are configured by the Sentinel Log Manager administrator.
3.1.1 Raw Data
Raw data consists of the unprocessed events that are received by the connector, sent directly to the Sentinel Log Manager message bus, and then written to the Sentinel Log Manager server. The original event is not altered, but the following additional information is also sent to the message bus with each event:
All raw data is sent to Sentinel Log Manager; raw data is not filtered.
The time-based raw data files are closed (changed to read-only) after a duration and no more events are written to them. After these files are closed, they are compressed and archived to the configured location.
Raw Data Storage
In Sentinel Log Manager, raw data is always stored. Raw data partitions are individual files. A new file is created every hour and is closed within 10 minutes after that hour has elapsed. The file extension identifies the state of the file. An open file has a .open extension. When the file is closed, it is renamed with a .log extension. Some time after it is closed, the file is compressed and then has a .zip extension. After compression, files are moved to archive storage and are no longer present in local storage.
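The lifecycle described above can be sketched in a few lines of Python. This is only an illustration, not product code: the function names are hypothetical, and the close deadline assumes that "within 10 minutes" is measured from the end of the one-hour period.

```python
from datetime import datetime, timedelta

# Extension-to-state mapping taken from the paragraph above.
STATE_BY_EXTENSION = {
    "open": "open",        # still being written
    "log": "closed",       # closed, not yet compressed
    "zip": "compressed",   # compressed, ready to be moved to the archive
}


def raw_file_state(filename: str) -> str:
    """Return the lifecycle state implied by a raw data file's extension."""
    _, _, extension = filename.rpartition(".")
    try:
        return STATE_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"unrecognized raw data file extension: {filename!r}")


def latest_close_time(hour_start: datetime) -> datetime:
    """Latest expected close time for the file covering the hour at hour_start.

    Assumption: the file is closed no later than 10 minutes after the
    one-hour period ends.
    """
    return hour_start + timedelta(hours=1, minutes=10)
```

For example, raw_file_state("08-0900.log") returns "closed", and the file for the hour starting at 13:00 would be expected to close by 14:10 under the stated assumption.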
The following table describes the directory structure of the online raw data under the installation directory:
Table 3-1 Raw Data Directory Structure
/data
|
The primary directory for all data storage. |
/data/rawdata
|
The subdirectory where all raw data is stored. |
/data/rawdata/online
|
The directory where all the online raw data is stored. |
/data/rawdata/online/EventSource UUID
|
The subdirectory name is the universally unique identifier (UUID) of the event source (for example, E20D0840-1E0A-102C-9F30-000C2949BA91).
There is one subdirectory for each event source under the online subdirectory. That subdirectory contains all raw data received from that event source. |
/data/rawdata/online/EventSource UUID/Month
|
The subdirectory name is in the yyyy-mm format (for example, 2009-05 is May 2009).
Data in the event source subdirectory is partitioned by month. Each month has its own subdirectory. |
/data/rawdata/online/EventSource UUID/Month/1 Hour Data Files
|
Each file in the Month directory contains data received during a specific one-hour period. Most data in the file has a time stamp that is within the one-hour period.
The name of the file indicates the day of the month and the one-hour period that is represented.
The filename format is dd-hhmm.extension.
Where:
dd is the day of the month.
hh is the hour of the day.
mm is the minute of the hour.
extension is open, log, or zip (compressed).
For example:
The name 08-1300.open indicates that the file contains uncompressed data received on the 8th day of the month between 01:00 p.m. and 02:00 p.m., and the file is still open.
The name 08-0900.log indicates that the file contains uncompressed data received on the 8th day of the month between 09:00 a.m. and 10:00 a.m., and the file is closed, but not yet compressed.
The name 08-0000.zip indicates that the file contains compressed data received on the 8th day of the month between 12:00 a.m. and 01:00 a.m. |
The following examples show filenames as they might appear relative to the installation directory:
- data/rawdata/online/E20D0840-1E0A-102C-9F30-000C2949BA91/2009-05/08-0000.zip: Compressed raw data received on May 8, 2009 between 12:00 a.m. and 01:00 a.m.
- data/rawdata/online/E20D0840-1E0A-102C-9F30-000C2949BA91/2009-05/08-0100.open: Uncompressed raw data received on May 8, 2009 between 01:00 a.m. and 02:00 a.m.
|
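As an illustration of the directory and filename conventions in Table 3-1, the following Python sketch splits a raw data file path into its parts. The names RawFileInfo and parse_raw_path are hypothetical. The time zone used in the filenames is not stated in this section, so the sketch returns naive datetime values.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple


class RawFileInfo(NamedTuple):
    event_source_id: str  # UUID of the event source
    start: datetime       # beginning of the one-hour period
    end: datetime         # end of the one-hour period
    extension: str        # "open", "log", or "zip"


# data/rawdata/online/<EventSource UUID>/<yyyy-mm>/<dd-hhmm>.<extension>
_PATH_RE = re.compile(
    r"data/rawdata/online/"
    r"(?P<uuid>[0-9A-Fa-f-]{36})/"
    r"(?P<year>\d{4})-(?P<month>\d{2})/"
    r"(?P<day>\d{2})-(?P<hour>\d{2})(?P<minute>\d{2})\."
    r"(?P<ext>open|log|zip)$"
)


def parse_raw_path(path: str) -> RawFileInfo:
    """Parse an online raw data file path into event source, period, and extension."""
    match = _PATH_RE.search(path)
    if match is None:
        raise ValueError(f"not a raw data file path: {path!r}")
    start = datetime(
        int(match["year"]), int(match["month"]), int(match["day"]),
        int(match["hour"]), int(match["minute"]),
    )
    # Each file represents a one-hour period.
    return RawFileInfo(match["uuid"], start, start + timedelta(hours=1), match["ext"])
```

Applied to the first example path above, the sketch yields a period from 2009-05-08 00:00 to 2009-05-08 01:00 and the extension zip.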
Raw Data Representation
Each raw data event is represented as a single line in a raw data file. Each line is a JSON object that has the following fields:
Table 3-2 Raw Data Representation
EventDate |
The date and time when Sentinel Log Manager received the event, not the date and time when the event occurred.
Example: “05/07/2009 05:23.790” |
EventRecordID |
The record ID of the corresponding event record in the event store.
NOTE: If no event record was created (because of filtering), this record ID might not point to anything.
Example: "595829C0-1C8F-102C-A922-000C2949BA91" |
RawData |
The original raw data received from the event source. |
RawDataHash |
The SHA-256 hash of the RawData value, represented as a hexadecimal string. The hash is calculated by converting the RawData value to a UTF-8 string and then hashing that string.
Each raw data event is stored with this hash value so that tampering can be detected.
Example: cc661009e2f3dc565c0c7fe25b705219004dcd8132c0b0a7e987bfdcb55e49cf |
EventSourceID |
The UUID of the event source the raw data came from.
Example: A2A0C600-1C6C-102C-A781-000C2949BA91 |
EventSourceGroupID |
The UUID of the event source group (Connector) to which the event source was connected when the raw data was received.
Example: A2A0C600-1C6C-102C-A77A-000C2949BA91
NOTE: Different raw events from the same event source can have different event source group IDs, because event sources can be moved from one connector to another.
|
CollectorID |
The UUID of the Collector that the Connector and event source were connected to when the raw data was received.
NOTE: Different raw events from the same event source can have different Collector IDs, because event sources and event source groups can be moved from one collector to another.
Example: A2A0C600-1C6C-102C-A779-000C2949BA91 |
EventSourceManagerID |
The UUID of the Event Source Manager object where this raw data was received.
Example: C76D2820-C395-1029-BB86-001321B5C0B3 |
ChainID |
A random number that identifies a raw data chain. Whenever an event source is stopped and restarted between generation of raw data events, a new chain ID number is generated.
To detect tampering, each raw data event is stored with a Chain ID and a Chain Sequence number.
Example: 1241630654754 |
ChainSequence |
A sequence number within a particular raw data chain.
The raw data events in a given raw data chain must have an uninterrupted sequence of numbers starting with 0. In addition, all raw data events in a given raw data chain must appear sequentially in the files, with no other chains intermixed. A raw data chain can span files; in that case, the sequence continues uninterrupted into the file for the next hour during which raw data was received.
Example: 4
NOTE: If no raw data is received during a one-hour period, recording resumes with the next raw data that arrives. Nonetheless, the raw data chain sequence continues uninterrupted across files until a new raw data chain begins. A new raw data chain is signaled by a changed ChainID value and a ChainSequence value of zero (0).
|
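The two tamper-detection mechanisms in Table 3-2 (the RawDataHash field and the ChainID and ChainSequence fields) can be checked with a short Python sketch such as the one below. It is a minimal illustration under stated assumptions, not the product's verification tool: the function names are hypothetical, the JSON types of the fields are assumed (ChainID and ChainSequence are accepted as either numbers or strings), and the hash comparison is assumed to be case-insensitive. The lines must be supplied in file order for a single event source, starting at the beginning of a chain, because a chain can span files.

```python
import hashlib
import json
from typing import Iterable, List


def raw_data_hash(raw_data: str) -> str:
    """SHA-256 of the UTF-8 encoding of raw_data, as a lowercase hex string."""
    return hashlib.sha256(raw_data.encode("utf-8")).hexdigest()


def check_raw_records(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    finished_chains = set()   # chains that have already been followed by another chain
    current_chain = None
    expected_sequence = 0
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)

        # Hash check: recompute the hash and compare it with the stored value.
        if raw_data_hash(record["RawData"]) != str(record["RawDataHash"]).lower():
            problems.append(f"line {number}: RawDataHash mismatch")

        chain_id = str(record["ChainID"])
        sequence = int(record["ChainSequence"])
        if chain_id != current_chain:
            # A chain that reappears after another chain has been intermixed.
            if chain_id in finished_chains:
                problems.append(f"line {number}: chain {chain_id} is intermixed with another chain")
            if current_chain is not None:
                finished_chains.add(current_chain)
            current_chain = chain_id
            expected_sequence = 0   # a new chain starts at zero

        # Sequence check: numbers must be consecutive within a chain.
        if sequence != expected_sequence:
            problems.append(
                f"line {number}: expected ChainSequence {expected_sequence}, found {sequence}"
            )
        expected_sequence = sequence + 1
    return problems
```

A gap in the sequence numbers suggests that a record was removed, and a hash mismatch suggests that the RawData value was modified after it was written.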
3.1.2 Event Data
Event data is processed by the collector running on the collector manager. For more information about event processing and parsing, see Section 4.0, Configuring Data Collection. Event data is subject to filtering rules set up on the event source, connector, and collector, so some event data might be dropped.
Event data partitions are closed after two days, and no more events are written to them. Although each partition covers only one day, it is closed after two days to accommodate late-arriving events. After the partitions are closed, they are compressed and archived.
Online partitions are stored in the install_directory/data/eventdata directory, which is on the local file system. Partitions are created based on the dates and retention policies.
A central partition index is maintained in the database that keeps track of all the existing partitions and their location.
The following table describes the directory structure under the installation directory where event data is stored:
Table 3-3 Event Data Directory Structure
/data
|
The primary directory for all data storage. |
/data/eventdata
|
The subdirectory where all event data is stored. |
/data/eventdata/YYYYMMDD_<class_id>
|
A partition consists of the events for a single day (midnight to midnight UTC) within a given data retention class and is held within a subdirectory named YYYYMMDD_<class_id>.
Where,
|
/data/eventdata/YYYYMMDD_<class_id>/events.evt
|
The events.evt directory contains the binary event data for the partition. The binary event data is stored as a Reliable Persistent Random Access Compressed Stream. |
/data/eventdata/YYYYMMDD_<class_id>/index
|
The index directory contains the Lucene index for the partition. |
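The partition naming rule in Table 3-3 can be illustrated with the following Python sketch, which maps an event time to a partition directory name and back. The function names are hypothetical, and the class ID is treated as an opaque string because its format is not described in this section.

```python
from datetime import date, datetime, timezone
from typing import Tuple


def partition_dir_name(event_time: datetime, class_id: str) -> str:
    """Name of the partition directory that holds an event with this time stamp.

    Partitions run from midnight to midnight UTC, so the time is converted
    to UTC before the date is taken.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    utc_day = event_time.astimezone(timezone.utc).date()
    return f"{utc_day:%Y%m%d}_{class_id}"


def parse_partition_dir(name: str) -> Tuple[date, str]:
    """Split a YYYYMMDD_<class_id> directory name into its date and class ID."""
    day_text, separator, class_id = name.partition("_")
    if not separator or not class_id:
        raise ValueError(f"not a partition directory name: {name!r}")
    return datetime.strptime(day_text, "%Y%m%d").date(), class_id
```

Note that an event received late in the evening in a time zone west of UTC belongs to the partition for the following UTC day.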
3.1.3 Archiving
Archiving is the process of copying closed data files from the local storage location to the archive storage location. The original files are retained on Sentinel Log Manager to facilitate faster searches; however, if the Sentinel Log Manager server disk space usage nears a user-defined threshold, duplicate data files are deleted from the Sentinel Log Manager server.
Archiving processes are applied to both the raw data and event data.
Raw Data Archiving
A raw data file at the online location is in one of the following three states: open (.open extension), closed (.log extension), or compressed (.zip extension).
If data archiving is configured and enabled, compressed raw data files are copied every 15 minutes to the configured archive location.
For more information about raw data storage, see Raw Data Storage.
Event Data Archiving
The event data stored on the Sentinel Log Manager server is archived if data archiving is configured and enabled.
If archiving is enabled, the closed files are archived whenever the server starts. They are also archived at midnight UTC every night. These files are already compressed in the local storage location, but the indexes for these files are compressed before being moved to the archive. If the archive location is not configured or if there is any problem while archiving, attempts are made every 60 seconds until archiving succeeds.
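The retry behavior described above (one attempt every 60 seconds until archiving succeeds) follows a simple fixed-interval retry pattern. The Python sketch below shows that pattern only; it is not the product's implementation. The archive_once callable is a hypothetical stand-in for whatever performs a single archiving attempt, and the sleep function is injectable so that the loop can be exercised without waiting.

```python
import time
from typing import Callable

RETRY_INTERVAL_SECONDS = 60  # interval stated in the text above


def archive_until_success(
    archive_once: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    interval: float = RETRY_INTERVAL_SECONDS,
) -> int:
    """Call archive_once until it succeeds; return the number of attempts made.

    archive_once (hypothetical) returns True on success. A False return
    value or an OSError is treated as a failed attempt.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if archive_once():
                return attempts
        except OSError:
            pass  # for example, the archive location is unreachable
        sleep(interval)  # wait before the next attempt
```

A fixed interval keeps the behavior predictable: once the archive location becomes available, archiving resumes within at most one interval.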
3.1.4 Data Retention
The data retention policies control when data is deleted from the system. There is one policy for the raw data; there may be multiple policies that apply to the event data.