14.8 Re-Indexing Data from HBase

You can rebuild the Elasticsearch index from the data stored in HBase. Re-indexing the data is useful in the following scenarios:

  • The index has been corrupted.

  • The index has been deleted.

  • The event fields to be indexed by Elasticsearch have been modified.

To re-index data from HBase:

  1. Log in to the Spark history server and change to the directory where you stored the Spark job files when you submitted the Spark jobs.

  2. Delete the existing indexes so that any updates to the event fields that need to be indexed are picked up when the index is rebuilt:

    curl -XDELETE 'http://<elasticsearch_ipaddress>:<elasticsearch_port>/security.events.normalized_*/'
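    For example, with a hypothetical Elasticsearch node at 10.0.0.5 on port 9200 (substitute the values for your deployment), the delete request can be assembled as follows:

    ```shell
    # Hypothetical host and port -- replace with your Elasticsearch node's values.
    ES_HOST="10.0.0.5"
    ES_PORT="9200"

    # The wildcard pattern matches every normalized-events index.
    DELETE_URL="http://${ES_HOST}:${ES_PORT}/security.events.normalized_*/"

    # Run the actual request only when you intend to drop the indexes:
    #   curl -XDELETE "$DELETE_URL"
    echo "$DELETE_URL"
    ```

    Deleting by wildcard removes all matching indexes in one request, which is what you want here because the index is rebuilt in full from HBase in the next step.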

  3. Run the following command:

    sudo -u hdfs spark-submit --master yarn --files spark.properties,log4j.properties --deploy-mode client --class com.novell.sentinel.spark.scanner.HbaseScanner SparkApp-<latest_file>.jar -confFile=spark.properties -table=<hbase_table_name> -from=<yyddmmhhmmss> -to=<yyddmmhhmmss> -interval=<seconds> -outputhandler=com.novell.sentinel.spark.scanner.EventIndexRDDHandler

    where:

    hbase_table_name is the name of the HBase table from which data is read for indexing, in the format <tenant_ID>:security.events.normalized.

    from is the timestamp of the HBase table row from which re-indexing should start.

    to is the timestamp of the HBase table row at which re-indexing should stop.

    interval is the batch interval in seconds. The recommended interval is 60 seconds.
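    Putting the parameters together, a small wrapper script can assemble and review the full command before running it. The JAR name, tenant ID, and time window below are hypothetical values; adjust them to your deployment:

    ```shell
    # Hypothetical values -- adjust before running.
    JAR="SparkApp-1.0.0.jar"                    # latest SparkApp JAR in this directory
    TABLE="tenant1:security.events.normalized"  # <tenant_ID>:security.events.normalized
    FROM="240101000000"                         # start of the re-index window
    TO="240102000000"                           # end of the re-index window
    INTERVAL=60                                 # recommended batch interval (seconds)

    # Assemble the spark-submit invocation (backslash-newlines are joined).
    CMD="sudo -u hdfs spark-submit --master yarn \
    --files spark.properties,log4j.properties --deploy-mode client \
    --class com.novell.sentinel.spark.scanner.HbaseScanner $JAR \
    -confFile=spark.properties -table=$TABLE -from=$FROM -to=$TO \
    -interval=$INTERVAL \
    -outputhandler=com.novell.sentinel.spark.scanner.EventIndexRDDHandler"

    # Review the assembled command, then execute it with: eval "$CMD"
    echo "$CMD"
    ```

    Echoing the command first lets you verify the table name and time window before starting the job, since a wrong window silently re-indexes the wrong rows.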