To process Sentinel data in CDH, you must run the manage_spark_jobs.sh script to submit the following Spark jobs on YARN:
To stream event data and raw data to HBase tables
To start indexing into Elasticsearch
You must run the manage_spark_jobs.sh script in the following scenarios:
When you modify the IP address or port number of Elasticsearch or any of the CDH components.
When you restart the CDH or YARN cluster.
When you reboot the CDH or YARN machine.
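Because the Spark jobs must be resubmitted after every restart, you can optionally automate the resubmission. The following is a minimal sketch using a cron @reboot entry on the <hdfs_node>; the file path, directory, run-as user, and delay are all assumptions to adapt to your environment, and CDH and YARN services must be fully up before the script runs:

```
# /etc/cron.d/sentinel-spark-jobs (hypothetical file): resubmit the Spark
# jobs at boot. The 300-second sleep is a crude placeholder for "wait until
# the CDH/YARN services are up"; tune or replace it for your environment.
@reboot hdfs sleep 300 && cd /opt/scalablestore && ./manage_spark_jobs.sh start
```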
To submit Spark jobs on YARN:
Log in to the SSDM server as the novell user and copy the following files to the Spark history server (the machine where the HDFS NameNode is installed):

cd /etc/opt/novell/sentinel/scalablestore
scp SparkApp-*.jar avroevent-*.avsc avrorawdata-*.avsc spark.properties log4j.properties manage_spark_jobs.sh root@<hdfs_node>:<destination_directory>
where <destination_directory> is the directory where you want to place the copied files. Ensure that the hdfs user has full permissions on this directory.
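A minimal sketch of preparing such a directory (the path below is a stand-in for the <destination_directory> you choose):

```shell
# Stand-in path; substitute your own <destination_directory>.
DEST=/tmp/scalablestore
mkdir -p "$DEST"
# On the real <hdfs_node>, run as root to give the hdfs user full
# permissions on the directory:
#   chown hdfs "$DEST" && chmod u+rwx "$DEST"
ls -ld "$DEST"
```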
Log in to the <hdfs_node> server as the root user and change the ownership of the copied files to the hdfs user:
cd <destination_directory>
chown hdfs SparkApp-*.jar avroevent-*.avsc avrorawdata-*.avsc spark.properties log4j.properties manage_spark_jobs.sh
Assign executable permission to the manage_spark_jobs.sh script:
chmod +x manage_spark_jobs.sh
Run the following script to submit the Spark jobs:
./manage_spark_jobs.sh start
This command might take a while to complete the submission process.
(Optional) Run the following command to verify the status of the submitted Spark jobs:
./manage_spark_jobs.sh status