7.4 Submitting Spark Applications on YARN

To process Sentinel data in CDH, you must run the manage_spark_jobs.sh script to submit the following Spark jobs on YARN:

  • To stream event data and raw data to HBase tables

  • To start indexing into Elasticsearch

You must run the manage_spark_jobs.sh script again in the following scenarios:

  • When you modify the IP address or port number of Elasticsearch or any of the CDH components.

  • When you restart the CDH or YARN cluster.

  • When you reboot the CDH or YARN machine.

To submit Spark jobs on YARN:

  1. Log in to the SSDM server as the novell user and copy the required files to the Spark history server host, where the HDFS NameNode is installed:

    cd /etc/opt/novell/sentinel/scalablestore

    scp SparkApp-*.jar avroevent-*.avsc avrorawdata-*.avsc spark.properties log4j.properties manage_spark_jobs.sh root@<hdfs_node>:<destination_directory>

    where <destination_directory> is any directory in which you want to place the copied files. Ensure that the hdfs user has full permissions on this directory.
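One way to prepare such a destination directory is sketched below. The path is a placeholder for illustration only; substitute your own <destination_directory>, and note that making the hdfs user the owner requires root privileges and an existing hdfs account on the host.

```shell
# Hypothetical destination directory; replace with your own path.
DEST=/tmp/sentinel-spark

# Create the directory and grant the owner full permissions.
mkdir -p "$DEST"
chmod u+rwx "$DEST"

# Make the hdfs user the owner so it has full access (assumes root and an
# existing hdfs user; '|| true' keeps this sketch from aborting on hosts
# where that user does not exist).
chown hdfs "$DEST" 2>/dev/null || true
```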

  2. Log in to the <hdfs_node> server as the root user and change the ownership of the copied files to the hdfs user:

    cd <destination_directory>

    chown hdfs SparkApp-*.jar avroevent-*.avsc avrorawdata-*.avsc spark.properties log4j.properties manage_spark_jobs.sh

    Assign executable permission to the manage_spark_jobs.sh script:

    chmod +x manage_spark_jobs.sh

  3. Run the following script to submit the Spark jobs:

    ./manage_spark_jobs.sh start

    This command takes a while to complete the submission process.

  4. (Optional) Run the following command to verify the status of the submitted Spark jobs:

    ./manage_spark_jobs.sh status
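Independently of the script's status output, you can cross-check the submitted jobs with the YARN CLI. This is a hedged sketch: it assumes the yarn client is on the PATH of the <hdfs_node> host, and it degrades gracefully where the client is not installed.

```shell
# List YARN applications in the RUNNING state; the Sentinel Spark jobs
# submitted by manage_spark_jobs.sh should appear in this list.
if command -v yarn >/dev/null 2>&1; then
  STATUS_OUT=$(yarn application -list -appStates RUNNING 2>/dev/null)
else
  # Fallback for hosts without the yarn client on the PATH.
  STATUS_OUT="yarn CLI not available on this host"
fi
printf '%s\n' "$STATUS_OUT"
```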