13.1 Installing and Configuring CDH

This section describes the Sentinel-specific settings required when you install and configure CDH. For detailed CDH installation and configuration instructions, refer to the Cloudera documentation for the certified version of CDH.

Sentinel works with Cloudera Express, the free edition of CDH. Sentinel also works with Cloudera Enterprise, which requires the purchase of a license from Cloudera and includes numerous capabilities not available in the Cloudera Express edition. If you choose to begin with Cloudera Express and later discover you need the capabilities available with Cloudera Enterprise, you can upgrade the cluster after purchasing the license from Cloudera.

13.1.1 Prerequisites

Before you install CDH, set up the hosts to meet the following prerequisites:

  • Complete the prerequisites mentioned in the Cloudera documentation.

  • Use the ext4 or XFS file system for better performance.

  • CDH requires a few operating system packages that are not installed by default, so you must mount the corresponding operating system DVD. The Cloudera installation instructions list the packages to install.

  • On SLES operating systems, CDH requires the python-psycopg2 package, so you must install it. For more information, see the openSUSE documentation.

  • If you are using virtual machines, reserve the required disk space on the file system when you create the virtual machine nodes. For example, in VMware, you can use thick provisioning.

  • Ensure that the Sentinel and CDH cluster nodes are in the same time zone.

  • Set swappiness of all the hosts to 1 in the /etc/sysctl.conf file by adding the following entry:

    vm.swappiness=1

    To apply this setting immediately, run the following command:

    sysctl -p

  • The JDK version in CDH must be the same as or later than the JDK version used in Sentinel. If the JDK version available in CDH is earlier than the Sentinel JDK version, follow the instructions to install the JDK manually instead of installing the JDK available in the CDH repository.

    Install the JDK from the archive binary file (.tar.gz), because installing the JDK from the RPM causes issues when you use the manage_spark_jobs.sh script to submit Spark jobs on YARN.

    To determine the JDK version used in Sentinel, see the Sentinel Release Notes.
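
Two of the prerequisites above, the swappiness entry and the JDK version rule, can be checked with a short script. The following is a minimal sketch, not a Sentinel or Cloudera tool: the function names configured_swappiness and version_at_least are hypothetical, and the version comparison assumes GNU sort with the -V (version sort) option is available on the host.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical and are not part of
# Sentinel or CDH.

# Print the last vm.swappiness value set in a sysctl-style file.
# Later entries override earlier ones, so only the last match is printed.
configured_swappiness() {
    sed -n 's/^[[:space:]]*vm\.swappiness[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Succeed if version $1 is the same as or later than version $2.
# Assumption: GNU sort -V is available for version-aware ordering.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, configured_swappiness /etc/sysctl.conf should print 1 after the entry is added. To apply the JDK rule, pass the JDK version available in CDH as the first argument and the Sentinel JDK version (from the Sentinel Release Notes) as the second. If version_at_least fails, install the JDK manually from the archive file.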

13.1.2 Installing and Configuring CDH

Install the certified version of CDH. For information about the certified version of CDH, see the Technical Information for Sentinel page. For installation instructions, refer to the Cloudera documentation for the certified version.

While you install CDH, do the following:

  • (Conditional) If the installation fails while installing the embedded PostgreSQL database, run the following commands:

    mkdir -p /var/run/postgresql

    sudo chown cloudera-scm:cloudera-scm /var/run/postgresql

  • When choosing the software installation type in the Select Repository window, ensure that Use Parcels is selected and select Kafka in Additional Parcels.

  • When you add services, ensure that you enable the following services:

    • Cloudera Manager

    • ZooKeeper

    • HDFS

    • HBase

    • YARN

    • Spark

    • Kafka

    NOTE: The Spark history server and HDFS NameNode must be installed on the same node for system reliability. For information about the scalable storage architecture, see Planning for Scalable Storage.

    When enabling the above services, configure high availability for the following:

    • HBase HMaster

    • HDFS NameNode

    • YARN ResourceManager

  • (Conditional) If the installer does not deploy the client configuration because the Java path is missing, open a new browser session and manually update the Java path as follows:

    Click Hosts > All Hosts > Configuration and specify the correct path in the Java Home Directory field.
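
To find a candidate value for the Java Home Directory field, you can resolve the java executable on a host back to its installation directory. The following is a minimal sketch under stated assumptions: java_home_for is a hypothetical helper, not a Cloudera or Sentinel command, and it assumes GNU readlink with the -f option.

```shell
#!/bin/sh
# Sketch only: java_home_for is a hypothetical helper and is not part of
# Cloudera Manager or Sentinel.

# Print the installation directory for a given java executable path.
# Resolves symbolic links first (assumes GNU readlink -f), then strips the
# trailing /bin/java from the resolved path.
java_home_for() {
    real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$real")"
}
```

For example, java_home_for "$(command -v java)" prints the directory two levels above the resolved java binary. On layouts where java resolves to a jre/bin/java path, the result is the JRE directory rather than the JDK directory, so verify the printed path before you enter it.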