14.2 Performance Tuning Guidelines

14.2.1 Performance Tuning in CDH

The following table lists the performance tuning recommendations for your CDH setup.

For information about how to set these values, refer to the Cloudera documentation.

Table 14-1 Tuning Guidelines for CDH

The entries below are grouped by CDH component. Each entry lists the scaling factor, the corresponding property, and the recommendation.

Kafka

Data retention hours (log.retention.hours)
Configure the retention period based on the disk space available on the Kafka node, the EPS rate, and the number of days needed to recover the system from a scalable storage outage if one were to occur. Kafka retains the data while the scalable storage system is recovered, preventing data loss. (A back-of-the-envelope sizing sketch follows this table.) For information about the recommended disk storage for various EPS rates, see the Technical Information for Sentinel page.

Data directories (log.dirs)
Configure multiple paths to store Kafka partitions, for example /kafka1, /kafka2, and /kafka3. Each directory should be on its own separate drive; for best performance and manageability, mount each path to a separate physical disk (JBOD).

HDFS

HDFS block size (dfs.block.size)
Increase the block size to 256 MB to reduce the disk seek time among data nodes and to reduce the load on the NameNode.

Replication factor (dfs.replication)
Set the value to 3 so that three copies (1 primary and 2 replicas) of each file are kept on the data nodes. Keeping more than 3 copies requires additional disks on the data nodes and reduces the disk I/O latency; the number of data disks required usually scales with the number of replicas.

DataNode data directory (dfs.data.dir, dfs.datanode.data.dir)
Configure multiple paths for storing HDFS data, which includes the HBase data. For example, /dfs1, /dfs2, and so on. Each directory should be on its own separate drive; for best performance and manageability, mount each path to a separate physical disk (JBOD).

Maximum number of transfer threads (dfs.datanode.max.xcievers, dfs.datanode.max.transfer.threads)
Set it to 8192.

HBase

HBase Client Write Buffer (hbase.client.write.buffer)
Set this value to 8 MB to increase the HBase write performance for a higher EPS load.

HBase RegionServer Handler Count (hbase.regionserver.handler.count)
Set this value to 100. The value should be approximately the number of CPU cores across the region servers. For example, if you have 6 region servers with 16 cores each, set the handler count to 100 (6 x 16 = 96).

HBase table regions (security.events.normalized, security.events.raw)
By default, Sentinel assigns (pre-splits) 16 regions per table. Regardless of this pre-splitting, once a region exceeds the HBase maximum file size, HBase automatically splits it into two, so the total number of regions on the region servers increases over time. Monitor the HBase load distribution. If some regions receive an uneven load, consider splitting those regions manually to distribute the load evenly and improve throughput.

Maximum Size of All Memstores in RegionServer (hbase.regionserver.global.memstore.upperLimit, hbase.regionserver.global.memstore.size)
Set it to 0.7.

HFile Block Cache Size (hfile.block.cache.size)
Set it to 0.1.

HBase Memstore Flush Size (hbase.hregion.memstore.flush.size)
Set it to 256 MB.

HStore Compaction Threshold (hbase.hstore.compactionThreshold)
Set it to 5.

Java Heap Size of HBase RegionServer in Bytes
Set it to 4 GB.

YARN

Container Virtual CPU Cores (yarn.nodemanager.resource.cpu-vcores)
Increase this value to up to 80% of the NodeManager's available vCPUs. For example, if the NodeManager has 16 vCPUs, set this value to up to 14 vCores so that 2 vCores are left for operating system utilization. You can increase or decrease this value based on the vCores available to the NodeManager.

Container Virtual CPU Cores Maximum (yarn.scheduler.maximum-allocation-vcores)
Allocate 1 vCore per Application Master container.

Container Memory (yarn.nodemanager.resource.memory-mb)
Increase this value to up to 80% of the NodeManager's memory. For example, if the NodeManager memory is 24 GB, set this value to up to 20 GB. You can increase or decrease this value based on the available NodeManager memory.

Container Memory Maximum (yarn.scheduler.maximum-allocation-mb)
Spark runs 3 applications: event data, raw data, and the event indexer. Ensure that each Application Master container has sufficient memory to run these applications. For example, if the YARN ResourceManager has 24 GB, allocate a maximum of 8 GB of memory per Application Master container. You can increase or decrease this value based on the available ResourceManager memory.

Spark

Enable Event Log Cleaner (spark.history.fs.cleaner.enabled)
Set it to true.

Event Log Cleaner Interval (spark.history.fs.cleaner.interval)
Set it to 30 minutes.

Maximum Event Log Age (spark.history.fs.cleaner.maxAge)
Set it to 4 hours.

Disk Latency

Whenever the disk latency goes beyond 100 ms on a data node, add more JBOD disks to the data node and configure the component causing the highest I/O load to utilize the directories where the additional disks are mounted.
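As noted in the Kafka data retention entry in Table 14-1, choosing log.retention.hours is ultimately a disk sizing exercise: the data retained grows with the EPS rate, the average stored size of an event, the retention window, and the Kafka replication factor. The following minimal Java sketch only illustrates that arithmetic; the EPS rate, event size, replication factor, and broker count are hypothetical placeholders, and the Technical Information for Sentinel page remains the authoritative sizing reference.

```java
/**
 * Back-of-the-envelope Kafka retention sizing.
 * All input values are hypothetical placeholders; substitute your own measurements.
 */
public class KafkaRetentionSizing {

    public static void main(String[] args) {
        long eps = 10_000;                // hypothetical events per second
        long avgEventSizeBytes = 1_000;   // hypothetical average stored size of one event
        long retentionHours = 7 * 24;     // log.retention.hours (example: a 7-day recovery window)
        long replicationFactor = 2;       // hypothetical Kafka topic replication factor
        long brokerCount = 3;             // hypothetical number of Kafka brokers

        // Total bytes retained across the cluster over the retention window.
        long totalBytes = eps * avgEventSizeBytes * retentionHours * 3600L * replicationFactor;

        // Approximate share per broker, assuming partitions are spread evenly.
        long bytesPerBroker = totalBytes / brokerCount;

        System.out.printf("Approximate retained data per broker: %.1f TB%n", bytesPerBroker / 1e12);
    }
}
```

If the result approaches the disk space available on the Kafka nodes, either add disks or reduce the retention window.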

14.2.2 Performance Tuning in SSDM

SSDM automatically configures the settings of the Sentinel components that are connected to or running within the scalable storage system. These settings are described in the following table:

Table 14-2 Newly Added Properties for Scalable Storage Configuration

rawdata.fullconnectordump (HDFS)
Default value: false
To reduce storage space and for better performance, SSDM stores only the raw data and does not store the full Connector dump for an event. If this property is set to true, SSDM stores the full Connector dump, which is 3 - 5 times larger than the raw data dump.

batch.duration (YARN)
Default value: 10 (seconds)
Indicates the batch interval for Spark Streaming. Events are continuously processed in batches at the specified interval. (A short illustration follows this table.)

kafka.metrics.print (Kafka)
Default value: false
If enabled, SSDM periodically prints the Kafka client metrics in the Sentinel logs.
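The batch.duration property in Table 14-2 is the batch interval of the Spark Streaming jobs that process incoming events. Purely as an illustration of what that interval means, the following minimal Java sketch creates a streaming context with a 10-second batch interval; it is not SSDM's internal code, and the local socket source is a hypothetical stand-in for the real event stream.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BatchIntervalSketch {

    public static void main(String[] args) throws InterruptedException {
        // Local mode is used only so the sketch is self-contained.
        SparkConf conf = new SparkConf()
                .setAppName("batch-interval-sketch")
                .setMaster("local[2]");

        // The second argument is the batch interval: incoming data is grouped into
        // micro-batches and processed once per interval (here, every 10 seconds),
        // which is what batch.duration controls for the streaming jobs.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Hypothetical input source, used only to make the sketch runnable:
        // read lines from a local socket and print each 10-second batch.
        jssc.socketTextStream("localhost", 9999).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```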

To avoid data loss and to optimize performance, SSDM automatically configures certain scalable storage components’ settings described in the following table. For more information about these advanced properties and the guidelines to consider when configuring these properties, see the documentation for the specific component.

Table 14-3 Customized Properties of Scalable Storage Components

Each entry below lists the property, the customized value, and the reason for the customization, grouped by component.

HDFS/HBase

hbase.client.retries.number: 3 (to prevent data loss)
hbase.zookeeper.recoverable.waittime: 5000 (to prevent data loss)
hbase.client.pause: 3 (to prevent data loss)

Kafka (an illustrative producer configuration written out with these values follows this table)

compression.type: lz4 (for better performance and reduced storage size)
retries: 2147483647 (to prevent data loss)
request.timeout.ms: 2147483647 (to prevent data loss)
max.block.ms: 2147483647 (to prevent data loss)
acks: all (to prevent data loss)
reconnect.backoff.ms: 10000 (to improve performance)
max.in.flight.requests.per.connection: 5 (to improve performance)
linger.ms: 100 (to improve performance)

ZooKeeper

zookeeper.session.timeout: 5000 (to prevent data loss)
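The Kafka entries in Table 14-3 are standard Kafka producer settings. For reference only, the following minimal Java sketch shows a producer configured by hand with the same values; it is not SSDM's internal producer code, and the bootstrap servers and topic name are hypothetical.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092,kafka2:9092");      // hypothetical brokers
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Durability-related values from Table 14-3 (to prevent data loss).
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));            // 2147483647
        props.put("request.timeout.ms", Integer.toString(Integer.MAX_VALUE)); // 2147483647
        props.put("max.block.ms", Integer.toString(Integer.MAX_VALUE));       // 2147483647

        // Performance-related values from Table 14-3.
        props.put("compression.type", "lz4");
        props.put("linger.ms", "100");
        props.put("max.in.flight.requests.per.connection", "5");
        props.put("reconnect.backoff.ms", "10000");

        // "events" is a hypothetical topic name used only for this sketch.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"));
        }
    }
}
```

With acks set to all and effectively unlimited retries, such a producer favors durability over latency, which matches the "to prevent data loss" rationale in the table.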

To view or configure the advanced properties, on the SSDM home page, click Storage > Scalable Storage > Advanced Properties.

IMPORTANT: You can add new properties or modify the existing properties as required. Specify these settings at your own discretion and validate them, because the changes are applied as is.