14.2 Performance Tuning Guidelines

14.2.1 Performance Tuning in CDH

The following table lists the performance tuning recommendations for your CDH setup.

For information about how to set these values, refer to the Cloudera documentation.

Table 14-1 Tuning Guidelines for CDH

The entries below are grouped by CDH component. Each entry lists the scaling factor, the corresponding property, and the recommendation.

Kafka

Data retention hours (log.retention.hours)
Configure the retention period based on the disk space available on the Kafka node, the EPS rate, and the number of days needed to recover the system from a scalable storage outage if one were to occur. Kafka retains the data while the scalable storage system is recovered, preventing data loss. (A back-of-the-envelope sizing sketch follows this table.) For information about the recommended disk storage for various EPS rates, see the Technical Information for Sentinel page.

Data directories (log.dirs)
Configure multiple paths to store Kafka partitions, for example /kafka1, /kafka2, and /kafka3. Each directory should be on its own separate drive; for best performance and manageability, mount each path to a separate physical disk (JBOD).

HDFS

HDFS block size (dfs.block.size)
Increase the block size to 256 MB to reduce the disk seek time among data nodes and to reduce the load on the NameNode.

Replication factor (dfs.replication)
Set the value to 3 so that three copies (1 primary and 2 replicas) of each file are kept on the data nodes. Keeping more than 3 copies requires additional disks on the data nodes and reduces the disk I/O latency; the number of data disks required usually scales with the number of replicas.

DataNode data directory (dfs.data.dir, dfs.datanode.data.dir)
Configure multiple paths for storing HDFS data, which includes the HBase data. For example, /dfs1, /dfs2, and so on. Each directory should be on its own separate drive; for best performance and manageability, mount each path to a separate physical disk (JBOD).

Maximum number of transfer threads (dfs.datanode.max.xcievers, dfs.datanode.max.transfer.threads)
Set it to 8192.

HBase

HBase Client Write Buffer (hbase.client.write.buffer)
Set this value to 8 MB to increase the HBase write performance for a higher EPS load.

HBase RegionServer Handler Count (hbase.regionserver.handler.count)
Set this value to 100. The value should be approximately the number of CPU cores across the region servers. For example, if you have 6 region servers with 16 cores each, set the handler count to 100 (6 x 16 = 96).

HBase table regions (security.events.normalized, security.events.raw)
By default, Sentinel assigns (pre-splits) 16 regions per table. Regardless of this pre-splitting, once a region exceeds the HBase maximum file size, HBase automatically splits it into two, so the total number of regions on the region servers increases over time. Monitor the HBase load distribution. If some regions receive an uneven load, consider splitting those regions manually to distribute the load evenly and improve throughput.

Maximum Size of All Memstores in RegionServer (hbase.regionserver.global.memstore.upperLimit, hbase.regionserver.global.memstore.size)
Set it to 0.7.

HFile Block Cache Size (hfile.block.cache.size)
Set it to 0.1.

HBase Memstore Flush Size (hbase.hregion.memstore.flush.size)
Set it to 256 MB.

HStore Compaction Threshold (hbase.hstore.compactionThreshold)
Set it to 5.

Java Heap Size of HBase RegionServer in Bytes
Set it to 4 GB.

YARN

Container Virtual CPU Cores (yarn.nodemanager.resource.cpu-vcores)
Increase this value to up to 80% of the NodeManager's available vCPUs. For example, if the NodeManager has 16 vCPUs, set this value to up to 14 vCores so that 2 vCores are left for operating system utilization. You can increase or decrease this value based on the vCores available to the NodeManager.

Container Virtual CPU Cores Maximum (yarn.scheduler.maximum-allocation-vcores)
Allocate 1 vCore per Application Master container.

Container Memory (yarn.nodemanager.resource.memory-mb)
Increase this value to up to 80% of the NodeManager's memory. For example, if the NodeManager memory is 24 GB, set this value to up to 20 GB. You can increase or decrease this value based on the available NodeManager memory.

Container Memory Maximum (yarn.scheduler.maximum-allocation-mb)
Spark runs 3 applications: event data, raw data, and the event indexer. Ensure that each Application Master container has sufficient memory to run these applications. For example, if the YARN ResourceManager has 24 GB, allocate a maximum of 8 GB of memory per Application Master container. You can increase or decrease this value based on the available ResourceManager memory.

Spark

Enable Event Log Cleaner (spark.history.fs.cleaner.enabled)
Set it to true.

Event Log Cleaner Interval (spark.history.fs.cleaner.interval)
Set it to 30 minutes.

Maximum Event Log Age (spark.history.fs.cleaner.maxAge)
Set it to 4 hours.

Disk Latency

Whenever the disk latency goes beyond 100 ms on a data node, add more JBOD disks to the data node and configure the component causing the highest I/O load to utilize the directories where the additional disks are mounted.
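As noted in the Kafka data retention entry in Table 14-1, choosing log.retention.hours is ultimately a disk sizing exercise: the data retained grows with the EPS rate, the average stored size of an event, the retention window, and the Kafka replication factor. The following minimal Java sketch only illustrates that arithmetic; the EPS rate, event size, replication factor, and broker count are hypothetical placeholders, and the Technical Information for Sentinel page remains the authoritative sizing reference.

```java
/**
 * Back-of-the-envelope Kafka retention sizing.
 * All input values are hypothetical placeholders; substitute your own measurements.
 */
public class KafkaRetentionSizing {

    public static void main(String[] args) {
        long eps = 10_000;                // hypothetical events per second
        long avgEventSizeBytes = 1_000;   // hypothetical average stored size of one event
        long retentionHours = 7 * 24;     // log.retention.hours (example: a 7-day recovery window)
        long replicationFactor = 2;       // hypothetical Kafka topic replication factor
        long brokerCount = 3;             // hypothetical number of Kafka brokers

        // Total bytes retained across the cluster over the retention window.
        long totalBytes = eps * avgEventSizeBytes * retentionHours * 3600L * replicationFactor;

        // Approximate share per broker, assuming partitions are spread evenly.
        long bytesPerBroker = totalBytes / brokerCount;

        System.out.printf("Approximate retained data per broker: %.1f TB%n", bytesPerBroker / 1e12);
    }
}
```

If the result approaches the disk space available on the Kafka nodes, either add disks or reduce the retention window.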

14.2.2 Performance Tuning in SSDM

SSDM automatically configures the settings of the Sentinel components that are connected to or running within the scalable storage system. These settings are described in the following table:

Table 14-2 Newly Added Properties for Scalable Storage Configuration

rawdata.fullconnectordump (HDFS)
Default value: false
To reduce storage space and for better performance, SSDM stores only the raw data and does not store the full Connector dump for an event. If this property is set to true, SSDM stores the full Connector dump, which is 3 - 5 times larger than the raw data dump.

batch.duration (YARN)
Default value: 10 (seconds)
Indicates the batch interval for Spark Streaming. Events are continuously processed in batches at the specified interval. (A short illustration follows this table.)

kafka.metrics.print (Kafka)
Default value: false
If enabled, SSDM periodically prints the Kafka client metrics in the Sentinel logs.
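The batch.duration property in Table 14-2 is the batch interval of the Spark Streaming jobs that process incoming events. Purely as an illustration of what that interval means, the following minimal Java sketch creates a streaming context with a 10-second batch interval; it is not SSDM's internal code, and the local socket source is a hypothetical stand-in for the real event stream.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BatchIntervalSketch {

    public static void main(String[] args) throws InterruptedException {
        // Local mode is used only so the sketch is self-contained.
        SparkConf conf = new SparkConf()
                .setAppName("batch-interval-sketch")
                .setMaster("local[2]");

        // The second argument is the batch interval: incoming data is grouped into
        // micro-batches and processed once per interval (here, every 10 seconds),
        // which is what batch.duration controls for the streaming jobs.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Hypothetical input source, used only to make the sketch runnable:
        // read lines from a local socket and print each 10-second batch.
        jssc.socketTextStream("localhost", 9999).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```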

To avoid data loss and to optimize performance, SSDM automatically configures certain scalable storage components’ settings described in the following table. For more information about these advanced properties and the guidelines to consider when configuring these properties, see the documentation for the specific component.

Table 14-3 Customized Properties of Scalable Storage Components

Each entry below lists the property, the customized value, and the reason for the customization, grouped by component.

HDFS/HBase

hbase.client.retries.number: 3 (to prevent data loss)
hbase.zookeeper.recoverable.waittime: 5000 (to prevent data loss)
hbase.client.pause: 3 (to prevent data loss)

Kafka (an illustrative producer configuration written out with these values follows this table)

compression.type: lz4 (for better performance and reduced storage size)
retries: 2147483647 (to prevent data loss)
request.timeout.ms: 2147483647 (to prevent data loss)
max.block.ms: 2147483647 (to prevent data loss)
acks: all (to prevent data loss)
reconnect.backoff.ms: 10000 (to improve performance)
max.in.flight.requests.per.connection: 5 (to improve performance)
linger.ms: 100 (to improve performance)

ZooKeeper

zookeeper.session.timeout: 5000 (to prevent data loss)
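The Kafka entries in Table 14-3 are standard Kafka producer settings. For reference only, the following minimal Java sketch shows a producer configured by hand with the same values; it is not SSDM's internal producer code, and the bootstrap servers and topic name are hypothetical.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092,kafka2:9092");      // hypothetical brokers
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Durability-related values from Table 14-3 (to prevent data loss).
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));            // 2147483647
        props.put("request.timeout.ms", Integer.toString(Integer.MAX_VALUE)); // 2147483647
        props.put("max.block.ms", Integer.toString(Integer.MAX_VALUE));       // 2147483647

        // Performance-related values from Table 14-3.
        props.put("compression.type", "lz4");
        props.put("linger.ms", "100");
        props.put("max.in.flight.requests.per.connection", "5");
        props.put("reconnect.backoff.ms", "10000");

        // "events" is a hypothetical topic name used only for this sketch.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"));
        }
    }
}
```

With acks set to all and effectively unlimited retries, such a producer favors durability over latency, which matches the "to prevent data loss" rationale in the table.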

To view or configure the advanced properties, on the SSDM home page, click Storage > Scalable Storage > Advanced Properties.

IMPORTANT: You can add new properties or modify the existing properties as required. Specify these settings at your own discretion and validate them, because the changes are applied as is.