The following table provides performance tuning recommendations that you must apply to your CDH setup.
For information about how to set these values, refer to the Cloudera documentation.
Table 14-1 Tuning Guidelines for CDH
CDH Component | Scaling Factor | Recommendation
---|---|---
Kafka | Data retention hours (log.retention.hours) | Configure the retention period based on the disk space available on the Kafka node, the EPS rate, and the number of days needed to recover the system from a scalable storage outage if one were to occur. Kafka retains the data while the scalable storage system is recovered, preventing data loss. For information about the recommended disk storage for various EPS rates, see the Technical Information for Sentinel page.
| Data directories (log.dirs) | Configure multiple paths to store Kafka partitions, for example, /kafka1, /kafka2, and /kafka3. For best performance and manageability, mount each path on a separate physical disk (JBOD).
HDFS | HDFS block size (dfs.block.size) | Increase the block size to 256 MB to reduce the disk seek time among data nodes and to reduce the load on NameNodes.
| Replication factor (dfs.replication) | Set the value to 3 so that HDFS creates three copies (1 primary and 2 replicas) of files on data nodes. Creating more than three copies requires additional disks on the data nodes, although it can reduce disk I/O latency; the number of data disks required is typically multiplied by the number of replicas.
| DataNode data directory (dfs.data.dir, dfs.datanode.data.dir) | Configure multiple paths for storing HBase data, for example, /dfs1, /dfs2, and so on. For best performance and manageability, mount each path on a separate physical disk (JBOD).
| Maximum number of transfer threads (dfs.datanode.max.xcievers, dfs.datanode.max.transfer.threads) | Set it to 8192.
HBase | HBase Client Write Buffer (hbase.client.write.buffer) | Set this value to 8 MB to increase the write performance of HBase under a higher EPS load.
| HBase RegionServer Handler Count (hbase.regionserver.handler.count) | Set this value to 100. This value should approximate the total number of CPU cores across the region servers. For example, if you have 6 region servers with 16 cores each (6 x 16 = 96), set the handler count to 100.
| HBase table regions (security.events.normalized, security.events.raw) | By default, Sentinel assigns (pre-splits) 16 regions per table. Regardless of this pre-splitting, once a region exceeds the HBase maximum file size, HBase automatically splits it in two, so the total region count on the region servers increases. Monitor the HBase load distribution. If some regions receive uneven loads, consider splitting those regions manually to distribute the load evenly and improve throughput.
| Maximum Size of All Memstores in RegionServer (hbase.regionserver.global.memstore.upperLimit, hbase.regionserver.global.memstore.size) | Set it to 0.7.
| HFile Block Cache Size (hfile.block.cache.size) | Set it to 0.1.
| HBase Memstore Flush Size (hbase.hregion.memstore.flush.size) | Set it to 256 MB.
| HStore Compaction Threshold (hbase.hstore.compactionThreshold) | Set it to 5.
| Java Heap Size of HBase RegionServer in Bytes | Set it to 4 GB.
YARN | Container Virtual CPU Cores (yarn.nodemanager.resource.cpu-vcores) | Increase this value to up to 80% of the NodeManager's available vCPUs. For example, if the NodeManager has 16 vCPUs, set this value to up to 14 vcores, leaving 2 vcores for operating system use. You can increase or decrease this value based on the vcores available to the NodeManager.
| Container Virtual CPU Cores Maximum (yarn.scheduler.maximum-allocation-vcores) | Allocate 1 vcore per Application Master container.
| Container Memory (yarn.nodemanager.resource.memory-mb) | Increase this value to up to 80% of the NodeManager's memory. For example, if the NodeManager memory is 24 GB, set this value to up to 20 GB. You can increase or decrease this value based on the available NodeManager memory.
| Container Memory Maximum (yarn.scheduler.maximum-allocation-mb) | Spark runs 3 applications: event data, raw data, and event indexer. Ensure that each Application Master container has sufficient memory to run these applications. For example, if the YARN ResourceManager has 24 GB, allocate a maximum of 8 GB of memory per AM container. You can increase or decrease this value based on the available ResourceManager memory.
Spark | Enable Event Log Cleaner (spark.history.fs.cleaner.enabled) | Set it to true.
| Event Log Cleaner Interval (spark.history.fs.cleaner.interval) | Set it to 30 minutes.
| Maximum Event Log Age (spark.history.fs.cleaner.maxAge) | Set it to 4 hours.
Disk Latency | | Whenever the disk latency exceeds 100 ms on a data node, add more JBOD disks to the data node and configure the component causing the highest I/O load to use the directories where the additional disks are mounted.
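The YARN and HBase sizing rules above can be expressed as short calculations. The sketch below is illustrative only, not Sentinel code; the function names and the fixed OS reserves (2 vcores, 4 GB) are assumptions drawn from the worked examples in the table.

```python
def yarn_container_vcores(node_vcores: int, os_reserve: int = 2) -> int:
    """Suggested yarn.nodemanager.resource.cpu-vcores: leave a couple of
    vcores for the OS (the table's example leaves 2 of 16, giving 14)."""
    return max(1, node_vcores - os_reserve)

def yarn_container_memory_gb(node_memory_gb: int, os_reserve_gb: int = 4) -> int:
    """Suggested yarn.nodemanager.resource.memory-mb, expressed in GB
    (the table's example leaves 4 GB of a 24 GB NodeManager, giving 20 GB)."""
    return max(1, node_memory_gb - os_reserve_gb)

def hbase_handler_count(region_servers: int, cores_per_server: int) -> int:
    """hbase.regionserver.handler.count should approximate the total cores
    across region servers; the table rounds 6 x 16 = 96 up to 100."""
    return region_servers * cores_per_server

print(yarn_container_vcores(16))     # 14
print(yarn_container_memory_gb(24))  # 20
print(hbase_handler_count(6, 16))    # 96
```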
SSDM automatically configures settings of Sentinel components connected to or running within the scalable storage system, which are described in the following table:
Table 14-2 Newly Added Properties for Scalable Storage Configuration
Property | Default Value | Notes
---|---|---
rawdata.fullconnectordump (HDFS) | false | To reduce storage space and for better performance, SSDM stores only the raw data and does not store the full Connector dump for an event. If this property is set to true, SSDM stores the full Connector dump, which is 3 to 5 times larger than the raw data dump.
batch.duration (YARN) | 10 (seconds) | Indicates the batch interval for Spark streaming. Events are continuously processed in batches at the specified interval.
kafka.metrics.print (Kafka) | false | If enabled, SSDM periodically prints the Kafka client metrics in the Sentinel logs.
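The batch.duration and rawdata.fullconnectordump properties both have a direct, estimable effect on throughput and storage. The rough calculations below are a hypothetical sketch; the EPS rate, average raw event size, and the ~4x dump multiplier (the table says 3 to 5 times) are assumptions, not Sentinel figures.

```python
SECONDS_PER_DAY = 86400

def events_per_spark_batch(eps: int, batch_duration_s: int = 10) -> int:
    # batch.duration = 10 s means each Spark streaming batch carries eps * 10 events.
    return eps * batch_duration_s

def raw_storage_per_day_bytes(eps: int, avg_raw_event_bytes: int,
                              full_connector_dump: bool = False,
                              dump_multiplier: float = 4.0) -> int:
    # With rawdata.fullconnectordump = true, each event's stored dump is
    # 3-5x larger than the raw data alone; ~4x is assumed here.
    per_event = avg_raw_event_bytes * (dump_multiplier if full_connector_dump else 1.0)
    return int(eps * per_event * SECONDS_PER_DAY)

print(events_per_spark_batch(5000))                # 50000
print(raw_storage_per_day_bytes(5000, 500))        # 216000000000
print(raw_storage_per_day_bytes(5000, 500, True))  # 864000000000
```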
To avoid data loss and to optimize performance, SSDM automatically configures certain scalable storage components’ settings described in the following table. For more information about these advanced properties and the guidelines to consider when configuring these properties, see the documentation for the specific component.
Table 14-3 Customized Properties of Scalable Storage Components
Component | Property | Customized Value | Reason for Customization
---|---|---|---
HDFS/HBase | hbase.client.retries.number | 3 | To prevent data loss
| hbase.zookeeper.recoverable.waittime | 5000 | To prevent data loss
| hbase.client.pause | 3 | To prevent data loss
Kafka | compression.type | lz4 | For better performance and reduced storage size
| retries | 2147483647 | To prevent data loss
| request.timeout.ms | 2147483647 | To prevent data loss
| max.block.ms | 2147483647 | To prevent data loss
| acks | all | To prevent data loss
| reconnect.backoff.ms | 10000 | To improve performance
| max.in.flight.requests.per.connection | 5 | To improve performance
| linger.ms | 100 | To improve performance
ZooKeeper | zookeeper.session.timeout | 5000 | To prevent data loss
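For reference, the Kafka producer values from Table 14-3 can be collected into a single client configuration. The sketch below uses the underscore-style keyword names of the kafka-python client (e.g. for KafkaProducer(**producer_config)); it illustrates the customized values and is not code shipped with SSDM.

```python
# Kafka producer settings from Table 14-3, keyed as kafka-python keyword
# arguments. Illustrative mapping only; property names in Kafka itself
# use dots (e.g. max.block.ms) rather than underscores.
producer_config = {
    "compression_type": "lz4",                   # smaller payloads, better throughput
    "retries": 2147483647,                       # retry effectively forever
    "request_timeout_ms": 2147483647,            # never time out a pending request
    "max_block_ms": 2147483647,                  # block send() instead of dropping events
    "acks": "all",                               # wait for all in-sync replicas
    "reconnect_backoff_ms": 10000,               # back off 10 s between reconnects
    "max_in_flight_requests_per_connection": 5,  # pipeline up to 5 requests
    "linger_ms": 100,                            # batch sends for up to 100 ms
}

print(producer_config["acks"])  # all
```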
To view or configure the advanced properties, in the SSDM home page, click Storage > Scalable Storage > Advanced Properties.
IMPORTANT: You can add new properties or modify existing properties as required. Specify these settings at your own discretion and validate them, because the changes are applied as is.