Sometimes the OES Linux server fails to detect that a client host has gone down abruptly, for example because the workstation crashed or lost power. In that case, the connection remains active for the default timeout (about 12 to 15 minutes) before it is cleared. This situation occurs when the watchdog process fails to close the connection cleanly. If concurrent connections are set to 1 and the watchdog has not cleared the connection, users cannot log in. It is recommended that you either terminate the connection manually or wait for the estimated timeout before logging in again.
The Linux kernel provides three parameters that change the way keepalive probes work from the server side. Use these parameters to implement a workaround at the TCP level.
These parameters are available in the /proc/sys/net/ipv4/ directory.
tcp_keepalive_time: Determines how long a connection must remain idle before the kernel sends the first TCP keepalive probe. This value is used only when keepalive is enabled.
The tcp_keepalive_time takes an integer value in seconds. The default value is 7200 seconds (2 hours). This works well for most hosts and does not require many network resources. If you set this value too low, it burdens your network with unnecessary traffic.
tcp_keepalive_probes: Determines the number of unanswered TCP keepalive probes to send before the connection is considered broken.
The tcp_keepalive_probes takes an integer value, recommended to be less than 50, depending on your tcp_keepalive_time and tcp_keepalive_intvl values. The default is 9 probes before the application is informed of the broken connection.
tcp_keepalive_intvl: Determines the interval between successive keepalive probes when no reply is received. This value is important to calculate the time before your connection has a keepalive death.
The tcp_keepalive_intvl takes an integer value in seconds. The default is 75 seconds. So, 9 probes at 75-second intervals take 675 seconds, or approximately 11 minutes, counted from the end of the idle period set by tcp_keepalive_time. The default values of the tcp_keepalive_probes and tcp_keepalive_intvl variables can be used to evaluate the default time before the connection is timed out because of keepalive.
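The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name keepalive_detection_seconds is hypothetical, and it assumes the detection time is the idle time plus the number of probes multiplied by the probe interval.

```shell
# Hypothetical helper: estimate the seconds between the last traffic on a
# connection and the moment keepalive declares it dead.
# Arguments: idle time (tcp_keepalive_time), probe count
# (tcp_keepalive_probes), probe interval (tcp_keepalive_intvl).
keepalive_detection_seconds() {
    idle="$1"; probes="$2"; intvl="$3"
    echo $(( idle + probes * intvl ))
}

# Kernel defaults: 7200 seconds idle, then 9 probes 75 seconds apart.
keepalive_detection_seconds 7200 9 75   # prints 7875
```

You can use the same helper to check any candidate values before applying them to the server.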
Modify these three parameters in a way that the change does not generate a lot of extra network traffic and still solves the problem. A sample modification could be as follows (a 3-minute detection time):
tcp_keepalive_time - 120
tcp_keepalive_probes - 3
tcp_keepalive_intvl - 20
NOTE: Be careful with the parameter settings. Avoid values so low that connections that are still valid are dropped.
The settings take effect immediately after the files are modified. You need not restart any services. However, the settings are valid only until the next reboot. After the server is rebooted, the settings revert to the defaults.
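One possible way to modify the files is to write the new value into each one as root. The following is a minimal sketch, not a prescribed procedure: the helper name set_keepalive is hypothetical, and its optional third argument (the directory to write to) exists only so that you can try it against a scratch directory first.

```shell
# Hypothetical helper: write one keepalive value, then read it back.
# Arguments: parameter name, new value, and optionally the directory
# (defaults to the procfs location of the keepalive parameters).
set_keepalive() {
    name="$1"; value="$2"; dir="${3:-/proc/sys/net/ipv4}"
    echo "$value" > "$dir/$name" && cat "$dir/$name"
}

# Usage (as root), with the sample values:
#   set_keepalive tcp_keepalive_time 120
#   set_keepalive tcp_keepalive_probes 3
#   set_keepalive tcp_keepalive_intvl 20
```

Each call prints the value read back from the file, so you can confirm that the kernel accepted it.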
To make the settings permanent (so that they persist after a reboot), do the following:
Add the following entries in /etc/sysctl.conf.
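A minimal sketch of this step follows. It assumes that the entries use the standard net.ipv4.* sysctl key names that correspond to the files in /proc/sys/net/ipv4/, and that they carry the sample values given earlier. The helper name persist_keepalive is hypothetical. Substitute your own values as needed.

```shell
# Hypothetical helper: append the sample keepalive entries to a sysctl
# configuration file. The path must be given explicitly so that nothing
# is written by accident.
persist_keepalive() {
    conf_file="$1"
    cat >> "$conf_file" <<'EOF'
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 20
EOF
}

# Usage (as root):
#   persist_keepalive /etc/sysctl.conf
#   sysctl -p /etc/sysctl.conf   # load the file now instead of waiting for a reboot
```

Check the file for existing tcp_keepalive entries before you append, so that you do not end up with duplicate or conflicting lines.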
We recommend these settings only if all the clients and servers are connected through a LAN.