How to investigate file descriptor issues

If you run a centralized monitoring system in a large environment, you can run into issues with file descriptor limits. Linux exposes very detailed information about them through the /proc filesystem. The soft and hard limits apply to open files and network sockets, and exhausting them shows up in OpenNMS as a "Too many open files" exception. Keep this in mind as well when you run Minions or Sentinels in large environments.

What is the difference between soft and hard limits?

The soft limit acts like a per-user quota for open file descriptors. As soon as a user's process exceeds the soft limit, it cannot allocate any more file descriptors. The user can raise the soft limit themselves, but never above the hard limit; changing the hard limit requires root access. If you run OpenNMS in a larger setup, the system is probably not shared with other workloads, so OpenNMS can use all of these resources for itself.
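
For example, a non-root user can inspect the current open-files limits and raise their own soft limit, but only up to the hard limit (40960 below is just an illustrative value):

ulimit -S -n          # current soft limit for open files
ulimit -H -n          # current hard limit for open files
ulimit -S -n 40960    # raise only the soft limit; fails if it exceeds the hard limit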

Investigate your system environment

The default values for soft and hard limits can be checked with

ulimit -a       # soft limits for the current user
ulimit -a -H    # hard limits for the current user

The value is per user, and each new process inherits these limits. If you run OpenNMS as root, the start script changes the hard limit with something like this:

ulimit -n 20480

Normally the default is 4096, and the script raises it to 20480. You can modify this value by adding the following line to /etc/opennms/opennms.conf,

MAXIMUM_FILE_DESCRIPTORS=40960

which increases the hard limit even further to 40960.

File descriptors for processes and file handles

Once OpenNMS is started, you can see the applied limits for the OpenNMS JVM with

cat /proc/$(cat /var/run/opennms.pid)/limits

where /var/run/opennms.pid contains the process ID of the JVM that OpenNMS runs in.
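
If you are only interested in the open-files limit, you can filter for that single row of the limits file:

grep "Max open files" /proc/$(cat /var/run/opennms.pid)/limits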

You can see how many file descriptors OpenNMS has allocated with:

ls -1 /proc/$(cat /var/run/opennms.pid)/fd | wc -l
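
If you suspect a file descriptor leak, you can watch this count over time (a simple sketch using watch; the 10-second interval is arbitrary):

watch -n 10 "ls -1 /proc/$(cat /var/run/opennms.pid)/fd | wc -l"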

If you use lsof with the process ID of OpenNMS, you will see a larger number than in /proc/<pid>/fd:

lsof -p $(cat /var/run/opennms.pid) | wc -l

The reason is that lsof also lists memory-mapped .so files, which do not count against the configured limits. The same effect shows up if you grep the full lsof output for the process ID:

lsof | grep $(cat /var/run/opennms.pid) | wc -l
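
To get an idea of what kinds of descriptors the process holds, you can group the lsof output by its TYPE column (a rough sketch; column 5 is TYPE in the default lsof output format):

lsof -p $(cat /var/run/opennms.pid) | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn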

If you want to see how many file handles are used system-wide, you can run:

cat /proc/sys/fs/file-nr
4128	0	262144

You can see three values:

number of allocated file handles:     4128
number of free (unused) file handles: 0
maximum number of file handles:       262144
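
The third value corresponds to the kernel parameter fs.file-max. If that system-wide maximum is the bottleneck, a root user can check and raise it with sysctl (the value below is only an example; pick one that fits your environment):

sysctl fs.file-max
sysctl -w fs.file-max=524288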

If you need to update the file descriptor limit (or JAVA_MEM_MAX) on a Minion, use the (minion directory)/bin/setenv file.
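
A minimal sketch of what that could look like, assuming setenv is sourced as a shell fragment by the Minion start script (the variable names and values below are illustrative, not authoritative):

# (minion directory)/bin/setenv
export JAVA_MEM_MAX=2048m   # assumed example value for the heap setting mentioned above
ulimit -n 40960             # raise the open-files limit for the Minion JVM; must not exceed the hard limit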
