How to analyse a Java thread dump

java
troubleshooting


Objective

All OpenNMS components run in a Java Virtual Machine. If you run OpenNMS in a very large or complex environment, the default settings may not be sufficient and you will need to tune the Java and OpenNMS configuration. A common first indication is that Pollerd or Collectd does not have enough threads to get through the number of services to poll and metrics to collect. Another indication can be integrations that consume resources during event processing or while forwarding alarms to other applications.

This article describes how to analyse Java thread dumps to investigate how the different components in OpenNMS interact. Be aware that the size of these dumps depends on the size of your OpenNMS server and can be very large.

Solution

Step 1: Generate the thread dump:

Get the Java process ID (PID) of the running OpenNMS application.

Run systemctl status opennms and note the main PID, or look in the ${OPENNMS_HOME}/logs/opennms.pid file. On Windows you can use the JDK-provided jps -v command to find the PID. Then generate a thread dump with jstack -l ${pid} > /tmp/jstack.out.
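As a side note, a JVM can also dump its own threads programmatically through the standard ThreadMXBean API. The sketch below is illustrative only (it is not something OpenNMS ships); it prints each thread's name, ID and state, roughly matching the header lines jstack emits:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // The two boolean flags request locked monitors and ownable
        // synchronizers, roughly the extra lock detail that jstack -l adds.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.printf("\"%s\" id=%d state=%s%n",
                    info.getThreadName(), info.getThreadId(), info.getThreadState());
        }
    }
}
```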

Step 2: Analyze the thread dump

Upload the thread dump to an analyser. You can use, for example, fastthread.io, or analyse it locally with a community-hosted Thread Dump Analyzer.

Step 3: Interpret the output

Possible outputs for the thread status can be:

  • NEW: The thread has been created but has not yet been started.
  • RUNNABLE: The thread is executing in the JVM (occupying, or eligible for, CPU time).
  • BLOCKED: The thread is blocked waiting for a monitor lock held by another thread.
  • WAITING: The thread is waiting indefinitely, e.g. via Object.wait(), Thread.join() or LockSupport.park().
  • TIMED_WAITING: The thread is waiting for a specified time, e.g. via Thread.sleep(), Object.wait(timeout), Thread.join(timeout) or LockSupport.parkNanos().
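These states can be reproduced with a small stand-alone program. This is a hedged sketch (the thread names are made up for the example) showing NEW, WAITING and TIMED_WAITING:

```java
public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();

        // WAITING: blocks indefinitely in Object.wait()
        Thread waiting = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) { }
            }
        }, "waiting-demo");

        // TIMED_WAITING: blocks in Thread.sleep() with a timeout
        Thread timedWaiting = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        }, "timed-waiting-demo");

        // NEW: created but never started
        Thread fresh = new Thread(() -> { }, "new-demo");

        waiting.start();
        timedWaiting.start();

        // Poll until both threads have actually reached their wait points
        while (waiting.getState() != Thread.State.WAITING
                || timedWaiting.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(10);
        }

        System.out.println(fresh.getState());         // NEW
        System.out.println(waiting.getState());       // WAITING
        System.out.println(timedWaiting.getState());  // TIMED_WAITING

        // Interrupt the parked threads so the JVM can exit cleanly
        waiting.interrupt();
        timedWaiting.interrupt();
        waiting.join();
        timedWaiting.join();
    }
}
```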
Example of locked threads below. Of the 392 threads found, 91 threads share this stack:
"Collectd-Thread-1-of-50": awaiting notification on [0x00000005c6f0d820]
"Collectd-Thread-10-of-50": awaiting notification on [0x00000005c6f0d820]
"Collectd-Thread-11-of-50": awaiting notification on [0x00000005c6f0d820]
"Collectd-Thread-12-of-50": awaiting notification on [0x00000005c6f0d820]
"Collectd-Thread-13-of-50": awaiting notification on [0x00000005c6f0d820]
"Collectd-Thread-14-of-50": awaiting notification on [0x00000005c6f0d820]
"Collectd-Thread-15-of-50": awaiting notification on [0x00000005c6f0d820]

Notice that these 91 threads are locked on the same address.

If you click that lock address ([0x00000005c6f0d820]), it takes you to this:

0x00000005c6f0d820
AbstractQueuedSynchronizer$ConditionObject
49 threads waiting for notification on lock:
Collectd-Thread-1-of-50
Collectd-Thread-10-of-50
Collectd-Thread-11-of-50
Collectd-Thread-12-of-50
Collectd-Thread-13-of-50

So 49 of the 50 Collectd threads are locked. If we go to the Pollerd threads:

0x00000005c3361090
AbstractQueuedSynchronizer$ConditionObject 
30 threads waiting for notification on lock:
Poller-Thread-1-of-30
Poller-Thread-10-of-30
Poller-Thread-11-of-30
Poller-Thread-12-of-30
Poller-Thread-13-of-30

All of the Poller threads are locked (30 of 30).
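The per-lock grouping the analyser shows above can be approximated with a few lines of Java that scan the rendered dump for the "awaiting notification on [0x…]" lines. Note this is a hypothetical helper that matches the analyser's output format shown above, not the raw jstack format:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LockGrouping {
    // Matches lines like:
    // "Collectd-Thread-1-of-50": awaiting notification on [0x00000005c6f0d820]
    private static final Pattern LINE =
        Pattern.compile("\"([^\"]+)\": awaiting notification on \\[(0x[0-9a-f]+)\\]");

    // Count how many threads are waiting on each lock address
    public static Map<String, Integer> countByLock(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = LINE.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample =
            "\"Collectd-Thread-1-of-50\": awaiting notification on [0x00000005c6f0d820]\n" +
            "\"Collectd-Thread-10-of-50\": awaiting notification on [0x00000005c6f0d820]\n" +
            "\"Poller-Thread-1-of-30\": awaiting notification on [0x00000005c3361090]\n";
        System.out.println(countByLock(sample));
        // {0x00000005c6f0d820=2, 0x00000005c3361090=1}
    }
}
```

If most or all threads of a pool show up under a single lock address like this, the pool is saturated or idle-parked on its work queue, which is the pattern seen in the Collectd and Pollerd examples above.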