Troubleshooting Minion connectivity and functions

Monitoring a distributed network with Minions makes your monitoring system more complex. This section gives you some help troubleshooting the OpenNMS components themselves. We ship OpenNMS with the Apache Karaf OSGi runtime and have built in a few commands that help with troubleshooting. To keep things simple, I use the term Core for the Horizon/Meridian server instance and Minion for the Minion :slight_smile:

Minion-to-Core communication

The first thing you want to know is: does my Core instance communicate with my Minion, and vice versa?

We start with the Minion. Replace minion-host with the Minion’s IP address or FQDN.

Connect to the Minion’s Karaf Shell

ssh -p 8201 admin@minion-host

Run the health check command

opennms:health-check

Depending on your configuration, you should see output similar to the following:

Verifying the health of the container

Connecting to OpenNMS ReST API                                                                      [ Success  ]
Verifying installed bundles                                                                         [ Success  ]
Connecting to JMS Broker                                                                            [ Success  ]
Verifying Listener Single-Port-Flow-Listener (org.opennms.netmgt.telemetry.listeners.UdpListener)   [ Success  ]

=> Everything is awesome

The health check verifies that

  • installed Karaf bundles and features can be started
  • the Minion can connect to the configured message broker
  • the Minion can connect to the REST endpoint of the Core server
  • configured flow listeners work (this line appears only when you have a flow listener configured)

If you have issues, check the data/log/karaf.log file in your Minion directory or run the command log:tail in a second Karaf shell.
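For example, on the Minion host you can follow the log file directly. The path below assumes a package installation under /opt/minion; adjust it to your install directory:

tail -f /opt/minion/data/log/karaf.log

Or open a second Karaf shell session and tail the log from there:

ssh -p 8201 admin@minion-host
log:tail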

Core-to-Minion communication

Remote Procedure Calls (RPC) are messages sent from the Core server instance to the Minion. RPC uses the message broker as its channel. If a node is associated with a location, the Core server instance sends RPC messages to the corresponding Minion in that remote location to execute the monitoring tests, e.g., run the ICMP poller on IP interface w.x.y.z.

You can use the opennms:stress-rpc command on the Core server instance as an end-to-end test to verify that the RPC path to a Minion works as expected.

This test treats the message broker as a black box.

ssh -p 8101 admin@core-host-ip
opennms:stress-rpc -c 5 -l minion-location

The output looks like this:

Executing 5 requests.
Waiting for responses.

Done!

6/29/21, 5:17:24 PM ============================================================

-- Counters --------------------------------------------------------------------
failures
             count = 0
successes
             count = 5

-- Histograms ------------------------------------------------------------------
response-times
             count = 5
               min = 24
               max = 47
              mean = 36.80
            stddev = 8.93
            median = 37.00
              75% <= 46.00
              95% <= 47.00
              98% <= 47.00
              99% <= 47.00
            99.9% <= 47.00


Total milliseconds elapsed: 52
Milliseconds spent generating requests: 3
Milliseconds spent waiting for responses: 49

The important values here are the failures and successes counters. You should see 5 successes and no failures:

failures
             count = 0
successes
             count = 5

Another important metric is Milliseconds spent waiting for responses, which should stay low, in the range of milliseconds rather than seconds. If you get failures, it means the Core instance got no response from the Minion within the timeout; the default timeout is 20 seconds.

What if you have multiple Minions in a location?

You can run the RPC test against a specific Minion by additionally providing the system ID (also known as the Minion ID). If you haven’t set it manually, it is a generated UUID. In my example, I’ve set it to a human-readable unique ID.

The following command runs the RPC ping only against the Minion in minion-location with the ID minion-01.

opennms:stress-rpc -s minion-01 -c 5 -l minion-location
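If you don’t know a Minion’s system ID, you can look it up on the Minion host itself. In a default setup, the ID and location are stored in the Minion’s etc/org.opennms.minion.controller.cfg file; the path below assumes a package installation under /opt/minion:

grep -E '^(id|location)' /opt/minion/etc/org.opennms.minion.controller.cfg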

Execute a remote ping to arbitrary FQDNs or IP addresses

If you want to quickly check whether a Minion can ping a device in its remote network, you don’t have to SSH into the Minion’s Karaf shell in the remote network. You can run the ping from the Core instance instead. Connect to the Karaf shell of the Core server:

ssh -p 8101 admin@core-host-ip

Run the ping command. The -s minion-01 option is optional; if you have more than one Minion in a location, it tells OpenNMS which one should execute the ICMP ping for you:

opennms:ping -s minion-01 -l minion-location www.google.com

The Minion and message broker are treated as a black box in this test scenario. The ICMP ping is executed from the Minion to the FQDN/IP target, and the result is shipped over the message broker back to the Core instance.

Execute remote DNS lookups

You can test and troubleshoot DNS configurations by executing arbitrary lookups remotely. In this example, we run a DNS lookup on a specific Minion to resolve the FQDN www.google.com. The Minion uses the underlying operating system’s DNS configuration for the lookup:

opennms:dns-lookup -s minion-02 -l minion-location www.google.com

www.google.com resolves to: 142.250.185.132

The same also works for reverse lookups:

opennms:dns-reverse-lookup -s minion-02 -l minion-location 192.168.178.40

192.168.178.40 resolves to: ip4.wlp2s0.scummbar.labmonkeys.tech.

Execute SNMP commands through a Minion

This section describes how to run SNMP commands from the Core server against a device behind a Minion in a remote network. It is equivalent to executing the snmpwalk command from the Minion host against the SNMP agent on your device.
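For comparison, a native walk run directly on the Minion host could look like the following sketch. It assumes the Net-SNMP tools are installed, the device speaks SNMPv2c with the community string public, and it uses 192.168.1.10 as an example device IP:

snmpwalk -v 2c -c public 192.168.1.10 1.3.6.1.4.1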

Connect to the Core server’s Karaf shell

ssh -p 8101 admin@core-host-ip

Execute an SNMP walk against a device in the remote location

snmp:walk -l MyLocation IpAddressInMyLocation 1.3.6.1.4.1

This command helps you verify that a) the SNMP community configuration for the given host in the remote location is correct, b) the Minion can reach the device in its remote location, and c) RPC calls can be executed from the Core server to the Minion.

Run a monitor through a Minion

You can run an ad-hoc test from the Karaf CLI for every monitor that ships with your OpenNMS Core server. It shows you the exact same result that Pollerd would get when it polls the service to test its availability. This example uses the IcmpMonitor to ping a device in a remote location through a Minion.

Run an ICMP monitor through a Minion

opennms:poll -l MyLocation -t Time-To-Live-in-ms org.opennms.netmgt.poller.monitors.IcmpMonitor myIpAddress

The Time To Live (TTL) only applies to messages in the ActiveMQ communication.

If a poll is triggered manually through the Karaf CLI, the message TTL in ActiveMQ should be at least the number of retries x timeout in ms, e.g., 3 x 2000 ms = 6000 ms.

By default, the configured polling interval is used as the TTL, which is 5 minutes (300000 ms).
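As a concrete sketch, the following call assumes a location named MyLocation, an example target IP of 192.168.50.21, and a TTL of 6000 ms following the retries x timeout rule of thumb above:

opennms:poll -l MyLocation -t 6000 org.opennms.netmgt.poller.monitors.IcmpMonitor 192.168.50.21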

You can get a list of all available monitors with:

opennms:list-monitors
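If the list is long, you can filter it with the Karaf shell’s built-in grep, for example:

opennms:list-monitors | grep Icmp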

Further Reading

Find more useful Karaf commands in our Karaf CLI Cheat sheet.


:woman_facepalming: You can fix me, I’m a wiki post.
