Troubleshooting Minion connectivity and functions

Monitoring a distributed network with Minions makes your monitoring system more complex. This section helps you troubleshoot the OpenNMS components themselves. OpenNMS ships with the Apache Karaf OSGi runtime, and we have built in a few commands that can help you. To make things easier, I use the term core for the Horizon/Meridian server instance and Minion for the Minion :slight_smile:

Minion-to-Core communication

The first thing you really want to know is whether your core instance can communicate with a Minion and vice versa.

We start with the Minion. Replace minion-host with the Minion's IP address or FQDN.

Connect to the Minion’s Karaf Shell

ssh -p 8201 admin@minion-host

Run the health check command

opennms:health-check

Depending on your configuration, you should see output like the following:

Verifying the health of the container

Connecting to OpenNMS ReST API                                                                      [ Success  ]
Verifying installed bundles                                                                         [ Success  ]
Connecting to JMS Broker                                                                            [ Success  ]
Verifying Listener Single-Port-Flow-Listener (org.opennms.netmgt.telemetry.listeners.UdpListener)   [ Success  ]

=> Everything is awesome

The health check tests whether

  • installed Karaf bundles and features can be started
  • the Minion can connect to the configured message broker
  • the Minion can connect to the REST endpoint of the Core server
  • the flow listener configurations work (you only see this when you have a flow listener configured)

If you have issues, check the data/log/karaf.log file in your Minion directory or run the command log:tail in a second Karaf shell.
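If you want to check the result from a script rather than by eye, the output format shown above is simple enough to parse. The following is only a minimal sketch: it assumes you have captured the health check output as text, and the function name failed_checks and the status label Failure used in the comment are illustrative assumptions, not something taken from the OpenNMS documentation.

```python
import re

# Matches result lines such as "Connecting to JMS Broker    [ Success  ]".
CHECK_LINE = re.compile(r"^(?P<name>.+?)\s+\[\s*(?P<status>[^\]]+?)\s*\]\s*$")

# Shortened copy of the output shown in this post.
SAMPLE_OUTPUT = """\
Verifying the health of the container

Connecting to OpenNMS ReST API          [ Success  ]
Verifying installed bundles             [ Success  ]
Connecting to JMS Broker                [ Success  ]

=> Everything is awesome
"""


def failed_checks(output):
    """Return (check name, status) for every check that is not 'Success'."""
    failed = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match and match.group("status") != "Success":
            # Any other label (for example a hypothetical "Failure") is reported.
            failed.append((match.group("name"), match.group("status")))
    return failed


if __name__ == "__main__":
    problems = failed_checks(SAMPLE_OUTPUT)
    print("all checks passed" if not problems else problems)
```

An empty list means every check line reported Success. Anything else tells you which check to look for in karaf.log.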

Core-to-Minion communication

Remote Procedure Calls (RPC) are messages sent by the Core server instance to the Minion. RPC uses the message broker as its channel. If a node is associated with a location, the core server instance sends RPC messages to the corresponding remote location and a Minion executes the tests, e.g., runs the ICMP poller on IP interface w.x.y.z.

You can use the opennms:stress-rpc command on the core server as an end-to-end test to verify if the RPC path to a Minion works as expected.

As in the picture above, this test treats the message broker as a black box.

ssh -p 8101 admin@core-host-ip
opennms:stress-rpc -c 5 -l minion-location

The output looks like this:

Executing 5 requests.
Waiting for responses.

Done!

6/29/21, 5:17:24 PM ============================================================

-- Counters --------------------------------------------------------------------
failures
             count = 0
successes
             count = 5

-- Histograms ------------------------------------------------------------------
response-times
             count = 5
               min = 24
               max = 47
              mean = 36.80
            stddev = 8.93
            median = 37.00
              75% <= 46.00
              95% <= 47.00
              98% <= 47.00
              99% <= 47.00
            99.9% <= 47.00


Total milliseconds elapsed: 52
Milliseconds spent generating requests: 3
Milliseconds spent waiting for responses: 49

The important numbers here are the failures and successes counts. You should see 5 successes and no failures.

failures
             count = 0
successes
             count = 5

Another important metric is Milliseconds spent waiting for responses. It should be in the range of milliseconds, as in the example above. If you get failures, the core server got no response from the Minion in time. By default, it waits 20 seconds for a response.
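If you run this test regularly, you can let a script read the counters for you. This is a rough sketch that assumes the output has the layout shown in this post; the function names rpc_counters and rpc_path_ok are made up for illustration.

```python
import re

# Excerpt of the stress-rpc output shown in this post.
SAMPLE_OUTPUT = """\
-- Counters --------------------------------------------------------------------
failures
             count = 0
successes
             count = 5

-- Histograms ------------------------------------------------------------------
response-times
             count = 5
               min = 24
"""


def rpc_counters(output):
    """Read the failures and successes counts from the Counters section."""
    counters = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped in ("failures", "successes"):
            current = stripped  # the count belongs to the next line
            continue
        match = re.fullmatch(r"count = (\d+)", stripped)
        if current and match:
            counters[current] = int(match.group(1))
        current = None  # ignore counts of other metrics, e.g. response-times
    return counters


def rpc_path_ok(output, expected_requests):
    """True when all requests succeeded and none failed."""
    counters = rpc_counters(output)
    return counters.get("failures") == 0 and counters.get("successes") == expected_requests


if __name__ == "__main__":
    print(rpc_counters(SAMPLE_OUTPUT), rpc_path_ok(SAMPLE_OUTPUT, 5))
```

Pass the same number you gave to the -c option as expected_requests, so a run with missing responses is not mistaken for a healthy one.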

What if you have multiple Minions in a location?

You can run the RPC test against a specific Minion when you additionally provide the system ID (aka Minion ID). If you haven’t set it manually, it is a generated UUID. In my example, I’ve set it to a human-readable unique ID.

The following command runs the RPC test only against the Minion in minion-location with the ID minion-01.

opennms:stress-rpc -s minion-01 -c 5 -l minion-location

Execute a remote ping to arbitrary FQDNs or IP addresses

If you want to quickly check whether a Minion can ping a device in a remote network, you can do so without logging into the Minion in the remote network via SSH. You can run the ping from the core instead. Connect to the Karaf shell of the core server:

ssh -p 8101 admin@core-host-ip

Run the ping command with an FQDN or IP address. The -s minion-01 part is optional. If you have more than one Minion in a location, you can use it to choose which one sends the ping requests for you:

opennms:ping -s minion-01 -l minion-location www.google.com

The Minion and the message broker are treated as a black box in this test scenario. The ICMP ping is executed from the Minion to the FQDN/IP target, and the result is shipped over the message broker back to the core instance.

Execute remote DNS lookups

You can test and troubleshoot DNS configurations by executing arbitrary name lookups remotely. In this example, we run a DNS lookup on a specific Minion resolving the FQDN www.opennms.com. The Minion uses the DNS configuration from the operating system for the lookup:

opennms:dns-lookup -s minion-02 -l minion-location www.opennms.com

www.opennms.com resolves to: 142.242.42.42

The same also works for reverse lookups:

opennms:dns-reverse-lookup -s minion-02 -l minion-location 192.168.178.40

192.168.178.40 resolves to: ip4.wlp2s0.scummbar.labmonkeys.tech.
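Both lookup commands print their result in the same "X resolves to: Y" form, so one small helper can split either line if you want to compare the answer with what you expect. This is just a sketch based on the two output lines shown in this post; parse_lookup is a hypothetical helper name.

```python
def parse_lookup(line):
    """Split '<query> resolves to: <answer>' into (query, answer)."""
    query, separator, answer = line.strip().partition(" resolves to: ")
    if not separator:
        raise ValueError("not a lookup result line: %r" % line)
    return query, answer


if __name__ == "__main__":
    print(parse_lookup("www.opennms.com resolves to: 142.242.42.42"))
    print(parse_lookup("192.168.178.40 resolves to: ip4.wlp2s0.scummbar.labmonkeys.tech."))
```

The answer is returned exactly as printed, so the reverse lookup keeps the trailing dot of the fully qualified name.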

Execute SNMP commands through a Minion

This section describes how to run SNMP commands from the core server to a device behind a Minion in a remote network. It is equivalent to executing the snmpwalk command from the Minion to your SNMP agent on your device.

Connect to the Minion's Karaf Shell

ssh -p 8201 admin@minion-host-ip

Execute an SNMP walk against a device in the remote location

snmp-walk -l MyLocation IpAddressInMyLocation 1.3.6.1.4.1

This command helps you verify that a) the SNMP community configuration for a given host in a remote location is correct, b) the Minion can reach the device in its remote location, and c) RPC calls can be executed from the Core server to the Minion.

Run a monitor through a Minion

You can run an ad-hoc test for every monitor that comes with your OpenNMS core server from the Karaf CLI. It shows you the same result Pollerd gets when it tests whether a service is available. This example uses the IcmpMonitor to ping a device in a remote location through a Minion.

Run an ICMP monitor through a Minion

opennms:poll -l MyLocation -t Time-To-Live-in-ms org.opennms.netmgt.poller.monitors.IcmpMonitor myIpAddress

The Time To Live (TTL) applies only to the messages exchanged over ActiveMQ.

If you trigger a poll manually through the Karaf CLI, the message TTL in ActiveMQ should be at least the number of retries x timeout in ms, e.g., 3 x 2000ms = 6000ms.

If you don't set a TTL, the configured polling interval is used, which defaults to 5 minutes (300000 ms).
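The retries x timeout rule for the TTL is easy to work out up front. Here is a small sketch of that arithmetic which also assembles the poll command in the form used in this post. The helper names and the address 192.0.2.10 are placeholders I picked for illustration, so replace them with your own values.

```python
def min_message_ttl_ms(retries, timeout_ms):
    """Smallest TTL in ms that covers every retry of a manual poll."""
    if retries < 1 or timeout_ms < 1:
        raise ValueError("retries and timeout_ms must be positive")
    return retries * timeout_ms


def poll_command(location, monitor_class, ip_address, ttl_ms):
    """Assemble the opennms:poll line in the form shown in this post."""
    return f"opennms:poll -l {location} -t {ttl_ms} {monitor_class} {ip_address}"


# 3 retries x 2000 ms timeout = 6000 ms, the example from this post.
ttl = min_message_ttl_ms(retries=3, timeout_ms=2000)

if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a real device.
    print(poll_command(
        "MyLocation",
        "org.opennms.netmgt.poller.monitors.IcmpMonitor",
        "192.0.2.10",
        ttl,
    ))
```

Treat the result as a lower bound. A somewhat larger TTL leaves room for the time the message spends in the broker.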

You can get a list of all available monitors with:

opennms:list-monitors

Further reading

Find more useful Karaf commands in our Karaf CLI Cheat sheet.


:woman_facepalming: You can fix me, I’m a wiki post.


hello,

I'm seeing complete failures for opennms:stress-rpc. Any suggestions on how to fix that?

regards
Phanindra