Monitoring a distributed network with Minions makes your monitoring system more complex. This section helps you troubleshoot the OpenNMS components themselves. OpenNMS ships with the Apache Karaf OSGi runtime, which includes a few built-in commands that can help you. To keep things simple, I use the term core for the Horizon/Meridian server instance and Minion for the Minion.
The first thing you want to know is: can my core instance communicate with a Minion, and vice versa?
We start with the Minion. Replace minion-host with the Minion's IP address or FQDN.
Connect to the Minion’s Karaf Shell
ssh -p 8201 admin@minion-host
Run the health check command:

opennms:health-check
Depending on your configuration, you should see the following output:
Verifying the health of the container

Connecting to OpenNMS ReST API    [ Success ]
Verifying installed bundles       [ Success ]
Connecting to JMS Broker          [ Success ]
Verifying Listener Single-Port-Flow-Listener (org.opennms.netmgt.telemetry.listeners.UdpListener)    [ Success ]

=> Everything is awesome
The health check verifies that:
- installed Karaf bundles and features can be started
- the configured message broker is reachable
- the REST endpoint of the core server is reachable
- flow listener configurations work (shown only when you have a flow listener configured)
If you have issues, check the data/log/karaf.log file in your Minion directory, or run log:tail in a second Karaf shell.
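When the health check reports broker or REST failures, a quick TCP reachability test from the Minion host can narrow down whether it is a network problem at all. Here is a minimal sketch; the host name core-host, the REST port 8980, and the ActiveMQ port 61616 are assumptions for illustration, so substitute your actual hosts and ports:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical endpoints -- replace with your core and broker hosts/ports.
for name, host, port in [
    ("OpenNMS REST API", "core-host", 8980),
    ("ActiveMQ broker", "core-host", 61616),
]:
    state = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"{name} ({host}:{port}): {state}")
```

This only proves TCP connectivity; authentication or broker configuration problems still require the health check and the Karaf log.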
Remote Procedure Calls (RPC) are messages sent by the core server instance to the Minion. RPC uses the message broker as a channel. If a node is associated with a location, the core server sends RPC messages to the corresponding remote location, and a Minion executes the tests, e.g., runs the ICMP poller on IP interface w.x.y.z.
You can use the opennms:stress-rpc command on the core server as an end-to-end test to verify that the RPC path to a Minion works as expected. This test treats the message broker as a black box.
ssh -p 8101 admin@core-host-ip
opennms:stress-rpc -c 5 -l minion-location
The output looks like this:
Executing 5 requests.
Waiting for responses.
Done!
6/29/21, 5:17:24 PM ============================================================

-- Counters --------------------------------------------------------------------
failures
             count = 0
successes
             count = 5

-- Histograms ------------------------------------------------------------------
response-times
             count = 5
               min = 24
               max = 47
              mean = 36.80
            stddev = 8.93
            median = 37.00
              75% <= 46.00
              95% <= 47.00
              98% <= 47.00
              99% <= 47.00
            99.9% <= 47.00

Total milliseconds elapsed: 52
Milliseconds spent generating requests: 3
Milliseconds spent waiting for responses: 49
Important here are the failures and successes counters. You should see 5 successes and no failures:

failures  count = 0
successes count = 5
Another important metric is Milliseconds spent waiting for responses, which should stay in the low millisecond range. If you get failures, the core server got no response from the Minion; by default, it waits 20 seconds for a response before timing out.
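If you script these checks, e.g., as a smoke test after a deployment, you can parse the counters out of the captured command output. A minimal sketch; the parsing pattern matches the output format shown above, and the helper function is mine, not part of OpenNMS:

```python
import re

def parse_rpc_counters(output: str) -> dict:
    """Extract the failures/successes counters from opennms:stress-rpc output."""
    counters = {}
    for name in ("failures", "successes"):
        # Matches e.g. "failures ... count = 0"; whitespace and newlines may vary.
        m = re.search(rf"{name}\s+count\s*=\s*(\d+)", output)
        counters[name] = int(m.group(1)) if m else None
    return counters

sample = """
-- Counters --------------------------------------------------------------------
failures
             count = 0
successes
             count = 5
"""
print(parse_rpc_counters(sample))  # {'failures': 0, 'successes': 5}
```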
What if you have multiple Minions in a location?
You can run the RPC test against a specific Minion when you additionally provide the system ID (aka Minion ID). If you haven’t set it manually it is a generated UUID. In my example, I’ve set it to a human-readable unique ID.
The following command runs the RPC ping only against the Minion in minion-location with the system ID minion-01:
opennms:stress-rpc -s minion-01 -c 5 -l minion-location
If you want to quickly check whether a Minion can ping a device in a remote network, you don't need to log into the Minion via SSH in the remote network. You can run the ping from the core instead. Connect to the Karaf shell of the core server:
ssh -p 8101 admin@core-host-ip
Run the ping command with an FQDN or IP address. The -s minion-01 option is optional; if you have more than one Minion in a location, it lets you specify which one should send the ping requests for you:
opennms:ping -s minion-01 -l minion-location www.google.com
The Minion and message broker are treated as a black box in this test scenario. The ICMP ping is executed from the Minion to the FQDN/IP target, and the result is shipped over the message broker back to the core instance.
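Karaf's SSH server can also execute a single command non-interactively, which is handy for scripting these checks from another host. A small sketch that builds such an invocation; the helper name is mine, and the non-interactive behavior depends on your SSH and Karaf setup:

```python
def karaf_ssh_command(host: str, command: str,
                      user: str = "admin", port: int = 8101) -> list:
    """Build an argv list that runs a single Karaf shell command over SSH."""
    return ["ssh", "-p", str(port), f"{user}@{host}", command]

argv = karaf_ssh_command("core-host-ip",
                         "opennms:ping -l minion-location www.google.com")
print(" ".join(argv))
# To actually run it: subprocess.run(argv, capture_output=True, text=True)
```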
You can test and troubleshoot DNS configurations by executing arbitrary name lookups remotely. In this example, we run a DNS lookup on a specific Minion resolving the FQDN www.opennms.com. The Minion uses the DNS configuration from the operating system for the lookup:
opennms:dns-lookup -s minion-02 -l minion-location www.opennms.com

www.opennms.com resolves to: 22.214.171.124
The same works also for reverse lookups:
opennms:dns-reverse-lookup -s minion-02 -l minion-location 192.168.178.40

192.168.178.40 resolves to: ip4.wlp2s0.scummbar.labmonkeys.tech.
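To compare the Minion's answers with what another host's resolver returns, you can run the same forward and reverse lookups locally. A minimal sketch using Python's standard resolver, which, like the Minion, uses the operating system's DNS configuration:

```python
import socket

def forward_lookup(name: str) -> str:
    """Resolve an FQDN to an IPv4 address via the OS resolver."""
    return socket.gethostbyname(name)

def reverse_lookup(ip: str) -> str:
    """Resolve an IP address back to a hostname via the OS resolver."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname

print(forward_lookup("localhost"))  # typically 127.0.0.1
```

If the local result and the Minion's result differ, check the DNS configuration of the Minion's operating system.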
This section describes how to run SNMP commands from the core server against a device behind a Minion in a remote network. It is equivalent to executing the snmpwalk command from the Minion against the SNMP agent on your device.
Connect to the Horizon Core Karaf Shell
ssh -p 8101 admin@core-host-ip
Execute an SNMP walk against a device in the remote location
snmp-walk -l MyLocation IpAddressInMyLocation .1.3.6.1.4.1
This command helps you verify that a) the SNMP community configuration for a given host in a remote location is correct, b) the Minion can reach the device in its remote location, and c) RPC calls can be executed from the core server to the Minion.
You can run an ad hoc test for every monitor that ships with your OpenNMS core server from the Karaf CLI. It shows you exactly the same result Pollerd would get when it tests whether a service is available. This example uses the IcmpMonitor to ping a device in a remote location through a Minion.
Run an ICMP monitor through a Minion
opennms:poll -l MyLocation -t Time-To-Live-in-ms org.opennms.netmgt.poller.monitors.IcmpMonitor myIpAddress
The Time To Live (TTL) only applies to messages in the ActiveMQ communication.
When a poll is triggered manually through the Karaf CLI, the message TTL in ActiveMQ should be at least retries x timeout in ms, e.g., 3 x 2000 ms = 6000 ms.
By default, the configured polling interval is used, which defaults to 5 minutes (300000 ms).
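The rule of thumb above is easy to encode when you script manual polls. A tiny sketch of the calculation; the function name is mine, and retries and timeout come from your poller configuration:

```python
def min_message_ttl_ms(retries: int, timeout_ms: int) -> int:
    """Minimum ActiveMQ message TTL so that all retries fit within the TTL:
    retries x timeout, e.g. 3 x 2000 ms = 6000 ms."""
    return retries * timeout_ms

print(min_message_ttl_ms(3, 2000))  # 6000
```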
You can get a list of all available monitors with:

opennms:list-monitors
Find more useful Karaf commands in our Karaf CLI Cheat sheet.
You can fix me, I’m a wiki post.