Minion-RPC service failing

Problem:
One of our minions has stopped responding/polling. In the log on the core OpenNMS node we see the following:

2021-12-30 07:15:57,798 WARN  [Camel (rpcClient) thread #1258 - JmsReplyManagerOnTimeout[OpenNMS.....
2021-12-30 07:15:57,805 ERROR [Camel (rpcClient) thread #1258 - JmsReplyManagerOnTimeout[OpenNMS.minion-node.RPC.Poller]] o.a.c.p.DefaultErrorHandler: Failed delivery for....
 Exhausted after delivery attempt: 1 caught: org.apache.camel.ExchangeTimedOutException: The OUT message was not received within: 300000 millis due reply message with correlationID:

In the web UI, under Manage Minions, we can see that the Minion-RPC service on 127.0.0.1 is failing:

Last Fail 2021-12-30T16:21:11+01:00

However, the service is not turning red.

Running the test below from the core node against the minion with the problem times out:

admin@opennms()> opennms:stress-rpc -c 5 -l minion-node
Executing 5 requests.
Waiting for responses.
...................
Done!

12/30/21, 4:29:11 PM ===========================================================

-- Counters --------------------------------------------------------------------
failures
             count = 5
successes
             count = 0

-- Histograms ------------------------------------------------------------------
response-times
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00


Total milliseconds elapsed: 20167
Milliseconds spent generating requests: 0
Milliseconds spent waiting for responses: 20167

Expected outcome:
We should be able to run monitors via our minion.

OpenNMS version: 28.1.1

Health check on minion:

admin@minion> health:check
Verifying the health of the container

Connecting to OpenNMS ReST API   [ Success  ]
Verifying installed bundles      [ Success  ]
Connecting to JMS Broker         [ Success  ]

=> Everything is awesome

Any idea what else to try?

Happy new year!

On the core, what is the output of opennms:activemq-stats?

admin@opennms()> opennms:activemq-stats
Broker statistics:
        Connections: 40
        Memory percent usage: 0%
        Memory usage: 6.6 KiB
        Memory limit: 1.0 GiB
Destination statistics (top 5):
        OpenNMS.txxxxxx-minion.RPC.Poller (Queue)
                Message count: 3
                Enqueue count: 16509
                Dequeue count: 16506
                Cursor full: false
        OpenNMS.mxxxxxx-minion.RPC.Poller (Queue)
                Message count: 0
                Enqueue count: 1792438
                Dequeue count: 1792438
                Cursor full: false
        OpenNMS.vxxxxxx-minion.RPC.Poller (Queue)
                Message count: 0
                Enqueue count: 1227435
                Dequeue count: 1227435
                Cursor full: false
        OpenNMS.mxxxxxx-minion.RPC.SNMP (Queue)
                Message count: 0
                Enqueue count: 542802
                Dequeue count: 542802
                Cursor full: false
        OpenNMS.Sink.Trap (Queue)
                Message count: 0
                Enqueue count: 466354
                Dequeue count: 467000
                Cursor full: false

I don't see the minion in question listed there.

You can pass the following option to opennms:activemq-stats:

        -n, --top-n
                Only show the Top N destinations (set to 0 to show all)
                (defaults to 5)

to see more.

When a minion on ActiveMQ stops responding, it is usually because the queues are wedged by producer flow control. If you can identify the queues for that minion, you can use opennms:activemq-purge-queue to clear them and potentially knock it loose again.
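If the full list of destinations is long, a small script can help pick out the ones worth a closer look. The following is only a sketch: it reads text pasted from opennms:activemq-stats (in the layout shown in this thread) and lists the destinations for one minion location that still hold messages. It does not talk to the broker or purge anything, the function names are made up for illustration, and a nonzero message count is only a hint, not proof that flow control is the cause.

```python
import re

# Matches a destination header such as "OpenNMS.foo.RPC.Poller (Queue)".
DEST_RE = re.compile(r"^\s*(\S+)\s+\((Queue|Topic)\)\s*$")
# Matches a counter line such as "Message count: 3".
STAT_RE = re.compile(r"^\s*(Message count|Enqueue count|Dequeue count):\s*(\d+)\s*$")


def parse_destinations(text):
    """Return {destination name: {counter name: int}} from pasted stats output."""
    stats = {}
    current = None
    for line in text.splitlines():
        dest = DEST_RE.match(line)
        if dest:
            current = dest.group(1)
            stats[current] = {}
            continue
        stat = STAT_RE.match(line)
        if stat and current is not None:
            stats[current][stat.group(1)] = int(stat.group(2))
    return stats


def backlogged(stats, location):
    """Names of destinations for `location` that still hold undelivered messages."""
    return sorted(
        name
        for name, counters in stats.items()
        if f".{location}." in name and counters.get("Message count", 0) > 0
    )


# Two entries copied from the output earlier in this thread.
SAMPLE = """\
Destination statistics (top 5):
        OpenNMS.txxxxxx-minion.RPC.Poller (Queue)
                Message count: 3
                Enqueue count: 16509
                Dequeue count: 16506
                Cursor full: false
        OpenNMS.mxxxxxx-minion.RPC.Poller (Queue)
                Message count: 0
                Enqueue count: 1792438
                Dequeue count: 1792438
                Cursor full: false
"""

if __name__ == "__main__":
    print(backlogged(parse_destinations(SAMPLE), "txxxxxx-minion"))
```

With the sample above, only the txxxxxx-minion Poller queue is reported, since it is the only one with a nonzero message count.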

Here are the queues I found regarding that minion. Should I try to purge them all?

        OpenNMS.ixxxxxx-minion.RPC.Collect (Queue)
                Message count: 0
                Enqueue count: 429
                Dequeue count: 429
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.DNS (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.Detect (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.Echo (Queue)
                Message count: 0
                Enqueue count: 2145
                Dequeue count: 2145
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.PING (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.PING-SWEEP (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.Poller (Queue)
                Message count: 0
                Enqueue count: 36746
                Dequeue count: 36746
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.Requisition (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        OpenNMS.ixxxxxx-minion.RPC.SNMP (Queue)
                Message count: 0
                Enqueue count: 12103
                Dequeue count: 12103
                Cursor full: false

        ixxxxxx.ixxxxxx.xx.ixxxxxx-minion.RPC.Collect (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        ixxxxxx.ixxxxxx.xx.ixxxxxx-minion.RPC.Poller (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false
        ixxxxxx.ixxxxxx.xx.ixxxxxx-minion.RPC.SNMP (Queue)
                Message count: 0
                Enqueue count: 0
                Dequeue count: 0
                Cursor full: false

That all looks reasonable.

Is the time in sync (to within a second or so deviation) between the core and the minion?

The time was offset by -26.038188 sec. I have adjusted the time so it's in sync now; however, the problem is still there. I also tried restarting the minion afterwards.
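For anyone repeating the clock check, here is a minimal sketch of the comparison, using the "within a second or so" tolerance suggested above. It assumes the time tool reports the offset as "offset <seconds> sec" (an assumption about the output format, so adjust the pattern to your tool), and the function names are hypothetical.

```python
import re

# Assumption: the offset is reported as "offset <seconds> sec".
# Adjust the pattern to match the output of your time tool.
OFFSET_RE = re.compile(r"offset\s+([+-]?\d+(?:\.\d+)?)\s*sec")


def parse_offset_seconds(text):
    """Extract the first clock offset (in seconds) found in `text`."""
    match = OFFSET_RE.search(text)
    if match is None:
        raise ValueError("no clock offset found in input")
    return float(match.group(1))


def in_sync(offset_seconds, tolerance_seconds=1.0):
    """True when the absolute offset is within the tolerance."""
    return abs(offset_seconds) <= tolerance_seconds


if __name__ == "__main__":
    offset = parse_offset_seconds("offset -26.038188 sec")
    print(offset, in_sync(offset))
```

An offset of -26.038188 sec is well outside a one-second tolerance, so this check would flag it.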

It is working again! Guess it just needed some time to settle after the clock adjustment. Thanks a million for the great support. Have a great day.