Misleading state: Minions are down but health check is OK

Hello everyone,

The migration from the remote pollers to Minions has led to some strange situations:

  • The Minions all report themselves as healthy:
admin@minion> health-check 
Verifying the health of the container
Connecting to OpenNMS ReST API   [ Success  ]
Verifying installed bundles      [ Success  ]
Connecting to JMS Broker         [ Success  ]
=> Everything is awesome
  • In contrast, the central WebUI reports outages:

[Screenshot, 2021-01-02 13:21:01]

  • which started suddenly without any external intervention:

  • but the minions are passing back data as expected:

The outage explanation does not make clear what is going on:

Can you share your screen from the “Manage Minions” page and the Minion IDs?

Yes, of course:

and the problems are still visible (even if they probably do not exist):

Just out of curiosity, do you run the Minions in Docker without configuring a Minion identifier? I ask because the Minion names are generated UUIDs.

No, those four Minions are based on a plain CentOS package installation.

Digging a little deeper: the poller log file shows some new errors that appeared when the Minions were introduced for remote polling. The strange part is this line:

java.lang.IllegalArgumentException: Monitor or monitor class name is required.

which stands out among the many messages (see below). Similar messages appear for these services:

  • svcName=JMX-Minion

  • svcName=Minion-Heartbeat

  • svcName=Minion-RPC

An extract of the log follows…

2021-01-08 00:00:25,786 ERROR [Poller-Thread-29-of-30] o.o.n.p.p.PollableServiceConfig: Unexpected exception while polling PollableService[location=smile@contabo2, interface=PollableInterface [PollableNode [268]:127.0.0.1], svcName=Minion-RPC]. Marking service as DOWN
    java.lang.IllegalArgumentException: Monitor or monitor class name is required.
    	at org.opennms.netmgt.poller.client.rpc.PollerRequestBuilderImpl.execute(PollerRequestBuilderImpl.java:134) ~[org.opennms.features.poller.client-rpc-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableServiceConfig.poll(PollableServiceConfig.java:135) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableService.poll(PollableService.java:191) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableElement.poll(PollableElement.java:309) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableContainer$5.run(PollableContainer.java:319) [opennms-services-27.0.3.jar:?]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:240) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:227) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableContainer.poll(PollableContainer.java:326) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableInterface.poll(PollableInterface.java:228) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableContainer$5.run(PollableContainer.java:319) [opennms-services-27.0.3.jar:?]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:240) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:227) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableContainer.poll(PollableContainer.java:326) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableNode$3.run(PollableNode.java:288) [opennms-services-27.0.3.jar:?]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:240) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:227) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableNode.doPoll(PollableNode.java:291) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableElement.doPoll(PollableElement.java:184) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableService.doPoll(PollableService.java:215) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableService$PollRunner.run(PollableService.java:61) [opennms-services-27.0.3.jar:?]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:276) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableElement.withTreeLock(PollableElement.java:259) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableService.doRun(PollableService.java:405) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.poller.pollables.PollableService.run(PollableService.java:380) [opennms-services-27.0.3.jar:?]
    	at org.opennms.netmgt.scheduler.Schedule.run(Schedule.java:142) [org.opennms.core.daemon-27.0.3.jar:?]
    	at org.opennms.netmgt.scheduler.Schedule$ScheduleEntry.run(Schedule.java:86) [org.opennms.core.daemon-27.0.3.jar:?]
    	at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:179) [org.opennms.core.daemon-27.0.3.jar:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
    	at org.opennms.core.concurrent.LogPreservingThreadFactory$3.run(LogPreservingThreadFactory.java:124) [opennms-util-27.0.3.jar:?]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
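
To see at a glance which services are affected, the error lines can be grouped by service name. The following is only a rough sketch, assuming the log line layout shown in the extract above; the function name `failing_services` and the regular expression are illustrative and not part of OpenNMS.

```python
import re

# Matches the service name inside a PollableService[...] description.
SVC_NAME = re.compile(r"svcName=([^\]\s,]+)")
# Text that introduces the poller error lines in the extract above.
MARKER = "Unexpected exception while polling"


def failing_services(lines):
    """Count poller error lines per service name."""
    counts = {}
    for line in lines:
        if MARKER not in line:
            continue  # skip stack trace lines and unrelated messages
        match = SVC_NAME.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


# Sample input copied from the log extract (one error line plus its cause).
sample = [
    "2021-01-08 00:00:25,786 ERROR [Poller-Thread-29-of-30] "
    "o.o.n.p.p.PollableServiceConfig: Unexpected exception while polling "
    "PollableService[location=smile@contabo2, interface=PollableInterface "
    "[PollableNode [268]:127.0.0.1], svcName=Minion-RPC]. Marking service as DOWN",
    "    java.lang.IllegalArgumentException: Monitor or monitor class name is required.",
]

print(failing_services(sample))  # {'Minion-RPC': 1}
```

To run it against a real log, pass an open file object instead of `sample`; the location of `poller.log` depends on the installation.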

Are you missing the monitor definitions from the bottom of poller-configuration.xml for those services?

   <monitor service="Minion-Heartbeat" class-name="org.opennms.netmgt.poller.monitors.MinionHeartbeatMonitor"/>
   <monitor service="Minion-RPC" class-name="org.opennms.netmgt.poller.monitors.MinionRpcMonitor"/>
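
A quick way to check for this is to compare the service names polled in the packages with the names that have a `<monitor>` entry. The sketch below is hedged: it assumes the usual layout of `<service name="...">` elements inside packages and `<monitor service="...">` elements at the bottom, and the sample XML is a simplified, made-up fragment rather than a complete poller-configuration.xml. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{uri}service' -> 'service'."""
    return tag.rsplit("}", 1)[-1]


def services_without_monitor(xml_text):
    """Return polled service names that have no <monitor service="..."> entry."""
    root = ET.fromstring(xml_text)
    polled, monitored = set(), set()
    for elem in root.iter():
        name = local(elem.tag)
        if name == "service" and elem.get("name"):
            polled.add(elem.get("name"))
        elif name == "monitor" and elem.get("service"):
            monitored.add(elem.get("service"))
    return sorted(polled - monitored)


# Simplified, illustrative fragment (not a complete configuration file).
sample = """
<poller-configuration xmlns="http://xmlns.opennms.org/xsd/config/poller">
  <package name="Minion">
    <service name="Minion-Heartbeat" interval="30000"/>
    <service name="Minion-RPC" interval="30000"/>
    <service name="JMX-Minion" interval="30000"/>
  </package>
  <monitor service="Minion-Heartbeat"
           class-name="org.opennms.netmgt.poller.monitors.MinionHeartbeatMonitor"/>
</poller-configuration>
"""

print(services_without_monitor(sample))  # ['JMX-Minion', 'Minion-RPC']
```

Any name this prints is polled without a monitor class, which matches the "Monitor or monitor class name is required" error.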

Hello,

Thanks for the idea. I don't know why, but the three Minion-related monitor definitions were missing. After I re-added them, the status bar turned green again. Many thanks!