How do I get a poller to alert only after X consecutive polling failures?

I have a poller service defined with the following config:

  <service name="ABBCS-KernelMon" interval="3600000" user-defined="false" status="on">
     <parameter key="script" value="/opt/opennms/scripts/run_monitor.sh"/>
     <parameter key="retry" value="0"/>
     <parameter key="args" value="${nodeid} ${nodelabel} ${ipaddr} ${svcname} /opt/opennms/scripts/check_kernel.sh"/>
     <parameter key="timeout" value="900000"/>
     <parameter key="rrd-base-name" value="ABBCS-KernelMon"/>
     <parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
     <parameter key="ds-name" value="ABBCS-KernelMon"/>
  </service>
  <monitor service="ABBCS-KernelMon" class-name="org.opennms.netmgt.poller.monitors.SystemExecuteMonitor"/>

It exists only to raise an alert if someone forgets to reboot a server after patching. The poller runs hourly, so as it stands it can trigger an alert before the person doing the OS patching has had a chance to reboot.

What I’d like to do is change it so that the poller requires multiple consecutive failures before it raises an alert. That would give me a grace period of 2 or 3 hours in which the condition exists but no alert is raised, leaving time to reboot the server. The alert would then fire only once it is likely that the reboot was skipped.

Is there a native way I can set the poller service to not trigger an alert unless it fails more than a specified number of checks?

It’s not possible right now. You can follow this issue that describes the feature.

https://issues.opennms.org/browse/NMS-10472

Feel free to add your ideas.
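
Until then, one possible workaround is to count consecutive failures inside the script itself rather than in the poller. The following is only a sketch: `check_with_grace` and the state file are hypothetical names, not part of OpenNMS, and it assumes the monitor treats a non-zero exit status from the script as a failed poll.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNMS): report failure only after
# THRESHOLD consecutive failed checks, tracked in a state file.
# Usage: check_with_grace STATE_FILE THRESHOLD COMMAND [ARGS...]
check_with_grace() {
    state_file=$1
    threshold=$2
    shift 2

    # Read the previous consecutive-failure count (0 if there is no state yet).
    count=0
    if [ -f "$state_file" ]; then
        count=$(cat "$state_file")
    fi
    count=${count:-0}

    if "$@"; then
        # Check passed: reset the counter and report success.
        echo 0 > "$state_file"
        return 0
    fi

    # Check failed: increment the counter and persist it.
    count=$((count + 1))
    echo "$count" > "$state_file"

    # Report failure only once the threshold has been reached.
    if [ "$count" -ge "$threshold" ]; then
        return 1
    fi
    return 0
}
```

With an hourly interval and a threshold of 3, the first two failed checks would still exit 0, and the third consecutive failure would exit 1, roughly two hours after the condition was first seen. The state file would need to be unique per node and service, and the service will appear up in OpenNMS during the grace period.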
