SNMP timeouts / service monitoring

Hello,
So I’m working on dialing in some items so I don’t get so many false positives.
#1 on that list is SNMP…

When OpenNMS walks the SNMP tree, does it scan every OID or just the discovered ones? I ask because I periodically see the service reported as unavailable for no apparent reason, and I am wondering if it’s a timeout.

Question:
Does the SNMP service need to be monitored in order to pull the SNMP data?

thanks
Aaron

No, the service only has to exist for data collection to occur; it does not need to be actively polled.

OpenNMS doesn’t walk the tree at all. It performs an SNMP bulk get of OIDs in the collection and nothing else.

Lots of devices have really terrible SNMP implementations. Some of them fall over if you request too many OIDs at once, or don’t support bulk get properly, or myriad other details that vary by device. Usually these two parameters are enough to sort it out:

      <attribute name="max-vars-per-pdu" type="int" use="optional" default="10" >
        <annotation>
          <documentation>
            Number of variables to send per SNMP request.
          </documentation>
        </annotation>
      </attribute>

      <attribute name="max-repetitions" type="int" use="optional" default="2" >
        <annotation>
          <documentation>
            Number of repetitions to send per get-bulk request.
          </documentation>
        </annotation>
      </attribute>

In a lot of cases, problematic SNMP implementations do better with max-vars-per-pdu=1, but it will mean more requests are sent to the node and may make collection slower overall, so it’s something you want to apply per device and not globally.
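As a rough sketch of what "per device" could look like, assuming these attributes are set on a `definition` element in `snmp-config.xml`: the address `192.0.2.10`, the community string, and the global timeout/retry values below are placeholders, not values from this thread.

```xml
<!-- Sketch only: placeholder address and community; adjust to your environment. -->
<snmp-config xmlns="http://xmlns.opennms.org/xsd/config/snmp"
             version="v2c" read-community="public" timeout="1800" retry="1">
  <!-- Applies only to the one problematic device; everything else keeps the defaults. -->
  <definition max-vars-per-pdu="1" max-repetitions="1">
    <specific>192.0.2.10</specific>
  </definition>
</snmp-config>
```

Scoping the override to a `specific` address keeps the extra request volume limited to the device that needs it.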

Thanks, I will toy around with that setting…

I am also seeing nodes alerting that they are down even though they are not. There are a lot of false positives all around. The only way I can stem the alerting is to set a 1m delay on the destination path before it sends an alert out. Is this normal, or do I need to tweak the monitored services somehow?

I have tried changing some ICMP settings and adding a retry:

  <service name="ICMP" interval="3000" user-defined="false" status="on">
     <parameter key="retry" value="2"/>
     <parameter key="timeout" value="300"/>

I guess what I’m asking is: why are there so many false positives? I am thinking the SNMP problem might be part of a larger configuration issue that I haven’t nailed down yet…

Thanks and Cheers!

You’re polling ICMP every three seconds?! with a three-tenths of a second timeout?!!

Those fields are in milliseconds :)
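For comparison, here is a sketch of the same service with values that read sensibly in milliseconds. The numbers are illustrative assumptions (a 5-minute interval and a 3-second timeout), not a tuned recommendation for any particular network.

```xml
<!-- Sketch only: illustrative values, all expressed in milliseconds. -->
<service name="ICMP" interval="300000" user-defined="false" status="on">
   <!-- Two retries after the first attempt before the service is considered down. -->
   <parameter key="retry" value="2"/>
   <!-- 3000 ms = 3 s per attempt, versus 300 ms = 0.3 s in the snippet above. -->
   <parameter key="timeout" value="3000"/>
</service>
```

With these numbers, a single slow reply no longer flips the service to down, since each attempt gets 3 seconds and there are two retries.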

I can see the dilemma there, gosh! Let me go through and check all those timeouts… they are all the out-of-the-box defaults… I bet SNMP is probably set too low, as is probably everything else, geez…

Thanks for that, I bet that will clear a lot up once tweaked! Do services need to be bounced after modifying poller-configuration.xml?

Yes


I’m not sure what I did here, but OpenNMS keeps crashing due to too many open files…

I specifically set a high nofile limit. Here is the web log:

2021-08-04 20:09:02,169 INFO  [Main] o.s.w.s.m.m.a.RequestMappingHandlerAdapter: Looking for @ControllerAdvice: WebApplicationContext for namespace 'dispatcher-servlet': startup date [Wed Aug 04 20:09:00 EDT 2021]; parent: Root WebApplicationContext
2021-08-04 20:09:02,325 INFO  [Main] o.s.w.s.DispatcherServlet: FrameworkServlet 'dispatcher': initialization completed in 1896 ms
2021-08-04 20:09:02,325 WARN  [Main] o.e.j.w.WebAppContext: Failed startup of context o.e.j.w.WebAppContext@51165ec8{opennms,/opennms,[file:///opt/opennms/jetty-webapps/opennms/, jar:file:///opt/opennms/jetty-webapps/opennms/WEB-INF/lib/swagger-ui-3.13.0.jar!/META-INF/resources],UNAVAILABLE}{/opt/opennms/jetty-webapps/opennms}
org.eclipse.jetty.util.MultiException: Multiple exceptions
        at org.eclipse.jetty.util.MultiException.ifExceptionThrow(MultiException.java:122) ~[jetty-util-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:757) ~[jetty-servlet-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379) ~[jetty-servlet-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1449) ~[jetty-webapp-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1414) ~[jetty-webapp-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:911) ~[jetty-server-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288) ~[jetty-servlet-9.4.38.v20210224.jar:9.4.38.v20210224]
        at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java