Node Standby IP

I have a node with a standby IP address, meaning the standby IP will not answer until the primary IP goes down. For example:
Primary: 1.2.3.4
Secondary: 9.8.7.6

In the node configuration, I set the following:
1.2.3.4(ICMP) SNMP Primary = P
9.8.7.6(ICMP) SNMP Primary = S (Also tried with N [Not Eligible])

Problem:
OpenNMS always detects the secondary interface as down. Because there is only one service on that secondary interface (ICMP), it creates an outage.

Expected outcome:
How can I avoid having an outage when the second interface is in fact a standby IP that will only answer ICMP once the first one goes down?

OpenNMS version:
27.0.2

The S isn’t for Standby, it is for “Secondary SNMP”. Only one interface is polled for SNMP metrics. What you are setting is that this interface will only be used for SNMP if the primary goes down.

I think what you are seeing is the expected behavior. As far as ICMP/ping goes, if the node doesn’t respond on the secondary IP while the primary is live, OpenNMS will show that interface as down.

Thanks for the answer.

I know that S is for secondary. I am just trying to figure out how to make OpenNMS display only an alert for my node instead of an outage.

Concretely, this is the behavior I want:

If ip1 responds and ip2 does not respond, alert only.
If ip1 does not respond but ip2 responds, alert only.
If neither ip1 nor ip2 responds, alert and outage.
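
As a rough sketch, the three rules above can be written as a small truth table. The enum and function names are hypothetical, and the case where both IPs respond is not covered by the rules, so treating it as OK is an assumption.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"                    # assumption: not one of the three stated rules
    ALERT = "alert only"
    OUTAGE = "alert and outage"


def desired_status(ip1_responds: bool, ip2_responds: bool) -> Status:
    """Map the ping results of both IPs to the behavior described above."""
    if not ip1_responds and not ip2_responds:
        # Both addresses are silent: the node itself is considered down.
        return Status.OUTAGE
    if ip1_responds != ip2_responds:
        # Exactly one address answers: alert, but no outage.
        return Status.ALERT
    # Both answer; the rules do not say what should happen here (assumed OK).
    return Status.OK


for ip1 in (True, False):
    for ip2 in (True, False):
        print(ip1, ip2, desired_status(ip1, ip2).value)
```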

Is there a way to get this working?
Thanks

I’m not really sure. In that case, there is an Interface Outage, not a Node Outage, so while there is an outage either way, the severity of the outage is not the same for you.

Understood.

It would be interesting if OpenNMS added a feature like a standby interface, or a way to group two nodes.

For example, grouping nodes: one node with ip1 and one node with ip2.

An outage would trigger only when both nodes in the group are down. Otherwise, it would trigger an alert.

I can’t believe I’m the only one who needs to set up something like a standby IP or standby node. :)

Or… is there a way to clear the outage status so that the node is not displayed in the outage list? For example, I’m fine with getting an outage notification, as long as I can clear that outage the same way I can clear an alarm.

When the status changes, a new outage is generated, but I could clear it to say, “That one is OK.” :)

There are several ways you can approach this.

Just to clarify, these are the assumptions I am making based on the above.

  1. You have 2 interfaces on each node. One of these interfaces is provisioned as Primary and has an ICMP service monitor provisioned to it. This IP address is up and responding to ping when provisioning/importing/etc.

  2. The 2nd interface is the “backup/fail-over” IP address, which is Admin Up / Oper Down (SNMP ifTable ifAdminStatus / ifOperStatus OID values), is not reachable at provisioning time, and has an ICMP service monitor provisioned to it.

  3. SNMP data collection is not what you are trying to monitor. You want availability monitoring with simple redundancy logic for alarming on your 2-interface device.


If the above is true, you can try 2 different ways that I know of (they may not be the best ways).

Idea / Possible Solution 1:

Modelling with Business Service

  • Create a Business Service which represents your service
  • The input for the service is the ICMP interface down from your nodes which you might have already in OpenNMS
  • Use Threshold as the reduce function with a value of 0.51. When more than 50% of the inputs change state, you can set the severity to something like Major or Critical
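
To illustrate why 0.51 works for two inputs, here is a simplified model of a threshold-style reduce function. It is not OpenNMS's actual implementation: the severity list is shortened and the names `Severity` and `threshold_reduce` are hypothetical. The idea is that a severity only propagates when at least the given fraction of inputs has reached it.

```python
from enum import IntEnum
from typing import Sequence


class Severity(IntEnum):
    # Shortened, illustrative severity scale (assumption).
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def threshold_reduce(inputs: Sequence[Severity], threshold: float) -> Severity:
    """Return the highest severity reached by at least `threshold` of the inputs."""
    if not inputs:
        return Severity.NORMAL
    # Walk from the highest severity down and stop at the first one
    # that enough inputs have reached.
    for level in sorted(Severity, reverse=True):
        hits = sum(1 for s in inputs if s >= level)
        if hits / len(inputs) >= threshold:
            return level
    return Severity.NORMAL


# One of two interfaces down: 50% is below 0.51, so nothing propagates.
print(threshold_reduce([Severity.MAJOR, Severity.NORMAL], 0.51).name)
# Both interfaces down: 100% is above 0.51, so the service goes to MAJOR.
print(threshold_reduce([Severity.MAJOR, Severity.MAJOR], 0.51).name)
```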

Idea / Possible Solution 2:

Using the pollerd built-in node correlation

The node correlation is described in the Critical Service section in our docs.

  • Create a node manually in a requisition and assign the two interfaces to it
  • Assign only ICMP and remove all the detectors
  • Put the 2nd interface in “maintenance mode” with a scheduled outage definition
  • Trigger a VacuumD automation to alter the scheduled outage for the 2nd interface when the primary interface goes down. Reverse this logic when the primary comes back up (push the failover interface back into the scheduled outage)
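
The toggle in the last step can be pictured as a tiny state machine. This is only a model of the logic, not VacuumD configuration: the class, function, and event names are hypothetical, and the addresses are the example IPs from this thread.

```python
from dataclasses import dataclass

PRIMARY_IP = "1.2.3.4"  # example primary address from this thread
STANDBY_IP = "9.8.7.6"  # example standby address from this thread


@dataclass(frozen=True)
class PollingState:
    # True while the standby IP sits inside the scheduled outage.
    standby_suppressed: bool = True


def apply_event(state: PollingState, event: str, ip: str) -> PollingState:
    """Toggle standby suppression based on events for the primary IP only."""
    if ip != PRIMARY_IP:
        return state
    if event == "interfaceDown":
        # Primary failed: start monitoring the standby address.
        return PollingState(standby_suppressed=False)
    if event == "interfaceUp":
        # Primary recovered: push the standby back into the scheduled outage.
        return PollingState(standby_suppressed=True)
    return state


state = PollingState()
state = apply_event(state, "interfaceDown", PRIMARY_IP)
print(state.standby_suppressed)
state = apply_event(state, "interfaceUp", PRIMARY_IP)
print(state.standby_suppressed)
```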

Outcome: When the primary interface goes down, you just get an “interfaceDown” event for your manual node. VacuumD triggers an SQL update that activates monitoring of the non-primary interface. If both go down, you get a “nodeDown”. You can use the event notification to escalate the problem, e.g. interfaceDown for this node is just an email and a service degradation, while nodeDown for this node means an SMS and immediate action required.
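
The escalation idea can be sketched as a simple lookup from event to notification channel. In OpenNMS itself this routing would be set up in the notification configuration rather than in code; the dictionary, channel names, and `route_event` function here are hypothetical and only illustrate the mapping.

```python
from typing import Dict, Tuple

INTERFACE_DOWN = "uei.opennms.org/nodes/interfaceDown"
NODE_DOWN = "uei.opennms.org/nodes/nodeDown"

# Hypothetical escalation table: event UEI -> (channel, meaning).
ESCALATION: Dict[str, Tuple[str, str]] = {
    INTERFACE_DOWN: ("email", "service degraded, failover in progress"),
    NODE_DOWN: ("sms", "both interfaces down, immediate action required"),
}


def route_event(uei: str) -> Tuple[str, str]:
    """Pick the notification channel for an event, or none if unmapped."""
    return ESCALATION.get(uei, ("none", "no notification configured"))


print(route_event(INTERFACE_DOWN))
print(route_event(NODE_DOWN))
```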

*** There are certainly a few other ways you could accomplish this. These are just a few ideas on how I would approach it. Thresholds could be leveraged, HSRP/VRRP traps, a virtual container using path outages, etc…

Good luck.
