There are several ways you can approach this.
Just to clarify, these are the assumptions I am making based on the above.
You are saying you have 2 interfaces on each node. One of these interfaces is provisioned as primary and has an ICMP service monitor provisioned on it. This IP address is up and responding to ping when provisioning/importing/etc.
The 2nd interface is the “backup/fail-over” IP address, which is Admin Up / Oper Down (SNMP ifTable - ifAdminStatus / ifOperStatus OID values), is not reachable upon provisioning, and also has an ICMP service monitor provisioned on it.
SNMP data collection is not what you are trying to monitor; you want availability with simple redundancy logic for alarming on your 2-interface device.
If the above is true, you can try 2 different ways that I know of (and they may not be the best ways).
Idea / Possible Solution 1:
Modelling with Business Service
- Create a Business Service which represents your service
- The input for the service is the ICMP interface down from your nodes, which you might already have in OpenNMS
- Use Threshold as the reduce function with a value of 0.51; when more than 50% of the inputs change state, you can set the severity to something like Major or Critical
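To make the 0.51 threshold concrete, here is a minimal Python sketch of how a threshold-style reduce function behaves with two inputs. It is a simplified, unweighted illustration and not the OpenNMS implementation: the `Severity` enum, the `threshold_reduce` function and the choice of `>=` for the comparison are assumptions of mine, and I assume a down ICMP input shows up as Major.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale for illustration only.
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def threshold_reduce(inputs, threshold):
    """Return the highest severity reached by at least `threshold` of the inputs.

    Simplified sketch: every input has the same weight.
    """
    if not inputs:
        return Severity.NORMAL
    # Walk from the most severe level down and stop at the first level
    # that enough inputs have reached or exceeded.
    for level in sorted(Severity, reverse=True):
        share = sum(1 for s in inputs if s >= level) / len(inputs)
        if share >= threshold:
            return level
    return Severity.NORMAL


# One of two interfaces down: 50% is below 0.51, so the service stays Normal.
print(threshold_reduce([Severity.MAJOR, Severity.NORMAL], 0.51).name)  # NORMAL
# Both interfaces down: 100% is above 0.51, so the service goes Major.
print(threshold_reduce([Severity.MAJOR, Severity.MAJOR], 0.51).name)   # MAJOR
```

With only two inputs, 0.51 effectively means "both interfaces must be down" before the Business Service changes state, which is the redundancy behaviour you are after.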
Idea / Possible Solution 2:
Using the pollerd built-in node correlation
The node correlation is described in the Critical Service section in our docs.
- Create a node manually in a requisition where you assign the two interfaces
- Just assign ICMP and remove all the detectors
- Set the 2nd Interface in “maintenance mode” with scheduled outage definition.
- Trigger a Vacuumd automation to lift the scheduled outage on the 2nd interface when the primary interface goes down. Reverse this logic when the primary comes back up (push the failover interface back into the scheduled outage)
Outcome: When the primary interface goes down, you just get an “interfaceDown” event for your manually created node. Vacuumd triggers an SQL update that activates monitoring of the non-primary interface. If both go down, you get a “nodeDown”. You can use event notifications to escalate the problem, e.g. interfaceDown for this node is just an email and a service degradation, while nodeDown for this node means SMS and immediate action required.
*** There are certainly a few other ways you could accomplish this. These are just a few ideas on how I would approach it. Thresholds could be leveraged, HSRP/VRRP traps, a virtual container using path outages, etc…
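The escalation logic in that outcome can be written down as a small decision function. This is only a sketch of the intended behaviour in Python, not anything OpenNMS runs: the `evaluate` function and the "email" / "sms" labels are hypothetical names for illustration, while the event names come from the description above.

```python
def evaluate(primary_up, backup_up):
    """Map the state of the two interfaces to (event, notification).

    Sketch only: mirrors the outcome described above for the manually
    created node with one primary and one failover interface.
    """
    if primary_up:
        # Failover interface sits in its scheduled outage, so its state
        # is not evaluated and nothing is raised.
        return (None, None)
    if backup_up:
        # Primary is down, failover monitoring was activated and it answers:
        # service degradation, email is enough.
        return ("interfaceDown", "email")
    # Neither interface answers: the whole node is considered down.
    return ("nodeDown", "sms")


for state in [(True, False), (False, True), (False, False)]:
    print(state, "->", evaluate(*state))
```

The point of the sketch is that the failover interface only matters once the primary is down, which is what the scheduled outage plus the Vacuumd automation are meant to achieve.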