BMP session not connecting/flapping

Problem:

The BMP session from a Cisco router to OpenNMS is flapping or not connecting.

Expected outcome:

We're trying to monitor BGP via OpenBMP on a Cisco router directly, using the configuration in telemetryd-configuration.xml (BGP listener, parser, adapter, etc.). The setup doesn't include Docker or a Minion.

OpenNMS version: 29.0.4
Operating system version: CentOS 7
Java version:

[root@ONSMONLBFB001 ~]# java --version
openjdk 11.0.13 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)

Other relevant data:

Targeted node: Cisco ASR9K Series - Cisco IOS XR v6.1.3

Server listening on port 5000/TCP:

[root@ONSMONLBFB001 ~]# ss -lnpt sport = :5000
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:5000 *:* users:(("java",pid=43129,fd=1548))

Note: The firewall isn't blocking the port, since the router is able to connect to the server on it.

Logs:

Router BMP Server status:

Mon Jan 10 13:39:06.174 ARG
BMP server 1
Host 192.168.14.143 Port 5000
NOT Connected
Last Disconnect event received : 1y03w
Precedence: internet
BGP neighbors: 2
VRF: DATACENTER (0x6000000c)
Update Source: - (-)
Update Source Vrf ID: 0x6000000

Hello and welcome to our community. If I understand correctly, the BMP session can be established but is not stable, right?

There are two things I would start with. First, change the log level for telemetryd in ${OPENNMS_HOME}/etc/log4j2.xml from INFO to DEBUG. That should give you more visibility in ${OPENNMS_HOME}/logs/telemetryd.log into what's going on.

Second, a tcpdump packet capture limited to TCP port 5000 and the BMP session might shed some light on the problem. IMHO, we need to narrow the problem down to one of the following: a) a problem with our TCP listener on the OpenNMS side, b) a problem between OpenNMS and your router that prevents the BMP session from staying up, or c) an issue on the BMP side of your Cisco router.


Hi Indigo. You are right, the BMP session starts flapping as soon as it's started. I changed the telemetryd log level to DEBUG as you suggested, started a TCP capture on port 5000, and then set the BMP server on the router to "no shutdown".

As far as I can see, /opt/opennms/logs/telemetryd.log shows the following output:

[root@ONSMONLBFB001 ~]# tail -f /opt/opennms/logs/telemetryd.log
2022-01-05 18:20:44,867 INFO  [Main] o.s.b.f.x.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [META-INF/opennms/applicationContext-telemetryDaemon.xml]
2022-01-05 18:20:44,884 INFO  [Main] o.s.b.f.a.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2022-01-05 18:21:47,436 INFO  [RMI TCP Connection(7)-127.0.0.1] o.s.c.s.ClassPathXmlApplicationContext: Closing ApplicationContext 'telemetrydContext': startup date [Wed Jan 05 18:20:44 ART 2022]; parent: ApplicationContext 'eventDaemonContext'
2022-01-05 18:22:23,906 INFO  [Main] o.s.c.s.ClassPathXmlApplicationContext: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@16d9c51f: startup date [Wed Jan 05 18:22:23 ART 2022]; parent: ApplicationContext 'eventDaemonContext'
2022-01-05 18:22:23,907 INFO  [Main] o.s.b.f.x.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [META-INF/opennms/applicationContext-telemetryDaemon.xml]
2022-01-05 18:22:23,924 INFO  [Main] o.s.b.f.a.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2022-01-05 18:22:55,774 INFO  [RMI TCP Connection(9)-127.0.0.1] o.s.c.s.ClassPathXmlApplicationContext: Closing ApplicationContext 'telemetrydContext': startup date [Wed Jan 05 18:22:23 ART 2022]; parent: ApplicationContext 'eventDaemonContext'
2022-01-05 18:24:10,122 INFO  [Main] o.s.c.s.ClassPathXmlApplicationContext: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@66dc4ad8: startup date [Wed Jan 05 18:24:10 ART 2022]; parent: ApplicationContext 'eventDaemonContext'
2022-01-05 18:24:10,123 INFO  [Main] o.s.b.f.x.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [META-INF/opennms/applicationContext-telemetryDaemon.xml]
2022-01-05 18:24:10,138 INFO  [Main] o.s.b.f.a.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2022-01-11 11:45:24,396 DEBUG [AggregatorFlush-Telemetry-BMP] o.o.n.e.EventIpcManagerDefaultImpl: sending: {"_events":{"_eventList":[{"_creationTime":Tue Jan 11 11:45:24 ART 2022,"_uei":"uei.opennms.org/bmp/peerUp","_source":"telemetryd:BMP.BMP-Peer-Status-Adapter","_nodeid":2,"_time":Tue Jan 11 11:45:26 ART 2022,"_interfaceAddress":/192.168.125.182,"_parms":[{"distinguisher":"493303784865807"}, {"address":"10.242.6.105"}, {"as":"65001"}, {"id":"192.168.165.155"}]}]}}
2022-01-11 11:45:24,399 DEBUG [AggregatorFlush-Telemetry-BMP] o.o.n.e.EventIpcManagerDefaultImpl: sending: {"_events":{"_eventList":[{"_creationTime":Tue Jan 11 11:45:24 ART 2022,"_uei":"uei.opennms.org/bmp/peerUp","_source":"telemetryd:BMP.BMP-Peer-Status-Adapter","_nodeid":2,"_time":Tue Jan 11 11:45:26 ART 2022,"_interfaceAddress":/192.168.125.182,"_parms":[{"distinguisher":"493303784865807"}, {"address":"10.242.6.101"}, {"as":"65001"}, {"id":"192.168.165.154"}]}]}}
2022-01-11 11:45:59,395 DEBUG [AggregatorFlush-Telemetry-BMP] o.o.n.e.EventIpcManagerDefaultImpl: sending: {"_events":{"_eventList":[{"_creationTime":Tue Jan 11 11:45:59 ART 2022,"_uei":"uei.opennms.org/bmp/peerUp","_source":"telemetryd:BMP.BMP-Peer-Status-Adapter","_nodeid":2,"_time":Tue Jan 11 11:46:01 ART 2022,"_interfaceAddress":/192.168.125.182,"_parms":[{"distinguisher":"493303784865807"}, {"address":"10.242.6.105"}, {"as":"65001"}, {"id":"192.168.165.155"}]}]}}
2022-01-11 11:45:59,396 DEBUG [AggregatorFlush-Telemetry-BMP] o.o.n.e.EventIpcManagerDefaultImpl: sending: {"_events":{"_eventList":[{"_creationTime":Tue Jan 11 11:45:59 ART 2022,"_uei":"uei.opennms.org/bmp/peerUp","_source":"telemetryd:BMP.BMP-Peer-Status-Adapter","_nodeid":2,"_time":Tue Jan 11 11:46:01 ART 2022,"_interfaceAddress":/192.168.125.182,"_parms":[{"distinguisher":"493303784865807"}, {"address":"10.242.6.101"}, {"as":"65001"}, {"id":"192.168.165.154"}]}]}}
^C
[root@ONSMONLBFB001 ~]#

It seems that these events are being displayed properly in the GUI.
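In case it helps when reading a longer telemetryd.log, here is a minimal Python sketch that counts BMP events per peer from DEBUG lines shaped like the ones above. The function name `count_peer_events` and the regular expression are made up for this post and only assume the line format shown in the excerpt; this is not an OpenNMS tool. Many peerUp events for the same peer within a short time may be a hint that the peer or the session is flapping.

```python
import re
from collections import Counter

# Assumption: each event line contains "_uei":"uei.opennms.org/bmp/<kind>"
# followed later by a {"address":"<peer>"} parameter, as in the excerpt above.
EVENT_RE = re.compile(
    r'"_uei":"uei\.opennms\.org/bmp/(?P<kind>\w+)".*?\{"address":"(?P<peer>[^"]+)"\}'
)


def count_peer_events(lines):
    """Count (event kind, peer address) pairs found in telemetryd.log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group("kind"), match.group("peer"))] += 1
    return counts


def summarize_log(path):
    """Print one line per (event kind, peer) with the number of occurrences."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for (kind, peer), n in sorted(count_peer_events(handle).items()):
            print(f"{kind:10} {peer:15} {n}")
```

Calling `summarize_log("/opt/opennms/logs/telemetryd.log")` would print the counts for the file used in this thread.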

I'm also adding the link to the .pcap from the tcpdump that was running during this test.

To give you some context, here are the IP addresses of both elements.

Server IP: 192.168.14.143
Router IP: 192.168.60.221 (configured in Provisioning Requisitions when deploying the router)

From what I can see in the capture, the Cisco router is sending some RSL protocol packets on port 5000/TCP, but I don't really know how the server behaves after receiving them. I can only see that the next packet in the TCP stream is a FIN, ACK sent from the server to the router.

From what I've seen in the packet capture, it looks like your router is sending GSM over IP packets to the OpenNMS telemetry daemon. That protocol uses TCP port 5000 as well. To rule out interference here, can you reconfigure the BMP collector and your router to use a different, free TCP port?
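A rough way to check whether the bytes arriving on the listener port are BMP at all is to look at the common header defined in RFC 7854: one version byte (3), a four-byte message length, and one message-type byte. The Python sketch below assumes you have already extracted the raw TCP payload from the capture yourself; `parse_bmp_common_header` is a name made up for this post and is not part of OpenNMS.

```python
import struct

BMP_VERSION = 3  # version defined by RFC 7854

# Message types from RFC 7854.
BMP_MESSAGE_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_bmp_common_header(payload: bytes):
    """Return (message length, type name) if payload starts with a plausible
    BMP common header, otherwise None."""
    if len(payload) < 6:
        return None
    # Network byte order: 1-byte version, 4-byte length, 1-byte type.
    version, length, msg_type = struct.unpack("!BIB", payload[:6])
    if version != BMP_VERSION or length < 6 or msg_type not in BMP_MESSAGE_TYPES:
        return None
    return length, BMP_MESSAGE_TYPES[msg_type]
```

A BMP session normally begins with an Initiation message from the router, so the first payload bytes of a healthy session should parse as type "Initiation". If the function returns None for the first bytes of the stream, the traffic is probably something other than BMP version 3.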

Hi Indigo, we have just tested it. After changing the port to 11019/TCP, we started receiving more information in /opt/opennms/logs/telemetryd.log, but the session still isn't connecting properly.

I'm adding the link to a new tcpdump capture and also the link to telemetryd.log.

Regarding how OpenNMS works: is it OK if the router connects to the TCP listener from a different IP than the one the node was provisioned with in the Manage Provisioning Requisitions section?

For example, in this case the node was provisioned with the IP 192.168.60.221, but for the TCP connection the router uses a different interface/IP (192.168.125.182).

Regarding the default configuration of the BMP session: what are the default parameters on the server side for this session?

The Cisco router uses by default:

flapping-delay - Default=300
initial-delay - Default=0
initial-refresh - Default=1

As some additional information: the IP interface that initiates the BMP session to Telemetryd must exist in the node inventory. From the OpenNMS BMP collector's perspective, the BMP message itself doesn't carry that information, so we use the source IP address of the BMP session to associate the BMP message with the right node in OpenNMS. The IP address should be unique within the OpenNMS monitoring location.
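To illustrate the idea, here is a small Python sketch of such a lookup. It is only a model of the behaviour described above, not the actual OpenNMS code: the function `find_node_for_session`, the dictionary layout, and the location name "Default" are assumptions made for this example, while the IP addresses and node ID 2 are taken from this thread.

```python
from typing import Dict, Optional

# Hypothetical inventory: monitoring location -> {interface IP -> node ID}.
Inventory = Dict[str, Dict[str, int]]


def find_node_for_session(
    source_ip: str, inventory: Inventory, location: str = "Default"
) -> Optional[int]:
    """Return the node ID owning source_ip in the given location, or None
    if no node in that location has this IP interface."""
    return inventory.get(location, {}).get(source_ip)


# The node was provisioned only with 192.168.60.221.
inventory: Inventory = {"Default": {"192.168.60.221": 2}}

# The router opens the BMP session from 192.168.125.182, so nothing matches.
before = find_node_for_session("192.168.125.182", inventory)

# Once that interface is part of the node, the session can be associated.
inventory["Default"]["192.168.125.182"] = 2
after = find_node_for_session("192.168.125.182", inventory)
```

In this model `before` is None and `after` is 2, which is why the interface the router uses as the BMP source needs to be present on the node.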