Metrics from json source with collectd

Problem:
I’m attempting to set up an OpenNMS system to monitor a service where metrics are collected from a REST service in json format. Using the information provided on XML Collector - OpenNMS I am now able to collect the data, where the top entity is a ‘store’ keyed on an id property. I can see that the data is inserted into Cassandra.

However, my first issues are on a more basic level than that. Maybe even too basic to be in a FAQ, but I’ve read the manuals several times and I just can’t find the answers.

  1. I have created a node for the REST server with an interface. I understand that collectd matches packages on interface level. In the webui I can add Service on interface, selecting from a list or adding a custom service (in my case PlazaMetrics). However, only ICMP and HTTP, which are autodetected, show up under monitored services. Not even HTTPS shows up, even though that is what the REST service is provided on.

I understand that OpenNMS uses detectors to automatically find out what services are available on each interface but in my case within the data from REST service there are several “virtual” services that I want to collect metrics from. Hence I want to either be able to manually add services for which metrics will be collected, or at least from one custom detector be able to add several service definitions to the interface, in collectd context.

The only way I’ve found to make collectd include my REST service was to manually add the relation to the PostgreSQL db in ifservices and service tables.

In addition I have a node definition for the machine running OpenNMS; it autodetected ICMP, SSH and SNMP services. I’ve now enabled JMX for Cassandra and the JVM, but neither of those shows up as a monitored service, not even when I manually added them on the interface.

  2. The next issue is the interface definition itself, since it uses a numeric IP address. When calling the REST service I need to use the hostname since it uses https. I’ve solved that for now by hardcoding the url in the xml-source definition using the hostname, but this data will be fetched from several servers, each managing a set of virtual stores. Hence I need to use a placeholder, and because of https it doesn’t work with {ipaddr}.

The admin guide states that “all node asset record fields” can be used as placeholders, but asset record fields have fixed keys, so they are not very suitable for this task. I tried to use the comment asset record as a placeholder, but it didn’t work. It would be preferable if metadata could be used instead, since that can be defined with custom keys.

  3. With some quirks as described above I am able to get the metrics into Cassandra, so far looks good. But then what?? Where do I find that data in OpenNMS so that I can create thresholds to generate events from them?

If I follow the basic walkthrough to create thresholds in section 7.3 of admin guide it tells me to go to Resource Graphs but there I only find response time for ICMP and HTTP, nothing of the data collected from REST service. The example is for data collected via SNMP, but surely it must be possible to use data from other collectors too???

This is where I’m currently stuck and need some guidance to move on from what should be a very basic level.

Expected outcome:
What I ultimately want to set up is business service monitoring as described in chapter 12 of the admin guide, where Store is my top-level entity, which has properties of its own (such as healthStatus, which can be UP or DOWN) and sub-services that are either virtual, created from the JSON data, or come from the physical server running the REST service, collected with ICMP, SNMP, JMX etc.

From what I read in the guides, OpenNMS should match the intended use case well, especially when integrating with Grafana and Elasticsearch. It’s ‘just’ that I get stuck on the basic level of getting the data in from the REST service and making it useful.

OpenNMS version:
27.0.2

Other relevant data:
Right now I don’t even know what data would be relevant to provide, but maybe the xml-groups mapping definition is a start as it shows how the data I want to use is organized:

<xml-groups>
    <xml-group name="store-metrics" resource-type="storeMetrics" resource-xpath="/elements" key-xpath="externalId">
        <xml-object name="uuid" type="STRING" xpath="uuid"/>
        <xml-object name="externalId" type="STRING" xpath="externalId"/>
        <xml-object name="name" type="STRING" xpath="name"/>
        <xml-object name="healthStatus" type="STRING" xpath="healthStatus"/>

        <!-- infraStatus -->
        <xml-object name="totalNumBasestations" type="GAUGE" xpath="infraStatus/totalNumBasestations"/>
        <xml-object name="totalNumErrorBasestations" type="GAUGE" xpath="infraStatus/totalNumErrorBasestations"/>
        <xml-object name="totalNumTranceivers" type="GAUGE" xpath="infraStatus/totalNumTranceivers"/>
        <xml-object name="totalNumErrorTranceivers" type="GAUGE" xpath="infraStatus/totalNumErrorTranceivers"/>
        <xml-object name="totalAllocatedNumTranceivers" type="GAUGE" xpath="infraStatus/totalAllocatedNumTranceivers"/>

        <!-- eslStatus -->
        <xml-object name="totalNumESLs" type="GAUGE" xpath="eslStatus/totalNumESLs"/>
        <xml-object name="totalNumESLsInWaitingForUpdate" type="GAUGE" xpath="eslStatus/totalNumESLsInWaitingForUpdate"/>
        <xml-object name="totalNumESLsInRoaming" type="GAUGE" xpath="eslStatus/totalNumESLsInRoaming"/>
        <xml-object name="totalNumESLsWithLowBattery" type="GAUGE" xpath="eslStatus/totalNumESLsWithLowBattery"/>

        <!-- integrationStatus -->
        <xml-object name="lastImportStart" type="STRING" xpath="integrationStatus/lastImportStart"/>
        <xml-object name="lastImportEnd" type="STRING" xpath="integrationStatus/lastImportEnd"/>
        <xml-object name="eslsAffected" type="GAUGE" xpath="integrationStatus/eslsAffected"/>
        <xml-object name="itemUpdates" type="GAUGE" xpath="integrationStatus/itemUpdates"/>
        <xml-object name="eslsOk" type="GAUGE" xpath="integrationStatus/eslsOk"/>

        <!-- posterStatus -->
        <xml-object name="postersConnected" type="GAUGE" xpath="posterStatus/connected"/>
        <xml-object name="postersFailed" type="GAUGE" xpath="posterStatus/failed"/>

        <!-- itemStatus -->
        <xml-object name="numberOfItems" type="GAUGE" xpath="itemStatus/numberOfItems"/>
        <xml-object name="linkedItems" type="GAUGE" xpath="itemStatus/linkedItems"/>

        <!-- jobStatus -->
        <xml-object name="totalNumJobsSucceeded" type="COUNTER" xpath="jobStatus/totalNumSucceeded"/>
        <xml-object name="totalNumJobsFailure" type="COUNTER" xpath="jobStatus/totalNumFailure"/>
        <xml-object name="totalNumJobsCancelled" type="COUNTER" xpath="jobStatus/totalNumCancelled"/>
    </xml-group>
</xml-groups>

There’s a lot to unwrap here.

In the webui I can add Service on interface, selecting from a list or adding a custom service

I assume this is in the requisition editor, and you’re syncing the requisition afterwards?

However, only ICMP and HTTP, which are autodetected, show up under monitored services. Not even HTTPS shows up, even though that is what the REST service is provided on.

Services don’t have to be monitored (which is done by pollerd) in order to be collected by collectd. In other words, a service not being monitored doesn’t mean anything is wrong.
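
As a rough sketch of the collectd side: assuming the custom service is named PlazaMetrics as in your post, a package in collectd-configuration.xml might look something like the fragment below. The package name, filter, interval, and the collection name plaza-stores are made up for illustration and are not taken from your setup. The handler class shown is the JSON handler for the XmlCollector as I understand it, so check it against the documentation for your version.

```xml
<!-- collectd-configuration.xml (fragment); package and collection names are hypothetical -->
<package name="plaza-metrics">
    <filter>IPADDR != '0.0.0.0'</filter>
    <!-- The service name has to match the service assigned to the interface -->
    <service name="PlazaMetrics" interval="300000" user-defined="true" status="on">
        <!-- "plaza-stores" is an assumed xml-collection name -->
        <parameter key="collection" value="plaza-stores"/>
        <!-- JSON handler, so the xpath expressions are evaluated against JSON data -->
        <parameter key="handler-class" value="org.opennms.protocols.json.collector.DefaultJsonCollectionHandler"/>
    </service>
</package>

<!-- Maps the service name to the collector implementation -->
<collector service="PlazaMetrics" class-name="org.opennms.protocols.xml.collector.XmlCollector"/>
```

Nothing in this fragment refers to pollerd, which is why the service can be collected without ever showing up as monitored.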

In addition I have a node definition for the machine running OpenNMS; it autodetected ICMP, SSH and SNMP services. I’ve now enabled JMX for Cassandra and the JVM, but neither of those shows up as a monitored service, not even when I manually added them on the interface.

Have you modified pollerd-configuration.xml in some way? Or, rather than showing up as not monitored on the node, are the services not showing up on the node at all? Screenshots help.

The admin guide states that “all node asset record fields” can be used as placeholders, but asset record fields have fixed keys, so they are not very suitable for this task. I tried to use the comment asset record as a placeholder, but it didn’t work. It would be preferable if metadata could be used instead, since that can be defined with custom keys.

You can use {nodelabel} instead of {ipaddr}, then your nodelabel just has to match the hostname. Apparently the XmlCollector doesn’t (yet) do metadata expansion.
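
For illustration, an xml-source using that placeholder could look roughly like this. The collection name, the URL path, and the import-groups file name are assumptions for the sake of the example; only {nodelabel} and the https scheme come from the discussion above.

```xml
<!-- xml-datacollection-config.xml (fragment); names and URL path are illustrative -->
<xml-collection name="plaza-stores">
    <rrd step="300">
        <rra>RRA:AVERAGE:0.5:1:2016</rra>
    </rrd>
    <!-- {nodelabel} expands to the node label, which must equal the hostname in the HTTPS certificate -->
    <xml-source url="https://{nodelabel}/api/stores">
        <!-- Hypothetical file holding the xml-groups definition from the original post -->
        <import-groups>xml-datacollection/store-metrics.xml</import-groups>
    </xml-source>
</xml-collection>
```

With this in place, each server only needs a node whose label is its fully qualified hostname; no per-server URL has to be hardcoded.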

With some quirks as described above I am able to get the metrics into Cassandra, so far looks good. But then what?? Where do I find that data in OpenNMS so that I can create thresholds to generate events from them? If I follow the basic walkthrough to create thresholds in section 7.3 of admin guide it tells me to go to Resource Graphs but there I only find response time for ICMP and HTTP, nothing of the data collected from REST service. The example is for data collected via SNMP, but surely it must be possible to use data from other collectors too???

It is, but the admin guide tells you to look at the graphs because datasource names aren’t exposed anywhere else in the webui. So, especially for customer created datasources, you have to know what the datasource names are in order to create thresholds on them. From your xml collection, you’d select a datasource type of storeMetrics (or whatever the label for that resource-type is), and for the datasource, you enter (for example) totalNumBasestations.
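
If you prefer editing the files over using the webui, the equivalent threshold might look roughly like the sketch below. The group name, the threshold values, and the choice of totalNumErrorBasestations (one of the xml-object names from your xml-group) are only examples. The sketch also assumes that ds-type matches the resource-type name storeMetrics and that the collected service is called PlazaMetrics as in your post.

```xml
<!-- thresholds.xml (fragment); group name and values are illustrative -->
<group name="store-metrics" rrdRepository="/opt/opennms/share/rrd/snmp/">
    <!-- ds-type: resource type of the xml-group; ds-name: name of the xml-object -->
    <threshold type="high" ds-type="storeMetrics" ds-name="totalNumErrorBasestations"
               value="1.0" rearm="0.0" trigger="1"
               description="One or more base stations report an error"/>
</group>

<!-- threshd-configuration.xml (fragment, inside a package); ties the group to the collected service -->
<service name="PlazaMetrics" interval="300000" user-defined="false" status="on">
    <parameter key="thresholding-group" value="store-metrics"/>
</service>
```

A high threshold with value 1.0 and trigger 1 would raise an event as soon as a single collection reports at least one base station in error, and rearm once the value drops back to 0.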

Thank you @dino2gnt for responding. Let’s take it step by step to not get things too confusing.

Simplest first: I was able to use {nodelabel} as the placeholder, so that solves the issue sufficiently.