Enhancing Prometheus Node Exporter data collections

For some time now we have had the Prometheus data collector, which can collect metrics in Prometheus format.
The plugin ships with only two examples, for CPU load and memory consumption, so it makes sense to enhance it so that at least the most common metrics are collected and graphed.

I’ve added a large set of Node Exporter metrics using the new collector.

We now also have graphs for:

  • Memory Usage in %
  • Netstat Errors
  • System Uptime
  • Time Offset
  • File Descriptors
  • IPv4 sockets
  • Several socket state stats
  • Plenty of disk IO graphs
  • Filesystem space usage in MB
  • Filesystem space usage in percent
  • Network utilization
  • Interface Errors
  • Interface packet statistics

These should cover most default monitoring scenarios.
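To illustrate what goes into the percentage graphs, here is a minimal Python sketch that parses a few lines in Prometheus text format and derives used memory and used filesystem space in percent. It is not the collector's implementation. The sample values are made up, and the metric names (node_memory_MemTotal_bytes, node_memory_MemAvailable_bytes, node_filesystem_size_bytes, node_filesystem_avail_bytes) are assumed to match a reasonably recent Node Exporter; older versions use different names.

```python
import re

# Made-up sample in Prometheus text format (values are illustrative only).
SAMPLE = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.0e+09
node_memory_MemAvailable_bytes 2.0e+09
node_filesystem_size_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.0e+11
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 2.5e+10
"""

# metric name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def used_percent(total, available):
    """Used share of a resource in percent."""
    return 100.0 * (total - available) / total


def first_value(samples, name, **wanted):
    """Value of the first sample with this name and these label values."""
    for sample_name, labels, value in samples:
        if sample_name == name and all(labels.get(k) == v for k, v in wanted.items()):
            return value
    raise KeyError(name)


samples = parse_metrics(SAMPLE)
memory_used = used_percent(
    first_value(samples, "node_memory_MemTotal_bytes"),
    first_value(samples, "node_memory_MemAvailable_bytes"),
)
root_fs_used = used_percent(
    first_value(samples, "node_filesystem_size_bytes", mountpoint="/"),
    first_value(samples, "node_filesystem_avail_bytes", mountpoint="/"),
)
print(f"memory used: {memory_used:.1f} %, / used: {root_fs_used:.1f} %")
```

Filesystem metrics carry labels such as mountpoint, so each mount point is its own series, while the memory metrics are plain node-level gauges.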

The CPU statistics graph currently shows only NaN values. I’ve tried to adapt the SNMP CPU statistics graphs, but for some reason that does not work.
I have to spend more time on this one.
Even once it works, the graph will only be available per core, not as a total across all cores. This is due to how the metrics are structured in the Node Exporter and to our resource type concept. Grafana is probably the better solution for that.
At least thresholding on total CPU usage should work, using an expression-based threshold.
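The per-core structure can be sketched as follows. The Node Exporter reports CPU time as counters per core and per mode (assumed here to be node_cpu_seconds_total with cpu and mode labels), so a total has to be computed from the counter deltas of all cores between two scrapes. This Python sketch only illustrates the arithmetic, with made-up values. It is not OpenNMS threshold expression syntax, and treating only the idle mode as "not busy" is an assumption.

```python
def total_cpu_usage_percent(before, after, idle_modes=("idle",)):
    """Busy share of CPU time across all cores between two scrapes.

    before/after map (cpu, mode) -> counter value in seconds.
    """
    busy = 0.0
    total = 0.0
    for key, end_value in after.items():
        _cpu, mode = key
        # A series missing from the first scrape contributes a delta of 0.
        delta = end_value - before.get(key, end_value)
        if delta < 0:
            # Counter reset (e.g. reboot): skip this series for the interval.
            continue
        total += delta
        if mode not in idle_modes:
            busy += delta
    if total == 0:
        return 0.0
    return 100.0 * busy / total


# Two made-up scrapes of a two-core host, taken 10 seconds apart.
scrape_1 = {
    ("0", "idle"): 1000.0, ("0", "user"): 200.0, ("0", "system"): 100.0,
    ("1", "idle"): 1100.0, ("1", "user"): 150.0, ("1", "system"): 50.0,
}
scrape_2 = {
    ("0", "idle"): 1005.0, ("0", "user"): 204.0, ("0", "system"): 101.0,
    ("1", "idle"): 1109.0, ("1", "user"): 151.0, ("1", "system"): 50.0,
}

usage = total_cpu_usage_percent(scrape_1, scrape_2)
print(f"total CPU usage: {usage:.1f} %")
```

In this sample core 0 is 50 % busy and core 1 is 10 % busy. Summing the deltas over both cores before dividing gives the combined figure, which is the value a total-usage threshold would need to evaluate.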

The configurations can be found and enhanced here.
Please note that the configuration is meant to be used as an OIA plugin.
Unfortunately I wasn’t able to test the OIA step, so that is another thing that still needs to be done soon.