How to exclude drive volumes from MIB2 datacollection

Problem

When your monitored nodes run a lot of Docker containers, for example, the SNMP storage table picks up a lot of disk volumes you probably don't want to monitor.

Diagnosis

Depending on the Docker usage, you can end up with 50k /var/lib/docker/containers/xyz resources over time.
Even with far fewer, the node’s resource graph page can take very long to load or, in the worst case, won’t load at all.

To get an overview, you can search for those resources:

[19:50]root@opennms:/opt/opennms/share/rrd# find . -type d -name "var-lib-docker-container*" | wc -l
42355
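
If you want to see where those directories pile up, the same search can be grouped by parent directory. This is only a rough sketch: count_by_parent is a made-up helper name, and it assumes the RRD root is /opt/opennms/share/rrd as in the prompt above.

```shell
#!/bin/sh
# Hypothetical helper: count matching resource directories per parent
# directory, largest first.
# Usage: count_by_parent <rrd-root> <name-pattern>
count_by_parent() {
    # -prune stops find from descending into directories that already matched
    find "$1" -type d -name "$2" -prune \
        | sed 's|/[^/]*$||' \
        | sort | uniq -c | sort -rn
}

# Show the ten worst parents (assumed RRD root; adjust to your installation)
if [ -d /opt/opennms/share/rrd ]; then
    count_by_parent /opt/opennms/share/rrd 'var-lib-docker*' | head -n 10
fi
```

Each output line is a count followed by the parent directory, so the nodes with the most Docker volumes show up at the top.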

Solution

Temporary

[19:50]root@opennms:/opt/opennms/share/rrd# find . -type d -name "var-lib-docker*" -prune -exec rm -rf {} +

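This deletes the RRD files, and the resource page will load without issues again. It is only a temporary fix: as long as the volumes are still collected, the directories will come back.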
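
Since this removes data for good, it can be worth listing the matches before deleting anything. A minimal sketch, assuming the same directory naming as above; purge_rrd_dirs and its "delete" argument are invented for this example and are not part of OpenNMS.

```shell
#!/bin/sh
# Hypothetical helper: list matching resource directories, and remove
# them only when the third argument is the word "delete".
# Usage: purge_rrd_dirs <rrd-root> <name-pattern> [delete]
purge_rrd_dirs() {
    if [ "${3:-}" = "delete" ]; then
        # -prune keeps find from descending into directories being removed
        find "$1" -type d -name "$2" -prune -exec rm -rf {} +
    else
        # Dry run: only print what would be removed
        find "$1" -type d -name "$2" -prune -print
    fi
}
```

Run it first as purge_rrd_dirs /opt/opennms/share/rrd 'var-lib-docker*' to review the list, then again with delete as the third argument.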

Persistent

In the MIB2 datacollection file ${OPENNMS_HOME}/etc/datacollection/mib2.xml, change the persistenceSelectorStrategy of the hrStorageIndex resource type to org.opennms.netmgt.collectd.PersistRegexSelectorStrategy and add a match-expression parameter. A resource is only persisted when the expression evaluates to true, so wrap the match in not(...) to exclude volumes by name.

Example:

   <resourceType name="hrStorageIndex" label="Storage (SNMP MIB-2 Host Resources)" resourceLabel="${hrStorageDescr}">
      <persistenceSelectorStrategy class="org.opennms.netmgt.collectd.PersistRegexSelectorStrategy">
         <parameter key="match-expression" value="not(#hrStorageDescr matches '.*(containers).*')"/>
      </persistenceSelectorStrategy>
      <storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
         <parameter key="sibling-column-name" value="hrStorageDescr"/>
         <parameter key="replace-first" value="s/^-$/_root_fs/"/>
         <parameter key="replace-all" value="s/^-//"/>
         <parameter key="replace-all" value="s/\s//"/>
         <parameter key="replace-all" value="s/:\\.*//"/>
      </storageStrategy>
   </resourceType>

OpenNMS needs a restart after this change.

This also works nicely for snap volumes, since those are always at 100% usage and will trigger the typical disk usage threshold alerts.

Example:

Filesystem                Size  Used Avail Use% Mounted on
udev                      3,9G     0  3,9G   0% /dev
tmpfs                     798M  992K  797M   1% /run
/dev/mapper/vmsys-root     17G   13G  3,2G  81% /
tmpfs                     3,9G     0  3,9G   0% /dev/shm
tmpfs                     5,0M     0  5,0M   0% /run/lock
tmpfs                     3,9G     0  3,9G   0% /sys/fs/cgroup
/dev/loop0                165M  165M     0 100% /snap/gnome-3-28-1804/161
/dev/loop1                128K  128K     0 100% /snap/bare/5
/dev/loop3                 92M   92M     0 100% /snap/gtk-common-themes/1535
/dev/loop2                 22M   22M     0 100% /snap/bw/41
/dev/loop4                 56M   56M     0 100% /snap/core18/2538
/dev/loop5                 47M   47M     0 100% /snap/snapd/16292
/dev/loop6                 75M   75M     0 100% /snap/bitwarden/74
/dev/mapper/vmsys-varlog  1,9G  207M  1,6G  12% /var/log
/dev/sda1                 920M  117M  740M  14% /boot
tmpfs                     798M  784K  797M   1% /run/user/0

The match-expression can be extended like this:

<parameter key="match-expression" value="not(#hrStorageDescr matches '.*(containers).*|\/snap.*')"/>
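
Before restarting OpenNMS, you can roughly check the pattern against a few mount points on the shell. This is only an approximation: it assumes that matches has to cover the whole hrStorageDescr value (which grep -x imitates) and that the ERE syntax of grep behaves like the Java regex for this simple pattern. The \/ is written as a plain / here because the escape is not needed, and excluded_volumes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read volume descriptions on stdin and print the ones
# the exclusion pattern matches in full, i.e. the ones that would NOT be
# persisted.
excluded_volumes() {
    grep -Ex '.*(containers).*|/snap.*'
}

# Sample values taken from the df output and the Docker path above
printf '%s\n' / /boot /var/log /snap/core18/2538 \
    /var/lib/docker/containers/xyz | excluded_volumes
```

Only the snap and Docker container paths should be printed; everything else would still be persisted.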