Used disk space over 100%

Problem:
We are getting several alarms about disk space usage above 100%.

"A high threshold for the following metric exceeded: label=“D:\ Label:Data Serial Number 363c2fb2” ds=“hrStorageUsed / hrStorageSize * 100.0” description=“Trigger an alert when the percentage of disk space used reaches or goes above 90% for two consecutive measurement intervals (only for disks of type hrStorageFixedDisk, such as a locally attached or USB-attached hard disk)” value=“109.24” instance=“2” instanceLabel=“D” resourceType=“hrStorageIndex” resourceId=“node[734].hrStorageIndex[D]” threshold=“95.0” trigger=“2” rearm=“85.0”

(Ignore the 90% in the description; I forgot to change that text when I changed the thresholds.)
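
To make the alarm parameters concrete (threshold="95.0", trigger="2", rearm="85.0"), here is a simplified Python model of a high threshold. It assumes the semantics implied by the alarm text: fire after `trigger` consecutive samples at or above `threshold`, and re-arm once the value falls to `rearm` or below. The `HighThreshold` class and its method are hypothetical and are not OpenNMS code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HighThreshold:
    """Hypothetical, simplified high-threshold model (not OpenNMS source).

    Assumed semantics: fire once the value is >= `threshold` for `trigger`
    consecutive samples; re-arm once the value drops to `rearm` or below.
    """
    threshold: float
    trigger: int
    rearm: float
    _count: int = field(default=0, init=False, repr=False)
    _armed: bool = field(default=True, init=False, repr=False)

    def evaluate(self, value: float) -> Optional[str]:
        """Feed one sample; return "triggered", "rearmed" or None."""
        if self._armed:
            if value >= self.threshold:
                self._count += 1
                if self._count >= self.trigger:
                    # Fire once, then stay quiet until re-armed.
                    self._armed = False
                    self._count = 0
                    return "triggered"
            else:
                # A sample below the threshold breaks the consecutive run.
                self._count = 0
        elif value <= self.rearm:
            self._armed = True
            return "rearmed"
        return None


if __name__ == "__main__":
    t = HighThreshold(threshold=95.0, trigger=2, rearm=85.0)
    for sample in (109.24, 109.24, 109.24, 84.0):
        print(sample, t.evaluate(sample))
```

Under these assumptions, two consecutive readings of 109.24 fire the alarm, while a correct reading of about 93% (roughly 7% free) never reaches the 95.0 threshold.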

This is obviously wrong.
In this particular example we are monitoring a disk on a virtual server (VMware environment); the disk is 9 TB in size and Windows reports about 7% free space.

By the way, OpenNMS collects these values via SNMP.

Expected outcome:
A correct percentage, and no alarm triggered.

OpenNMS version:
28.0.0

Other relevant data:

logs

Look at your raw hrStorageUsed and hrStorageSize values and see whether they make sense for that volume. The SNMP agent is very likely reporting wrong values, which would explain why the threshold value comes out wrong.
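
A small Python sketch of that sanity check. It relies on the HOST-RESOURCES-MIB definition of hrStorageSize and hrStorageUsed as INTEGER (0..2147483647) counts of hrStorageAllocationUnits-sized blocks. The function names are hypothetical, and the raw numbers would come from your own SNMP walk of the hrStorageTable row for that volume.

```python
from __future__ import annotations

# HOST-RESOURCES-MIB: hrStorageSize and hrStorageUsed are INTEGER (0..2147483647),
# counted in blocks of hrStorageAllocationUnits bytes.
INT32_MAX = 2**31 - 1


def size_bytes(alloc_units: int, units: int) -> int:
    """Convert a raw hrStorage count into bytes for comparison with Windows."""
    return alloc_units * units


def used_percent(size: int, used: int) -> float:
    # Same expression as the threshold's ds: hrStorageUsed / hrStorageSize * 100.0
    return used / size * 100.0


def sanity_check(alloc_units: int, size: int, used: int) -> list[str]:
    """Return reasons why a raw hrStorageTable row looks implausible (empty if none)."""
    problems: list[str] = []
    if alloc_units <= 0:
        problems.append(f"hrStorageAllocationUnits={alloc_units} is not positive")
    for name, value in (("hrStorageSize", size), ("hrStorageUsed", used)):
        if not 0 <= value <= INT32_MAX:
            problems.append(f"{name}={value} is outside 0..{INT32_MAX}")
    if used > size:
        problems.append("hrStorageUsed exceeds hrStorageSize")
    return problems
```

An empty list only means the raw row is self-consistent; comparing `size_bytes(...)` with the size Windows shows for the volume is the second half of the check. Any entry in the list points at the agent rather than at the threshold expression.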

SNMP on Windows is terrible. Don’t use it.

Hm, negative values.
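
One plausible reading of the negative values, sketched in Python under stated assumptions: hrStorageSize and hrStorageUsed are limited to 2147483647, so if the agent's block count is larger, it can wrap into a negative 32-bit number. When both values wrap, dividing two negatives can give a ratio above 1. The numbers below are illustrative assumptions, not measured data: a 9 TiB volume, 4096-byte allocation units, and about 7% free. `as_int32` and `unwrap` are hypothetical helpers.

```python
TWO_32 = 2**32
INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of a count as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - TWO_32 if value > INT32_MAX else value


def unwrap(value: int) -> int:
    """Undo a single 32-bit wrap. Only valid if the true count is below 2**32."""
    return value + TWO_32 if value < 0 else value


# Illustrative assumptions: 9 TiB volume, 4096-byte allocation units, ~7% free.
alloc_units = 4096
true_size = 9 * 2**40 // alloc_units  # 2_415_919_104 blocks, above INT32_MAX
true_used = int(true_size * 0.93)

reported_size = as_int32(true_size)  # negative after wrapping
reported_used = as_int32(true_used)  # negative after wrapping

# Ratio of two wrapped negatives: about 109, similar to the 109.24 in the alarm.
print(reported_used / reported_size * 100.0)
# After unwrapping: about 93, matching the ~7% free that Windows reports.
print(unwrap(reported_used) / unwrap(reported_size) * 100.0)
```

Unwrapping like this is only a diagnostic aid; it cannot tell how many times a value wrapped. That is one reason the replies lean toward collecting from Windows by other means rather than patching SNMP data.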

So you are saying that for monitoring Windows servers I should use anything but SNMP?
What do you suggest?

WSMan / WinRM is a much more reliable option:

You can also leverage the Prometheus Windows exporter, but I don’t believe we ship many prebuilt graphs for that collection, so you’d largely be creating your own:
https://vault.opennms.com/docs/opennms/branches/develop/guide-admin/guide-admin.html#ga-performance-mgmt-collectors-prometheus-collector

Or even WMI.
https://vault.opennms.com/docs/opennms/branches/develop/guide-admin/guide-admin.html#_wmicollector


Thanks, I will have a look at these.