Monitoring disk status using smartctl and snmp extend

Smartmontools (S.M.A.R.T. Monitoring Tools) is a set of utility programs (smartctl and smartd) to control and monitor computer storage systems using the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) system built into most modern ATA, Serial ATA and SCSI hard drives.[1][2][3]

Source: Wikipedia

The configuration shown here is used to identify SMART issues. OpenNMS raises a service outage when a hard disk fails, and the outage message states which drives are affected.

The scripts mentioned in this article can be found in a GitHub repository

SNMP permissions

The Net-SNMP agent runs as the unprivileged user snmp and isn’t allowed to run smartctl.
A sudoers file lets snmp run just the necessary commands with sudo, instead of running the whole Net-SNMP agent with root privileges.

Create the file /etc/sudoers.d/snmp_smartctl from snmp_smartctl.
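As a rough illustration of what such a sudoers file could contain, the fragment below grants the snmp user passwordless sudo for a single command. It is only a sketch: the user name snmp is taken from the text above and the script path from the extend directive further down, while the actual snmp_smartctl file in the repository may look different (on some distributions the agent also runs under a different account name).

```
# /etc/sudoers.d/snmp_smartctl (hypothetical content, adjust user and path)
# Allow the snmp user to run only the SMART check script as root, without a password.
snmp ALL=(root) NOPASSWD: /usr/local/bin/check_smart_disk.sh
```

The file should be owned by root with mode 0440, and its syntax can be checked with visudo -cf /etc/sudoers.d/snmp_smartctl before relying on it.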

Smartctl command

The following step-by-step description shows where the required information comes from.

To identify the physical disks (from the operating system's view):

[20:28]root@smart:/# lsblk -io KNAME,TYPE 2> /dev/null 
KNAME TYPE
sda   disk
sda1  part
sda5  part
dm-0  lvm
dm-1  lvm
dm-2  lvm
dm-3  lvm
dm-4  lvm
dm-5  lvm
sda6  part

Some format voodoo to reduce the output to the bare device name:

[20:29]root@smart:/# lsblk -io KNAME,TYPE 2> /dev/null | grep disk 
sda   disk
[20:29]root@smart:/# lsblk -io KNAME,TYPE 2> /dev/null | grep disk | awk '{print $1}'
sda 
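The grep and awk steps above can also be folded into a single awk call that matches on the TYPE column only, so a device whose name happens to contain the string "disk" is not picked up by accident. The function name list_disks is hypothetical and not part of the original scripts; it reads lsblk output from standard input.

```shell
# Print the kernel name of every block device whose TYPE column is exactly "disk".
# Reads the output of `lsblk -io KNAME,TYPE` from stdin.
list_disks() {
    awk '$2 == "disk" { print $1 }'
}
```

It would be used as lsblk -io KNAME,TYPE 2> /dev/null | list_disks, which prints one device name per line.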

Smart test on sda:

[23:51]root@smart:# smartctl -n idle -H /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-92-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

The important part is PASSED, which shows that the disk has not failed.

[00:18]root@smart:# smartctl -n idle -H /dev/sda | grep PASSED
SMART overall-health self-assessment test result: PASSED
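Grepping for PASSED only distinguishes "passed" from "anything else". A slightly more explicit classification is sketched below. The function name smart_health_state is hypothetical, and the strings matched for failed ATA drives ("test result: FAILED") and for healthy SCSI drives ("SMART Health Status: OK") are assumptions based on typical smartctl output, so they should be verified against the smartctl version in use.

```shell
# Classify the output of `smartctl -H`, read from stdin, as ok, failed or unknown.
# The body runs in a subshell "( ... )" so the helper variable does not leak.
smart_health_state() (
    output=$(cat)
    case $output in
        *"test result: PASSED"* | *"SMART Health Status: OK"*) echo ok ;;
        *"test result: FAILED"*) echo failed ;;
        *) echo unknown ;;   # e.g. no SMART data, or the check was skipped
    esac
)
```

For example, smartctl -n idle -H /dev/sda | smart_health_state prints a single word describing the drive state.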

snmpd configuration

The snmpd configuration (typically snmpd.conf) has to be extended with the following directive.

extend smart_health /bin/bash -c 'sudo /usr/local/bin/check_smart_disk.sh'

Snmpd needs to be reloaded!

Poller configuration

The poller configuration is very easy. Just use the snmp monitor to verify the OID’s value:

$ONMS_HOME/etc/poller-configuration.xml

<service name="SMART-Health" interval="43200000" user-defined="false" status="on">
 <parameter key="retry" value="2"/>
 <parameter key="timeout" value="5000"/>
 <parameter key="port" value="161"/>
 <parameter key="oid" value=".1.3.6.1.4.1.8072.1.3.2.4.1.2.9.115.109.97.114.116.68.105.115.107.1"/>
 <parameter key="operand" value="0"/>
 <parameter key="operator" value="="/>
</service>
<monitor service="SMART-Health" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>
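The OID in the poller configuration is not arbitrary. It is the nsExtendOutLine column of the NET-SNMP-EXTEND-MIB (.1.3.6.1.4.1.8072.1.3.2.4.1.2), followed by the extend name encoded as its length plus the ASCII code of each character, followed by the output line number. The sketch below derives such an OID from an extend name; the function name extend_oid is hypothetical. The OID shown above decodes to the name smartDisk, whereas the extend directive in the snmpd section uses smart_health, so whichever name is configured in snmpd is the one that has to be encoded in the poller OID.

```shell
# Build the nsExtendOutLine OID for a Net-SNMP extend name.
# $1 = extend name, $2 = output line number (defaults to 1).
# The body runs in a subshell "( ... )" so the helper variables stay local.
extend_oid() (
    name=$1
    line=${2:-1}
    oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.${#name}"
    rest=$name
    while [ -n "$rest" ]; do
        char=${rest%"${rest#?}"}                 # first character of $rest
        rest=${rest#?}                           # drop that character
        oid="$oid.$(printf '%d' "'$char")"       # append its ASCII code
    done
    printf '%s.%s\n' "$oid" "$line"
)

# extend_oid smartDisk
# prints .1.3.6.1.4.1.8072.1.3.2.4.1.2.9.115.109.97.114.116.68.105.115.107.1
```

Calling extend_oid smart_health yields the OID that corresponds to the extend name used in the snmpd configuration shown earlier.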

Script to identify SMART issues

The script check_smart_disk.sh performs the steps explained above and reports which drives have issues; this information is used in the poller outage message.
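The real check_smart_disk.sh lives in the repository; the following is only a minimal sketch of how such a script might combine the steps from this article, not the author's implementation. It assumes the first output line must be 0 when everything is healthy (the poller compares line 1 of the extend output with 0) and that further lines may carry details. The function name check_all_disks is hypothetical. Note that this sketch treats every drive that does not report PASSED as a problem, which includes drives skipped because of -n idle.

```shell
# Check every physical disk and print a status code on the first line:
# 0 if all disks report PASSED, 1 otherwise, followed by the affected drives.
# The body runs in a subshell "( ... )" so the helper variables stay local.
check_all_disks() (
    failed=""
    for disk in $(lsblk -io KNAME,TYPE 2> /dev/null | awk '$2 == "disk" { print $1 }'); do
        # Anything other than PASSED counts as a problem in this sketch.
        if ! smartctl -n idle -H "/dev/$disk" | grep -q PASSED; then
            failed="$failed $disk"
        fi
    done
    if [ -z "$failed" ]; then
        echo 0
    else
        echo 1
        echo "SMART check did not report PASSED for:$failed"
    fi
)
```

A script built this way would simply call check_all_disks as its last line, so that snmpd exposes the status code as line 1 of the extend output.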
