Smartmontools (S.M.A.R.T. Monitoring Tools) is a set of utility programs (smartctl and smartd) to control and monitor computer storage systems using the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) system built into most modern ATA, Serial ATA and SCSI hard drives.[1][2][3]
Source: Wikipedia
The configuration shown here detects SMART issues: OpenNMS raises a service outage when a hard disk fails its SMART health check, and the outage message names the affected drives.
The scripts mentioned in this article can be found in a GitHub repository.
SNMP permissions
The Net-SNMP agent runs as the unprivileged user snmp and isn't allowed to run smartctl.
A sudoers file lets snmp run just the necessary commands with sudo, so the whole Net-SNMP agent does not have to run with root privileges.
Create the file '/etc/sudoers.d/snmp_smartctl' from snmp_smartctl.
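For orientation, a minimal sketch of what such a sudoers drop-in might contain. The rule below is an assumption derived from the command used in the snmpd configuration of this article, not the content of snmp_smartctl from the repository. The helper name write_sudoers_rule and the scratch-file approach are hypothetical.

```shell
#!/bin/sh
# Assumed rule (not taken from the repository): the user snmp may run only
# the check script as root, without a password prompt.
RULE='snmp ALL=(root) NOPASSWD: /usr/local/bin/check_smart_disk.sh'

# Write the rule to the given path with mode 0440, as usual for sudoers files.
write_sudoers_rule() {
    printf '%s\n' "$RULE" > "$1" && chmod 0440 "$1"
}

# Write to a scratch file first; nothing under /etc is touched here.
scratch="$(mktemp)"
write_sudoers_rule "$scratch"
cat "$scratch"
```

Once the content looks right, it can be syntax-checked with visudo -cf "$scratch" and then copied to /etc/sudoers.d/snmp_smartctl as root. Because the script is executed as root, it should be owned by root and not writable by other users.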
Smartctl command
The following steps show where the required information comes from.
First, identify the physical disks as the operating system sees them:
[20:28]root@smart:/# lsblk -io KNAME,TYPE 2> /dev/null
KNAME TYPE
sda disk
sda1 part
sda5 part
dm-0 lvm
dm-1 lvm
dm-2 lvm
dm-3 lvm
dm-4 lvm
dm-5 lvm
sda6 part
Filter the output down to the bare device names:
[20:29]root@smart:/# lsblk -io KNAME,TYPE 2> /dev/null | grep disk
sda disk
[20:29]root@smart:/# lsblk -io KNAME,TYPE 2> /dev/null | grep disk | awk '{print $1}'
sda
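The grep and awk steps above can also be folded into a single awk call that matches the TYPE column exactly. This is a sketch of an alternative, not the form used in the article's script. The function name filter_disks is hypothetical, and the sample input is a shortened copy of the listing above.

```shell
#!/bin/sh
# Keep only rows whose TYPE column is exactly "disk" and print the KNAME column.
filter_disks() {
    awk '$2 == "disk" {print $1}'
}

# On a live system: lsblk -io KNAME,TYPE 2>/dev/null | filter_disks
# Here the function is fed a shortened copy of the sample listing.
printf '%s\n' 'KNAME TYPE' 'sda disk' 'sda1 part' 'dm-0 lvm' | filter_disks
```

Matching the column instead of the whole line means that a row is selected only when its type really is disk, not merely when the text "disk" appears somewhere in it.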
SMART health check on sda:
[23:51]root@smart:# smartctl -n idle -H /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-92-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
The important part is PASSED, which shows that the disk has not failed.
[00:18]root@smart:# smartctl -n idle -H /dev/sda | grep PASSED
SMART overall-health self-assessment test result: PASSED
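In a script, the exit status of grep is usually more convenient than the matched line itself. The following sketch shows that pattern on a copy of the output line above. The function name smart_passed is hypothetical, and only the wording shown in this article is recognised.

```shell
#!/bin/sh
# Return 0 if the smartctl output on stdin contains the PASSED verdict.
smart_passed() {
    grep -q 'test result: PASSED'
}

# On a live system: smartctl -n idle -H /dev/sda | smart_passed
# Here a copy of the sample output line is used instead.
sample='SMART overall-health self-assessment test result: PASSED'
if printf '%s\n' "$sample" | smart_passed; then
    echo "healthy"
else
    echo "not healthy"
fi
```

Any output without the PASSED line, including an error message from smartctl, takes the "not healthy" branch.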
snmpd configuration
Add the following extend directive to the snmpd configuration:
extend smartDisk /bin/bash -c 'sudo /usr/local/bin/check_smart_disk.sh'
Reload snmpd afterwards so that the new extend entry takes effect.
Poller configuration
The poller configuration is straightforward. Use the SnmpMonitor to verify the value returned for the OID:
$ONMS_HOME/etc/poller-configuration.xml
<service name="SMART-Health" interval="43200000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="5000"/>
<parameter key="port" value="161"/>
<parameter key="oid" value=".1.3.6.1.4.1.8072.1.3.2.4.1.2.9.115.109.97.114.116.68.105.115.107.1"/>
<parameter key="operand" value="0"/>
<parameter key="operator" value="="/>
</service>
<monitor service="SMART-Health" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>
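The long OID is not arbitrary: it is the nsExtendOutLine column of NET-SNMP-EXTEND-MIB, followed by the length of the extend name, the ASCII code of each character of that name, and the output line number. The OID configured above encodes the name smartDisk and line 1, so the name given after extend in the snmpd configuration has to be identical. The sketch below rebuilds the OID from a name. The function name extend_oid is hypothetical.

```shell
#!/bin/sh
# nsExtendOutLine column of NET-SNMP-EXTEND-MIB, as used in the poller OID.
BASE='.1.3.6.1.4.1.8072.1.3.2.4.1.2'

# Print the OID of output line $2 (default 1) for the extend entry named $1.
extend_oid() {
    name="$1"
    line="${2:-1}"
    # The index starts with the length of the name ...
    oid="$BASE.${#name}"
    # ... followed by the decimal ASCII code of every character.
    for code in $(printf '%s' "$name" | od -An -tu1); do
        oid="$oid.$code"
    done
    printf '%s.%s\n' "$oid" "$line"
}

extend_oid smartDisk
```

The printed value is .1.3.6.1.4.1.8072.1.3.2.4.1.2.9.115.109.97.114.116.68.105.115.107.1, the same OID as in the oid parameter of the service definition.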
Script to identify SMART issues
The script check_smart_disk.sh performs the steps explained above and reports any issues it finds; this information is used in the poller outage message.
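A minimal sketch of how such a script might look, assuming the first output line is 0 when every disk passes, which is the value the poller's operand and operator compare against. The function names, the SMARTCTL override variable and the message wording are hypothetical; the real check_smart_disk.sh in the repository may differ.

```shell
#!/bin/sh
# Hypothetical sketch; the real check_smart_disk.sh may differ.

# Command used to query a disk; overridable for dry runs (assumption).
SMARTCTL="${SMARTCTL:-smartctl}"

# Print the kernel names of all whole disks, one per line.
list_disks() {
    lsblk -io KNAME,TYPE 2>/dev/null | awk '$2 == "disk" {print $1}'
}

# Print the space-separated names of the given disks that lack a PASSED line.
# Any run without that line (error, skipped check) counts as failing here.
failed_disks() {
    failed=""
    for disk in "$@"; do
        if ! "$SMARTCTL" -n idle -H "/dev/$disk" 2>/dev/null | grep -q 'PASSED'; then
            failed="${failed:+$failed }$disk"
        fi
    done
    printf '%s\n' "$failed"
}

# Print "0" when no disk failed, otherwise a message naming the disks.
report() {
    if [ -z "$1" ]; then
        echo 0
    else
        echo "SMART check failed on: $1"
    fi
}

# Word splitting of the disk list is intended here.
report "$(failed_disks $(list_disks))"
```

With this layout the SnmpMonitor sees 0 while all disks are healthy, and any other first line, which names the failing disks, takes the service down.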