Expanding the non-identifiable data gathered by the datachoices module

My Dev-Jam project this year was to expand the set of data gathered and reported by the datachoices module. That’s the code behind the familiar first-login popup asking the user to opt in or out of usage data sharing. I chose an effort that was intentionally small in scope, because I’m frequently distracted by day-job work and by the excitement of seeing what everybody else is hacking on.

As background, the datachoices module was added as a privacy-respecting way for The OpenNMS Group (the primary sponsor and maintainer of OpenNMS, and my employer) to gather information about how real users put OpenNMS to use. The statistics gathered include no personally or organizationally identifiable data and are associated with a random UUID, which the user can reset on demand. If the user opts their system in to data sharing, the module sends updated statistics daily to a service endpoint that strips out the originating IP address and saves the statistics, which are available in aggregate for all to see.
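
To make the opt-in and resettable-identifier idea concrete, here is a minimal sketch in Java. It is not the datachoices code: the class name `UsageReportSketch`, its methods, and the report keys are all hypothetical, chosen only to illustrate a report that is keyed by a random UUID and is empty unless the user has opted in.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only; none of these names come from the datachoices module.
class UsageReportSketch {
    // Random identifier with no link to the user or organization.
    private UUID systemId = UUID.randomUUID();
    // Nothing is reported until the user explicitly opts in.
    private boolean optedIn = false;

    UUID getSystemId() {
        return systemId;
    }

    // Replaces the identifier with a fresh random one, on user request.
    UUID resetSystemId() {
        systemId = UUID.randomUUID();
        return systemId;
    }

    void setOptedIn(boolean optedIn) {
        this.optedIn = optedIn;
    }

    // Builds the report; returns an empty map when the user has not opted in.
    Map<String, Object> buildReport(long nodeCount, long alarmCount) {
        Map<String, Object> report = new LinkedHashMap<>();
        if (!optedIn) {
            return report;
        }
        report.put("systemId", systemId.toString());
        report.put("nodes", nodeCount);
        report.put("alarms", alarmCount);
        return report;
    }
}
```

In this sketch, resetting the identifier simply breaks the link between earlier and later reports, which is the property the resettable UUID is meant to provide.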

The original datachoices module was introduced in the OpenNMS Horizon 17 timeframe and had not been updated to cover the concepts and features introduced over the past three years. After my updates, the complete list of gathered data is as follows:

  • Current number of alarms
  • Current number of events
  • Current number of IP interfaces under management
  • Uptime of the OpenNMS JVM, in milliseconds
  • List of Karaf features loaded in the system
  • Number of registered Minions
  • Number of monitored services
  • Number of monitoring locations
  • Number of nodes under management
  • Count of SNMP-enabled nodes, broken down by sysObjectID
  • Number of user groups
  • Number of users
  • Operating system architecture (equivalent to uname -m)
  • Operating system type (equivalent to uname -s)
  • Operating system version (equivalent to uname -r)
  • Current RPC strategy
  • Current sink strategy
  • Current number of SNMP interfaces under management
  • Current time-series strategy
  • System ID (a UUID, randomly set and user-resettable)
  • OpenNMS product name (Horizon vs. Meridian)
  • OpenNMS product version
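
As an illustration of how the platform-level items in this list can be read from inside a JVM, here is a minimal sketch using only standard Java APIs. The class name `PlatformStatsSketch` and the map keys are my own invention, not names from the datachoices module, and the values are only roughly comparable to `uname` output (for example, a JVM may report `amd64` where `uname -m` prints `x86_64`).

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the datachoices implementation.
class PlatformStatsSketch {
    // Collects OS details and JVM uptime from standard system properties and MXBeans.
    static Map<String, Object> collect() {
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("osName", System.getProperty("os.name"));       // roughly uname -s
        stats.put("osArch", System.getProperty("os.arch"));       // roughly uname -m
        stats.put("osVersion", System.getProperty("os.version")); // roughly uname -r
        // Uptime of this JVM in milliseconds.
        stats.put("jvmUptimeMillis", ManagementFactory.getRuntimeMXBean().getUptime());
        return stats;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

None of these values identify a person or an organization, which is why they fit within the module's privacy constraints.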

I plan to build on this work by creating a second module that collects a superset of the above data as part of a comprehensive diagnostic bundle for use in troubleshooting, whether in the context of a commercial support agreement or community-based support.