RRD Configuration Tutorial

Introduction

This article explains the RRD (round robin database) parameters that control how collected data samples are stored and rolled up. RRDTool is a product that grew out of MRTG. It creates a very compact database structure for storing periodic data, such as the data gathered by OpenNMS. RRD data is stored in files that are sized at creation to hold data for a fixed amount of time. This means that these files are as large as they will ever get from the first data collection, but it also means that you will see a large initial drop in free disk space when collection first starts. Once an RRD file is full, the oldest data is discarded.

OpenNMS releases up to and including 1.2.9 used RRDTool proper by default via a JNI, meaning that the resulting files could be read by other applications capable of consuming RRDTool’s file format. The files written by OpenNMS via the JNI RRD strategy have a .rrd extension by default. Beginning with the 1.3.2 release, the default is to use JRobin, a pure-Java implementation of RRDTool 1.0’s functionality. The files produced via the JRobin RRD strategy have a .jrb extension by default, and are not compatible with RRDTool proper. See the JRobin site for the motivation behind this decision.

Configuration Details

The default RRD configuration in OpenNMS:

<rrd step = "300">
  <rra>RRA:AVERAGE:0.5:1:8928</rra>
  <rra>RRA:AVERAGE:0.5:12:8784</rra>
  <rra>RRA:MIN:0.5:12:8784</rra>
  <rra>RRA:MAX:0.5:12:8784</rra>
</rrd>

The first line, the rrd step size, determines the granularity of the data. By default this is set to 300 seconds, or five minutes, which means that one value is saved for every five-minute step.
Note that this is also one of the few places where time in OpenNMS is specified in seconds instead of milliseconds.

Each RRD is made up of Round-Robin Archives (RRAs). Each value in an RRA covers a certain number of steps. All of the data collected in those steps is consolidated into a single value, which is then stored in the RRA. For instance, if I poll a certain SNMP variable once a minute, I could have an RRA that collects all samples over a step of five minutes, averages the (five) values together, and stores the average.

The RRA statements take the form:

RRA:Cf:xff:steps:rows

  • RRA
    This string defines the line as an RRA configuration command. It does not change, and is always the text “RRA”.
  • Cf
    This field represents the “consolidation function”. It can take one of four values, AVERAGE, MAX, MIN, or LAST. They are detailed below.
  • xff
    This is the “x-files factor”. If we are trying to consolidate a number of samples into one, there is a chance that there could be gaps where a value wasn’t collected (the device was down, etc.). In that case, the value would be UNKNOWN. This factor determines what fraction of the samples can be UNKNOWN before the consolidated sample is itself considered UNKNOWN. By default this is set to 0.5, or 50%.
  • steps
    This is the number of “steps” that are consolidated into each stored value. For example, if the step size is 300 seconds (5 minutes) and the number of steps is 12, then each value covers 12 x 5 minutes = 60 minutes = 1 hour, and the RRA will store one consolidated value for that hour.
  • rows
    The rows field determines the number of values that will be stored in the RRA.
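
To make the field layout concrete, here is a minimal Python sketch that splits an RRA statement into the four fields described above. The `Rra` class and `parse_rra` function are hypothetical names used only for illustration; they are not part of OpenNMS, RRDTool or JRobin.

```python
from dataclasses import dataclass

# The four consolidation functions named in this article.
VALID_CFS = {"AVERAGE", "MIN", "MAX", "LAST"}


@dataclass(frozen=True)
class Rra:
    cf: str      # consolidation function
    xff: float   # fraction of samples that may be UNKNOWN
    steps: int   # number of steps consolidated into one stored value
    rows: int    # number of consolidated values kept in the archive


def parse_rra(spec: str) -> Rra:
    """Split an 'RRA:Cf:xff:steps:rows' string into typed fields."""
    parts = spec.strip().split(":")
    if len(parts) != 5 or parts[0] != "RRA":
        raise ValueError(f"not an RRA statement: {spec!r}")
    _, cf, xff, steps, rows = parts
    if cf not in VALID_CFS:
        raise ValueError(f"unknown consolidation function: {cf!r}")
    return Rra(cf=cf, xff=float(xff), steps=int(steps), rows=int(rows))


print(parse_rra("RRA:AVERAGE:0.5:12:8784"))
```

The sketch only checks the shape of the statement; it assumes a period as the decimal separator in the xff field.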

Consolidation Functions

These are used in the Cf part of an RRA statement.

  • AVERAGE
    Average all the values over the number of steps in the RRA.
  • MAX
    Store the maximum value collected over the number of steps in the RRA.
  • MIN
    Store the minimum value collected over the number of steps in the RRA.
  • LAST
    Store the last value collected over the number of steps in the RRA.
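
The four functions, together with the xff rule, can be sketched in a few lines of Python. This is a simplified illustration, not the RRDTool or JRobin implementation: `consolidate` is a hypothetical helper, `None` stands in for UNKNOWN, and the handling of UNKNOWN samples (ignoring them when computing the result, and taking the last known value for LAST) is an assumption made for the example.

```python
from typing import Optional, Sequence


def consolidate(samples: Sequence[Optional[float]], cf: str,
                xff: float = 0.5) -> Optional[float]:
    """Reduce the samples of one consolidation interval to a single value.

    None represents an UNKNOWN sample.
    """
    if not samples:
        return None
    known = [s for s in samples if s is not None]
    unknown_count = len(samples) - len(known)
    # Assumption: the result is UNKNOWN when MORE than xff of the samples
    # are UNKNOWN, matching the "more than 50%" wording in this article.
    if not known or unknown_count > xff * len(samples):
        return None
    if cf == "AVERAGE":
        return sum(known) / len(known)
    if cf == "MAX":
        return max(known)
    if cf == "MIN":
        return min(known)
    if cf == "LAST":
        return known[-1]  # simplification: last known sample
    raise ValueError(f"unknown consolidation function: {cf!r}")


# Two of six samples are UNKNOWN (33%), so the result is still known.
print(consolidate([1.0, 2.0, 3.0, None, None, 6.0], "AVERAGE"))
# Three of four samples are UNKNOWN (75%), so the result is UNKNOWN.
print(consolidate([1.0, None, None, None], "MAX"))
```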

Let’s bring this all together with some more examples. Take the first RRA line in the configuration:

RRA:AVERAGE:0.5:1:8928

This says to create an archive consisting of the AVERAGE value collected over 1 step and store up to 8928 of them. If, for any step, more than 50% of the values are UNKNOWN, then the average value will be UNKNOWN. Since the default step size is 300 seconds, or five minutes, and the default polling cycle (in the collectd configuration) is five minutes, we would expect there to be one value per step, and so the AVERAGE should be the same as the MIN or MAX or LAST. 8928 five minute samples at 12 samples per hour and 24 hours per day is 31 days. Thus this RRA will hold five minute samples for 31 days before discarding data.
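
The retention arithmetic is simply step size x steps x rows. A small Python sketch (the function name `rra_retention` is made up for this example) makes it easy to check a configuration before deploying it:

```python
from datetime import timedelta


def rra_retention(step_seconds: int, steps: int, rows: int) -> timedelta:
    """Time span covered by an RRA: each row spans step_seconds * steps."""
    return timedelta(seconds=step_seconds * steps * rows)


# RRA:AVERAGE:0.5:1:8928 with the default 300-second step
retention = rra_retention(step_seconds=300, steps=1, rows=8928)
print(retention.days, "days")
```

The same function works for any other RRA line by passing its steps and rows values.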

The next lines get a little more interesting:

RRA:AVERAGE:0.5:12:8784
RRA:MIN:0.5:12:8784
RRA:MAX:0.5:12:8784

The only difference between these lines is the consolidation function. We are going to “roll up” the 1-step samples (5 minutes) into 12-step samples (1 hour). We are also going to store three values for each hour: the average of all samples during the hour, the minimum value of those samples and the maximum value. This data is useful for various reports (the AVERAGE shows throughput whereas MAX and MIN show peaks and valleys). Up to 8784 of these one-hour samples will be stored, which covers 366 days.
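
As an illustration of the roll-up, the Python sketch below groups made-up five-minute readings into one-hour rows and computes the three values the default configuration keeps. The `roll_up` helper and the sample data are invented for this example, and UNKNOWN values are ignored for simplicity.

```python
STEPS_PER_ROW = 12  # 12 five-minute steps = 1 hour


def roll_up(samples, steps=STEPS_PER_ROW):
    """Consolidate each complete group of `steps` samples into one row."""
    rows = []
    for start in range(0, len(samples) - steps + 1, steps):
        chunk = samples[start:start + steps]
        rows.append({
            "AVERAGE": sum(chunk) / len(chunk),
            "MIN": min(chunk),
            "MAX": max(chunk),
        })
    return rows


# Two hours of made-up five-minute readings; the first hour has one spike.
samples = [10, 12, 11, 50, 13, 12, 11, 10, 12, 11, 13, 15,
           20, 22, 21, 20, 23, 22, 21, 20, 22, 21, 23, 29]

for hour, row in enumerate(roll_up(samples)):
    print(hour, row)
```

The spike of 50 in the first hour barely moves the AVERAGE but shows up directly in the MAX, which is why keeping all three archives is useful for reports.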

So, to summarize, by default the SNMP collector will poll once every five minutes. Each value will be stored as collected for 31 days. In addition, hourly samples that include the MIN, MAX and AVERAGE will be stored for 366 days.

You can easily change these numbers to increase or decrease the amount of data stored, with a few caveats. First, increasing the number and/or frequency of samples has a direct effect on the amount of disk space required. You could add a MIN and MAX RRA for the single-step RRA, which would increase the necessary disk space by roughly 50%, but since by default there is only one value per step, MIN, MAX and AVERAGE will be the same, so it is not really necessary unless you also increase the polling rate. Second, you cannot change these numbers once collection has started without losing all of the data collected up to that point, so it is important to set your values early. When you change these numbers, you must delete all existing .jrb/.rrd files so that they are re-created with the new settings.
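
To get a rough feel for the disk-space impact before changing the configuration, you can compare the total number of rows. The Python sketch below is an estimate only: it assumes each stored value takes 8 bytes (one double-precision number) per data source and ignores file headers, so real file sizes will differ somewhat.

```python
# (steps, rows) for each RRA in the default configuration
DEFAULT_RRAS = [(1, 8928), (12, 8784), (12, 8784), (12, 8784)]

# Assumption: 8 bytes per stored value; file headers are ignored.
BYTES_PER_VALUE = 8


def total_rows(rras):
    """Total number of stored values per data source."""
    return sum(rows for _steps, rows in rras)


def data_bytes(rras, data_sources=1):
    """Approximate size of the archive data, excluding headers."""
    return total_rows(rras) * data_sources * BYTES_PER_VALUE


# Add MIN and MAX archives at single-step resolution.
extended = DEFAULT_RRAS + [(1, 8928), (1, 8928)]

growth = total_rows(extended) / total_rows(DEFAULT_RRAS) - 1
print(f"default:  {data_bytes(DEFAULT_RRAS)} bytes per data source")
print(f"extended: {data_bytes(extended)} bytes per data source")
print(f"growth:   {growth:.1%}")
```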

Hint

A note for international users. If your LOCALE is set to something other than en_US you may need to use a comma instead of a period in the xff, for example:

RRA:AVERAGE:0,5:12:8784
RRA:MIN:0,5:12:8784
RRA:MAX:0,5:12:8784

You need to do this if you see an error such as “can’t parse argument ‘RRA:AVERAGE:0.5:1:8928’” in the collectd log file.
